Summary: | [BDW]90% CPU usage at on one of the CPU cores when run Synmark2_v5_3_0_OglDrvCtx | ||
---|---|---|---|
Product: | Mesa | Reporter: | meng <mengmeng.meng> |
Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED INVALID | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | christophe.prigent, eero.t.tamminen, intel-gfx-bugs |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
dmesg.log
perf report for HSW
perf report for BDW |
Description
meng
2014-07-02 05:04:31 UTC
Created attachment 102102 [details]
dmesg.log
sudo perf top would give more clues. The issue is probably semaphores, amongst a myriad of missing tweaks for bdw.

The bug here isn't CPU utilization, but BDW performance compared to HSW. DrvCtx tests mainly GL context (re-)creation speed, so large CPU usage and the test being CPU bound is *expected*. CPU utilization just provides a clue about what could be the cause of the bad performance.

Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW (when you use same GPU & CPU speed for both)?

If CPU utilization on BDW is higher, the issue is on the CPU side; if it's lower on BDW, the issue is on the GPU side.

Just to verify, were both BDW and HSW:
- using same versions of kernel, X, X intel driver, Mesa
- using SNA + DRI3 config
?

---

Providing the "perf" information requested by Chris can be done by:
- changing the test to run longer from its config file
- running the following command for ~1/2 min (e.g. through ssh) before ^C'ing it, to profile CPU usage:
    perf record -a
- saving the profile output from the "perf report -n" command, and
- attaching that output here, preferably also from the HSW machine, not just the BDW one

(In reply to comment #3)
> Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW
> (when you use same GPU & CPU speed for both)?
>
> If CPU utilization on BDW is higher, issue is on CPU side, if it's lower on
> BDW, issue is GPU side.

With the same GPU & CPU speed for both, CPU utilization on BDW is higher than on HSW (<40%).

(In reply to comment #4)
> Just to verify, were both BDW and HSW:
> - using same versions of kernel, X, X intel driver, Mesa
> - using SNA + DRI3 config
> ?

Yes, the same settings on both BDW and HSW. Please see the attached perf report logs.

Created attachment 102180 [details]
perf report for HSW
Created attachment 102181 [details]
perf report for BDW
I was wrong, that doesn't look like a kernel issue at all. Next you want to investigate why malloc() (called on behalf of Synmark) takes longer on bdw than on hsw - is it just called more frequently, or is each invocation slower? (Presuming that the relative increase in CPU overhead explains the perf drop, ofc.)

Looking at the perf reports...

Processes:

BDW:
    71.73%  417189  synmark2
    18.48%  107495  X
     7.38%   42920  gnome-shell
     1.84%   10673  swapper
     0.24%    1368  rcu_sched

HSW:
    59.63%  494907  synmark2
    27.77%  230493  X
     6.01%   49843  swapper
     5.58%   46285  gnome-shell
     0.60%    4957  rcu_sched

With BDW CPU utilization being 90% and HSW 40%, that means X is using less CPU on HSW too, although its share of the perf samples is larger. gnome-shell using significantly less CPU on HSW, although it should be dealing with slightly more X damage events (with higher FPS), is a bit suspicious.

DSOs:

BDW:
    39.78%  231385  libc-2.17.so
    20.25%  117783  [kernel.kallsyms]
    17.71%  102981  i965_dri.so
     6.55%   38122  Xorg
     4.48%   26043  libpthread-2.17.so
     2.73%   15887  libxcb.so.1.1.0
     1.63%    9476  libX11.so.6.3.0
     1.25%    7299  libglib-2.0.so.0.3600.4

HSW:
    32.29%  268036  libc-2.17.so
    29.15%  241954  [kernel.kallsyms]
    11.84%   98267  Xorg
     8.80%   73022  i965_dri.so
     5.34%   44350  libpthread-2.17.so
     3.55%   29504  libxcb.so.1.1.0
     2.26%   18773  libX11.so.6.3.0
     1.01%    8363  libglib-2.0.so.0.3600.4

It's not just malloc which is used a lot more, the graphics driver does a lot more work on BDW.

Comparing all the i965 driver functions in the perf data, BDW does much more work with (anonymous namespace)::builtin_variable_generator::add_uniform() and hash table handling, whereas the HSW driver does most of its work in _mesa_make_extension_string()???

One possibility is that one of the profiles wasn't taken from normal running of the test program, but e.g. from its startup or end. Mengmeng?

(In reply to comment #11)
The perf data was collected between "after run" and "before end", over about 2 min of running.
BTW, to make the test run longer from its config file, add "RenderingTime = 300.0".

Nobody's looked at this bug in ages, and I'm not sure what it's trying to say. High CPU usage when running a CPU-bound test makes sense. We can optimize the microbenchmark, but I don't think any of the information here will help us do that. We may as well just start over now that it's 2017.
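The comparison made in the perf analysis above (X has a larger share of samples on HSW, yet uses less CPU there) depends on scaling each process's share of perf samples by the overall CPU utilization of that machine. A minimal Python sketch of that arithmetic follows. The sample shares are copied from the perf reports quoted in this thread; the utilization figures (90% for BDW, 40% for HSW) are the ones used in the discussion, where HSW was actually reported as "<40%", so the HSW numbers are upper bounds. The names `SAMPLE_SHARE`, `CPU_UTILIZATION` and `estimated_cpu` are hypothetical, and multiplying a system-wide sample share by a utilization figure is only a rough approximation.

```python
# Share of perf samples per process, in percent, copied from the perf
# reports quoted in this bug (only the three largest processes are listed).
SAMPLE_SHARE = {
    "BDW": {"synmark2": 71.73, "X": 18.48, "gnome-shell": 7.38},
    "HSW": {"synmark2": 59.63, "X": 27.77, "gnome-shell": 5.58},
}

# Overall CPU utilization in percent, as used in the discussion.
# ASSUMPTION: HSW was reported as "<40%", so 40.0 is an upper bound.
CPU_UTILIZATION = {"BDW": 90.0, "HSW": 40.0}


def estimated_cpu(platform, process):
    """Rough CPU usage of a process: its sample share scaled by utilization."""
    share = SAMPLE_SHARE[platform][process]
    return share * CPU_UTILIZATION[platform] / 100.0


def compare():
    """Return {process: (BDW estimate, HSW estimate)} for every listed process."""
    return {
        process: (estimated_cpu("BDW", process), estimated_cpu("HSW", process))
        for process in SAMPLE_SHARE["BDW"]
    }


if __name__ == "__main__":
    for process, (bdw, hsw) in compare().items():
        print(f"{process:12s} BDW {bdw:5.1f}%   HSW {hsw:5.1f}%")
```

Under these assumptions X comes out at roughly 16.6% on BDW against at most 11.1% on HSW, and gnome-shell at roughly 6.6% against at most 2.2%, which matches the observation that both use less CPU on HSW even though X's share of the samples is larger there.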