System Environment:
----------------------------------------------------------------------
Platform: BDW
Libdrm:(master)libdrm-2.4.54-17-ge8c3c1358ecaf4e90f7d43762357ae6f8e2022b6
Mesa:(master)1bfc0a11027449ae7ab7c28eb695f26de530eccf
Xf86_video_intel:(master)2.99.912-200-ge6e5330857097eb2caafa89d571d12e4bb15f539
Cairo:(master)550385fb004e6064305518cf265adc03bd2d0c0b
Libva:(master)c61d8c6ce9ffc27320e9e177c1e1123d5f1b5014
Libva_intel_driver:(master)c5cb17ea86f0065a939d3636dd26651c93d497c8
Kernel: (drm-intel-nightly)git-9bfcb9

Bug detailed description:
------------------------------------------------------------------------
Compared with HSW, BDW shows ~90% CPU usage on one of the CPU cores when running Synmark2_v5_3_0_OglDrvCtx, and BDW OglDrvCtx performance is 26% slower than on HSW.

The issue is not a regression and exists on all kernel branches:

-drm-intel-nightly bad
commit a7665faa31dbbbae25e376508a9b3781e25d09e2
Author: Jani Nikula <jani.nikula@intel.com>
Date: Mon Jun 30 13:50:19 2014 +0300

    drm-intel-nightly: 2014y-06m-30d-13h-49m-54s integration manifest

-drm-intel-fixes bad
commit 84b4e042c4707bd1bf05094a51111403d680dc39
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Jun 25 08:24:29 2014 -0700

    drm/i915: only apply crt_present check on VLV

-drm-intel-next-queued bad
commit 91565c85b66db820f01894a971d39aaef60c4325
Author: Matt Roper <matthew.d.roper@intel.com>
Date: Tue Jun 24 17:05:02 2014 -0700

    drm/i915: Don't try to look up object for non-existent fb

CPU and GPU on BDW
-------------
- ~90% CPU usage on at least one of the CPU cores, according to "top" (press 1 to see all cores)
- clearly <100% "render" value in "intel_gpu_top"

Reproduce steps:
--------------------------------------------------------------------
1. xinit&
2. ./synmark2 OglDrvCtx
3. top (get the "top" output during the test run: not from the first 5 secs of calibration, but after that)
Created attachment 102102 [details] dmesg.log
sudo perf top would give more clues. The issue is probably semaphores, amongst a myriad of missing tweaks for bdw.
The bug here isn't CPU utilization, but BDW performance compared to HSW.

DrvCtx mainly tests GL context (re-)creation speed, so large CPU usage and the test being CPU bound is *expected*. CPU utilization just provides a clue about what could be causing the bad performance.

Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW (when you use the same GPU & CPU speed for both)?

If CPU utilization on BDW is higher, the issue is on the CPU side; if it's lower on BDW, the issue is on the GPU side.
Just to verify, were both BDW and HSW:
- using same versions of kernel, X, X intel driver, Mesa
- using SNA + DRI3 config
?

---

Providing the "perf" information requested by Chris can be done by:
- changing the test to run longer from its config file
- running the following command for ~1/2 min (e.g. through ssh) before ^C'ing it, to profile CPU usage:
  perf record -a
- saving the profile output from the "perf report -n" command, and
- attaching that output here, preferably also from the HSW machine, not just the BDW one
(In reply to comment #3)
> Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW
> (when you use same GPU & CPU speed for both)?
>
> If CPU utilization on BDW is higher, issue is on CPU side, if it's lower on
> BDW, issue is GPU side.

With the same GPU & CPU speed for both, CPU utilization on BDW is higher than on HSW (<40%).
(In reply to comment #4)
> Just to verify, were both BDW and HSW:
> - using same versions of kernel, X, X intel driver, Mesa
> - using SNA + DRI3 config
> ?

Yes, the same setting on both BDW and HSW. Please see perf report log attached.
Created attachment 102180 [details] perf report for HSW
Created attachment 102181 [details] perf report for BDW
I was wrong, that doesn't look like a kernel issue at all. Next you want to investigate why malloc() (called on behalf of Synmark) takes longer on bdw than on hsw - is it just called more frequently, or is each invocation slower?
(Presuming that the relative increase in CPU overhead explains the perf drop, ofc.)
Looking at the perf reports...

Processes:

BDW:
    71.73%    417189  synmark2
    18.48%    107495  X
     7.38%     42920  gnome-shell
     1.84%     10673  swapper
     0.24%      1368  rcu_sched

HSW:
    59.63%    494907  synmark2
    27.77%    230493  X
     6.01%     49843  swapper
     5.58%     46285  gnome-shell
     0.60%      4957  rcu_sched

With BDW CPU utilization being 90% and HSW 40%, that means X is using less CPU on HSW too, although its share of the perf samples is larger.

gnome-shell using significantly less CPU on HSW, although it should be dealing with slightly more X damage events (with higher FPS), is a bit suspicious.

DSOs:

BDW:
    39.78%    231385  libc-2.17.so
    20.25%    117783  [kernel.kallsyms]
    17.71%    102981  i965_dri.so
     6.55%     38122  Xorg
     4.48%     26043  libpthread-2.17.so
     2.73%     15887  libxcb.so.1.1.0
     1.63%      9476  libX11.so.6.3.0
     1.25%      7299  libglib-2.0.so.0.3600.4

HSW:
    32.29%    268036  libc-2.17.so
    29.15%    241954  [kernel.kallsyms]
    11.84%     98267  Xorg
     8.80%     73022  i965_dri.so
     5.34%     44350  libpthread-2.17.so
     3.55%     29504  libxcb.so.1.1.0
     2.26%     18773  libX11.so.6.3.0
     1.01%      8363  libglib-2.0.so.0.3600.4

It's not just malloc which is used a lot more; the graphics driver also does a lot more work on BDW.

Comparing all the i965 driver functions in the perf data, BDW does much more work with (anonymous namespace)::builtin_variable_generator::add_uniform() and hash table handling, whereas the HSW driver does most of its work in _mesa_make_extension_string()???

One possibility is that one of the profiles wasn't taken from normal running of the test program, but e.g. from its startup or end. Mengmeng?
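To make the "X uses less CPU on HSW" reasoning concrete, here is a rough sketch of the arithmetic. It assumes the ~90% (BDW) and ~40% (HSW) figures from "top" can stand in for overall CPU utilization during profiling, and that a process's absolute CPU use is approximately its perf sample share times that utilization. Both are simplifications for illustration, not something the perf reports state directly; the names in the snippet are made up for this example.

```python
# Assumed overall CPU utilization per platform, taken from the "top"
# figures mentioned in this bug (an approximation, not a measured total).
UTILIZATION = {"BDW": 0.90, "HSW": 0.40}

# Per-process perf sample shares (percent), copied from the reports above.
SAMPLE_SHARE = {
    "BDW": {"synmark2": 71.73, "X": 18.48, "gnome-shell": 7.38},
    "HSW": {"synmark2": 59.63, "X": 27.77, "gnome-shell": 5.58},
}


def absolute_cpu(platform):
    """Estimate absolute CPU use (percent) as sample share * utilization."""
    util = UTILIZATION[platform]
    return {
        proc: round(share * util, 1)
        for proc, share in SAMPLE_SHARE[platform].items()
    }


for platform in ("BDW", "HSW"):
    print(platform, absolute_cpu(platform))
```

Under these assumptions X comes out at roughly 16.6% on BDW versus 11.1% on HSW, and gnome-shell at roughly 6.6% versus 2.2%, which matches the observation that both use less CPU on HSW despite X having the larger sample share there.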
(In reply to comment #11)

The perf data was collected while the test was running normally (after it had started and before it ended), over about ~2 min.

BTW, to make the test run longer, I added "RenderingTime = 300.0" to its config file.
Nobody's looked at this bug in ages, and I'm not sure what it's trying to say. High CPU usage when running a CPU-bound test makes sense. We can optimize the microbenchmark, but I don't think any of the information here will help us do that. We may as well just start over now that it's 2017.