Bug 80784 - [BDW] 90% CPU usage on one of the CPU cores when running Synmark2_v5_3_0_OglDrvCtx
Summary: [BDW] 90% CPU usage on one of the CPU cores when running Synmark2_v5_3_0_OglDrvCtx
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: All Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-02 05:04 UTC by meng
Modified: 2017-09-14 22:29 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg.log (127.04 KB, text/plain)
2014-07-02 05:18 UTC, meng
perf report for HSW (174.03 KB, text/plain)
2014-07-03 05:57 UTC, meng
perf report for BDW (157.37 KB, text/plain)
2014-07-03 05:58 UTC, meng

Description meng 2014-07-02 05:04:31 UTC
System Environment:       
----------------------------------------------------------------------
Platform: BDW
Libdrm:(master)libdrm-2.4.54-17-ge8c3c1358ecaf4e90f7d43762357ae6f8e2022b6
Mesa:(master)1bfc0a11027449ae7ab7c28eb695f26de530eccf
Xf86_video_intel:(master)2.99.912-200-ge6e5330857097eb2caafa89d571d12e4bb15f539
Cairo:(master)550385fb004e6064305518cf265adc03bd2d0c0b
Libva:(master)c61d8c6ce9ffc27320e9e177c1e1123d5f1b5014
Libva_intel_driver:(master)c5cb17ea86f0065a939d3636dd26651c93d497c8
Kernel:	(drm-intel-nightly)git-9bfcb9

Bug detailed description:
------------------------------------------------------------------------
Compared with HSW, BDW shows 90% CPU usage on one of the CPU cores when running Synmark2_v5_3_0_OglDrvCtx, and BDW OglDrvCtx performance is 26% slower than on HSW.
The issue is not a regression and exists on all kernel branches:
-drm-intel-nightly bad
 commit a7665faa31dbbbae25e376508a9b3781e25d09e2
 Author: Jani Nikula <jani.nikula@intel.com>
 Date:   Mon Jun 30 13:50:19 2014 +0300
    drm-intel-nightly: 2014y-06m-30d-13h-49m-54s integration manifest

-drm-intel-fixes bad
 commit 84b4e042c4707bd1bf05094a51111403d680dc39
 Author: Jesse Barnes <jbarnes@virtuousgeek.org>
 Date:   Wed Jun 25 08:24:29 2014 -0700
    drm/i915: only apply crt_present check on VLV

-drm-intel-next-queued bad
 commit 91565c85b66db820f01894a971d39aaef60c4325
 Author: Matt Roper <matthew.d.roper@intel.com>
 Date:   Tue Jun 24 17:05:02 2014 -0700
    drm/i915: Don't try to look up object for non-existent fb


CPU and GPU on BDW
-------------
- ~90% CPU usage on at least one of the CPU cores, according to "top" (press 1 to see all cores)
- clearly <100% "render" value in "intel_gpu_top"

Reproduce steps:
--------------------------------------------------------------------
1. xinit&
2. ./synmark2 OglDrvCtx
3. Run "top" and capture its output during the test run (not during the first 5 secs of calibration, but after that)
Comment 1 meng 2014-07-02 05:18:34 UTC
Created attachment 102102
dmesg.log
Comment 2 Chris Wilson 2014-07-02 06:22:32 UTC
sudo perf top would give more clues. The issue is probably semaphores, amongst a myriad of missing tweaks for bdw.
Comment 3 Eero Tamminen 2014-07-02 06:59:12 UTC
The bug here isn't CPU utilization, but BDW performance compared to HSW.  DrvCtx mainly tests GL context (re-)creation speed, so high CPU usage and the test being CPU bound are *expected*.  CPU utilization just provides a clue to what could be causing the bad performance.

Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW (when you use same GPU & CPU speed for both)?

If CPU utilization on BDW is higher, the issue is on the CPU side; if it's lower on BDW, the issue is on the GPU side.
Comment 4 Eero Tamminen 2014-07-02 11:56:41 UTC
Just to verify, were both BDW and HSW:
- using same versions of kernel, X, X intel driver, Mesa
- using SNA + DRI3 config
?

---

Providing the "perf" information requested by Chris can be done by:
- changing the test to run longer via its config file
- running the following command for ~1/2 min (e.g. through ssh):
    perf record -a
  before ^C'ing it, to profile CPU usage
- saving the profile output from the "perf report -n" command, and
- attaching that output here, preferably also from the HSW machine, not just the BDW one
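
The recording steps above could be scripted roughly as follows. This is only a sketch: the helper name perf_commands, the 30-second duration and the data file name are assumptions made for illustration. The "-o", "-i" and "--stdio" options are standard perf options added here so that the recording stops by itself and the report can be saved as plain text; they are not part of the original instructions.

```python
import shlex


def perf_commands(seconds=30, data_file="perf.data"):
    """Build the two perf invocations described above (hypothetical helper)."""
    # System-wide recording (-a) that ends after a fixed time instead of a manual ^C.
    record = ["perf", "record", "-a", "-o", data_file, "--", "sleep", str(seconds)]
    # Plain-text report with sample counts (-n), suitable for attaching to a bug.
    report = ["perf", "report", "-n", "-i", data_file, "--stdio"]
    return record, report


record_cmd, report_cmd = perf_commands()
# Only print the commands; run them manually (as root) while the test is active.
print(shlex.join(record_cmd))
print(shlex.join(report_cmd))
```

The report command's output would be redirected to a file and attached, once from the BDW machine and once from the HSW machine.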
Comment 5 meng 2014-07-03 05:38:15 UTC
(In reply to comment #3)
> Mengmeng, if the CPU utilization is 90% on BDW, is it lower or higher on HSW
> (when you use same GPU & CPU speed for both)?
> 
> If CPU utilization on BDW is higher, issue is on CPU side, if it's lower on
> BDW, issue is GPU side.

With the same GPU & CPU speed for both, CPU utilization on BDW is higher than on HSW (<40%).
Comment 6 meng 2014-07-03 05:53:21 UTC
(In reply to comment #4)
> Just to verify, were both BDW and HSW:
> - using same versions of kernel, X, X intel driver, Mesa
> - using SNA + DRI3 config
> ?

Yes, the same settings were used on both BDW and HSW.
Please see the attached perf report logs.
Comment 7 meng 2014-07-03 05:57:59 UTC
Created attachment 102180
perf report for HSW
Comment 8 meng 2014-07-03 05:58:30 UTC
Created attachment 102181
perf report for BDW
Comment 9 Chris Wilson 2014-07-03 08:36:34 UTC
I was wrong, that doesn't look like a kernel issue at all. Next you want to investigate why malloc() (called on behalf of Synmark) takes longer on bdw than on hsw - is it just called more frequently, or is each invocation slower?
Comment 10 Chris Wilson 2014-07-03 08:37:17 UTC
(Presuming that the relative increase in CPU overhead explains the perf drop, ofc.)
Comment 11 Eero Tamminen 2014-07-03 13:26:39 UTC
Looking at the perf reports...

Processes:

BDW:
 71.73%   417189 synmark2
 18.48%   107495 X
  7.38%    42920 gnome-shell
  1.84%    10673 swapper
  0.24%     1368 rcu_sched
HSW:
 59.63%   494907 synmark2
 27.77%   230493 X
  6.01%    49843 swapper
  5.58%    46285 gnome-shell
  0.60%     4957 rcu_sched

With BDW CPU utilization being 90% and HSW 40%, that means X is using less CPU on HSW too, although its share of the perf samples is larger. gnome-shell using significantly less CPU on HSW, although it should be dealing with slightly more X damage events (with higher FPS), is a bit suspicious.
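
The arithmetic behind that comparison can be sketched as below. It is a rough approximation, not a measurement: it assumes that the per-core utilization figures from "top" (90% on BDW, and 40% taken as the upper bound of the reported "<40%" on HSW) can be used as a single scaling factor for the system-wide perf sample shares. The function name scaled_usage is made up for this example.

```python
# Share of perf samples per process, copied from the lists above (percent).
shares = {
    "BDW": {"synmark2": 71.73, "X": 18.48, "gnome-shell": 7.38},
    "HSW": {"synmark2": 59.63, "X": 27.77, "gnome-shell": 5.58},
}

# CPU utilization reported from "top" earlier in this bug (assumption: HSW's
# "<40%" is rounded up to 40, so HSW values below are upper bounds).
utilization = {"BDW": 90.0, "HSW": 40.0}


def scaled_usage(platform):
    """Scale each process's sample share by the platform's CPU utilization."""
    factor = utilization[platform] / 100.0
    return {proc: round(share * factor, 1)
            for proc, share in shares[platform].items()}


for platform in ("BDW", "HSW"):
    print(platform, scaled_usage(platform))
```

Under these assumptions X ends up with a smaller scaled figure on HSW than on BDW even though its sample share is larger, and gnome-shell drops to roughly a third of its BDW figure.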


DSOs:

BDW:
 39.78%   231385 libc-2.17.so
 20.25%   117783 [kernel.kallsyms]
 17.71%   102981 i965_dri.so
  6.55%    38122 Xorg
  4.48%    26043 libpthread-2.17.so
  2.73%    15887 libxcb.so.1.1.0
  1.63%     9476 libX11.so.6.3.0
  1.25%     7299 libglib-2.0.so.0.3600.4
HSW:
 32.29%   268036 libc-2.17.so
 29.15%   241954 [kernel.kallsyms]
 11.84%    98267 Xorg
  8.80%    73022 i965_dri.so
  5.34%    44350 libpthread-2.17.so
  3.55%    29504 libxcb.so.1.1.0
  2.26%    18773 libX11.so.6.3.0
  1.01%     8363 libglib-2.0.so.0.3600.4

It's not just malloc that is used a lot more; the graphics driver also does a lot more work on BDW.
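
A small sketch of the per-DSO comparison, using the top five rows of each list above. The function name share_delta is hypothetical, and the deltas are differences in sample share (percentage points), not in absolute CPU time, so they only indicate where the profile shifted.

```python
# Sample shares per DSO, copied from the lists above (percent).
bdw = {"libc-2.17.so": 39.78, "[kernel.kallsyms]": 20.25, "i965_dri.so": 17.71,
       "Xorg": 6.55, "libpthread-2.17.so": 4.48}
hsw = {"libc-2.17.so": 32.29, "[kernel.kallsyms]": 29.15, "i965_dri.so": 8.80,
       "Xorg": 11.84, "libpthread-2.17.so": 5.34}


def share_delta(a, b):
    """Return (dso, a - b) pairs for DSOs present in both, largest increase first."""
    deltas = [(dso, round(a[dso] - b[dso], 2)) for dso in a if dso in b]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


for dso, delta in share_delta(bdw, hsw):
    print(f"{dso:20s} {delta:+.2f}")
```

With these numbers the largest increases on BDW are in i965_dri.so and libc, while the kernel and Xorg shares shrink.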

Comparing all the i965 driver functions in the perf data, BDW does much more work with (anonymous namespace)::builtin_variable_generator::add_uniform() and hash table handling, whereas HSW driver does most work with _mesa_make_extension_string()???

One possibility is that one of the profiles wasn't taken from normal running of the test program, but e.g. from its startup or end.  Mengmeng?
Comment 12 meng 2014-07-04 01:24:19 UTC
(In reply to comment #11)
The perf data was collected while the test was running (after it had started and before it ended), over about 2 min.
BTW, to make the test run longer, I added "RenderingTime = 300.0" to its config file.
Comment 13 Kenneth Graunke 2017-09-14 22:29:48 UTC
Nobody's looked at this bug in ages, and I'm not sure what it's trying to say.  High CPU usage when running a CPU-bound test makes sense.  We can optimize the microbenchmark, but I don't think any of the information here will help us do that.  We may as well just start over now that it's 2017.

