Bug 45289 - [snb] cpufreq doesn't take account of GPU activity
Summary: [snb] cpufreq doesn't take account of GPU activity
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Jesse Barnes
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-26 15:22 UTC by Chris Wilson
Modified: 2017-07-24 23:02 UTC

See Also:
i915 platform:
i915 features:


Attachments
Make sure RPS limit writes land (732 bytes, patch)
2012-04-24 14:00 UTC, Jesse Barnes

Description Chris Wilson 2012-01-26 15:22:14 UTC
On i5-2520m, x11perf -getimage10 -getimage500:
 120000 reps @   0.0548 msec ( 18200.0/sec): GetImage 10x10 square
 120000 reps @   0.0259 msec ( 38700.0/sec): GetImage 10x10 square
 120000 reps @   0.0232 msec ( 43200.0/sec): GetImage 10x10 square
 120000 reps @   0.0546 msec ( 18300.0/sec): GetImage 10x10 square
 120000 reps @   0.0550 msec ( 18200.0/sec): GetImage 10x10 square
 600000 trep @   0.0427 msec ( 23400.0/sec): GetImage 10x10 square

   8000 reps @   1.1966 msec (   836.0/sec): GetImage 500x500 square
   8000 reps @   1.1982 msec (   835.0/sec): GetImage 500x500 square
   8000 reps @   1.1995 msec (   834.0/sec): GetImage 500x500 square
   8000 reps @   1.1966 msec (   836.0/sec): GetImage 500x500 square
   8000 reps @   1.1971 msec (   835.0/sec): GetImage 500x500 square
  40000 trep @   1.1976 msec (   835.0/sec): GetImage 500x500 square

But if you add a dummy load, 
yes > /dev/null & x11perf -getimage10 -getimage500:
 240000 reps @   0.0224 msec ( 44700.0/sec): GetImage 10x10 square
 240000 reps @   0.0223 msec ( 44900.0/sec): GetImage 10x10 square
 240000 reps @   0.0223 msec ( 44800.0/sec): GetImage 10x10 square
 240000 reps @   0.0223 msec ( 44700.0/sec): GetImage 10x10 square
 240000 reps @   0.0224 msec ( 44700.0/sec): GetImage 10x10 square
1200000 trep @   0.0223 msec ( 44800.0/sec): GetImage 10x10 square

  12000 reps @   0.4948 msec (  2020.0/sec): GetImage 500x500 square
  12000 reps @   0.4949 msec (  2020.0/sec): GetImage 500x500 square
  12000 reps @   0.4949 msec (  2020.0/sec): GetImage 500x500 square
  12000 reps @   0.4944 msec (  2020.0/sec): GetImage 500x500 square
  12000 reps @   0.4954 msec (  2020.0/sec): GetImage 500x500 square
  60000 trep @   0.4949 msec (  2020.0/sec): GetImage 500x500 square
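
A minimal sketch for comparing the two runs above, assuming the x11perf summary ("trep") lines have been captured as text. The helper names parse_trep and speedup are hypothetical and are not part of x11perf; the sample data is copied from the summary lines in this description.

```python
import re

# Matches x11perf summary lines such as:
#   "  40000 trep @   1.1976 msec (   835.0/sec): GetImage 500x500 square"
TREP = re.compile(
    r"^\s*(\d+)\s+trep\s+@\s+([\d.]+)\s+msec\s+\(\s*([\d.]+)/sec\):\s+(.*)$"
)

def parse_trep(text):
    """Return {test name: operations per second} for each 'trep' line."""
    rates = {}
    for line in text.splitlines():
        match = TREP.match(line)
        if match:
            rates[match.group(4).strip()] = float(match.group(3))
    return rates

def speedup(idle_text, loaded_text):
    """Return {test name: loaded rate / idle rate} for tests present in both."""
    idle = parse_trep(idle_text)
    loaded = parse_trep(loaded_text)
    return {name: loaded[name] / idle[name] for name in idle if name in loaded}

# Summary lines copied from the two runs in this report.
IDLE = """
 600000 trep @   0.0427 msec ( 23400.0/sec): GetImage 10x10 square
  40000 trep @   1.1976 msec (   835.0/sec): GetImage 500x500 square
"""
LOADED = """
1200000 trep @   0.0223 msec ( 44800.0/sec): GetImage 10x10 square
  60000 trep @   0.4949 msec (  2020.0/sec): GetImage 500x500 square
"""

if __name__ == "__main__":
    for name, ratio in speedup(IDLE, LOADED).items():
        print(f"{name}: {ratio:.2f}x faster with the dummy load")
```

On these numbers the dummy load makes the 10x10 case roughly 1.9x faster and the 500x500 case roughly 2.4x faster.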
Comment 1 Chris Wilson 2012-02-07 01:05:18 UTC
Contrary to earlier suggestions, this is not a BLT effect.
Comment 2 Daniel Vetter 2012-03-02 08:04:05 UTC
Paranoid check: Does this still happen with the autoreport_head disaster fixed?
Comment 3 Chris Wilson 2012-03-02 08:11:15 UTC
Yes.
Comment 4 Jesse Barnes 2012-03-02 10:09:18 UTC
These tests must not be triggering the counters we use for threshold interrupts with enough frequency to cause an interrupt.

You can play around with the EI and threshold values in the drps enable routine and see what works for you, but it'll have power and perf implications for other workloads as well.
Comment 5 Chris Wilson 2012-03-04 15:16:43 UTC
One bit of insight:

During slow getimage, render P-state: 133.
During fast getimage, render P-state: 131.
Comment 6 Daniel Vetter 2012-03-24 16:48:00 UTC
Maybe related, inverse bug though (failure to scale frequency down and hence burn through way too much power):

https://bugzilla.kernel.org/process_bug.cgi
Comment 7 Jesse Barnes 2012-04-24 14:00:26 UTC
Created attachment 60546
Make sure RPS limit writes land
Comment 8 Chris Wilson 2012-04-24 14:21:54 UTC
A quietly documented side-effect of our force_wake_put routines is that they also perform a POSTING_READ.
Comment 9 Jesse Barnes 2012-04-24 14:33:29 UTC
Ok then I'm seeing a false positive here for that patch.

That said, writing these regs from the handler seems sketchy; I wouldn't be surprised if that were causing some trouble.
Comment 10 Chris Wilson 2012-04-26 07:19:21 UTC
Note that this bug is only reproducible on SNB-mobile (and is still present on the latest dinq+). I have just wasted some time trying to find the fix for SNB-desktop!

Note I can also reproduce this on an i7-3720m:
x11perf -getimage500: 1330.0/sec
yes > /dev/null & x11perf -getimage500: 3600.0/sec
Comment 11 Chris Wilson 2012-05-08 08:23:20 UTC
As suggested by Arjan, this could be considered an issue with the cpufreq governor not taking into account GPU activity, in particular CPU-GPU ping-pong.

With powersave:
  40000 trep @   1.2010 msec (   833.0/sec): GetImage 500x500 square
With performance:
  80000 trep @   0.4062 msec (  2460.0/sec): GetImage 500x500 square
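
When repeating the governor comparison above, it helps to record which governor each CPU is actually using. A minimal sketch, assuming the standard Linux cpufreq sysfs layout (cpuN/cpufreq/scaling_governor under /sys/devices/system/cpu); the function name read_governors is hypothetical.

```python
from pathlib import Path

def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map 'cpuN' to its scaling governor, skipping CPUs without cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        # gov_file is <root>/cpuN/cpufreq/scaling_governor, so the CPU
        # name is two directory levels up.
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors
```

Calling read_governors() before each x11perf run and logging the result next to the trep line makes it clear whether a number was taken under powersave, ondemand or performance.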
Comment 12 Daniel Vetter 2012-05-08 09:15:38 UTC
And we have a patch for that. Please try http://cgit.freedesktop.org/~danvet/drm/log/?h=better-gpu_cpufreq
Comment 13 Chris Wilson 2012-05-08 10:00:37 UTC
With better-gpu-cpufreq + ondemand governor:
  40000 trep @   1.1543 msec (   866.0/sec): GetImage 500x500 square
Comment 14 Chris Wilson 2012-05-08 10:24:10 UTC
perf timechart sleep 0.5 during x11perf -getimage500 with better-gpu-cpufreq: http://people.freedesktop.org/~ickle/x11perf.svg
Comment 15 Daniel Vetter 2012-05-08 11:30:20 UTC
Partial fix:

numactl -C 3&   Xrun x11perf -getimage500
1340.0/sec

Compared to on my system:
yes > /dev/null &   Xrun x11perf -getimage500
1620.0/sec
just with ondemand governor: 569.0/sec
with performance governor: 1920.0/sec

None of this was with the gpu_wait_begin/end patch applied.
Comment 16 Daniel Vetter 2012-05-08 11:44:36 UTC
Hm, I've just retested with ondemand and the gpu-iowait patch applied, and it seems to work pretty well (albeit not perfectly) here. I can reach the top throughput of the performance governor, but it jitters a bit:

  12000 reps @   0.5884 msec (  1700.0/sec): GetImage 500x500 square
  12000 reps @   0.5192 msec (  1930.0/sec): GetImage 500x500 square
  12000 reps @   0.5274 msec (  1900.0/sec): GetImage 500x500 square
  12000 reps @   0.5429 msec (  1840.0/sec): GetImage 500x500 square
  12000 reps @   0.6336 msec (  1580.0/sec): GetImage 500x500 square
  60000 trep @   0.5623 msec (  1780.0/sec): GetImage 500x500 square

1930.0/sec is the (pretty constant) peak throughput with the performance governor.
Comment 17 Daniel Vetter 2012-05-08 12:40:24 UTC
Ok, time for some embarrassment: The perf improvements I've seen have only been due to the possible imbalance in the rq->nr_iowait accounting. With the fixed v2 I don't see any speedup at all :(
Comment 18 Chris Wilson 2012-05-08 13:09:37 UTC
Looking at the trace of a getimage500 shows that we are not blocking on iowait at all. Reading the debug log confirms this: only the first read is from the GPU dirty buffer, with subsequent reads already in the CPU domain. So the issue turns out to be that the ondemand heuristics prevent the CPU from upclocking during the ping-pong between x11perf and X.
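
One way to picture this conclusion is a toy model of a threshold-based governor. This is only an illustrative sketch, not the kernel's ondemand code: the 0.80 threshold, the busy fractions, and the assumption that the package runs fast whenever any single CPU requests maximum frequency are all hypothetical values chosen for the example.

```python
def wants_max_frequency(busy_fraction, up_threshold=0.80):
    """Toy rule: request max frequency only when the sampled per-CPU load
    exceeds up_threshold (illustrative value, not taken from the kernel)."""
    return busy_fraction > up_threshold

def package_upclocks(per_cpu_busy, up_threshold=0.80):
    """Assumption: the package runs fast if any CPU requests max frequency."""
    return any(wants_max_frequency(b, up_threshold) for b in per_cpu_busy)

# Assumed loads: x11perf and X alternate, so each of their CPUs is busy
# only about half of every sampling period.
PING_PONG = [0.5, 0.5, 0.0, 0.0]

# Same workload with `yes > /dev/null` spinning on a third CPU.
PING_PONG_PLUS_YES = [0.5, 0.5, 1.0, 0.0]

if __name__ == "__main__":
    print("ping-pong only:", package_upclocks(PING_PONG))
    print("ping-pong + yes:", package_upclocks(PING_PONG_PLUS_YES))
```

Under these assumptions neither side of the ping-pong ever looks busy enough on its own to trigger an upclock, while a single spinning process does, which would be consistent with the dummy-load results in the description.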

