In the Mir server code (DRI output) we call:
2. get the new front buffer
3. schedule a page flip: drmModePageFlip()
This works well. However, if I force it to wait for the page flip immediately:
4. select() on the DRM fd and then drmHandleEvent()
then step 4 (under some rare but predictable rendering loads) takes 32ms to complete.
I've now confirmed it is just the page flip event that takes almost two frames to arrive. There are two workarounds that seem to successfully kick the driver into action:
0. env INTEL_DEBUG=sync
Using either of these workarounds, rendering completes in about 1ms and select() then returns the next page flip event (~16ms interval).
So it seems either the Intel batching logic is deferring rendering far too long, or delivery of the page flip event is being deferred. The two workarounds suggest the former.
Mesa 10.3.2-0ubuntu1 (Ubuntu 15.04 vivid)
Intel® HD Graphics 4600 (Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz)
I wonder if this is just a race?
If the page flip actually completes faster than select() takes to start up then that would explain it.
Hmm, I wonder if this is another case of the i915 driver not keeping the kernel awake enough? Although I found this bug on a reasonably powerful i7, I recently also found an intel sleep states bug that affects low-end chips:
Maybe these two bugs are in the same ballpark...?
Sounds like there is some related movement happening:
Although it kind of sounds like the problem might get worse rather than better. Not sure.
Digging in the kernel, there's some suspicious logic in the i915 driver (used by Mesa i965 etc):
/* Throttle our rendering by waiting until the ring has completed our requests
 * emitted over 20 msec ago.
 *
 * Note that if we were to use the current jiffies each time around the loop,
 * we wouldn't escape the function with any frames outstanding if the time to
 * render a frame was over 20ms.
 *
 * This should get us reasonable parallelism between CPU and GPU but also
 * relatively low latency when blocking on a particular request to finish.
 */
static int
i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
I think that's the problem. Maybe Mir is behaving so well that a single frame doesn't fill the ring (when Mir is only double buffering), so we end up relying on the 20ms delay in the i915 kernel module, which causes us to skip a frame.
I'm still hoping to be wrong, and that this isn't a *feature* of the i915 kernel module.
(In reply to Daniel van Vugt from comment #2)
> Hmm, I wonder if this is another case of the i915 driver not keeping the
> kernel awake enough? Although I found this bug on a reasonably powerful i7,
> I recently also found an intel sleep states bug that affects low-end chips:
It's not the i915 driver's responsibility to keep the kernel unnecessarily awake (that would be a bug).
The Launchpad bug still mentions CPU-side power management on the Ubuntu 16.04 4.4 kernel with the 4.6 i915 driver backport.
Have you tried using the "performance" P-state CPU governor instead of Ubuntu's default "powersave" P-state governor? It ramps the CPU frequency up much faster, so there's less chance of a vicious cycle where the CPU and GPU each get further downclocked because the other side was so slow (due to already running at a low frequency).
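If you want to try that, one way (assuming the intel_pstate driver and the usual sysfs paths; exact paths can vary per kernel) is:

```shell
# Switch all CPUs to the "performance" P-state governor (needs root);
# echo "powersave" into the same files to revert.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g"
done
```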
(In reply to Daniel van Vugt from comment #4)
> Digging in the kernel, there's some suspicious logic in the i915 driver
> (used by Mesa i965 etc):
> /* Throttle our rendering by waiting until the ring has completed our
>  * requests emitted over 20 msec ago.
>  *
>  * Note that if we were to use the current jiffies each time around the loop,
>  * we wouldn't escape the function with any frames outstanding if the time to
>  * render a frame was over 20ms.
>  *
>  * This should get us reasonable parallelism between CPU and GPU but also
>  * relatively low latency when blocking on a particular request to finish.
>  */
> static int
> i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
Did some digging in kernel & Mesa sources.
That function is the implementation of the DRM_I915_GEM_THROTTLE ioctl(). It's a kind of fence: the user-space process sleeps until the GPU has caught up, i.e. it doesn't throttle rendering of already-submitted requests, it throttles the submission of further ones.
Mesa uses it for frame throttling when doing front-buffer flushing; buffer-swap throttling Mesa does by itself (every 2 frames).
The throttling is there to keep the user interface interactive even for processes with GPU-heavy frames. Without it, many of those frames could be rendered without the compositor ever being able to show them to the user (if an app fills the GPU batch queue with heavy frames, the compositor frame that actually puts them on screen would only come after the earlier frames in the queue have been rendered).
For the compositor itself it can make sense to disable this throttling. You can do that for swap-buffer throttling with Mesa's "disable_throttling=true" environment variable (I would assume it also works from drirc).
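For reference, a minimal sketch of what that might look like in ~/.drirc, assuming the i965 driver exposes the option under that name (untested here, and the drirc schema may differ between Mesa versions):

```xml
<driconf>
  <device screen="0" driver="i965">
    <application name="Default">
      <option name="disable_throttling" value="true" />
    </application>
  </device>
</driconf>
```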
The downstream bug was closed because no issue could be reproduced any more.