In the Mir server code (DRI output) we call:
2. get the new front buffer
3. schedule a page flip: drmModePageFlip()
This works well. However, if I force it to wait for the page flip immediately:
4. select() on the DRM fd and then drmHandleEvent()
then step 4 (under some rare but predictable rendering loads) takes 32ms to complete.
I've now confirmed it is just the page flip event that takes almost two frames to arrive. There are two workarounds that seem to successfully kick the driver into action:
0. env INTEL_DEBUG=sync
Using either of these workarounds, rendering completes in about 1ms and select() then returns the next page flip event (~16ms interval).
So it seems either the Intel batching logic is deferring rendering far too long, or delivery of the page flip event is being deferred. The two workarounds suggest the former.
Mesa 10.3.2-0ubuntu1 (Ubuntu 15.04 vivid)
Intel® HD Graphics 4600 (Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz)
I wonder if this is just a race?
If the page flip actually completes faster than select() takes to start up then that would explain it.
Hmm, I wonder if this is another case of the i915 driver not keeping the kernel awake enough? Although I found this bug on a reasonably powerful i7, I recently also found an intel sleep states bug that affects low-end chips:
Maybe these two bugs are in the same ballpark...?
Sounds like there is some related movement happening:
Although it kind of sounds like the problem might get worse rather than better. Not sure.
Digging in the kernel, there's some suspicious logic in the i915 driver (used by Mesa i965 etc):
/* Throttle our rendering by waiting until the ring has completed our requests
 * emitted over 20 msec ago.
 *
 * Note that if we were to use the current jiffies each time around the loop,
 * we wouldn't escape the function with any frames outstanding if the time to
 * render a frame was over 20ms.
 *
 * This should get us reasonable parallelism between CPU and GPU but also
 * relatively low latency when blocking on a particular request to finish.
 */
static int
i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
I think that's the problem. Maybe Mir is behaving so well that a single frame doesn't fill the ring (when Mir is only double buffering), so we end up relying on the 20ms delay in the i915 kernel module, which causes us to skip a frame.
I'm still hoping to be wrong, and that this isn't a *feature* of the i915 kernel module.
(In reply to Daniel van Vugt from comment #2)
> Hmm, I wonder if this is another case of the i915 driver not keeping the
> kernel awake enough? Although I found this bug on a reasonably powerful i7,
> I recently also found an intel sleep states bug that affects low-end chips:
It's not the i915 driver's responsibility to keep the kernel unnecessarily awake (that would be a bug).
The Launchpad bug still mentions CPU-side power management on the Ubuntu 16.04 4.4 kernel with the 4.6 i915 driver backport.
Have you tried using the "performance" P-state CPU governor instead of Ubuntu's default "powersave" P-state governor? It ramps the CPU frequency up much faster, so there's less chance of a vicious cycle where the CPU and GPU each get further downclocked because the other side was so slow (due to already running at a low frequency).
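If you want to try that, one way (assuming the intel_pstate driver and the usual sysfs paths; exact paths can vary per kernel) is:

```shell
# Switch all CPUs to the "performance" P-state governor (needs root);
# echo "powersave" into the same files to revert.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g"
done
```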
(In reply to Daniel van Vugt from comment #4)
> Digging in the kernel, there's some suspicious logic in the i915 driver
> (used by Mesa i965 etc):
> /* Throttle our rendering by waiting until the ring has completed our
>  * requests emitted over 20 msec ago.
>  *
>  * Note that if we were to use the current jiffies each time around the loop,
>  * we wouldn't escape the function with any frames outstanding if the time to
>  * render a frame was over 20ms.
>  *
>  * This should get us reasonable parallelism between CPU and GPU but also
>  * relatively low latency when blocking on a particular request to finish.
>  */
> static int
> i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
Did some digging in kernel & Mesa sources.
That function is the implementation of the DRM_I915_GEM_THROTTLE ioctl(). It's a kind of fence: the user-space process sleeps until the GPU has caught up, i.e. it doesn't throttle rendering of already-submitted requests, it throttles the submission of further ones.
Mesa uses it for frame throttling when doing front-buffer flushing; buffer-swap throttling Mesa does by itself (every 2 frames).
The throttling is there to keep the user interface interactive even for processes with GPU-heavy frames. Without it, many of those frames could be rendered without the compositor ever being able to show them to the user (if an app fills the GPU batch queue with heavy frames, the compositor frame that actually puts them on screen would only come after the earlier frames in the queue have been rendered).
For the compositor itself it can make sense to disable this throttling. You can do that for swap-buffer throttling with Mesa's "disable_throttling=true" environment variable (I would assume it also works from drirc).
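For reference, a minimal sketch of what that might look like in ~/.drirc, assuming the i965 driver exposes the option under that name (untested here, and the drirc schema may differ between Mesa versions):

```xml
<driconf>
  <device screen="0" driver="i965">
    <application name="Default">
      <option name="disable_throttling" value="true" />
    </application>
  </device>
</driconf>
```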
The downstream bug was closed because no issue could be reproduced any more.