| Summary: | Slow i965 performance (30 FPS), but super-fast with INTEL_DEBUG=sync | | |
|---|---|---|---|
| Product: | Mesa | Reporter: | Daniel van Vugt <daniel.van.vugt> |
| Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
| Status: | RESOLVED WORKSFORME | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
| Severity: | normal | | |
| Priority: | medium | CC: | eero.t.tamminen |
| Version: | 10.3 | | |
| Hardware: | x86-64 (AMD64) | | |
| OS: | Linux (All) | | |
| See Also: | https://launchpad.net/bugs/1377872 | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |

Description
Daniel van Vugt
2014-11-17 03:43:35 UTC
I wonder if this is just a race? If the page flip actually completes faster than select() takes to start up then that would explain it.

Hmm, I wonder if this is another case of the i915 driver not keeping the kernel awake enough? Although I found this bug on a reasonably powerful i7, I recently also found an Intel sleep states bug that affects low-end chips:
https://bugs.launchpad.net/mir/+bug/1388490
Maybe these two bugs are in the same ballpark...?

Sounds like there is some related movement happening:
https://nouveau.freedesktop.org/patch/40616/
http://patchwork.freedesktop.org/patch/44172/
Although it kind of sounds like the problem might get worse rather than better. Not sure.

Digging in the kernel, there's some suspicious logic in the i915 driver (used by Mesa i965 etc):

    /* Throttle our rendering by waiting until the ring has completed our requests
     * emitted over 20 msec ago.
     *
     * Note that if we were to use the current jiffies each time around the loop,
     * we wouldn't escape the function with any frames outstanding if the time to
     * render a frame was over 20ms.
     *
     * This should get us reasonable parallelism between CPU and GPU but also
     * relatively low latency when blocking on a particular request to finish.
     */
    static int
    i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)

I think that's the problem. Maybe Mir is behaving so well that a single frame doesn't fill the ring (when Mir is only double buffering). So we have to rely on the 20ms delay in the i915 kernel module that causes us to skip a frame.

I'm still hoping to be wrong, and that this isn't a *feature* of the i915 kernel module.

(In reply to Daniel van Vugt from comment #2)
> Hmm, I wonder if this is another case of the i915 driver not keeping the
> kernel awake enough? Although I found this bug on a reasonably powerful i7,
> I recently also found an intel sleep states bug that affects low-end chips:
> https://bugs.launchpad.net/mir/+bug/1388490

It's not the i915 driver's responsibility to keep the kernel unnecessarily awake (that would be a bug). The Launchpad bug still mentions CPU-side power management in the Ubuntu 16.04 4.4 kernel with the 4.6 i915 driver backport.

Have you tried using the "performance" pstate CPU governor instead of Ubuntu's default "powersave" pstate governor? That ramps CPU frequency up much faster, so there's less chance of vicious cycles where CPU and GPU both get further downclocked because the other side was so slow (due to already running at low frequency).

(In reply to Daniel van Vugt from comment #4)
> Digging in the kernel, there's some suspicious logic in the i915 driver
> (used by Mesa i965 etc):
>
> /* Throttle our rendering by waiting until the ring has completed our requests
>  * emitted over 20 msec ago.
>  *
>  * Note that if we were to use the current jiffies each time around the loop,
>  * we wouldn't escape the function with any frames outstanding if the time to
>  * render a frame was over 20ms.
>  *
>  * This should get us reasonable parallelism between CPU and GPU but also
>  * relatively low latency when blocking on a particular request to finish.
>  */
> static int
> i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)

Did some digging in the kernel and Mesa sources. That function is the implementation of the DRM_I915_GEM_THROTTLE ioctl(). It's a kind of fence: the user-space process sleeps until the GPU has caught up, i.e. it doesn't throttle rendering of already submitted requests, it throttles the submitting of more requests.

Mesa uses it for frame throttling when doing front buffer flushing; buffer swap throttling Mesa does by itself (every 2 frames).

Throttling is there to make sure the user interface stays interactive even for processes that have GPU-heavy frames. Without it, many such frames could be rendered without the compositor being able to show them to the user (if an app fills the GPU batch queue with heavy frames, the compositor frame that actually puts them on screen would come only after the earlier frames in the queue have been rendered).

For the compositor itself it can make sense to disable this throttling. You can do that for swap buffer throttling with the Mesa "disable_throttling=true" environment variable (I would assume it also works from drirc).

The downstream bug was closed because the issue could no longer be reproduced.
https://launchpad.net/bugs/1377872
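
For readers trying to follow the throttling rule in the quoted kernel comment, here is a rough user-space sketch of the selection step it describes: pick the newest request that was emitted more than 20 ms ago, which is the one the caller would then wait on. This is not the kernel's code. The names `sketch_request`, `sketch_throttle_target` and `THROTTLE_WINDOW_MS` are made up for illustration, timestamps are plain milliseconds rather than jiffies, and the actual wait is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued GPU request; not a kernel type. */
struct sketch_request {
    uint64_t emitted_ms; /* time at which the request was submitted */
};

/* The 20 msec window mentioned in the quoted comment. */
#define THROTTLE_WINDOW_MS 20

/*
 * Return the index of the newest request emitted more than
 * THROTTLE_WINDOW_MS before now_ms, or -1 if there is none.
 * Requests are assumed to be ordered oldest first.
 *
 * now_ms is sampled once by the caller and not re-read inside the loop,
 * which mirrors the note in the quoted comment about not using the
 * current time on every iteration.
 */
int sketch_throttle_target(const struct sketch_request *reqs, size_t count,
                           uint64_t now_ms)
{
    int target = -1;

    assert(reqs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Requests inside the window are recent enough to stay in flight. */
        if (reqs[i].emitted_ms + THROTTLE_WINDOW_MS >= now_ms)
            break;
        target = (int)i;
    }
    return target;
}
```

With requests emitted at 0, 10, 25 and 40 ms and a call made at 41 ms, this picks the request emitted at 10 ms, so the two newer requests stay outstanding while the caller waits. If nothing is older than the window, it returns -1 and the caller would not need to wait at all.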