Summary: | [snb] TearFree now triggers stuck semaphores (due to Eliminate the synchronous wait from inside TearFree) | ||||||
---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | FBrown <francisbrwn9> | ||||
Component: | Driver/intel | Assignee: | Chris Wilson <chris> | ||||
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | normal | ||||||
Priority: | low | CC: | inbox, samuel | ||||
Version: | git | ||||||
Hardware: | x86-64 (AMD64) | ||||||
OS: | Linux (All) | ||||||
Whiteboard: | |||||||
i915 platform: | i915 features: | ||||||
Attachments: |
|
Description
FBrown
2013-10-22 14:38:45 UTC
My error. Problematic commit actually seems to be the previous one. sna: Eliminate the synchronous wait from inside TearFree The stuck semaphore is an old, old hw issue. Are you using TearFree? (In reply to comment #2) > The stuck semaphore is an old, old hw issue. Are you using TearFree? Yes, as get tearing on video without it. See #54226 for the stuck semaphore issue. I will have a think about why the update made it easier to hit. Thanks. To the best of my knowledge I've not "hit" that error on this hardware before through multiple distribution and major kernel version upgrades. May be an old problem per se, but new with that commit in this case. In recent kernels (by which I mean drm-intel-nightly), hangcheck will capture the error upon the first struck semaphore. That /sys/class/drm/card0/error would be useful to see when the error occurs. (I'm wondering if this something funky with the pageflip for instance.) (In reply to comment #6) > In recent kernels (by which I mean drm-intel-nightly), hangcheck will > capture the error upon the first struck semaphore. That > /sys/class/drm/card0/error would be useful to see when the error occurs. > (I'm wondering if this something funky with the pageflip for instance.) I'll grab a build from here http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/ and try that at some point later today then. Created attachment 88007 [details]
crash log on drm-intel-nightly
OK. Following.
[drm] stuck on render ring
[ 610.496014] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 610.496015] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 610.496015] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 610.496016] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 610.496016] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
crash dump is attached
Doesn't seem to be an extraordinary hang, so it would seem to be something peculiar to the timing of the new rendering. Not intentionally targetting this bug, but since this should affect timing subtly might as well see if it has any impact here: commit 247ebf86a32b518536a11c2f866b2c885bd55d38 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Oct 22 07:58:03 2013 +0100 sna: Only force the TearFree exchange before a write We can read from the current scanout buffer without incurring a visual tear, so delay the GPU bo exchange until we see a write to the object. Recompiled up to latest git and seems to significantly better, although not perfect. Whereas before it would hang roughly once every 3-4 mins, now only had a couple in 30-40 mins. Is there any more info I can give to help with this? If not, then for now I would prefer to revert back to the last non buggy commit or stable snapshot. One last thing you can test is: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index 7fcade6..1ae9ec0 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -4303,12 +4303,14 @@ static bool wait_for_shadow(struct sna *sna, struct sna_pixmap *priv, unsigned f sna_mode_wakeup(sna); if (sna->mode.shadow_flip) { - bo = kgem_create_2d(&sna->kgem, - pixmap->drawable.width, - pixmap->drawable.height, - pixmap->drawable.bitsPerPixel, - priv->gpu_bo->tiling, - CREATE_EXACT | CREATE_SCANOUT); + bo = NULL; + if (0) + bo = kgem_create_2d(&sna->kgem, + pixmap->drawable.width, + pixmap->drawable.height, + pixmap->drawable.bitsPerPixel, + priv->gpu_bo->tiling, + CREATE_EXACT | CREATE_SCANOUT); if (bo != NULL) { DBG(("%s: replacing still-attached GPU bo\n", __FUNCTION__)); applied to commit b70390b4f824afeed8f1b5a9baf79e33377405cd Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Oct 23 11:15:28 2013 +0100 sna: Reset bo after allocation failure during wait-for-shadow OK. When I see that commit in the repo I'll give the patch at try. Oops, should be there now. It is. I'll pull the changes in, apply that patch and test. Thanks. Patch does not seem to apply on top of that commit? Could just be me, as I don't do these things manually very often, but can you re-check? Ok, try this simpler one instead: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index 7fcade6..5c04d78 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -4299,7 +4299,7 @@ static bool wait_for_shadow(struct sna *sna, struct sna_pixmap *priv, unsigned f damage = sna->mode.shadow_damage; sna->mode.shadow_damage = NULL; - while (sna->mode.shadow_flip && sna_mode_has_pending_events(sna)) + while (sna->mode.shadow_flip && sna_mode_wait_for_event(sna)) sna_mode_wakeup(sna); if (sna->mode.shadow_flip) { Have both patchs applying now. Can I just check whether you want both applied to test, or just one of the 2? They both have the same effect, and can be used combined or individually. Definitely a marked improvement with those patches. Not perfect, as had one hang on DE login and a couple of others over the last day, but still a very significant reduction making. So not fixed, but you would have to say the desktop is now usable if you accept the very rare hang. As a note, now reverted and turned off tearfree via xorg.conf, as I noticed KWIN has some newer anti-tearing options that I had not tried before. Tearing prevention (vsync) set at "full screen repaints" in KWIN seems to achieve much the same result as tearfree via xorg/driver option, which seems to make sense to me. Yes, using the tear-free mode of the compositor (if you are using a compositor) is the preferrable solution. The driver option is really for only those people who insist on not using a compositor, but still demand tearfree, or if they require rotated outputs (which cannot yet be done with xrandr & compositing managers). Thanks for the feedback, I shall try and work out why that particular copy is more prone to triggering the bad semaphore update. Yes, but unfortunately prior to the latest KWIN updates I had tearing under KWIN no matter what options were selected. Hence needing it enabled by the driver, which granted was not ideal. KWIN in KDE 4.11 introduced more options for tearing prevention, rather than a just a toggle on/off for vsync, and one of those options seems to work nicely with this hardware allowing me to dispense with the driver tearfree option (I hope). Thanks for the assistance, and hopefully has been useful regardless. I've been tweaking TearFree (in xf86-video-intel.git) to avoid more copies, any chance you can see if that helps? Have been using KDE's own tear free compositing options since this bug, but if I can set things back to how they originally were using the xorg.conf tear free instead, then I can have a test sometime in the next week. With the caveat of course that my system, kernel, org, etc etc have gone through at least one version upgrade since I originally reported. *** Bug 84755 has been marked as a duplicate of this bug. *** I believe with commit b47161858ba13c9c7e03333132230d66e008dd55 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Apr 27 13:41:17 2015 +0100 drm/i915: Implement inter-engine read-read optimisations the problem is reduce to the normal semaphores bug 54226 (i.e. frequency should be reduced). So closing this instance. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.