Created attachment 54019 [details]
xorg.log

Originally reported at: https://lkml.org/lkml/2011/12/1/208

> - Can you check whether upgrading the ddx

Sorry, what is ddx?

> - If you're using swap, can you check whether disabling it works around
> the issue?

No, I have no swap.
Created attachment 54020 [details] full dmesg
Created attachment 54021 [details] intel_reg_dumper output
ddx = X driver = xf86-video-intel, 2.17 is the latest release.
Created attachment 54023 [details] xorg with ddx 2.17
It still happens with 2.17. Do you need a reg dump with that driver?
Created attachment 54115 [details]
i915_error_state after "Bad Tiling" error

I received a "bad tiling" crash on a Thinkpad T60 with a 945GM chipset, while trying to toy around with the i915 gallium driver.

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)

My memory configuration is dual-channel asymmetric (2 GB + 1 GB), currently using an external monitor at 1920x1080 connected via DVI.

The kernel messages are:

[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
render error detected, EIR: 0x00000010
page table error
PGTBL_ER: 0x00000040
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
render error detected, EIR: 0x00000010
page table error
PGTBL_ER: 0x00000040
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 19397591 at 19397588, next 19397602)
[drm:i915_reset] *ERROR* Failed to reset chip.

After that error I observed minor redraw issues in the GUI, probably related to the crashed GPU, but X11 is still mostly working, which is kind of surprising to me (I am using xfce4 without compiz at the moment, so no 3D/compositing should be active).

The new attachment shows the i915_error_state output.
Created attachment 54116 [details] xrandr output for details of the display config
Created attachment 54117 [details] xrandr output for details of the display config (this time as text/plain, sorry)
Accompanying the GPU crash, the X server disabled acceleration (which explains why X still works):

[ 28347.679] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 28347.679] Backtrace:
[ 28347.830] 0: /usr/bin/Xorg (xorg_backtrace+0x26) [0x7f160d4e58f6]
[ 28347.830] 1: /usr/bin/Xorg (mieqEnqueue+0x191) [0x7f160d4c6201]
[ 28347.830] 2: /usr/bin/Xorg (0x7f160d361000+0x65224) [0x7f160d3c6224]
[ 28347.830] 3: /usr/bin/Xorg (xf86PostMotionEventP+0x4a) [0x7f160d400b4a]
[...]
[ 28347.831] 22: /usr/bin/Xorg (0x7f160d361000+0x414ad) [0x7f160d3a24ad]
[ 28348.024] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 28348.024] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.

(dmesg and i915_error_state are already attached)
I still see the invalid tiles; should I try updating the ddx (and the libdrm it depends on)?

Note that the two other reports seem to be completely different. My GPU does not get stuck. It just misrenders e.g. some map tiles when browsing mapy.cz.
Michael Kracher, can you please file a separate bug for your issue?
Created attachment 55910 [details] example corruption linked from the lkml discussion
I bisected the driver. It led me to this commit from 2.13:

commit cc930a37612341a1f2457adb339523c215879d82
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 14 19:47:00 2010 +0000

    uxa: Relax fencing some more for gen3

If I revert that on top of 2.17.0, it works fine. So far.
(In reply to comment #13)
> If I revert that on top of 2.17.0, it works fine. So far.

Heh, this really fixed all the symptoms:
* bad tiles in maps
* the 2 GPU hangs I encountered per week
* video stopping in kaffeine when browsing maps or doing other gfx-intensive work
Jiri, what GPU hangs? You haven't attached any example i915_error_states. There is one particularly nasty bug which has been reported to cause tiling corruption and is likely to be the culprit here.

Turning off relaxed fencing is just likely to lower the reuse rate of bo, increase aperture thrashing, and make that exact path harder to hit (in particular, it will eliminate the reuse of render targets as batch buffers, which is the crux of the hangs). But if my guess is correct, and there is so far no evidence to suggest otherwise ;-), it won't eliminate the risk of that hang entirely.
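(Editorial aside: a minimal sketch of the fence-size trade-off that "relaxed fencing" relaxes, assuming the usual gen3 power-of-two fence constraint with a 1 MiB minimum; the function names are illustrative, not the driver's actual API.)

#include <stdint.h>

/* Strict fencing pads every tiled bo up to its power-of-two fence
 * region; relaxed fencing lets the bo keep its natural size, so more
 * buffers fit in the aperture and get reused -- the increased reuse
 * described above. */
static uint32_t fence_region_size(uint32_t obj_size)
{
        uint32_t size = 1 << 20;        /* assumed 1 MiB gen3 minimum */

        while (size < obj_size)
                size <<= 1;
        return size;
}

static uint32_t tiled_bo_size(uint32_t obj_size, int relaxed_fencing)
{
        return relaxed_fencing ? obj_size : fence_region_size(obj_size);
}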
(In reply to comment #15)
> Jiri, what GPU hangs?

Oh, I thought there was a link. Apparently not. Here it is:
https://lkml.org/lkml/2012/1/24/165

> You haven't attached any example i915_error_states. There is one
> particularly nasty bug which has been reported to cause tiling corruption
> and is likely to be the culprit here.

Do you mean "drm/i915: Only clear the GPU domains upon a successful finish"? Without that patch I see daily GPU hangs. With that patch this was reduced to 2 hangs per week.

> Turning off relaxed fencing is just likely to lower the reuse rate of bo,
> increase aperture thrashing, and make that exact path harder to hit (in
> particular, it will eliminate the reuse of render targets as batch buffers,
> which is the crux of the hangs). But if my guess is correct, and there is so
> far no evidence to suggest otherwise ;-), it won't eliminate the risk of
> that hang entirely.

Ok, if you have any ideas what to test, let me know. The revert, as a workaround, at least allows me to work :).
Can you please attach an example of the current hangs with the finish-gpu patch applied? I'm hoping that they follow a different pattern and are either a userspace driver bug, or might shed light on the use-after-free bug that we have been theorizing exists in the kernel. Or it could be much more mundane.
Created attachment 56981 [details] [review]
Different kind of finish_gpu patch

Can you also try this patch _instead_ of the finish_gpu one you're currently using? If this also ends up in a GPU hang, please attach the error_state.
(In reply to comment #17)
> Can you please attach an example of the current hangs with the finish-gpu
> patch applied? I'm hoping that they follow a different pattern and are
> either a userspace driver bug, or might shed light on the use-after-free
> bug that we have been theorizing exists in the kernel. Or it could be much
> more mundane.

There is a link to one in the lkml post:
http://www.fi.muni.cz/~xslaby/sklad/panics/915_error_state

This *is* with the patch applied.
(In reply to comment #18)
> Created attachment 56981 [details] [review]
> Different kind of finish_gpu patch
>
> Can you also try this patch _instead_ of the finish_gpu one you're currently
> using? If this also ends up in a GPU hang, please attach the error_state.

This also causes bad tiles. I will report in a few days whether it also causes GPU hangs (not so easy to reproduce).
The GPU state looks internally consistent; I haven't spotted the error that is causing it to hang. Which is why I want more error states: to see if the pattern is the same, or if the problem becomes more apparent.
Created attachment 57123 [details] error state from today
(In reply to comment #21)
> The GPU state looks internally consistent; I haven't spotted the error that
> is causing it to hang. Which is why I want more error states: to see if the
> pattern is the same, or if the problem becomes more apparent.

Ok, I attached one that happened a few minutes ago. This is with the patch from comment #18. Do you want more with "drm/i915: Only clear the GPU domains upon a successful finish", or doesn't it matter which patch is applied?

And yes, misrendered tiles are definitely bound to these GPU hangs. I saw more and more such tiles on the maps, so I tried the usual trigger of a GPU hang -- opening the iGoogle page in firefox. And voilà, it indeed hung ;).
And in case it is important, dmesg said for the state in comment #22:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 6822006 at 6821999, next 6822007)
[drm:i915_reset] *ERROR* Failed to reset chip.
Here we go:

buffer: 0b000000 8192 0006 0000 00681876 P X dirty render uncached (fence: 8)
fence[8] = 0b000001
valid, x-tiled, pitch: 512, start: 0x03000000, size: 104857

but used consistently within the batch as

0x0a200b80: 0x54300004: XY_COLOR_BLT (rgb enabled, alpha enabled, src tile 0, dst tile 0)
0x0a200b84: 0x03f000c0: format 8888, pitch 192, rop 0xf0, clipping disabled,
0x0a200b88: 0x00000000: (0,0)
0x0a200b8c: 0x00250028: (40,37)
0x0a200b90: 0x0b000000: offset 0x0b000000
0x0a200b94: 0x00000000: color

or

0x0a200a34: 0x7d8e0001: 3DSTATE_BUFFER_INFO
0x0a200a38: 0x03000040: color, tiling = none, pitch=64
0x0a200a3c: 0x0b000000: address

i.e. as an untiled temporary render target. So it looks like this is entirely a ddx vs kernel confusion: the ddx believes that it has an untiled buffer, but the kernel is insistent that it never received the command to clear the tiling.
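(Editorial aside: the only way userspace can legitimately change what the kernel believes about a bo's tiling is the SET_TILING ioctl; a minimal sketch, error handling trimmed, of what the ddx would have to issue before treating the buffer as linear. The ioctl and constants are the real libdrm/i915 interface; only the wrapper function is made up.)

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <i915_drm.h>

/* Tell the kernel a bo is now linear; on success the kernel updates
 * its fence bookkeeping to match, which is exactly the step whose
 * absence shows up in the dump above. */
static int clear_bo_tiling(int fd, uint32_t handle)
{
        struct drm_i915_gem_set_tiling st;

        memset(&st, 0, sizeof(st));
        st.handle = handle;
        st.tiling_mode = I915_TILING_NONE;
        st.stride = 0;

        return drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_TILING, &st);
}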
Created attachment 57231 [details]
error state from today

Today, another one. I suppose you don't need more of them (I switched back to the driver with the workaround)?

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 389582 at 389576, next 389583)
[drm:i915_reset] *ERROR* Failed to reset chip.
Ok, that error state confirms the pattern. It is dying on a BLT command that conflicts with the fence registers.
I've looked again at the example corruption image, and some wrong fencing looks most plausible (it looks like a broken stride, with the corruption nicely aligned to X tiles).
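(Editorial aside: corruption aligned to X tiles is what a wrong stride in a fence would produce. A sketch of the X-tile address math under the usual gen3 assumptions -- 512-byte x 8-row tiles, 4 KiB per tile, no channel swizzling; the helper name is illustrative.)

#include <stdint.h>

/* Map (x, y) in a surface to its byte offset under X tiling. If the
 * fence holds the wrong stride, tiles_per_row is wrong, and whole
 * 512x8 tiles land in the wrong place -- block-shaped corruption
 * rather than random noise. */
static uint32_t x_tiled_offset(uint32_t x_bytes, uint32_t y,
                               uint32_t stride /* bytes, multiple of 512 */)
{
        uint32_t tiles_per_row = stride / 512;
        uint32_t tile_index = (y / 8) * tiles_per_row + x_bytes / 512;

        return tile_index * 4096 + (y % 8) * 512 + (x_bytes % 512);
}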
Today I hit the following warning in the kernel, probably after a GPU hang:

WARN_ON(dev_priv->fence_regs[obj->fence_reg].pin_count);
Can you please attach the entire backtrace?
(In reply to comment #30)
> Can you please attach the entire backtrace?

Unfortunately no, because it was not logged :( (and I expected it to be). I remembered only the line, and after reboot I looked into the code and pasted the WARN here.
Created attachment 57409 [details] [review]
Maintain fenced gpu access until flushed

Hmm, once upon a time I thought this was a required bug fix, so it is probably still relevant.
(In reply to comment #32)
> Created attachment 57409 [details] [review]
> Maintain fenced gpu access until flushed

I suppose I should apply that instead of the patch from comment #18 and in combination with "drm/i915: Only clear the GPU domains upon a successful finish", right?
(In reply to comment #33)
> I suppose I should apply that instead of the patch from comment #18 and in
> combination with "drm/i915: Only clear the GPU domains upon a successful
> finish", right?

I think both this patch alone and this patch + "drm/i915: Only clear the GPU domains upon a successful finish" are interesting combinations, so please try both of them.
Created attachment 57711 [details]
915_error_state with "Maintain fenced gpu access until flushed" only

(In reply to comment #34)
> I think both this patch alone

This means a death within hours. An error state is attached. Dmesg:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 286724 at 286718, next 286726)
[drm:i915_reset] *ERROR* Failed to reset chip.

> and this patch + "drm/i915: Only clear the GPU domains upon a successful
> finish" are interesting combinations

Now running a kernel with both of them; bad tiles in maps are still there. If this leads to a GPU hang, I will report later.
Created attachment 57827 [details]
915_error_state with both patches

(In reply to comment #35)
> Now running a kernel with both of them; bad tiles in maps are still there.
> If this leads to a GPU hang, I will report later.

Yes, with both patches applied, I still get GPU hangs like this:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 2215296 at 2215286, next 2215297)
[drm:i915_reset] *ERROR* Failed to reset chip.
(In reply to comment #30)
> Can you please attach the entire backtrace?

Here you are:

WARNING: at drivers/gpu/drm/i915/i915_gem.c:2368 i915_gem_object_put_fence+0xbd/0xd0()
Hardware name: To Be Filled By O.E.M.
Modules linked in: pl2303 usbserial microcode
Pid: 4287, comm: Xorg Not tainted 3.3.0-rc5-next-20120227_64+ #1655
Call Trace:
 [<ffffffff81065b6a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81065bb5>] warn_slowpath_null+0x15/0x20
 [<ffffffff8134c73d>] i915_gem_object_put_fence+0xbd/0xd0
 [<ffffffff8134dbef>] i915_gem_object_unbind+0x7f/0x1b0
 [<ffffffff8134dd3a>] i915_gem_free_object_tail+0x1a/0xd0
 [<ffffffff81350651>] i915_gem_free_object+0x51/0x60
 [<ffffffff813261d5>] drm_gem_object_free+0x25/0x40
 [<ffffffff81359e18>] intel_user_framebuffer_destroy+0x68/0x70
 [<ffffffff813343a3>] drm_fb_release+0x83/0xb0
 [<ffffffff81325e58>] drm_release+0x5d8/0x6d0
 [<ffffffff81121372>] fput+0xe2/0x250
 [<ffffffff8111dd21>] filp_close+0x61/0x90
 [<ffffffff81069270>] put_files_struct+0x80/0xe0
 [<ffffffff81069375>] exit_files+0x45/0x50
 [<ffffffff81069d53>] do_exit+0x683/0x900
 [<ffffffff8113c09f>] ? mntput+0x1f/0x30
 [<ffffffff81121439>] ? fput+0x1a9/0x250
 [<ffffffff8163bb14>] ? __schedule+0x294/0x670
 [<ffffffff8106a30f>] do_group_exit+0x3f/0xb0
 [<ffffffff8106a392>] sys_exit_group+0x12/0x20
 [<ffffffff8163d7a2>] system_call_fastpath+0x16/0x1b
I think I may have stumbled upon something... I've put some patches up at
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=for-jiri

Of particular interest is
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d

The problem that I spotted is that a batch with an unfenced BLT command is not marked with fenced_gpu_access, which means that we think we can modify the fence whilst that command is in flight. I think

  obj->fenced_gpu_access |= obj->pending_fenced_gpu_access

was a partial solution to that problem. So the key change in that patch is:

@@ -494,12 +493,12 @@ pin_and_fence_object(struct drm_i915_gem_object *obj,
 			entry->flags |= __EXEC_OBJECT_HAS_FENCE;
 			i915_gem_object_pin_fence(obj);
 		} else {
-			ret = i915_gem_object_put_fence(obj);
+			ret = i915_gem_object_put_fence(obj, ring);
 			if (ret)
 				goto err_unpin;
 		}
+		obj->pending_fenced_gpu_access = true;
 	}
-	obj->pending_fenced_gpu_access = need_fence;
 }

(with some supporting chunks required; the rest were trying to make pipelined fencing happy.)
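(Editorial aside: in other words, the patch makes sure any object touched by a gen2/3 BLT is accounted as holding its fence for the lifetime of the batch, tiled or not. A compilable mock, not driver code -- the struct and helpers are hypothetical stand-ins for drm_i915_gem_object state.)

#include <stdbool.h>

/* Hypothetical, pared-down object state. */
struct mock_bo {
        bool fenced_gpu_access;         /* fence used by in-flight commands */
        bool pending_fenced_gpu_access; /* will be, once this batch runs */
};

/* Before the patch this was only done for tiled (fenced) blits, so the
 * kernel felt free to reassign an "idle" fence register that an untiled
 * BLT was still reading through. */
static void mark_blt_object(struct mock_bo *bo)
{
        bo->pending_fenced_gpu_access = true;
}

static bool fence_can_be_stolen(const struct mock_bo *bo)
{
        return !bo->fenced_gpu_access;
}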
Created attachment 58757 [details] [review] Mark untiled BLT commands as fenced
(In reply to comment #38)
> I think I may have stumbled upon something...

Bad tiles in maps are gone with either of:
- your kernel and ddx 2.17
- 3.3.0-rc7-next-20120319 and ddx 2.18

I believe the GPU hangs are connected to the bad tiles in maps, so I would say it is fixed. And I would say something between 2.17 and 2.18 made the problem harder to reproduce (or fixed it differently), because the problem is gone with an unpatched kernel when 2.18 is used.
Jiri, do you mind attaching the Xorg.log from 2.17.0 and 2.18.0? From the other bug it seems SNA was switched on for 2.18.0, and I want to confirm that, and that 2.17.0 is UXA. (From other reports, SNA is a lot more resilient to tiling corruption than UXA. The only significant difference there would be the buffer management resulting in different usage patterns, I guess.)
Created attachment 58782 [details]
xorg.log with 2.18 (UXA)

(In reply to comment #41)
> Jiri, do you mind attaching the Xorg.log from 2.17.0 and 2.18.0?

So this is 2.18 compiled from git (which doesn't crash -- bug 47597), without SNA support. With my -next kernel I cannot reproduce here. There is an xorg log as attachment 54023 [details] already. Do you need a fresh one?
No, just trying to identify the commit that likely changed the behaviour from 2.17 with UXA (crash/tiling corruption) to 2.18 with UXA (stable).
*** Bug 47398 has been marked as a duplicate of this bug. ***
(In reply to comment #43)
> No, just trying to identify the commit that likely changed the behaviour
> from 2.17 with UXA (crash/tiling corruption) to 2.18 with UXA (stable).

Scratch that. Today I got the tiling problem with 2.18 + UXA + unpatched kernel.
Just to check: Has this tiling issue ever shown up with the "Mark untiled BLT commands as fenced" kernel patch?
(In reply to comment #46)
> Just to check: Has this tiling issue ever shown up with the "Mark untiled
> BLT commands as fenced" kernel patch?

No, I haven't seen it since then.
commit 7dd4906586274f3945f2aeaaa5a33b451c3b4bba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 21 10:48:18 2012 +0000

    drm/i915: Mark untiled BLT commands as fenced on gen2/3

    The BLT commands on gen2/3 utilize the fence registers and so we
    cannot modify any fences for the object whilst those commands are in
    flight. Currently we marked tiled commands as occupying a fence, but
    forgot to restrict the untiled commands from preventing a fence being
    assigned before they were completed. One side-effect is that we then
    have to double check that a fence was allocated for a fenced buffer
    during move-to-active.

    Reported-by: Jiri Slaby <jirislaby@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
    Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
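(Editorial aside: the cited testcase presumably does something along these lines, reconstructed from the commit message rather than the i-g-t source; gem_create, blt_fill, gem_set_tiling, and expect_filled are hypothetical wrappers standing in for the real ioctl and batchbuffer plumbing.)

#include <stdint.h>
#include <i915_drm.h>

#define BO_SIZE (1024 * 1024)
#define STRIDE  4096

/* Hypothetical wrappers; i-g-t provides similar helpers. */
uint32_t gem_create(int fd, uint32_t size);
void blt_fill(int fd, uint32_t handle, uint32_t pixel);  /* untiled BLT */
void gem_set_tiling(int fd, uint32_t handle, uint32_t mode, uint32_t stride);
void expect_filled(int fd, uint32_t handle, uint32_t pixel);

static void tiled_after_untiled_blt(int fd)
{
        uint32_t handle = gem_create(fd, BO_SIZE);

        blt_fill(fd, handle, 0xdeadbeef);       /* blit is now in flight */
        /* Without the fix, this could retarget the fence register the
         * untiled blit above is still using on gen2/3... */
        gem_set_tiling(fd, handle, I915_TILING_X, STRIDE);
        /* ...so the fill would land scrambled. */
        expect_filled(fd, handle, 0xdeadbeef);
}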
(In reply to comment #48)
> commit 7dd4906586274f3945f2aeaaa5a33b451c3b4bba
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Mar 21 10:48:18 2012 +0000
>
>     drm/i915: Mark untiled BLT commands as fenced on gen2/3

Bad news. This version of the patch causes a regression during resume: it looks like the console is not switched back to X, and I still see the kernel messages. If I revert 7dd49065862 and apply
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=for-jiri&id=79710e6ccabdac80c65cd13b944695ecc3e42a9d
instead, it works.

Bisection log, if you care. (The crap part is that something around 3.4-rc1 does not boot here, so this was not easy to find :P.)

git bisect start '--' 'drivers/gpu/drm/'
# good: [c16fa4f2ad19908a47c63d8fa436a1178438c7e7] Linux 3.3
git bisect good c16fa4f2ad19908a47c63d8fa436a1178438c7e7
# bad: [0034102808e0dbbf3a2394b82b1bb40b5778de9e] Linux 3.4-rc2
git bisect bad 0034102808e0dbbf3a2394b82b1bb40b5778de9e
# good: [c57ebf5ef3588d21031f12e39131d79071269845] drm/nv50/pm: wait for all fifo-connected engines to idle before reclocking
git bisect good c57ebf5ef3588d21031f12e39131d79071269845
# good: [43b3cd995f304c983393b7ed6563f09781bc41d0] drm/radeon/kms: add initial DCE6 display watermark support
git bisect good 43b3cd995f304c983393b7ed6563f09781bc41d0
# skip: [09fa30226130652af75152d9010c603c66d46f6e] Merge branch 'drm-radeon-sitn-support' of git://people.freedesktop.org/~airlied/linux
git bisect skip 09fa30226130652af75152d9010c603c66d46f6e
# good: [1b2681ba271c9f5bb66cb0d8ceeaa215fcd218d8] drm/radeon/kms: update duallink checks for DCE6
git bisect good 1b2681ba271c9f5bb66cb0d8ceeaa215fcd218d8
# skip: [59365671464539dc695bbf4d4bf37aabfd8604f2] drm/nouveau/i2c: fix thinko/regression on really old chipsets
git bisect skip 59365671464539dc695bbf4d4bf37aabfd8604f2
# good: [1c9c20f60230bd5a6195d41f9dd2dfa60874b1da] drm: remove the second argument of k[un]map_atomic()
git bisect good 1c9c20f60230bd5a6195d41f9dd2dfa60874b1da
# bad: [83b7f9ac9126f0532ca34c14e4f0582c565c6b0d] drm/i915: allow to select rc6 modes via kernel parameter
git bisect bad 83b7f9ac9126f0532ca34c14e4f0582c565c6b0d
# good: [1898f4426b3863216a9041389b34a3b995883027] Merge branch 'drm-nouveau-next' of git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-next
git bisect good 1898f4426b3863216a9041389b34a3b995883027
# skip: [a1978f74da69565a2e472394c7dcb2cfb31b3e45] gma500: medfield: fix build without CONFIG_BACKLIGHT_CLASS_DEVICE
git bisect skip a1978f74da69565a2e472394c7dcb2cfb31b3e45
# good: [55a254ac63a3ac1867d1501030e7fba69c7d4aeb] drm/i915: properly restore the ppgtt page directory on resume
git bisect good 55a254ac63a3ac1867d1501030e7fba69c7d4aeb
# bad: [7dd4906586274f3945f2aeaaa5a33b451c3b4bba] drm/i915: Mark untiled BLT commands as fenced on gen2/3
git bisect bad 7dd4906586274f3945f2aeaaa5a33b451c3b4bba
To clarify: If you revert 7dd4906586274f3945f2aeaaa5a33b451c3b4bba on top of 3.4-rc2, the resume regression is gone (but the tiling corruption is still there), but if you use plain 3.4-rc2, resume is broken?
(In reply to comment #50)
> To clarify: If you revert 7dd4906586274f3945f2aeaaa5a33b451c3b4bba on top of
> 3.4-rc2, the resume regression is gone (but the tiling corruption is still
> there), but if you use plain 3.4-rc2, resume is broken?

I don't know; I use the -next tree from today. So:
3.4.0-rc2-next-20120410 -- broken resume
3.4.0-rc2-next-20120410 minus 7dd4906 -- working resume
3.4.0-rc2-next-20120410 minus 7dd4906 plus the patch from here -- working resume

I haven't investigated the tiling corruption in any of the cases above.
That's even stranger, because -next shouldn't contain any drm/i915 patches yet... Can you try to reproduce this on plain 3.4-rc2? I'm digging for a baseline; -next is way too volatile a target...
(In reply to comment #52)
> That's even stranger, because -next shouldn't contain any drm/i915 patches
> yet...

But it contains 3.4-rc2 and more :).
(In reply to comment #51)
> (In reply to comment #50)
> > To clarify: If you revert 7dd4906586274f3945f2aeaaa5a33b451c3b4bba on top
> > of 3.4-rc2, the resume regression is gone (but the tiling corruption is
> > still there), but if you use plain 3.4-rc2, resume is broken?
>
> I don't know; I use the -next tree from today. So:
> 3.4.0-rc2-next-20120410 -- broken resume

BTW from the bisection log you can see I started with 3.4-rc2, which does not work:
# bad: [0034102808e0dbbf3a2394b82b1bb40b5778de9e] Linux 3.4-rc2

> 3.4.0-rc2-next-20120410 minus 7dd4906 -- working resume
> 3.4.0-rc2-next-20120410 minus 7dd4906 plus the patch from here -- working
> resume

What is worse, I have just bisected that the patch from here causes a ton of spurious interrupts. See https://lkml.org/lkml/2012/3/27/79
The patch doesn't cause the interrupts itself; they already exist in the command stream but are masked until we need to wait, in order to avoid the GPU hangs/corruption.
(In reply to comment #54)
> BTW from the bisection log you can see I started with 3.4-rc2, which does
> not work:
> # bad: [0034102808e0dbbf3a2394b82b1bb40b5778de9e] Linux 3.4-rc2

Sorry, I've missed that. So reverting 7dd4906586274f3945f2aeaaa5a33b451c3b4bba on top of 3.4-rc2 (with no other patches applied) does fix resume for you again?
(In reply to comment #56)
> Sorry, I've missed that. So reverting
> 7dd4906586274f3945f2aeaaa5a33b451c3b4bba on top of 3.4-rc2 (with no other
> patches applied) does fix resume for you again?

Yes, exactly. (Except that bad tiling should appear if I did not use SNA but UXA. It's very hard to reproduce with SNA; I haven't tried UXA.)
Shotgun cleanup: http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=amalgam&id=f59160192f91f5719eae840816792e5372a81b61
The shotgun was accurate. Kudos to Daniel for the clean fix though:

commit 15a13bbdffb0d6288a5dd04aee9736267da1335f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Apr 12 01:27:57 2012 +0200

    drm/i915: clear fencing tracking state when retiring requests

    This fixes a resume regression introduced in

    commit 7dd4906586274f3945f2aeaaa5a33b451c3b4bba
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Mar 21 10:48:18 2012 +0000

        drm/i915: Mark untiled BLT commands as fenced on gen2/3

    which fixed fencing tracking for untiled blt commands.

    A side effect of that patch was that now also untiled objects have a
    non-zero obj->last_fenced_seqno to track when a fence can be set up
    after a pipelined tiling change. Unfortunately this was only cleared
    by the fence setup and teardown code, resulting in tons of untiled
    but inactive objects with non-zero last_fenced_seqno.

    Now after resume we completely reset the seqno tracking, both on the
    driver side (by setting dev_priv->next_seqno = 1) and on the hw side
    (by allocating a new hws page, which contains the seqnos). Hilarity
    and indefinite waits ensued from the stale seqnos in
    obj->last_fenced_seqno from before the suspend.

    The fix is to properly clear the fencing tracking state like we
    already do for the normal gpu rendering while moving objects off the
    active list.

    Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    Cc: Jiri Slaby <jslaby@suse.cz>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
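(Editorial aside: the essence of that fix as a compilable mock; the struct is a pared-down, hypothetical stand-in for drm_i915_gem_object, and the real change lives in the move-to-inactive path in i915_gem.c.)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down object state. */
struct mock_bo {
        uint32_t last_fenced_seqno;
        bool fenced_gpu_access;
};

/* Retiring an object now also clears its fence tracking, so a stale
 * pre-suspend seqno can no longer be waited on after resume resets the
 * hw seqnos (dev_priv->next_seqno = 1, fresh hws page). */
static void move_to_inactive(struct mock_bo *bo)
{
        bo->last_fenced_seqno = 0;
        bo->fenced_gpu_access = false;
}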