Created attachment 64586 [details]
dmesg info after doing S3 with X and glxgears
System Environment:
--------------------------
Platform: IvyBridge
Kernel: (drm-intel-testing)b5430f2760caadd38009e2290d070c700f
Bug detailed description:
-------------------------
After resuming from S3 with X and glxgears running, dmesg shows a GPU hang, so I have attached the dmesg. I also tried S3 without X, S4 with X and glxgears, and S4 without X; they all work well.
Is this a regression?
In the reset code it dies on BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS); in move_to_inactive.
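For anyone following along, here is a minimal userspace sketch of what that BUG_ON tests. The domain bit values and the definition of I915_GEM_GPU_DOMAINS are assumptions modelled on the i915 headers of that era, not copied from the tree under test, and `write_domain_would_bug` and `write_domain_selfcheck` are hypothetical helpers that do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on the i915 uapi header; verify against your tree. */
#define I915_GEM_DOMAIN_CPU    0x00000001u
#define I915_GEM_DOMAIN_RENDER 0x00000002u
#define I915_GEM_DOMAIN_GTT    0x00000040u

/* Assumption: "GPU domains" means every domain except CPU and GTT. */
#define I915_GEM_GPU_DOMAINS (~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))

/*
 * Hypothetical helper: returns nonzero when the condition inside
 * BUG_ON(obj->base.write_domain & ~I915_GEM_GPU_DOMAINS) would be true,
 * i.e. when the object still has a CPU or GTT write domain set.
 */
int write_domain_would_bug(uint32_t write_domain)
{
    return (write_domain & ~I915_GEM_GPU_DOMAINS) != 0;
}

/* Small self-check showing which write domains pass and which trip the check. */
void write_domain_selfcheck(void)
{
    assert(!write_domain_would_bug(0));                      /* no write domain */
    assert(!write_domain_would_bug(I915_GEM_DOMAIN_RENDER)); /* GPU-only domain */
    assert(write_domain_would_bug(I915_GEM_DOMAIN_CPU));     /* CPU write pending */
    assert(write_domain_would_bug(I915_GEM_DOMAIN_GTT));     /* GTT write pending */
}
```

Under these assumptions, the BUG fires when an object reaches move_to_inactive with a CPU or GTT write domain still set, which would suggest that the object's tracking state is already inconsistent by the time the reset code runs.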
(In reply to comment #1) > Is this a regression? It's a regression.
What are the versions of the other driver components (mesa is especially important here)?
(In reply to comment #4)
> What are the version of the other driver components (especially mesa is
> important here)?
Here is the environment:
Libdrm: (master)libdrm-2.4.37-11-gfaf26b689d4a2a6d1e851a1ea2fd657406eebfff
Mesa: (master)cfdf60f236a525a0309146ce2da156bd3856c8b7
Xserver: (master)xorg-server-1.12.99.902
Xf86_video_intel: (master)2.20.1
Cairo: (master)21e3f2e9034b64131075d82a4e34868dc72f2249
Libva: (staging)f12f80371fb534e6bbf248586b3c17c298a31f4e
Libva_intel_driver: (staging)82fa52510a37ab645daaa3bb7091ff5096a20d0b
Can you please check whether this issue is caused by the same patch as bug #52424, i.e. whether reverting 74792b53cfc2f235bc0e2eef39029817dc2cb726 fixes it? If not, I guess we need the bisect for this one here, too - I've tried to reproduce it (even tried to manually hang the gpu), but couldn't.
(In reply to comment #6)
> Can you please check whether this issue is caused by the same patch as bug
> #52424, i.e. whether reverting 74792b53cfc2f235bc0e2eef39029817dc2cb726 fixes
> it?
>
> If not, I guess we need the bisect for this one here, too - I've tried to
> reproduce it (even tried to manually hang the gpu), but couldn't.
This is a different issue from bug #52424. I bisected and found:
e158c5aa1776372cd751e2c395300a3a6ff0bc9c is the first bad commit
commit e158c5aa1776372cd751e2c395300a3a6ff0bc9c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Sun Jun 17 09:37:24 2012 -0700
    drm/i915: disable contexts on old HW
    This got dropped as a result of the last round of comments. I didn't test
    it on unsupported HW (which this is likely the case).
    Note that this prevents hw context from blowing up on any pre-gen6 hw.
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51142
    [danvet: Added note and buglink.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When I revert this commit, the issue is gone.
Ben, can you add a module option to disable hw contexts for ease of debugging?
(In reply to comment #8)
> Ben, can you add a module option to disable hw contexts for ease of debugging?
I'd prefer not to add a module option since that creates a slippery slope. I'll attach a patch to unconditionally disable contexts. Yanguang, can you apply this patch on top of whatever repo you are using and report the results?
Created attachment 64962 [details] [review] Unconditionally disable contexts
Created attachment 64970 [details]
dmesg info after S3
(In reply to comment #10)
> Created attachment 64962 [details] [review]
> Unconditionally disable contexts
I tried your patch with Kernel: (drm-intel-next-queued)ab3951eb74e7c33a2f5b7b64d72e82f1eea61571. The issue is gone, and I have attached the dmesg from resuming from S3.
Yangguang, can you reproduce this every time? Do you run glxgears with vsync? Do you see it on multiple platforms? I am unable to hit this on my IVB. The bisection point doesn't make much sense, as it should have no effect on IVB.
(In reply to comment #12)
> Yangguang, can you reproduce this every time? Do you run glxgears with vsync?
> Do you see it on multiple platforms? I am unable to hit this on my IVB
>
> The bisection point doesn't make much sense as it should have no effect on IVB.
Yes, I can reproduce this issue every time. We run glxgears with vsync by default. I have only seen this on IVB. I'm confused by the bisect result too, but when I revert this commit, the issue is gone.
(In reply to comment #13) > (In reply to comment #12) > > Yangguang, can you reproduce this every time? Do you run glxgears with vsync? > > Do you see it on multiple platforms? I am unable to hit this on my IVB > > > > The bisection point doesn't make much sense as it should have no effect on IVB. > Yes,I can reproduce this issue every time,we run glxgears with vsync as > default, I only catch this with IVB, I'm confused with that bisect > result,too.But when I revert this commit,the issue is gone. Is this a composited desktop? Can you get the error state? Can you try to reproduce this with a mesa that doesn't use contexts (8.0.4 or something should be fine)?
Hmm, also; do you always see these messages before the hang? And always 3 of them? [ 1049.967346] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... render ring idle
Created attachment 65260 [details] [review] Make sure we see idle message Yanguang, can you please apply this patch to make sure we don't miss idle errors. Send another dmesg after the error with this patch.
Created attachment 65326 [details]
dmesg info with Ben's patch
(In reply to comment #16)
> Created attachment 65260 [details] [review]
> Make sure we see idle message
>
> Yanguang, can you please apply this patch to make sure we don't miss idle
> errors. Send another dmesg after the error with this patch.
Ben, I tried your patch with the latest upstream kernel:
Kernel: (drm-intel-next-queued)65bccb5c708bd9f00d24f041f4f7c45130359448
I still get a call trace after S3 with glxgears, and I have attached the dmesg info.
(In reply to comment #15)
> Hmm, also; do you always see these messages before the hang? And always 3 of
> them?
> [ 1049.967346] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> elapsed... render ring idle
In the new dmesg taken with your patch, I can find 3 of these messages:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... render ring idle
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Yangguang, can you reproduce this every time? Do you run glxgears with vsync?
> > > Do you see it on multiple platforms? I am unable to hit this on my IVB
> > >
> > > The bisection point doesn't make much sense as it should have no effect on IVB.
> > Yes,I can reproduce this issue every time,we run glxgears with vsync as
> > default, I only catch this with IVB, I'm confused with that bisect
> > result,too.But when I revert this commit,the issue is gone.
>
> Is this a composited desktop?
> Can you get the error state?
> Can you try to reproduce this with a mesa that doesn't use contexts (8.0.4 or
> something should be fine)?
I only run X, without any compositing manager.
I can't get the error state because the GPU hangs, and after rebooting the error state is empty.
With mesa 8.0.4 the issue is gone; I am attaching that dmesg as well.
Created attachment 65327 [details] dmesg info with mesa 8.0.4 This is the dmesg with mesa 8.0.4
Mesa 8.0.x doesn't use contexts.
(In reply to comment #19) > Mesa 8.0.x doesn't use contexts. Yes. I asked for this to verify it doesn't occur with just the default context.
There appears to be list corruption occurring with this test case. I've been unable thus far to track down how the list is getting corrupted, and have no theories about it either. It only occurs at resume. However, I've created some patches which address other potential issues. I'll update the patches with better commit messages later, but for now we can just test them. Yangguang, please try this: git://people.freedesktop.org/~bwidawsk/drm-intel bug_52429
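As an illustration only, here is one generic way a kernel-style doubly linked list can end up corrupted: a list head gets reinitialized while nodes are still linked to it, and a later removal of a stale node writes through its old pointers. This is a userspace sketch under that assumption and is not claimed to be the mechanism behind this bug. All names (`list_node`, `list_init`, `list_add_tail_node`, `list_del_node`, `list_is_consistent`, `demo_stale_node_corrupts_list`) are hypothetical and only mimic the shape of the kernel's list helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, shaped like the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

void list_add_tail_node(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlinks node by patching its neighbours; trusts node's own pointers. */
void list_del_node(struct list_node *node)
{
    node->next->prev = node->prev;
    node->prev->next = node->next;
}

/* Walks forward from head and checks that every next/prev pair agrees. */
int list_is_consistent(const struct list_node *head, size_t max_nodes)
{
    const struct list_node *n = head;
    size_t i;

    for (i = 0; i <= max_nodes; i++) {
        if (n->next->prev != n)
            return 0;
        n = n->next;
        if (n == head)
            return 1;
    }
    return 0; /* never got back to head within the bound */
}

/* Returns 1 if reinitializing the head under live nodes corrupts the list. */
int demo_stale_node_corrupts_list(void)
{
    struct list_node head, a, b;

    list_init(&head);
    list_add_tail_node(&a, &head);
    list_add_tail_node(&b, &head);
    assert(list_is_consistent(&head, 4));

    list_init(&head);   /* head is reset, but a and b still point into it */
    list_del_node(&b);  /* stale node now rewrites head.prev and a.next */

    return !list_is_consistent(&head, 4);
}
```

In this sketch the head ends up with `next` pointing at itself while `prev` points at the stale node `a`, so forward and backward walks disagree. A consistency walk like `list_is_consistent` is the kind of check that could be dropped into a debug build to catch such a state early.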
Just pushed a fix for the list corruption. I have no more issues on my IVB with S3 now.
Created attachment 65486 [details]
debug info with Ben's new bug_52429 branch
(In reply to comment #23)
> Just pushed a fix for the list corruption.
>
> I have no more issues on my IVB with S3 now.
Ben, where did you push your fix? I tried with the repo:
git://people.freedesktop.org/~bwidawsk/drm-intel bug_52429
at your latest commit:
Kernel: (context_support_rev2)f1b8d863ac4b4ac7edc1107b19a7ce90b116ff96.
I still get a call trace. I have attached the dmesg info.
How about the error state?
To elaborate a bit, the order of events seems to be:
1. resume
2. gpu hang
3. ring init fail
4. pin fail
The last one may very well be my fault, but also seems to be the lowest priority.
Created attachment 65487 [details] error_state with ben's branch (In reply to comment #25) > How about the error state? Sorry for this delayed error state.
I've just pushed a test patch. In the error state, instdone1 is 0, which seems quite odd. I want to try to ignore it when detecting hangs to see what happens. This is mostly just a guess, just want to try it while our timezones overlap :-) The relevant sha is 4ea7e2c74f43f4798f5e1494b69b9720e5aa0846 It is still here: git://people.freedesktop.org/~bwidawsk/drm-intel bug_52429 Thank you.
(In reply to comment #28)
> I've just pushed a test patch. In the error state, instdone1 is 0, which seems
> quite odd. I want to try to ignore it when detecting hangs to see what happens.
>
> This is mostly just a guess, just want to try it while our timezones overlap
> :-)
>
> The relevant sha is 4ea7e2c74f43f4798f5e1494b69b9720e5aa0846
>
> It is still here:
> git://people.freedesktop.org/~bwidawsk/drm-intel bug_52429
>
> Thank you.
Ben, commit 4ea7e2c74f43f4798f5e1494b69b9720e5aa0846 works well.
Yanguang, I've just force-pushed the patch series which I would like to submit to intel-gfx. Would you please try it out and tell me how it goes? It's a bit different from what was there previously. Thanks.
(In reply to comment #30) > yanguang, I've just forced push the patch series which I would like to submit > to intel-gfx. Would you please try it out and tell me how it goes? It's a bit > different than what was there previously. > > Thanks. Ben, I see branch of bug_52429 has been updated, you mean I need to try the newest commit 9b524fe712f7d6c7c7cc83947920aefcf9fb8867?
(In reply to comment #31)
> (In reply to comment #30)
> > yanguang, I've just forced push the patch series which I would like to submit
> > to intel-gfx. Would you please try it out and tell me how it goes? It's a bit
> > different than what was there previously.
> >
> > Thanks.
> Ben, I see branch of bug_52429 has been updated, you mean I need to try the
> newest commit 9b524fe712f7d6c7c7cc83947920aefcf9fb8867?
Just try the whole branch like you did before. I just wanted to point out that I did a force push, so you should do something like `git reset --hard bwidawsk/bug_52429`. There are 4 patches in there in all which I want tested. If it works, I'll add your Tested-by and submit it to the intel-gfx mailing list. Thanks.
Created attachment 65527 [details]
dmesg info with Ben's newest branch
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #30)
> > > yanguang, I've just forced push the patch series which I would like to submit
> > > to intel-gfx. Would you please try it out and tell me how it goes? It's a bit
> > > different than what was there previously.
> > >
> > > Thanks.
> > Ben, I see branch of bug_52429 has been updated, you mean I need to try the
> > newest commit 9b524fe712f7d6c7c7cc83947920aefcf9fb8867?
>
> Just try the whole branch like you did before. I just wanted to point out that
> I did a force push, so you should do something like `git reset --hard
> bwidawsk/bug_52429`. There are 4 patches in there in all which I wanted tested.
> If it works, I'll add your tested-by and submit it to intel-gfx mailing list.
>
> THanks.
I tried the latest commit, 9b524fe712f7d6c7c7cc83947920aefcf9fb8867. It works well and the issue is gone. I have attached the dmesg.
Hi Yanguang. Daniel has taken one of the patches already for drm-intel-next-queued. Can you tell whether or not that patch alone makes the issue go away? If it doesn't I'll work on getting the other patches upstream as well.
(In reply to comment #34)
> Hi Yanguang. Daniel has taken one of the patches already for
> drm-intel-next-queued. Can you tell whether or not that patch alone makes the
> issue go away?
>
> If it doesn't I'll work on getting the other patches upstream as well.
Ben, I have tried the newest drm-intel-next-queued; the issue still occurs.
Erhm, I've merged b6c7488df68ae3660d81b into -fixes (and nothing yet into -queued), can you please test whether -fixes works better?
(In reply to comment #36)
> Erhm, I've merged b6c7488df68ae3660d81b into -fixes (and nothing yet into
> -queued), can you please test whether -fixes works better?
I tried the newest -fixes kernel. It works well, and the issue is gone.
Ok, thanks for testing, I'll close this as fixed.
Confirmed, -fixes kernel can fix this issue.
I'm happy that it's fixed with that one patch - but I'm also a bit leery that we shouldn't throw out the other patches just yet.
A patch referencing this bug report has been merged in Linux v3.6-rc3:
commit b6c7488df68ae3660d81b149b61b55b97929da83
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Tue Aug 14 14:35:14 2012 -0700
    drm/i915/contexts: fix list corruption
Closing old verified+fixed.