System Environment:
--------------------------
Platform: All
Kernel: (drm-intel-next) 82d165557ef094d4b4dfc05871aee618ec7102b0

Bug detailed description:
-------------------------
On all platforms, running gem_linear_blits from intel-gpu-tools crashes the system. The latest known-good kernel is:
Kernel: (drm-intel-next) 64a742fac3a22f57303d8f1b7e347350a1c48254
That's ... unexpected. Can you please bisect this one? Also check that the issue isn't caused by an update of intel-gpu-tools.
And perhaps the details of the kernel crash?
(In reply to comment #1)
> That's ... unexpected. Can you please bisect this one? Also check whether the
> issue isn't due to an update of intel-gpu-tool.

The two kernel commits above are close together; the second (good) one comes just before the first. I have tried some older intel-gpu-tools commits and they are good, so I think the issue is probably not due to an intel-gpu-tools update.
> --- Comment #3 from yangguang <guang.a.yang@intel.com> 2011-10-26 02:25:19 PDT ---
> (In reply to comment #1)
> > That's ... unexpected. Can you please bisect this one? Also check whether the
> > issue isn't due to an update of intel-gpu-tool.
>
> The kernel commits above are closed,the second one is behind the first,I have
> try some old Intel-gpu-tools commits,they are good,so I think maybe the issue
> isn't due to the update of intel-gpu-tool.

Just to clarify: the kernel still crashes with an older i-g-t? Also, please attach the dmesg from after the kernel crashed.

Thanks, Daniel
(In reply to comment #4)
> > --- Comment #3 from yangguang <guang.a.yang@intel.com> 2011-10-26 02:25:19 PDT ---
> > (In reply to comment #1)
> > > That's ... unexpected. Can you please bisect this one? Also check whether the
> > > issue isn't due to an update of intel-gpu-tool.
> >
> > The kernel commits above are closed,the second one is behind the first,I have
> > try some old Intel-gpu-tools commits,they are good,so I think maybe the issue
> > isn't due to the update of intel-gpu-tool.
>
> Just to clarify: The kernel still crashes with an older i-g-t? Also,
> please attach the dmesg after the kernel crashed.
>
> Thanks, Daniel

Oh, sorry Daniel, I meant that the kernel still crashes with an older i-g-t. I can't get the dmesg because I can't ssh into the machine once the kernel has crashed.
Ok, I've bisected this to:

commit 5c0422878fcdc279ae9a8e8b66972a15b5efb67f
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Oct 17 15:51:55 2011 -0700

    drm/i915: ILK + VT-d workaround

And a small rant towards our QA team:
- When filing a bug against the kernel, please always attach the full dmesg. If the machine crashes, try to capture as much as possible with netconsole or something similar.
- When the bug is a regression, _always_ bisect it. Really.

Without these two things done, I consider the bug report rather incomplete.
Created attachment 52826
BUG capture on my snb with netconsole

"Thread overran stack, or stack corrupted" is the important bit ... everything else kinda stops making sense with that ;-)
See id:20111028114241.GA13603@elgon.mountain

Ok, so we are doing the idle-flushes. Why is that destabilising the system?
Ah, recursion.

remove-pte -> wait -> retire -> move-to-inactive -> unref -> unbind -> remove-pte

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a546a71..6ce1396 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2106,7 +2106,7 @@ i915_wait_request(struct intel_ring_buffer *ring,
 	 * buffer to have made it to the inactive list, and we would need
 	 * a separate wait queue to handle that.
 	 */
-	if (ret == 0)
+	if (ret == 0 && dev_priv->mm.interruptible)
 		i915_gem_retire_requests_ring(ring);
 
 	return ret;
(In reply to comment #9)
> Ah, recursion.
>
> remove-pte -> wait -> retire -> move-to-inactive -> unref -> unbind ->
> remove-pte
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a546a71..6ce1396 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2106,7 +2106,7 @@ i915_wait_request(struct intel_ring_buffer *ring,
>  	 * buffer to have made it to the inactive list, and we would need
>  	 * a separate wait queue to handle that.
>  	 */
> -	if (ret == 0)
> +	if (ret == 0 && dev_priv->mm.interruptible)
>  		i915_gem_retire_requests_ring(ring);
>
>  	return ret;

Looks good to me. Feel free to add my Reviewed-by when you submit this patch.
(In reply to comment #7)
> Created attachment 52826
> BUG capture on my snb with netconsole
> "Thread overran stack, or stack corrupted" is the important bit ... everything
> else kinda stops making sense with that ;-)

BTW, I found that this bad commit:
Kernel: (drm-intel-next) 82d165557ef094d4b4dfc05871aee618ec7102b0
is contained in the 3.1 release kernel. When we run i-g-t on the 3.1 release, it crashes.
Can you try this patch: http://lists.freedesktop.org/archives/intel-gfx/2011-October/012984.html
> --- Comment #11 from yangguang <guang.a.yang@intel.com> 2011-10-31 18:15:04 UTC ---
> (In reply to comment #7)
> > Created attachment 52826
> > BUG capture on my snb with netconsole
> > "Thread overran stack, or stack corrupted" is the important bit ... everything
> > else kinda stops making sense with that ;-)
>
> BTW,I found that this bad commit :
> Kernel: (drm-intel-next)82d165557ef094d4b4dfc05871aee618ec7102b0
> has contained in the 3.1 release kernel.When we run the i-g-t with 3.1
> release,it will cause crash.

This is not how it works. The commit you've mentioned changes a few things in the PCH modeset code. It's extremely unlikely that this will break gem_linear_blits. So it's probably a new bug somewhere else.

So _please_ gather all the required details (machine details, what kind of crash, dmesg, crash output over netconsole if there's nothing in the logs, which test exactly fails, ...) and open a new bug report.

Yours, Daniel
(In reply to comment #13)
> > --- Comment #11 from yangguang <guang.a.yang@intel.com> 2011-10-31 18:15:04 UTC ---
> > (In reply to comment #7)
> > > Created attachment 52826
> > > BUG capture on my snb with netconsole
> > > "Thread overran stack, or stack corrupted" is the important bit ... everything
> > > else kinda stops making sense with that ;-)
> >
> > BTW,I found that this bad commit :
> > Kernel: (drm-intel-next)82d165557ef094d4b4dfc05871aee618ec7102b0
> > has contained in the 3.1 release kernel.When we run the i-g-t with 3.1
> > release,it will cause crash.
>
> This is not how it works. The commit you've mentioned changes a few things
> in the PCH modeset code. It's extremely unlikely that this will break
> gem_linear_blits. So it's probably a new bug somewhere else.
>
> So _please_ gather all the required details (machine details, what
> kind of crash, dmesg, crash output over netconsole if there's nothing
> in the logs, which test exactly fails, ...) and open a new bug report.

Hi Daniel, I think Guang was emphasizing that the issue has also appeared on the master branch. Ben's patch now fixes the issue.
(In reply to comment #12)
> Can you try this patch:
> http://lists.freedesktop.org/archives/intel-gfx/2011-October/012984.html

Okay, it works well.
Ben has already submitted a patch to fix this, so please close when it lands in Keith's tree.
Has the patch been committed?
(In reply to comment #17)
> Has the patch committed?

Keith took Daniel's patch, which doesn't work for unknown reasons. I believe nobody (except me) has ever tested my patch. Please refer to this email/thread, and ping Keith if you'd like him to try merging my patch to -next. Otherwise we have nothing.
http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html
(In reply to comment #18)
> (In reply to comment #17)
> > Has the patch committed?
> Keith took Daniel's patch which doesn't work for unknown reasons. I believe
> nobody (except me) has ever tested my patch.
> Please refer to this email/thread, and ping Keith if you'd like him to try
> merging my patch to -next. Otherwise we have nothing.
> http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html

Hi Ben, as I said in comment 15, we have already tested your patch and it works well.
Tested-by: Guang Yang <guang.a.yang@intel.com>
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > Has the patch committed?
> > Keith took Daniel's patch which doesn't work for unknown reasons. I believe
> > nobody (except me) has ever tested my patch.
> > Please refer to this email/thread, and ping Keith if you'd like him to try
> > merging my patch to -next. Otherwise we have nothing.
> > http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html
> Hi Ben,yi have said at comment 15 ,we have already test your patch, it can work
> well.
> The patch test-by Guang Yang <guang.a.yang.intel.com>.

I pinged Keith on IRC. It's up to him whether or not he takes it.
A patch referencing this bug report has been merged in Linux v3.2-rc5:

commit eb1711bb94991e93669c5a1b5f84f11be2d51ea1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 6 12:12:33 2011 +0100

    drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a
A patch referencing a commit referencing this bug report has been merged in Linux v3.2-rc6:

commit ed4a51842a9d9e618d4f4c31349b15b974dba5df
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Dec 16 12:58:39 2011 -0800

    Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a"
Ben's patch is now merged to -next.
(In reply to comment #23)
> Ben's patch is now merged to -next.

Is it targeted only at the 3.4 kernel? If so, I'd suggest removing the blocking relationship with the 3.2 and 3.3 trackers.
A patch referencing this bug report has been merged in Linux v3.4-rc1:

commit 8436473a4b10243fd4c3009b97b6646c2ba642f7
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Tue Jan 24 20:36:15 2012 -0800

    drm/i915: drm/i915: Fix recursive calls to unmap
Closing old verified.