Field | Value
---|---
Summary | System hang while running gem_linear_blits of Intel-gpu-tools
Product | DRI
Component | DRM/Intel
Status | CLOSED FIXED
Severity | major
Priority | high
Version | unspecified
Hardware | All
OS | Linux (All)
Reporter | Guang Yang <guang.a.yang>
Assignee | Ben Widawsky <ben>
CC | ben, chris, daniel, jbarnes, keithp, yi.sun
Keywords | patch
Bug Blocks | 40928, 42991, 44622

Description
Guang Yang
2011-10-24 19:28:31 UTC
That's ... unexpected. Can you please bisect this one? Also check whether the issue isn't due to an update of intel-gpu-tool.

And perhaps the details of the kernel crash?

(In reply to comment #1)
> That's ... unexpected. Can you please bisect this one? Also check whether the
> issue isn't due to an update of intel-gpu-tool.

The kernel commits above are closed, the second one is behind the first. I have tried some old Intel-gpu-tools commits and they are good, so I think maybe the issue isn't due to the update of intel-gpu-tool.

> --- Comment #3 from yangguang <guang.a.yang@intel.com> 2011-10-26 02:25:19 PDT ---
> (In reply to comment #1)
> > That's ... unexpected. Can you please bisect this one? Also check whether the
> > issue isn't due to an update of intel-gpu-tool.
>
> The kernel commits above are closed, the second one is behind the first. I have
> tried some old Intel-gpu-tools commits and they are good, so I think maybe the
> issue isn't due to the update of intel-gpu-tool.

Just to clarify: The kernel still crashes with an older i-g-t? Also,
please attach the dmesg after the kernel crashed.

Thanks, Daniel

(In reply to comment #4)
> > --- Comment #3 from yangguang <guang.a.yang@intel.com> 2011-10-26 02:25:19 PDT ---
> > (In reply to comment #1)
> > > That's ... unexpected. Can you please bisect this one? Also check whether the
> > > issue isn't due to an update of intel-gpu-tool.
> >
> > The kernel commits above are closed, the second one is behind the first. I have
> > tried some old Intel-gpu-tools commits and they are good, so I think maybe the
> > issue isn't due to the update of intel-gpu-tool.
> Just to clarify: The kernel still crashes with an older i-g-t? Also,
> please attach the dmesg after the kernel crashed.
> Thanks, Daniel

Oh, sorry Daniel, I meant that the kernel still crashes with an older i-g-t. I can't get the dmesg because I can't ssh in once the kernel has crashed.

Ok, I've bisected this to

commit 5c0422878fcdc279ae9a8e8b66972a15b5efb67f
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Oct 17 15:51:55 2011 -0700

    drm/i915: ILK + VT-d workaround

And a small rant towards our qa-team:
- When filing a bug against the kernel, please always attach the full dmesg. If the machine crashes, try to capture as much as possible with netconsole or something similar.
- When the bug is a regression, _always_ bisect it. Really.

Without these 2 things done, I consider the bug report rather incomplete.

Created attachment 52826 [details]
BUG capture on my snb with netconsole

"Thread overran stack, or stack corrupted" is the important bit ... everything else kinda stops making sense with that ;-)

See id:20111028114241.GA13603@elgon.mountain
Ok, so we are doing the idle-flushes. Why is that destabilising the system?

Ah, recursion.

remove-pte -> wait -> retire -> move-to-inactive -> unref -> unbind -> remove-pte

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a546a71..6ce1396 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2106,7 +2106,7 @@ i915_wait_request(struct intel_ring_buffer *ring,
         * buffer to have made it to the inactive list, and we would need
         * a separate wait queue to handle that.
         */
-       if (ret == 0)
+       if (ret == 0 && dev_priv->mm.interruptible)
                i915_gem_retire_requests_ring(ring);

        return ret;

(In reply to comment #9)
> Ah, recursion.
>
> remove-pte -> wait -> retire -> move-to-inactive -> unref -> unbind ->
> remove-pte
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a546a71..6ce1396 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2106,7 +2106,7 @@ i915_wait_request(struct intel_ring_buffer *ring,
>          * buffer to have made it to the inactive list, and we would need
>          * a separate wait queue to handle that.
>          */
> -       if (ret == 0)
> +       if (ret == 0 && dev_priv->mm.interruptible)
>                 i915_gem_retire_requests_ring(ring);
>
>         return ret;

Looks good to me. Feel free to r-b me when you submit this patch.

(In reply to comment #7)
> Created attachment 52826 [details]
> BUG capture on my snb with netconsole
> "Thread overran stack, or stack corrupted" is the important bit ... everything
> else kinda stops making sense with that ;-)

BTW, I found that this bad commit:
Kernel: (drm-intel-next)82d165557ef094d4b4dfc05871aee618ec7102b0
is contained in the 3.1 release kernel. When we run i-g-t with the 3.1 release, it causes a crash.

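The recursion chain from comment 9 (remove-pte -> wait -> retire -> move-to-inactive -> unref -> unbind -> remove-pte) can be illustrated with a small userspace model in C. This is only a sketch and not driver code: `struct toy_dev` and every `toy_*` function are hypothetical names, and the model assumes that the workaround path waits with the interruptible flag cleared, which the thread itself does not spell out. It only shows why the extra `interruptible` check in the diff would stop the nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real i915 structures. */
struct toy_dev {
    bool interruptible; /* models dev_priv->mm.interruptible from the diff */
    bool guarded;       /* true = apply the extra check from the diff */
    int pending;        /* objects still waiting to be retired */
    int depth;          /* current nesting depth of toy_remove_pte() */
    int max_depth;      /* deepest nesting seen so far */
};

static void toy_remove_pte(struct toy_dev *dev);

/* retire -> move-to-inactive -> unref -> unbind -> remove-pte */
static void toy_retire_requests(struct toy_dev *dev)
{
    while (dev->pending > 0) {
        dev->pending--;
        toy_remove_pte(dev); /* unbinding re-enters the PTE removal path */
    }
}

/* wait: mirrors "if (ret == 0 && dev_priv->mm.interruptible)" when guarded */
static void toy_wait_request(struct toy_dev *dev)
{
    if (!dev->guarded || dev->interruptible)
        toy_retire_requests(dev);
}

/* remove-pte: ASSUMPTION: the workaround waits with interruptible cleared */
static void toy_remove_pte(struct toy_dev *dev)
{
    bool saved = dev->interruptible;

    dev->depth++;
    if (dev->depth > dev->max_depth)
        dev->max_depth = dev->depth;

    dev->interruptible = false;
    toy_wait_request(dev);
    dev->interruptible = saved;

    dev->depth--;
}

/* Returns the deepest nesting reached for one top-level remove-pte call. */
int toy_max_depth(int pending, bool guarded)
{
    struct toy_dev dev = { true, guarded, pending, 0, 0 };

    assert(pending >= 0);
    toy_remove_pte(&dev);
    return dev.max_depth;
}
```

In this model, `toy_max_depth(5, false)` nests six calls deep (one frame per pending object plus the original call), while `toy_max_depth(5, true)` stays at depth 1 and leaves the pending objects for a later retire. Unbounded growth of the first kind is consistent with the "Thread overran stack" message in the netconsole capture.
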
Can you try this patch:
http://lists.freedesktop.org/archives/intel-gfx/2011-October/012984.html

> --- Comment #11 from yangguang <guang.a.yang@intel.com> 2011-10-31 18:15:04 UTC ---
> (In reply to comment #7)
> > Created attachment 52826 [details]
> > BUG capture on my snb with netconsole
> > "Thread overran stack, or stack corrupted" is the important bit ... everything
> > else kinda stops making sense with that ;-)
>
> BTW, I found that this bad commit:
> Kernel: (drm-intel-next)82d165557ef094d4b4dfc05871aee618ec7102b0
> is contained in the 3.1 release kernel. When we run i-g-t with the 3.1
> release, it causes a crash.

This is not how it works. The commit you've mentioned changes a few things
in the PCH modeset code. It's extremely unlikely that this will break
gem_linear_blits. So it's probably a new bug somewhere else.

So _please_ gather all the required details (machine details, what
kind of crash, dmesg, crash output over netconsole if there's nothing
in the logs, which test exactly fails, ...) and open a new bug report.

Yours, Daniel

(In reply to comment #13)
> > --- Comment #11 from yangguang <guang.a.yang@intel.com> 2011-10-31 18:15:04 UTC ---
> > (In reply to comment #7)
> > > Created attachment 52826 [details]
> > > BUG capture on my snb with netconsole
> > > "Thread overran stack, or stack corrupted" is the important bit ... everything
> > > else kinda stops making sense with that ;-)
> >
> > BTW, I found that this bad commit:
> > Kernel: (drm-intel-next)82d165557ef094d4b4dfc05871aee618ec7102b0
> > is contained in the 3.1 release kernel. When we run i-g-t with the 3.1
> > release, it causes a crash.
>
> This is not how it works. The commit you've mentioned changes a few things
> in the PCH modeset code. It's extremely unlikely that this will break
> gem_linear_blits. So it's probably a new bug somewhere else.
>
> So _please_ gather all the required details (machine details, what
> kind of crash, dmesg, crash output over netconsole if there's nothing
> in the logs, which test exactly fails, ...) and open a new bug report.
>

Hi Daniel, I think Guang emphasized that the issue had appeared on the master branch. Ben's patch is now able to fix the issue.

(In reply to comment #12)
> Can you try this patch:
> http://lists.freedesktop.org/archives/intel-gfx/2011-October/012984.html

Okay, it works well.

Ben has already submitted a patch to fix this, so please close when it lands in Keith's tree.

Has the patch been committed?

(In reply to comment #17)
> Has the patch been committed?

Keith took Daniel's patch which doesn't work for unknown reasons. I believe nobody (except me) has ever tested my patch.
Please refer to this email/thread, and ping Keith if you'd like him to try merging my patch to -next. Otherwise we have nothing.
http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html

(In reply to comment #18)
> (In reply to comment #17)
> > Has the patch been committed?
> Keith took Daniel's patch which doesn't work for unknown reasons. I believe
> nobody (except me) has ever tested my patch.
> Please refer to this email/thread, and ping Keith if you'd like him to try
> merging my patch to -next. Otherwise we have nothing.
> http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html

Hi Ben, yi have said at comment 15, we have already tested your patch, and it works well.
The patch is tested by Guang Yang <guang.a.yang.intel.com>.

(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > Has the patch been committed?
> > Keith took Daniel's patch which doesn't work for unknown reasons. I believe
> > nobody (except me) has ever tested my patch.
> > Please refer to this email/thread, and ping Keith if you'd like him to try
> > merging my patch to -next. Otherwise we have nothing.
> > http://lists.freedesktop.org/archives/dri-devel/2011-December/017520.html
> Hi Ben, yi have said at comment 15, we have already tested your patch, and it
> works well.
> The patch is tested by Guang Yang <guang.a.yang.intel.com>.

I pinged Keith on IRC. It's up to him whether or not he takes it.

A patch referencing this bug report has been merged in Linux v3.2-rc5:

commit eb1711bb94991e93669c5a1b5f84f11be2d51ea1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 6 12:12:33 2011 +0100

    drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a

A patch referencing a commit referencing this bug report has been merged in Linux v3.2-rc6:

commit ed4a51842a9d9e618d4f4c31349b15b974dba5df
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Dec 16 12:58:39 2011 -0800

    Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a"

Ben's patch is now merged to -next.

(In reply to comment #23)
> Ben's patch is now merged to -next.

Is it targeted only at the 3.4 kernel? If so, I'd suggest removing the blocking relationship with the 3.2 and 3.3 trackers.

A patch referencing this bug report has been merged in Linux v3.4-rc1:

commit 8436473a4b10243fd4c3009b97b6646c2ba642f7
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Tue Jan 24 20:36:15 2012 -0800

    drm/i915: drm/i915: Fix recursive calls to unmap

Closing old verified.