System Environment:
--------------------------
Platform: SugarBay
Libdrm: (master)2.4.24-6-g3b04c73650b5e9bbcb602fdb8cea0b16ad82d0c0
Mesa: (master)dedc81e1dced8768334c300d630b4683fd8a1ba2
Xserver: (master)xorg-server-1.10.0-77-ga19771e4337d1c4600550314bbc42a1495a023ff
Xf86_video_intel: (master)2.14.901-13-g5c81886c23b6e92f224d40592b077f4817b408b8
Cairo: (master)f1d313e042af89b2f5f5d09d3eb1703d0517ecd7
Kernel: (drm-intel-next) 47ae63e0c2e5fdb582d471dc906eb29be94c732f

Bug detailed description:
-------------------------
The GPU hangs on the xlib backend when running firefox-talos-gfx.trace on a SugarBay (i5-2500K, 0112 (rev09)). It is a kernel regression, and the same test works fine on Piketon. Please see the attached dmesg. Bisecting shows that d7b9935a347ae954be907ea3d5eb4564ff124c53 is the first bad commit.

Backtrace (sometimes):
0: X (xorg_backtrace+0x28) [0x457718]
1: X (mieqEnqueue+0x1f4) [0x457594]
2: X (xf86PostMotionEventM+0x97) [0x475217]
3: /opt/X11R7/lib/xorg/modules/input/evdev_drv.so (0x7fe92b936000+0x5531) [0x7fe92b93b531]
4: X (0x400000+0x68567) [0x468567]
5: X (0x400000+0x115753) [0x515753]
6: /lib64/libpthread.so.0 (0x37f7400000+0xf3c0) [0x37f740f3c0]
7: /lib64/libc.so.6 (ioctl+0x7) [0x37f70dc7b7]
8: /opt/X11R7/lib/libdrm.so.2 (drmIoctl+0x28) [0x7fe92cbd12a8]
9: /opt/X11R7/lib/libdrm_intel.so.1 (drm_intel_gem_bo_map_gtt+0x7e) [0x7fe92c36c92e]
10: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0x7fe92c571000+0x10b53) [0x7fe92c581b53]
11: /opt/X11R7/lib/xorg/modules/drivers/intel_drv.so (0x7fe92c571000+0x282a2) [0x7fe92c5992a2]
12: X (0x400000+0x155d89) [0x555d89]
13: X (0x400000+0xa912e) [0x4a912e]
14: X (0x400000+0x53975) [0x453975]
15: X (0x400000+0x54541) [0x454541]
16: X (0x400000+0x214fb) [0x4214fb]
17: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x37f701ee7d]
18: X (0x400000+0x21089) [0x421089]

Reproduce steps:
----------------
1. xinit &
2. ./cairo-perf-trace firefox-36-20090609.trace

commit d7b9935a347ae954be907ea3d5eb4564ff124c53
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Jan 20 13:19:55 2011 -0800

    i915: Fix i915 suspend delay

    During system suspend, the "wait for ring buffer to empty" loop would
    always time out after three seconds, because the faster cached ring
    buffer head read would always return zero. Force the slow-and-careful
    PIO read on all but the first iterations of the loop to fix it.

    This also removes the unused (and useless) 'actual_head' variable that
    tried to approximate doing this, but did it incorrectly.

    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Rafael J. Wysocki <rjw@sisk.pl>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Cc: Dave Airlie <airlied@linux.ie>
    Cc: DRI mailing list <dri-devel@lists.freedesktop.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
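For readers unfamiliar with the loop that the bisected commit touches, here is a minimal toy model in Python of the behaviour its message describes: a wait loop that trusts a cached head value on the first pass and forces the slow register read afterwards. This is only a sketch, not the i915 code. The names `ToyRing`, `wait_for_space` and `force_register_read`, and the 8-byte gap, are assumptions made for illustration.

```python
class ToyRing:
    """Toy ring buffer. `hw_head` stands in for the hardware HEAD register,
    `cached_head` for a cheaper copy of it that may be stale (hypothetical)."""

    def __init__(self, size, tail, hw_head, cached_head):
        self.size = size
        self.tail = tail
        self.hw_head = hw_head
        self.cached_head = cached_head

    def space(self, head):
        # Free bytes between tail and head, keeping a small gap (assumed to be
        # 8 bytes here) so that head == tail always means "empty", not "full".
        free = head - (self.tail + 8)
        if free < 0:
            free += self.size
        return free


def wait_for_space(ring, needed, max_iters=5, force_register_read=True):
    """Return the iteration on which enough space was seen, or None on timeout."""
    for i in range(max_iters):
        if i == 0 or not force_register_read:
            head = ring.cached_head   # fast read, may be stale
        else:
            head = ring.hw_head       # slow but authoritative read
        if ring.space(head) >= needed:
            return i
    return None


# The GPU has consumed everything (head == tail), but the cached copy reads 0.
ring = ToyRing(size=4096, tail=4000, hw_head=4000, cached_head=0)
print(wait_for_space(ring, 4088, force_register_read=False))  # None (times out)
print(wait_for_space(ring, 4088, force_register_read=True))   # 1
```

With only the cached value, the loop never observes the ring draining and runs until the timeout, which matches the three-second suspend delay in the commit message. Forcing the register read after the first pass resolves that case in this model.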
Created attachment 44430: The dmesg for this bug
Ok, I've reproduced this on next and it does seem to be a ringbuffer overflow. Much to my surprise.
After fiddling a little bit, it still hangs without the suspicious ringbuffer wrapping.
Also reproduced a very similar GPU hang running the trace on a HuronRiver (rev09).
Created attachment 44622: Flush BLT before the interrupt
I am waiting on confirmation that

commit fa0fd4d6f815d05c6f87f11df2cac8a9003cab74
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Mar 19 22:26:49 2011 +0000

    drm/i915: Restore missing command flush before interrupt on BLT ring

    We always skipped flushing the BLT ring if the request flush did not
    include the RENDER domain. However, this neglects that we try to flush
    the COMMAND domain after every batch and before the breadcrumb interrupt
    (to make sure the batch is indeed completed prior to the interrupt
    firing and so insuring CPU coherency). As a result of the missing flush,
    incoherency did indeed creep in, most notable when using lots of command
    buffers and so potentially rewritting an active command buffer (i.e.
    the GPU was still executing from it even though the following interrupt
    had already fired and the request/buffer retired).

    As all ring->flush routines now have the same preconditions,
    de-duplicate and move those checks up into i915_gem_flush_ring().

    Fixes gem_linear_blit.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

in drm-intel-staging fixes this bug.
We cannot compile drm-intel-staging successfully unless we delete line 702 of drivers/usb/serial/usb_wwan.c. Even when it compiles, the system does not work normally on SugarBay. I'm sorry for that. Could you please look at the compile error?

drivers/usb/serial/usb_wwan.c: In function ‘play_delayed’:
drivers/usb/serial/usb_wwan.c:702: error: ‘struct dev_pm_info’ has no member named ‘usage_count’
make[5]: *** [drivers/usb/serial/usb_wwan.o] Error 1
make[4]: *** [drivers/usb/serial] Error 2
make[3]: *** [drivers/usb] Error 2
make[2]: *** [drivers] Error 2
make[1]: *** [binrpm-pkg] Error 2
make: *** [binrpm-pkg] Error 2
I rebased drm-intel-staging on drm-core-next, so the compilation issue should be gone and the tree should now give you a working system on which to test.
It works fine when tested at commit 87862b8b (drm/i915: Restore missing command flush before interrupt on BLT ring) on SugarBay.
commit d2023bf8be6c39d45a1a08d0bd8efb126701634c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Mar 19 22:26:49 2011 +0000

    drm/i915: Restore missing command flush before interrupt on BLT ring

    We always skipped flushing the BLT ring if the request flush did not
    include the RENDER domain. However, this neglects that we try to flush
    the COMMAND domain after every batch and before the breadcrumb interrupt
    (to make sure the batch is indeed completed prior to the interrupt
    firing and so insuring CPU coherency). As a result of the missing flush,
    incoherency did indeed creep in, most notable when using lots of command
    buffers and so potentially rewritting an active command buffer (i.e.
    the GPU was still executing from it even though the following interrupt
    had already fired and the request/buffer retired).

    As all ring->flush routines now have the same preconditions,
    de-duplicate and move those checks up into i915_gem_flush_ring().

    Fixes gem_linear_blit.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Tested-by: mengmeng.meng@intel.com
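The ordering problem described in the commit message above can be illustrated with a small toy model in Python. It is a sketch under simplifying assumptions, not the driver's logic: the operation names `BATCH`, `FLUSH` and `INTERRUPT` and the helpers `run_ring` and `emit_request` are hypothetical. It assumes a batch only counts as complete once a flush has been executed, while the breadcrumb interrupt fires as soon as it is reached.

```python
def run_ring(commands):
    """Execute a toy command stream and return the batches that were still
    in flight when their breadcrumb interrupt was signalled."""
    in_flight = []       # batches started but not yet known to be complete
    retired_early = []   # batches the CPU would wrongly treat as finished
    for op, arg in commands:
        if op == "BATCH":
            in_flight.append(arg)
        elif op == "FLUSH":
            # Model assumption: a flush waits for all prior work to finish.
            in_flight.clear()
        elif op == "INTERRUPT":
            # The CPU treats everything before the breadcrumb as retired and
            # may reuse (rewrite) those buffers immediately.
            retired_early.extend(in_flight)
            in_flight.clear()
        else:
            raise ValueError(f"unknown op: {op}")
    return retired_early


def emit_request(batch, flush_before_interrupt):
    """Build the command stream for one request (hypothetical helper)."""
    cmds = [("BATCH", batch)]
    if flush_before_interrupt:
        cmds.append(("FLUSH", None))
    cmds.append(("INTERRUPT", None))
    return cmds


print(run_ring(emit_request("bo0", flush_before_interrupt=False)))  # ['bo0']
print(run_ring(emit_request("bo0", flush_before_interrupt=True)))   # []
```

In this model, skipping the flush lets the interrupt retire a buffer the GPU may still be executing from, which is the incoherency the commit message reports. Emitting the flush first leaves nothing in flight when the breadcrumb fires.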
Verified with commit d2023bf8be6c39d45a1a08d0bd8efb126701634c; it works fine.
Closing old verified+fixed.