| Summary: | [ILK]igt/gem_reset_stats/ban-render causes GPU HANG: ecode 0:0x169955aa and *ERROR* render ring :timed out trying to stop ring | | |
|---|---|---|---|
| Product: | DRI | Reporter: | lu hua <huax.lu> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | major | | |
| Priority: | high | CC: | christophe.prigent, intel-gfx-bugs, jinxianx.guo, yi.sun |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | ILK | i915 features: | GPU hang |
Created attachment 99015
/sys/class/drm/card0/error
Following cases also have this issue:
gem_reset_stats_close-pending-fork-render
gem_reset_stats_close-pending-fork-reverse-render
gem_reset_stats_close-pending-render
gem_reset_stats_reset-count-render
gem_reset_stats_reset-stats-render

Is this a regression? The testcase itself is a few months old, and at least on my testing here it seemed to have worked recently ...

(In reply to comment #3)
> Is this a regression? The testcase itself is a few months old, and at least
> on my testing here it seemed to have worked recently ...

Tested on commit 16b23af8d4f95c09d2bb650e85ecf8ed9e7c18d0; it works well.

Ok, let's shrug this off as a fluke then. Please reopen if it shows up again. For verification, please run the test a few times in a loop to make sure.

Bisect shows:

e9fea5747d2b3dbff47a8790c1cc4d7af80051d6 is the first bad commit
commit e9fea5747d2b3dbff47a8790c1cc4d7af80051d6
Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Date:   Wed Mar 12 16:39:41 2014 +0530

    drm/i915: wait for rings to become idle once disabled

    make sure we wait for rings to become idle once they are disabled.
    In case of timeout print an error message

    Signed-off-by: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
    [danvet: Frob patch as suggested by Chris.]
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

First bad commit for what? The error message?

(In reply to comment #7)
> First bad commit for what? The error message?

It's the first bad commit for printing the error message; the error itself existed earlier. It looks like this error predates e9fea5747d2b3dbff47a8790c1cc4d7af80051d6. I tried to apply this patch on earlier commits to find a good commit, but the patch fails:

    patching file drivers/gpu/drm/i915/i915_reg.h
    Hunk #1 FAILED at 748.
    Hunk #2 FAILED at 824.
    2 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_reg.h.rej
    patching file drivers/gpu/drm/i915/intel_ringbuffer.c
    Hunk #1 FAILED at 444.
    1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_ringbuffer.c.rej
    patching file drivers/gpu/drm/i915/intel_ringbuffer.h
    Hunk #1 succeeded at 35 with fuzz 2 (offset 2 lines).

So I am not sure it is a regression.

Closed after more than one year of inactivity. Feel free to reopen if needed. Thanks
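A `git bisect run` verdict is only as good as the symptom it checks for. According to its commit message, e9fea574 adds the error print on timeout. A criterion keyed on `timed out trying to stop ring` would therefore mark every earlier commit good and blame e9fea574. That is consistent with the reply that the bisected commit only introduced the message, not the failure. The `GPU HANG` lines cannot serve as the symptom either, because the dmesg shows `Simulated gpu hang`, so hangs are expected for this test.

The sketch below is hypothetical. `bisect_verdict` and the captured log file are assumptions, not part of the original bisect. It only shows the exit-code convention `git bisect run` expects: 0 means good, 125 means skip, and any other code from 1 to 127 means bad.

```shell
#!/bin/sh
# Hypothetical verdict helper for `git bisect run`.
# Exit-code convention used by git bisect run:
#   0            -> commit is good
#   125          -> commit cannot be tested, skip it
#   1..127 (!125)-> commit is bad

# Usage: bisect_verdict LOG_FILE PATTERN
# LOG_FILE: kernel log captured after running the test on the commit under test.
# PATTERN:  fixed string identifying the failure; it must be a symptom that
#           can appear on both sides of the bisect range.
bisect_verdict() {
    local log=$1 pattern=$2
    # Missing or empty log: the test could not run (e.g. build or boot failed).
    if [ ! -s "$log" ]; then
        return 125
    fi
    # Failure signature found: mark the commit bad.
    if grep -qF -- "$pattern" "$log"; then
        return 1
    fi
    # Signature absent: mark the commit good.
    return 0
}
```

A bisect step script could end with `bisect_verdict "$LOG" "$PATTERN"; exit $?`, where both variables are placeholders for whatever log capture and symptom the tester chooses.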
Created attachment 99014
dmesg

System Environment:
--------------------------
Platform: Ironlake
Kernel: (drm-intel-nightly)2be456541ea41728002ccca2de5235f48d14326e

Bug detailed description:
-------------------------
It causes a GPU hang on Ironlake with the -queued, -fixes and -nightly kernels. It also happens on earlier kernels.

output:
IGT-Version: 1.6-g351e7d3 (x86_64) (Linux: 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ x86_64)
Subtest ban-render: SUCCESS
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:813:
Last errno: 0, Success
Test requirement: (!((((intel_get_drm_devid(fd)) == 0x0102 || (intel_get_drm_devid(fd)) == 0x0112 || (intel_get_drm_devid(fd)) == 0x0122 || (intel_get_drm_devid(fd)) == 0x0106 || (intel_get_drm_devid(fd)) == 0x0116 || (intel_get_drm_devid(fd)) == 0x0126 || (intel_get_drm_devid(fd)) == 0x010A) || (((intel_get_drm_devid(fd)) == 0x0152 || (intel_get_drm_devid(fd)) == 0x0162 || (intel_get_drm_devid(fd)) == 0x0156 || (intel_get_drm_devid(fd)) == 0x0166 || (intel_get_drm_devid(fd)) == 0x015a || (intel_get_drm_devid(fd)) == 0x016a) || (((intel_get_drm_devid(fd)) == 0x0402 || (intel_get_drm_devid(fd)) == 0x0406 || (intel_get_drm_devid(fd)) == 0x040A || (intel_get_drm_devid(fd)) == 0x040B || (intel_get_drm_devid(fd)) == 0x040E || (intel_get_drm_devid(fd)) == 0x0C02 || (intel_get_drm_devid(fd)) == 0x0C06 || (intel_get_drm_devid(fd)) == 0x0C0A || (intel_get_drm_devid(fd)) == 0x0C0B || (intel_get_drm_devid(fd)) == 0x0C0E || (intel_get_drm_devid(fd)) == 0x0A02 || (intel_get_drm_devid(fd)) == 0x0A06 || (intel_get_drm_devid(fd)) == 0x0A0A || (intel_get_drm_devid(fd)) == 0x0A0B || (intel_get_drm_devid(fd)) == 0x0A0E || (intel_get_drm_devid(fd)) == 0x0D02 || (intel_get_drm_devid(fd)) == 0x0D06 || (intel_get_drm_devid(fd)) == 0x0D0A || (intel_get_drm_devid(fd)) == 0x0D0B || (intel_get_drm_devid(fd)) == 0x0D0E) || ((intel_get_drm_devid(fd)) == 0x0412 || (intel_get_drm_devid(fd)) == 0x0416 || 
(intel_get_drm_devid(fd)) == 0x041A || (intel_get_drm_devid(fd)) == 0x041B || (intel_get_drm_devid(fd)) == 0x041E || (intel_get_drm_devid(fd)) == 0x0C12 || (intel_get_drm_devid(fd)) == 0x0C16 || (intel_get_drm_devid(fd)) == 0x0C1A || (intel_get_drm_devid(fd)) == 0x0C1B || (intel_get_drm_devid(fd)) == 0x0C1E || (intel_get_drm_devid(fd)) == 0x0A12 || (intel_get_drm_devid(fd)) == 0x0A16 || (intel_get_drm_devid(fd)) == 0x0A1A || (intel_get_drm_devid(fd)) == 0x0A1B || (intel_get_drm_devid(fd)) == 0x0A1E || (intel_get_drm_devid(fd)) == 0x0D12 || (intel_get_drm_devid(fd)) == 0x0D16 || (intel_get_drm_devid(fd)) == 0x0D1A || (intel_get_drm_devid(fd)) == 0x0D1B || (intel_get_drm_devid(fd)) == 0x0D1E) || ((intel_get_drm_devid(fd)) == 0x0422 || (intel_get_drm_devid(fd)) == 0x0426 || (intel_get_drm_devid(fd)) == 0x042A || (intel_get_drm_devid(fd)) == 0x042B || (intel_get_drm_devid(fd)) == 0x042E || (intel_get_drm_devid(fd)) == 0x0C22 || (intel_get_drm_devid(fd)) == 0x0C26 || (intel_get_drm_devid(fd)) == 0x0C2A || (intel_get_drm_devid(fd)) == 0x0C2B || (intel_get_drm_devid(fd)) == 0x0C2E || (intel_get_drm_devid(fd)) == 0x0A22 || (intel_get_drm_devid(fd)) == 0x0A26 || (intel_get_drm_devid(fd)) == 0x0A2A || (intel_get_drm_devid(fd)) == 0x0A2B || (intel_get_drm_devid(fd)) == 0x0A2E || (intel_get_drm_devid(fd)) == 0x0D22 || (intel_get_drm_devid(fd)) == 0x0D26 || (intel_get_drm_devid(fd)) == 0x0D2A || (intel_get_drm_devid(fd)) == 0x0D2B || (intel_get_drm_devid(fd)) == 0x0D2E)) || ((intel_get_drm_devid(fd)) == 0x0f30 || (intel_get_drm_devid(fd)) == 0x0f31 || (intel_get_drm_devid(fd)) == 0x0f32 || (intel_get_drm_devid(fd)) == 0x0f33)) || (((((intel_get_drm_devid(fd)) & 0xff00) != 0x1600) ? 0 : ((((intel_get_drm_devid(fd)) & 0x00f0) >> 4) > 3) ? 0 : (((intel_get_drm_devid(fd)) & 0x000f) == 0x2) ? 1 : (((intel_get_drm_devid(fd)) & 0x000f) == 0x6) ? 1 : (((intel_get_drm_devid(fd)) & 0x000f) == 0xb) ? 1 : (((intel_get_drm_devid(fd)) & 0x000f) == 0xa) ? 
1 : (((intel_get_drm_devid(fd)) & 0x000f) == 0xd) ? 1 : (((intel_get_drm_devid(fd)) & 0x000f) == 0xe) ? 1 : 0) || ((intel_get_drm_devid(fd)) == 0x22b0 || (intel_get_drm_devid(fd)) == 0x22b1 || (intel_get_drm_devid(fd)) == 0x22b2 || (intel_get_drm_devid(fd)) == 0x22b3)))))
# echo $?
0
# dmesg -r | egrep "<[1-6]>" |grep drm
<5>[ 0.000000] Linux version 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ (buildtopcommit@x-kernel) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #2606 SMP Wed May 14 11:26:03 CST 2014
<6>[ 0.000000] Command line: BOOT_IMAGE=kernels//nightly_parents/2014_05_14/drm-intel-nightly/2be456541ea41728002ccca2de5235f48d14326e/bzImage_x86_64 root=/dev/sda2 drm.debug=0xe modules_path=kernels//nightly_parents/2014_05_14/drm-intel-nightly/2be456541ea41728002ccca2de5235f48d14326e/modules_x86_64/lib/modules/3.15.0-rc3_drm-intel-nightly_2be456_20140514+
<5>[ 0.000000] Kernel command line: BOOT_IMAGE=kernels//nightly_parents/2014_05_14/drm-intel-nightly/2be456541ea41728002ccca2de5235f48d14326e/bzImage_x86_64 root=/dev/sda2 drm.debug=0xe modules_path=kernels//nightly_parents/2014_05_14/drm-intel-nightly/2be456541ea41728002ccca2de5235f48d14326e/modules_x86_64/lib/modules/3.15.0-rc3_drm-intel-nightly_2be456_20140514+
<6>[ 0.668163] usb usb1: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ ehci_hcd
<6>[ 0.680113] usb usb2: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ ehci_hcd
<6>[ 0.682131] usb usb3: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 0.683681] usb usb4: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 0.685389] usb usb5: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 0.686879] usb usb6: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 0.688329] usb usb7: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 0.689740] usb usb8: Manufacturer: Linux 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ uhci_hcd
<6>[ 1.496414] [drm] Initialized drm 1.1.0 20060810
<6>[ 1.500656] [drm] Memory usable by graphics device = 2048M
<6>[ 1.513734] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
<6>[ 1.513834] [drm] Driver supports precise vblank timestamp query.
<6>[ 1.541715] fbcon: inteldrmfb (fb0) is primary device
<6>[ 1.617461] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
<6>[ 1.617557] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
<6>[ 60.815868] [drm] stuck on render ring
<6>[ 60.818433] [drm] GPU HANG: ecode 0:0x169955aa, in gem_reset_stats [3692], reason: Ring hung, action: reset
<6>[ 60.818550] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[ 60.818610] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[ 60.818675] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[ 60.818729] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[ 60.818778] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<6>[ 60.818985] [drm] Simulated gpu hang, resetting stop_rings
<3>[ 61.820788] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 62.822775] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 62.822813] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 0001f401 head 00000308 tail 00000390 start 00003000
<6>[ 64.815826] [drm] stuck on render ring
<6>[ 64.818306] [drm] GPU HANG: ecode 0:0x169955ab, in gem_reset_stats [3692], reason: Ring hung, action: reset
<6>[ 64.818490] [drm] Simulated gpu hang, resetting stop_rings
<3>[ 65.819737] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 66.821769] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 66.821820] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 0001f401 head 00000308 tail 000005f0 start 00003000
<6>[ 68.815769] [drm] stuck on render ring
<6>[ 68.818296] [drm] GPU HANG: ecode 0:0x169955ab, reason: Ring hung, action: reset
<3>[ 69.819686] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 70.821673] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 70.821710] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 0001f401 head 00000308 tail 00000720 start 00003000
<6>[ 72.807713] [drm] stuck on render ring
<6>[ 72.810300] [drm] GPU HANG: ecode 0:0x169955ab, in gem_reset_stats [3692], reason: Ring hung, action: reset
<3>[ 73.811638] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 74.813624] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 74.813662] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 0001f401 head 00000308 tail 000007b8 start 00003000
<6>[ 76.819664] [drm] no progress on render ring
<6>[ 76.822147] [drm] GPU HANG: ecode -1:0x00000000, reason: Ring hung, action: reset
<3>[ 77.823586] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 78.825573] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
<3>[ 78.825613] [drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 0001f401 head 00000308 tail 000007b8 start 0000300

Reproduce steps:
----------------------------
1. ./gem_reset_stats --run-subtest ban-render
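In the output above, ban-render reports `SUCCESS` and `echo $?` prints 0, yet the kernel log fills with `stop_ring` timeouts. Looping on the exit status alone would therefore not catch this failure.

Below is a minimal sketch of the loop check suggested in the comments. It assumes a run should also count as bad when new `timed out trying to stop ring` lines appear in the log. `run_in_loop` and `count_stop_ring_errors` are hypothetical helper names, not IGT tools.

```shell
#!/bin/sh
# Hypothetical helpers for looping a test and watching the kernel log.

# Print how many "timed out trying to stop ring" lines appear in the
# kernel-log text read from stdin (0 if none).
count_stop_ring_errors() {
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
    grep -cF 'timed out trying to stop ring' || true
}

# Usage: run_in_loop N LOG_CMD CMD [ARGS...]
# Runs CMD N times. LOG_CMD (e.g. dmesg) must print the kernel log; it is
# left unquoted on purpose so it may carry arguments (e.g. "sudo dmesg").
# An iteration counts as bad if CMD exits non-zero or if the number of
# stop_ring timeouts in the log grew during it (an assumption, not an IGT
# rule). CMD's stdout goes to stderr so stdout carries only the final count.
# Caveat: if the kernel ring buffer wraps, the count can shrink and a bad
# iteration may be missed; this sketch does not handle that.
run_in_loop() {
    local n=$1 log_cmd=$2
    shift 2
    local bad=0 i=1 before after status
    while [ "$i" -le "$n" ]; do
        before=$($log_cmd | count_stop_ring_errors)
        if "$@" >&2; then
            status=0
        else
            status=$?
        fi
        after=$($log_cmd | count_stop_ring_errors)
        if [ "$status" -ne 0 ] || [ "$after" -gt "$before" ]; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad"
}
```

For example, `run_in_loop 10 dmesg ./gem_reset_stats --run-subtest ban-render` would print how many of 10 runs were bad. Reading dmesg may require root on some systems.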