If you can easily reproduce this error, can you please build a kernel from http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=xv-overlay, which has some revised memory barriers?

Can you help me build an rpm for Fedora?

On second thoughts, I think this should be fixed by the slight robustification in more recent hangcheck. Please try the latest kernel for your distribution (should be 3.6.7 atm) and reopen if it still occurs.

I am using Fedora 18 with the 3.6.7-5.fc18.i686 kernel, and the message still appears in the dmesg output:
[22826.654365] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22826.654369] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

That is not the same bug, so you need to attach a fresh set of debug info (please remember the i915_error_state)...

Please explain how to get the needed debug info. Thanks.

http://intellinuxgraphics.org/how_to_report_bug.html
From which we need the i915_error_state, so
$ sudo mount -tdebugfs debug /sys/kernel/debug
$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state

Created attachment 70518 [details]
i915_error_state
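The capture steps above can be wrapped in a small helper. This is a minimal sketch, not an official tool: the function name `capture_i915_error_state` and its two optional arguments are made up for illustration, and it assumes debugfs is already mounted at /sys/kernel/debug and that the GPU is DRM device 0. The check for the placeholder text "no error state collected" is an assumption about what the file contains when no hang has been recorded.

```shell
# Sketch: save a copy of the i915 error state after a GPU hang.
# Arguments (both optional, both hypothetical conveniences):
#   $1  source file, default /sys/kernel/debug/dri/0/i915_error_state
#   $2  output file, default ./i915_error_state
capture_i915_error_state() {
    src="${1:-/sys/kernel/debug/dri/0/i915_error_state}"
    out="${2:-i915_error_state}"

    if [ ! -r "$src" ]; then
        echo "cannot read $src: mount debugfs and run as root" >&2
        return 1
    fi

    # Read the snapshot once and keep the copy.
    if ! cat "$src" > "$out"; then
        echo "reading $src failed" >&2
        return 1
    fi

    # Assumption: with no recorded hang the file holds this placeholder line.
    if grep -q '^no error state collected' "$out"; then
        echo "no hang recorded yet; nothing useful in $out" >&2
        return 2
    fi

    echo "saved $(wc -c < "$out") bytes to $out"
}
```

Run as root, `capture_i915_error_state` with no arguments writes `i915_error_state` to the current directory, ready to attach to the bug.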
Looks like that corresponds to the bug that

commit 1c8b46fc8c865189f562c9ab163d63863759712f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 14 09:15:14 2012 +0000

drm/i915: Use LRI to update the semaphore registers

The bspec was recently updated to remove the ability to update the semaphore using the MI_SEMAPHORE_BOX command, the ability to wait upon the semaphore value remained. Instead the advice is to update the register using the MI_LOAD_REGISTER_IMM command. In cursory testing, semaphores continue to function - the question is whether this fixes some of the deadlocks where the semaphore registers contained stale values?

hopefully addresses. That patch is only available on drm-intel-next at the moment, which is available either at http://cgit.freedesktop.org/~danvet/drm-intel or as drm-intel-experimental in the ubuntu kernel-ppa.

The problem recurred with the patched kernel.
[118637.439016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[118637.439020] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[mikhail@localhost ~]$ uname -a
Linux localhost.localdomain 3.6.9-4.1.fc18.i686.PAE #1 SMP Wed Dec 5 15:16:33 UTC 2012 i686 i686 i386 GNU/Linux
[mikhail@localhost ~]$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
[sudo] password for mikhail:
[mikhail@localhost ~]$

Created attachment 71192 [details]
i915_error_state (new)
sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state-8
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

What does it mean?

Created attachment 71199 [details]
i915_error_state (new)
Created attachment 71200 [details]
dmesg output (new)
Lalalalala.

*** Bug 58057 has been marked as a duplicate of this bug. ***

*** Bug 58212 has been marked as a duplicate of this bug. ***

We can confirm the synopsis by disabling semaphores (i915.semaphores=0), but can we also test whether this is an rc6 side-effect (i915.i915_enable_rc6=0)?

Also maybe time for 'git revert 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f'

Created attachment 71549 [details]
i915_error_state
Created attachment 71550 [details]
dmesg
Created attachment 71629 [details]
i915_error_state
Created attachment 71630 [details]
dmesg
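To see which of the settings discussed in this bug is actually in effect on a running system, the module parameters can be read back from sysfs. A minimal sketch, assuming the i915 module is loaded and exposes `semaphores` and `i915_enable_rc6` under /sys/module/i915/parameters; the function name `i915_param` and its optional second argument are hypothetical and exist only for illustration and testing.

```shell
# Sketch: print the current value of an i915 module parameter.
#   $1  parameter name, e.g. semaphores or i915_enable_rc6
#   $2  parameters directory (hypothetical override, used for testing);
#       defaults to /sys/module/i915/parameters
i915_param() {
    name="$1"
    base="${2:-/sys/module/i915/parameters}"

    if [ -r "$base/$name" ]; then
        printf '%s=%s\n' "$name" "$(cat "$base/$name")"
    else
        echo "$name: not readable (i915 not loaded, no permission, or parameter absent)" >&2
        return 1
    fi
}

# To make a setting persistent, either add it to the kernel command line
# (i915.semaphores=0) or to a modprobe configuration file, for example
# a file under /etc/modprobe.d/ containing the line:
#   options i915 semaphores=0
```

Running `i915_param semaphores` and `i915_param i915_enable_rc6` (as root if needed) before and after changing the boot options shows whether the change took effect.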
Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 > /sys/module/i915/parameters/semaphores) to prevent this hang.

The only interesting patch I can suggest atm is

commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Sep 26 10:34:01 2012 -0700

drm/i915: Workaround to bump rc6 voltage to 450

BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or buggy BIOSen may not be doing this, so we correct it for them. Ideally customers should update the BIOS as only it would know the optimal values for the platform, so we leave that fact as a DRM_ERROR for the user to see.

in 3.8-rc1 or look for a BIOS update.

*** Bug 58986 has been marked as a duplicate of this bug. ***

Created attachment 72766 [details] [review]
Read back semaphore mboxes after update

Can you please try this patch, enable semaphores and see if the bug persists?

(In reply to comment #24)
> Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> /sys/module/i915/parameters/semaphores) to prevent this hang.

What are the consequences?

> The only interesting patch I can suggest atm is
>
> commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> Author: Ben Widawsky <ben@bwidawsk.net>
> Date: Wed Sep 26 10:34:01 2012 -0700
>
> drm/i915: Workaround to bump rc6 voltage to 450
>
> BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> buggy BIOSen may not be doing this, so we correct it for them. Ideally
> customers should update the BIOS as only it would know the optimal
> values for the platform, so we leave that fact as a DRM_ERROR for the
> user to see.
>
> in 3.8-rc1 or look for a BIOS update.

I have an H61M/U3S3 motherboard and the latest BIOS, ver 2.20 from 8/15/2012:
ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
How can I check whether the problem persists or not?
(In reply to comment #27)
> (In reply to comment #24)
> > Mikhail, for the time being you can set i915.semaphores=0 (or echo 0 >
> > /sys/module/i915/parameters/semaphores) to prevent this hang.
>
> What are the consequences?

Rendering throughput is dropped by 10% with SNA, or as much as 3x with UXA. OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, so increased power consumption.

(In reply to comment #27)
> > The only interesting patch I can suggest atm is
> >
> > commit 31643d54a739382626c27c0f2a12b3bbc22d1a38
> > Author: Ben Widawsky <ben@bwidawsk.net>
> > Date: Wed Sep 26 10:34:01 2012 -0700
> >
> > drm/i915: Workaround to bump rc6 voltage to 450
> >
> > BIOS should be setting the minimum voltage for rc6 to be 450mV. Old or
> > buggy BIOSen may not be doing this, so we correct it for them. Ideally
> > customers should update the BIOS as only it would know the optimal
> > values for the platform, so we leave that fact as a DRM_ERROR for the
> > user to see.
> >
> > in 3.8-rc1 or look for a BIOS update.
>
> I have an H61M/U3S3 motherboard and the latest BIOS, ver 2.20 from 8/15/2012:
> ftp://174.142.97.10/bios/1155/H61MU3S3(2.20)ROM.zip
> How can I check whether the problem persists or not?

The easiest way is to apply the patch and look for DRM_DEBUG_DRIVER messages. This is unlikely to fix the problem, but also can't hurt. We've only assumed new BIOS will fix the problem, but who knows. Especially if it's a 3rd party BIOS.

*** Bug 59786 has been marked as a duplicate of this bug. ***

Created attachment 73560 [details] [review]
write mbox regs twice on snb

Another piece of magic which might help. Please test this patch and the one from Chris ("Read back semaphore mboxes after update") separately and report back whether anything changes.

Created attachment 73577 [details] [review]
write mbox regs twice on snb, v2

Now actually the right patch attached, the old one didn't compile ...
Which patch do I need to apply to fix this issue? I see that the patches from comment 26 and 32 have similar logic...

@@ -596,6 +606,16 @@ gen6_add_request(struct intel_ring_buffer *ring)
         intel_ring_emit(ring, MI_USER_INTERRUPT);
         intel_ring_advance(ring);
 
+        if (IS_GEN6(ring->dev)) {
+                ret = intel_ring_begin(ring, 6);
+                if (ret)
+                        return ret;
+
+                read_mboxes(ring, mbox1_reg, 1024);
+                read_mboxes(ring, mbox2_reg, 1028);
+                intel_ring_advance(ring);
+        }
+
         return 0;
 }

@@ -598,6 +598,19 @@ gen6_add_request(struct intel_ring_buffer *ring)
         intel_ring_emit(ring, MI_USER_INTERRUPT);
         intel_ring_advance(ring);
 
+        if (IS_GEN6(ring->dev)) {
+                ret = intel_ring_begin(ring, 6);
+                if (ret)
+                        return ret;
+
+                mbox1_reg = ring->signal_mbox[0];
+                mbox2_reg = ring->signal_mbox[1];
+
+                update_mboxes(ring, mbox1_reg);
+                update_mboxes(ring, mbox2_reg);
+                intel_ring_advance(ring);
+        }
+
         return 0;
 }

> --- Comment #33 from mikhail.v.gavrilov@gmail.com ---
> Which patch do I need to apply to fix this issue?
We can't reproduce the bug, so those are just patches to test
different ideas. Please test each of them individually (i.e. remove
the first before testing the 2nd patch) and then report whether
anything changes (i.e. harder or easier for you to hit the issue).
Can't compile kernel with patch above:
drivers/gpu/drm/i915/intel_ringbuffer.c: In function 'gen6_add_request':
drivers/gpu/drm/i915/intel_ringbuffer.c:611:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
drivers/gpu/drm/i915/intel_ringbuffer.c:612:3: error: too few arguments to function 'update_mboxes'
drivers/gpu/drm/i915/intel_ringbuffer.c:557:1: note: declared here
make[4]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[3]: *** [drivers/gpu/drm/i915] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

Created attachment 74087 [details]
kernel.spec
Created attachment 74561 [details]
i915_error_state
Created attachment 74566 [details]
i915_error_state (kernel 3.8 Ubuntu)
Created attachment 74779 [details]
i915_error_state (kernel 3.7 Fedora)
Created attachment 74781 [details]
i915_error_state (kernel 3.7 Fedora)
Created attachment 74850 [details]
i915_error_state (kernel 3.7 Fedora)
I'm seeing this bug, or something like it, on an older chip (G965, desktop version): Feb 19 22:05:56 muttonhead kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung Feb 19 22:05:56 muttonhead kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state Feb 19 22:05:56 muttonhead kernel: [drm:kick_ring] *ERROR* Kicking stuck wait on render ring Feb 19 22:05:57 muttonhead kernel: [drm:i915_reset] *ERROR* Failed to reset chip. after which the mouse pointer sticks in one spot (with most other things working), and then when I shut down X, the console fails to appear, requiring a reboot. Not knowing that the given file path was under /sys/kernel, I failed to capture the error state, but will do so next time this happens (which is maybe every other day). This is with a 3.7 kernel (Gentoo); before 3.7, the driver was stable. I don't know what the 'generation' numbers in the driver mean, but I'm guessing that generation 6 is later, so many of the suggested fixes would not make any difference on this machine. (In reply to comment #42) > I'm seeing this bug, or something like it, on an older chip (G965, desktop > version): Good news, it is not this bug. Please make sure you have the latest stable driver (a gentoo user not using 3.8 already! ;-) and latest xf86-video-intel, then file a fresh bug report, attaching your dmesg, Xorg.0.log and i915_error_state. I subscribed to this bug because I was seeing this hang too. It happened randomly several times, without a specific cause or way to reproduce it. This was around December, and it happened maybe 4-5 times along a month. The GPU would hang with that error in dmesg, and everything continued to work, though very slowly. However, I must say that since then it didn't happen again for almost 2 months maybe. 
I use Arch Linux, which means I always update to the latest stable packages of everything, so it seems that for me it got solved at some point (or at least much harder to reproduce). This is an Ironlake / HD 2000 based Dell laptop. I did update the BIOS when I found this bug report, but it didn't solve the problem, the hang happened after updating it. *** Bug 61310 has been marked as a duplicate of this bug. *** Created attachment 75818 [details]
i915_error_state (kernel 3.8.1 Fedora)
Today Fedora 18 updated the kernel to 3.8.1 and the message "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung" is still here. Please look at my last log. Any updates?

This looks weird to me:
0x00005a58: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a5c: 0x00012044: dword 1
0x00005a60: 0x0043b625: dword 2
0x00005a64: 0x11000001: MI_LOAD_REGISTER_IMM
0x00005a68: 0x00022040: dword 1
0x00005a6c: 0x0043b625: dword 2
0x00005a70: 0x10800001: MI_STORE_DATA_INDEX
0x00005a74: 0x00000080: index
0x00005a78: 0x0043b625: dword
0x00005a7c: 0x01000000: MI_USER_INTERRUPT
0x00005a80: 0x0b160001: MI_SEMAPHORE_MBOX compare semaphore, use compare reg 2
0x00005a84: 0x0043b625: value
0x00005a88: 0x00000000: address
0x00005a8c: 0x00000000: MI_NOOP

Chris?

Weird? Did you just forget that the hw does a strictly greater-than comparison?

(In reply to comment #47)
> Today Fedora 18 updated the kernel to 3.8.1 and the message
> "[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung"
> is still here. Please look at my last log. Any updates?

We're still waiting for you to apply the patches and report.

*** Bug 61925 has been marked as a duplicate of this bug. ***

Created attachment 76196 [details]
i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on snb, v2)
I applied the patch "write mbox regs twice on snb, v2" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Created attachment 76208 [details]
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)
I also applied the patch "Read back semaphore mboxes after update" but still have the problem: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
(In reply to comment #52)
> Created attachment 76196 [details]
> i915_error_state (kernel 3.8.1 Fedora) with path (write mbox regs twice on
> snb, v2)
>
> I applied the patch "write mbox regs twice on snb, v2" but still have the problem:
> [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

0x00052cc8: 0x18800100: MI_BATCH_BUFFER_START
0x00052ccc: 0x0d59b000: dword 1
0x00052cd0: 0x13000001: MI_FLUSH_DW post_sync_op='no write'
0x00052cd4: 0x000000c4: address
0x00052cd8: 0x00000000: dword
0x00052cdc: 0x00000000: MI_NOOP
0x00052ce0: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052ce4: 0x00002044: dword 1
0x00052ce8: 0x0007a582: dword 2
0x00052cec: 0x11000001: MI_LOAD_REGISTER_IMM
0x00052cf0: 0x00012040: dword 1
0x00052cf4: 0x0007a582: dword 2
0x00052cf8: 0x10800001: MI_STORE_DATA_INDEX
0x00052cfc: 0x00000080: index
0x00052d00: 0x0007a582: dword
0x00052d04: 0x01000000: MI_USER_INTERRUPT

That's only a single LRI per semaphore, the patch wasn't tested.

I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your patched version.

Created attachment 76215 [details]
kernel.spec

(In reply to comment #55)
> I would say '3.8.1-203.fc18.i686.PAE' was the distro kernel and not your
> patched version.

That's impossible. The distro kernel is 3.8.1-201.fc18.i686.PAE; 3.8.1-202.fc18.i686.PAE and 3.8.1-203.fc18.i686.PAE are kernels patched by me. You can check by looking at my build spec file.

Created attachment 76239 [details]
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)
I am sorry. It seems I forgot to add "ApplyPatch" to the spec. I rebuilt the kernel with the "0001-drm-i915-Read-back-semaphore-mboxes-after-updating-t.patch" patch, but the problem seems to be still here.
Does it make sense to check the "0001-write-mbox-regs-twice-on-gen6.patch" patch?
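One way to check from an error state whether the "write mbox regs twice" patch is really running is to count how often the same register and value are written by MI_LOAD_REGISTER_IMM in the decoded ring dump. A rough sketch, assuming the decoded output has one dword per line in the format quoted in this thread (address, raw dword, description) and that each LRI carries a single register/value pair; `count_lri_writes` is a hypothetical helper name, not part of any Intel tool.

```shell
# Sketch: count MI_LOAD_REGISTER_IMM writes per (register, value) pair.
# Reads a decoded ring dump from the given files, or from stdin.
# Output: one line per pair, "<register> <value> <count>", sorted.
count_lri_writes() {
    awk '
        # The command header: the next two lines hold register and value.
        /MI_LOAD_REGISTER_IMM/ { state = 1; next }
        # First dword after the header: the register offset.
        state == 1 { reg = $2; sub(/:$/, "", reg); state = 2; next }
        # Second dword after the header: the value written.
        state == 2 { val = $2; sub(/:$/, "", val); writes[reg " " val]++; state = 0; next }
        END { for (k in writes) print k, writes[k] }
    ' "$@" | sort
}
```

For the excerpt quoted in this thread, each register/seqno pair shows a count of 1, which matches the "only a single LRI per semaphore" observation. Going by the patch excerpt, a kernel with the patch applied would be expected to show 2.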
Created attachment 76243 [details]
i915_error_state (kernel 3.8.1 Fedora) with path (Read back semaphore mboxes after update)
Created attachment 76261 [details]
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)
The "write mbox regs twice on snb, v2" patch also does not solve the problem.
[ 1399.270341] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1399.270345] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1399.277331] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
Created attachment 76293 [details]
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)
Created attachment 76448 [details]
i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on snb, v2)
Any updates?
*** Bug 62443 has been marked as a duplicate of this bug. *** As a workaround, this commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Mar 14 17:52:05 2013 +0200 drm/i915: Resurrect ring kicking for semaphores, selectively should improve the recovery from the hangs. OK, I've been experiencing this bug from time to time on my Arch Linux box. No apparent reason, last time it happened I was watching a Youtube video, and it also seems to happen more often when I'm running VirtualBox. However, this might just be a coincidence. I have this bug too. Gentoo 64bit 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller]) Subsystem: Samsung Electronics Co Ltd Device c0a0 Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at f5c00000 (64-bit, non-prefetchable) [size=4M] Memory at e0000000 (64-bit, prefetchable) [size=256M] I/O ports at e000 [size=64] Expansion ROM at <unassigned> [disabled] Capabilities: <access denied> Kernel driver in use: i915 Kernel 3.8.0 gentoo-sources I try patch a24a11e6b4e96bca817f854e0ffcce75d3eddd13, but nothing change. Mar 31 15:14:37 localhost kernel: [64379.291736] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung Mar 31 15:14:37 localhost kernel: [64379.291742] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state Created attachment 77475 [details] [review] [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively (In reply to comment #61) > Created attachment 76448 [details] > i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on > snb, v2) > > Any updates? Mikhail, Could you please try patch: [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively Patch is also included in latest drm-intel-nightly, linux-next. So you can test it by grabbing a distro-build of one of those. 
(In reply to comment #67) > (In reply to comment #61) > > Created attachment 76448 [details] > > i915_error_state (kernel 3.8.2 Fedora) with path (write mbox regs twice on > > snb, v2) > > > > Any updates? > > Mikhail, > > Could you please try patch: > [PATCH] drm/i915: Resurrect ring kicking for semaphores, selectively Hm, seems better but problem still here [59120.008798] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung [59120.008802] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state [59120.012173] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring Created attachment 77692 [details]
i915_error_state (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)
Created attachment 77693 [details]
dmesg (kernel 3.8.5 Fedora) with path (drm/i915: Resurrect ring kicking for semaphores, selectively)
\o/ It kicked the right ring. (In reply to comment #72) > \o/ It kicked the right ring. So is this normal? It's the expected 'improved' recovery behaviour for this bug. *** Bug 63542 has been marked as a duplicate of this bug. *** Chris, what is the upstream status for the ring kicker patch? Is that likely to get incorporated upstream, or do you feel it needs further polish before it's ready? Would this patch incur some risk of regressions in other areas were it be backported for inclusion in Ubuntu? (In reply to comment #76) > Chris, what is the upstream status for the ring kicker patch? Is that > likely to get incorporated upstream, or do you feel it needs further polish > before it's ready? Would this patch incur some risk of regressions in other > areas were it be backported for inclusion in Ubuntu? Merged for 3.10 as commit a24a11e6b4e96bca817f854e0ffcce75d3eddd13 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Mar 14 17:52:05 2013 +0200 drm/i915: Resurrect ring kicking for semaphores, selectively Nothing else planned for now, but I think we can just keep this bug here open in case we stumble across a new idea. And it seems to be good honey to attrack all the me,too reports ;-) (In reply to comment #65) > Kernel 3.8.0 gentoo-sources Did you report this at the Gentoo Bugzilla? When you do, please attach /debug/dri/0/i915_error_state >Did you report this at the Gentoo Bugzilla? >When you do, please attach /debug/dri/0/i915_error_state Now no report in gentoo bugzilla (so as in kernel they no have patches intel drivers). But now with it patch, I can't repeat bug 2 weeks on kernel 3.9-rc6. But I no test with blender (when I try use blender, GPU hung reapeted for 1-5 minutes). *** Bug 64094 has been marked as a duplicate of this bug. *** Created attachment 78692 [details]
i915_error_state (kernel 3.9 Fedora)
Created attachment 78693 [details]
i915_error_state (kernel 3.9 Fedora)
*** Bug 64094 has been marked as a duplicate of this bug. *** Created attachment 79704 [details]
i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430
I can reproduce this bug every time I try to quickly drag a Chrome window with a YouTube movie to a secondary monitor connected to my laptop Dell E6430. It is very annoying. Tested on latest kernel 3.10-rc2.
I can give you any additional information you want, test patches, etc. Just please try to fix this :)
(In reply to comment #84)
> Created attachment 79704 [details]
> i915_error_state - kernel 3.10-rc2, dual monitor, Dell E6430
>
> I can reproduce this bug every time I try to quickly drag a Chrome window
> with a YouTube movie to a secondary monitor connected to my laptop Dell
> E6430.

One more piece of information: you need to enable "Override software rendering list" in chrome://flags

Created attachment 79979 [details]
i915_error_state - 3.9.2-201.rhbz879823.fc18.x86_64 (included patch write mbox regs twice on snb, v2)
Linux bobloblaw 3.9.2-201.rhbz879823.fc18.x86_64 #1 SMP Thu May 16 13:35:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[45482.757631] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[45482.757645] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[45482.766942] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring
[45482.770617] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
I added patch (drm/i915: Resurrect ring kicking for semaphores, selectively) to Fedora 18's 3.9.2-200 x86_64 kernel.
Is there any input or assistance I can give to help move this along? Thanks!

Created attachment 82747 [details] [review]
New read-after-write patch

New patch for testing, thanks!

Created attachment 82748 [details] [review]
New read-after-write patch

Which kernel version is this patch for? I tried the patch on linux-3.11_rc1, but when X starts I see:
791966 Jul 21 16:17:07 localhost kernel: [ 19.320879] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
791967 Jul 21 16:17:07 localhost kernel: [ 19.320948] IP: [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791968 Jul 21 16:17:07 localhost kernel: [ 19.320995] PGD b0d80067 PUD b0c18067 PMD 0
791969 Jul 21 16:17:07 localhost kernel: [ 19.321031] Oops: 0000 [#1] PREEMPT SMP
791970 Jul 21 16:17:07 localhost kernel: [ 19.321064] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec brcmsmac snd_hwdep snd_pcm cordic brcmutil bcma snd_page_alloc snd_timer snd soundcore
791971 Jul 21 16:17:07 localhost kernel: [ 19.321209] CPU: 0 PID: 2696 Comm: X Not tainted 3.11.0-rc1 #1
791972 Jul 21 16:17:07 localhost kernel: [ 19.321249] Hardware name: SAMSUNG ELECTRONICS CO., LTD. SF311/SF411/SF511/SF311/SF411/SF511, BIOS 06HW.M011.20110503.SCY 05/03/2011
791973 Jul 21 16:17:07 localhost kernel: [ 19.321322] task: ffff8800b1c07590 ti: ffff8800b0c24000 task.ti: ffff8800b0c24000
791974 Jul 21 16:17:07 localhost kernel: [ 19.321370] RIP: 0010:[<ffffffff8136bfc0>] [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
791975 Jul 21 16:17:07 localhost kernel: [ 19.321426] RSP: 0018:ffff8800b0c25bc8 EFLAGS: 00010286
791976 Jul 21 16:17:07 localhost kernel: [ 19.321461] RAX: 0000000000000000 RBX: ffff8800b1c3d4d8 RCX: 0000000000027330
791977 Jul 21 16:17:07 localhost kernel: [ 19.321506] RDX: 0000000000000080 RSI: ffffc900045c003c RDI: ffffc900045c0038
791978 Jul 21 16:17:07 localhost kernel: [ 19.321550] RBP: ffff8800b0c25c08 R08: ffff8800b0d97f00 R09: 00000000000145c0
791979 Jul 21 16:17:07 localhost kernel: [ 19.321594] R10: 0000000000001000 R11: ffff8800b1c3c000 R12: 0000000000000000
791980 Jul 21 16:17:07 localhost kernel: [ 19.321638] R13: 0000000000002044 R14: 0000000000000000 R15: ffff8800b1c3c000
791981 Jul 21 16:17:07 localhost kernel: [ 19.321682] FS: 00007ff167ae8880(0000) GS:ffff880100200000(0000) knlGS:0000000000000000
791982 Jul 21 16:17:07 localhost kernel: [ 19.321732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
791983 Jul 21 16:17:07 localhost kernel: [ 19.321767] CR2: 0000000000000010 CR3: 00000000b1cc9000 CR4: 00000000000407f0
791984 Jul 21 16:17:07 localhost kernel: [ 19.321810] Stack:
791985 Jul 21 16:17:07 localhost kernel: [ 19.321824] ffff8800b1c3d4d8 0000000000000000 ffff8800aff24000 0000000000000000
791986 Jul 21 16:17:07 localhost kernel: [ 19.321876] ffff8800b1c3c000 ffff8800b0d97f00 ffff8800b1f66a00 ffff8800b1c3d4d8
791987 Jul 21 16:17:07 localhost kernel: [ 19.321927] ffff8800b0c25c68 ffffffff81334b11 ffff880000000028 0000000000000000
791988 Jul 21 16:17:07 localhost kernel: [ 19.321979] Call Trace:
791989 Jul 21 16:17:07 localhost kernel: [ 19.322000] [<ffffffff81334b11>] __i915_add_request+0x6d/0x215
791990 Jul 21 16:17:07 localhost kernel: [ 19.322045] [<ffffffff8133b8d9>] i915_gem_do_execbuffer.isra.14+0xd07/0xdc5
791991 Jul 21 16:17:07 localhost kernel: [ 19.322089] [<ffffffff8133bd5e>] ? i915_gem_execbuffer2+0x5d/0x1e3
791992 Jul 21 16:17:07 localhost kernel: [ 19.322128] [<ffffffff8133be5a>] i915_gem_execbuffer2+0x159/0x1e3
791993 Jul 21 16:17:07 localhost kernel: [ 19.322170] [<ffffffff8130e167>] drm_ioctl+0x302/0x446
791994 Jul 21 16:17:07 localhost kernel: [ 19.322204] [<ffffffff8133bd01>] ? i915_gem_execbuffer+0x36a/0x36a
791995 Jul 21 16:17:07 localhost kernel: [ 19.322245] [<ffffffff8102a823>] ? __do_page_fault+0x34f/0x3f3
791996 Jul 21 16:17:07 localhost kernel: [ 19.322285] [<ffffffff810d3621>] vfs_ioctl+0x21/0x34
791997 Jul 21 16:17:07 localhost kernel: [ 19.322317] [<ffffffff810d3e7a>] do_vfs_ioctl+0x3b8/0x3fb
791998 Jul 21 16:17:07 localhost kernel: [ 19.322353] [<ffffffff810dbab9>] ? fget_light+0xa1/0xb8
791999 Jul 21 16:17:07 localhost kernel: [ 19.322387] [<ffffffff810d3efd>] SyS_ioctl+0x40/0x6b
792000 Jul 21 16:17:07 localhost kernel: [ 19.322420] [<ffffffff816450d2>] system_call_fastpath+0x16/0x1b
792001 Jul 21 16:17:07 localhost kernel: [ 19.322457] Code: e8 d4 c0 f0 ff 8b 73 2c 44 89 ef 83 c6 04 89 73 2c 48 03 73 10 e8 bf c0 f0 ff 8b 73 2c 48 8b 45 c8 83 c6 04 89 73 2c 48 03 73 10 <8b> 78 10 83 ef 80 e8 a3 c0 f0 ff 83 43 2c 04 49 ff c4 49 83 fc
792002 Jul 21 16:17:07 localhost kernel: [ 19.322688] RIP [<ffffffff8136bfc0>] gen6_add_request+0xe7/0x178
792003 Jul 21 16:17:07 localhost kernel: [ 19.322728] RSP <ffff8800b0c25bc8>
792004 Jul 21 16:17:07 localhost kernel: [ 19.322750] CR2: 0000000000000010
792005 Jul 21 16:17:07 localhost kernel: [ 19.330669] ---[ end trace b13215eb98a2df5f ]---

Created attachment 82768 [details] [review]
New read-after-write patch

Oops, my mistake, please try again.
Created attachment 82773 [details] i915_error_state with new patch (In reply to comment #92) > Created attachment 82768 [details] [review] [review] > New read-after-write patch > > Oops, my mistake, please try again. Now loading, but after five minutes test: 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 ctx 1) at 0xbfe21dc (In reply to comment #93) > Created attachment 82773 [details] > i915_error_state with new patch > > (In reply to comment #92) > > Created attachment 82768 [details] [review] [review] [review] > > New read-after-write patch > > > > Oops, my mistake, please try again. > > Now loading, but after five minutes test: > 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel > 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting > 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] > [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring > 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing > error event; look for more information in > /sys/kernel/debug/dri/0/i915_error_state > 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 > ctx 1) at 0xbfe21dc That is a blorp (mesa/i965) bug and not the semaphore deadlock. Will someone please try https://bugs.freedesktop.org/attachment.cgi?id=82768 with a working mesa! :) The patch seems to have helped -- my box survived a couple days with the patch applied. 
The bad news is that I've just had the semaphore hang with all the read-after-write patch applied. :| (In reply to comment #94) > (In reply to comment #93) > > Created attachment 82773 [details] > > i915_error_state with new patch > > > > (In reply to comment #92) > > > Created attachment 82768 [details] [review] [review] [review] [review] > > > New read-after-write patch > > > > > > Oops, my mistake, please try again. > > > > Now loading, but after five minutes test: > > 793485 Jul 21 17:32:56 localhost kernel: [ 321.432882] hda-intel > > 0000:00:1b.0: Unstable LPIB (32740 >= 4096); disabling LPIB delay counting > > 793486 Jul 21 17:34:49 localhost kernel: [ 434.291085] > > [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring > > 793487 Jul 21 17:34:49 localhost kernel: [ 434.291088] [drm] capturing > > error event; look for more information in > > /sys/kernel/debug/dri/0/i915_error_state > > 793488 Jul 21 17:34:49 localhost kernel: [ 434.307124] > > [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xbfe2000 > > ctx 1) at 0xbfe21dc > > That is a blorp (mesa/i965) bug and not the semaphore deadlock. Could you please provide some link to this blorp bug report? I had problem with semaphore deadlock, seems that with kernel 3.11 problem does not occur (without patch), but now I have: [22221.843000] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [22221.843483] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4dfb5000 ctx 1) at 0x4dfb5518 *** Bug 68913 has been marked as a duplicate of this bug. *** I have, I think, a reliable way to trigger this behavior, if that helps. It requires a non-trivial setup, though. I have gnome-shell running on dual monitors. The first is 1920x1200, the second is 1920x1080 (not sure if the resolution difference matters). If I run a full-screen game on The 1920x1200 monitor, I get freezes, and notes in the dmesg about hangcheck timers and kickrings ("stuck wait on blitter ring"). 
I believe OpenGL acceleration of the desktop is important, because the freezes are not triggered in fluxbox, for instance. I'm not sure if the game itself needs to be using OpenGL, or if the full-screen window is the triggering factor, or something else entirely. It is important that the game keep the monitors distinct, and only go full screen on one. I just tried it on Battle for Wesnoth, and full screen there sets the monitors to mirror, which doesn't trigger the problem. This is on an i7 4770, if that matters. I realize this is may be difficult to put together for a test setup, but I thought I'd mention it. (In reply to comment #100) > I have, I think, a reliable way to trigger this behavior, if that helps. It > requires a non-trivial setup, though. > > I have gnome-shell running on dual monitors. The first is 1920x1200, the > second is 1920x1080 (not sure if the resolution difference matters). If I > run a full-screen game on The 1920x1200 monitor, I get freezes, and notes in > the dmesg about hangcheck timers and kickrings ("stuck wait on blitter > ring"). > > I believe OpenGL acceleration of the desktop is important, because the > freezes are not triggered in fluxbox, for instance. I'm not sure if the game > itself needs to be using OpenGL, or if the full-screen window is the > triggering factor, or something else entirely. It is important that the game > keep the monitors distinct, and only go full screen on one. I just tried it > on Battle for Wesnoth, and full screen there sets the monitors to mirror, > which doesn't trigger the problem. > > This is on an i7 4770, if that matters. > > I realize this is may be difficult to put together for a test setup, but I > thought I'd mention it. I also have dual monitors and also gnome-shell, but I have on both 1920x1080px. 
I notice that it happens more often when I am watching videos full screen on one monitor (it still happens when nothing is full screen).

(In reply to comment #100)
> This is on an i7 4770, if that matters.

No, that's something completely new. Please open a new bug report and attach your dmesg, Xorg.0.log and /sys/class/drm/card0/error from after one of the hangs.

Created attachment 87101 [details]
i915_error_state (kernel 3.11.3)
After playing hedgewars for about half an hour, the gpu started to hang. dmesg output: [ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [ 3442.907471] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state [ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x5e52000 ctx 1) at 0x5e52220 [ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring [ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear. I'm not sure my problem is related to this bug. (In reply to comment #104) > After playing hedgewars for about half an hour, the gpu started to hang. > dmesg output: > [ 3442.907459] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring > [ 3442.907471] [drm] capturing error event; look for more information in > /sys/kernel/debug/dri/0/i915_error_state > [ 3442.916792] [drm:i915_set_reset_status] *ERROR* render ring hung inside > bo (0x5e52000 ctx 1) at 0x5e52220 > [ 3466.911077] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring > [ 3466.911087] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring > [ 3466.947069] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for > forcewake old ack to clear. > I'm not sure my problem is related to this bug. My laptop is Thinkpad T420 with i5-2520M. The BIOS version is 1.44. (In reply to comment #104) > I'm not sure my problem is related to this bug. Most likely it isn't - gpu hang is similar to an application crashing. Please file a new bug report and don't forget to attach the error state file. That's the first thing we need to triage the bug. And of course list the versions of all the userspace driver parts (mesa, ddx, ...) since like a normal application crash most often it's not a kernel bug, but a bug in the render commands submitted by userspace to the gpu. 
(In reply to comment #106) > (In reply to comment #104) > > I'm not sure my problem is related to this bug. > > Most likely it isn't - gpu hang is similar to an application crashing. > Please file a new bug report and don't forget to attach the error state > file. That's the first thing we need to triage the bug. > > And of course list the versions of all the userspace driver parts (mesa, > ddx, ...) since like a normal application crash most often it's not a kernel > bug, but a bug in the render commands submitted by userspace to the gpu. Why userspace drivers can breaking render and calling error in kernel part of driver? May be can add "filter" sent commands and ignore (or other reaction, but not execute their) their? (In reply to comment #107) > (In reply to comment #106) > > (In reply to comment #104) > > > I'm not sure my problem is related to this bug. > > > > Most likely it isn't - gpu hang is similar to an application crashing. > > Please file a new bug report and don't forget to attach the error state > > file. That's the first thing we need to triage the bug. > > > > And of course list the versions of all the userspace driver parts (mesa, > > ddx, ...) since like a normal application crash most often it's not a kernel > > bug, but a bug in the render commands submitted by userspace to the gpu. > > Why userspace drivers can breaking render and calling error in kernel part > of driver? May be can add "filter" sent commands and ignore (or other > reaction, but not execute their) their? The GPU is a full Turing complete computational engine (in fact, lots of them coupled in parallel and in series), see http://xkcd.com/1266/ (In reply to comment #106) > (In reply to comment #104) > > I'm not sure my problem is related to this bug. > > Most likely it isn't - gpu hang is similar to an application crashing. > Please file a new bug report and don't forget to attach the error state > file. That's the first thing we need to triage the bug. 
> > And of course list the versions of all the userspace driver parts (mesa, > ddx, ...) since like a normal application crash most often it's not a kernel > bug, but a bug in the render commands submitted by userspace to the gpu. Someone has reported it here. https://bugs.freedesktop.org/show_bug.cgi?id=70151 Hello. Same problem here. [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8 [ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8 [ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring [ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220 sys: fedora 19 64b Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux WM: KDE with effects enabled 8G ram 300G SATA HDD ntb Lenovo ThinkPad E320 problem occurs in: - scrolling in firefox - playing video in vlc and switch to KDE terminal or another app - sometimes system hangs, cpu 100%, freeze and hard reboot needed - sometimes happens if I work with ff or in terminal only (very frustrating) - happening across many kernel versions 3.0 to newest I think lspci 00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09) 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) 00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04) 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) 00:1b.0 Audio device: 
Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04) 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4) 00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4) 00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4) 00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4) 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) 00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04) 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04) 00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04) 02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak] 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01) 03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01) 08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0) (In reply to comment #110) > Hello. Same problem here. > > [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring > [ 485.443467] [drm] capturing error event; look for more information in > /sys/kernel/debug/dri/0/i915_error_state > [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside > bo (0xa637000 ctx 1) at 0xa6371c8 Unlikey that this is the same gpu hang. Please file a new bug report and attach the error state. Just a few remarks. I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904. Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups highly increased (especially in games). 
Additionally, with the latest drivers the complete system lockups are gone, but there is still a lockup lasting multiple seconds, followed by VT switching. Maybe these observations help somehow.

(In reply to comment #112)
> Just a few remarks.
> I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
> Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups
> highly increased (especially in games).

On snb the blorp engine in mesa has become a bit more hang-happy, see bug #70151. Not all gpu hangs are created equal ;-)

> Additionally with running the latest drivers complete system lockups are
> gone, but it's still a lockup for multiple seconds with following VT
> switching.

You mean a gpu hang happens while doing a vt switch?

(In reply to comment #113)
> On snb the blorp engine in mesa has become a bit more hang-happy, see bug
> #70151
> Not all gpu hangs are created equal ;-)

Actually it was on Sandybridge.

> You mean a gpu hang happens while when doing a vt switch?

No, I meant that if you suffer a lockup you just have to wait a few seconds and switch to another VT and back; then you can resume using your system (although sometimes fonts are broken).

Created attachment 87857 [details]
i915_error_state
I also hit this bug while watching video in mplayer. It happens every 1-2 hours.
[40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[40787.765852] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1fb63000 ctx 1) at 0x1fb63220
Created attachment 87858 [details]
X -version output
(In reply to comment #115)
> Created attachment 87857 [details]
> i915_error_state
>
> I also met this bug while I was watching video in mplayer. It every 1-2
> hours.
>
> [40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [40787.765852] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x1fb63000 ctx 1) at 0x1fb63220

This looks like bug #70151, but is definitely not this bug here.

Created attachment 89314 [details]
i915_error_state (kernel 3.11.6, mesa 9.2.2, xf86-video-intel 2.99.906)
GPU hangs after playing hedgewars for a few minutes. Thinkpad T420 laptop, i5-2520M.
dmesg error message:
[16901.286432] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16901.286441] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[16901.286444] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[16908.287504] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16908.287508] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
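Since several different hangs are being reported here with similar-looking dmesg output, a rough way to summarise which messages a log contains is sketched below. This is only an illustration: the regular expressions are derived from the excerpts quoted in this report, while the category names (`stuck_ring`, `kicked_ring`, `hung_in_bo`) and the function `summarize_hangs` are made up for the example. It does not tell you whether two hangs are the same bug; the error state file is still needed for that.

```python
import re
from collections import Counter

# Patterns taken from the dmesg excerpts quoted in this report.
# The category names on the left are hypothetical labels for this sketch.
PATTERNS = {
    "stuck_ring": re.compile(r"stuck on (\w+) ring"),
    "kicked_ring": re.compile(r"Kicking stuck (?:semaphore|wait) on (\w+) ring"),
    "hung_in_bo": re.compile(r"(\w+) ring hung inside bo"),
}


def summarize_hangs(dmesg_text):
    """Count hang-related messages per (category, ring) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for category, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(category, match.group(1))] += 1
                break  # one category per line is enough
    return counts
```

Feeding it the output of `dmesg` gives a quick overview to paste into a new report next to the error state attachment.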
*** Bug 71890 has been marked as a duplicate of this bug. ***
*** Bug 72048 has been marked as a duplicate of this bug. ***
*** Bug 72829 has been marked as a duplicate of this bug. ***
*** Bug 73659 has been marked as a duplicate of this bug. ***

Created attachment 92710 [details]
i915_error_state
I'm also getting regular Sandybridge GPU lockups with Mesa 10.0.1 and Linux kernel 3.13.
dmesg output:
[ 918.876872] [drm] stuck on render ring
[ 918.876876] [drm] stuck on blitter ring
[ 918.876878] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 918.876879] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 918.876879] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 918.876880] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 918.876880] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 932.923240] [drm] stuck on render ring
[ 932.923242] [drm] stuck on blitter ring
Unfortunately the crash dump doesn't help - it's an empty file!
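An empty attachment is easy to catch before uploading. The sketch below copies the error state and reports how many bytes were actually read; it assumes the path printed in the dmesg output above, and reading it may require root. The function name `save_error_state` and the default output file name are made up for this example. Why the file came out empty in this case is not clear from the report; the sketch only detects the situation.

```python
import os
import shutil


def save_error_state(src="/sys/class/drm/card0/error", dest="i915_error_state"):
    """Copy the GPU error state to dest and return the number of bytes saved.

    The size is measured on the copy, not on the sysfs file itself, because
    the size reported for files under /sys is not a reliable indication of
    how much data a read will return.
    """
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return os.path.getsize(dest)


if __name__ == "__main__":
    size = save_error_state()
    if size == 0:
        print("error state is empty, nothing useful to attach")
    else:
        print(f"saved {size} bytes to i915_error_state")
```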
*** Bug 74180 has been marked as a duplicate of this bug. ***
*** Bug 74265 has been marked as a duplicate of this bug. ***
*** Bug 74452 has been marked as a duplicate of this bug. ***
*** Bug 74473 has been marked as a duplicate of this bug. ***
*** Bug 74867 has been marked as a duplicate of this bug. ***
*** Bug 75163 has been marked as a duplicate of this bug. ***

Created attachment 95090 [details]
Another version of the same hang - directed here from bug 75502

*** Bug 75999 has been marked as a duplicate of this bug. ***
*** Bug 76408 has been marked as a duplicate of this bug. ***
*** Bug 76677 has been marked as a duplicate of this bug. ***
*** Bug 76801 has been marked as a duplicate of this bug. ***

For what it's worth, running 3.13.7 greatly mitigates this bug, to the point where the dead time is barely noticeable. It happened three times in short order here and I didn't notice any of them:

[ 4562.551141] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4582.530028] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4633.476199] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring

*** Bug 77043 has been marked as a duplicate of this bug. ***
*** Bug 77058 has been marked as a duplicate of this bug. ***

My stuck ring faults are completely gone with i915.i915_enable_rc6=0. The fan staying on a bit more (subjectively) seems to be the only side effect. HP Pavilion dv6 (Sandybridge).

Oh that's interesting. We might be able to find a register to prevent rc6 whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just frob forcewake directly.)

(In reply to comment #139)
> Oh that's interesting. We might be able to find a register to prevent rc6
> whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just
> frob forcewake directly.)

Happy to test patches. I'm updating to 3.13.9 tonight. I could add something on top if you have ideas. If you need more info than my attachment to #76801, just let me know.
*** Bug 77147 has been marked as a duplicate of this bug. ***
*** Bug 77974 has been marked as a duplicate of this bug. ***
*** Bug 78317 has been marked as a duplicate of this bug. ***

Created attachment 98589 [details]
Kernel 3.14.2-1-ARCH, xf86-video-intel 2.99.911-2, mesa 10.1.2-1
*** Bug 78785 has been marked as a duplicate of this bug. *** *** Bug 79500 has been marked as a duplicate of this bug. *** *** Bug 79640 has been marked as a duplicate of this bug. *** commit ca79d888eb63cdacf80653ae23ce8f7d9ac52c68 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 6 10:22:29 2014 +0100 drm/i915: Reorder semaphore deadlock check *** Bug 80055 has been marked as a duplicate of this bug. *** *** Bug 80125 has been marked as a duplicate of this bug. *** *** Bug 80168 has been marked as a duplicate of this bug. *** *** Bug 80401 has been marked as a duplicate of this bug. *** *** Bug 80592 has been marked as a duplicate of this bug. *** *** Bug 80935 has been marked as a duplicate of this bug. *** *** Bug 81064 has been marked as a duplicate of this bug. *** Can someone indicate what the current status of this is? I haven't seen it with xorg-x11-drv-intel-2.99.912-4 (built for fc20) from kojipkgs. I'm using 2.21.15 which as far as I know is the latest release. I am seeing [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle followed by a graphics freeze and the need to reboot (if I can) in Fedora 20 with the latest updates including the 3.15.4 kernel. *** Bug 81402 has been marked as a duplicate of this bug. *** same happens with 3.15.0 on Ubuntu 14.04 64 bit Jul 11 12:43:41 localhost kernel: [42049.462542] [drm] stuck on render ring Jul 11 12:43:41 localhost kernel: [42049.463330] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset Jul 11 12:43:41 localhost kernel: [42049.463334] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. Jul 11 12:43:41 localhost kernel: [42049.463335] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel Jul 11 12:43:41 localhost kernel: [42049.463336] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. 
Jul 11 12:43:41 localhost kernel: [42049.463337] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. Jul 11 12:43:41 localhost kernel: [42049.463338] [drm] GPU crash dump saved to /sys/class/drm/card0/error Jul 11 12:43:43 localhost kernel: [42051.464623] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Jul 11 12:43:47 localhost kernel: [42055.468816] [drm] stuck on render ring Jul 11 12:43:47 localhost kernel: [42055.469614] [drm] GPU HANG: ecode 0:0x00ffffff, in chrome [2172], reason: Ring hung, action: reset Jul 11 12:43:49 localhost kernel: [42057.470899] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Jul 11 12:43:53 localhost kernel: [42061.439056] [drm] stuck on render ring Jul 11 12:43:53 localhost kernel: [42061.439867] [drm] GPU HANG: ecode 0:0xfeffffff, in chrome [2172], reason: Ring hung, action: reset [872948.822279] [drm] stuck on render ring [872948.822291] [drm] stuck on blitter ring [872948.823041] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [30647], reason: Ring hung, action: reset [872948.823045] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. [872948.823046] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel [872948.823047] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. [872948.823048] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. [872948.823049] [drm] GPU crash dump saved to /sys/class/drm/card0/error [872948.823168] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning! [872950.821912] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Linux bobloblaw 3.15.0-1.fc20.x86_64 #1 SMP Sat Jun 14 11:22:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux Attaching gpu crash dump as card0-error.071714-cwawak Created attachment 102991 [details]
card0-error.071714-cwawak - gpu dump
*** Bug 81673 has been marked as a duplicate of this bug. ***
*** Bug 81676 has been marked as a duplicate of this bug. ***
*** Bug 81710 has been marked as a duplicate of this bug. ***
*** Bug 81844 has been marked as a duplicate of this bug. ***
*** Bug 81990 has been marked as a duplicate of this bug. ***
*** Bug 82277 has been marked as a duplicate of this bug. ***
*** Bug 82301 has been marked as a duplicate of this bug. ***
*** Bug 82399 has been marked as a duplicate of this bug. ***
*** Bug 82451 has been marked as a duplicate of this bug. ***
*** Bug 82620 has been marked as a duplicate of this bug. ***
*** Bug 82631 has been marked as a duplicate of this bug. ***
*** Bug 82666 has been marked as a duplicate of this bug. ***
*** Bug 82691 has been marked as a duplicate of this bug. ***
*** Bug 82901 has been marked as a duplicate of this bug. ***
*** Bug 83098 has been marked as a duplicate of this bug. ***
*** Bug 83156 has been marked as a duplicate of this bug. ***
*** Bug 83326 has been marked as a duplicate of this bug. ***
*** Bug 83473 has been marked as a duplicate of this bug. ***
*** Bug 83661 has been marked as a duplicate of this bug. ***

Is there any ongoing development to fix this bug? I still see it with

Linux <hostname> 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

and the latest Intel drivers as provided by the Intel Linux graphics installer from https://01.org/linuxgraphics/

My system often freezes a few minutes after I start watching a movie with vlc. My screen is connected to the Linux system through a receiver (HDMI for audio + video). A freeze is more likely when the HDMI receiver was powered off for some time before playing the movie than when I reboot and HDMI is always on.

I'm happy to help with crash dumps as far as I'm able to collect them.

(In reply to comment #183)

I recommend configuring i915.semaphores=0. I did it and it doesn't freeze anymore.
*** Bug 83721 has been marked as a duplicate of this bug. ***
*** Bug 83783 has been marked as a duplicate of this bug. ***

Hi Chris,

meanwhile my current kernel is 3.16.1-46.1.g90bc0f1

I'm surprised that, after a reinstall, the semaphore bug hasn't occurred yet, as it did before (after a fresh install).

This leads me to four possible reasons:

1. the named kernel revision somehow contains a fix for it. Looking at the changes, I couldn't confirm that assumption.
2. cgroup_memory=disabled has a relation to it. (That's why I removed it for now.)
3. the BIOS settings (which could be different now) might have something to do with it.
4. I haven't installed KVM support yet.

I'll post again if I find a reproducible explanation.

Frank

Regarding 2: of course I meant cgroup_disable=memory

Hi Chris,

OK, none of the above was the reason. In my case it's simply this:

/etc/X11/xorg.conf.d/20-intel.conf

    Section "Device"
        Identifier "Intel Graphics"
        Driver "intel"
        Option "TearFree" "true"
    EndSection

I added it when the tearing while scrolling through large webpages annoyed me. As soon as I added it, the problems quickly started.

Selfmade problem.

Frank

(In reply to comment #189)
> Hi Chris,
>
> OK, nothing of the above was the reason. In my case it's simply this:
>
> /etc/X11/xorg.conf.d/20-intel.conf
>
> Section "Device"
> Identifier "Intel Graphics"
> Driver "intel"
> Option "TearFree" "true"
> EndSection
>
>
> I added it when the tearing scrolling through large webpages annoyed me.
> As soon as I added it, the problems quickly started.
>
> Selfmade problem.

Not really, https://bugs.freedesktop.org/show_bug.cgi?id=70764 tracks that this hang is more likely with TearFree (fundamentally the hang is still the same hardware issue, but it is interesting that TearFree has a higher chance of hitting it).
If you want to experiment: http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=requests should have an interesting fix, at least for trying to prevent the TearFree leading to the semaphore hang. What information is most useful for these repeating issues, as it just happened again: Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on render ring Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on blitter ring Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140239] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140750] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning! Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on render ring Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on blitter ring Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset Sep 16 08:32:59 arrowsmithlap1 kernel: [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning! Sep 16 08:33:01 arrowsmithlap1 kernel: [1182244.142445] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off Sep 16 08:33:01 arrowsmithlap1 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off The only thing under my /etc/X11/xorg.conf.d/ is 00-keyboard.conf (system generated). Do you want a copy of /sys/class/drm/card0/error every time? (In reply to comment #191) > What information is most useful for these repeating issues, as it just > happened again: > > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on > render ring > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on > blitter ring So long as it is the same event, there is no more information we need other than testing feedback for an eventual workaround. (In reply to comment #184) > (In reply to comment #183) > > I recommend configuring i915.semaphores=0. I did it and it doesn't freeze > anymore. 
Meanwhile I tested both i915.semaphores=0 and i915.semaphores=1 neither of which did help in my case. But with i915.semaphores=0 my system became much more unstable and even crashed on its own after some days without stress on graphics (just ran some desktop apps like thunar or vlc for music only - no movies). With i915.semaphores=1 the system is at least stable (for some weeks) as long as I don't heavily use desktop applications. *** Bug 85194 has been marked as a duplicate of this bug. *** *** Bug 85333 has been marked as a duplicate of this bug. *** *** Bug 85609 has been marked as a duplicate of this bug. *** I am also experiencing this, on a Gentoo system running on a ThinkPad T440s. I'm not doing anything related to XBMC, simply using xrandr for multihead. The interesting thing is that DRI works fine on my laptop screen (glxgears reports 60fps, which is the refresh rate of my screen), but breaks when I move a window trying to use DRI (e.g. Chrome, glxgears) to the external monitor connected to the mini Display Port output. I see this stuff in dmesg: [ 3561.424762] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring [ 3561.424770] [drm] GPU crash dump saved to /sys/class/drm/card0/error [ 3561.424772] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace. [ 3561.424774] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel [ 3561.424776] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue. [ 3561.424778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it. [ 3566.422957] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring [ 3571.425143] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring [ 3575.423680] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring Seems like the same issue. I'm trying to downgrade X, mesa, et al., to try and get the system back in working order. 
*** Bug 79675 has been marked as a duplicate of this bug. ***
*** Bug 85972 has been marked as a duplicate of this bug. ***
*** Bug 86058 has been marked as a duplicate of this bug. ***

For those running Ubuntu, here is a build of a kernel based on 3.17.1 with the patches Chris Wilson wants you to test:

- Those patches have other regressions (so be careful to only test your specific issue).

https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb
https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb

Those kernels are based on: https://bugs.freedesktop.org/show_bug.cgi?id=83677#c35

Beware, don't switch VTs.

I've tried the mentioned kernel on my Fedora 21 Beta and it still hangs, for example after Netbeans opens its main window across the whole screen.

*** Bug 86437 has been marked as a duplicate of this bug. ***
*** Bug 86765 has been marked as a duplicate of this bug. ***
*** Bug 86836 has been marked as a duplicate of this bug. ***
*** Bug 86925 has been marked as a duplicate of this bug. ***
*** Bug 87710 has been marked as a duplicate of this bug. ***
*** Bug 87776 has been marked as a duplicate of this bug. ***
*** Bug 88541 has been marked as a duplicate of this bug. ***
*** Bug 88626 has been marked as a duplicate of this bug. ***
*** Bug 88723 has been marked as a duplicate of this bug. ***
*** Bug 88789 has been marked as a duplicate of this bug. ***
*** Bug 89078 has been marked as a duplicate of this bug. ***
*** Bug 89299 has been marked as a duplicate of this bug. ***
*** Bug 89570 has been marked as a duplicate of this bug. ***
*** Bug 89671 has been marked as a duplicate of this bug. ***
*** Bug 89774 has been marked as a duplicate of this bug. ***
*** Bug 89771 has been marked as a duplicate of this bug. ***
*** Bug 89981 has been marked as a duplicate of this bug. ***
*** Bug 90106 has been marked as a duplicate of this bug.
*** *** Bug 90146 has been marked as a duplicate of this bug. *** *** Bug 90271 has been marked as a duplicate of this bug. *** *** Bug 90473 has been marked as a duplicate of this bug. *** *** Bug 90835 has been marked as a duplicate of this bug. *** Chris, you referred me to this bug as I reported Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring I skimmed through it and it appears that there are some patches to test? But I am not sure which ones these are. Can you or someone else enlighten me? Also I note that I still use Option "AccelMethod" "uxa" and I have martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf options i915 modeset=1 i915_enable_rc6=7 thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway. I will remove the AccelMethod setting now and see whether it helps. If not, I downgrade to 4.1-rc4 for now, as issues have been at least much less frequent with it. And its really that for me 4.1-rc6 makes things much *worse*. I am typing this after a clean reboot and already got the GPU hang again. It happens about every few minutes. Are you really sure this is the same GPU hang? I didn´t have this before 4.1 kernel? (In reply to Martin Steigerwald from comment #225) > Chris, you referred me to this bug as I reported > > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck > semaphore on render ring > > I skimmed through it and it appears that there are some patches to test? But > I am not sure which ones these are. Can you or someone else enlighten me? There's likely a modest improvement in 4.2. > Also I note that I still use > > Option "AccelMethod" "uxa" > > and I have > > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf > options i915 modeset=1 i915_enable_rc6=7 Fortuitously that dangerous option doesn't do anything for your kernel. > ffffffff813a4b0e > thus maximum energy saving. 
But according to powertop it never enters the > highest sleep state anyway. > > I will remove the AccelMethod setting now and see whether it helps. If not, > I downgrade to 4.1-rc4 for now, as issues have been at least much less > frequent with it. Purely circumstantial. > And its really that for me 4.1-rc6 makes things much *worse*. I am typing > this after a clean reboot and already got the GPU hang again. It happens > about every few minutes. Are you really sure this is the same GPU hang? I > didn´t have this before 4.1 kernel? Yes. (In reply to Chris Wilson from comment #226) > (In reply to Martin Steigerwald from comment #225) > > Chris, you referred me to this bug as I reported > > > > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck > > semaphore on render ring > > > > I skimmed through it and it appears that there are some patches to test? But > > I am not sure which ones these are. Can you or someone else enlighten me? > > There's likely a modest improvement in 4.2. Nice. > > Also I note that I still use > > > > Option "AccelMethod" "uxa" > > > > and I have > > > > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf > > options i915 modeset=1 i915_enable_rc6=7 > > Fortuitously that dangerous option doesn't do anything for your kernel. Well I found out why, I compiled i915 into the kernel it seems, at least I don´t have an i915 module in lsmod. But also i915.i915_enable_rc6=7 on kernel command line does not seem to have any effect. I removed the option. > > ffffffff813a4b0e > > thus maximum energy saving. But according to powertop it never enters the > > highest sleep state anyway. > > > > I will remove the AccelMethod setting now and see whether it helps. If not, > > I downgrade to 4.1-rc4 for now, as issues have been at least much less > > frequent with it. > > Purely circumstantial. Since using SNA I didn´t see a GPU hang so far. Too early to say for sure, but it seems something in UXA may have triggered it more easily. 
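Whether an i915 option is actually in effect can usually be checked by reading it back from sysfs rather than guessing from lsmod. The sketch below assumes the conventional `/sys/module/<module>/parameters/<name>` layout; the helper name `read_module_param` is made up for this example, and the parameter names passed to it are the ones mentioned in this thread, which may not exist under that spelling on every kernel version.

```python
from pathlib import Path


def read_module_param(module, name, root="/sys/module"):
    """Return the current value of a module parameter as a string.

    Returns None if the parameter file does not exist or cannot be read,
    which is a hint that the running kernel does not expose the option
    under that name.
    """
    path = Path(root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    for param in ("i915_enable_rc6", "semaphores"):
        print(param, "=", read_module_param("i915", param))
```

A `None` result for an option that was set on the kernel command line or in modprobe.d would be consistent with the option having no effect, as observed above.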
*** Bug 91212 has been marked as a duplicate of this bug. ***
*** Bug 91662 has been marked as a duplicate of this bug. ***
*** Bug 91810 has been marked as a duplicate of this bug. ***
*** Bug 91832 has been marked as a duplicate of this bug. ***
(In reply to Chris Wilson from comment #192)
> (In reply to comment #191)
> > What information is most useful for these repeating issues, as it just
> > happened again:
> >
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> > render ring
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> > blitter ring
>
> So long as it is the same event, there is no more information we need other
> than testing feedback for an eventual workaround.
Is this the same bug?
$ journalctl -p 3 -b -1
Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
... [ repeated messages ] ...
Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:33 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
Ruj 25 02:13:34 crnigrom kernel: [drm:init_ring_common [i915]] *ERROR* render ring initialization failed ctl 00000000 (valid? 0) head 00000000 tail 00000000 start 00000000 [expected 00000000]
Ruj 25 02:13:34 crnigrom kernel: [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5
Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop
After which Gnome crashes with the "Oh No Something Is Wrong" screen.
$ uname -r
4.1.7-200.fc22.x86_64
Hardware: i3-2100 CPU/GPU
This bug has been going on for a long time already, but at least the computer is no longer hard freezing. Gnome still crashes, though, so any GTK application that is running and doing something stalls.
*** Bug 92118 has been marked as a duplicate of this bug. ***
*** Bug 92739 has been marked as a duplicate of this bug. ***
FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191) was resolved by uninstalling various components, re-installing and updating them. I have a hunch (completely unproven) that it was a transparent bit-fail issue from the SSD. By un-installing and re-installing, the files were likely installed to a different location on the drive. It wasn't configuration, as I tried erasing, and even rolling back to defaults, with the problem still persisting. As it was almost daily prior to the uninstall, and hasn't happened since the install, this is all I can attribute it to.
HTH someone.
Created attachment 119432 [details]
attachment-28908-0.html
I reported this bug from a system without an SSD. Recently, however, I have not seen the kernel messages appear; I am currently on Linux 4.2.5.
(In reply to Jeffrey E. Bedard from comment #236)
> Created attachment 119432 [details]
> attachment-28908-0.html
>
> I reported this bug from a system without an SSD. Recently, I have not
> seen the kernel messages appear however--currently on linux 4.2.5.
Ah, let me clarify that earlier comment: I dd'd a failing spinning drive to an SSD. There was lots of clicking. Upgraded packages as they came in, but no change. Only the uninstall and re-install cleared the repeat button. :)
Created attachment 119433 [details]
attachment-32271-0.html
I think this bug can be marked as closed with the latest linux/mesa/xorg versions :)
*** Bug 92927 has been marked as a duplicate of this bug. ***
*** Bug 93057 has been marked as a duplicate of this bug. ***
Created attachment 120189 [details]
error state with 4.2 kernel
*** Bug 93331 has been marked as a duplicate of this bug. ***
*** Bug 93482 has been marked as a duplicate of this bug. ***
*** Bug 93493 has been marked as a duplicate of this bug. ***
*** Bug 89524 has been marked as a duplicate of this bug. ***
*** Bug 93595 has been marked as a duplicate of this bug. ***
*** Bug 93876 has been marked as a duplicate of this bug. ***
*** Bug 93824 has been marked as a duplicate of this bug. ***
*** Bug 94057 has been marked as a duplicate of this bug. ***
Tuesday, March 1, 2016, 9:43:23 PM, you wrote:
> Chris Wilson changed bug 54226
> What: CC, Added: russ.pridemore@gmail.com
>
> Comment # 249 on bug 54226 from Chris Wilson
> *** Bug 94057 has been marked as a duplicate of this bug. ***
>
> You are receiving this mail because:
> You are on the CC list for the bug.
Sorry to say, but is there a way to get off the CC list of this slightly depressing kind of "catch-all" bug? Unfortunately it doesn't seem to have been going anywhere for the last 3 to 4 years, except for an endless stream of duplicates being appended.
-- Sander
(In reply to Sander Eikelenboom from comment #250)
> Is there a way to get off the CC-list of this slightly depressing kind of
> "catch-all" bug ?
The CC list is at the top right corner. Choose the address, tick "Remove selected CCs", and hit Save Changes. I've done this for you now.
*** Bug 95238 has been marked as a duplicate of this bug. ***
Chris, I seem to be experiencing this bug in Linux 4.7-rc3 on an X220 ThinkPad with the Intel HD 3000 chipset. I was getting random full system freezes, with the machine unresponsive over the network. The main messages before the crash were:
Jun 23 19:11:18 athena kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Jun 23 19:11:18 athena kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.7 [i915]] *ERROR* GT thread status wait timed out.
I haven't been able to reproduce the original crash easily, but I CAN reproduce a full system lockup every time by running the following intel-gpu-tools tests (I have not come close to running all the tests, though) [**This may or may not be related to the original crash**]:
gem_sync, subtest: bsd2-hang
drv_hangman, subtest: error-state-capture-bit
I do not know if these tests are helpful or related (maybe some are known to fail? not sure). I had drm debugging turned on when I ran those tests (drm.debug=0x1e log_buf_len=1M). I can post logs of the hangs associated with the two tests/subtests and run any other tests if you desire (with kernel drm debug on). I will wait for the issue to reappear with drm debug on before posting that log, though. Given the number of similar bugs, you may already have the CALL TRACE and non-debug level logs.
The bug affects me maybe once every 1 or 2 days. I use Xorg with Glamor. I have been seeing these crashes since 4.6 (maybe 4.5 or earlier, not sure). I know how to apply patches and am able to compile drm-next or any patches you have to see if this issue can be isolated.
Thanks, sorry for the long response.
*** Bug 97304 has been marked as a duplicate of this bug. ***
*** Bug 97451 has been marked as a duplicate of this bug. ***
*** Bug 98294 has been marked as a duplicate of this bug. ***
*** Bug 98807 has been marked as a duplicate of this bug. ***
*** Bug 100245 has been marked as a duplicate of this bug.
***
Adding tag into "Whiteboard" field - ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
I no longer seem to be getting the mentioned Gnome crashes on my Sandy Bridge machine with mainline kernels, currently 4.11, and I think even with 4.10 I was not getting any issues. With mainline longterm 4.4.61 and the default CentOS 7 kernels I am definitely getting very frequent GPU crashes that bring down Gnome.
So it is either fixed for good, or it has become much rarer. The issue I am/was experiencing happens when Gnome is running; it does not happen when only GDM is loaded. System load seems to have no effect on triggering the bug; it seems to happen at any time, whether the machine is idle or loaded.
(In reply to samuel.rakitnican from comment #260)
> I no longer seem to be getting the mentioned Gnome crashes on my Sandy Bridge
> machine with mainline kernels, currently 4.11, and I think even with 4.10 I
> was not getting any issues. With mainline longterm 4.4.61 and the default
> CentOS 7 kernels I am definitely getting very frequent GPU crashes that
> bring down Gnome.
>
> So it is either fixed for good, or it has become much rarer. The issue I
> am/was experiencing happens when Gnome is running; it does not happen when
> only GDM is loaded. System load seems to have no effect on triggering the
> bug; it seems to happen at any time, whether the machine is idle or loaded.
Hopefully it is fixed for good. I'm closing this bug. If the problem arises with the latest kernel versions (https://www.kernel.org/), please open a NEW bug with HW and SW information, steps to reproduce and relevant logs. Thank you.
(In reply to Elizabeth from comment #261)
> (In reply to samuel.rakitnican from comment #260)
> > I no longer seem to be getting the mentioned Gnome crashes on my Sandy Bridge
> > machine with mainline kernels, currently 4.11, and I think even with 4.10 I
> > was not getting any issues. With mainline longterm 4.4.61 and the default
> > CentOS 7 kernels I am definitely getting very frequent GPU crashes that
> > bring down Gnome.
> >
> > So it is either fixed for good, or it has become much rarer. The issue I
> > am/was experiencing happens when Gnome is running; it does not happen when
> > only GDM is loaded. System load seems to have no effect on triggering the
> > bug; it seems to happen at any time, whether the machine is idle or loaded.
>
> Hopefully it is fixed for good. I'm closing this bug. If the problem arises
> with the latest kernel versions (https://www.kernel.org/), please open a NEW
> bug with HW and SW information, steps to reproduce and relevant logs. Thank
> you.
There was no fix for this HW issue.
Created attachment 135173 [details]
gpu error file on 4.13.5-200.fc26.x86_64
This problem reappeared on 4.13.5-200.fc26.x86_64 last Friday.
[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset
[774249.632110] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[774249.632111] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[774249.632111] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[774249.632111] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[774249.632112] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[774249.632172] drm/i915: Resetting chip after gpu hang
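For anyone triaging many reports like the one above, the "GPU HANG" line can be split into its fields (ecode, process, reason, action) before comparing it against other duplicates. The following is only a minimal sketch: the regular expression is inferred from the single log line quoted above, and the names `HANG_RE` and `parse_gpu_hang` are hypothetical helpers, not part of any i915 tooling. Other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the single "GPU HANG" dmesg line quoted above.
# Assumption: other kernel versions may word this message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\] \[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,]+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' dmesg line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # the PID is the only numeric field converted
    return fields


# The line reported above on 4.13.5-200.fc26.x86_64.
sample = (
    "[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], "
    "reason: Hang on rcs0, action: reset"
)
info = parse_gpu_hang(sample)
print(info["ecode"], info["process"], info["pid"], info["reason"], info["action"])
```

The parsed fields only summarize the log line. As the kernel message itself says, the crash dump from /sys/class/drm/card0/error still has to be attached to the report.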
commit 0da715ee60774401bea00dc71fca6fd1096c734a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Nov 20 20:55:02 2017 +0000
drm/i915: Disable semaphores on Sandybridge
*** Bug 104243 has been marked as a duplicate of this bug. ***
*** Bug 104304 has been marked as a duplicate of this bug. ***
*** Bug 104772 has been marked as a duplicate of this bug. ***
I will close this now.
*** Bug 106119 has been marked as a duplicate of this bug. ***
Created attachment 66289 [details]
dmesg output
From time to time the interface freezes, and these records appear in dmesg:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blitter ring idle
$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. Device 0612 (rev 01)