Created attachment 70091 [details]
i915_err_state
Reassigning to the kernel team, since it's probably an issue in the memory pressure handling. I've also seen it before, and my way to reproduce it within a few minutes was to rsync / (I think to the local system, not from it) while doing basic web browsing not involving Mesa. Can you please attach dmesg, too?

A few more questions:
- Are you using a 3D compositor (gnome-shell, ...)?
- Are older kernels (like 3.6) stable?

Created attachment 70234 [details]
dmesg
Attached is a dmesg from a situation when the system ran out of memory linking some absurdly huge library, which resulted in a gpu hang.
I am using kwin with compositing. With 3.6, I didn't notice the problem. Also with 3.6+drm-next (which I compiled for #55112) I never noticed the problem.
It's actually the boot messages from dmesg I'm interested in, specifically the e820 map and zone layout. Can you please attach a fresh one?

Created attachment 70239 [details]
dmesg, including boot stage
Here you go
Ok, yet another new theory ... please attach your kernel .config, thanks.

Created attachment 70270 [details]
kernel config
Please try out the patch at https://patchwork.kernel.org/patch/1885411/ - it has a decent chance of reducing GTT thrashing, which might be good enough to once again duct-tape over the hangs. Or maybe it will change the pattern so that you can reproduce it much quicker. In any case, it should be interesting ...

Created attachment 71807 [details] [review]
make the shrinker less aggressive

Duct-tape solution if it is one, but imo very much worth a try.

I've now finished building a 3.7.0 kernel with your latest patch and will do some stress tests today or tomorrow - thanks!

Created attachment 72011 [details]
tar.gz containing output of 'dmesg', Xorg.0.log, /var/gdm/:0*.log, empty i915_error_state

I applied the patch suggested above and built a Fedora kernel, kernel-3.7.0-6.local.fc19.x86_64.

I then booted that kernel and drove excessive I/O load on my Thinkpad X200: I ran 'digikam', qemu-kvm of a Win7 image configured to use 2 cores, and a 'cat BIGFILES >/dev/null'.

While the system didn't crash until I got all of the above running, it did hang/crash.

For this crash, I could not recover /sys/kernel/debug/dri/0/i915_error_state: I got a 'page allocation failure' when I attempted to copy it.

I've been BZ'ing this on the Fedora bugzilla for a while here: https://bugzilla.redhat.com/show_bug.cgi?id=877461
That ticket has numerous more such failures/logs, including a few with non-zero i915_error_state files.

I believe the patch was built into this kernel:

+ '[' '!' -f /home/tbl/rpmbuild/SOURCES/make-the-shrinker-less-aggressive.patch ']'
Patch33333: make-the-shrinker-less-aggressive.patch
+ case "$patch" in
+ patch -p1 -F1 -s
+ chmod +x scripts/checkpatch.pl

Here is what I see in dmesg:

[ 1103.968037] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1103.968330] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1105.845804] traps: gnome-shell[1259] trap int3 ip:39c9e4f597 sp:7fff189222d0 error:0
[ 1110.016026] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1110.070657] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 1111.608050] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1111.609856] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 1111.609859] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 1114.433338] gnome-shell (1259) used greatest stack depth: 1392 bytes left
[ 1250.731117] gnome-shell[2381]: segfault at 230 ip 00007fcabc3fd89f sp 00007fff0d9909d0 error 4 in i965_dri.so[7fcabc3ab000+b3000]
[ 1305.904089] gnome-shell[2446]: segfault at 230 ip 00007ffe0688089f sp 00007fff2883df50 error 4 in i965_dri.so[7ffe0682e000+b3000]
[ 1332.132006] [sched_delayed] sched: RT throttling activated
[ 1375.751786] gnome-shell[2500]: segfault at 230 ip 00007fa9663bb89f sp 00007fff5ad6a660 error 4 in i965_dri.so[7fa966369000+b3000]
[ 1715.826609] cat: page allocation failure: order:9, mode:0x40d0
[ 1715.828604] Pid: 2789, comm: cat Not tainted 3.7.0-6.local.fc19.x86_64 #1
[ 1715.830463] Call Trace:
[ 1715.832239] [<ffffffff81167469>] warn_alloc_failed+0xe9/0x150
[ 1715.834110] [<ffffffff8116a090>] ? page_alloc_cpu_notify+0x50/0x50
[ 1715.835995] [<ffffffff810d8b6d>] ? trace_hardirqs_on+0xd/0x10
[ 1715.837676] [<ffffffff8116bc25>] __alloc_pages_nodemask+0x8b5/0xb40
[ 1715.839345] [<ffffffff811ad460>] alloc_pages_current+0xb0/0x120
[ 1715.840971] [<ffffffff8116991e>] ? __free_pages_ok.part.54+0x9e/0xe0
[ 1715.842522] [<ffffffff8116632a>] __get_free_pages+0x2a/0x80
[ 1715.844143] [<ffffffff811b9c89>] kmalloc_order_trace+0x39/0x190
[ 1715.845784] [<ffffffff811ba07d>] __kmalloc+0x29d/0x2d0
[ 1715.847337] [<ffffffff811f8fcf>] seq_read+0x11f/0x3e0
[ 1715.848948] [<ffffffff811d320c>] vfs_read+0xac/0x180
[ 1715.850413] [<ffffffff811d3335>] sys_read+0x55/0xa0
[ 1715.851846] [<ffffffff816fbd19>] system_call_fastpath+0x16/0x1b
[ 1715.853273] Mem-Info:
[ 1715.854697] Node 0 DMA per-cpu:
[ 1715.856193] CPU 0: hi: 0, btch: 1 usd: 0
[ 1715.857537] CPU 1: hi: 0, btch: 1 usd: 0
[ 1715.858823] Node 0 DMA32 per-cpu:
[ 1715.860145] CPU 0: hi: 186, btch: 31 usd: 0
[ 1715.861417] CPU 1: hi: 186, btch: 31 usd: 0
[ 1715.862668] Node 0 Normal per-cpu:
[ 1715.864038] CPU 0: hi: 186, btch: 31 usd: 32
[ 1715.865257] CPU 1: hi: 186, btch: 31 usd: 0
[ 1715.866418] active_anon:366511 inactive_anon:174404 isolated_anon:0 active_file:60034 inactive_file:192590 isolated_file:0 unevictable:30 dirty:25 writeback:0 unstable:0 free:39181 slab_reclaimable:21162 slab_unreclaimable:95728 mapped:29886 shmem:23042 pagetables:10701 bounce:0 free_cma:0
[ 1715.872852] Node 0 DMA free:15848kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:40kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.876063] lowmem_reserve[]: 0 2947 3892 3892
[ 1715.877257] Node 0 DMA32 free:111548kB min:50976kB low:63720kB high:76464kB active_anon:1288852kB inactive_anon:502196kB active_file:207752kB inactive_file:687716kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:32kB dirty:36kB writeback:0kB mapped:94252kB shmem:50888kB slab_reclaimable:39640kB slab_unreclaimable:128856kB kernel_stack:840kB pagetables:20200kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.882062] lowmem_reserve[]: 0 0 945 945
[ 1715.883246] Node 0 Normal free:27972kB min:16340kB low:20424kB high:24508kB active_anon:177192kB inactive_anon:195420kB active_file:32384kB inactive_file:83896kB unevictable:88kB isolated(anon):0kB isolated(file):0kB present:967680kB mlocked:88kB dirty:64kB writeback:0kB mapped:25292kB shmem:41280kB slab_reclaimable:45008kB slab_unreclaimable:254040kB kernel_stack:1960kB pagetables:22604kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1715.888714] lowmem_reserve[]: 0 0 0 0
[ 1715.890149] Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 3*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15848kB
[ 1715.891696] Node 0 DMA32: 1573*4kB 1401*8kB 906*16kB 510*32kB 424*64kB 202*128kB 26*256kB 3*512kB 0*1024kB 1*2048kB 0*4096kB = 111548kB
[ 1715.893244] Node 0 Normal: 1393*4kB 669*8kB 203*16kB 144*32kB 58*64kB 25*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 27228kB
[ 1715.894843] 282546 total pagecache pages
[ 1715.896341] 6307 pages in swap cache
[ 1715.897947] Swap cache stats: add 54887, delete 48580, find 8649/10442
[ 1715.899457] Free swap = 6015380kB
[ 1715.900944] Total swap = 6127612kB
[ 1715.919139] 1032176 pages RAM
[ 1715.920639] 52602 pages reserved
[ 1715.922114] 714600 pages shared
[ 1715.923569] 896850 pages non-shared

Let me know if I can provide more or test more....
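One way to avoid losing the error state to a later allocation failure like the one above is to snapshot it as soon as the hang is logged, while memory is still relatively unfragmented. A minimal sketch, assuming the standard debugfs location; the polling loop and destination path are illustrative, not the reporter's actual procedure:

# watch dmesg for the hang notification, then copy the error state right away
while ! dmesg | grep -q "capturing error event"; do
    sleep 5
done
cp /sys/kernel/debug/dri/0/i915_error_state /var/tmp/i915_error_state.$(date +%s)
sync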
(In reply to comment #13)
> Created attachment 72011 [details]
> tar.gz containing output of 'dmesg', Xorg.0.log, /var/gdm/:0*.log, empty
> i915_error_state
>
> I applied the patch suggested above and built a Fedora kernel,
> kernel-3.7.0-6.local.fc19.x86_64.
>
> I then booted that kernel and drove excessive I/O load on my Thinkpad X200:
> I ran 'digikam', qemu-kvm of a Win7 image configured to use 2 cores, and a
> 'cat BIGFILES >/dev/null'.
>
> While the system didn't crash until I got all of the above running, it did
> hang/crash.
>
> For this crash, I could not recover /sys/kernel/debug/dri/0/i915_error_state:
> I got a 'page allocation failure' when I attempted to copy it.

Drat. Without the error state I can't verify that the hang is the same one we are hunting, but from the scenario it should be. So (somewhat as expected) we can take this as evidence that the patch is not a sufficient workaround on its own.

Created attachment 72036 [details]
Another tar.gz, this time with i915_error_state, dmesg, Xorg.0.log, etc.
Looks like I can easily recreate this hang by booting, starting a 'cat About-30G-files >/dev/null', and then starting 'digikam'.
This time, when it hung, I 'ctrl-alt-F2' to a terminal, logged in as root, killed off the offending 'cat' process, did a 'sync', and ran my script.
You should find the i915_error_state file in the tarball.
Let me know if you need another/more ...
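The reproduction recipe above boils down to sustained sequential reads plus a running GPU client. A rough equivalent of that workload and of a collection script is sketched below; the file names and output paths are assumptions for illustration, since the reporter's actual script is not attached:

# generate heavy read I/O in the background (any set of large files will do)
cat /path/to/big-files/* > /dev/null &

# after a hang, gather the usual diagnostics into a tarball
out=/var/tmp/i915-hang-$(date +%Y%m%d-%H%M%S)
mkdir -p "$out"
dmesg > "$out/dmesg"
cp /sys/kernel/debug/dri/0/i915_error_state "$out/" 2>/dev/null
cp /var/log/Xorg.0.log /var/log/gdm/:0*.log "$out/" 2>/dev/null
tar czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"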
Forgot to paste in the dmesg spew:

[ 137.364357] DMA-API: debugging out of memory - disabling
[ 242.788042] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 242.788337] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 248.824064] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 248.877066] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 250.384039] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 250.385463] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 250.385467] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 263.697098] gnome-shell[2020]: segfault at 230 ip 00007fc6035c689f sp 00007fffcbeb8d30 error 4 in i965_dri.so[7fc603574000+b3000]
[ 304.669091] kworker/u:0 (6) used greatest stack depth: 2176 bytes left

Thanks, the same hang as before, so we can be certain that that particular workaround is not sufficient.

Looks like I can reproduce this pretty easily: I have a directory with my KVM guest images plus some CD ISO images. Running 'cat all those files >/dev/null' hangs/crashes my system (I had only rhythmbox, firefox + a couple of gnome-terminal windows open). This hang occurred with a newly built/booted Fedora kernel-3.7.1-1.local.fc19.x86_64 (above patch included). Here is the dmesg spew. Let me know if the i915_error_state would be helpful.

[ 7438.644043] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7438.644056] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 7444.672037] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7444.723034] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 7446.228045] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7446.228178] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 7446.228181] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 7473.694112] gnome-shell[3157]: segfault at 230 ip 00007fc2f58c589f sp 00007fffd9fe8a60 error 4 in i965_dri.so[7fc2f5873000+b3000]

Tom, the patches I have other people testing are:
https://bugs.freedesktop.org/attachment.cgi?id=72022
https://bugs.freedesktop.org/attachment.cgi?id=71933
Can you try both of those (kernel + ddx)?

Just to be clear, you want me to remove the previous patch before applying these, right?

(In reply to comment #20)
> Just to be clear, you want me to remove the previous patch before applying
> these, right?

Yes. The idea is to find the minimal set of patches required, and hope it is an obvious single line change...

Created attachment 72157 [details]
tar.gz containing 'dmesg', i915_error_state, Xorg.0.log, gdm/0:*.log
OK. I've built:
kernel-3.7.1-1.local2.fc19.x86_64
xorg-x11-drv-intel-2.20.16-1.local.fc19.x86_64
with the above 2 patches, and rebooted:
[tbl@tlondon ~]$ uname -a
Linux tlondon.localhost.org 3.7.1-1.local2.fc19.x86_64 #1 SMP Wed Dec 26 15:21:18 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
[tbl@tlondon ~]$ rpm -q xorg-x11-drv-intel
xorg-x11-drv-intel-2.20.16-1.local.fc19.x86_64
[tbl@tlondon ~]$
I started my "read lots of blocks from the disk command":
cat *.ISO *.img >/dev/null&
and ran a 'vmstat 10' in the terminal.
Within 2 minutes (of quite high disk traffic), I got what appears to be the usual hang/crash:
[ 299.800029] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 299.800036] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 311.840049] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 311.892025] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 313.396044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 313.396164] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 313.396167] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 351.595568] gnome-shell[1978]: segfault at 230 ip 00007fa61378989f sp 00007fff1d0a4730 error 4 in i965_dri.so[7fa613737000+b3000]
I'm pretty sure I'm applying the kernel patch properly:
+ case "$patch" in
+ patch -p1 -F1 -s
+ ApplyPatch 8139cp-re-enable-interrupts-after-tx-timeout.patch
+ local patch=8139cp-re-enable-interrupts-after-tx-timeout.patch
+ shift
+ '[' '!' -f /home/tbl/rpmbuild/SOURCES/8139cp-re-enable-interrupts-after-tx-timeout.patch ']'
Patch21233: 8139cp-re-enable-interrupts-after-tx-timeout.patch
+ case "$patch" in
+ patch -p1 -F1 -s
+ ApplyPatch only-evict-block-required-for-requested-hole.patch
+ local patch=only-evict-block-required-for-requested-hole.patch
+ shift
+ '[' '!' -f /home/tbl/rpmbuild/SOURCES/only-evict-block-required-for-requested-hole.patch ']'
Patch33334: only-evict-block-required-for-requested-hole.patch
+ case "$patch" in
+ patch -p1 -F1 -s
+ chmod +x scripts/checkpatch.pl
+ touch .scmversion
I attach a tar.gz with the usual files, including a legit looking i915_error_state.
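For reference, the ApplyPatch lines in the build log above come from the %prep stage of the Fedora kernel spec. Adding a test patch generally amounts to something like the following sketch; the PatchNNNNN/ApplyPatch lines are taken from the log above, while the spec path and the rpmbuild invocation are assumptions rather than the reporter's exact commands:

# drop the patch next to the other kernel sources
cp only-evict-block-required-for-requested-hole.patch ~/rpmbuild/SOURCES/

# in ~/rpmbuild/SPECS/kernel.spec, declare and apply it, e.g.:
#   Patch33334: only-evict-block-required-for-requested-hole.patch
#   ApplyPatch only-evict-block-required-for-requested-hole.patch
# then rebuild the package
rpmbuild -bb ~/rpmbuild/SPECS/kernel.spec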
Updated to xorg-x11-drv-intel-2.20.17-1.fc19.x86_64, reran my "disk load" test ("cat bigfiles >/dev/null"), and waited. Within about 2 minutes gdm/Xorg hard crashed, the screen was black, and the system was unresponsive to the usual keyboard entries (i.e., ctrl-alt-F2, ctrl-alt-bksp, ctrl-alt-delete). I did not get the "gdm Ooops something has gone wrong" screen. I had to hard power reset the system.

On rebooting, I see this in /var/log/messages:

Jan 5 10:27:59 tlondon kernel: [ 2017.404040] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 5 10:27:59 tlondon kernel: [ 2017.404047] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jan 5 10:28:05 tlondon kernel: [ 2023.424023] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 5 10:28:05 tlondon kernel: [ 2023.475044] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
Jan 5 10:28:06 tlondon kernel: [ 2025.140021] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 5 10:28:06 tlondon kernel: [ 2025.140106] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan 5 10:28:06 tlondon kernel: [ 2025.140108] [drm:i915_reset] *ERROR* Failed to reset chip.
Jan 5 10:28:07 tlondon kernel: [ 2025.214077] ------------[ cut here ]------------
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3476!
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] invalid opcode: 0000 [#1] SMP
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Modules linked in: fuse(F) ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) xt_CHECKSUM(F) iptable_mangle(F) bridge(F) stp(F) llc(F) lockd(F) sunrpc(F) snd_usb_audio(F) snd_hda_codec_conexant(F) snd_usbmidi_lib(F) arc4(F) iwldvm(F) snd_hda_intel(F) snd_hda_codec(F) uvcvideo(F) snd_hwdep(F) snd_rawmidi(F) snd_seq(F) snd_seq_device(F) mac80211(F) videobuf2_vmalloc(F) videobuf2_memops(F) videobuf2_core(F) videodev(F) snd_pcm(F) thinkpad_acpi(F) iwlwifi(F) snd_page_alloc(F) media(F) snd_timer(F) snd(F) cfg80211(F) soundcore(F) e1000e(F) btusb(F) iTCO_wdt(F) bluetooth(F) coretemp(F) iTCO_vendor_support(F) mei(F) tpm_tis(F) tpm(F) lpc_ich(F) rfkill(F) mfd_core(F) i2c_i801(F) tpm_bios(F) microcode(F) vhost_net(F) tun(F) macvtap(F) macvlan(F) kvm_intel(F) kvm(F) binfmt_misc(F) uinput(F) i915(F) i2c_algo_bit(F) drm_kms_helper(F) drm(F) i2c_core(F) wmi(F) video(F)
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] CPU 0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Pid: 660, comm: Xorg Tainted: GF 3.7.1-1.local2.fc19.x86_64 #1 LENOVO 74585FU/74585FU
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RIP: 0010:[<ffffffffa009c847>] [<ffffffffa009c847>] i915_gem_object_unpin+0x47/0x50 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RSP: 0018:ffff880134be7938 EFLAGS: 00010246
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RAX: ffff880130a78000 RBX: ffff880130da3800 RCX: 0000000000000000
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RDX: 0000000000000002 RSI: 0000000000070008 RDI: ffff8801262db400
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RBP: ffff880134be7938 R08: 0000000000000030 R09: 0000000000000006
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880130da0800
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] R13: ffff880130da0820 R14: 0000000000000000 R15: ffff880130da0800
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] FS: 00007fc5f1d5f940(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] CR2: 00000000008054bc CR3: 0000000130822000 CR4: 00000000000007f0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Process Xorg (pid: 660, threadinfo ffff880134be6000, task ffff880130964560)
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Stack:
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] ffff880134be7948 ffffffffa00adf5e ffff880134be7978 ffffffffa00b17e6
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] ffff8801338497d8 ffff880130da3800 0000000000000001 ffff880130da0c50
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] ffff880134be7c08 ffffffffa00b43d2 ffff880100000001 000000008121ac18
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Call Trace:
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa00adf5e>] intel_unpin_fb_obj+0x3e/0x40 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa00b17e6>] intel_crtc_disable+0x96/0x130 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa00b43d2>] intel_set_mode+0x262/0xa50 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8121d26c>] ? ext4_dirty_inode+0x3c/0x60
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8125b182>] ? jbd2_journal_stop+0x1b2/0x2a0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff81237dc6>] ? __ext4_journal_stop+0x76/0xa0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8121badd>] ? ext4_da_write_end+0x9d/0x350
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff812f1a31>] ? vsnprintf+0x461/0x600
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff812f1c74>] ? snprintf+0x34/0x40
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa00b4d11>] ? intel_crtc_set_config+0x151/0x970 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa00b52d6>] intel_crtc_set_config+0x716/0x970 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff81633af6>] ? __schedule+0x3c6/0x7a0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa0037286>] drm_framebuffer_remove+0xc6/0x150 [drm]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa003ac75>] drm_mode_rmfb+0xd5/0xe0 [drm]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa002a4a3>] drm_ioctl+0x4d3/0x580 [drm]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff811d3402>] ? send_to_group+0x182/0x250
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffffa003aba0>] ? drm_mode_addfb2+0x6d0/0x6d0 [drm]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff811d372f>] ? fsnotify+0x25f/0x340
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff811a6649>] do_vfs_ioctl+0x99/0x580
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8128b94a>] ? inode_has_perm.isra.31.constprop.61+0x2a/0x30
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8128cd17>] ? file_has_perm+0x97/0xb0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff811a6bc1>] sys_ioctl+0x91/0xb0
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff810dc8cc>] ? __audit_syscall_exit+0x3ec/0x450
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] [<ffffffff8163d9d9>] system_call_fastpath+0x16/0x1b
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] Code: 00 74 2a 89 d0 83 e2 0f c0 e8 04 83 e8 01 83 e0 0f 89 c1 c1 e1 04 09 ca 84 c0 88 97 e9 00 00 00 75 07 80 a7 ea 00 00 00 fb 5d c3 <0f> 0b 0f 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 57 41
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RIP [<ffffffffa009c847>] i915_gem_object_unpin+0x47/0x50 [i915]
Jan 5 10:28:07 tlondon kernel: [ 2025.215017] RSP <ffff880134be7938>

More: This just popped up in dmesg:

[10213.840108] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[10213.841102] i915: render error detected, EIR: 0x00000010
[10213.841102] i915: IPEIR: 0x00000000
[10213.841102] i915: IPEHR: 0x69040000
[10213.841102] i915: INSTDONE_0: 0xffffffff
[10213.841102] i915: INSTDONE_1: 0xbfbbffff
[10213.841102] i915: INSTDONE_2: 0x00000000
[10213.841102] i915: INSTDONE_3: 0x00000000
[10213.841102] i915: INSTPS: 0x8001e025
[10213.841102] i915: ACTHD: 0x055b608c
[10213.841102] i915: page table error
[10213.841102] i915: PGTBL_ER: 0x00000001
[10213.841102] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

i915_error_state was empty:

[root@tlondon dri]# ls -l i915_error_state
-rw-r--r--. 1 root root 0 Jan 5 13:25 i915_error_state
[root@tlondon dri]#

I seem to be able to reproduce at will. Any more testing/reporting/building I can do to help?

Created attachment 72677 [details]
tar.gz containing dmesg, i915_error_state, Xorg.0.log, /var/log/gdm/*.log
Hang/crash continues with kernel-3.8.0-0.rc2.git2.2.fc19.x86_64 and xorg-x11-drv-intel-2.20.17-1.fc19.x86_64.
[ 368.708039] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 368.708047] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 376.708382] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 376.759026] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 378.704039] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 378.704541] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 378.704543] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 403.384220] gnome-shell[1981]: segfault at 230 ip 00007f40b0fb989f sp 00007fff1db58130 error 4 in i965_dri.so[7f40b0f67000+b3000]
Here are the first 20 lines of i915_error_state:
Time: 1357655991 s 435476 us
PCI ID: 0x2a42
EIR: 0x00000000
IER: 0x02028c53
PGTBL_ER: 0x00000000
CCID: 0x00000000
fence[0] = 00000000
fence[1] = 00000000
fence[2] = 00000000
fence[3] = 591e0000511f0dd
fence[4] = 00000000
fence[5] = 00000000
fence[6] = 00000000
fence[7] = 00000000
fence[8] = 00000000
fence[9] = 00000000
fence[10] = 00000000
fence[11] = 00000000
fence[12] = 00000000
fence[13] = 00000000
I attach tar.gz containing dmesg output, /var/log/gdm/*.log, Xorg.0.log and i915_error_state.
More I can do?
Created attachment 72695 [details] [review]
Longshot 1: remove g4x/g5 specific MI_FLUSH

Created attachment 72696 [details] [review]
Longshot 2: make the shrinker less aggressive towards instruction bo

Tom, please try Chris' patches.

Uhhhh, I only see one patch. Both posted patches are the same... That right?

(In reply to comment #30)
> Uhhhh, I only see one patch. Both posted patches are the same...
>
> That right?

Indeed. You could try the original "make shrinker less aggressive" patch though.

Created attachment 72727 [details] [review]
Longshot 2: make the shrinker less aggressive towards instruction bo

It was almost 3am when I tried to upload the patches...

Having problems applying the last patch:

+ patch -p1 -F1 -s
+ ApplyPatch 0002-make-the-shrinker-less-aggressive.patch
+ local patch=0002-make-the-shrinker-less-aggressive.patch
+ shift
+ '[' '!' -f /home/tbl/rpmbuild/SOURCES/0002-make-the-shrinker-less-aggressive.patch ']'
Patch33336: 0002-make-the-shrinker-less-aggressive.patch
+ case "$patch" in
+ patch -p1 -F1 -s
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_gem.c.rej
error: Bad exit status from /var/tmp/rpm-tmp.lcmyBp (%prep)

Here is the .rej file:

--- drivers/gpu/drm/i915/i915_gem.c
+++ drivers/gpu/drm/i915/i915_gem.c
@@ -4470,11 +4515,8 @@
 		unlock = false;
 	}
 
-	if (nr_to_scan) {
-		nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan);
-		if (nr_to_scan > 0)
-			i915_gem_shrink_all(dev_priv);
-	}
+	if (nr_to_scan)
+		i915_gem_shrink(dev_priv, nr_to_scan);
 
 	cnt = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, gtt_list)

Was written against drm-intel-fixes, so it should apply to 3.8-rc2 fine. Which kernel are you testing?

Sorry. Was applying to 3.7.1. Will grab source for kernel-3.8.0-0.rc2.git3.1.fc19.x86_64 and start again....

Am having problems building with the src rpm I pulled. Will have to try again tonight.

I pulled kernel-3.8.0-0.rc2.git4.2.fc19.src.rpm from http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/SRPMS/. The patching now succeeds; I am building now....

Sorry for the delay, but the build took a few tries... 3.8.0-0.rc2.git4.2.local.fc19.x86_64 with the 2 above patches has been running my "cat 43GB-files >/dev/null" crasher for several minutes now without incident. This is great!!!!!! I haven't been able to complete this test before, so this is quite a change. No spew in dmesg either; stable gdm/Xorg/... THANKS!!!!

I am now repeating this test. I will monitor and report. More I can do?
Here is output from vmstat:

[tbl@tlondon VirtualMachines]$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 1775656 105060 1210000    0    0  3056   107  727 1203 13  5 37 44
 0  2      0 1042640 105072 1941068    0    0 73442    54 1071 2032  4  5 63 28
 0  2      0  267392 105092 2714712    0    0 77370     7  980 1905  3  5 52 40
 0  1      0  144492  31080 2931872    0    0 85133     4 1337 2093  4  8 62 26
 0  1      0  149396  31072 2930492    0    0 87949     0 1263 1910  3  9 64 24
 0  1      0  150632  31188 2928340    0    0 83795     7 1257 1946  3  7 55 35
 1  3      0  146360  31196 2938700    0    0 86184     2 1198 1886  2  6 47 45
 0  1      0  146504  31204 2943096    0    0 83787     3 1208 1929  3  7 52 38
 1  0      0  145768  31200 2948804    0    0 89929     2 1244 1924  3  8 60 29
 0  1      0  144688  31208 2953332    0    0 104573    1 1442 2143  4 10 60 26
 0  1      0  148116  31100 2952640    0    0 105757    1 1340 2044  3  9 56 32
 1  0      0  151436  31096 2952480    0    0 107600    0 1365 2096  3 10 65 22
 0  1      0  150376  31092 2953792    0    0 106744    0 1363 2067  3  9 55 33
 0  1      0  149860  31088 2955348    0    0 103801    0 1332 2097  3  8 52 37
 0  1      0  144076  31120 2961904    0    0 96686     1 1468 2063  3  9 56 31
 1  1      0  150732  31124 2953044    0    0 100161    0 1378 2082  4 10 62 25
 0  1      0  146408  31128 2958476    0    0 95870     0 1301 1984  3  9 58 30
 1  0      0  145520  31136 2959972    0    0 97358     0 1371 2060  3  8 52 37
 0  1      0  146996  31152 2958508    0    0 82962     5 1244 1951  3  7 52 38
 3  0      0  147996  31160 2954000    0    0 80578     2 1316 2006  3  8 60 29
 0  1      0  145532  31168 2957428    0    0 79778     0 1241 1897  3  7 58 32
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1      0  127132  31320 2972564    0    0 73284  1197 1214 1921  5  6 48 41
 0  1      0  144656  31288 2951296    0    0 69968    13 1268 2141  5  8 53 35
 1  1      0  148924  31292 2946676    0    0 75920     4 1243 1975  3  8 61 28
 0  1      0  151308  31308 2939844    0    0 75828     1 1202 1903  3  7 57 33
 0  1      0  145636  31320 2946304    0    0 72241    16 1228 1913  3  7 55 35
 0  1      0  150740  31332 2941224    0    0 73810     3 1207 1948  3  8 58 31
 1  0      0  146628  31356 2944968    0    0 73763     0 1328 2007  3  8 61 28
 0  1      0  149832  31360 2941776    0    0 75132     2 1190 1905  3  7 63 27
 0  1      0  146020  31408 2915404    0    0 65193   194 1406 2064  5  6 43 46
 0  2      0  145768  31344 2915824    0    0 71813    11 1288 2074  3  6 50 41
 5  1      0  147704  31352 2913676    0    0 72329     6 1180 1835  3  6 49 42
 1  1      0  146844  31352 2913724    0    0 74676   166 1197 1876  3  7 60 30
 0  1      0  147560  31368 2900980    0    0 66823     4 1192 2036  4  6 50 40
 1  1      0  150736  31376 2897716    0    0 74338    17 1228 2133  3  6 47 44
 0  1      0  151344  31380 2894552    0    0 76119     0 1239 2108  3  7 51 39
 1  0      0  150808  31392 2895576    0    0 73536     1 1269 2170  3  7 56 34
 0  2      0  147624  31400 2898924    0    0 74670     2 1215 2090  3  6 57 33
 0  1      0  144880  31388 2901828    0    0 67518     6 1312 2129  4  8 50 38
 0  2      0  150752  31404 2897156    0    0 73924     2 1254 2135  3  6 50 41
 1  0      0  145128  31408 2903132    0    0 74100     1 1241 2140  4  7 58 32
 1  1      0  151144  31448 2894644    0    0 71395    30 1240 2135  3  6 51 39
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  1      0  148172  31464 2907324    0    0 71740     8 1203 2084  4  6 50 41
 0  2      0  151500  31480 2883140    0    0 62043    20 1552 2896  7  9 43 42
 0  1      0  148668  31476 2878704    0    0 67546   159 1329 2325  4  7 46 43
 0  1      0  149040  31524 2873000    0    0 66586    64 1298 2341  5  8 45 43
 5  0      0  124592  31472 2893828    0    0 41530    47 1734 2170  5 30 40 25
 0  1      0  145252  31488 2872904    0    0   276    60 1108 1916  7 12 79  2
 2  0      0  131864  31504 2881628    0    0    69    42 1270 2818 12  7 80  1
 0  0      0  141808  31616 2871964    0    0    65     2 1069 2359 11  7 82  0
 1  0      0  136560  31640 2871916    0    0    58    42 1307 2919 14  6 79  1
 0  0      0  111908  31664 2895860    0    0  1181  1198  905 1912 10  3 85  2
 0  0      0  120600  31688 2896620    0    0    77    52  736 1564  7  2 90  1
 0  1      0  117952  31704 2896928    0    0    29    39  678 1353  6  2 90  1
 2  0      0  113708  31720 2897192    0    0    26    25  777 1593  7  3 89  1

(In reply to comment #38)
> More I can do?

Can you run your test with just the first patch, https://bugs.freedesktop.org/attachment.cgi?id=72695 ? I'm interested whether that path is worth pursuing, or if it is just a dead end.

OK. I just started a build with just one patch: 0002-remove-MI-FLUSH.patch
I've commented out 0002-make-the-shrinker-less-aggressive.patch
[Takes a while to build on my laptop.]
Will retest when complete and report back.

Everyone please retest with latest drm-intel-fixes from http://cgit.freedesktop.org/~danvet/drm-intel
I've just merged a bunch of duct-tapes for this issue.

Created attachment 72801 [details]
tar.gz included 'dmesg', i915_error_state, etc.
kernel with just the one patch (remove-MI_FLUSH) hangs/crashes under my "cat big-files >/dev/null" test.
Here is the dmesg spew:
[ 178.704031] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 178.704039] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 188.704040] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 188.757040] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 190.704040] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 190.704160] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 190.704163] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 201.972860] gnome-shell[1927]: segfault at 230 ip 00007fbcbf78989f sp 00007fff8ae7ac00 error 4 in i965_dri.so[7fbcbf737000+b3000]
Here are the first few lines from i915_error_state:
Time: 1357847009 s 606144 us
PCI ID: 0x2a42
EIR: 0x00000000
IER: 0x02028c53
PGTBL_ER: 0x00000000
CCID: 0x00000000
fence[0] = 00000000
fence[1] = 00000000
fence[2] = 00000000
Hope this helps....
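Retesting the drm-intel-fixes branch from the cgit link above roughly means building a kernel from that tree. A sketch follows; the clone URL is left as a placeholder since only the cgit web address is given in the thread, and the build/install steps are generic rather than the reporter's exact procedure:

# clone URL corresponding to http://cgit.freedesktop.org/~danvet/drm-intel (not spelled out above)
git clone <drm-intel-clone-url> drm-intel
cd drm-intel
git checkout drm-intel-fixes
cp /boot/config-$(uname -r) .config   # start from the running distro config
make oldconfig
make -j"$(nproc)" bzImage modules
sudo make modules_install install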
I built kernel-3.8.0-0.rc3.git0.3.local.fc19.x86_64 with the one patch labelled 'drm-intel-fixes' in the above link. I ran my crasher: "cat 40GB-files >/dev/null"; system stayed up, there was no spew in /var/log/messages, and system was stable. Thanks!

Here is output of 'vmstat 10' during the disk traffic surge:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  2      0 2332844  99788  604672    0    0  2262    73  834 1243 13  6 27 53
 0  2      0 1686000  99836 1268764    0    0 66922    56 1185 2209  7  4 26 63
 0  2      0  966036  99856 1987428    0    0 71821    30  868 1777  2  3 29 66
 0  2      0  204288  99936 2746552    0    0 75932   782  893 1774  2  3 30 65
 0  3      0  151216  29400 2871912    0    0 83392    42 1102 1820  2  5  7 87
 0  4      0  150760  29376 2873360    0    0 87950     0 1098 1875  2  4  0 94
 0  3      0  144736  31812 2880712    0    0 74862    54 1187 1954  2  4  0 94
 0  2      0  148012  33152 2878448    0    0 83268    80 1139 1918  2  4  9 84
 0  2      0  149312  32696 2877756    0    0 81625     2 1094 1831  2  4 33 61
 0  2      0  151232  32692 2878168    0    0 84898     0 1102 1894  2  4 33 61
 0  2      0  146352  32696 2885892    0    0 93866     1 1129 1894  2  4 33 61
 0  2      0  145628  31620 2895608    0    0 105160   76 1227 2056  2  5 34 59
 0  2      0  150792  30936 2889108    0    0 94103     2 1173 1934  2  5 34 59
 0  2      0  151032  30940 2889684    0    0 98985     2 1190 2011  2  5 33 59
 0  2      0  149740  30940 2892472    0    0 102569    0 1162 1978  2  5 35 58
 1  2      0  147280  31912 2889596    0    0 64857    12 1453 2815  7  8 19 67
 0  2      0  148320  31364 2878272    0    0 73233   180 1846 3628 11  9 16 64
 2  2      0  149648  31364 2890884    0    0 89531    15 1697 3587  9  9 27 56
 1  2      0  147136  31372 2893320    0    0 86727     0 1378 2484  4  8 29 59
 0  2      0  146868  31388 2892416    0    0 84028   152 1330 2456  4  6 29 60
 0  2      0  146492  31400 2892260    0    0 78779     5 1316 2442  4  5 30 61
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  2      0  149564  31236 2887672    0    0 78745    15 1454 2966  6  7 29 59
 0  2      0  149184  31196 2888136    0    0 74080    11 1466 2819  6  6 28 60
 0  2      0  147344  31196 2891156    0    0 77196     9 1241 2332  4  5 30 60
 0  2      0  147120  31208 2889356    0    0 73685    16 1183 2160  3  5 25 67
 0  3      0  150392  31236 2884580    0    0 69221    18 1200 2183  3  5 29 63
 1  2      0  149028  31248 2885824    0    0 75914     2 1272 2203  3  5 31 61
 0  2      0  146816  31264 2886108    0    0 70554    18 1342 2422  4  6 30 60
 0  2      0  146616  31296 2886848    0    0 72022  1202 1149 2014  4  5 28 63
 0  2      0  151444  31340 2881796    0    0 66190   257 1478 2575  9  7 22 61
 1  2      0  146656  31352 2885688    0    0 66033    27 1360 2499  4  6 30 60
 0  2      0  150356  31364 2879456    0    0 68884    33 1277 2383  4  6 28 62
 1  2      0  147388  31496 2878560    0    0 68414   164 1494 2739  7  8 25 60
 1  2      0  145040  31412 2883196    0    0 66066    21 1274 2281  3  5 30 62
 1  1      0  150928  31420 2875880    0    0 75336    20 1188 2089  4  5 32 59
 0  2      0  150160  31416 2853440    0    0 68478   167 1135 1920  4  5 29 62
 1  3      0  148228  31436 2852236    0    0 69452    21 1424 2743  5  7 27 60
 0  2      0  146688  31436 2852892    0    0 71651    12 1137 2001  3  6 25 66
 0  2      0  150668  31444 2847216    0    0 64634    15 1196 2013  3  5 20 73
 1  2      0  148196  31428 2849844    0    0 66958     0 1622 3516  8 10 27 56
 0  2      0  147388  31432 2850632    0    0 71242     4 1222 2113  4  6 29 61
 0  2      0  151420  31436 2845936    0    0 68421     0 1314 2233  3  8 30 59
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  3      0  149068  31436 2848756    0    0 70618     0 1455 2978  6  7 28 60
 1  2      0  146380  31436 2851932    0    0 67282     0 1125 1915  3  5 21 71
 0  2      0  145056  31440 2852548    0    0 71702     1 1078 1827  2  4 27 67
 0  2      0  148568  31484 2848428    0    0 58694    10 1374 2710  5  7 23 65
 0  2      0  147868  31488 2848808    0    0 69512     0 1068 1918  2  4 29 65
 0  2      0  146396  31488 2850440    0    0 68333    11 1021 1712  2  4 31 63
 0  2      0  147952  31500 2847680    0    0 64743     2 1219 2317  4  6 27 63
 0  2      0  151560  31508 2843916    0    0 65393     2 1194 2142  3  7 27 63
 1  1      0  139692  31484 2856432    0    0 24770     2 2246 2552  3 35 23 39
 0  1      0  122144  31484 2874040    0    0  2422     0  894 1770  3  2 59 37
 0  1      0  108100  31532 2888136    0    0  1413    23  677 1336  3  1 55 41
 0  1      0  149916  31540 2845932    0    0  1790     4  779 1592  3  2 54 41

Consolidating all gen4/5 i/o related hangs.

*** This bug has been marked as a duplicate of bug 55984 ***

A patch referencing this bug report has been merged in Linux v3.8-rc4:

commit 93927ca52a55c23e0a6a305e7e9082e8411ac9fa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jan 10 18:03:00 2013 +0100

    drm/i915: Revert shrinker changes from "Track unbound pages"

Patch has been merged (long ago). Closing.
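To check whether a given mainline kernel contains the revert referenced above, the commit id from the closing message can be queried directly in a Linux git checkout. A small sketch; the repository path is an assumption:

cd ~/src/linux    # any clone of Linus' tree
git describe --contains 93927ca52a55c23e0a6a305e7e9082e8411ac9fa        # should print a v3.8-rc4-based tag
git tag --contains 93927ca52a55c23e0a6a305e7e9082e8411ac9fa | head      # releases that include the fix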
Created attachment 70090 [details]
backtrace

This is probably an exact duplicate of #51376, which, however, is claimed to be fixed.

Hardware is a GM45 (GMA4500). OS is Fedora Rawhide, with the relevant packages being mesa-dri-drivers-9.0-5.fc19.x86_64 and kernel-3.7.0-0.rc5.git1.3.fc19.x86_64.

Reproducible: often

I've noticed it three times, always in a similar scenario: I/O pressure (a busy hard drive, e.g. from a yum update) while watching some HTML5 video content (YouTube). Backtrace attached. i915_error_state attached.

Dmesg shows a GPU lockup:

[39911.544036] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[41612.416035] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[41612.416041] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[41620.256016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[41620.308044] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[41621.828021] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[41621.828359] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[41621.828363] [drm:i915_reset] *ERROR* Failed to reset chip.