Summary: | [snb blorp] GPU hang | ||
---|---|---|---|
Product: | Mesa | Reporter: | Janusz <januszmk6> |
Component: | Drivers/DRI/i965 | Assignee: | Kenneth Graunke <kenneth> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | czajernia, greatquux, intel-gfx-bugs, jan.steffens, js314592, kenneth, longerdev, moonpfe, shuhao, tomi, yeled.nova, yjcoshc |
Version: | 9.2 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
dmesg
/sys/class/drm/card0/error
i915_error_state
/sys/kernel/debug/dri/0/i915_error_state
intel_gpu_abrt.tar
i915_error_state file after causing "render ring" hang in Google Maps
i915_error_state |
Created attachment 87140 [details]
/sys/class/drm/card0/error
Using chrome for a while and then playing hedgewars.

[11788.465654] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x270d6000 ctx 1) at 0x270d61c8
[12010.091755] SELinux: initialized (dev proc, type proc), uses genfs_contexts
[12196.509788] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[12196.509851] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x9e37000 ctx 1) at 0x9e371c8
[12196.534762] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
[12262.519865] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[12262.519875] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[12262.528953] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
[12331.530366] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[12331.530375] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[12331.824316] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.

Also playing hedgewars for about half an hour can reproduce these errors. i5-2510M with hd3000, mesa 9.2.0, xf86-video-intel 2.99.903.

Created attachment 87163 [details]
i915_error_state
I can also see it after upgrade of Mesa from 9.1.6 to 9.2.1 with full screen video playback: mplayer -vo gl

[43554.987053] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[43554.987128] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xdde4000 ctx 1) at 0xdde41c8
[43707.904260] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[43707.904342] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xdde4000 ctx 1) at 0xdde4220

Hello. Same problem here.

[ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8
[ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8
[ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220

sys: fedora 19 64b
Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
WM: KDE with effects enabled
8G ram
300G SATA HDD
ntb Lenovo ThinkPad E320

problem occurs in:
- scrolling in firefox
- playing video in vlc and switch to KDE terminal or another app
- sometimes system hangs, cpu 100%, freeze and hard reboot needed
- sometimes happens if I work with ff or in terminal only (very frustrating)
- happening across many kernel versions 3.0 to newest I think

lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

Created attachment 87349 [details]
/sys/kernel/debug/dri/0/i915_error_state
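Editorial aside: the hang lines quoted in the reports above all follow the same shape, so they can be compared mechanically. The sketch below is a minimal, hedged example and not part of any i915 tooling: the regular expression and the `parse_hangs` helper are assumptions written against the log lines quoted in this thread only. It extracts the ring name, context and the offset of the hang address inside the batch buffer object.

```python
import re

# Assumed pattern, modelled on the lines quoted in this thread, e.g.
# "render ring hung inside bo (0x270d6000 ctx 1) at 0x270d61c8".
HANG_RE = re.compile(
    r"(?P<ring>\w+) ring hung inside bo "
    r"\((?P<bo>0x[0-9a-fA-F]+) ctx (?P<ctx>\d+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


def parse_hangs(dmesg_text):
    """Return a list of (ring, ctx, offset_in_bo) for each hang line found."""
    hangs = []
    for match in HANG_RE.finditer(dmesg_text):
        bo_start = int(match.group("bo"), 16)
        hang_addr = int(match.group("addr"), 16)
        hangs.append(
            (match.group("ring"), int(match.group("ctx")), hang_addr - bo_start)
        )
    return hangs


# Two lines copied from the comments above.
sample = (
    "[11788.465654] [drm:i915_set_reset_status] *ERROR* render ring hung "
    "inside bo (0x270d6000 ctx 1) at 0x270d61c8\n"
    "[43707.904342] [drm:i915_set_reset_status] *ERROR* render ring hung "
    "inside bo (0xdde4000 ctx 1) at 0xdde4220\n"
)

for ring, ctx, offset in parse_hangs(sample):
    print(f"{ring} ring, ctx {ctx}, offset {offset:#x}")
# prints:
# render ring, ctx 1, offset 0x1c8
# render ring, ctx 1, offset 0x220
```

Matching offsets across reports are only a hint that the hangs may be related; the offset alone does not identify the bug.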
Downstream Gentoo bug report at https://bugs.gentoo.org/show_bug.cgi?id=480930

*** Bug 70594 has been marked as a duplicate of this bug. ***

Does anyone else here experience CPU/GPU performance drops when they experience these hangs? I think these are due to overheating but I can't be sure as the clock freq of the CPU is not being limited. Previously, that happened when I hit 95C, where as these slow down happens when my core hits roughly 86C. Laptop is a Thinkpad T420. Critical temp is 100C. It's been documented that these laptops will hit these temperature after computationally intensive tasks such as games.

(In reply to comment #9)
> Does anyone else here experience CPU/GPU performance drops when they
> experience these hangs?

It takes up to a few seconds for the kernel to detect a hang (the gpu doesn't really tell us, it simply stops doing stuff). So frame-freezing while that happens is completely expected. And we also need to turbo-up the chip again, which in some pathetic corner-cases also takes a while. 3.13 has some cool tricks from Chris to help with the turboing.

Would a freeze that requires hard reset be also part of this? I just had one shortly after a short hang. Syslog got a bunch of nul at the end before the reboot. I also have a stack trace that may or may not be related:

[20151.429335] [drm:intel_pipe_config_compare] *ERROR* mismatch in adjusted_mode.flags (expected 1, found 0)
[20151.429345] ------------[ cut here ]------------
[20151.429411] WARNING: CPU: 2 PID: 1707 at /build/buildd/linux-3.11.0/drivers/gpu/drm/i915/intel_display.c:8292 check_crtc_state+0x58f/0x9c0 [i915]()
[20151.429415] pipe state doesn't match!
[20151.429418] Modules linked in: usb_storage(F) hid_generic hidp hid ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) xt_CHECKSUM(F) iptable_mangle(F) bridge(F) stp(F) llc(F) pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) ebtable_nat(F) ebtables(F) parport_pc(F) ppdev(F) rfcomm bnep joydev(F) binfmt_misc(F) nfsd(F) auth_rpcgss(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) ext2(F) uvcvideo videobuf2_vmalloc videobuf2_memops x86_pkg_temp_thermal videobuf2_core videodev intel_powerclamp coretemp kvm_intel(F) kvm(F) arc4(F) iwldvm mac80211 ip6t_REJECT(F) xt_hl(F) ip6t_rt(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) btusb bluetooth ipt_REJECT(F) xt_limit(F) xt_tcpudp(F) xt_addrtype(F) nf_conntrack_ipv4(F) microcode(F) snd_hda_codec_conexant psmouse(F) iwlwifi serio_raw(F) snd_hda_intel snd_hda_codec cfg80211 snd_hwdep(F) wmi snd_pcm(F) snd_page_alloc(F) snd_seq_midi(F) snd_seq_midi_event(F) nf_defrag_ipv4(F) xt_conntrack(F) snd_rawmidi(F) snd_seq(F) ip6table_filter(F) tpm_tis ip6_tables(F) thinkpad_acpi nf_conntrack_netbios_ns(F) nvram(F) nf_conntrack_broadcast(F) nf_nat_ftp(F) nf_nat(F) nf_conntrack_ftp(F) nf_conntrack(F) iptable_filter(F) ip_tables(F) x_tables(F) snd_seq_device(F) snd_timer(F) mac_hid lpc_ich snd(F) mei_me mei soundcore(F) lp(F) parport(F) dm_crypt(F) crct10dif_pclmul(F) crc32_pclmul(F) ghash_clmulni_intel(F) aesni_intel(F) i915 aes_x86_64(F) lrw(F) gf128mul(F) glue_helper(F) ablk_helper(F) cryptd(F) video(F) i2c_algo_bit sdhci_pci drm_kms_helper sdhci ahci(F) libahci(F) drm e1000e(F) ptp(F) pps_core(F)
[20151.429568] CPU: 2 PID: 1707 Comm: Xorg Tainted: GF W O 3.11.0-12-generic #19-Ubuntu
[20151.429572] Hardware name: LENOVO 4180J4C/4180J4C, BIOS 83ET76WW (1.46 ) 07/05/2013
[20151.429576] 0000000000000009 ffff8801925bd898 ffffffff816e547a ffff8801925bd8e0
[20151.429583] ffff8801925bd8d0 ffffffff81061dbd ffff88019534f6d0 0000000000000001
[20151.429589] ffff88018fcdc000 ffff88019534f000 ffff88018fcdc478 ffff8801925bd930
[20151.429595] Call Trace:
[20151.429610] [<ffffffff816e547a>] dump_stack+0x45/0x56
[20151.429620] [<ffffffff81061dbd>] warn_slowpath_common+0x7d/0xa0
[20151.429627] [<ffffffff81061e2c>] warn_slowpath_fmt+0x4c/0x50
[20151.429667] [<ffffffffa015d9ef>] check_crtc_state+0x58f/0x9c0 [i915]
[20151.429707] [<ffffffffa0168bf3>] intel_modeset_check_state+0x2c3/0x770 [i915]
[20151.429740] [<ffffffffa0169135>] intel_set_mode+0x25/0x30 [i915]
[20151.429772] [<ffffffffa0169982>] intel_crtc_set_config+0x742/0x910 [i915]
[20151.429807] [<ffffffffa0071f8d>] drm_mode_set_config_internal+0x5d/0xe0 [drm]
[20151.429838] [<ffffffffa00749c7>] drm_mode_setcrtc+0xf7/0x650 [drm]
[20151.429872] [<ffffffffa015ef72>] ? intel_crtc_load_lut+0xd2/0x170 [i915]
[20151.429898] [<ffffffffa0065212>] drm_ioctl+0x532/0x660 [drm]
[20151.429918] [<ffffffff811b8ba5>] do_vfs_ioctl+0x2e5/0x4d0
[20151.429927] [<ffffffff811a91f1>] ? __sb_end_write+0x31/0x60
[20151.429934] [<ffffffff811a6d82>] ? vfs_write+0x172/0x1e0
[20151.429942] [<ffffffff811b8e11>] SyS_ioctl+0x81/0xa0
[20151.429950] [<ffffffff816f521d>] system_call_fastpath+0x1a/0x1f
[20151.429955] ---[ end trace ad96f94859530fca ]---

Would someone be able to try the 'snbfixes' branch of:
git://people.freedesktop.org/~kwg/mesa

It has a number of patches that may help with this problem.

(In reply to comment #13)
> Would someone be able to try the 'snbfixes' branch of:
> git://people.freedesktop.org/~kwg/mesa
>
> It has a number of patches that may help with this problem.

These patches seem to work. I've been playing DOTA2 for the last 3 hours without problems. It used to hang every 8-10 minutes.

I'll post if it hangs again.

*** Bug 70896 has been marked as a duplicate of this bug. ***

(In reply to comment #14)
> (In reply to comment #13)
> > Would someone be able to try the 'snbfixes' branch of:
> > git://people.freedesktop.org/~kwg/mesa
> >
> > It has a number of patches that may help with this problem.
>
> These patches seem to work. I've been playing DOTA2 for the last 3 hours
> without problems. It used to hang every 8-10 minutes.
>
> I'll post if it hangs again.

I just read the news at http://www.phoronix.com/scan.php?page=news_item&px=MTQ5Njg, which says "Kenneth's patches don't appear to eliminate the driver issues but rather allow DOTA 2 to run for about three hours before hanging."

Sorry for the misunderstanding I caused, apparently my English is too bad to deliver. That was not what I meant to say. The hanging didn't happen again! The patch was good! I played DOTA2 for 3 hours before I stopped playing. NO hanging, so I felt it's time to let Kenneth know the patch worked. Since 3 hours' testing wasn't conclusive enough, I added "I'll post if it hangs again". I hope this clarifies. I'll try to contact phoronix to let them know too.

P.S I played more DOTA2 today. Good news: still no hanging. Now I have been testing the patch for about 10 hours so I can conclude with 99% confidence that the patch fixes the problem.

P.S2 https://bugs.freedesktop.org/show_bug.cgi?id=69379 seems to be a duplicate. One of you mesa knowledgeables can check on that.

No, I understood you perfectly. Don't worry about it. Thanks again for testing!

Created attachment 88211 [details]
intel_gpu_abrt.tar
Hi,
Update on testing the patch: the hang occurred again. It happened while I was watching a video using Flash. My KDE setup uses OpenGL 2.0 Raster for desktop effects, if that's relevant.
I'll attach intel_gpu_abrt.tar
Maybe that's a different bug? Or maybe I just did something wrong. This is my first time compiling an important Linux component, and it's a possibility I didn't do things 100% right.
That's a different hang - it hangs on MI_SEMAPHORE_MBOX. Based on Daniel's recent mesa-dev post, I believe it's bug #54226.

I've pushed the following fixes to Mesa master:

commit 5563dfabc8c1b7cc1a67e4d64311ea29aef43087
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Also emit HiZ and Stencil packets when disabling depth on Gen6.

commit 29e5d5db5149f721e6c15a9aee6f8135a98ba5c8
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Also emit HIER_DEPTH and STENCIL packets when disabling depth.

commit 65b1f642ac2dff58498622bf6e0b7be8d9d3e20d
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Move post-sync non-zero flush for 3DSTATE_MULTISAMPLE.

commit 10a918e52c37715744f7980b2bc9da69575514da
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Also guard 3DSTATE_DRAWING_RECTANGLE with a flush in blorp.

commit 3aef1fefb4dc2a66101725f2fdc3f2bb0eb926c2
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Emit post-sync non-zero flush before 3DSTATE_DRAWING_RECTANGLE.

commit 436e815a250a8fde22d79093f4b9eed56472693b
Author: Kenneth Graunke <kenneth@whitecape.org>

    i965: Emit post-sync non-zero flush before 3DSTATE_GS_SVB_INDEX.

commit 32a3f5f6d768e5828be1d1f46b1b3f819f55cba8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>

    i965: CS writes/reads should use I915_GEM_INSTRUCTION

These should be in Mesa 10.0. I've also marked them as candidates for a 9.2 stable release, but we may want to let them sit on master for a little while to make sure they don't cause other hang problems.

Marking Resolved/Fixed. Thanks for your patience, all.

For ubuntu users, this might help you: https://launchpad.net/~oibaf/+archive/graphics-drivers/

It is available for 13.10 and seems like it compiled the oct 31st version of mesa, which is after when this patch landed on master.

I can tentatively confirm that this is working for me as well. Will run some more tests later.
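Editorial aside: for anyone building Mesa from git, as several commenters in this thread did, one way to check whether a checkout already contains the fixes listed above is to ask git whether each commit is an ancestor of the checked-out revision. The following is a minimal sketch under stated assumptions: `ancestor_cmd` and `missing_fixes` are hypothetical helper names, the repository path is a placeholder, and the check only works on a history that contains these exact hashes (commits cherry-picked to a stable branch get different hashes and would be reported as missing).

```python
import subprocess

# Commit hashes copied from the comment above.
FIX_COMMITS = [
    "5563dfabc8c1b7cc1a67e4d64311ea29aef43087",
    "29e5d5db5149f721e6c15a9aee6f8135a98ba5c8",
    "65b1f642ac2dff58498622bf6e0b7be8d9d3e20d",
    "10a918e52c37715744f7980b2bc9da69575514da",
    "3aef1fefb4dc2a66101725f2fdc3f2bb0eb926c2",
    "436e815a250a8fde22d79093f4b9eed56472693b",
    "32a3f5f6d768e5828be1d1f46b1b3f819f55cba8",
]


def ancestor_cmd(sha, rev="HEAD"):
    """Build the git command that tests whether sha is an ancestor of rev."""
    return ["git", "merge-base", "--is-ancestor", sha, rev]


def missing_fixes(repo_dir, rev="HEAD"):
    """Return the fix commits that are NOT reachable from rev in repo_dir.

    `git merge-base --is-ancestor` exits with status 0 when the commit is an
    ancestor; any other status (not an ancestor, or unknown commit) is
    treated as "missing" here.
    """
    missing = []
    for sha in FIX_COMMITS:
        result = subprocess.run(ancestor_cmd(sha, rev), cwd=repo_dir)
        if result.returncode != 0:
            missing.append(sha)
    return missing
```

Calling `missing_fixes("/path/to/mesa")` (placeholder path) on a Mesa master checkout should return an empty list once all seven commits are present.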
I still experienced a hard crash of the entire system when bringing up a complex page in Google Maps, even with this installed. So I've taken to disabling all acceleration (NoAccel "true") - I definitely do not see the issue then (although of course scrolling in Google Maps is slower, it does not crash the machine).

I can confirm that the patches in Comment 20 help a lot. In my case the gpu hung while playing MegaGlest 3.7.1, SuperTuxKart 0.8 and sometimes also while scrolling in LyX and Firefox 17.0.9 (using KDE with compositing enabled). I haven't tested LyX so far, but the freezes in the other programs seem to be gone. Of course this does not mean that there can't be another bug somewhere (which might be the case in Comment 23).

I also confirm that after compiling mesa 10 (and patching xorg-server for new mesa) it's working fine, no gpu hang

I just re-installed oibaf's PPA and reproduced a GPU hang in Google Maps. It didn't crash the first time, but I know that if I continue scrolling in Google Maps it will bring down the whole system. However here's the message from syslog and the associated i915_error_state file will be attached.

Nov 6 07:08:37 hawty kernel: [250620.035165] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
Nov 6 07:08:37 hawty kernel: [250620.035173] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
Nov 6 07:08:37 hawty kernel: [250620.038100] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xe1e3000 ctx 3) at 0xe1e4ee0
Nov 6 07:08:37 hawty kernel: [250620.223203] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
Nov 6 07:08:45 hawty kernel: [250627.992511] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
Nov 6 07:08:45 hawty kernel: [250627.992561] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xe948000 ctx 3) at 0xe948338

Created attachment 88747 [details]
i915_error_state file after causing "render ring" hang in Google Maps

This is the i915_error_state file after updating my system with the https://launchpad.net/~oibaf/+archive/graphics-drivers/ PPA (and restarting X of course) and scrolling around a lot in Google Maps in Google Chrome.

I just had:
[155147.011667] [drm] stuck on bsd ring
[155147.012432] [drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo (0x11d6c000 ctx 0) at 0x11d6c2cc
with mesa 10 (with fixes from comment #20)...
It's happened only twice in two days...

(In reply to comment #28)
> I just had:
> [155147.011667] [drm] stuck on bsd ring
> [155147.012432] [drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo
> (0x11d6c000 ctx 0) at 0x11d6c2cc
> with mesa 10 (with fixes from comment #20)...
> It's happened only twice in two days...

A hang in the bsd ring can't be a mesa issue. Also, since it's inside the batch, it's not one of the known kernel issues. Please upgrade libva (the only thing that uses the bsd ring) and if it still fails, file a new bug against libva.

(In reply to comment #28)
> I just had:
> [155147.011667] [drm] stuck on bsd ring
> [155147.012432] [drm:i915_set_reset_status] *ERROR* bsd ring hung inside bo
> (0x11d6c000 ctx 0) at 0x11d6c2cc
> with mesa 10 (with fixes from comment #20)...
> It's happened only twice in two days...

You may have the same issue that I have...see https://bugs.freedesktop.org/show_bug.cgi?id=71276

*** Bug 71074 has been marked as a duplicate of this bug. ***

this error:
*ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220
and desktop freezing were solved for me by bios update

Created attachment 88969 [details]
i915_error_state
Still hang after playing hedgewars for nearly an hour, but it doesn't hang continuously, just one time. Dmesg shows:
[ 3929.521951] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 3929.521963] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[ 3929.521967] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
I notice that my error message has "blitter ring". Is it a different bug?
I have compiled Mesa 10 master in /opt/mesa and added a conf file in /etc/ld.so.conf.d. Now glxinfo shows:
OpenGL version string: 3.0 Mesa 10.0.0-devel (git-69b425e)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
This is the first time I have built mesa.

(In reply to comment #33)
> Created attachment 88969 [details]
> i915_error_state
>
> Still hang after playing hedgewars for nearly an hour, but it doesn't hang
> continuously, just one time. Dmesg shows:
> [ 3929.521951] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 3929.521963] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
> [ 3929.521967] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
>
> I notice that my error message has "blitter ring". Is it a different bug?

Yes, IPEHR of 0x0b160001 suggests that your bug is actually:
https://bugs.freedesktop.org/show_bug.cgi?id=54226

It is not this bug.

Fixed in Mesa 9.2.3 (out today), the upcoming Mesa 10.0 release, or Mesa master.

This fixes the Sandybridge-specific GPU hang where i915_error_state (aka /sys/class/drm/card0/error) lists an IPEHR of 0x79050005. If you experience a GPU hang and your IPEHR is not 0x79050005, that is a different bug.

Thanks again for your patience!

Sorry to re-open but I wanted to include one more detail - either a recent update fixed this for me, or a recent change I made to my BIOS settings (changing my fan speed from "normal" to "turbo" in my motherboard BIOS) has made me unable to reproduce this hang, even after long periods of scrolling complex maps in Google Maps. I can definitely raise the temperature nearly 10C by doing this, but with the fans on turbo (and this only seems to make it slightly louder) the hang does not occur. Hopefully this will help anyone else with these issues - always try to cool the processor as much as possible!

Closing old resolved+fixed. |
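Editorial aside: the closing comment above gives a concrete test for whether a hang belongs to this report, namely whether the error state lists an IPEHR of 0x79050005. The sketch below shows one way to apply that test to a saved error state. It is a hedged example: it assumes the dump contains lines of the form `IPEHR: 0x...`, and `ipehr_values` and `matches_this_bug` are hypothetical helper names. A dump may list an IPEHR for more than one ring; this sketch simply reports a match if any of them equals the value named above.

```python
import re

# IPEHR value that the closing comment associates with this bug.
FIXED_IPEHR = 0x79050005

# Assumed line format inside the error state dump: "IPEHR: 0x79050005".
IPEHR_RE = re.compile(r"IPEHR:\s*(0x[0-9a-fA-F]+)")


def ipehr_values(error_state_text):
    """Return every IPEHR value found in the dump, as integers."""
    return [int(value, 16) for value in IPEHR_RE.findall(error_state_text)]


def matches_this_bug(error_state_text):
    """True if any IPEHR in the dump equals the value named in this report."""
    return FIXED_IPEHR in ipehr_values(error_state_text)


# Illustrative fragments only, not copied from a real attachment.
print(matches_this_bug("  IPEIR: 0x00000000\n  IPEHR: 0x79050005\n"))  # True
print(matches_this_bug("  IPEHR: 0x0b160001\n"))                        # False
```

To check a live system, read the text from /sys/class/drm/card0/error (the path named in the comment above) and pass it to `matches_this_bug`; reading that file may require root.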
Created attachment 87138 [details]
dmesg

Sometimes my desktop freezes and dmesg shows something like this:

[ 2387.969110] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x249ce000 ctx 1) at 0x249ce1d8

i5 2500k with hd3000, 2 screens, mesa-9.2.0 with classic driver arch, xf86-video-intel-2.99.901.