Created attachment 129174 [details]
/sys/class/drm/card0/error

I got this in my dmesg while playing portal 2. The game would hang for about 5 seconds, recover, and then repeat after a random amount of time.

[36111.153352] [drm] GPU HANG: ecode 9:0:0x85dffffb, in portal2_linux [20076], reason: Hang on render ring, action: reset
[36111.153356] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[36111.153358] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[36111.153359] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[36111.153361] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[36111.153363] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[36111.153487] drm/i915: Resetting chip after gpu hang
[36111.153920] [drm] GuC firmware load skipped
[36113.070947] [drm] RC6 on
[36157.230576] drm/i915: Resetting chip after gpu hang
[36157.231028] [drm] GuC firmware load skipped
[36159.150428] [drm] RC6 on
[36224.537792] thinkpad_acpi: EC reports that Thermal Table has changed
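For anyone collecting these reports from several machines, the GPU HANG line can be split into its fields with a short script. This is only a sketch: the regular expression is written against the one line quoted above, the names `HANG_RE` and `parse_gpu_hang` are made up for illustration, and it assumes the process name contains no spaces.

```python
import re

# Pattern written against the "GPU HANG" line quoted in this report.
# Assumption: the process name has no spaces and the field order is fixed.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG line as a dict, or None if absent."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


sample = (
    "[36111.153352] [drm] GPU HANG: ecode 9:0:0x85dffffb, in portal2_linux "
    "[20076], reason: Hang on render ring, action: reset"
)
print(parse_gpu_hang(sample))
```

Lines without a hang report, such as the "Resetting chip" messages, simply return None.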
There were improvements pushed in kernel, xf86-video-intel and Mesa that will benefit your system, so please re-test with the latest kernel, xf86-video-intel (in case you use SNA) and Mesa to see if this issue is still occurring.

Mark as REOPENED if you can reproduce (please capture and upload an apitrace (https://github.com/apitrace/apitrace) so that we can easily reproduce as well) and RESOLVED/* if you cannot reproduce.

In both cases, please confirm your environment (see below).

* Details:
- Kernel: 4.8.13-1-ARCH
- Platform: Skylake (PCI ID: 0x1916, PCI Revision: 0x07, PCI Subsystem: 17aa:504a)
- Mesa: [Please confirm your version]
- xf86-video-intel: [Please confirm your version]

From this error dump, the hang is happening in the render ring batch with active head at 0xef313088, with 0x7a000004 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0xef313088):

0xef313014: 0x00000000: MI_NOOP
0xef313018: 0x00000000: MI_NOOP
0xef31301c: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xef313020: 0x00000001: MI_NOOP
0xef313024: 0x00000000: MI_NOOP
0xef313028: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xef31302c: 0x00000002: MI_NOOP
0xef313030: 0x00000000: MI_NOOP
0xef313034: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xef313038: 0x00000003: MI_NOOP
0xef31303c: 0x00000000: MI_NOOP
0xef313040: 0x78490001: 3D UNKNOWN: 3d_965 opcode = 0x7849
0xef313044: 0x00000004: MI_NOOP
0xef313048: 0x00000000: MI_NOOP
0xef31304c: 0x780c0000: 3D UNKNOWN: 3d_965 opcode = 0x780c
0xef313050: 0x00000000: MI_NOOP
Bad length 7 in (null), expected 6-6
0xef313054: 0x7b000005: 3DPRIMITIVE: fail sequential
0xef313058: 0x00000104: vertex count
0xef31305c: 0x00000fba: start vertex
0xef313060: 0x00000000: instance count
0xef313064: 0x00000001: start instance
0xef313068: 0x00000000: index bias
0xef31306c: 0x00000000: MI_NOOP
Bad count in PIPE_CONTROL
0xef313070: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xef313074: 0x00101001: destination address
0xef313078: 0x00000000: immediate dword low
0xef31307c: 0x00000000: immediate dword high
Bad count in PIPE_CONTROL
0xef313088: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0xef31308c: 0x00000408: destination address
0xef313090: 0x00000000: immediate dword low
0xef313094: 0x00000000: immediate dword high
0xef3130a0: 0x78300000: 3D UNKNOWN: 3d_965 opcode = 0x7830
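The "Bad length" and "Bad count" warnings in the extract look like decoder noise rather than corrupt commands. The sketch below splits a header dword into fields, assuming the usual i965 3D command layout (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length in bits 7:0 stored with a bias of 2). That layout is my reading of the hardware documentation, not something taken from the dump, and the function names are invented for illustration.

```python
def decode_3d_header(dword):
    """Split a command header dword into fields.

    Assumption: the dword uses the 3D-pipeline header layout described
    in the lead-in; MI and other command types are laid out differently.
    """
    return {
        "command_type": (dword >> 29) & 0x7,  # 3 is the graphics pipeline
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        # The length field is assumed to be stored with a bias of 2.
        "total_dwords": (dword & 0xFF) + 2,
    }


def next_command_address(address, dword):
    """Address of the command that follows the one starting at `address`."""
    return address + 4 * decode_3d_header(dword)["total_dwords"]


# 3DPRIMITIVE at 0xef313054 and the two PIPE_CONTROLs from the extract.
print(hex(next_command_address(0xEF313054, 0x7B000005)))
print(hex(next_command_address(0xEF313070, 0x7A000004)))
print(hex(next_command_address(0xEF313088, 0x7A000004)))
```

Under that assumption the 3DPRIMITIVE is 7 dwords long and each PIPE_CONTROL is 6, so every command ends exactly where the next decoded header begins (0xef313070, 0xef313088, 0xef3130a0). That suggests the batch itself is well formed and the decoder simply expects the shorter pre-gen8 lengths.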
Created attachment 130249 [details] /sys/class/drm/card0/error I think I'm having the same issue with Kabylake Laptop Dell XPS 13 9360 Dev Edition Ubuntu 16.04 LTS w/ HWE stack running 4.8.0-42-generic #45~16.04.1-Ubuntu SMP Thu Mar 9 14:10:58 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux /sys/class/drm/card0/error
Mesa: v12.0.6-0ubuntu0.16.04.1
xserver-xorg-video-intel: 2:2.99.917+git20160325-1ubuntu1.2

Although I have the 16.04.1 Hardware Enablement (HWE) kernel, it appears there is an HWE version of xserver-xorg-video-intel I was not aware of, and which was not automatically selected with the HWE kernel. I will try that now.

xserver-xorg-video-intel-hwe-16.04: 2:2.99.917+git20160706-1ubuntu1~16.04.1
Still crashes with the following:

Dell XPS 13 9360 Dev Edition (Kabylake)
Ubuntu 16.04 LTS w/ HWE stack running
Kernel: 4.8.0-42-generic #45~16.04.1-Ubuntu SMP Thu Mar 9 14:10:58 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Mesa: v12.0.6-0ubuntu0.16.04.1
xserver-xorg-video-intel-hwe-16.04: 2:2.99.917+git20160706-1ubuntu1~16.04.1
I have the same problem: Portal 2 crashes after a few minutes of playing. I had the problem with Ubuntu 16.10 and upgraded to 17.04 beta yesterday to check with the newest kernel and Mesa.

- Kernel: 4.10.0-11-generic
- Platform: Skylake and Intel Corporation Iris Pro Graphics 580 (rev 09)
- Mesa: 17.0.1-1ubuntu1
- xf86-video-intel: 2.99.917+git20160706-1ubuntu1

Unfortunately I had some trouble getting apitrace to run with portal2. I will try again tomorrow because the problem is easily reproducible. Uploading my card0/error file too.
Created attachment 130269 [details] /sys/class/drm/card0/error
I still didn't get apitrace to work because portal2 is 32-bit and the apitrace distributed with Ubuntu doesn't support 32-bit easily. If anyone by chance has a 32-bit system or a 32-bit apitrace version, here are the commands to trace portal2 properly (which was quite fiddly because of the tons of library paths added):

# in portal 2 folder
$ cd "$HOME/.steam/steam/steamapps/common/Portal 2"
$ LD_LIBRARY_PATH=":$HOME/.steam/steam/steamapps/common/Portal 2:$HOME/.steam/steam/steamapps/common/Portal 2/bin:$HOME/.steam/ubuntu12_32:$HOME/.steam/ubuntu12_32/panorama:$HOME/.steam/ubuntu12_32/steam-runtime/amd64/lib:$HOME/.steam/ubuntu12_32/steam-runtime/amd64/lib/x86_64-linux-gnu:$HOME/.steam/ubuntu12_32/steam-runtime/amd64/usr/lib:$HOME/.steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu:$HOME/.steam/ubuntu12_32/steam-runtime/i386/lib:$HOME/.steam/ubuntu12_32/steam-runtime/i386/lib/i386-linux-gnu:$HOME/.steam/ubuntu12_32/steam-runtime/i386/usr/lib:$HOME/.steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu:$HOME/.steam/ubuntu12_64:/lib:/lib/i386-linux-gnu:/lib/x86_64-linux-gnu:/usr/lib:/usr/lib/i386-linux-gnu:/usr/lib/i386-linux-gnu/mesa:/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/libfakeroot:/usr/lib/x86_64-linux-gnu/mesa:/usr/lib/x86_64-linux-gnu/mesa-egl:/usr/local/lib:" apitrace trace -a gl ./portal2_linux -game portal2 -steam
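If the one-line command is too unwieldy to edit, the same invocation can be assembled from a list of directories. This is only a sketch: `build_trace_invocation` is a made-up helper, the directory list in the example is a short subset of the full path above, and unlike the original it does not add the leading and trailing empty entries.

```python
import os


def build_trace_invocation(game_dir, library_dirs, base_env=None):
    """Build the working directory, environment and argv for the trace run.

    Hypothetical helper: it only mirrors the apitrace command quoted in
    this comment and does not start anything itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ":".join(library_dirs)
    argv = ["apitrace", "trace", "-a", "gl",
            "./portal2_linux", "-game", "portal2", "-steam"]
    return {"cwd": game_dir, "env": env, "argv": argv}


home = os.path.expanduser("~")
game = os.path.join(home, ".steam/steam/steamapps/common/Portal 2")
# Subset of the directories from the command above, for illustration only.
invocation = build_trace_invocation(
    game,
    [game, os.path.join(game, "bin"), "/usr/lib/i386-linux-gnu/mesa"],
)
print(invocation["env"]["LD_LIBRARY_PATH"])
```

Passing the three entries to `subprocess.run(invocation["argv"], cwd=invocation["cwd"], env=invocation["env"])` would then start the trace, and the space in "Portal 2" needs no quoting because no shell is involved.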
Created attachment 130635 [details]
/sys/class/drm/card0/error

I believe I got the same with a NUC Skull Canyon (Iris 580, Skylake), this time on Fedora 25.

Kernel: 4.10.6-200.fc25.x86_64
Mesa: 13.0.4 (also tried 17.1.0-devel (git-31970ab))
Xorg: xorg-x11-drv-intel-2.99.917-26.20160929

Is an apitrace what is missing to help with this issue? If so, let me know and I'll manually rebuild apitrace in 32-bit.
Ok, made some progress: I have a 32-bit apitrace working now. Initially, to reproduce, I was raising the video quality level. For the test I set it to maximum, and that leads to a crash in Mesa (will report separately later). I'll keep tweaking the settings until I can reproduce the hang. Meanwhile, if anyone needs a 32-bit apitrace, I can provide instructions, or even the binary.
I have a trace now, though I notice replaying the trace does not trigger the hang. Let's hope it will at least be useful for understanding what is going on.

Hang with medium settings:
http://people.collabora.co.uk/~nicolas/portal2_linux.hang.trace
SHA256 a17185c9eeb322a73a9a4202a14ade9710a328bfe64837bd54bde11f1b41f28d
806M / 844585317 bytes
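Since the trace is a large download, it is worth checking it against the size and SHA256 given above before replaying it. A minimal sketch follows; the function names and the local file name in the usage note are assumptions, only the digest and byte count come from this comment.

```python
import hashlib
import os

# Values quoted in the comment above.
EXPECTED_SHA256 = "a17185c9eeb322a73a9a4202a14ade9710a328bfe64837bd54bde11f1b41f28d"
EXPECTED_SIZE = 844585317  # bytes


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so the whole trace is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def trace_is_intact(path, expected_sha256=EXPECTED_SHA256,
                    expected_size=EXPECTED_SIZE):
    """True if the file has both the expected size and the expected digest."""
    return (os.path.getsize(path) == expected_size
            and sha256_of_file(path) == expected_sha256)
```

Calling `trace_is_intact("portal2_linux.hang.trace")` on the downloaded file checks the cheap size comparison first and only hashes the file if the size matches.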
Seems this is still happening:

Ubuntu 17.10 (Artful)
Kernel: 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Mesa: 17.2.2-0ubuntu1
xserver-xorg-video-intel: 2:2.99.917+git20170309-0ubuntu1

The crash seems to occur after some number of passages through portals. Each passage makes the game more jittery, until it just freezes and then crashes.
Created attachment 134873 [details] Latest dmesg
Created attachment 134874 [details] Latest /sys/class/drm/card0/error
Just noticed what I encountered today might be a separate issue...

Hang on render ring, action: reset
vs
Hang on rcs0, action: reset
Render ring, RCS, and RCS0 are all interchangeable names. I think they probably just changed intel_error_decode's naming convention.
Oh, thank you for clarifying that. Good to know :)
*** Bug 101389 has been marked as a duplicate of this bug. ***
*** Bug 100906 has been marked as a duplicate of this bug. ***
Some hangs affecting Portal 2 were fixed in:

commit ee57b15ec764736e2d5360beaef9fb2045ed0f68
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Nov 29 16:22:42 2017 -0800

    i965: Disable regular fast-clears (CCS_D) on gen9+

    This partially reverts commit 3e57e9494c2279580ad6a83ab8c065d01e7e634e which caused a bunch of GPU hangs on several Source titles. To date, we have no clue why these hangs are actually happening. This undoes the final effect of 3e57e9494c227 and gets us back to not hanging. Tested with Team Fortress 2.

    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102435
    Fixes: 3e57e9494c2279580ad6a83ab8c065d01e7e634e
    Cc: mesa-stable@lists.freedesktop.org

The original report from January looks a bit different though, so there may be additional hangs. Please reopen and attach a new error state if you still experience issues with Mesa master or 17.3.0 once it's released. I've been testing it locally and it appears to be working fine.

Thanks for the reports, and your patience!