Created attachment 141481 [details] Intel crash log read from /sys/class/drm/card0/error
The GPU hangs suddenly. It has happened twice in the last 1.5 months. Background applications, such as Spotify, keep running. The DE stops responding completely for a short time (~30 secs) and then comes back with a crash dump.
Linux 4.18.5-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 24 12:48:58 UTC 2018 x86_64 GNU/Linux
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
Created attachment 141482 [details] 2. Intel crash log read from /sys/class/drm/card0/error
dmesg output when the GPU has crashed:
[150585.225717] PPP generic driver version 2.4.2
[150585.486803] PPP BSD Compression module registered
[150585.488316] PPP Deflate Compression module registered
[151567.369673] [drm] GPU HANG: ecode 9:0:0x87f9fff9, in chrome [3234], reason: hang on rcs0, action: reset
[151567.369674] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[151567.369674] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[151567.369675] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[151567.369675] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[151567.369675] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[151567.369769] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[151570.350999] asynchronous wait on fence i915:gnome-shell[5831]/1:860d0 timed out
[151570.351004] asynchronous wait on fence i915:gnome-shell[5831]/1:860d0 timed out
[151575.257790] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[151583.364426] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[151591.257657] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[151599.364228] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
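Since the kernel message asks for the crash dump to be attached, here is a minimal sketch of how a copy could be saved right after a hang. The function name `save_gpu_error` and the default output file name are hypothetical, and reading the sysfs file usually needs root privileges.

```shell
# Copy the i915 error state to a regular file so it can be attached.
# $1: source path (defaults to the path named in the dmesg message)
# $2: destination file (hypothetical default name with a timestamp)
save_gpu_error() {
    sge_src="${1:-/sys/class/drm/card0/error}"
    sge_out="${2:-i915-error-$(date +%Y%m%d-%H%M%S).txt}"
    if [ ! -r "$sge_src" ]; then
        echo "cannot read $sge_src (try running as root)" >&2
        return 1
    fi
    # Print the destination path on success so callers can reuse it.
    cat "$sge_src" > "$sge_out" && echo "$sge_out"
}
```

Run as root, `save_gpu_error` with no arguments would read the default path. The dump can be large, so compressing it before attaching may help.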
Today the GPU hung again with the following dmesg output (log file attached):
[95759.685883] [drm] GPU HANG: ecode 9:0:0x87f9fff9, in chrome [17795], reason: hang on rcs0, action: reset
[95759.685885] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[95759.685885] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[95759.685885] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[95759.685885] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[95759.685886] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[95759.685971] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[95767.787768] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[95775.684168] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[95783.787272] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[95791.680429] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Created attachment 141525 [details] 3. Intel crash log read from /sys/class/drm/card0/error
I'll try running the applications that cause the GPU hangs on the dGPU using optirun.
Hello. Please provide the Mesa version you are using (glxinfo -B). Also, could you please try downgrading the kernel to 4.17 or 4.15 and test on it?
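For reference, a small sketch of how the requested version information could be collected in one go. The helper name `gl_summary` is hypothetical; it only filters the renderer and version lines out of `glxinfo -B` output passed on stdin.

```shell
# Keep only the renderer and OpenGL/Mesa version lines from `glxinfo -B`.
gl_summary() {
    grep -E 'OpenGL (renderer|version) string:'
}

# Typical use (needs glxinfo installed and a running display):
#   glxinfo -B | gl_summary
#   uname -r
```

The full `glxinfo -B` output is still the better thing to attach; the filtered lines are just a quick way to quote the Mesa version in a comment.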
Created attachment 141531 [details] glxinfo -B output as required by Denis
Hello Denis, I attached the glxinfo output as you requested. Today I got another crash while resuming after suspend. Here is the only output I got from dmesg:
[10081.255459] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10081.335164] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
gdm couldn't restore the session and I ended up in something like a login loop: I could see the login screen, but the session hung after I entered my password. I tried killing all open gdm sessions and restarting the service, but no luck. I got the same errors both times I restarted gdm. Then I rebooted and the problem was gone. As I mentioned in my previous comment, I started using optirun for the apps that were causing i915 to crash, such as Chrome, but nothing changed. Chrome was running on the dGPU before I suspended the device.
Here is another error regarding i915 after resuming from suspend: [31086.679530] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=15769 end=15770) time 646 us, min 1073, max 1079, scanline start 1056, end 1098
Today I got another hang with the following errors in dmesg:
[66562.914106] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=1242982 end=1242983) time 1329 us, min 1073, max 1079, scanline start 1033, end 0
[87748.971802] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
This time, fortunately, GNOME successfully restarted the session after 2 or 3 minutes of black screen, but all applications from the previous session were closed.
Created attachment 141971 [details] 4. Intel crash log read from /sys/class/drm/card0/error
Hi, I'm having the exact same issue on my laptop (Core i7 Sandybridge with HD3000), which otherwise runs fine and clean. It occurs almost every time I play a game like 0ad or supertuxkart. Everything is fine until the whole desktop freezes for 30 seconds while music is still playing. Several seconds later, everything is back to normal. Sometimes the game just crashes, but the rest of the desktop always ends up showing as if nothing happened. I tried to reproduce it with other intensive apps, with no success. My dmesg shows the same messages as yours. There's also an Intel crash log (see attachments).
Created attachment 142224 [details] Error in dmesg
Created attachment 142225 [details] Crash log found at /sys/class/drm/card0/error
Created attachment 142226 [details] glxinfo -B
Hi. How long do you play the game before the hang occurs? I played "supertuxkart" for about 1 hour and got nothing. I used the same Mesa and kernel versions as you (Mesa 18.2.3) on an SNB CPU (Intel Core i5-2520M, Intel® HD Graphics 3000). Any additional information would be helpful. Also, if the hang reproduces reliably on your PC, could you make an apitrace? https://github.com/apitrace/apitrace
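A minimal sketch of how such a trace could be captured with the apitrace command-line tool. The wrapper name `capture_trace` and the trace file name are hypothetical; the `apitrace trace` and `apitrace replay` subcommands are the tool's own.

```shell
# Record the GL calls of a program into a trace file.
# $1: output trace file; remaining arguments: the program and its options.
capture_trace() {
    ct_file="$1"
    shift
    apitrace trace --output="$ct_file" "$@"
}

# Typical use:
#   capture_trace supertuxkart.trace supertuxkart
#   apitrace replay supertuxkart.trace
```

Traces of a full game session can get very large, so it may be worth compressing the file or uploading it elsewhere instead of attaching it directly.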
Hi Denis, Thanks for your message. Well, the GPU sometimes hangs a few seconds or a few minutes after I launch supertuxkart. It happens faster if some other application that needs the GPU is running, for example SMPlayer (mpv). While the computer is frozen, I can still hear SMPlayer playing. There is no overheating (I fully cleaned my laptop last summer and even changed the thermal paste). The CPU tops out at 65ºC under load and sits at 35-43ºC when idle. I tried a few things: disabling IOMMU and other stuff in the kernel parameters, recompiling a custom kernel for Arch, creating a new user profile... Nothing helped. I tried the latest Debian testing: same result. I'm starting to suspect VA-API or some GL/DRM related stuff, so I'll keep on testing, but I'm a bit lost. By the way, I always use KDE Plasma. Unfortunately, I have reinstalled the whole computer and now use Gentoo. But the issue also occurred when I tried it before trying Arch. So I'll post the apitrace output when it happens in Gentoo (the kernel and Mesa versions are different). Cheers, Chris
(In reply to Denis from comment #16) Hi Denis, I ran a few tests today in Gentoo. Supertuxkart still crashes with the same kind of messages, although a bit different. I have attached the apitrace dump, dmesg messages and the error in /sys/class/drm/card0. Thanks. Chris
Created attachment 142304 [details] Crash log found at /sys/class/drm/card0/error (31/10/2018)
Created attachment 142305 [details] Supertuxkart i915/i965 Apitrace dump
Created attachment 142306 [details] Error in dmesg (31/10/2018)
I have just realized that the supertuxkart apitrace is 700MB big... I have uploaded the trace to my Google Drive at : https://drive.google.com/file/d/18sVMRL7VpWvh8-1KzBHODLR-iEfIFPS_/view?usp=sharing
Comment on attachment 142305 [details] Supertuxkart i915/i965 Apitrace dump This is only the first page of the supertuxkart apitrace dump. Full apitrace downloadable here: https://drive.google.com/file/d/18sVMRL7VpWvh8-1KzBHODLR-iEfIFPS_/view?usp=sharing
Thank you, I will check. I forgot to mention that apitraces can be very large.
OK, it looks like I reproduced the issue with the provided apitrace, but I have no idea how exactly. I mean, the issue is not straightforward and not stable. I launched a browser with OpenGL rendering in it (one of the available demos) and then ran the provided trace, maybe 3 to 5 times. According to dmesg, I got 1 hang:
[ 7991.390603] powercap intel-rapl:0: package locked by BIOS, monitoring only
[ 8214.539084] perf: interrupt took too long (3142 > 3137), lowering kernel.perf_event_max_sample_rate to 63500
[ 8360.400165] [drm] GPU HANG: ecode 6:0:0x85fffffc, in glretrace [20413], reason: hang on rcs0, action: reset
[ 8360.400167] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 8360.400168] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 8360.400168] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 8360.400169] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 8360.400170] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 8360.400214] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 8456.495729] intel_powerclamp: Start idle injection to reduce power
Investigating.
I also re-checked the existing bugs for SNB, and it looks like the following bug has the same root cause: https://bugs.freedesktop.org/show_bug.cgi?id=102379 At least, the ecode is the same as ours.
(In reply to Denis from comment #26) Thanks Denis, if there is something else that you need me to report, please tell me. Chris
BTW, Chris, I checked the logs one more time, and I think your crash is different from the topic starter's. You have SNB, he has KBL, and the error codes are also different. So I think a separate bug report needs to be created.
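To compare hang signatures between reports like this, the ecode can be pulled out of the dmesg text. Below is a minimal sketch; the helper name `extract_ecodes` is hypothetical and it only assumes the "GPU HANG: ecode A:B:0xHASH" layout seen in the logs of this thread.

```shell
# Print the ecode (e.g. 9:0:0x87f9fff9) of every "GPU HANG" line read
# from stdin; all other lines are ignored.
extract_ecodes() {
    sed -n 's/.*GPU HANG: ecode \([0-9]*\):\([0-9a-fA-F]*\):\(0x[0-9a-fA-F]*\).*/\1:\2:\3/p'
}

# Typical use, counting how often each signature appears:
#   dmesg | extract_ecodes | sort | uniq -c
```

In the logs posted here, the KBL reports show 9:0:0x87f9fff9 while the SNB replay shows 6:0:0x85fffffc, so the leading field appears to track the hardware generation, though that reading is an assumption based only on this thread.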
My GPU freezes after opening some page with many images (20+) in Chromium. /sys/class/drm/card0/error attached
mesa: 19.0.1-1ubuntu1
mike ~$ uname -a
Linux delorean 5.0.0-7-generic #8-Ubuntu SMP Mon Mar 4 16:27:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
mike ~$ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) Ironlake Mobile (0x46)
    Version: 19.0.1
    Accelerated: yes
    Video memory: 1536MB
    Unified memory: yes
    Preferred profile: compat (0x2)
    Max core profile version: 0.0
    Max compat profile version: 2.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 2.0
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ironlake Mobile
OpenGL version string: 2.1 Mesa 19.0.1
OpenGL shading language version string: 1.20
OpenGL ES profile version string: OpenGL ES 2.0 Mesa 19.0.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 1.0.16
mike ~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=19.04
DISTRIB_CODENAME=disco
DISTRIB_DESCRIPTION="Ubuntu Disco Dingo (development branch)"
mike ~$ dmesg | tail
[288446.586620] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288454.590794] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288462.586720] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288470.586669] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288478.586768] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288486.586821] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288496.570617] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288506.586681] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288514.586677] i915 0000:00:02.0: Resetting chip for hang on rcs0
[288522.586699] i915 0000:00:02.0: Resetting chip for hang on rcs0
Created attachment 143860 [details] /sys/class/drm/card0/error from Mike
Mike, please create a separate issue, because your case (steps) and HW/SW are quite different from the current problem.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1757.