Created attachment 145249
/sys/class/drm/card0/error

[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
More Information:

Hardware: Thinkpad X220
OS: Arch Linux
Window manager: Sway (Wayland + XWayland)

Additional information: The CPU was near 97 degrees C and at 100% usage the whole time, because I was playing games on the Spring RTS engine.

Here is more dmesg output:

[13980.504547] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[13980.504548] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
[13980.504550] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[13980.504553] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[13980.504571] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[13980.504572] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[13980.505517] mce: CPU0: Core temperature/speed normal
[13980.505518] mce: CPU1: Core temperature/speed normal
[13980.505520] mce: CPU2: Package temperature/speed normal
[13980.505521] mce: CPU3: Package temperature/speed normal
[13980.505522] mce: CPU1: Package temperature/speed normal
[13980.505523] mce: CPU0: Package temperature/speed normal
[14280.506447] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 34472)
[14280.506448] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 34472)
[14280.506460] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 34472)
[14280.506461] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 34472)
[14324.705510] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 2838)
[14324.705511] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 2838)
[14324.707529] mce: CPU2: Core temperature/speed normal
[14324.707530] mce: CPU3: Core temperature/speed normal
[14580.502402] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 100030)
[14580.502404] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 100030)
[14580.502420] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 100030)
[14580.502422] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 100030)
[14591.465253] i915 0000:00:02.0: GPU HANG: ecode 6:1:0xfffffffe, in spring-main [10578], hang on rcs0
[14591.465258] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[14591.465259] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[14591.465260] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[14591.465260] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[14591.465262] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[14591.466173] i915 0000:00:02.0: Resetting chip for hang on rcs0
[14880.500328] mce: CPU1: Package temperature/speed normal
[14880.500329] mce: CPU0: Package temperature/speed normal
[14880.500342] mce: CPU2: Package temperature/speed normal
[14880.500344] mce: CPU3: Package temperature/speed normal
[14988.455723] i915 0000:00:02.0: Resetting chip for hang on rcs0
[15180.454085] i915 0000:00:02.0: Resetting chip for hang on rcs0
[15180.498204] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 213447)
[15180.498206] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 213447)
[15180.498208] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 213447)
[15180.498210] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 213447)
[15180.532334] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 81692)
[15180.532336] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 81692)
[15180.533319] mce: CPU1: Core temperature/speed normal
[15180.533323] mce: CPU0: Core temperature/speed normal
[15480.496118] mce: CPU1: Package temperature/speed normal
[15480.496120] mce: CPU0: Package temperature/speed normal
[15480.496144] mce: CPU3: Package temperature/speed normal
[15480.496146] mce: CPU2: Package temperature/speed normal
[15480.529257] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 125082)
[15480.529264] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 125082)
[15480.530242] mce: CPU1: Core temperature/speed normal
[15480.530243] mce: CPU0: Core temperature/speed normal
[15588.558999] i915 0000:00:02.0: Resetting chip for hang on rcs0
[15590.478834] Asynchronous wait on fence i915:sway[1328]:163300 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
[15596.452130] i915 0000:00:02.0: Resetting chip for hang on rcs0
[15604.558768] i915 0000:00:02.0: Resetting chip for hang on rcs0
Hi Christian, which game was it, actually? Also, please provide the Mesa version and the reproducibility: did it happen only once, or can you reproduce it reliably?
Was the system using any swap at the time? There have been a couple of Sandy Bridge bugs that correlate with swapping, hence checking if this might fit that pattern.
No, there was no swapping involved. There was enough RAM, and I have neither a swap partition nor a swap file.

The game is called `spring1944`, but it also happens with 0ad, a game unrelated to Spring. Mesa version is 19.1.5-1.

Basically, it happens whenever I get high CPU load together with graphical applications. My guess is that the CPU throttles down and the GPU somehow gets confused by this. No idea if this is related.

It happens every time during a game. But it's difficult to reproduce, because the screen freezes are not logged; only when the game ultimately crashes due to a GPU hang does anything show up in dmesg. I tried capturing a GPU backtrace with Spring long ago, but it generates over 5 GB of data, and if only screen freezes occur (and no full game crash), nothing is logged about it.

I guess this is also related to:
https://bugs.freedesktop.org/show_bug.cgi?id=102379
and
https://bugs.freedesktop.org/show_bug.cgi?id=110971

Sorry, I just realized that I have opened so many bug reports for the same thing. Feel free to close two of them, but as you can see, the problem has existed since 2017. I just forgot that I had already reported this. Sorry.
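For anyone who wants to check the throttling guess against their own logs, here is a rough sketch (not an official i915 tool) that scans saved dmesg text and counts how many "cpu clock throttled" messages were printed shortly before each GPU hang or chip reset. The function names `parse_dmesg` and `throttles_before_hangs` and the 60-second window are my own assumptions for illustration. Note that the package throttle messages in the log above show up only about every 300 seconds, so the count can understate the real throttling; the "total events" counter in each message is probably the better indicator.

```python
import re

# Matches the default "[seconds.microseconds] message" dmesg line format.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def parse_dmesg(text):
    """Return (timestamp, message) pairs for every line with a dmesg timestamp."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def throttles_before_hangs(events, window=60.0):
    """For each i915 hang/reset line, count throttle messages in the
    preceding `window` seconds (the window length is an arbitrary choice)."""
    throttle_times = [t for t, msg in events if "cpu clock throttled" in msg]
    report = []
    for t, msg in events:
        if "GPU HANG" in msg or "Resetting chip for hang" in msg:
            recent = [x for x in throttle_times if t - window <= x <= t]
            report.append((t, len(recent)))
    return report


if __name__ == "__main__":
    # A few lines copied from the dmesg output in this report.
    sample = """\
[14580.502402] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 100030)
[14591.465253] i915 0000:00:02.0: GPU HANG: ecode 6:1:0xfffffffe, in spring-main [10578], hang on rcs0
[14591.466173] i915 0000:00:02.0: Resetting chip for hang on rcs0
[14988.455723] i915 0000:00:02.0: Resetting chip for hang on rcs0
"""
    for when, count in throttles_before_hangs(parse_dmesg(sample)):
        print(f"hang/reset at {when:.6f}s: {count} throttle message(s) in the previous 60 s")
```

To try it on a real log, save the output of `dmesg` to a file and pass the file contents to `parse_dmesg`. A correlation found this way would only be a hint, not proof of a common cause.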
Oh, yes, you already reported this issue:
https://bugs.freedesktop.org/show_bug.cgi?id=102379

> Basically, it happens whenever I get high CPU load together with graphical applications.
> My guess is that the CPU throttles down and the GPU somehow gets confused by this.
> No idea if this is related.

This is 100% related. When I tried to reproduce this issue with your apitrace or with a game, I got 1 or 2 hangs, exactly when I loaded the CPU/GPU as much as I could. But reproducibility is so bad that it is impossible to debug.

Also, I suspect all these issues have the same root cause (but I couldn't prove it):

BZ IDs 105288,105219,105116,104180,104044,103745,101822,101604,100396,100103,99864,93402,97271,107866,102379,106495,104822,93842

I suggest closing the current one as a duplicate of the ticket mentioned above.
(In reply to Denis from comment #5)
> Oh, yes, you already reported this issue:
> https://bugs.freedesktop.org/show_bug.cgi?id=102379
>
> > Basically, it happens whenever I get high CPU load together with graphical applications.
> > My guess is that the CPU throttles down and the GPU somehow gets confused by this.
> > No idea if this is related.
>
> This is 100% related. When I tried to reproduce this issue with your
> apitrace or with a game, I got 1 or 2 hangs, exactly when I loaded the CPU/GPU
> as much as I could. But reproducibility is so bad that it is impossible to
> debug.
>
> Also, I suspect all these issues have the same root cause (but I couldn't
> prove it):
>
> BZ IDs
> 105288,105219,105116,104180,104044,103745,101822,101604,100396,100103,99864,
> 93402,97271,107866,102379,106495,104822,93842
>
> I suggest closing the current one as a duplicate of the ticket mentioned above.

Denis,
Thanks for your assessment. Resolved as duplicate.

*** This bug has been marked as a duplicate of bug 102379 ***