Bug 108510 - [snb] quakespasm triggers a reproducible GPU hang on Sandy Bridge
Summary: [snb] quakespasm triggers a reproducible GPU hang on Sandy Bridge
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Depends on:
Reported: 2018-10-22 01:50 UTC by bugs
Modified: 2019-09-25 19:14 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:

Attachments:
4.18.16.a-1-hardened-dmesg.txt (139.89 KB, text/plain), 2018-10-22 01:53 UTC, bugs
4.18.16.a-1-hardened-error.txt.bz2 (22.59 KB, application/octet-stream), 2018-10-22 01:55 UTC, bugs
4.18.16.a-1-hardened-journalctl.txt (242.90 KB, text/plain), 2018-10-22 01:57 UTC, bugs
4.19.0-1035f22af3e97-dmesg.txt (220.26 KB, text/plain), 2018-10-22 01:58 UTC, bugs
4.19.0-1035f22af3e97-error.txt.bz2 (20.49 KB, application/octet-stream), 2018-10-22 02:00 UTC, bugs
4.19.0-1035f22af3e97-journalctl.txt (238.45 KB, text/plain), 2018-10-22 02:01 UTC, bugs
grey-triangles-1.jpg (311.38 KB, image/jpeg), 2018-10-22 02:02 UTC, bugs
grey-triangles-2.jpg (608.90 KB, image/jpeg), 2018-10-22 02:03 UTC, bugs
grey-triangles-3.jpg (622.65 KB, image/jpeg), 2018-10-22 02:04 UTC, bugs
unfilled-triangles.jpg (428.87 KB, image/jpeg), 2018-10-22 02:10 UTC, bugs
denys_artifacts (1.40 MB, image/png), 2018-10-24 15:45 UTC, Denis

Description bugs 2018-10-22 01:50:56 UTC
Bug description:

On my machine, playing quakespasm for long enough always results in a GPU hang. If using a current stable kernel (4.18.16), the game always appears to crash and exit within approximately 30 seconds of the hang, during which time the system is completely unresponsive and the rendered frame visibly changes only two or three times, with the audio buffer repeating itself. If using drm-tip, the kernel always seems to be able to recover from the hang within approximately 10 seconds, after which the game continues as normal but with some rendering artifacts. The artifacts manifest as:

* certain triangles consistently being filled with a shade of grey instead of a texture
* certain triangles being effectively transparent, rendering the scenery behind them visible

Once it has happened, the exact triangles that are affected remain the same, and the artifacts can thus be demonstrated by moving the player to a particular position and/or using the mouse to carefully orient the player's view. In other words, the artifacts are only visible from specific vantage points.

Sometimes, the hang is preceded by a precipitous frame-rate drop (judging by eye, from about 60 fps to under 30 fps) that can last for minutes before the hang occurs. I have found this to happen more often with 4.18.16 than with drm-tip. Occasionally, albeit very rarely, the hang occurs almost immediately after starting a game.

After drm-tip recovers from the hang, it is possible that another one occurs before I choose to exit the game.

System environment:

-- chipset: Intel 6 Series / Sandy Bridge
-- system architecture: 64-bit
-- xf86-video-intel: 1:2.99.917+847+g25c9a2fc-1
-- xserver: 1.20.2-1
-- mesa: 18.2.2-1
-- libdrm: 2.4.96-1
-- kernel: 4.18.16.a-1-hardened
-- Linux distribution: Arch
-- Machine or mobo model: Lenovo Thinkpad X220 (i7-2640M)
-- Display connector: DisplayPort (via UltraBase)

Reproducing steps:

1) Launch quakespasm-0.93.1
2) Play until hang occurs

Additional info:

I am using Gnome 3.30 without Wayland.

I have attached the captured error state along with dmesg and journalctl output for one incident that occurred while running 4.18.16 and another that occurred while running drm-tip. The journalctl dumps include "quake.desktop" messages in addition to kernel messages, to provide a little more context. In this particular case, it took a relatively long time for the single hang to occur with drm-tip, but this is atypical; normally it happens quickly.

I have also attached some screenshots that show some of the rendering artifacts that appear when the game is able to continue running after the hang.

I mentioned that the observable effects of the hang can last up to 30 seconds or so while running a stable kernel. Interestingly, the few visible changes that occur during this time always seem to follow a common pattern:

1) a new frame is rendered normally
2) a new frame is rendered abnormally, with numerous triangles unfilled or only filled with a grey colour
3) a new frame is rendered, with the quakespasm (tilde) console being inexplicably visible
4) the game crashes to desktop

The laptop was connected to an UltraBase docking station, with a DisplayPort link to a 1920x1200 panel. However, the bug also manifests while undocked and running at the native resolution of 1366x768.

Sometimes, the laptop is subject to CPU throttling. As for the frame-rate drops that sometimes precede the hang, I am not convinced that throttling is a significant factor. My reasoning is that quakespasm is a relatively undemanding engine, with almost no bells or whistles beyond the original (ancient) GLQuake engine, and the machine can run somewhat more demanding games without comparable performance problems.
Comment 1 bugs 2018-10-22 01:53:39 UTC
Created attachment 142126
Comment 2 bugs 2018-10-22 01:55:22 UTC
Created attachment 142127

/sys/class/drm/card0/error capture under 4.18.16.
Comment 3 bugs 2018-10-22 01:57:02 UTC
Created attachment 142128
Comment 4 bugs 2018-10-22 01:58:21 UTC
Created attachment 142129
Comment 5 bugs 2018-10-22 02:00:00 UTC
Created attachment 142130

/sys/class/drm/card0/error capture under drm-tip.
Comment 6 bugs 2018-10-22 02:01:28 UTC
Created attachment 142131
Comment 7 bugs 2018-10-22 02:02:43 UTC
Created attachment 142132

Example of post-hang artifacts.
Comment 8 bugs 2018-10-22 02:03:26 UTC
Created attachment 142133

Example of post-hang artifacts.
Comment 9 bugs 2018-10-22 02:04:13 UTC
Created attachment 142134

Example of post-hang artifacts.
Comment 10 bugs 2018-10-22 02:10:10 UTC
Created attachment 142135

Example of post-hang artifacts.
Comment 11 Denis 2018-10-23 11:41:52 UTC
Hi, thanks for the report. I tried to reproduce the issue on SNB with Mesa 18.2.2 (built from source) and kernel 4.18.16.
OS: Ubuntu 18.10
Screen: the laptop's built-in display (maximum in-game resolution, fullscreen, vsync enabled).

I played for about 1h30m in total, with a few relaunches, and for me everything worked perfectly. I will try more later, but it would be great if you could capture an apitrace of the game with a hang:
Comment 12 Denis 2018-10-24 14:59:55 UTC
Short update: I connected my monitor via DisplayPort.
What I found: a lot of random hangs unrelated to the game (mostly in the desktop environment).
The closest thing I got during crashes/freezes in the game was this:

>Shutting down SDL sound
>X Error of failed request:  BadValue (integer parameter out of range for operation)
>  Major opcode of failed request:  153 (XFree86-VidModeExtension)
>  Minor opcode of failed request:  10 (XF86VidModeSwitchToMode)
>  Value in failed request:  0x2c00014
>  Serial number of failed request:  780
>  Current serial number in output stream:  782

In windowed mode everything works fine. In fullscreen mode these problems occurred, although after some time it started working fine even in fullscreen.

So, could you try windowed mode and check the game?
Comment 13 bugs 2018-10-24 15:39:13 UTC
Hi Denis. I did some quick testing, having disabled the fullscreen mode in the game. Performance was notably reduced, presumably due to no longer being unredirected, but no hangs occurred. After subsequently re-enabling the fullscreen mode, a hang occurred within a minute of play.

I was surprised to find that the artifacts I previously mentioned became visible as soon as I switched off the full-screen mode. I shall carry out some more in-depth testing this weekend, including the use of the apitrace tool that you kindly mentioned.

For what it's worth, I don't have any apparent issues with DisplayPort in my operating environment, as long as I avoid Wayland. Docking and undocking work, other unredirected/fullscreen applications seem to work reliably, Gnome's compositor seems to behave, etc.
Comment 14 Denis 2018-10-24 15:44:35 UTC
Oh! I forgot to mention that during those freezes and hangs, after experimenting with the screen resolution, I saw these artifacts on the screen (DisplayPort connection). I never saw them on the built-in display (see attachment).
Comment 15 Denis 2018-10-24 15:45:50 UTC
Created attachment 142171
Comment 16 Denis 2018-11-16 12:25:44 UTC
Hi. Do you have any news on this issue? Were you able to get an apitrace of the hang?
Comment 17 GitLab Migration User 2019-09-25 19:14:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1765.
