Bug 111100 - i915 GPU hang with kwin_x11
Summary: i915 GPU hang with kwin_x11
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-10 14:08 UTC by S.
Modified: 2019-07-16 23:08 UTC
CC List: 2 users

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (31.55 KB, text/plain)
2019-07-10 14:08 UTC, S.
Opening the popup calendar widget in Plasma (277.82 KB, image/png)
2019-07-10 14:09 UTC, S.
A popup notification (Firefox browser in background) (21.19 KB, image/png)
2019-07-10 14:09 UTC, S.

Description S. 2019-07-10 14:08:16 UTC
Created attachment 144749
/sys/class/drm/card0/error

Hi there, I'm not sure whether you want duplicate reports, because I suspect I'm experiencing bug #111014.

This major bug started with one of the openSUSE Tumbleweed (x86-64) updates within the past couple of weeks. It's hard to say what the root cause is, because several components were updated: the kernel (currently 5.1.16), Plasma shell and KWin (currently 5.16.2), KDE Frameworks (currently 5.59.0), and Mesa / Mesa-dri (currently 19.1.1-223.1).

Very frequently (several times an hour) the Plasma panel clock stops updating, the App Menu and Calendar popups stop rendering (only their outline is drawn), and notification popups appear black with a copy of the panel underneath. Restarting `plasmashell` fixes it for a while, but the glitching starts again later. It's not a complete system freeze; it mainly affects the panel and notifications. Applications keep working, as does Alt+Tab switching, but clicking on running applications in the panel does nothing. Files on my ~/Desktop also don't update after I add or remove them while the bug is present. KRunner does continue to work, which lets me kill and restart `plasmashell`.

This is on a ThinkPad T530 laptop with integrated Intel graphics, running the X modesetting driver. I also tried switching to the Xorg Intel driver, but it behaves the same. I tried switching the KWin compositor from X11 rendering to OpenGL, but it made no difference.

I suspect it has something to do with system load, especially memory pressure, since yesterday was the first day after the major updates that I ran my Windows VM in VirtualBox. I use zRAM and sometimes hit the memory pretty hard. The glitch seemed to start after periods of high RAM usage, and it appeared even after I had freed up plenty of RAM, but I'm not 100% certain.

I switched back to an older kernel, 5.1.7, which I still have installed, and I haven't experienced this bug since. That suggests the bug started when I upgraded from kernel 5.1.7 to 5.1.15 (I skipped some Tumbleweed updates in between).

This is the `dmesg` error:
----------------------
[Tue Jul  9 14:03:12 2019] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in kwin_x11 [1478], hang on rcs0
[Tue Jul  9 14:03:12 2019] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Tue Jul  9 14:03:12 2019] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Tue Jul  9 14:03:12 2019] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Tue Jul  9 14:03:12 2019] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Tue Jul  9 14:03:12 2019] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Tue Jul  9 14:03:12 2019] i915 0000:00:02.0: Resetting chip for hang on rcs0
Comment 1 S. 2019-07-10 14:09:07 UTC
Created attachment 144750
Opening the popup calendar widget in Plasma
Comment 2 S. 2019-07-10 14:09:28 UTC
Created attachment 144751
A popup notification (Firefox browser in background)
Comment 3 Chris Wilson 2019-07-10 17:22:37 UTC
The fix for '014 (bug #111014) will be in v5.1.17, so it's best to wait for that before delving in too deep.
Comment 4 S. 2019-07-12 15:26:47 UTC
OK. Sorry, I can't test any unreleased kernels for Tumbleweed (TW), since I need VirtualBox for my work, and in any case this bug only seems to happen when I hit the RAM really hard with VirtualBox plus other X11 apps.

I just ran into this bug again, but unfortunately it's even more confusing now. I'm currently booted into the 5.1.7-1 kernel, and there is absolutely nothing from today in `dmesg`. Instead, it shows "i915 0000:00:02.0: Resetting chip for hang on rcs0" several times from *yesterday*, even though I definitely didn't have any trouble with Plasma/KWin yesterday.
Comment 5 S. 2019-07-15 22:43:04 UTC
I tried disabling zRAM and using a regular swap file instead. It didn't make any difference, and it kept glitching badly throughout the workday.

So I just found a repository with VirtualBox KMP packages that apparently syncs with the latest Kernel Stable repository:
- https://download.opensuse.org/repositories/Kernel:/stable:/KMP/standard
- https://download.opensuse.org/repositories/Kernel:/stable/standard/
I'm now booted into that kernel, and we'll see how it behaves throughout the workday tomorrow.
Comment 6 S. 2019-07-16 23:08:11 UTC
OK, after a full day of heavy work with kernel 5.2.1-2, I haven't had a single problem with Plasma, so I think I can say with confidence that this problem is fixed. Thanks to everyone for their help.
