Bug 104892 - [gen4] GPU hang in KDE plasmashell [openSUSE 42.3]
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-31 23:09 UTC by Juha Vuori
Modified: 2018-06-21 18:57 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
system log (3.98 KB, text/plain)
2018-01-31 23:09 UTC, Juha Vuori
GPU crash dump (853.23 KB, text/plain)
2018-01-31 23:10 UTC, Juha Vuori
Console log of the linux system for 7th-Feb covering GPU hang time (56.90 KB, text/plain)
2018-02-07 19:21 UTC, Juha Vuori
GPU crash dump of the linux system for 7th-Feb GPU hang time (697.16 KB, text/plain)
2018-02-07 19:22 UTC, Juha Vuori

Description Juha Vuori 2018-01-31 23:09:00 UTC
Created attachment 137097 [details]
system log

Over the last few months, graphical KDE sessions on my old HP server running openSUSE 42.3 have started to crash occasionally because of a GPU problem.
The crash seems to be triggered more often when a graphically rich program, such as Firefox, is opened (by rich I mean one with many visible graphical objects, such as menus, buttons and icons).
The GUI screen freezes completely: the mouse pointer does not move and keystrokes do not get through. Then some kind of recovery operation starts: the screen goes black for a couple of seconds and then restarts. Sometimes the restarted screen is fully functional again; sometimes it is still frozen and the restart repeats again and again at intervals of around 1 minute.
Sometimes the screen never recovers, but today it became functional again after the restarts, and I was able to capture some debug data, which is attached:
- HP.messages.20180201.txt contains /var/log/messages data covering the time window of the problem
- HP.20180201.sys.class.drm.card0.error contains /sys/class/drm/card0/error (GPU crash dump)
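
For anyone wanting to capture the same crash dump right after a hang, the copy step can be sketched roughly as below. This is only an illustration: the function name `save_gpu_error_state` and the output file naming are hypothetical, the only path taken from this report is /sys/class/drm/card0/error, and reading it is assumed to need root privileges on most systems.

```python
import time
from pathlib import Path


def save_gpu_error_state(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the GPU error state to a timestamped file.

    Returns the path of the copy, or None if the source is missing
    or cannot be read (for example when not running as root).
    """
    src_path = Path(src)
    if not src_path.is_file():
        return None
    try:
        # Read the whole file in one go and write it out again;
        # sysfs entries do not report a meaningful file size.
        data = src_path.read_bytes()
    except OSError:
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"gpu-error-{stamp}.txt"  # hypothetical naming
    dest.write_bytes(data)
    return dest
```

Calling `save_gpu_error_state()` with no arguments uses the default sysfs path; both arguments can be overridden, which also makes the function easy to try against an ordinary file.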
Some info of the hardware/software:
The machine is an HP dc5700 Microtower, so it has a pretty old and modest graphics card.
One thing I changed around the time the problem started was that I added an extension cable to the VGA connection (PC to monitor), but I am quite sure the problem is not related to that.
SW info:

HP:~ # uname -a
Linux HP 4.4.104-39-default #1 SMP Thu Jan 4 08:11:03 UTC 2018 (7db1912) x86_64 x86_64 x86_64 GNU/Linux

Can you find any root cause for the problem by analysing the data?
Comment 1 Juha Vuori 2018-01-31 23:10:03 UTC
Created attachment 137098 [details]
GPU crash dump
Comment 2 Kenneth Graunke 2018-02-01 08:15:17 UTC
What version of Mesa are you using?  Can you try the latest?

GPU crash dumps generated on Kernel < 4.13 aren't quite as useful for debugging, unfortunately.  I'm not seeing anything suspect in the one you provided at a quick glance.
Comment 3 Juha Vuori 2018-02-01 08:48:40 UTC
Thanks for the prompt reply!
Mesa v. 17.0.5-176.1, latest delivered within the distro. In detail:
Information for package Mesa:
-----------------------------
Repository     : Main Update Repository
Name           : Mesa
Version        : 17.0.5-176.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 16.6 MiB
Installed      : Yes (automatically)
Status         : up-to-date
Source package : Mesa-17.0.5-176.1.src
Summary        : System for rendering interactive 3-D graphics

If I read correctly, the latest is v. 17.3.3, from 13th Jan 2018.
If you think it might include relevant fixes for the problem, I could try it. It will take some time, as I am a bit rusty nowadays at going "outside of the distro", so a pointer to the best "How to manually deploy a Mesa release on a Linux system" guide would be useful.
Comment 4 Juha Vuori 2018-02-04 17:24:05 UTC
I upgraded my openSUSE 42.3 system with all the updates included in the Xorg build repo
http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_Leap_42.3/
delivered by openSUSE, which upgraded Mesa from 17.0.5-176.1 to 17.1.6-720.1.

The first two KDE GUI working sessions worked flawlessly.
However, I don't think it proves that the problem is completely corrected because it has been happening so sporadically (maybe ~30% of KDE sessions have suffered from the problem). I can report later after using the system a bit more if the problem seems to have disappeared.

Thanks,
Juha
Comment 5 Juha Vuori 2018-02-07 19:21:48 UTC
Created attachment 137219 [details]
Console log of the linux system for 7th-Feb covering GPU hang time
Comment 6 Juha Vuori 2018-02-07 19:22:37 UTC
Created attachment 137220 [details]
GPU crash dump of the linux system for 7th-Feb GPU hang time
Comment 7 Juha Vuori 2018-02-07 19:27:46 UTC
After a couple of days of clean sessions, the GUI hang recurred today, 7th Feb, at 19:40:50, even with the updated Mesa and the other Xorg upgrades.
This time it seems to have originated in the Xorg process:
GPU HANG: ecode 4:0:0xfde5fafd, in Xorg ...
whereas the previous hang happened in plasmashell.
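
To compare hang messages like this one across logs, the line can be split into its fields with a small parser. The sketch below is not authoritative: it assumes that the three colon-separated ecode values are the GPU generation, an engine/ring identifier and a hash-like error code, which is how i915 kernels of this era appear to format the message, and the names `GpuHang` and `parse_gpu_hang` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "GPU HANG: ecode <gen>:<ring>:0x<code>, in <process> ..."
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>-?\d+):(?P<code>0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>\S+))?"
)


class GpuHang(NamedTuple):
    gen: int                # assumed: GPU generation
    ring: int               # assumed: engine/ring identifier
    code: int               # assumed: hash-like error code
    process: Optional[str]  # process named after ", in ", if present


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields of a GPU HANG log line, or None if absent."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m["gen"]),
        ring=int(m["ring"]),
        code=int(m["code"], 16),
        process=m["process"],
    )


hang = parse_gpu_hang("GPU HANG: ecode 4:0:0xfde5fafd, in Xorg ...")
print(hang)
```

Under that assumption, the leading 4 in the line above would agree with the [gen4] tag in the bug title, and the process field makes it easy to see whether a hang was attributed to Xorg or to plasmashell.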
Can any new conclusions be drawn from the messages or the new GPU crash dump?
Meanwhile, I plan to upgrade Mesa further to the newest version available.
Thanks, Juha
Comment 8 Juha Vuori 2018-06-21 17:20:59 UTC
After the distro version upgrade from openSUSE 42.3 to 15.0, the problem has not occurred any more. Nothing else in the system changed apart from the OS upgrade.
Mesa version is now 18.0.2.

