Bug 101746 - radeonsi: Kernel syscall lockup caused probably by GPU crash
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: 17.1
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2017-07-10 19:10 UTC by Tobias Auerochs
Modified: 2018-04-03 05:42 UTC

Attachments
dmesg with lockup warning at the end (87.26 KB, text/plain)
2017-07-10 19:10 UTC, Tobias Auerochs

Description Tobias Auerochs 2017-07-10 19:10:05 UTC
Created attachment 132600
dmesg with lockup warning at the end

An amdgpu syscall, called by plasmashell, appears to deadlock at random and freezes X.org completely. Several graphics processes, plasmashell and X.org are left stuck in D state (uninterruptible sleep). Everything else continues to operate correctly, including audio, networking, etc.
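
For anyone trying to confirm the same symptom, the D-state tasks can be listed straight from procfs. The following is a minimal sketch, not something taken from this report: the helper names `parse_stat` and `d_state_tasks` are made up for illustration, and it only relies on the documented layout of /proc/<pid>/task/<tid>/stat, where the state letter follows the parenthesised command name.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of a procfs stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, right.split()[0]


def d_state_tasks(proc_root="/proc"):
    """List (pid, tid, comm) for every thread currently in state 'D'."""
    found = []
    for pid in sorted(filter(str.isdigit, os.listdir(proc_root)), key=int):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = sorted(os.listdir(task_dir), key=int)
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or the line was unreadable
            if state == "D":
                found.append((int(pid), int(tid), comm))
    return found
```

Calling `d_state_tasks()` from an SSH session during a freeze and printing the result should show whether plasmashell and Xorg threads are the ones blocked. A task that is only briefly in D state (ordinary disk I/O, for example) will also show up, so it is worth sampling more than once.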

The issue seems to appear more frequently whilst running games, although I am unable to find any particular pattern to it.

Running Arch Linux with a custom-compiled linux-zen kernel and Mesa 17.1.4 on a Radeon RX 480. The issue affects all 17.1.x releases so far, but I am not certain how long it has been around. It is far too rare for me to bisect the exact cause.

Once I get another freeze, I will see if I can get any userspace thread dumps from plasmashell and possibly other processes, as well as any logs containing something interesting. (This may take anywhere from a few days to weeks due to the random nature of the lockup.)

(Also, just to clarify, this is not directly related to https://bugs.freedesktop.org/show_bug.cgi?id=101294; this occurred both before and after that issue was fixed.)

Originally reported against the kernel, as this manifests itself there first: https://bugzilla.kernel.org/show_bug.cgi?id=196291
Comment 1 Christian König 2017-07-10 19:20:45 UTC
For reference, checking on the fence status it looks like one of the SDMA engines is hung.
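
One possible way to look at fence status from userspace is the amdgpu debugfs fence dump. This is a rough sketch under assumptions, not necessarily how the check above was done: it assumes the dump is readable (as root) at /sys/kernel/debug/dri/0/amdgpu_fence_info and that each ring is printed as a "--- ring N (name) ---" header followed by "Last signaled fence 0x..." and "Last emitted 0x..." lines. The function names are hypothetical. A ring is only flagged when it has unsignaled work and its signaled counter did not move between two snapshots.

```python
import re

# Assumed line formats of the amdgpu_fence_info debugfs file.
RING_RE = re.compile(r"^--- ring (\d+) \((.+)\) ---$")
SIGNALED_RE = re.compile(r"^Last signaled fence\s+(0x[0-9a-fA-F]+)")
EMITTED_RE = re.compile(r"^Last emitted\s+(0x[0-9a-fA-F]+)")


def parse_fence_info(text):
    """Map ring name -> (last_signaled, last_emitted) from one snapshot."""
    rings = {}
    name = None
    signaled = None
    for raw in text.splitlines():
        line = raw.strip()
        m = RING_RE.match(line)
        if m:
            name, signaled = m.group(2), None
            continue
        if name is None:
            continue
        m = SIGNALED_RE.match(line)
        if m:
            signaled = int(m.group(1), 16)
            continue
        m = EMITTED_RE.match(line)
        if m and signaled is not None:
            rings[name] = (signaled, int(m.group(1), 16))
    return rings


def stalled_rings(first_snapshot, second_snapshot):
    """Rings with pending fences whose signaled counter made no progress."""
    before = parse_fence_info(first_snapshot)
    after = parse_fence_info(second_snapshot)
    stalled = []
    for ring, (signaled, emitted) in after.items():
        if ring in before and signaled != emitted and before[ring][0] == signaled:
            stalled.append(ring)
    return stalled
```

Reading the file twice a few seconds apart and passing both texts to `stalled_rings` would list rings such as sdma0 or sdma1 if they stopped signaling. A pending fence in a single snapshot is normal on a busy GPU, which is why two snapshots are compared.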
Comment 2 Tobias Auerochs 2017-07-14 00:47:03 UTC
Sadly, I could not find any useful information in the various process logs; in particular, Xorg.0.log contains nothing error-related. The only somewhat interesting thing is that some processes that ought to use OpenGL do not end up stuck in D state; it appears that only plasmashell and Xorg get stuck. (Obviously, graphical output no longer works for any other process either, and virtual terminals can no longer be switched with Xorg dead.)

If anyone knows of anything specific that would be useful for debugging, I can try it and see if it helps.
Comment 3 Tobias Auerochs 2017-09-22 01:24:31 UTC
I have not encountered the random hang for a while now (specifically after turning off slideshow in plasmashell) and am unaware of any reliable way to reproduce this exact issue, so not entirely sure if this report being left open is of much use.

However, I have recently also been getting a GPU crash when running Overwatch in Wine (using a patched version). As I said, I am not sure at this time whether this is even remotely related, besides both causing a GPU crash.
Comment 4 Timothy Arceri 2018-04-03 05:42:22 UTC
(In reply to Tobias Auerochs from comment #3)
> I have not encountered the random hang for a while now (specifically after
> turning off slideshow in plasmashell) and am unaware of any reliable way to
> reproduce this exact issue, so not entirely sure if this report being left
> open is of much use.
> 

In that case I'll close the bug. Thanks.

