Bug 111922 - amdgpu fails to resume on 5.2 kernel [regression]
Status: RESOLVED MOVED
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Assignee: Default DRI bug account
 
Reported: 2019-10-08 15:21 UTC by Pierre Ossman
Modified: 2019-11-19 09:57 UTC

Description Pierre Ossman 2019-10-08 15:21:24 UTC
The upgrade to the 5.2 series of the kernel has unfortunately made my system unusable. After resuming from suspend, the display goes green and the machine is unresponsive locally. It is still up, though, so I was able to reach it over the network and get some logs out of it.

Tested bad kernels:

5.2.15-200.fc30
5.2.9-200.fc30

Known good kernels:

5.0.17-300.fc30
5.1.18-300.fc30
5.1.20-300.fc30
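
The two lists above bound the regression. As a rough sketch (not part of the original report), the Python below parses the Fedora release strings and picks out the newest known-good and oldest known-bad kernels. The helper name `kernel_key` is made up for illustration, and the comparison only looks at the upstream x.y.z part, ignoring the Fedora build suffix. Stable point releases are not a single linear history, so this narrows the window rather than pinpointing a commit.

```python
import re


def kernel_key(release: str) -> tuple:
    """Turn a release such as '5.2.15-200.fc30' into (5, 2, 15) for comparison.

    Hypothetical helper: only the upstream x.y.z prefix is considered.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


# Kernel lists copied from the report.
bad = ["5.2.15-200.fc30", "5.2.9-200.fc30"]
good = ["5.0.17-300.fc30", "5.1.18-300.fc30", "5.1.20-300.fc30"]

# The regression was introduced somewhere between these two releases.
newest_good = max(good, key=kernel_key)
oldest_bad = min(bad, key=kernel_key)

print(f"newest good: {newest_good}")
print(f"oldest bad:  {oldest_bad}")
```

With the kernels listed here, every good release is in the 5.0/5.1 series and every bad one is in 5.2, which is consistent with a change that landed during 5.2 development.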

I found some other bug reports, both here and at Red Hat, with similar warnings in dmesg, but those seem to fail right away rather than after a suspend. So I'm not sure if it's the same issue.

Bug report at Red Hat:

https://bugzilla.redhat.com/show_bug.cgi?id=1754252

(includes dmesg)
Comment 1 Alex Deucher 2019-10-08 18:57:09 UTC
Can you bisect?
Comment 2 Pierre Ossman 2019-10-09 05:15:09 UTC
Not easily unfortunately as I've only been using Fedora kernels, so I don't have a build environment set up.
Comment 3 Pierre Ossman 2019-11-05 17:38:40 UTC
Issue still remains with 5.4.0-rc6 unfortunately. :/

Do you have any patches or commits I could try reverting? It's much easier building a test RPM here.

It should be something during the 5.2.0 merge window. Anything likely from that set?
Comment 4 Alex Deucher 2019-11-05 19:55:42 UTC
Nothing comes to mind.
Comment 5 Pierre Ossman 2019-11-08 13:49:15 UTC
That's a shame.

I did find bug 111811, which looks very similar. Through that I found this patch:

https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg40304.html

Unfortunately it does not solve the issue here. :/


Have you checked whether you can reproduce this on a 2200G on your end? Or on other Raven Ridge APUs?
Comment 6 Pierre Ossman 2019-11-08 14:01:45 UTC
Hmmm... I did get this from that patch though:

> [   98.391016] amdgpu 0000:38:00.0: GPU mode1 reset
> [   98.391072] [drm] psp mode 1 reset not supported now! 
> [   98.391074] amdgpu 0000:38:00.0: GPU mode1 reset failed
> [   98.391151] amdgpu 0000:38:00.0: GPU mode1 reset
> [   98.391198] [drm] psp mode 1 reset not supported now! 
> [   98.391199] amdgpu 0000:38:00.0: GPU mode1 reset failed
> [   98.391358] [drm:amdgpu_device_suspend [amdgpu]] *ERROR* amdgpu asic reset failed

Not sure if it helps.
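
For anyone collecting the same evidence from a longer dmesg capture, here is a minimal sketch of filtering a log for reset-related failures like the ones quoted above. The function name `find_reset_failures` and the list of hint strings are assumptions chosen to match these particular messages. They are not an exhaustive or official list of amdgpu error strings.

```python
import re

# Matches kernel log lines such as
# "[   98.391074] amdgpu 0000:38:00.0: GPU mode1 reset failed"
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*\S)\s*$")

# Assumed substrings, picked from the messages quoted in this comment.
FAILURE_HINTS = ("reset failed", "not supported", "*ERROR*")


def find_reset_failures(dmesg_text: str) -> list:
    """Return (timestamp, message) pairs for lines that look like failures."""
    hits = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(hint in match.group("msg") for hint in FAILURE_HINTS):
            hits.append((float(match.group("ts")), match.group("msg")))
    return hits


# Sample lines copied from the log in this comment.
SAMPLE = """\
[   98.391016] amdgpu 0000:38:00.0: GPU mode1 reset
[   98.391072] [drm] psp mode 1 reset not supported now!
[   98.391074] amdgpu 0000:38:00.0: GPU mode1 reset failed
[   98.391358] [drm:amdgpu_device_suspend [amdgpu]] *ERROR* amdgpu asic reset failed
"""

failures = find_reset_failures(SAMPLE)
for timestamp, message in failures:
    print(f"{timestamp:.6f}  {message}")
```

The plain "GPU mode1 reset" line is skipped on purpose, since it only announces the attempt. The lines that follow it report the failure.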
Comment 7 Pierre Ossman 2019-11-10 12:17:50 UTC
I finally got a build environment set up, and the winner is:

> df8368be1382b442384507a5147c89978cd60702 is the first bad commit
> commit df8368be1382b442384507a5147c89978cd60702
> Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Date:   Wed Feb 27 12:56:36 2019 -0500
> 
>     drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
>     
>     To help xf86-video-amdgpu and mesa know DC supports updating the
>     tiling attributes for a framebuffer per-flip.
>     
>     Cc: Michel Dänzer <michel@daenzer.net>
>     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>     Acked-by: Alex Deucher <alexander.deucher@amd.com>
>     Reviewed-by: Marek Olšák <marek.olsak@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> :040000 040000 06a7975c484e74ebdaa4ccf9ee1dc5dac7a0abc9 ab68acde511d49b3f96818066bba35f255ce1656 M	drivers

Which seems extremely odd given the contents of that commit. But I guess it makes userspace change behaviour in a way that provokes the bug?

I don't think bisect will get me further. Help?
Comment 8 Alex Deucher 2019-11-11 17:44:34 UTC
(In reply to Pierre Ossman from comment #7)
Userspace only enables per-flip tiling updates if the version of the kernel driver is new enough to support that feature. Maybe this is related to the DCC changes in Mesa.
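
The version gate described here also explains how a commit that only bumps a version number can show up as the first bad commit: before the bump, userspace never takes the new code path. A minimal sketch of that kind of check follows. It is not the actual xf86-video-amdgpu or Mesa code. The names `DriverVersion` and `supports_per_flip_tiling` are hypothetical, and the 3.31 threshold is an assumption that should be checked against `KMS_DRIVER_MINOR` in the bisected commit.

```python
from typing import NamedTuple


class DriverVersion(NamedTuple):
    """Kernel driver version as reported to userspace (hypothetical type)."""
    major: int
    minor: int


# ASSUMPTION: the minimum version advertising per-flip tiling updates.
# Verify against the bisected commit before relying on this value.
PER_FLIP_TILING_MIN = DriverVersion(3, 31)


def supports_per_flip_tiling(version: DriverVersion,
                             minimum: DriverVersion = PER_FLIP_TILING_MIN) -> bool:
    """Tuples compare lexicographically, so (major, minor) ordering works."""
    return version >= minimum


# A kernel from just before the bump reports the older minor version,
# so userspace keeps using the old behaviour and the bug stays hidden.
before_bump = DriverVersion(3, 30)
after_bump = DriverVersion(3, 31)

print(supports_per_flip_tiling(before_bump))
print(supports_per_flip_tiling(after_bump))
```

If this is what is happening, the underlying bug would be in whatever code the newly enabled path exercises, not in the version bump itself.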
Comment 9 Marek Olšák 2019-11-11 19:22:39 UTC
Userspace doesn't know when suspend/resume is happening, so it can't hang on suspend/resume. My guess is it's something in DAL.
Comment 10 Martin Peres 2019-11-19 09:57:20 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/931.