Bug 107880

Summary: Regression: System fails to boot on raven ridge 4.18 vs 4.19 rc
Product: DRI Reporter: Marvin Damschen <marvin.damschen>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED INVALID QA Contact:
Severity: blocker    
Priority: medium    
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments (description, flags):
Successful boot with 4.18.7 on Raven Ridge (flags: none)
Modprobe amdgpu freezes video output with 4.19-rc3 (flags: none)
4.19-rc3 with amdgpu.ip_block_mask=0xff (flags: none)

Description Marvin Damschen 2018-09-10 08:28:17 UTC
Created attachment 141500 [details]
Successful boot with 4.18.7 on Raven Ridge

System boots fine with kernel 4.18 on a Raven Ridge (AMD Ryzen 5 2500U, Lenovo E485, latest firmware from linux-firmware.git), but boot fails with kernel 4.19 (tested rc2 and rc3). System hangs after "fb: switching to amdgpudrmfb from EFI VGA".
I am unable to obtain any logs of the crash (LUKS encryption might be the reason?). I will attach a log of a working boot with 4.18.7; please let me know how to provide more information.

Thank you
Marvin
Comment 1 Michel Dänzer 2018-09-10 09:13:25 UTC
(In reply to Marvin Damschen from comment #0)
> System hangs after "fb: switching to amdgpudrmfb from EFI VGA".

How long have you waited for? E.g. if a microcode file is missing, the attempt to load it can hang for one or several minutes before timing out.


> I am unable to obtain any logs of the crash (LUKS encryption might be the
> reason?).

One possibility is to prevent the driver from loading by passing

 modprobe.blacklist=amdgpu

on the kernel command line, then you can try manually loading it with

 sudo modprobe amdgpu

and should get the full dmesg output.
Comment 2 Marvin Damschen 2018-09-10 10:51:21 UTC
(In reply to Michel Dänzer from comment #1)
> (In reply to Marvin Damschen from comment #0)
> > System hangs after "fb: switching to amdgpudrmfb from EFI VGA".
> 
> How long have you waited for? E.g. if a microcode file is missing, the
> attempt to load it can hang for one or several minutes before timing out.
Waited for ~5min now, but nothing changed.

> > I am unable to obtain any logs of the crash (LUKS encryption might be the
> > reason?).
> 
> One possibility is to prevent the driver from loading by passing
> 
>  modprobe.blacklist=amdgpu
> 
> on the kernel command line, then you can try manually loading it with
> 
>  sudo modprobe amdgpu
> 
> and should get the full dmesg output.
Thank you, this worked. Full output is attached, but appears fine. Still, the video output freezes.
Comment 3 Marvin Damschen 2018-09-10 10:52:21 UTC
Created attachment 141502 [details]
Modprobe amdgpu freezes video output with 4.19-rc3
Comment 4 Michel Dänzer 2018-09-10 10:59:24 UTC
Looks like there may be an issue with the VCN microcode loading. Does

 amdgpu.ip_block_mask=0xff

on the kernel command line avoid the problem?

Can you bisect?
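
For readers unfamiliar with the parameter: amdgpu.ip_block_mask is a bitmask in which bit i enables the IP block at index i, so 0xff keeps indices 0 to 7 and turns off everything above. The following is only a minimal sketch of that arithmetic. The block count of 9 is a hypothetical value chosen for illustration; the actual index of each block (including VCN) depends on the ASIC and kernel version and is not stated in this report.

```python
def enabled_ip_blocks(mask, num_blocks):
    """Return the indices of IP blocks whose bit is set in the mask."""
    return [i for i in range(num_blocks) if mask & (1 << i)]


def disabled_ip_blocks(mask, num_blocks):
    """Return the indices of IP blocks whose bit is clear in the mask."""
    return [i for i in range(num_blocks) if not (mask & (1 << i))]


if __name__ == "__main__":
    # Hypothetical device with 9 IP blocks (indices 0..8); an assumption,
    # not a value taken from the bug report.
    print(enabled_ip_blocks(0xff, 9))   # [0, 1, 2, 3, 4, 5, 6, 7]
    print(disabled_ip_blocks(0xff, 9))  # [8]
```

If the block at the first disabled index is the one that fails to initialize, masking it off would be expected to let the rest of the driver come up, which is what the question above is probing.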
Comment 5 Marvin Damschen 2018-09-10 11:12:21 UTC
(In reply to Michel Dänzer from comment #4)
> Looks like there may be an issue with the VCN microcode loading. Does
> 
>  amdgpu.ip_block_mask=0xff
> 
> on the kernel command line avoid the problem?
It does! dmesg contains a lot of call traces though (I will attach).

> Can you bisect?
I can, but I will probably only find time toward the end of the week.

Thank you
Marvin
Comment 6 Marvin Damschen 2018-09-10 11:14:27 UTC
Created attachment 141503 [details]
4.19-rc3 with amdgpu.ip_block_mask=0xff
Comment 7 jamesz@amd.com 2018-09-10 17:36:16 UTC
The dmesg output shows: "Found VCN firmware Version: 1.24 Family ID: 18".
That firmware is really old.
Please update to the latest VCN firmware for Raven (1.73):
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=bd63ccb9d9dd926aa7072f416363dd8f529b5fca
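
The check described here can be automated when going through several logs. This is a rough sketch that assumes the message has exactly the form quoted in this comment (two integer version fields followed by a family ID); other kernel versions may print it differently. The function name and the minimum-version tuple are illustrative, not part of any existing tool.

```python
import re

# Pattern for the message as quoted in this comment; the exact wording in
# other kernel versions is an assumption.
VCN_FW_RE = re.compile(
    r"Found VCN firmware Version:\s*(\d+)\.(\d+)\s+Family ID:\s*(\d+)"
)


def parse_vcn_firmware(dmesg_text):
    """Return ((major, minor), family_id), or None if no match is found."""
    match = VCN_FW_RE.search(dmesg_text)
    if match is None:
        return None
    major, minor, family = (int(group) for group in match.groups())
    return (major, minor), family


if __name__ == "__main__":
    line = "Found VCN firmware Version: 1.24 Family ID: 18"
    version, family = parse_vcn_firmware(line)
    print(version, family)    # (1, 24) 18
    # Tuples compare field by field, so 1.24 sorts before 1.73.
    print(version < (1, 73))  # True
```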
Comment 8 Marvin Damschen 2018-09-10 20:00:06 UTC
I overlooked that the Ubuntu kernel builds bring their own firmware for each version. In my case, I had the recent firmware in /lib/firmware/amdgpu, but /lib/firmware/4.19.0-041900rc3-generic/amdgpu/ was actually used. I moved the files accordingly and 4.19-rc3 boots perfectly fine now without any extra parameters.
Thank you and sorry for the trouble.
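
The behaviour described above matches the kernel's default firmware search order, in which a directory named after the kernel release is consulted before the generic /lib/firmware directory. The sketch below mimics that order to show which copy of a file would win. It ignores a custom firmware_class.path and any compressed-firmware handling, and the file name amdgpu/raven_vcn.bin is an assumption that is not quoted anywhere in this report.

```python
import os
import platform


def firmware_candidates(name, release, root="/lib/firmware"):
    """List candidate paths in the kernel's default search order."""
    return [
        os.path.join(root, "updates", release, name),
        os.path.join(root, "updates", name),
        os.path.join(root, release, name),
        os.path.join(root, name),
    ]


def resolve_firmware(name, release, root="/lib/firmware"):
    """Return the first existing candidate path, or None if none exists."""
    for path in firmware_candidates(name, release, root):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed file name for the Raven VCN firmware; adjust as needed.
    print(resolve_firmware("amdgpu/raven_vcn.bin", platform.release()))
```

Running this against the booted kernel's release string shows whether a release-specific directory shadows the files in /lib/firmware/amdgpu.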
Comment 9 jamesz@amd.com 2018-09-11 13:23:40 UTC
Hi Marvin,

That is great! I want to check with you where this old VCN firmware came from.

Did you install an old AMD ROCm package on this system before?

Best Regards!
James Zhu
Comment 10 Marvin Damschen 2018-09-11 15:12:35 UTC
(In reply to jamesz@amd.com from comment #9)

> Did you install an old AMD ROCm package on this system before?
Yes, I did. I now believe that removing all traces of rocm-dkms eliminated the root of the problem.

Best regards
Marvin
Comment 11 jamesz@amd.com 2018-09-12 14:33:03 UTC
I think the latest ROCm package has fixed this issue. You are welcome to try it.

Best Regards!
James Zhu
Comment 12 Marvin Damschen 2018-09-14 07:18:38 UTC
rocm-dkms (http://repo.radeon.com/rocm/apt/debian/pool/main/r/rock-dkms/rock-dkms_1.8-199_all.deb) currently still contains the old firmware. I will open an issue on ROCm's GitHub repo.

Best regards
Marvin
