Bug 105883

Summary: booting with kernel using amd-staging-drm-next on 2400G hangs
Product: DRI Reporter: Joshua Lee <joshua613>
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact:
Severity: blocker    
Priority: medium CC: f.pinamartins, joshua613
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=105760

Description Joshua Lee 2018-04-04 11:36:09 UTC
I am running an AMD Ryzen 2400G, using its integrated graphics on the Linux host and a GTX 1070 bound to VFIO for virtualized passthrough. When I boot a 4.16-based kernel built from the amd-staging-drm-next git branch, the boot process halts, usually somewhere around the point where it checks UTMP. The system is not fully locked up: the keyboard lights still respond and Ctrl+Alt+Delete still works, but nothing is displayed on the screen. My system is set up to boot to the command line, and once boot completes I usually start X11 from there with "startx".
Comment 1 Alex Deucher 2018-04-04 17:02:26 UTC
Can you attach your kernel log or dmesg output from the boot?  Do other kernels work?
Comment 2 Joshua Lee 2018-04-04 17:17:59 UTC
Where is my kernel log located? The 4.16 mainline kernel works fine; it's the amd-staging-drm-next kernel that causes problems. I am also using Mesa from git, though I doubt it ever gets to the point of using 3D graphics before the driver fails.
Comment 3 Joshua Lee 2018-04-04 17:19:20 UTC
How do I find a prior boot's dmesg?
Comment 4 Edward Kigwana 2018-04-05 05:19:36 UTC
Try

options amdgpu dpm=0 dc=1

and see if it still locks up.
Comment 5 Joshua Lee 2018-04-05 08:01:45 UTC
(In reply to Edward Kigwana from comment #4)
> Try 
> 
> options amdgpu dpm=0 dc=1 and see if it still locks up.

That's in /etc/modconf.d or the like, right?
Comment 6 Harry Wentland 2018-04-05 20:15:28 UTC
On Ubuntu, the kernel log is appended to /var/log/kern.log, but the location may differ on other distros.

If you have the luxury of a second system, you might be able to ssh into the Ryzen system and run dmesg that way.

As for the options Edward mentioned, you can pass them on the kernel command line. If you use GRUB as your bootloader, press 'e' on the selected kernel entry and append " amdgpu.dpm=0 amdgpu.dc=1" to the end of the line that starts with "linux". Alternatively, append them to GRUB_CMDLINE_LINUX in /etc/default/grub and run "sudo update-grub".

Keep in mind that this is how I'd do it on Ubuntu. There might be a way to pass these through /etc/modprobe.d as well.
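For the persistent GRUB route, the edit to /etc/default/grub can be sketched in Python. This is a minimal sketch, assuming the file uses a plain GRUB_CMDLINE_LINUX="..." line; the helper name add_kernel_params is hypothetical. It only rewrites text, so the result still has to be written back and the GRUB config regenerated: "sudo update-grub" on Ubuntu, or "grub-mkconfig -o /boot/grub/grub.cfg" on distributions that do not ship the update-grub wrapper.

```python
import re

# Matches only GRUB_CMDLINE_LINUX="...", not GRUB_CMDLINE_LINUX_DEFAULT.
CMDLINE_RE = re.compile(r'^GRUB_CMDLINE_LINUX="(.*)"$', re.MULTILINE)


def add_kernel_params(grub_defaults: str, params: list[str]) -> str:
    """Hypothetical helper: append kernel parameters to GRUB_CMDLINE_LINUX.

    Works on the text of /etc/default/grub; it does not touch any file.
    """

    def _append(match: re.Match) -> str:
        existing = match.group(1).split()
        # Skip parameters already present so repeated runs change nothing.
        merged = existing + [p for p in params if p not in existing]
        return f'GRUB_CMDLINE_LINUX="{" ".join(merged)}"'

    new_text, count = CMDLINE_RE.subn(_append, grub_defaults, count=1)
    if count == 0:
        # No GRUB_CMDLINE_LINUX line yet: add one at the end of the file.
        new_text = (grub_defaults.rstrip("\n")
                    + f'\nGRUB_CMDLINE_LINUX="{" ".join(params)}"\n')
    return new_text
```

Passing the contents of /etc/default/grub together with ["amdgpu.dpm=0", "amdgpu.dc=1"] returns the same text with both parameters appended to GRUB_CMDLINE_LINUX. Removing them again for the mainline kernel would still be a manual edit.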
Comment 7 Joshua Lee 2018-04-08 02:59:24 UTC
I'm not sure where the kernel log is on Arch. When I add the options you recommended to my kernel command line, both my amd-staging-drm-next kernel and my 4.16 mainline kernel fail. If I remove them, the 4.16 kernel works again, but the amd-staging-drm-next kernel still fails to drive the screen. (The kernel doesn't crash: the keyboard still works and I can even press Ctrl+Alt+Delete to reboot, so I suspect the amdgpu driver just isn't driving my screen.)
Comment 8 Michel Dänzer 2018-04-09 08:10:04 UTC
Is CONFIG_DRM_AMD_DC_DCN1_0 enabled in the kernel build configuration in both cases?
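One way to answer this for a running kernel is to read its build configuration. This is a sketch under assumptions: /proc/config.gz exists only if the kernel was built with CONFIG_IKCONFIG_PROC, the /boot/config-<release> fallback follows a common distribution convention, and for a self-built kernel the .config in the build tree can be passed to config_value directly. Both function names are hypothetical.

```python
import gzip
import os
import platform
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the option's value ('y', 'm', ...) or None if it is not set."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Both "# CONFIG_FOO is not set" and a missing line mean disabled.
    return None


def read_running_config() -> str:
    """Read the running kernel's config (assumed locations, see lead-in)."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open(f"/boot/config-{platform.release()}") as f:
        return f.read()
```

Calling config_value(read_running_config(), "CONFIG_DRM_AMD_DC_DCN1_0") on each kernel would show whether the option is "y" in both builds.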
Comment 9 taijian 2018-04-16 13:01:32 UTC
This is possibly the same bug described in bug #105760.
Comment 10 taijian 2018-04-16 13:02:33 UTC
And yes, Arch has had CONFIG_DRM_AMD_DC_DCN1_0=y since 4.15.
Comment 11 taijian 2018-04-16 13:04:09 UTC
(In reply to Joshua Lee from comment #3)
> How do I find an prior boot's dmesg?

Try "journalctl -b -1" (for the boot attempt directly prior to this one, -2 for the one before that, etc...).
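The same lookup can be scripted. This sketch assumes journald keeps a persistent journal (otherwise earlier boots are not stored at all) and adds -k so only kernel messages, the equivalent of dmesg, are returned; --boot=-1 is the long form of "-b -1". The function names are hypothetical.

```python
import subprocess


def previous_boot_kernel_log_cmd(boots_back: int = 1) -> list[str]:
    """Build a journalctl command for the kernel log of an earlier boot."""
    if boots_back < 1:
        raise ValueError("boots_back must be >= 1 (0 is the current boot)")
    # -k: kernel messages only; --boot=-N: N boots before the current one.
    return ["journalctl", "-k", f"--boot=-{boots_back}", "--no-pager"]


def previous_boot_kernel_log(boots_back: int = 1) -> str:
    """Run journalctl and return its output; raises if journalctl fails."""
    result = subprocess.run(previous_boot_kernel_log_cmd(boots_back),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

After a hung boot and a Ctrl+Alt+Delete reboot, previous_boot_kernel_log(1) would return the failed boot's kernel log, which is what Comment 1 asked for.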
Comment 12 Joshua Lee 2018-04-22 14:55:49 UTC
Someone on the /r/VFIO Discord with a Ryzen APU (he usually starts his VM from the console rather than running a graphical host) confirmed the instability: running FurMark crashed his GPU driver within ten minutes, and his dmesg showed it as well.

13877  0.1  0.0      0     0 pts/1    ZNl+ 10:09   0:00 [GpuTest] <defunct>
[90972.383503] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=36081, last emitted seq=36083
[90972.383512] [drm] IP block:psp is hung!
[90972.383514] [drm] GPU recovery disabled.
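To spot the same failure in other logs, the timeout line can be pulled apart with a regular expression. The pattern below is an assumption fitted to the message quoted above, not a format guaranteed by amdgpu. Reading "last signaled" as the last fence the GPU completed and "last emitted" as the last one the driver submitted is also an interpretation; under it, seq=36081 versus seq=36083 means two gfx jobs never finished. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class RingTimeout(NamedTuple):
    timestamp: float    # seconds since boot, from the dmesg prefix
    ring: str           # ring name, e.g. "gfx"
    last_signaled: int  # assumed: last fence sequence number completed
    last_emitted: int   # assumed: last fence sequence number submitted

    @property
    def outstanding(self) -> int:
        # Jobs submitted but not signalled when the timeout fired.
        return self.last_emitted - self.last_signaled


TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<sig>\d+), last emitted seq=(?P<emit>\d+)"
)


def parse_ring_timeouts(log: str) -> list[RingTimeout]:
    """Return every ring-timeout message found in a dmesg-style log."""
    return [
        RingTimeout(float(m["ts"]), m["ring"], int(m["sig"]), int(m["emit"]))
        for m in TIMEOUT_RE.finditer(log)
    ]
```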
Comment 13 Joshua Lee 2018-04-22 14:57:17 UTC
To be clear, the Furmark was being run in his host system, not within a VM.
Comment 14 Martin Peres 2019-11-19 08:34:45 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/342.
