Bug 96243

Summary: GPU initialization fails when running in VM
Product: DRI
Reporter: hiwatari.seiji
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description                              Flags
  dmesg output after modprobing amdgpu     none
  boot_log_successfull_initialization      none

Description hiwatari.seiji 2016-05-27 11:55:40 UTC
Created attachment 124126 [details]
dmesg output after modprobing amdgpu

Setup:
QEMU + VFIO running VM with Ubuntu 16.04 and the latest amdgpu driver (amdgpu-pro 16.20.3)

If the system is started normally, the kernel crashes during boot with errors that vary from boot to boot.

If the amdgpu module is blacklisted at boot (via grub.cfg) and loaded with modprobe afterwards, the error is always the same (see attached file):

sw_init 5 failed -12
amdgpu_init failed
memory type 2 has not been initialized
amdgpu probe failed with error -12
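
The -12 in these messages is a negated Linux errno value, and errno 12 is ENOMEM, so the driver is reporting a failed allocation during sw_init. That says nothing yet about why the allocation failed. A minimal sketch for decoding such return codes with Python's standard library (the helper name decode_kernel_errno is hypothetical, not part of any driver tooling):

```python
import errno
import os


def decode_kernel_errno(code):
    """Map a kernel-style return code (e.g. -12) to (symbolic name, message)."""
    num = abs(code)  # the kernel reports errors as negated errno values
    name = errno.errorcode.get(num, "UNKNOWN")
    return name, os.strerror(num)


# The code reported by "sw_init 5 failed -12" and "probe failed with error -12"
name, message = decode_kernel_errno(-12)
print(f"-12 -> {name}: {message}")
```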

Experiments:

- Using the amdgpu version shipped with Ubuntu 16.04, booting works ONCE. After rebooting or shutting down the VM, the GPU repeatedly fails to initialize again, probably due to Bonaire PCI reset issues. Restarting the host itself allows booting the VM once more; after that, the same error occurs.

- A Windows 10 VM does not suffer from those issues. Rebooting or shutting down this VM works without problems and without requiring a host reboot.
Comment 1 Alex Deucher 2016-12-31 06:16:50 UTC
Can you try a newer kernel?
Comment 2 hiwatari.seiji 2017-10-27 17:22:48 UTC
I did another round of testing and am quite pleased with the results.

New setup:
> # Host
> qemu: 2.10.0
> kernel: 4.13.5-gentoo

> # Guest:
> kernel: vmlinuz-4.13.6-1-default (OpenSUSE)
> cmdline: amdgpu.cik_support=1 modprobe.blacklist=radeon
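
The guest cmdline above enables amdgpu's support for CIK parts such as Bonaire and keeps the radeon module from loading. A minimal sketch for checking that a kernel command line carries both settings; the function names are hypothetical, and the string is the one quoted above (on a running guest it could be read from /proc/cmdline instead):

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into a {parameter: value} dict.

    Parameters without '=' (e.g. 'quiet') map to None.
    """
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def amdgpu_selected_for_cik(cmdline):
    """True if amdgpu.cik_support=1 is set and radeon is blacklisted."""
    params = parse_cmdline(cmdline)
    blacklist = (params.get("modprobe.blacklist") or "").split(",")
    return params.get("amdgpu.cik_support") == "1" and "radeon" in blacklist


guest_cmdline = "amdgpu.cik_support=1 modprobe.blacklist=radeon"
print(amdgpu_selected_for_cik(guest_cmdline))
```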

I did a couple of restarts and no host hangs occurred. The chance of a successful initialization of the graphics hardware within the guest is around 60%.

When it fails, the following line is always the last one visible:

> fb: switching to amdgpudrmfb from EFI VGA

Additionally, even when everything works, there are a couple of ring test errors:

> [drm:gfx_v7_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 2 test failed (scratch(0xC040)=0xCAFEDEAD)
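
To my understanding of the driver's ring test, the scratch register is seeded with 0xCAFEDEAD and the ring is then asked to overwrite it, so a readback of 0xCAFEDEAD suggests the ring never executed the write. A minimal sketch for pulling these failures out of a saved boot log; the regular expression is modeled only on the single line quoted above, so other kernel versions may format the message differently:

```python
import re

# Pattern modeled on the one log line quoted above (an assumption, not a
# format guaranteed by the driver).
RING_TEST_RE = re.compile(
    r"ring (?P<ring>\d+) test failed "
    r"\(scratch\((?P<reg>0x[0-9A-Fa-f]+)\)=(?P<value>0x[0-9A-Fa-f]+)\)"
)


def parse_ring_test_failures(log_text):
    """Return (ring index, scratch register, value read back) per failure."""
    return [
        (int(m["ring"]), int(m["reg"], 16), int(m["value"], 16))
        for m in RING_TEST_RE.finditer(log_text)
    ]


sample = (
    "[drm:gfx_v7_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: "
    "ring 2 test failed (scratch(0xC040)=0xCAFEDEAD)"
)
for ring, reg, value in parse_ring_test_failures(sample):
    print(f"ring {ring}: scratch {reg:#06x} read back {value:#010x}")
```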

I'll attach a new log file showing the boot log from a run where everything works.
Comment 3 hiwatari.seiji 2017-10-27 17:24:13 UTC
Created attachment 135114 [details]
boot_log_successfull_initialization
Comment 4 Martin Peres 2019-11-19 08:08:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/75.
