Bug 109456

Summary: KVM VFIO guest X hang with guest kernel > 4.15
Product: DRI
Reporter: libgradev
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: medium
CC: ckoenig.leichtzumerken, libgradev
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                            Flags
Git bisect log + result                none
Dmesg output from failed Xorg start    none
Xorg output from failed Xorg start     none

Description libgradev 2019-01-25 11:03:28 UTC
Hi

Host: Arch running kernel 4.20.3 / Qemu 3.1
Guest: Ubuntu 18.04.1 (tried Ubuntu 18.10 also) with any kernel after 4.15
Driver: Happens with current stable, git from Padoka PPA and amdgpu-pro 18.50

Issue: A Vega 64 passed through to the guest causes X to hang on boot, with the Xorg process using 100% CPU on one of the passed-through cores. X never starts; the stopping point is 'LoadModule: "dri2"'. I cannot see any relevant errors in Xorg.log or the KRB, though it's easily reproducible. The system is still alive and can be ssh'd into.

Adding the nomodeset kernel option allows the guest to boot to GUI (albeit without acceleration).

Details: With Qemu 3.0, assigning more than ~12GB RAM to the guest causes this behaviour. With Qemu 3.1 the amount of RAM is irrelevant: the hang always occurs.

I believe this is related to one of the GPU reset patches added to the 4.16 kernel, as the guest boots fine with Qemu 3.0/3.1 and guest kernel 4.15 (tested up to 16GB guest RAM).
Comment 1 Alex Deucher 2019-01-25 15:32:59 UTC
Can you bisect?  Please attach your dmesg output and xorg log.
Comment 2 libgradev 2019-04-18 14:12:42 UTC
Created attachment 144034
Git bisect log + result
Comment 3 libgradev 2019-04-23 10:07:48 UTC
Created attachment 144077
Dmesg output from failed Xorg start
Comment 4 libgradev 2019-04-23 10:11:23 UTC
Created attachment 144078
Xorg output from failed Xorg start
Comment 5 libgradev 2019-04-23 10:13:53 UTC
Have compiled 5.0.7 with the commit indicated by the bisect reverted and it boots to Xorg fine.

Running an OpenGL application subsequently will hang the VM though.
Comment 6 libgradev 2019-06-19 11:46:13 UTC
Updated to QEMU 4.0.0 and re-tested - same result.

Let me know if you would like any further info.

Thanks!
Comment 7 Alex Williamson 2019-06-19 12:47:07 UTC
Can you test this QEMU patch that's already in qemu.git for 4.1:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=3412d8ec9810b819f8b79e8e0c6b87217c876e32

Alternatively, setting pci-hole64-size=0 can also avoid this issue:

 -global i440FX-pcihost.pci-hole64-size=0

or

 -global q35-host.pci-hole64-size=0

depending on your VM machine type.
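
As a rough sketch of choosing between the two options above, the helper below prints the matching -global argument for a given machine type name. The function name pci_hole64_opt is hypothetical, and the pattern matching assumes machine type names such as "pc", "pc-i440fx-3.1", "q35" or "pc-q35-3.1"; adjust it if your setup uses different names.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): print the -global option from
# the comment above that matches the VM machine type given as $1.
pci_hole64_opt() {
    case "$1" in
        # q35 patterns come first, because "pc-q35-*" also starts with "pc"
        q35*|pc-q35*) echo "-global q35-host.pci-hole64-size=0" ;;
        # assumption: any other "pc*" name is an i440FX machine type
        pc*)          echo "-global i440FX-pcihost.pci-hole64-size=0" ;;
        *)            echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}

# Example: print the option to append to a q35 guest's QEMU command line
pci_hole64_opt q35
```

The printed string would be appended to the existing QEMU command line for the guest; nothing else about the invocation needs to change for this workaround.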
Comment 8 libgradev 2019-07-29 14:16:35 UTC
Got back to this again recently and can confirm it's fixed in Qemu git (for 4.1) now.

Many thanks :)
