Bug 109456 - KVM VFIO guest X hang with guest kernel > 4.15
Summary: KVM VFIO guest X hang with guest kernel > 4.15
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2019-01-25 11:03 UTC by libgradev
Modified: 2019-07-29 14:17 UTC



Attachments
Git bisect log + result (3.51 KB, text/plain)
2019-04-18 14:12 UTC, libgradev
Dmesg output from failed Xorg start (70.75 KB, text/plain)
2019-04-23 10:07 UTC, libgradev
Xorg output from failed Xorg start (5.32 KB, text/x-log)
2019-04-23 10:11 UTC, libgradev

Description libgradev 2019-01-25 11:03:28 UTC
Hi

Host: Arch running kernel 4.20.3 / Qemu 3.1
Guest: Ubuntu 18.04.1 (tried Ubuntu 18.10 also) with any kernel after 4.15
Driver: Happens with current stable, git from Padoka PPA and amdgpu-pro 18.50

Issue: A Vega 64 passed through to the guest causes X to hang on boot, with the Xorg process using 100% of one of the passed-through CPU cores. X never starts; the stopping point is 'LoadModule: "dri2"'. I cannot see any relevant errors in Xorg.log or the kernel ring buffer (dmesg), though the hang is easily reproducible. The system is still alive and can be reached over ssh.

Adding the nomodeset kernel option allows the guest to boot to GUI (albeit without acceleration).

Details: With Qemu 3.0, assigning more than ~12GB of RAM to the guest causes this behaviour. With Qemu 3.1 the amount of RAM is irrelevant: the hang always occurs.

I believe this is related to one of the GPU reset patches added in the 4.16 kernel, as the guest boots fine with Qemu 3.0/3.1 and guest kernel 4.15 (tested up to 16GB guest RAM).
Comment 1 Alex Deucher 2019-01-25 15:32:59 UTC
Can you bisect?  Please attach your dmesg output and xorg log.
Comment 2 libgradev 2019-04-18 14:12:42 UTC
Created attachment 144034
Git bisect log + result
Comment 3 libgradev 2019-04-23 10:07:48 UTC
Created attachment 144077
Dmesg output from failed Xorg start
Comment 4 libgradev 2019-04-23 10:11:23 UTC
Created attachment 144078
Xorg output from failed Xorg start
Comment 5 libgradev 2019-04-23 10:13:53 UTC
I have compiled kernel 5.0.7 with the commit indicated by the bisect reverted, and it boots to Xorg fine.

Running an OpenGL application afterwards still hangs the VM, though.
Comment 6 libgradev 2019-06-19 11:46:13 UTC
Updated to QEMU 4.0.0 and re-tested - same result.

Let me know if you would like any further info.

Thanks!
Comment 7 Alex Williamson 2019-06-19 12:47:07 UTC
Can you test this QEMU patch that's already in qemu.git for 4.1:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=3412d8ec9810b819f8b79e8e0c6b87217c876e32

Alternatively, setting pci-hole64-size=0 can also avoid this issue:

 -global i440FX-pcihost.pci-hole64-size=0

or

 -global q35-host.pci-hole64-size=0

depending on your VM machine type.
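
For anyone scripting the workaround, here is a minimal sketch of choosing between the two -global options based on the machine type. The helper name pci_hole64_opt and the pattern matching on machine type strings are assumptions made for illustration and are not part of QEMU; only the two -global options themselves come from the comment above.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): print the -global option that sets
# pci-hole64-size=0 for a given machine type string, e.g. "q35" or "pc-i440fx-3.1".
pci_hole64_opt() {
    case "$1" in
        # Match q35 first, because versioned q35 names look like "pc-q35-3.1".
        *q35*) echo "-global q35-host.pci-hole64-size=0" ;;
        pc|pc-*|*i440fx*) echo "-global i440FX-pcihost.pci-hole64-size=0" ;;
        *) echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}
```

The output of, say, pci_hole64_opt q35 can then be appended to an existing QEMU command line for a q35 guest. The helper only builds the option string and does not check that the running QEMU version accepts it.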
Comment 8 libgradev 2019-07-29 14:16:35 UTC
Got back to this again recently and can confirm it's fixed in Qemu git (for 4.1) now.

Many thanks :)

