Created attachment 133720 [details]
iGPU crash dump (/sys/class/drm/card0/error)

Overview:
The iGPU hangs semi-randomly in a KVM virtual machine running Debian 9. An Intel HD 530 is passed through to the guest at 00:02.0, and a GTX 960M (an Optimus device with no outputs of its own) is passed through at 00:04.0 with the appropriate ROM file (which nouveau loads successfully), in the default PRIME setup.

Steps to reproduce:
The hang is mostly random, but executing this command usually does the trick:
1. Run vblank_mode=0 DRI_PRIME=1 glxgears in the aforementioned environment.

Actual results:
The iGPU hangs. The screen becomes completely unresponsive and stays stuck on the image it was displaying at the moment of the hang, except for the cursor, which can sometimes still be moved around. The display sometimes recovers, sometimes stays frozen, and sometimes (this happened exactly once to me) goes black and then shows a static, garbled (seemingly random) image.

Expected results:
The iGPU should stay responsive at all times and provide an output for the Optimus-enabled dGPU.

Build Dates:
i915 1.6.0 20160919
nouveau 1.3.1 20120801
Debian 4.9.30-2+deb9u3 (2017-08-06)
X.Org X Server 1.19.2

Additional Information:

Output of lspci:
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:04.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

Output of xrandr --listproviders:
Provider 0: id: 0x75 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 3 associated providers: 0 name:modesetting
Provider 1: id: 0x3f cap: 0x5, Source Output, Source Offload crtcs: 0 outputs: 0 associated providers: 0 name:modesetting

Output of DRI_PRIME=0 glxinfo | grep "OpenGL":
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 13.0.6
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 13.0.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 13.0.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

Output of DRI_PRIME=1 glxinfo | grep "OpenGL":
OpenGL vendor string: nouveau
OpenGL renderer string: Gallium 0.4 on NV117
OpenGL core profile version string: 4.1 (Core Profile) Mesa 13.0.6
OpenGL core profile shading language version string: 4.10
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 13.0.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 13.0.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Output of dmesg after crash:
http://paste.debian.net/982675
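
For readers comparing the two providers in the xrandr --listproviders output above, the cap field is a bitmask. The sketch below decodes it, assuming the bit values from the RandR protocol headers (0x1 Source Output, 0x2 Sink Output, 0x4 Source Offload, 0x8 Sink Offload), which match the names xrandr printed in this report. The function name decode_caps is hypothetical and not part of any tool.

```shell
#!/bin/sh
# decode_caps MASK
# Hypothetical helper: turn a RandR provider capability bitmask (the "cap:"
# field of `xrandr --listproviders`) into the capability names.
# Assumed bit values: 0x1 Source Output, 0x2 Sink Output,
#                     0x4 Source Offload, 0x8 Sink Offload.
decode_caps() {
    mask=$(( $1 ))   # shell arithmetic accepts hex input such as 0xf
    out=""
    for entry in "1:Source Output" "2:Sink Output" \
                 "4:Source Offload" "8:Sink Offload"; do
        bit=${entry%%:*}    # text before the first colon
        name=${entry#*:}    # text after the first colon
        if [ $(( mask & bit )) -ne 0 ]; then
            # Append with a ", " separator once the list is non-empty.
            out="${out:+$out, }$name"
        fi
    done
    printf '%s\n' "${out:-None}"
}
```

Calling decode_caps 0x5 yields the capability list shown for Provider 1 (the dGPU), and decode_caps 0xf the list shown for Provider 0 (the iGPU): the dGPU can act as an offload source but has no sink capabilities and no outputs, so it depends on the iGPU for display.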
Created attachment 133721 [details] dmesg output after the bug is triggered
It didn't handle a context-switch interrupt and so the ELSP queue was drained -- the hardware was idle, even though we still thought it was processing work. Does this still happen on a recent kernel? There's a little more info in new error states that may help to debug this problem.
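
When retesting on a newer kernel, the error state can be saved right after a hang so it can be attached here. This is a minimal sketch, not an official procedure. It assumes the error state is exposed at /sys/class/drm/card0/error (the path used for the first attachment) and that an empty state contains the phrase "no error state collected"; the function name save_error_state and the output file name are hypothetical.

```shell
#!/bin/sh
# save_error_state [SRC] [DEST_DIR]
# Hypothetical helper: copy an i915 error state to a timestamped file.
# SRC defaults to the sysfs path used in this report; DEST_DIR to ".".
# Prints the path of the saved file. Returns 1 if SRC is unreadable or
# only reports that no error state was collected.
save_error_state() {
    src=${1:-/sys/class/drm/card0/error}
    dest_dir=${2:-.}

    [ -r "$src" ] || return 1

    # Assumption: with no hang recorded, the first line says
    # "no error state collected" (matched case-insensitively).
    if head -n 1 "$src" | grep -qi 'no error state collected'; then
        return 1
    fi

    dest="$dest_dir/i915-error-$(uname -r)-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$dest" && printf '%s\n' "$dest"
}
```

After triggering the hang (for example with vblank_mode=0 DRI_PRIME=1 glxgears), running save_error_state from a shell that can read the sysfs file (this may require root) leaves a copy tagged with the running kernel version.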
I just tried updating the linux-image package to 4.12.0. Now I get no output on my monitor at all, instead of the highly unstable output I had with 4.9. I can, however, confirm that the VM still boots up: I can still ssh into it from another device.
Created attachment 133727 [details] 4.12 guest dmesg
Oh, I just realized something: I don't get any freezing at all without the Optimus dGPU passed through (https://devtalk.nvidia.com/default/topic/957981/linux/prime-render-offloading-on-nvidia-optimus/). It makes sense that the driver would be idle with nouveau+PRIME enabled.
The 4.12 kernel issue is also probably completely unrelated and should be in a separate bug report; I'll file that one too in a couple of hours.
Created attachment 133747 [details] I was wrong, just got a GPU hang with only the iGPU passed through on 4.9 (no dGPU/PRIME)
Created attachment 133749 [details] crash dump from hang w/o dGPU passed through
First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid. If the bug investigation is still in progress, please ignore this, and I apologize! If you think this bug is no longer valid, please comment on the bug that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue still occurs there? If you cannot see the issue there, please comment on the bug.
Closing; please re-open if the issue still exists.