Environment:
Xen: upstream 4.9
kernel: drm-intel-nightly 4f89fbd drm-tip: 2017y-06m-12d-19h-18m-52s UTC integration manifest
intel-gpu-tools: master branch d1ea0c0 gem_wsim: More interesting workloads

Issues:
After running igt@drv_hangman, the GPU could not recover on dom0. When I then sent a reboot command through SSH, dom0 could not reboot because Xorg could not be terminated and blocked there. I had to press the power button to shut down the machine.

How to reproduce:
1) Compile upstream Xen 4.9 and compile the drm-intel-nightly kernel.
2) Use drm-intel-nightly as dom0's kernel and boot it with Xen 4.9.
3) After dom0 boots up, run igt@tests@drv_hangman through SSH.
4) After drv_hangman runs, the GPU cannot recover and dom0's desktop is frozen.
5) Send a reboot command to dom0 through SSH. dom0 cannot reboot because Xorg blocks there, and the power button has to be pressed to shut down dom0.

Experiments:
1) This only happens in a Xen environment; a native environment does not have this issue.
2) This is a kernel regression. The 4.10 kernel does not have this issue; the 4.11 kernel has it and drm-intel-nightly has it too. git bisect tells me the first bad commit is:

commit 4c9655436522eaf4ba35572851150ccb71f3866e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jan 17 17:59:01 2017 +0200

    drm/i915: Move engine reset preparation to i915_gem_reset_prepare()

    Now that we have prepare/finish routines for the GEM reset, move the
    disabling of the engine->irq_tasklet into them to reduce repetition.
    The device irq enable/disable is split out to ensure it is run first
    and last always (even if the GPU reset fails).

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-1-git-send-email-mika.kuoppala@intel.com
Created attachment 131914 [details] Dmesg while running drv_hangman
Created attachment 131915 [details] dmesg while sending the reboot command through SSH
Created attachment 131916 [details] the output of drv_hangman
What did you bisect? The failure of drv_hangman is quite normal for !full-ppgtt, which I presume is the result of the vgpu emulation. The hang of X is another issue unconnected to reset. You should not be running igt while anything else is accessing the gpu as most of the tests assume exclusive ownership of the device.
Good day, could you specify the platform you used? It works well with the following configuration:

Attached Displays: DP and HDMI

Driver Graphics Specifications
==============================================================
Component: drm
tag: libdrm-2.4.80-20-g48aac8c
commit: 48aac8c6ef301be5ed4cf824779baa3c98981a90

Component: cairo
tag: 1.15.4-22-g0fd0fd0
commit: 0fd0fd0ae9ad8cfb177bb844091de98c0235917e

Component: intel-gpu-tools
tag: intel-gpu-tools-1.18-214-ga0433ca
commit: a0433ca1dddb83968a0f91753509526bb0240b5a

Component: piglit
tag: piglit-v1
commit: 943b4f9dff77874c1998ca68f78f16db1d175fdf

Hardware Graphics Specifications
==============================================================
Processor Graphics                  Intel® Iris™ Graphics 540
Graphics Base Frequency             300.00 MHz
Graphics Max Dynamic Frequency      950.00 MHz
Graphics Video Max Memory           32 GB
eDRAM                               64 MB
Graphics Output                     eDP/DP/HDMI/DVI
4K Support                          Yes, at 60Hz
Max Resolution (Intel® WiDi)        1080p
Max Resolution (HDMI 1.4)           4096x2304@24Hz
Max Resolution (DP)                 4096x2304@60Hz
Max Resolution (eDP)                4096x2304@60Hz
Max Resolution (VGA)                N/A
DirectX* Support                    12
OpenGL* Support                     4.4
Intel® Quick Sync Video             Yes
Intel® InTru™ 3D Technology         Yes
Intel® Clear Video HD Technology    Yes
Intel® Clear Video Technology       Yes
Intel® Wireless Display             Yes
# of Displays Supported             3
Device ID                           0x1926

Could you take a look at that and share the info? Thanks.
I just installed upstream Xen 4.9 and an upstream 4.11 kernel, then used Xen to boot the 4.11 kernel. Neither of them contains any xen-gvt related code and I did not boot a guest, so there is no vgpu emulation. In this case the i915 driver accesses the hardware directly and does not trap into a vgpu.

I bisected the upstream 4.11 kernel by checking GPU recovery and found the above commit. Then I tried drm-intel-nightly and found it has the same issue.

Just now I booted the drm-intel-nightly kernel to text mode under Xen and ran drv_hangman, but the GPU still could not recover. According to dmesg it is in full-ppgtt mode, as dmesg prints "[i915_driver_load[i915]] ppgtt mode: 3".

(In reply to Chris Wilson from comment #4)
> What did you bisect? The failure of drv_hangman is quite normal for
> !full-ppgtt, which I presume is the result of the vgpu emulation. The hang
> of X is another issue unconnected to reset. You should not be running igt
> while anything else is accessing the gpu as most of the tests assume
> exclusive ownership of the device.
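The ppgtt check described in this comment can be repeated on any saved dmesg log. Below is a minimal sketch in Python, not part of the original report: the function name `ppgtt_mode` and the `PPGTT_MODES` table are hypothetical, and the mode meanings reflect my understanding of the `i915.enable_ppgtt` module parameter in kernels of that period, so treat them as an assumption and verify against the kernel in use.

```python
import re

# Matches the debug line quoted above, e.g. "... ppgtt mode: 3".
PPGTT_RE = re.compile(r"ppgtt mode: (\d+)")

# Assumed meanings, based on the i915.enable_ppgtt parameter description
# of that era; not confirmed anywhere in this report.
PPGTT_MODES = {
    0: "disabled",
    1: "aliasing",
    2: "full",
    3: "full with extended address space",
}


def ppgtt_mode(dmesg_text):
    """Return the integer ppgtt mode found in dmesg text, or None if absent."""
    match = PPGTT_RE.search(dmesg_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    line = "[i915_driver_load[i915]] ppgtt mode: 3"
    mode = ppgtt_mode(line)
    print(mode, PPGTT_MODES.get(mode, "unknown"))
```

A return value of None means the line was not logged at all, which usually indicates that drm debug output was not enabled when the log was captured.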
My machine is Skylake. This issue does not happen on a native machine; it only happens in a Xen environment.

If you are using Ubuntu, you can run "apt-get install xen-hypervisor-4.(6/7/8)-amd64". You will then see an entry "Advanced options for Ubuntu GNU/Linux (with Xen hypervisor)" in GRUB; enter it and select your target kernel to boot.

(In reply to elizabethx.de.la.torre.mena from comment #5)
> Good day, could you specify the platform you used? It works well with the
> following configuration:
>
> Attached Displays: DP and HDMI
>
> Driver Graphics Specifications
> ==============================================================
> Component: drm
> tag: libdrm-2.4.80-20-g48aac8c
> commit: 48aac8c6ef301be5ed4cf824779baa3c98981a90
>
> Component: cairo
> tag: 1.15.4-22-g0fd0fd0
> commit: 0fd0fd0ae9ad8cfb177bb844091de98c0235917e
>
> Component: intel-gpu-tools
> tag: intel-gpu-tools-1.18-214-ga0433ca
> commit: a0433ca1dddb83968a0f91753509526bb0240b5a
>
> Component: piglit
> tag: piglit-v1
> commit: 943b4f9dff77874c1998ca68f78f16db1d175fdf
>
> Hardware Graphics Specifications
> ==============================================================
> Processor Graphics Intel® Iris™ Graphics 540
> Graphics Base Frequency 300.00 MHz
> Graphics Max Dynamic Frequency 950.00 MHz
> Graphics Video Max Memory 32 GB
> eDRAM 64 MB
> Graphics Output eDP/DP/HDMI/DVI
> 4K Support Yes, at 60Hz
> Max Resolution (Intel® WiDi) 1080p
> Max Resolution (HDMI 1.4) 4096x2304@24Hz
> Max Resolution (DP) 4096x2304@60Hz
> Max Resolution (eDP) 4096x2304@60Hz
> Max Resolution (VGA) N/A
> DirectX* Support 12
> OpenGL* Support 4.4
> Intel® Quick Sync Video Yes
> Intel® InTru™ 3D Technology Yes
> Intel® Clear Video HD Technology Yes
> Intel® Clear Video Technology Yes
> Intel® Wireless Display Yes
> # of Displays Supported 3
> Device ID 0x1926
>
> Could you take a look on that and share the info? Thanks.
drv_hangman could not finish in text mode and the GPU could not recover:

tests# ./drv_hangman
IGT-Version: 1.18-gd1ea0c0 (x86_64) (Linux: 4.12.0-rc4nightly+ x86_64)
Subtest error-state-sysfs-entry: SUCCESS (0.000s)
Subtest error-state-basic: SUCCESS (0.013s)
Subtest error-state-capture-render: SUCCESS (9.891s)
Subtest error-state-capture-bsd: SUCCESS (6.016s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:186:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd1: SKIP (0.000s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:186:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd2: SKIP (0.000s)
Subtest error-state-capture-blt: SUCCESS (5.983s)
Subtest error-state-capture-vebox: SUCCESS (6.016s)
Subtest hangcheck-unterminated: SUCCESS (15.999s)

After this the cursor flickers forever and the test never returns to a normal console where I could enter commands.
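To compare runs like the one above (for example native vs. Xen, or across bisect steps), the subtest lines can be tallied with a small script. This is only a sketch under the assumption that result lines always look like "Subtest NAME: STATUS (N.NNNs)" as in the log above; the helper names `parse_subtests` and `summarize` are hypothetical and not part of intel-gpu-tools.

```python
import re
from collections import Counter

# Matches result lines such as "Subtest error-state-basic: SUCCESS (0.013s)".
SUBTEST_RE = re.compile(r"Subtest ([\w-]+): (\w+) \((\d+\.\d+)s\)")

# The result lines from the text-mode run quoted in this comment.
SAMPLE_OUTPUT = """\
Subtest error-state-sysfs-entry: SUCCESS (0.000s)
Subtest error-state-basic: SUCCESS (0.013s)
Subtest error-state-capture-render: SUCCESS (9.891s)
Subtest error-state-capture-bsd: SUCCESS (6.016s)
Subtest error-state-capture-bsd1: SKIP (0.000s)
Subtest error-state-capture-bsd2: SKIP (0.000s)
Subtest error-state-capture-blt: SUCCESS (5.983s)
Subtest error-state-capture-vebox: SUCCESS (6.016s)
Subtest hangcheck-unterminated: SUCCESS (15.999s)
"""


def parse_subtests(output):
    """Return a list of (name, status, seconds) tuples found in the output."""
    return [(name, status, float(secs))
            for name, status, secs in SUBTEST_RE.findall(output)]


def summarize(output):
    """Count how many subtests ended in each status."""
    return Counter(status for _, status, _ in parse_subtests(output))


if __name__ == "__main__":
    results = parse_subtests(SAMPLE_OUTPUT)
    print(dict(summarize(SAMPLE_OUTPUT)))
    print("last subtest reported:", results[-1][0])
```

For this log the last reported subtest is hangcheck-unterminated, so every subtest printed a result and the hang happens after that point rather than inside a reported subtest.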
Created attachment 131944 [details] dmesg from bootup and running drv_hangman
This dmesg was captured in text mode.

(In reply to XiongZhang from comment #9)
> Created attachment 131944 [details]
> dmesg from bootup and runing drv_hangman
The issue can no longer be reproduced on the latest drm-intel-nightly branch. It is fixed by the per-engine reset feature from: https://lists.freedesktop.org/archives/intel-gfx/2017-June/130921.html
closing