Created attachment 125363 [details]
IVB-drv_missed_irq-kern.log

Platform: IVB
CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (family 6, model 58, stepping 9)
Motherboard version: DH77EB
GPU: Intel® HD Graphics 4000 - Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller

Software
Bios: EBH7710H.86A.0096.2012.1012.1645
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.7.0-rc7 7eeb04a from http://cgit.freedesktop.org/drm-intel/
  commit 7eeb04a101316645916d4d9df058a9341797f1af
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date: Sun Jul 24 11:00:31 2016 +0100
  drm-intel-nightly: 2016y-07m-24d-09h-59m-54s UTC integration manifest
drm: libdrm-2.4.70 0caa84c from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xserver: xorg-server-1.18.0-497 0b2f308 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel: 2.99.917-687 6988b87 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva: libva-1.7.0-26 c36971c from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-58 e554446 from git://git.freedesktop.org/git/vaapi/intel-driver
Intel-Gpu-Tools 1.15-140 e3abb20 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

Steps:
------
1. Execute IGT test: drv_missed_irq

Actual result
--------------
1. The test fails:

root@IVB102:/opt/X11R7/src/intel-gpu-tools/tests# ./drv_missed_irq
IGT-Version: 1.15-ge3abb20 (x86_64) (Linux: 4.7.0-rc7-nightly+ x86_64)
(drv_missed_irq:4592) CRITICAL: Test assertion failure function __real_main115, file drv_missed_irq.c:165:
(drv_missed_irq:4592) CRITICAL: Failed assertion: missed_rings == expect_rings
(drv_missed_irq:4592) CRITICAL: error: 0 != 0x7
Stack trace:
  #0 [__igt_fail_assert+0xf1]
  #1 [_start+0x0]
  #2 [__libc_start_main+0xf0]
  #3 [_start+0x29]
  #4 [<unknown>+0x29]
Test drv_missed_irq failed.
**** DEBUG ****
(drv_missed_irq:4592) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(drv_missed_irq:4592) drmtest-DEBUG: Test requirement passed: !(fd<0)
(drv_missed_irq:4592) DEBUG: Test requirement passed: gem_mmap__has_wc(fd)
(drv_missed_irq:4592) DEBUG: Testing rings 7
(drv_missed_irq:4592) DEBUG: Executing on ring render [1]
(drv_missed_irq:4592) DEBUG: Executing on ring bsd [2]
(drv_missed_irq:4592) DEBUG: Executing on ring bsd1 [2002]
(drv_missed_irq:4592) DEBUG: Executing on ring bsd2 [4002]
(drv_missed_irq:4592) DEBUG: Executing on ring blt [3]
(drv_missed_irq:4592) DEBUG: Executing on ring vebox [4]
(drv_missed_irq:4592) CRITICAL: Test assertion failure function __real_main115, file drv_missed_irq.c:165:
(drv_missed_irq:4592) CRITICAL: Failed assertion: missed_rings == expect_rings
(drv_missed_irq:4592) CRITICAL: error: 0 != 0x7
**** END ****
FAIL (7.458s)

Expected result:
----------------
1. The test passes
Hmm, yes, this was why that hangcheck -> stuck rings markup was in the idle worker. That code had an inherent race condition, so we just need to find a better means of flagging this failure. It is less of a concern as the test is explicitly trying to break the driver.
Created attachment 125520 [details] IVB-kms_setmode__clone_exclusive_crtc-kern.log
Created attachment 125521 [details] KBLU-drv_missed_irq-kern.log
Created attachment 125522 [details]
KBLU-drv_missed_irq-output

Reproduced on KBL-U.

KBL-U Hardware
Platform: KABY LAKE-U
CPU : Intel(R) Core(TM) @ 2.60GHz
MCP : KBL-U G0 2+2
QDF : QYQ8
Chipset PCH: SPT-LP C1
CRB : KABY LAKE U DDR3L RVP7 CRB FAB1

Software
BIOS: 38_07 KBLSE2R1.R00.X038.P07.1606200632 from https://ubit-artifactory-ba.intel.com/artifactory/simple/one-windows-local/Submissions/ifwi/KBL_PURPLE_IFWI_2016_WW26_1_00_HR'16/IFWI-KBL_PURPLE_IFWI_2016_WW26_1_00_HR'16-R.7z
ME FW: 11.5.0.1058
EC FW: 1.19
Ksc (EC FW): 1.20
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.7.0 6f87e85 from http://cgit.freedesktop.org/drm-intel/
  commit 6f87e85fa302ffdb4cb9f4cd712691165923c7a2
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date: Mon Aug 1 15:53:41 2016 +0100
  drm-intel-nightly: 2016y-08m-01d-14h-53m-17s UTC integration manifest
drm: libdrm-2.4.70 f19cd3a from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xserver: xorg-server-1.18.0-502 c833c08 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel: 2.99.917-688 49daf5d from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva: libva-1.7.0-40 f7e2263 from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-64 1cd6795 from git://git.freedesktop.org/git/vaapi/intel-driver
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2
DMC 1.01 from: https://01.org/linuxgraphics/downloads/kabylake-dmc-1.01
Intel-Gpu-Tools 1.15-188 53b4dfd from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
In my tree, I have moved the kicking from hangcheck to intel_breadcrumbs.c which prevents this problem. I'll try and get that polished up...
Christophe, please try with Chris' patch: https://patchwork.freedesktop.org/series/10711/
Had some fun with the test case as well. It appears that on execlists (at least) we could get spurious interrupts due to the IIR bit being set for the user interrupt and a second context-switch interrupt causing processing of the wakeup.
commit 83348ba84ee0d5d4d982e5382bfbc8b2a2d05e75
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Aug 9 17:47:51 2016 +0100

    drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs

    In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
    idle-worker") the racy detection of missed interrupts was removed when
    we went idle. This however opened up the issue that the stuck waiters
    were not being reported, causing a test case failure. If we move the
    stuck waiter detection out of hangcheck and into the breadcrumb
    mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
    This leaves hangcheck looking for a stuck GPU (inspecting for request
    advancement and HEAD motion), and breadcrumbs looking for a stuck
    waiter, hopefully making both easier to understand by their
    segregation.
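
For readers following along, here is a rough userspace sketch of the idea in that commit message: the waiter itself, rather than hangcheck, notices that the work it was waiting for has completed although no interrupt woke it, and flags the engine in a missed-interrupt mask. This is not the i915 code. The struct, its fields and both function names are hypothetical and exist only for illustration; the wrap-safe sequence number comparison is an assumption about how such a check would typically be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one engine; these are not the i915 structures. */
struct engine {
    unsigned int id;      /* bit position of this engine in the missed-irq mask */
    uint32_t hw_seqno;    /* last sequence number the simulated GPU completed */
    bool irq_delivered;   /* whether the user interrupt reached the waiter */
};

/* Wrap-safe test for "sequence number a has reached or passed b". */
static bool seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/*
 * Called when the waiter's fallback timer expires. If the request has
 * already completed but no interrupt arrived, the interrupt was missed:
 * set the engine's bit. Returns the updated mask.
 */
static uint32_t waiter_timeout(const struct engine *e, uint32_t wait_seqno,
                               uint32_t missed)
{
    assert(e->id < 32);
    if (!e->irq_delivered && seqno_passed(e->hw_seqno, wait_seqno))
        missed |= UINT32_C(1) << e->id;
    return missed;
}
```

Calling waiter_timeout() for three engines with ids 0, 1 and 2, each with a completed request and no delivered interrupt, accumulates a mask of 0x7, which has the same form as the expect_rings value in the failing assertion. An engine whose request has not completed yet is left alone, since that case is a stuck GPU rather than a stuck waiter.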