Summary: | [KBL IVB] kicked stuck waiters missed irq when running drv_missed_irq | | |
---|---|---|---|
Product: | DRI | Reporter: | cprigent <christophe.prigent> |
Component: | DRM/Intel | Assignee: | Chris Wilson <chris> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | | |
Priority: | medium | CC: | intel-gfx-bugs |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | IVB, KBL | i915 features: | GEM/Other |
Description
cprigent
2016-07-28 09:44:02 UTC
Hmm, yes, this was why that hangcheck -> stuck rings markup was in the idle worker. That code had an inherent race condition, just need to find a better means of flagging this failure -- less of a concern as the test is explicitly trying to break the driver.

Created attachment 125520 [details]
IVB-kms_setmode__clone_exclusive_crtc-kern.log

Created attachment 125521 [details]
KBLU-drv_missed_irq-kern.log

Created attachment 125522 [details]
KBLU-drv_missed_irq-output

Reproduced on KBL-U.

KBL-U

Hardware
Platform: KABY LAKE-U
CPU: Intel(R) Core(TM) @ 2.60GHz
MCP: KBL-U G0 2+2
QDF: QYQ8
Chipset PCH: SPT-LP C1
CRB: KABY LAKE U DDR3L RVP7 CRB FAB1

Software
BIOS: 38_07 KBLSE2R1.R00.X038.P07.1606200632 from https://ubit-artifactory-ba.intel.com/artifactory/simple/one-windows-local/Submissions/ifwi/KBL_PURPLE_IFWI_2016_WW26_1_00_HR'16/IFWI-KBL_PURPLE_IFWI_2016_WW26_1_00_HR'16-R.7z
ME FW: 11.5.0.1058
EC FW: 1.19
Ksc (EC FW): 1.20
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.7.0 6f87e85 from http://cgit.freedesktop.org/drm-intel/
commit 6f87e85fa302ffdb4cb9f4cd712691165923c7a2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Aug 1 15:53:41 2016 +0100
drm-intel-nightly: 2016y-08m-01d-14h-53m-17s UTC integration manifest
drm: libdrm-2.4.70 f19cd3a from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-11.2.2 3a9f628 from git://anongit.freedesktop.org/mesa/mesa
cairo: 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xserver: xorg-server-1.18.0-502 c833c08 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel: 2.99.917-688 49daf5d from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva: libva-1.7.0-40 f7e2263 from git://git.freedesktop.org/git/vaapi/libva
vaapi-intel-driver: 1.7.0-64 1cd6795 from git://git.freedesktop.org/git/vaapi/intel-driver
GuC 9.14 from http://rdvivi-hillsboro.jf.intel.com/firmware/kbl_guc_ver9_14.tar.bz2
DMC 1.01 from: https://01.org/linuxgraphics/downloads/kabylake-dmc-1.01
Intel-Gpu-Tools 1.15-188 53b4dfd from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

In my tree, I have moved the kicking from hangcheck to intel_breadcrumbs.c which prevents this problem. I'll try and get that polished up...

Christophe, please try with Chris' patch: https://patchwork.freedesktop.org/series/10711/

Had some fun with the test case as well.
It appears that on execlists (at least) we could get spurious interrupts due to the IIR bit being set for the user interrupt and a second context-switch interrupt causing processing of the wakeup.

commit 83348ba84ee0d5d4d982e5382bfbc8b2a2d05e75
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Aug 9 17:47:51 2016 +0100

drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs

In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from idle-worker") the racy detection of missed interrupts was removed when we went idle. This however opened up the issue that the stuck waiters were not being reported, causing a test case failure. If we move the stuck waiter detection out of hangcheck and into the breadcrumb mechanism (i.e. the waiter) itself, we can avoid this issue entirely. This leaves hangcheck looking for a stuck GPU (inspecting for request advancement and HEAD motion), and breadcrumbs looking for a stuck waiter - hopefully make both easier to understand by their segregation.
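
To illustrate the split described in the commit message, here is a minimal user-space sketch of waiter-side missed-interrupt detection. It is not the i915 code: `engine_model`, `waiter_model`, `model_user_interrupt` and `model_waiter_timeout` are hypothetical names invented for this example, and the real breadcrumbs code may use a different criterion for deciding that an interrupt was missed. The assumption here is simply that a periodic timer owned by the waiter checks whether the request has already completed while the waiter is still asleep.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an engine; not an i915 structure. */
struct engine_model {
    uint32_t hw_seqno;        /* last seqno the GPU reported as complete */
    unsigned int missed_irqs; /* number of missed interrupts detected */
};

/* Hypothetical model of a single sleeping waiter. */
struct waiter_model {
    uint32_t seqno; /* seqno this waiter is sleeping on */
    bool awake;     /* true once the waiter has been woken */
};

/* Wrap-safe test for "seqno a has reached or passed seqno b". */
static bool seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Normal path: the user interrupt arrives and wakes a completed waiter. */
static void model_user_interrupt(struct engine_model *e,
                                 struct waiter_model *w)
{
    if (seqno_passed(e->hw_seqno, w->seqno))
        w->awake = true;
}

/*
 * Waiter-side timer tick (assumed to run periodically while the waiter
 * sleeps). Returns true if a missed interrupt was detected.
 *
 * If the request is not complete, nothing is flagged here: a GPU that
 * makes no progress is left for hangcheck to diagnose.
 */
static bool model_waiter_timeout(struct engine_model *e,
                                 struct waiter_model *w)
{
    if (w->awake)
        return false; /* the interrupt did its job */

    if (!seqno_passed(e->hw_seqno, w->seqno))
        return false; /* still running: not a missed interrupt */

    /* Request finished but nobody woke the waiter: record it and kick. */
    e->missed_irqs++;
    w->awake = true;
    return true;
}
```

In this model a stuck waiter is reported by the waiter's own timer as soon as the seqno has advanced past it, regardless of whether the engine has since gone idle, which is the property the racy idle-worker markup could not provide.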