Bug 105128

Summary: [CI] igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout!
Product: DRI
Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: minor
Priority: low
CC: intel-gfx-bugs, martin.peres, matthew.d.roper, tomi.p.sarvela
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BXT, KBL, SKL
i915 features: display/Other

Description Marta Löfstedt 2018-02-16 13:31:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3783/shard-apl8/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu.html

[  200.843975] [drm:intel_dp_aux_ch [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

I could have reopened bug 104062; however, since that bug was archived a long time ago, I made a new bug for this rare new APL occurrence.
Comment 1 Marta Löfstedt 2018-03-12 11:16:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3910/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

<7>[  198.599572] [drm:intel_opregion_setup [i915]] Found valid VBT in ACPI OpRegion (Mailbox #4)
<7>[  198.609244] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x7c1003ff
<7>[  198.617661] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x7c1003ff
<7>[  198.626083] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x7c1003ff
<7>[  198.634495] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x7c1003ff
<3>[  198.642740] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
<3>[  198.642778] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp_aux_ch not done status 0xac1003ff
Comment 2 Martin Peres 2018-04-18 11:07:46 UTC
Also seen on hibernate tests: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_22/fi-kbl-7500u/igt@gem_eio@hibernate.html

	
[   59.246542] Setting dangerous option reset - tainting kernel
[   59.249777] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[   59.254239] Setting dangerous option reset - tainting kernel
[   64.828941] usb usb1: root hub lost power or was reset
[   64.828972] usb usb2: root hub lost power or was reset
[   64.831656] usb usb3: root hub lost power or was reset
[   64.831659] usb usb4: root hub lost power or was reset
[   64.891584] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[   65.454836] Setting dangerous option reset - tainting kernel
[   65.455807] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[   71.071867] usb usb1: root hub lost power or was reset
[   71.071870] usb usb2: root hub lost power or was reset
[   71.072632] usb usb3: root hub lost power or was reset
[   71.072634] usb usb4: root hub lost power or was reset
[   71.691027] Setting dangerous option reset - tainting kernel
[   71.691650] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[   71.708335] Setting dangerous option reset - tainting kernel
[   71.708754] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
Comment 3 Martin Peres 2018-04-24 10:33:16 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4078/shard-kbl4/igt@gem_eio@suspend.html

[  204.807336] Setting dangerous option reset - tainting kernel
[  204.809756] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[  204.814567] Setting dangerous option reset - tainting kernel
[  210.280951] Setting dangerous option reset - tainting kernel
[  210.284210] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[  215.463406] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[  215.765612] Setting dangerous option reset - tainting kernel
[  215.766304] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
[  215.805440] Setting dangerous option reset - tainting kernel
[  215.805889] i915 0000:00:02.0: Resetting chip for Manually set wedged engine mask = ffffffffffffffff
Comment 4 Martin Peres 2018-04-24 15:38:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_25/fi-skl-6770hq/igt@gem_exec_suspend@basic-s3-devices.html

[  291.458474] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[  291.458517] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp_aux_ch not done status 0xac1003ff
Comment 5 Martin Peres 2018-06-04 17:05:45 UTC
Also seen on a reload test: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4276/fi-skl-6770hq/igt@drv_module_reload@basic-reload-inject.html

	
[  450.673308] Setting dangerous option inject_load_failure - tainting kernel
[  452.062877] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[  452.579538] Setting dangerous option reset - tainting kernel
Comment 6 Martin Peres 2018-06-18 07:33:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4331/fi-elk-e7500/igt@gem_ringfill@basic-default-hang.html

[  214.693090] [drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[  222.759299] i915 0000:00:02.0: Resetting chip for hang on rcs0
Comment 7 Martin Peres 2018-06-22 10:10:00 UTC
Also seen without the " (has irq: 1)" part:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4360/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html

[drm:intel_dp_aux_xfer [i915]] *ERROR* dp aux hw did not signal timeout!
Comment 8 Matt Roper 2019-09-18 20:18:55 UTC
The "(has irq: 1)" part was removed by commit:

    commit 8a29c778fa1a50a25a3e66cf9589888758858d24
    Author: Lucas De Marchi <lucas.demarchi@intel.com>
    Date:   Wed May 23 11:04:35 2018 -0700

        drm/i915: remove check for aux irq

since all relevant platforms have the ability to utilize AUX interrupts these days; removing it from the bug title.

When we perform an aux transfer, we ask the hardware to give us an interrupt when the transfer completes, and we expect the 'busy' bit of the AUX control register to be zero at that point.  We set a timeout of 10ms for this completion to happen and print this error message if the control register still has the 'busy' bit asserted at that time.  Given that the hardware itself is programmed to time out after 1600us, the transfer should definitely have ended by this point, either through completion or through timeout.

It sounds like the hardware isn't behaving as we expect here; the next question is whether the hardware is sending us a completion interrupt, but failing to clear the 'busy' bit, or whether it's doing neither.  Right now we use the interrupt just as a notification to our workqueue to wake up and check the register bit again; if we seem to be still getting the interrupts even though the control register bit isn't updated, we could avoid waiting for the timeout to be declared.  The bspec (page 4301) does say "AUX Transaction complete interrupt if set OR when DDI_AUX_CTL_*[31:30] = ‘01’" --- given the emphasis on "OR," maybe it is valid in some cases for the hardware to not clear the bit even though it notified us via interrupt.

We should probably also update this error message to print out the value of the control register, just so that we can see which error bits and such are set at the time of timeout so that we'll have a better idea of what state the hardware is really in.

Impact-wise, I don't believe this should have any impact for an end-user (the hardware isn't behaving as we expect, but it doesn't interfere with the system otherwise); the main impact here is for CI since this behavior will lead to random dmesg-warn results.
Comment 9 Matt Roper 2019-09-18 20:19:40 UTC
*** Bug 111556 has been marked as a duplicate of this bug. ***
Comment 10 Matt Roper 2019-09-18 20:23:43 UTC
Also, all failures within the last month have been on gen9 platforms (KBL or SKL here, and BXT in 111556) so removing g45 from the platform list and adding these three gen9 platforms.
Comment 11 CI Bug Log 2019-10-16 06:31:26 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* TGL: igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout!
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@basic-rte.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@debugfs-forcewake-user.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@debugfs-read.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@dpms-mode-unset-lpsp.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@dpms-mode-unset-non-lpsp.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@drm-resources-equal.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@gem-execbuf.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@modeset-stress-extra-wait.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@pm-caching.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@pm-tiling.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@i915_pm_rpm@system-suspend-devices.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@kms_flip@blocking-absolute-wf_vblank.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@kms_flip@dpms-off-confusion.html
  - http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_6973_HDMI/re-tgl1-display/igt@kms_flip@flip-vs-modeset-vs-hang-interruptible.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@basic-rte.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@debugfs-forcewake-user.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@debugfs-read.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@dpms-mode-unset-lpsp.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@dpms-mode-unset-non-lpsp.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@drm-resources-equal.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@gem-execbuf.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@modeset-non-lpsp-stress.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@modeset-stress-extra-wait.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@pm-caching.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@pm-tiling.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@i915_pm_rpm@system-suspend-devices.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_atomic_interruptible@legacy-setmode.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_busy@extended-modeset-hang-oldfb-render-c.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@2x-flip-vs-modeset-vs-hang-interruptible.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@blocking-absolute-wf_vblank.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@dpms-off-confusion.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@dpms-vs-vblank-race-interruptible.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@flip-vs-modeset-vs-hang-interruptible.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7058/re-tgl1-display/igt@kms_flip@modeset-vs-vblank-race.html
Comment 12 Lakshmi 2019-10-16 06:36:04 UTC
(In reply to Matt Roper from comment #10)
> Also, all failures within the last month have been on gen9 platforms (KBL or
> SKL here, and BXT in 111556) so removing g45 from the platform list and
> adding these three gen9 platforms.

Matt, failures are also seen on TGL. Let me know if we need to create a separate bug for TGL. If we don't need a separate issue, how do you see the severity and priority of this bug?
Comment 13 CI Bug Log 2019-10-16 09:31:45 UTC
A CI Bug Log filter associated to this bug has been updated:

{- TGL: igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout! -}
{+ TGL: igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout! +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@gem_exec_schedule@smoketest-bsd1.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@gem_softpin@softpin.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_backlight@fade_with_dpms.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@basic-pci-d3-state.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@cursor.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@cursor-dpms.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@dpms-lpsp.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@fences.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@fences-dpms.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@gem-evict-pwrite.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@gem-idle.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@gem-mmap-gtt.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@i2c.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@legacy-planes.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@legacy-planes-dpms.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@modeset-lpsp.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@modeset-lpsp-stress.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@modeset-lpsp-stress-no-wait.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@sysfs-read.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@system-suspend.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@system-suspend-modeset.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@universal-planes-dpms.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_atomic_interruptible@legacy-cursor.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_atomic_interruptible@legacy-pageflip.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_big_fb@x-tiled-8bpp-rotate-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_big_fb@y-tiled-16bpp-rotate-90.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_big_fb@y-tiled-64bpp-rotate-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_big_fb@y-tiled-64bpp-rotate-180.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_big_fb@y-tiled-8bpp-rotate-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_busy@basic-flip-a.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_busy@basic-modeset-a.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_busy@extended-modeset-hang-newfb-render-d.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-d.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_busy@extended-pageflip-hang-newfb-render-a.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_crc@pipe-a-cursor-128x128-offscreen.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_crc@pipe-a-cursor-256x256-rapid-movement.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_edge_walk@pipe-a-128x128-left-edge.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-crc-atomic.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-crc-legacy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@2x-flip-vs-modeset-vs-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@busy-flip.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@dpms-vs-vblank-race.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@flip-vs-modeset-vs-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@flip-vs-panning-vs-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_flip@flip-vs-panning-vs-hang-interruptible.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-msflip-blt.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@fbc-2p-shrfb-fliptrack.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-draw-mmap-wc.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-pwrite.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-gtt.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-pwrite.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-cur-indfb-onoff.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_cursor@pipe-a-overlay-size-128.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_cursor@pipe-a-viewport-size-128.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane@plane-position-hole-pipe-a-planes.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_scaling@pipe-a-scaler-with-pixel-format.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_scaling@pipe-a-scaler-with-rotation.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_prime@basic-crc.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@bad-tiling.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-rotation-180.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-rotation-270.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-rotation-90.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-x-tiled-reflect-x-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-x-tiled-reflect-x-180.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-y-tiled-reflect-x-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-y-tiled-reflect-x-180.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@primary-y-tiled-reflect-x-270.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@sprite-rotation-180.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@sprite-rotation-270.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@sprite-rotation-90.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_rotation_crc@sprite-rotation-90-pos-100-0.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-query-busy-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-query-forked-busy-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-query-forked-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-query-idle-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-ts-continuation-idle-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-ts-continuation-modeset-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-wait-busy-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-wait-forked-busy-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-wait-forked-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-a-wait-idle-hang.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_vblank@pipe-b-accuracy-idle.html
Comment 14 CI Bug Log 2019-10-16 09:34:21 UTC
A CI Bug Log filter associated to this bug has been updated:

{- TGL: igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout! -}
{+ TGL: igt@* - dmesg-warn - *ERROR* dp aux hw did not signal timeout! +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@i915_pm_rpm@drm-resources-equal.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7093/re-tgl-u/igt@kms_plane_scaling@pipe-a-scaler-with-clipping-clamping.html
Comment 15 Matt Roper 2019-10-17 00:13:08 UTC
(In reply to Lakshmi from comment #12)
> (In reply to Matt Roper from comment #10)
> > Also, all failures within the last month have been on gen9 platforms (KBL or
> > SKL here, and BXT in 111556) so removing g45 from the platform list and
> > adding these three gen9 platforms.
> 
> Matt, failures are also seen on TGL. Let me know if we need to create a
> separate bug for TGL. If we don't need a separate issue, how do you see the
> severity and priority of this bug?

I think this bug is fine for the TGL failures as well.

As far as I can tell, this bug is harmless.  Our driver complains that the hardware didn't do something we expected it to (report completion of something that we know has to be complete by this point), but we still detect that condition and proceed properly.  I don't think there would be any visible impact to an end-user in this case.  I think dropping this to low importance is reasonable here; once the more important issues are dealt with we may want to work with the hardware guys to figure out if there's anything the driver might be doing that triggers this behavior occasionally or whether it's just a minor hardware bug.  If it does turn out to be a hardware bug that we can't control then we'll probably want to downgrade the DRM_ERROR here to a DRM_DEBUG.
Comment 16 ashutosh.dixit 2019-10-24 21:01:41 UTC
*** Bug 107139 has been marked as a duplicate of this bug. ***
Comment 17 Matt Roper 2019-10-24 23:45:57 UTC
This failure seems to happen a lot more frequently on TGL than it did on SKL/KBL, and not just on suspend/resume like it did before.  AFAICS, it's still an "impossible" case if the hardware is actually behaving properly; even in legitimate timeout cases the hardware is supposed to recognize the timeout and clear the busy bit after a specific amount of time (gen12 has a 4ms hardware timeout, gen6-11 had a 1.5ms timeout), and we only declare this timeout failure and print the message after 10ms have elapsed.

I think the next step is to see if we're getting AUX completion interrupts even though the "BUSY" flag never turns off like it's supposed to.  I'll probably send a patch tomorrow to try to recognize cases where we received an aux channel interrupt but still see the aux as busy; that should give us more insight into why/how the hardware is misbehaving.  Then we can bring that data to the hardware guys and see if they can think of any reason for this behavior or have suggestions on how best to work around it.
Comment 18 Matt Roper 2019-10-25 23:08:24 UTC
After more investigation, it appears that the frequent TGL failures happen after the platform enters DC6.  Since the AUX B and AUX C power domains are in PG1 on TGL rather than PG3 as they were on ICL, I believe we need to add these to the "DC off" list so that the DMC firmware won't turn them off behind our back when entering DC states.

I've sent a series with a change that I hope will make these failures go away (plus some other AUX fixes and cleanups) here:
   https://patchwork.freedesktop.org/series/68590/
Comment 19 Matt Roper 2019-10-29 18:02:13 UTC
This will hopefully be fixed by
  https://cgit.freedesktop.org/drm/drm-tip/commit/?id=6a3552527d431ae3281ce0dfa25107e71cc681e2

which just landed.  I'll wait for some updated CI results to show up before changing the status of this bug.

I just realized I forgot to add the "Bugzilla:" link to the commit message.
Comment 20 Matt Roper 2019-11-08 16:45:13 UTC
It doesn't look like there have been any more instances of this on TGL CI since my patch landed.  That doesn't give 100% certainty since the reproduction rate was inconsistent before, but it does look like a good sign.  I'm going to drop TGL from the platform list under the assumption that the DC state fix has solved the problems on that platform.

My patch doesn't address the failures that happen on gen9 once in a blue moon so I'll leave this bug open for now.
Comment 21 Martin Peres 2019-11-29 17:40:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/77.
