On CI test kms_cursor_legacy@basic-busy-flip-before-cursorfails https://intel-gfx-ci.01.org/CI/CI_DRM_2339/fi-snb-2520m/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html (kms_cursor_legacy:8933) CRITICAL: Test assertion failure function basic_flip_cursor, file kms_cursor_legacy.c:634: (kms_cursor_legacy:8933) CRITICAL: Failed assertion: get_vblank(display->drm_fd, pipe, 0) == vblank_start (kms_cursor_legacy:8933) CRITICAL: Last errno: 25, Inappropriate ioctl for device (kms_cursor_legacy:8933) CRITICAL: error: 27258 != 27252 Subtest basic-busy-flip-before-cursor-atomic failed. Dmesg: https://intel-gfx-ci.01.org/CI/CI_DRM_2339/fi-snb-2520m/dmesg-during.log
Also seen on IVB: Seems so so putting to same bug now. https://intel-gfx-ci.01.org/CI/CI_DRM_2338/fi-ivb-3770/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html (kms_cursor_legacy:8658) CRITICAL: Test assertion failure function basic_flip_cursor, file kms_cursor_legacy.c:655: (kms_cursor_legacy:8658) CRITICAL: Failed assertion: get_vblank(display->drm_fd, pipe, 0) == vblank_start (kms_cursor_legacy:8658) CRITICAL: Last errno: 25, Inappropriate ioctl for device (kms_cursor_legacy:8658) CRITICAL: error: 19654 != 19650 Subtest basic-flip-after-cursor-atomic failed.
Both are due to a prolonged hpd event interrupting userspace, e.g.: [ 337.504999] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7145003f [ 337.505013] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110
Looking through the code, it almost feels like we are trying to initialize DP when nothing is connected to it and hence we get the aux time outs. What does this system look like? Is it a laptop or Desktop? With DP or any other VGA to DP or HDMI to DP connected? Manasi
See from here: https://intel-gfx-ci.01.org/CI/hardware.html SNB: fi-snb-2520m IBM Lenovo X220 Sandy Bridge i5-2520m HD Graphics 3000 / GT2 Internal, DP (VGA) IVB: fi-ivb-3770 HP Pro 3500 Ivy Bridge i7-3770 HD Graphics 4000 / GT2 DVI (VGA)
Also seen on sandybridge: https://intel-gfx-ci.01.org/CI/CI_DRM_2352/fi-snb-2520m/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html
- igt@kms_cursor_legacy@basic-flip-after-cursor-atomic on fi-ivb-3770: Failure rate 1/48 run(s) (2%) - igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic on fi-snb-2520m: Failure rate 1/47 run(s) (2%) - igt@kms_cursor_legacy@basic-flip-before-cursor-atomic on fi-snb-2520m: Failure rate 1/34 run(s) (2%) Increasing the severity because it is a problem we have on CI.
(In reply to Chris Wilson from comment #2) > Both are due to a prolonged hpd event interrupting userspace, e.g.: > > [ 337.504999] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status > 0x7145003f > [ 337.505013] [drm:drm_dp_dpcd_access] Too many retries, giving up. First > error: -110 Chris, I can reproduce this issue on a i5-2450M on very rare occasions. It reproduced at least twice in over 400 attempts. In an i7-2620M, I couldn't trigger it. Once again, I don't have anything connected to DP, but the following is logged: [ 510.536908] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 510.539555] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 510.542180] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 510.542194] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110 Do you have any test case to trigger it faster?
Statistics: Failure rate 1/134 run(s) (0%) So quite rarely with test case mentioned on bug.
(In reply to Chris Wilson from comment #2) > Both are due to a prolonged hpd event interrupting userspace, e.g.: > > [ 337.504999] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status > 0x7145003f > [ 337.505013] [drm:drm_dp_dpcd_access] Too many retries, giving up. First > error: -110 I think this happens because more than one connector is sharing the same HPD line with the DP port. In the system where I could reproduce the issue, whenever I connect an HDMI cable, it got successfully detected, but the kernel also tried to run the DP detection code in sequence, triggering the timeouts. Since HDMI and DP were both initialized to port B, the HPD pin is raised by connecting the HDMI cable, and intel_digital_port_connected() doesn't have a chance to abort the DP detection attempt. My patch aborts the hotplug detection of further connectors when the first attached connector of a given port is detected. This prevents the DP timeout I mentioned when plugging HDMI and I think is backed up by the specification[1], in page 86, since it says that no devices sharing the same pins can be enabled simultaneously. While this prevents the timeouts, I am not sure it fixes the issue 100%, because it is hard to reproduce in my system. I'll keep trying and in the meantime, I'll send the patch as an RFC to the list. [1] https://01.org/sites/default/files/documentation/snb_ihd_os_vol3_part3_0.pdf
Similar also on pw run: https://intel-gfx-ci.01.org/CI/Patchwork_4524/fi-snb-2600/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html marked this system on CI too.
(In reply to krisman from comment #9) > My patch aborts the hotplug detection of further connectors when the first > attached connector of a given port is detected. This prevents the DP > timeout I mentioned when plugging HDMI and I think is backed up by the > specification[1], in page 86, since it says that no devices sharing the same > pins can be enabled simultaneously. > > While this prevents the timeouts, I am not sure it fixes the issue 100%, > because it is hard to reproduce in my system. I'll keep trying and in the > meantime, I'll send the patch as an RFC to the list. Submitted this in an attempt to reduce the DP detection timeouts. https://patchwork.freedesktop.org/patch/151486/
*** Bug 100558 has been marked as a duplicate of this bug. ***
For me to reproduce this issue, will I need a SNB or IVB system? Or has anyone seen this on the newer platforms? Also is this still valid on the latest kernel from drm-tip? Manasi
I would say you need SNB/IVB.
(In reply to Manasi from comment #13) > For me to reproduce this issue, will I need a SNB or IVB system? Or has > anyone seen this on the newer platforms? > Also is this still valid on the latest kernel from drm-tip? We were able to observe it on both SNB and IVB, I haven't seen it on newer platforms, but I suppose it will be easy to see the dp_aux_ch timeouts on newer platforms if connectors HPD use the same pin. I'll try triggering the vblank timeout again on the drm-intel-nightly, but I don't see anything committed that would improve the situation.
*** Bug 100966 has been marked as a duplicate of this bug. ***
Should now be fixed by: commit 0d67d8ee678e883ba8a1e40699b73b7db2d540fe Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue May 9 11:58:55 2017 +0200 tests/kms_cursor_legacy: Increase tolerance for failing flip-before/after-cursor.
Thanks, whitelisting for CI.
Hmm, not sure it is that fixed: - https://intel-gfx-ci.01.org/CI/CI_DRM_2596/fi-snb-2600/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html - https://intel-gfx-ci.01.org/CI/CI_DRM_2594/fi-snb-2600/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html Should I file a new bug?
I think this needs a new intel-gpu-tools to work, these are still with the old igt.. maybe keep open till tomorrow?
OOps, try 2. commit 012ad453d86a4401b8377b6ab2d4eaeaf731914e Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue May 9 17:28:49 2017 +0200 tests/kms_cursor_legacy: Fix failing tests harder Also allow the initial vblank evasion to fail. This was the most likely cause of the test failure, so be sure to fix it too. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Leaving open for the dmesg-warn. :)
With snb basic-busy-flip-before-cursor-legacy still fails with commit in comment #21 merged. Here is the relevant dmesg parts: [ 2609.175943] [IGT] kms_cursor_legacy: executing [ 2609.186757] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:40:VGA-1] [ 2609.186790] [drm:intel_crt_detect [i915]] [CONNECTOR:40:VGA-1] force=1 [ 2609.186814] [drm:intel_crt_detect [i915]] ironlake hotplug adpa=0xf40000, result 0 [ 2609.186832] [drm:intel_crt_detect [i915]] CRT not detected via hotplug [ 2609.188347] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK for addr: 0050 w(1) [ 2609.188377] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK on first message, retry [ 2609.190332] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK for addr: 0050 w(1) [ 2609.190341] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus vga [ 2609.190371] [drm:intel_crt_get_edid [i915]] CRT GMBUS EDID read failed, retry using GPIO bit-banging [ 2609.190400] [drm:intel_gmbus_force_bit [i915]] enabling bit-banging on i915 gmbus vga. force bit now 1 [ 2609.191195] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus vga [ 2609.191223] [drm:intel_gmbus_force_bit [i915]] disabling bit-banging on i915 gmbus vga. force bit now 0 [ 2609.191256] [drm:intel_crt_detect_ddc [i915]] CRT not detected via DDC:0x50 [no valid EDID found] [ 2609.191297] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:40:VGA-1] disconnected [ 2609.191547] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:43:HDMI-A-1] [ 2609.191594] [drm:intel_hdmi_detect [i915]] [CONNECTOR:43:HDMI-A-1] [ 2609.267504] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpd] NAK for addr: 0040 w(1) [ 2609.267549] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpd] NAK on first message, retry [ 2609.269659] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpd] NAK for addr: 0040 w(1) [ 2609.269673] [drm:drm_dp_dual_mode_detect] DP dual mode HDMI ID: (err -6) [ 2609.269681] [drm:drm_detect_monitor_audio] Monitor has basic audio support [ 2609.269935] [drm:drm_add_edid_modes] HDMI: DVI dual 0, max TMDS clock 0 kHz [ 2609.269941] [drm:drm_edid_to_eld] ELD monitor [ 2609.269947] [drm:drm_edid_to_eld] HDMI: latency present 0 0, video latency 0 0, audio latency 0 0 [ 2609.269952] [drm:drm_edid_to_eld] ELD size 24, SAD count 1 [ 2609.270128] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:43:HDMI-A-1] probed modes : [ 2609.270136] [drm:drm_mode_debug_printmodeline] Modeline 50:"1920x1080" 60 138780 1920 1968 2000 2080 1080 1083 1088 1112 0x48 0x9 [ 2609.270142] [drm:drm_mode_debug_printmodeline] Modeline 52:"1920x1080" 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0x5 [ 2609.270148] [drm:drm_mode_debug_printmodeline] Modeline 72:"1920x1080" 60 148352 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0x5 [ 2609.270153] [drm:drm_mode_debug_printmodeline] Modeline 54:"1920x1080i" 60 74250 1920 2008 2052 2200 1080 1084 1094 1125 0x40 0x15 [ 2609.270159] [drm:drm_mode_debug_printmodeline] Modeline 73:"1920x1080i" 60 74176 1920 2008 2052 2200 1080 1084 1094 1125 0x40 0x15 [ 2609.270165] [drm:drm_mode_debug_printmodeline] Modeline 53:"1920x1080" 50 148500 1920 2448 2492 2640 1080 1084 1089 1125 0x40 0x5 [ 2609.270170] [drm:drm_mode_debug_printmodeline] Modeline 66:"1920x1080i" 50 74250 1920 2448 2492 2640 1080 1084 1094 1125 0x40 0x15 [ 2609.270176] [drm:drm_mode_debug_printmodeline] Modeline 51:"1920x1080" 40 92520 1920 1968 2000 2080 1080 1083 1088 1112 0x40 0x9 [ 2609.270181] [drm:drm_mode_debug_printmodeline] Modeline 55:"1280x720" 60 74250 1280 1390 1430 1650 720 725 730 750 0x40 0x5 [ 2609.270187] [drm:drm_mode_debug_printmodeline] Modeline 74:"1280x720" 60 74176 1280 1390 1430 1650 720 725 730 750 0x40 0x5 [ 2609.270192] [drm:drm_mode_debug_printmodeline] Modeline 67:"1280x720" 50 74250 1280 1720 1760 1980 720 725 730 750 0x40 0x5 [ 2609.270198] [drm:drm_mode_debug_printmodeline] Modeline 68:"720x576" 50 27000 720 732 796 864 576 581 586 625 0x40 0xa [ 2609.270204] [drm:drm_mode_debug_printmodeline] Modeline 70:"720x576i" 50 13500 720 732 795 864 576 580 586 625 0x40 0x101a [ 2609.270209] [drm:drm_mode_debug_printmodeline] Modeline 75:"720x480" 60 27027 720 736 798 858 480 489 495 525 0x40 0xa [ 2609.270215] [drm:drm_mode_debug_printmodeline] Modeline 56:"720x480" 60 27000 720 736 798 858 480 489 495 525 0x40 0xa [ 2609.270220] [drm:drm_mode_debug_printmodeline] Modeline 81:"720x480i" 60 13514 720 739 801 858 480 488 494 525 0x40 0x101a [ 2609.270226] [drm:drm_mode_debug_printmodeline] Modeline 62:"720x480i" 60 13500 720 739 801 858 480 488 494 525 0x40 0x101a [ 2609.270321] [drm:drm_mode_debug_printmodeline] Modeline 83:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa [ 2609.270341] [drm:drm_mode_debug_printmodeline] Modeline 64:"640x480" 60 25175 640 656 752 800 480 490 492 525 0x40 0xa [ 2609.270708] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:DP-1] [ 2609.270760] [drm:intel_dp_detect [i915]] [CONNECTOR:48:DP-1] [ 2609.273479] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.276213] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.279008] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.281678] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.284362] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.286925] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.289491] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.292056] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.294623] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.297184] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.299749] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.302325] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.304893] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.307461] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.310025] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.312590] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.315154] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.317720] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.320306] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.322872] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.325437] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.327998] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.330561] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.333122] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.335686] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.338247] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.340835] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.343434] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.346026] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.348619] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.351210] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.353839] [drm:intel_dp_aux_ch [i915]] dp_aux_ch timeout status 0x7143003f [ 2609.353853] [drm:drm_dp_dpcd_access] Too many retries, giving up. First error: -110 [ 2609.353858] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:48:DP-1] disconnected [ 2609.354025] [IGT] kms_cursor_legacy: starting subtest basic-busy-flip-before-cursor-legacy [ 2609.354212] [drm:drm_mode_addfb2] [FB:57] [ 2609.388992] [drm:drm_mode_addfb2] [FB:61] [ 2611.073816] [IGT] kms_cursor_legacy: exiting, ret=99
Created attachment 131433 [details] ftrace I've run the failing test on f3-snb, this is a ftrace of the failure
The trace checked out, did some annotations on the timings to be sure.. As a hunch I tried to see if the vblank timings were sane, it seems kms_setmode.basic (vblank timings) is not part of fast feedback. Farm3-snb with fake monitor (difference is always the same): (kms_setmode:4749) CRITICAL: vblank interval differs from modeline! expected 16666,7us, measured 16633us +- 3,436us, difference 33,7us (9,8 sigma) Farm3-snb with real monitor (difference value differs from above, but still always the same): (kms_setmode:4826) CRITICAL: vblank interval differs from modeline! expected 16672,7us, measured 16630us +- 1,843us, difference 42,7us (23,1 sigma)
Note that all of my tests were with a HDMI connector, completely ruling out DP as root cause here. :)
Created attachment 131460 [details] ftrace, weird vblank timings In this ftrace the difference in basic-busy-flip-before-cursor-legacy between vblank_start and the get_vblank() at the end is 3, while 1 is expected. The timing of calls to drm_handle_vblank looks wrong. They should be 16.7ms apart, but it doesn't look to be the case here. After the read() it's called 3 times in a row, only 1 ms apart.
Ah, seems idle state related.. for n in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do echo 1 > $n; done Test passes!
Disabling C3 and higher seems to be enough to pass.
commit 7c8703fb02b248c2bcf9756bba8812bcfe7ed5d3 Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Wed Jun 7 11:10:06 2017 +0200 kms_cursor_legacy: Add a burner thread to make basic-busy-* pass. This appears to be required on SNB, or else we miss a lot of interrupts. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The kernel/HW is broken. However, this fix is not necessarily wrong, as in, the real kernel fix/workaround may actually be worse than the problem it is trying to fix, because it would increase the power usage while only fixing a use case that has not been seen in the wild (or has it?). We thus need to prove that this is the case and document this issue extensively in both IGT and the kernel. Until then, let's merge the revert to keep it in the task list
Well said, but how do we go forward? The wait states are probably the root cause, but I'm not sure about the fix.
Removing KBL tag and bug duplicate, it only happened there once so probably a fluke.
Closing bug, seems to have not happened in ~130 runs, might have been fixed upstream?
I agree we can close and archive this
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.