Bug 103878 - [BAT][CFL only] igt@*- incomplete - timeout/system hang
Summary: [BAT][CFL only] igt@*- incomplete - timeout/system hang
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Marta Löfstedt
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-24 07:08 UTC by Marta Löfstedt
Modified: 2018-02-09 07:17 UTC

See Also:
i915 platform: CFL
i915 features:



Description Marta Löfstedt 2017-11-24 07:08:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3378/fi-cfl-s2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html

dmesg:
<5>[   42.954561] owatch: Using watchdog device /dev/watchdog0
<5>[   42.954700] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   42.955489] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
... 
<7>[  519.899445] [drm:verify_single_dpll_state.isra.76 [i915]] DPLL 1
<7>[  519.899566] [drm:edp_panel_vdd_on [i915]] Turning eDP port A VDD on
<7>[  519.899620] [drm:wait_panel_power_cycle [i915]] Wait for panel power cycle

run.log:
running: igt/gem_exec_suspend/basic-s3

[108/289] skip: 11, pass: 97 |        
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3378/fi-cfl-s2/0 : FAILURE
CI_IGT_test runtime 544 seconds
Rebooting fi-cfl-s2

Oddly, the last test in run.log is igt/gem_exec_suspend/basic-s3; igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c runs much later.

igt/gem_exec_suspend/basic-s3 started:
<7>[  232.754396] [IGT] gem_exec_suspend: starting subtest basic-S3
...
<7>[  238.593699] [IGT] gem_exec_suspend: exiting, ret=0
...

Then basic-S4-devices is run; before S4 finishes there are error logs from "ASIX AX88179 USB 3.0 Gigabit Ethernet":

<7>[  238.735829] [IGT] gem_exec_suspend: starting subtest basic-S4-devices
...
<3>[  248.884891] ax88179_178a 2-6:1.0 enx000acd2892d1: Error submitting the control message: status=-19
<3>[  248.884960] ax88179_178a 2-6:1.0 enx000acd2892d1: Error submitting the control message: status=-19
<4>[  248.885122] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0000: -19
<4>[  248.885150] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0001: -19
<4>[  248.885176] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0009: -19
<4>[  248.885205] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x000a: -19
<4>[  248.885229] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0004: -19
<4>[  248.885253] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0005: -19
<3>[  248.885320] ax88179_178a 2-6:1.0 enx000acd2892d1: Error submitting the control message: status=-19
<3>[  248.885385] ax88179_178a 2-6:1.0 enx000acd2892d1: Error submitting the control message: status=-19
<6>[  248.885772] ax88179_178a 2-6:1.0 enx000acd2892d1: unregister 'ax88179_178a' usb-0000:00:14.0-6, ASIX AX88179 USB 3.0 Gigabit Ethernet
<4>[  248.885907] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0002: -19
<4>[  248.885932] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to write reg index 0x0002: -19
<4>[  248.903770] ax88179_178a 2-6:1.0 enx000acd2892d1 (unregistered): Failed to write reg index 0x0002: -19
<4>[  248.903789] ax88179_178a 2-6:1.0 enx000acd2892d1 (unregistered): Failed to write reg index 0x0001: -19
<4>[  248.903807] ax88179_178a 2-6:1.0 enx000acd2892d1 (unregistered): Failed to write reg index 0x0002: -19
<6>[  248.968529] PM: hibernation exit
<4>[  249.517365] Setting dangerous option reset - tainting kernel
<7>[  249.518209] [IGT] gem_exec_suspend: exiting, ret=0

So I believe there was a network issue: the machine kept running the tests, but it was rebooted after Jenkins lost the connection.

I.e. this is not an i915 issue.
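
The numbers at the end of the ax88179_178a lines are negated Linux errno values, so -19 is ENODEV ("No such device"), which fits a dongle that dropped off the USB bus during hibernation. A minimal sketch of tallying those codes from a dmesg capture is below. The function name count_ax88179_errors and the hard-coded errno table are assumptions made for illustration; they are not part of the CI tooling.

```python
import re
from collections import Counter

# Linux errno values that appear in this report (assumed Linux numbering).
LINUX_ERRNO = {19: "ENODEV", 108: "ESHUTDOWN", 113: "EHOSTUNREACH"}

# Matches e.g. "... Failed to read reg index 0x0000: -19"
# and "... Error submitting the control message: status=-19".
_AX_ERR = re.compile(r"ax88179_178a .*?(?:status=|: )-(\d+)\s*$")


def count_ax88179_errors(dmesg_lines):
    """Return a Counter of errno names for ax88179_178a error lines."""
    counts = Counter()
    for line in dmesg_lines:
        m = _AX_ERR.search(line)
        if m:
            code = int(m.group(1))
            counts[LINUX_ERRNO.get(code, f"errno {code}")] += 1
    return counts


sample = [
    "<3>[  248.884891] ax88179_178a 2-6:1.0 enx000acd2892d1: Error submitting the control message: status=-19",
    "<4>[  248.885122] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x0000: -19",
    "<6>[  248.968529] PM: hibernation exit",
]
print(dict(count_ax88179_errors(sample)))
```

Lines from the driver that do not end in a negative status, such as the "unregister" message, are deliberately ignored.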
Comment 1 Marta Löfstedt 2017-11-24 11:19:35 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3380/fi-cfl-s2/igt@gem_busy@basic-hang-default.html

This time there are no suspicious network-related prints. However, it still looks network-related, since run.log hasn't captured the test that was actually run.

dmesg:
<5>[   23.416428] owatch: Using watchdog device /dev/watchdog0
<5>[   23.416519] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   23.417085] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
...
<4>[   29.116530] Setting dangerous option reset - tainting kernel
<4>[   29.119195] Setting dangerous option reset - tainting kernel
<7>[   29.119239] [IGT] gem_busy: starting subtest basic-hang-default

run.log:
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3380/fi-cfl-s2/0 : FAILURE
CI_IGT_test runtime 12 seconds
Rebooting fi-cfl-s2
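
When run.log is empty like this, the dmesg capture is the only record of what was executing. A rough sketch of pulling out the subtest that started but never logged an exit is below. It assumes the "<level>[timestamp] message" prefix and the "[IGT] ...: starting subtest" / "exiting, ret=" markers seen in the excerpts in this bug; the function name unfinished_subtest is hypothetical.

```python
import re

# "<7>[   29.119239] [IGT] gem_busy: starting subtest basic-hang-default"
_LINE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s(.*)$")
_START = re.compile(r"^\[IGT\] (\S+): starting subtest (\S+)")
_EXIT = re.compile(r"^\[IGT\] (\S+): exiting, ret=(-?\d+)")


def unfinished_subtest(dmesg_lines):
    """Return (timestamp, binary, subtest) for a subtest that started
    without a later exit message from its binary, or None."""
    pending = None
    for raw in dmesg_lines:
        m = _LINE.match(raw)
        if not m:
            continue  # skip elided ("...") or malformed lines
        ts, msg = float(m.group(2)), m.group(3)
        start = _START.match(msg)
        if start:
            pending = (ts, start.group(1), start.group(2))
        elif _EXIT.match(msg):
            pending = None
    return pending


dmesg = [
    "<4>[   29.116530] Setting dangerous option reset - tainting kernel",
    "<4>[   29.119195] Setting dangerous option reset - tainting kernel",
    "<7>[   29.119239] [IGT] gem_busy: starting subtest basic-hang-default",
]
print(unfinished_subtest(dmesg))
```

A result of None only means every started binary logged an exit; it does not prove the machine stayed healthy afterwards.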
Comment 2 Marta Löfstedt 2017-11-24 11:26:34 UTC
I filed https://bugzilla.kernel.org/show_bug.cgi?id=197971 for this issue.
But maybe we should try to get another network dongle.
Comment 3 Marta Löfstedt 2017-12-14 07:13:29 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3510/fi-cfl-s2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html

This doesn't appear to have the ax88179_178a issues, but this bug is intended to cover all unexplained CFL incompletes.

last dmesg:
<7>[  502.050725] [drm:drm_mode_addfb2] [FB:103]
<7>[  502.057806] [drm:drm_mode_setcrtc] [CRTC:47:pipe B]
<7>[  502.057820] [drm:drm_mode_setcrtc] [CONNECTOR:66:DP-1]

run.log:
running: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-b

[244/288] skip: 24, pass: 220 |                        
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3510/fi-cfl-s2/0 : FAILURE
CI_IGT_test runtime 852 seconds
Rebooting fi-cfl-s2
Comment 4 Marta Löfstedt 2018-01-09 12:46:21 UTC
Here is another occurrence:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3603/fi-cfl-s2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html

last test in run.log:
running: igt/gem_exec_suspend/basic-s3

However, igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c is blamed for the incomplete.

<6>[  193.230328] ax88179_178a 2-6:1.0 enx000acd2892d1: ax88179 - Link status is: 1
<4>[  193.232377] Suspending console(s) (use no_console_suspend to debug)
<4>[  193.239412] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to write reg index 0x000d: -108
<4>[  193.239483] ax88179_178a 2-6:1.0 enx000acd2892d1: Failed to read reg index 0x000e: -113
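
The mismatch between the last test in run.log and the test blamed for the incomplete is the signature of the lost-connection case. A small sketch of checking for it is below. It assumes run.log marks each test with a "running: " line and that the "igt/binary/subtest" and "igt@binary@subtest" spellings differ only in the separator, as in the examples in this bug; all function names are hypothetical.

```python
def last_running_test(run_log_text):
    """Return the test named on the last 'running: ' line, or None."""
    last = None
    for line in run_log_text.splitlines():
        line = line.strip()
        if line.startswith("running: "):
            last = line[len("running: "):].strip()
    return last


def to_ci_name(runlog_name):
    """Convert 'igt/binary/subtest' to 'igt@binary@subtest' (assumed naming)."""
    return runlog_name.replace("/", "@")


def blame_matches(run_log_text, blamed):
    """True if the blamed test is the last one run.log saw start."""
    last = last_running_test(run_log_text)
    return last is not None and to_ci_name(last).lower() == blamed.lower()


run_log = "running: igt/gem_exec_suspend/basic-s3\n"
blamed = "igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c"
print(blame_matches(run_log, blamed))
```

A False result here suggests run.log stopped updating before the machine stopped running tests, which points at the connection rather than at the blamed test.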
Comment 5 Marta Löfstedt 2018-02-08 10:39:48 UTC
This hasn't happened since CI_DRM_3603 (2018-01-05, 191 runs ago). I will archive and close it tomorrow.
Comment 6 Marta Löfstedt 2018-02-09 07:17:17 UTC
Last seen: CI_DRM_3603 (2018-01-05, 198 runs ago).

Probably fixed upstream around 4.15.0-rc7.

