Bug 109380

Summary: [CI][BAT] igt@kms_chamelium@*- warn/fail - Last errno: 113, No route to host
Product: DRI Reporter: Martin Peres <martin.peres>
Component: IGT Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: high CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Martin Peres 2019-01-18 11:35:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2259/fi-kbl-7567u/igt@kms_chamelium@common-hpd-after-suspend.html

Starting subtest: common-hpd-after-suspend
Subtest common-hpd-after-suspend: SUCCESS (10.860s)
(kms_chamelium:3002) igt_chamelium-CRITICAL: Test assertion failure function chamelium_rpc, file ../lib/igt_chamelium.c:303:
(kms_chamelium:3002) igt_chamelium-CRITICAL: Failed assertion: !chamelium->env.fault_occurred
(kms_chamelium:3002) igt_chamelium-CRITICAL: Last errno: 113, No route to host
(kms_chamelium:3002) igt_chamelium-CRITICAL: Chamelium RPC call failed: libcurl failed to execute the HTTP POST transaction, explaining:  Failed to connect to 192.168.1.224 port 9992: No route to host
Comment 1 Martin Peres 2019-01-18 12:39:28 UTC
Moving to IGT as this is something the tests need to learn how to deal with.
Comment 3 CI Bug Log 2019-05-15 06:33:51 UTC
A CI Bug Log filter associated to this bug has been updated:

{- CHAMELIUM: igt@kms_chamelium@*suspend* - warn - Last errno: 113, No route to host -}
{+ CHAMELIUM: igt@kms_chamelium@* - warn/fail - Last errno: 113, No route to host +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_284/fi-kbl-7500u/igt@kms_chamelium@hdmi-cmp-planes-random.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_284/fi-kbl-7567u/igt@kms_chamelium@hdmi-cmp-planes-random.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_285/fi-kbl-7500u/igt@kms_chamelium@dp-audio.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_285/fi-kbl-7567u/igt@kms_chamelium@dp-audio.html
Comment 4 Arek Hiler 2019-08-16 07:47:55 UTC
Other than occasional flukes here and there, the biggest offender seems to be the *after-suspend* family of subtests.

The most representative scenario looks like this:
1. schedule something on chamelium
2. go to sleep
3. scheduled thing triggers
4. wake up
5. check whether we see the changes reflected through DRM and pass
6. exit, triggering chamelium cleanup

During step 6 the cleanup RPC fails because the network is not yet up after waking, which overwrites the test result with WARN.

Solution: introduce chamelium_wait_online(int timeout) that pings the device and bails out after timeout. Always call it after waking up.
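
A rough sketch of what such a helper could look like, not the actual IGT code. It assumes a successful TCP connection to the RPC port is a good enough "ping" (instead of ICMP), and the name chamelium_wait_online_sketch, its host/port parameters, and the 500 ms / 100 ms intervals are made up for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in milliseconds, unaffected by wall-clock jumps on resume. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: one non-blocking TCP connect attempt to host:port,
 * waiting at most timeout_ms. Only the first resolved address is tried.
 */
static bool probe_once(const char *host, const char *port, int timeout_ms)
{
	struct addrinfo hints = { .ai_family = AF_UNSPEC,
				  .ai_socktype = SOCK_STREAM };
	struct addrinfo *res;
	bool ok = false;
	int fd;

	if (getaddrinfo(host, port, &hints, &res) != 0)
		return false;

	fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	if (fd >= 0) {
		fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
		if (connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
			ok = true;
		} else if (errno == EINPROGRESS) {
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };
			int err = 0;
			socklen_t len = sizeof(err);

			/* The connect result is reported through SO_ERROR. */
			if (poll(&pfd, 1, timeout_ms) == 1 &&
			    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0)
				ok = (err == 0);
		}
		close(fd);
	}
	freeaddrinfo(res);
	return ok;
}

/*
 * Sketch of the proposed wait: retry until the device accepts a connection
 * or timeout_s seconds have passed. Returns true once it is reachable.
 */
static bool chamelium_wait_online_sketch(const char *host, const char *port,
					 int timeout_s)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000000 };
	long long deadline = now_ms() + (long long)timeout_s * 1000;

	do {
		if (probe_once(host, port, 500))
			return true;
		nanosleep(&pause, NULL);	/* back off before retrying */
	} while (now_ms() < deadline);

	return false;
}
```

A test would call this right after resume and before any further RPC (including the cleanup on exit), e.g. chamelium_wait_online_sketch("192.168.1.224", "9992", 20), and fail with a clear message if it returns false. The 20 second timeout here is an arbitrary example value.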

Everything else looks like sporadic network issues, which we have seen only a few times in the last few months. We can investigate them further once we get rid of the main source of the noise.

Impact on users: none, it's a CI/test issue.
Impact on testing: potentially huge, as some of the tests may leave us with chamelium ports unplugged, adding to the flip-flopping of the 2x tests.

Bumping priority to high, as it is easy to solve and important for keeping the CI noise down.
Comment 5 Arek Hiler 2019-08-16 08:36:12 UTC
https://patchwork.freedesktop.org/patch/324290/
Comment 6 emersion 2019-08-16 13:31:42 UTC
Related bug about Chamelium not having all ports plugged in: https://bugs.freedesktop.org/show_bug.cgi?id=110940
Comment 8 Arek Hiler 2019-09-05 11:06:20 UTC
Merged and fixed; the issue has not been seen in 2 weeks :-)