https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6265/fi-icl-y/igt@i915_selftest@live_hangcheck.html

<7> [593.368203] [drm:i915_reset_engine [i915]] Failed to reset vcs2, ret=-110
<3> [593.368380] i915_reset_engine(vcs2:idle): failed, err=-110
<6> [593.368450] i915_reset_engine(vcs2:idle): 35 resets
<3> [593.368453] i915_reset_engine(vcs2:idle): reset 35 times, but reported 36
<3> [593.368461] i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -110
<3> [593.575953] intel_hangcheck_live_selftests+0xa1/0xd0 [i915] timed out, cancelling all further testing.
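For anyone triaging similar logs, here is a minimal sketch of how the reset-count mismatch in the dmesg excerpt above could be picked out automatically. The function name `find_reset_mismatches` and the regular expression are illustrative assumptions based only on the message format seen in this one log; they are not part of IGT or the CI Bug Log tooling.

```python
import re

# Log lines copied from the dmesg excerpt in this report.
DMESG = """\
<7> [593.368203] [drm:i915_reset_engine [i915]] Failed to reset vcs2, ret=-110
<3> [593.368380] i915_reset_engine(vcs2:idle): failed, err=-110
<6> [593.368450] i915_reset_engine(vcs2:idle): 35 resets
<3> [593.368453] i915_reset_engine(vcs2:idle): reset 35 times, but reported 36
"""

# Assumed message format, inferred from the single log above.
MISMATCH = re.compile(
    r"i915_reset_engine\((?P<engine>\w+):(?P<phase>\w+)\): "
    r"reset (?P<done>\d+) times, but reported (?P<reported>\d+)"
)


def find_reset_mismatches(log):
    """Return (engine, phase, resets_done, resets_reported) tuples."""
    mismatches = []
    for line in log.splitlines():
        m = MISMATCH.search(line)
        if m:
            mismatches.append(
                (m["engine"], m["phase"], int(m["done"]), int(m["reported"]))
            )
    return mismatches


if __name__ == "__main__":
    for engine, phase, done, reported in find_reset_mismatches(DMESG):
        print(f"{engine} ({phase}): {done} resets counted, {reported} reported")
```

The off-by-one between the two counts is consistent with the failed (-110, i.e. timed out) reset attempt being reported but not counted as a completed reset, though that reading is a guess from the log alone.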
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* igt@i915_selftest@live_hangcheck - dmesg-fail - i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -110
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4399/fi-icl-dsi/igt@i915_selftest@live_hangcheck.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6265/fi-icl-y/igt@i915_selftest@live_hangcheck.html
From a chat with Chris (any inaccuracy is down to my paraphrasing): The test repeatedly resets the GPU and checks that we can still execute a request afterwards. The failure isn't that critical for users, and may even be caused by us using a timeout that is too short. However, there have been other issues with reset (see https://bugs.freedesktop.org/show_bug.cgi?id=110683) that could be related. Also, in passing runs ICL resets fewer than 100 times, whereas other platforms reset more than 1000 times.
Set priority to "high" to reflect the assessment done by Francesco's team.
So far this has occurred twice, in two consecutive runs on icl-y, and then not again for a month (68 runs). Let's keep monitoring to see whether it happens again and decide on next steps then.
Not seen in 90 runs / 1 month, 3 weeks. Closing.