Bug 111384

Summary: [BXT/Iris] (recoverable) GPU hang in SynMark compute CSCloth
Product: Mesa Reporter: Eero Tamminen <eero.t.tamminen>
Component: Drivers/Gallium/Iris Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: VERIFIED WORKSFORME QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=111385
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 111444    
Attachments: i915 error state for GPU hang

Description Eero Tamminen 2019-08-12 14:24:59 UTC
Created attachment 145036
i915 error state for GPU hang

Setup:
- BXT J4205
- ClearLinux (30730)
- drm-tip git kernel (0330b51e91)
- Mesa git (5ed4e31c08d)
- Weston git build

Test-case:
- 3x fullscreen FullHD SynMark CSCloth compute test-case (Wayland version):
  synmark2 OglCSCloth

Actual outcome:
- Recoverable GPU hang:
-------------------------------
[ 8477.103209] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
[ 8477.103216] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 8477.103217] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 8477.103219] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 8477.103220] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 8477.103222] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 8477.104241] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 8477.105009] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
-------------------------------

This is the faster of the two SynMark compute tests.

I wasn't able to reproduce the hang after a reboot when re-running just CSCloth 10 times, so it may depend on the preceding tests, or it is just very hard to reproduce.

I haven't seen such hangs with the i965 driver, and this test didn't hang on SKL GT2 (another one did). I didn't see this hang when running a similar test set a month ago, so it may be a regression.

Comment 1 Eero Tamminen 2019-08-29 12:37:37 UTC
I haven't seen these hangs since, but that setup doesn't have automated runs, so I don't have enough data points to conclude anything (Weston bug #273 is currently also preventing semi-automated testing with the latest gfx stack).

I'm fine with this being closed as WORKSFORME after bug 111385 is fixed or after a few weeks (whichever comes first), if I haven't seen it again.

Comment 2 Eero Tamminen 2019-09-05 16:12:00 UTC
In the last ~2 weeks (3 test runs on most days), the only (recoverable) Iris GPU hang on BXT was one hang in Manhattan 3.1, and none in CSCloth -> WORKSFORME.