107640 – [skl] x11perf triggers gpu hang

Summary: [skl] x11perf triggers gpu hang

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	18.0
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-08-21 14:52 UTC by Rob Clark
Modified:	2018-10-23 17:54 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
dump of /sys/class/drm/card0/error after hang (64.12 KB, text/plain) 2018-08-28 12:50 UTC, Rob Clark

Description Rob Clark 2018-08-21 14:52:32 UTC

I've reproduced this on an i7-6820HQ, but I guess it affects other skl parts too. (I wasn't able to reproduce it on a kbl system that I tried.) It can be reproduced with:

  x11perf -v1.3 -rop GXxor -seg10

(with either gnome-shell + Xwayland or xf86-video-modesetting, i.e. either path that uses glamor.)

Comment 1 Denis 2018-08-22 10:50:37 UTC

Hi. I can confirm that the issue reproduces on SKL and does not reproduce on KBL.
Checked mesa versions 13.0.6, 17.2.0 and 18.3.0 (all on kernel 4.13.0, Ubuntu 18.04).

Will try to check other kernel versions.

Comment 2 Denis 2018-08-22 11:35:18 UTC

Hm, this looks like an Xwayland issue. I can't reproduce it using gnome-shell + Xorg (both xserver and Xwayland are version 1.19.6).

Comment 3 Rob Clark 2018-08-22 11:47:42 UTC

(In reply to Denis from comment #2)
> Hm, this looks like an Xwayland issue. I can't reproduce it using
> gnome-shell + Xorg (both xserver and Xwayland are version 1.19.6).

I'm guessing that with an Xorg session, if you were using xf86-video-modesetting, you would see it. I think the common thread is glamor, i.e. it is a mesa issue.

Comment 4 Denis 2018-08-22 14:16:30 UTC

Hmm, you are right. My results are below:


Server      Driver         Number of tries    Fails

Xorg        Intel          7                  0
Xorg        modesetting    2                  2
Xwayland    Intel          2                  2
Xwayland    modesetting    3                  3

Comment 5 Rob Clark 2018-08-28 12:50:23 UTC

Created attachment 141306
dump of /sys/class/drm/card0/error after hang

corresponding dmesg from hang:

[Aug28 08:47] [drm] GPU HANG: ecode 9:0:0x84df3cfc, in Xorg [1824], reason: Hang on rcs0, action: reset
[  +0.000001] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  +0.000000] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  +0.000001] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  +0.000000] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  +0.000000] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  +0.000007] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[  +9.014104] i915 0000:00:02.0: Resetting rcs0 after gpu hang
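
When collecting hang reports like the one above from several machines, it can help to pull the key fields out of the "GPU HANG" line automatically. The following is a minimal sketch only: the function name `parse_gpu_hang` and the regular expression are illustrative assumptions based on the single dmesg line in this report, and other kernel versions may format the line differently.

```python
import re

# Pattern modelled on the one "GPU HANG" line quoted in this report.
# Assumption: the layout is "ecode A:B:0xHEX, in COMM [PID], reason: ..., action: ...".
# Process names containing spaces would not match this sketch.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\d+:\d+:0x[0-9a-fA-F]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG dmesg line, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field we convert
    return info


if __name__ == "__main__":
    sample = (
        "[Aug28 08:47] [drm] GPU HANG: ecode 9:0:0x84df3cfc, in Xorg [1824], "
        "reason: Hang on rcs0, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Lines that do not contain a hang record return `None`, so the function can be applied to every line of a dmesg capture and the results filtered.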

Comment 6 Chris Wilson 2018-10-18 21:14:36 UTC

Fwiw, reproduced on skl with 18.0.5, and it appears fixed in mesa.git. Reverse bisect ensuing.

Comment 7 Chris Wilson 2018-10-18 22:41:54 UTC

904c2a617d86944fbdc2c955f327aacd0b3df318 is the first ok commit
commit 904c2a617d86944fbdc2c955f327aacd0b3df318
Author: Nanley Chery <nanley.g.chery@intel.com>
Date:   Wed Aug 22 10:43:32 2018 -0700

    i965/gen7_urb: Re-emit PUSH_CONSTANT_ALLOC on some gen9
    
    According to internal docs, some gen9 platforms have a pixel shader push
    constant synchronization issue. Although not listed among said
    platforms, this issue seems to be present on the GeminiLake 2x6's we've
    tested.
    
    We consider the available workarounds to be too detrimental on
    performance. Instead, we mitigate the issue by applying part of one of
    the workarounds. Re-emit PUSH_CONSTANT_ALLOC at the top of every batch
    (as suggested by Ken).
    
    Fixes ext_framebuffer_multisample-accuracy piglit test failures with the
    following options:
    * 6 depth_draw small depthstencil
    * 8 stencil_draw small depthstencil
    * 6 stencil_draw small depthstencil
    * 8 depth_resolve small
    * 6 stencil_resolve small depthstencil
    * 4 stencil_draw small depthstencil
    * 16 stencil_draw small depthstencil
    * 16 depth_draw small depthstencil
    * 2 stencil_resolve small depthstencil
    * 6 stencil_draw small
    * all_samples stencil_draw small
    * 2 depth_draw small depthstencil
    * all_samples depth_draw small depthstencil
    * all_samples stencil_resolve small
    * 4 depth_draw small depthstencil
    * all_samples depth_draw small
    * all_samples stencil_draw small depthstencil
    * 4 stencil_resolve small depthstencil
    * 4 depth_resolve small depthstencil
    * all_samples stencil_resolve small depthstencil
    
    v2: Include more platforms in WA (Ken).
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106865
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93355
    Cc: <mesa-stable@lists.freedesktop.org>
    Tested-by: Mark Janes <mark.a.janes@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

:040000 040000 f3bb6b73b8f93172db0358351c1eaf9c229f9bb3 0536193be2992f1d5736b0f8298bb56f658e1b56 M	src
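
A reverse bisect, as used above, searches for the first commit where the problem is gone rather than the first commit where it appears. The sketch below illustrates the idea as a plain binary search. It is not how git implements bisection: the names `first_ok_commit` and `is_ok` are hypothetical, and it assumes a linear list of commits ordered oldest to newest, with the oldest known to hang and the newest known to be ok, and a single transition between the two states.

```python
def first_ok_commit(commits, is_ok):
    """Return the first commit for which is_ok() is true.

    Assumptions (not checked beyond the length):
      - commits is ordered oldest to newest,
      - commits[0] is known bad and commits[-1] is known ok,
      - there is exactly one bad -> ok transition.
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-bad and one known-ok commit")

    lo, hi = 0, len(commits) - 1  # lo: newest known bad, hi: oldest known ok
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: the hang stops reproducing from commit "d" onward.
    history = ["a", "b", "c", "d", "e"]
    print(first_ok_commit(history, lambda c: history.index(c) >= 3))
```

In practice `is_ok` would build the commit and run the reproducer (here, the x11perf command from the description) to see whether the GPU still hangs. If the fix depends on an earlier patch, the single-transition assumption still holds for the tested tree, but backporting only the identified commit may not be enough.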

Comment 8 Rob Clark 2018-10-23 17:54:07 UTC

Hmm, what are the odds that this also depends on some previous patch that isn't in 18.0?
