105290 – [BSW/HD400] SynMark OglCSDof GPU hangs when shaders come from cache

Status:	VERIFIED DUPLICATE of bug 104636

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	git
Hardware:	Other All

Importance:	medium normal
Assignee:	Jordan Justen
QA Contact:	Intel 3D Bugs Mailing List

Reported:	2018-02-28 13:21 UTC by Eero Tamminen
Modified:	2018-03-12 16:43 UTC
CC List:	2 users

See Also:	104636

Attachments
GPU hang error state (52.03 KB, text/plain) 2018-03-06 08:49 UTC, Eero Tamminen

Description Eero Tamminen 2018-02-28 13:21:57 UTC

Setup:
- Ubuntu 16.04
- Latest drm-tip or 4.15 kernel, and an X server from the same time frame
- Mesa git
- SynMark v7.0

Test-case:
- ./synmark2 OglCSDof

Between the following commits:
2018-02-20 18:43:42 UTC 4c4e6232ee: freedreno/ir3: fix use_count refcnt'ing issue
2018-02-21 18:53:38 UTC 81dd4a7637: radeonsi: enable uvd encode for HEVC main

CSDof has started to hang the GPU.  The hang happens only on BSW.

Of the changes during this period, enabling the shader cache looks like the most likely candidate.  I'm not going to bisect this, but I should have a free BSW available tomorrow and can check whether disabling the shader cache helps.

Comment 1 Eero Tamminen 2018-03-01 09:09:06 UTC

Verified that the hang is caused by the shader cache:
* Works fine with "MESA_GLSL_CACHE_DISABLE=true" and when the cache is empty
* GPU hangs when the shader cache is enabled and the shader is in the cache
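
The three cache states above can be reproduced with a small wrapper. This is only a sketch: the function names and CACHE_DIR are made up for illustration, and it assumes a Mesa build where the shader cache is on by default and honours MESA_GLSL_CACHE_DIR, so that a private directory gives a known-empty cache without touching the user's real one.

```shell
#!/bin/sh
# Sketch only: CACHE_DIR and the run_cache_* names are illustrative,
# not part of Mesa or SynMark.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/csdof-shader-cache}"

# Run "$@" with the shader cache disabled.
run_cache_disabled() {
    env MESA_GLSL_CACHE_DISABLE=true "$@"
}

# Run "$@" against a freshly emptied private cache directory (cold cache).
run_cache_cold() {
    rm -rf "${CACHE_DIR:?}" || return 1
    mkdir -p "$CACHE_DIR" || return 1
    env MESA_GLSL_CACHE_DIR="$CACHE_DIR" "$@"
}

# Run "$@" again without clearing the directory, so that shaders stored
# by an earlier run can be loaded from the cache (warm cache).
run_cache_warm() {
    env MESA_GLSL_CACHE_DIR="$CACHE_DIR" "$@"
}
```

For the test case in this report, each function would be given "./synmark2 OglCSDof" as its arguments, with run_cache_warm called right after run_cache_cold. Going by the results above, only the warm run should hang on BSW.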

Comment 2 Jordan Justen 2018-03-03 02:24:28 UTC

After some debugging, I think this might be something about growing
the instruction cache between batches. The shader cache doesn't
upload the pre-compiled default programs at link time, so some
batches may run with a smaller instruction cache size. As more
programs are used, the instruction cache will grow, but something
is not working properly with this on BSW.

I found that if I disable the shader cache with
MESA_GLSL_CACHE_DISABLE=1 but also set shader_precompile=false,
then I also get the hang.

(Thanks Ken for suggesting shader_precompile=false.)
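
A minimal sketch of that combination follows. The comment does not say how shader_precompile was set; this assumes it was passed as an environment variable, relying on Mesa's driconf options being overridable by an environment variable of the same name. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: run_no_cache_no_precompile is an illustrative name.
# Assumption: the shader_precompile driconf option is overridden via the
# environment; a drirc entry would be another way to set it.

# Run "$@" with the shader cache disabled and shader precompile turned off.
run_no_cache_no_precompile() {
    env MESA_GLSL_CACHE_DISABLE=1 shader_precompile=false "$@"
}
```

Called with "./synmark2 OglCSDof" as its arguments, this should reproduce the hang on an affected BSW even though the shader cache is out of the picture.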

Comment 3 Kenneth Graunke 2018-03-06 02:49:17 UTC

I tried to reproduce this today, and failed - it works fine, no matter what I try.

Comment 4 Eero Tamminen 2018-03-06 08:49:45 UTC

Created attachment 137814
GPU hang error state

Comment 5 Eero Tamminen 2018-03-06 08:50:19 UTC

(In reply to Kenneth Graunke from comment #3)
> I tried to reproduce this today, and failed - it works fine, no matter what
> I try.

Was it with HD405?  I'm still seeing hangs on HD400 with the latest commit as of last evening:
  2018-03-03 04:56:35 UTC 411aa8c322: vbo: Try to reuse the same VAO more often for successive dlists.

-> Maybe this is HD400 specific like bug 104636?  Jordan's comment above sounds like something that could be related to the program cache corruption you see in bug 104636.

PS. bug 101406 is also BSW specific, although not a hang.

Comment 6 Jordan Justen 2018-03-06 16:03:37 UTC

In #intel-3d, Ken mentioned that he suspected that we might be
getting the scratch space wrong on HD 400.

I doubled the scratch space allocated in brw_alloc_stage_scratch,
and I was no longer seeing a hang.

Comment 7 Eero Tamminen 2018-03-07 14:18:00 UTC

(In reply to Jordan Justen from comment #6)
> In #intel-3d, Ken mentioned that he suspected that we might be
> getting the scratch space wrong on HD 400.
> 
> I doubled the scratch space allocated in brw_alloc_stage_scratch,
> and I was no longer seeing a hang.

I can verify that the fix for bug 104636:
https://patchwork.freedesktop.org/patch/208502/

also fixes this hang. -> DUPLICATE?

Comment 8 Eero Tamminen 2018-03-12 16:43:18 UTC


*** This bug has been marked as a duplicate of bug 104636 ***