- Ubuntu 16.04
- Latest drm-tip kernel or 4.15, and an X server from the same time frame
- Mesa git
- SynMark v7.0
- ./synmark2 OglCSDof
Between the following commits:
2018-02-20 18:43:42 UTC 4c4e6232ee: freedreno/ir3: fix use_count refcnt'ing issue
2018-02-21 18:53:38 UTC 81dd4a7637: radeonsi: enable uvd encode for HEVC main
CSDof has started to hang the GPU. The hang happens only on BSW.
Of the changes during this period, enabling the shader cache looks like the most likely candidate. I'm not going to bisect this, but I should have a free BSW available tomorrow and can check whether disabling the shader cache helps.
Verified that the hang is caused by shader cache:
* Works fine with "MESA_GLSL_CACHE_DISABLE=true", and also when the cache is empty
* GPU hangs when shader cache is enabled and shader is in cache
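
The three cases above (cache disabled, cache enabled but empty, cache enabled and populated) can be scripted roughly as below. This is only a sketch under assumptions: it points MESA_GLSL_CACHE_DIR at a fresh temporary directory to get an empty cache, and the helper names cache_env and run_case are made up for illustration. The benchmark command is the one from the report.

```python
import os
import subprocess
import tempfile

BENCH_CMD = ["./synmark2", "OglCSDof"]  # command line from the report


def cache_env(disable_cache, cache_dir=None):
    """Return a copy of the environment with shader cache settings applied."""
    env = dict(os.environ)
    # Drop inherited settings so they do not skew the run.
    env.pop("MESA_GLSL_CACHE_DISABLE", None)
    env.pop("MESA_GLSL_CACHE_DIR", None)
    if disable_cache:
        env["MESA_GLSL_CACHE_DISABLE"] = "true"
    elif cache_dir is not None:
        # Assumption: a private cache directory stands in for "empty cache".
        env["MESA_GLSL_CACHE_DIR"] = cache_dir
    return env


def run_case(label, env):
    """Run the benchmark once and report its exit status."""
    result = subprocess.run(BENCH_CMD, env=env)
    print(f"{label}: exit status {result.returncode}")
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        run_case("cache disabled", cache_env(True))
        # First run with a fresh directory: nothing is cached yet.
        run_case("cache enabled, empty", cache_env(False, cache_dir))
        # Second run with the same directory: shaders come from the cache.
        run_case("cache enabled, populated", cache_env(False, cache_dir))
```

The exit status alone may not reveal a GPU hang, so the kernel log still needs checking after each run.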
After some debugging, I think this might be something about growing
the instruction cache between batches. The shader cache doesn't
upload the pre-compiled default programs at link time, so some
batches may run with a smaller instruction cache size. As more
programs are used, the instruction cache will grow, but something
is not working properly with this on BSW.
I found that if I disable the shader cache with
MESA_GLSL_CACHE_DISABLE=1 but also set shader_precompile=false,
then I also get the hang.
(Thanks Ken for suggesting shader_precompile=false.)
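
For anyone wanting to repeat the experiment above, here is a rough sketch of the environment it needs. It assumes the shader_precompile driconf option can be overridden with an environment variable of the same name. The function name precompile_off_env is hypothetical.

```python
import os
import subprocess

BENCH_CMD = ["./synmark2", "OglCSDof"]  # command line from the report


def precompile_off_env(base=None):
    """Environment with the shader cache disabled and precompile turned off."""
    env = dict(os.environ if base is None else base)
    env["MESA_GLSL_CACHE_DISABLE"] = "1"
    # Assumption: driconf picks this option up from the environment.
    env["shader_precompile"] = "false"
    return env


if __name__ == "__main__":
    result = subprocess.run(BENCH_CMD, env=precompile_off_env())
    print(f"exit status {result.returncode}")
```

If the hang reproduces this way, it would be consistent with the cache not being the root cause, only the trigger for programs being uploaded later.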
I tried to reproduce this today, and failed - it works fine, no matter what I try.
Created attachment 137814
GPU hang error state
(In reply to Kenneth Graunke from comment #3)
> I tried to reproduce this today, and failed - it works fine, no matter what
> I try.
Was it with HD405? I'm still seeing hangs on HD400 with yesterday evening's commit:
2018-03-03 04:56:35 UTC 411aa8c322: vbo: Try to reuse the same VAO more often for successive dlists.
-> Maybe this is HD400 specific like bug 104636? Jordan's comment above sounds like something that could be related to the program cache corruption you see in bug 104636.
PS. bug 101406 is also BSW specific, although not a hang.
In #intel-3d, Ken mentioned that he suspected that we might be
getting the scratch space wrong on HD 400.
I doubled the scratch space allocated in brw_alloc_stage_scratch,
and I was no longer seeing a hang.
(In reply to Jordan Justen from comment #6)
> In #intel-3d, Ken mentioned that he suspected that we might be
> getting the scratch space wrong on HD 400.
> I doubled the scratch space allocated in brw_alloc_stage_scratch,
> and I was no longer seeing a hang.
I can verify that the fix for bug 104636 also fixes this hang. -> DUPLICATE?
*** This bug has been marked as a duplicate of bug 104636 ***