Created attachment 124648
The Talos Principle bug series continues from bug 96607 with another texture misrendering; see the bottom-right corner of the attached screenshot. The texture flickers between good and bad rendering while in-game.
This seems to be a Skylake-specific issue, as it doesn't reproduce on Haswell.
It does not seem to be a regression: it has shown up ever since SKL support was added to Mesa. Same result on 4.4 and 4.6 kernels.
Screenshot is from the last frame of this trace:
Somewhat trimmed trace:
bad draw call: 254494
It looks like some sort of caching/batching issue. While I was trimming the trace, the problem would randomly go away and come back as I removed draw calls in frames before the 'bad' draw call. The problem also goes away if draw call 254532 (which comes after the 'bad' one) is removed. When inspecting the output with qapitrace, it occasionally renders correctly, although most of the time it's wrong. Replaying the above trace with glretrace always triggers the issue, though.
I've noticed my trimmed trace triggers a debug assert, so here is another working one:
bad draw call: 266049
It seems MESA_DEBUG=flush or setting always_flush_batch helps somewhat, but it still misrenders on around half of the attempts.
Inserting a glTextureBarrier call after the bad draw call helps even more, but still doesn't completely solve it.
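With a failure this intermittent, statements like "around half" or "helps even more" are hard to compare without a run count. A minimal sketch, assuming each glretrace run is an independent pass/fail trial (not guaranteed, since GPU and driver state may carry over between runs), of putting a confidence interval on an observed failure rate with the Wilson score interval; `wilson_interval` is a hypothetical helper, not part of apitrace or Mesa.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure rate observed as failures/runs.

    z = 1.96 gives roughly 95% confidence. Assumes independent runs.
    """
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    p = failures / runs
    z2 = z * z
    denom = 1.0 + z2 / runs
    center = (p + z2 / (2 * runs)) / denom
    margin = z * math.sqrt(p * (1 - p) / runs + z2 / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - margin), min(1.0, center + margin)


# Example: 5 bad renders out of 10 glretrace runs.
low, high = wilson_interval(5, 10)
print(f"failure rate between {low:.2f} and {high:.2f}")  # failure rate between 0.24 and 0.76
```

With only 10 runs the interval is wide (roughly 0.24 to 0.76), so each workaround needs considerably more runs before its failure rate can be told apart from another's.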
(In reply to Grazvydas Ignotas from comment #2)
> I've noticed my trimmed trace triggers a debug assert, so here is another
> working one:
> bad draw call: 266049
> It seems MESA_DEBUG=flush or setting always_flush_batch helps somewhat, but
> it still misrenders on around half of the tries.
> Sticking glTextureBarrier after the bad call helps even more, but still
> doesn't completely solve it.
glTextureBarrier got a fix just today on Mesa master.
Additionally, some patches related to caching/flushing were sent (still pending review):
Just in case you want to test whether they help.
Just retested; unfortunately they don't help, not even a bit. Note that the game doesn't use glTextureBarrier; I was just hacking it in, and that doesn't fix anything, it only reduces the frequency of the problem.
Could you try again, e.g. with Mesa 17.x? I don't see this bug on SKL, and there have been a lot of fixes to both Talos and Mesa Vulkan support since you filed the bug.
(In reply to Eero Tamminen from comment #5)
> I don't see this bug on SKL
How many times have you tried? It's random and doesn't reproduce reliably. The trimmed trace from comment 2 reproduces it (much) more often (I use glretrace -b -w).
And yes, it still reproduces on mesa-git and the current version of the game. Curiously, anv doesn't have this issue, but it suffers from general random object flicker and poor performance (around half that of OpenGL).
I can reproduce using the trace on SKL/Mesa-17.0.2. It looks exactly like the screenshot.
(In reply to Matt Turner from comment #7)
> I can reproduce using the trace on SKL/Mesa-17.0.2. It looks exactly like
> the screenshot.
Matt, does the issue still happen or is it already fixed?
Still there for me on today's git.
Reproduced on KBL with provided apitraces:
System: Kernel: 4.13.0-37-generic x86_64 (64 bit gcc: 5.4.0)
Desktop: Unity 7.4.0 (Gtk 3.18.9-1ubuntu3.3) Distro: Ubuntu 16.04 xenial
CPU: Dual core Intel Core i7-7500U (-HT-MCP-) cache: 4096 KB
flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 11616
clock speeds: max: 3500 MHz 1: 2900 MHz 2: 2900 MHz 3: 2900 MHz 4: 2900 MHz
Graphics: Card-1: Intel Device 5916 bus-ID: 00:02.0
GLX Renderer: Mesa DRI Intel HD Graphics 620 (Kaby Lake GT2)
GLX Version: 3.0 Mesa 18.1.0-devel (git-c88e7fe29e) Direct Rendering: Yes
Created attachment 139842
Workaround for trace Talos_flicker3_trim2.trace.xz
Investigation was done with trace https://drive.google.com/file/d/0Bz8fw_SGGDzsbk81T0hKMFo1V2s/view?usp=sharing
Workarounds:
1. For that trace only (it doesn't help in all cases, so the problem still happens sometimes): 96624_wa.diff
2. Use LIBGL_ALWAYS_SOFTWARE=true (swrast)
Investigation:
1. Used simplified shaders for program 1496 (vertex shader 1286, fragment shader 1296):
- Kept only the computation of vVexPosAbs
- Assigned vLightUV.xy to vMaskLightUV.xy
- Removed all other computations
- Used vMaskLightUV.xy to compute vOutColor.rgb
- Removed all other computations
2. Used array buffers:
2.1. 254490 @0 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1139)
2.2. 254517 @0 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1140)
3. Final observation:
3.1. The vLightUV data seen by the vertex shader is invalid.
3.2. Reading the data with DRM_IOCTL_I915_GEM_PREA (for buffers 1139 and 1140) at different stages shows that the data in the arrays does not change.
4. Tentative conclusion:
4.1. Either some invalid offsets/addresses are computed,
4.2. or some flushing/caching/command is missed before the glDraw* calls.
5. Questions:
5.1. Is it possible to debug which address the vertex shader reads vLightUV from?
5.2. Any more ideas?
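To make the invalid-offset hypothesis concrete: for a non-instanced indexed draw, GL fetches an attribute such as vLightUV from offset + index * stride within the bound array buffer, with the index taken from the element array buffer. The sketch below replays that rule on CPU-side copies of buffer contents (for example, dumps of buffers 1139 and 1140). It assumes GL_UNSIGNED_SHORT indices, a vec2 float attribute, and a hypothetical interleaved layout (vec3 position then vec2 UV, stride 20, UV offset 12); the real layout would have to come from the attribute setup calls in the trace. `fetch_vec2_indexed` is a hypothetical helper, not a Mesa or apitrace function.

```python
import struct
from typing import List, Tuple


def fetch_vec2_indexed(vbo: bytes, ibo: bytes, offset: int, stride: int) -> List[Tuple[float, float]]:
    """Fetch a vec2 float attribute the way a non-instanced indexed draw does.

    Assumes GL_UNSIGNED_SHORT indices; addresses are relative to the start
    of the bound array buffer: offset + index * stride.
    """
    count = len(ibo) // 2
    indices = struct.unpack_from(f"<{count}H", ibo, 0)
    return [struct.unpack_from("<2f", vbo, offset + idx * stride) for idx in indices]


# Hypothetical interleaved buffer: vec3 position followed by vec2 UV per vertex.
vertices = [((0.0, 0.0, 0.0), (0.25, 0.5)),
            ((1.0, 0.0, 0.0), (0.75, 0.5))]
vbo = b"".join(struct.pack("<3f2f", *pos, *uv) for pos, uv in vertices)
ibo = struct.pack("<3H", 0, 1, 0)

print(fetch_vec2_indexed(vbo, ibo, offset=12, stride=20))  # correct UVs
print(fetch_vec2_indexed(vbo, ibo, offset=8, stride=20))   # offset wrong by 4 bytes
```

The second call shows why unchanged buffer contents do not rule out the offset theory: a base or offset that is off by a few bytes yields plausible-looking but wrong UVs (here position.z is read as u), which would look like the invalid vLightUV data described above.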
Bisected to a363bb2cd0e2a141f2c60be005009703bffcbe4e (i965: Allocate VMA in userspace for full-PPGTT systems.)
That commit significantly improves the behaviour, but it still fails about once in every 10 runs.
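With a failure that shows up only about once in 10 runs, a single clean run says little when marking a commit good or bad during a bisect. A minimal sketch, assuming independent runs and a constant per-run failure probability (neither is guaranteed for GPU flicker), of how many consecutive clean runs are needed before concluding, with a given confidence, that the failure is really gone. It solves (1 - p)^n <= 1 - c for the smallest n; `clean_runs_needed` is a hypothetical helper.

```python
import math


def clean_runs_needed(fail_rate: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - fail_rate) ** n <= 1 - confidence.

    If the bug still fails with probability fail_rate per run, n clean runs in
    a row happen by chance with probability at most 1 - confidence.
    Assumes independent runs with a constant per-run failure rate.
    """
    if not 0.0 < fail_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fail_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


print(clean_runs_needed(0.1))  # 29
print(clean_runs_needed(0.5))  # 5
```

At a 1-in-10 failure rate that is 29 clean runs for 95% confidence; at the roughly 50% rate seen with MESA_DEBUG=flush, 5 clean runs would already be enough.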