Created attachment 37239
Uninitialised vertex 1
After popping several pills in Pill Popper, a few particular vertices intermittently (but often) appear to be uninitialised. Screenshot attached.
This works correctly on 965, 945, etc.
Created attachment 37240
Uninitialised vertex 2
Another example of a different vertex also going wrong.
Any smaller testcase you can come up with for this? I'm definitely interested in this, as we've got a few apps with weird vertex issues on Ironlake, but so far nothing's been isolated.
I can probably come up with a small test case using only Clutter. Would that help?
The closer we can get to a minimal GL testcase, the better. Right now I've got a testcase involving 4 different shaders (clear, shadow map generation, shadow mapped objects, shadow mapped ground plane), a static vertex buffer, and static inputs, and I get vertex flashing in the shadow map draw. I can also cut it down to shadow map generation and shadow map presentation on the screen and get the same results.
It does not appear to be a clock gating issue, as tested with the new intel_disable_clock_gating in intel-gpu-tools.
OK, here are some questions:
Does the bug still occur for you if you set vs_max_threads to 29 in brw_context.c?
Does the bug still occur for you if you set vs_max_threads to 30 in brw_context.c?
Does the bug still occur for you if you set single_program_flow in brw_vs_state.c?
Please obsessively watch for the bug during testing, though, as it seemed to get less frequent in my app as I approached "29" threads. (though I have my doubts as to the mapping of these values to numbers of threads -- how do you fit a number between 0 and 71 into a 6-bit field? They could have expanded the bitfield up or down, and in other cases in this generation they tended to choose "down".)
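The bitfield arithmetic behind that doubt can be checked with a few lines of C. This is only a sketch of the arithmetic: the helper names are made up for illustration, and nothing here says how the hardware actually encodes the thread count.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest unsigned value an n-bit field can hold (n < 32). */
unsigned field_max(unsigned bits)
{
    assert(bits < 32);
    return (1u << bits) - 1u;
}

/* True if `value` can be stored directly in an n-bit field. */
bool fits_in_field(unsigned value, unsigned bits)
{
    return value <= field_max(bits);
}

/* Smallest field width, in bits, that stores `value` directly.
 * Only meaningful for values below 2^31. */
unsigned bits_needed(unsigned value)
{
    unsigned bits = 1;
    while (!fits_in_field(value, bits))
        bits++;
    return bits;
}
```

A 6-bit field tops out at 63, so 71 cannot be stored directly; it needs 7 bits, or some scaled or offset encoding.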
Another note on what this bug isn't: it's not something in the clip shader. I can stuff a clip thread kill at the top of the shader and see the results in terms of things that should have been clipped disappearing, but the funny vertices still appear.
The Humus CelShading demo's flashing, on the other hand, has a very strong dropoff from flashing at 27 max_threads to no flashing at 26. It is also fixed by turning on SPF.
I get no flashing when I turn on single_program_flow, and I couldn't see any flashing when I set vs_max_threads to 26.
I noticed some at 27, though it is massively reduced. At 29 it is still very visible, but reduced from normal, and 30, again, seems to make the flashing more extreme.
I'll run with it at 26 for now and see if I notice anything through the day, but initial results are positive.
Just to report back, I think Eric Anholt said as much on IRC, but you do still see vertex popping at 26 - just much more rarely.
Is this the right forum to ask what exactly it is these variables control? (does SPF on basically mean vs_max_threads = 1?) Is this a memory corruption issue, or is this some magic variable that we haven't found the exact right value for yet?
We're pretty sure it's a URB space allocation parameters issue, where one vertex is stomping on data from another. Here's the basic story:
You've got this pipeline with a bunch of fixed function stages that may or may not call out to a shader. They need to be able to take in a block of data, operate on it, and hand off results to the next stage (which may be used by 0 to many threads in the next stage). To support that, we've got 64kb or whatever of fast RAM called the URB on the chip that's not visible to the CPU. We chop that up into areas owned by each stage. Each stage declares to the URB manager how big each chunk (entry) it's going to want is, and how many total entries it cares to have at a time. The URB manager can then respond when someone asks for an entry, and handles writes to entries identified by handle+offset, and it also handles the pipelining of allocations when the next rendering state uses a different set of fences, sizes, and counts.
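As a rough model of that bookkeeping, here is a minimal sketch in C. The struct, the function names, and the unit counts are all hypothetical; the real fences and sizes live in hardware state, not in code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One stage's request to the URB manager (hypothetical layout). */
struct urb_request {
    unsigned entry_size;  /* size of one entry, in allocation units */
    unsigned nr_entries;  /* how many entries the stage wants at a time */
};

/*
 * Lay the stages out back to back. start[i] receives the first unit owned
 * by stage i, and start[n] the end of the last region (so `start` needs
 * n + 1 slots). Returns false if the requests do not fit in `total_units`.
 */
bool urb_partition(const struct urb_request *req, size_t n,
                   unsigned total_units, unsigned *start)
{
    unsigned next = 0;
    for (size_t i = 0; i < n; i++) {
        start[i] = next;
        next += req[i].entry_size * req[i].nr_entries;
    }
    start[n] = next;
    return next <= total_units;
}

/* Unit addressed by handle+offset inside the region starting at `base`. */
unsigned urb_address(unsigned base, unsigned entry_size,
                     unsigned handle, unsigned offset)
{
    assert(offset < entry_size);  /* past this, the write lands in a neighbour */
    return base + handle * entry_size + offset;
}
```

In this model, if the writer and the allocator disagree about `entry_size`, two handles resolve to overlapping units, which is the kind of stomping suspected here.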
There are lots of factors of two hanging around to get us screwed up. A vec4 is half a register, is half of a URB row, is half of a URB allocation size increment (and on snb, the URB allocation size increment is doubled again). Plus things have headers in addition to the interesting data you actually care about, so there are off-by-small-integers and rounding errors to get us screwed up too.
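To show how the rounding and header effects interact, here is a small sketch in C. The only ratio taken from this thread is two vec4s per register; the unit size is a parameter, and the header sizes in the usage note are placeholders, not values from the hardware documentation.

```c
#include <assert.h>

/* A vec4 is half a register, so a register holds two vec4s. */
enum { VEC4S_PER_REGISTER = 2 };

/* Round `vec4s` up to whole units of `vec4s_per_unit` vec4s each. */
unsigned units_for_vec4s(unsigned vec4s, unsigned vec4s_per_unit)
{
    assert(vec4s_per_unit > 0);
    return (vec4s + vec4s_per_unit - 1) / vec4s_per_unit;
}

/* Entry size in units for a payload plus a header, both counted in vec4s. */
unsigned entry_units(unsigned payload_vec4s, unsigned header_vec4s,
                     unsigned vec4s_per_unit)
{
    return units_for_vec4s(payload_vec4s + header_vec4s, vec4s_per_unit);
}
```

With 4 vec4s per unit, 7 payload vec4s plus a 1-vec4 header take 2 units, but a 2-vec4 header pushes that to 3. Doubling the unit to 8 vec4s brings the same entry back to 2 units. One wrong header count or unit size is enough to change the answer.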
SPF mode is not "run a single program globally at a time", but "run one vertex through any particular VS thread at a time." Registers are 8 floats wide, and vertex shaders are AOS, so in non-SPF mode the hardware packs a vec4 attribute from each of two vertices side by side in one execution and you run the shading of the two verts in parallel.
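The packing can be pictured with a short sketch in C. The function names are made up, and which half of the register holds which vertex is an assumption for illustration only.

```c
#include <assert.h>
#include <string.h>

enum { REG_FLOATS = 8, VEC4_FLOATS = 4 };

/* Non-SPF layout (assumed order): vertex A's vec4 in the low half of the
 * register, vertex B's vec4 in the high half. */
void pack_pair(float reg[REG_FLOATS],
               const float a[VEC4_FLOATS], const float b[VEC4_FLOATS])
{
    memcpy(reg, a, VEC4_FLOATS * sizeof(float));
    memcpy(reg + VEC4_FLOATS, b, VEC4_FLOATS * sizeof(float));
}

/* Number of VS executions needed to shade `nr_vertices` vertices:
 * one vertex per execution with SPF, two without. */
unsigned vs_executions(unsigned nr_vertices, int single_program_flow)
{
    unsigned per_execution = single_program_flow ? 1u : 2u;
    assert(nr_vertices > 0);
    return (nr_vertices + per_execution - 1) / per_execution;
}
```

Under this model, 7 vertices take 4 executions without SPF and 7 with it, which is why turning SPF on would be expected to cost vertex throughput.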
New experiment: Take c->prog_data.urb_entry_size in brw_vs_emit.c after it's computed, and print it out. Now go increase that number until the problem stops. We found a sharp cutoff in CelShading, and I'm wondering if you find one too.
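One way to run that experiment without recompiling for every value is a small debug helper like the sketch below. The helper, its cap, and the idea of feeding it from an environment variable are all hypothetical; none of this exists in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { MAX_DEBUG_PAD = 64 };  /* arbitrary sanity cap for the experiment */

/*
 * Print the computed entry size and return it with an optional debug pad.
 * `pad_str` would typically come from getenv() on a hypothetical variable;
 * NULL, an unparsable string, or an out-of-range value means no padding.
 */
unsigned debug_pad_urb_entry_size(unsigned computed, const char *pad_str)
{
    unsigned long pad = 0;

    assert(computed > 0);
    if (pad_str != NULL) {
        char *end;
        pad = strtoul(pad_str, &end, 10);
        if (end == pad_str || *end != '\0' || pad > MAX_DEBUG_PAD)
            pad = 0;
    }
    fprintf(stderr, "VS urb_entry_size: computed %u, pad %lu\n", computed, pad);
    return computed + (unsigned)pad;
}
```

In the driver this could wrap the value right after it is computed, for example `c->prog_data.urb_entry_size = debug_pad_urb_entry_size(c->prog_data.urb_entry_size, getenv("VS_URB_PAD"));`, where `VS_URB_PAD` is a made-up variable name.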
The "double the VS URB entry size" hack doesn't work generally, either. For my demo program, with 4 programs with VUE sizes of 2 and 3 URB rows:
Increasing the VUE size by up to 4 doesn't fix the problem.
Increasing VUE size by 5 does fix the problem.
Increasing the VUE-size-2 programs by 5 fixes the original problem and introduces it in a different shader.
Increasing the VUE-size-3 programs by 5 doesn't fix the problem.
Single program flow does appear to be the common workaround for these 3 programs. If we get confirmation on gnome-shell, I'll just push the workaround.
Author: Eric Anholt <firstname.lastname@example.org>
Date: Mon Feb 21 23:46:52 2011 -0800
i965: Apply a workaround for the Ironlake "vertex flashing".
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
Tested by: Chris Lord <email@example.com> (Pill Popper)
Tested by: Ryan Lortie <firstname.lastname@example.org> (gnome-shell)
Thank you, Eric. This bug was plaguing us on the QM57. We were seeing it with any large vertex count, so drawing a hollow circle would trigger it every couple of seconds. Using only primitives with glBegin/glEnd, we saw it with lines/line_loop/line_strip. I think it was possible in polygons, but they are filled, so the bug would be masked.
Also, the Fedora 15 beta showed the bug if you had the GNOME desktop with an overlay warning (like a hard drive failure from pulling a USB hard drive). Same hardware.
Your patch fixed it for us.
*** Bug 32845 has been marked as a duplicate of this bug. ***