On my G71 system, several programs show vertex corruption. In particular, vertices tend to be corrupted or randomly go to infinity, leading to spiked triangles or random polygons. Affected programs include demos/engine, demos/dinoshade, Blender and Extreme Tux Racer.

The system is running:
Linux 2.6.33-rc2
libdrm 2.4.17
Mesa HEAD (b46bcd8e7b37aa2e9159e126c1cc88234a3c2790)
Detected an NV40 generation card (0x049800a2)
64 MB GART aperture
256 MB VRAM

The problem is solved by any one of the following:
1. #define FORCE_SWTNL 1
2. Adding usleep(10000) at the end of nv40_draw_arrays
3. Making nouveau_screen_bo_del do nothing

It seems that the issue is that Mesa deletes a buffer object used for vertex data while the GPU is still drawing from it. The kernel actually performs the deletion without waiting for the GPU to finish drawing, the memory (or GART mapping) is reused, and corruption ensues.

From Gallium tracing, Mesa is sending vertex data in 64 KB buffers, which are created, written, drawn and then recreated upon reuse (which seems to be correct behavior).

In other words, it seems that the kernel is not keeping an extra reference to buffers that are referenced by an in-flight pushbuffer and dropping that reference only once the GPU has finished drawing.

Is the kernel already supposed to do so?

If yes, something is broken. If things work for others, maybe my system is somehow more prone to reusing memory or GART mappings, so they don't see the problem?

If no, then how are things supposed to work?

(BTW, not freeing buffers leads to X freezing and the kernel oopsing on my machine once memory is saturated, but that's another issue.)
Upon further examination, the kernel does seem to have the required logic: sending a pushbuffer creates a fence, which is put in bo->sync_obj, which is then checked on deletion and if non-null, the buffer is put on a delayed destroy list.

However, it seems to be somehow not working.

Maybe fencing is broken on my card? (i.e. the kernel thinks fences are signaled when they aren't)
Or possibly fences are being signaled before the vertex shader is finished running?

How can I test that fencing is working correctly?
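To make the logic described above concrete, here is a minimal toy model in C. It is only a sketch of the behavior as I understand it, not the actual kernel or TTM code: every name in it (toy_fence, toy_bo, toy_submit, toy_bo_del, toy_process_delayed) is made up for illustration, and the "GPU" is just a flag that the caller flips by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the real kernel/TTM structures. */
struct toy_fence {
    bool signaled;               /* set once the "GPU" has finished */
};

struct toy_bo {
    struct toy_fence *sync_obj;  /* fence of the last pushbuffer using this bo, or NULL */
    bool freed;                  /* true once the backing memory may be reused */
    struct toy_bo *next_delayed; /* link in the delayed-destroy list */
};

static struct toy_bo *delayed_list = NULL;

/* Submitting a pushbuffer attaches its (unsignaled) fence to the buffer. */
static void toy_submit(struct toy_bo *bo, struct toy_fence *fence)
{
    fence->signaled = false;
    bo->sync_obj = fence;
}

/* Last reference dropped: free at once only if no fence is attached. */
static void toy_bo_del(struct toy_bo *bo)
{
    assert(!bo->freed);
    if (bo->sync_obj != NULL) {
        /* Possibly still in use by the GPU: defer the destruction. */
        bo->next_delayed = delayed_list;
        delayed_list = bo;
        return;
    }
    bo->freed = true;
}

/* Free every delayed buffer whose fence has signaled; return how many. */
static int toy_process_delayed(void)
{
    int freed = 0;
    struct toy_bo **link = &delayed_list;

    while (*link != NULL) {
        struct toy_bo *bo = *link;
        if (bo->sync_obj->signaled) {
            *link = bo->next_delayed;  /* unlink from the list */
            bo->next_delayed = NULL;
            bo->freed = true;
            freed++;
        } else {
            link = &bo->next_delayed;  /* still busy, keep it queued */
        }
    }
    return freed;
}
```

In this model a buffer that was submitted and then deleted stays allocated until its fence is marked signaled. If the fence were marked signaled too early, the buffer would be freed while still in use, which is the kind of failure I suspect on my card.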
(In reply to comment #1)
> Upon further examination, the kernel does seem to have the required logic:
> sending a pushbuffer creates a fence, which is put in bo->sync_obj, which is
> then checked on deletion and if non-null, the buffer is put on a delayed
> destroy list.
>
> However, it seems to be somehow not working.
>
> Maybe fencing is broken on my card? (i.e. the kernel thinks fences are signaled
> when they aren't)
> Or possibly fences are being signaled before the vertex shader is finished
> running?

That would be almost unprecedented... it's more likely that some caches in the GPU aren't being flushed often enough (or maybe the ones in the CPU... a bug in the kernel PAT code also used to cause the same symptoms, but that's hopefully already fixed).

I'm marking this as invalid because that's the current policy; unfortunately, we're already aware of too many gallium bugs.

>
> How can I test that fencing is working correctly?
>