This patch modifies r300_pacify's write to R300_RB3D_DSTCACHE_CTLSTAT to be more correct according to Vladimir's comment in r300_reg.h. Apparently we should write 0xa before 3D operations, and 0x2 afterwards. I am not entirely sure this is correct, but I tested it with glxgears and it seems to not cause any regressions. I'm not sure it actually helps anything other than technical correctness, so it some other people could test it and make sure I haven't broken anything, I'd appreciate it. It is possible that fglrx's behaviour has changed since Vladimir's comment, too.
Created attachment 9017 [details] [review] Proposed patch
Is this still relevant?
I'm always suspicious about changes in this area that are based on hearsay. Note that those comments in r300_reg.h are way older than AMD's spec releases; they were written in a time when we had zero knowledge as to what these registers really mean. From what I understand, 0xA (which is FREE|FLUSH, according to specs) corresponds to a cache invalidation. This is necessary if the destination buffer was changed by someone else since the last 3D operation, e.g. if the destination buffer has moved or if software fallbacks were run. On the other hand, 0x2 (which is a simple FLUSH) should be sufficient to make sure any writes done by the 3D operation are written out to memory. So when you think about it, this patch might fix something (I doubt it as long as I don't see concrete evidence for it), but it does so in a very hacky way that is completely unrelated to what's really going on. My conclusion: "not relevant" or "should be replaced by something else based on serious analysis of what we're doing". Then again, maybe I'm completely off base and Jerome Glisse knows more on this topic.
The real problem with lockup doesn't come from flushing but from bad command stream like the vertex uploading Nicolai fixed few weeks ago. We also need to put some more delay when moving buffer and properly waiting for idle. Another things that might help is to upload wptr of the ring every 32 dwords or some others power of 2 value as it seems the ring fetcher can go crazy for unknown reasons on some update. The new command stream ioctl is basically trying to address all this in one shot as after a try to fix things from kernel side proved me that this was a bad idea. So i set this bug as invalid, meanwhile i would like to make ddx fallback to software rendering on GPU lockup thus X should be responsive again even if dead slow.
Mass version move, cvs -> git
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.