| Summary: | Color3 and Color4 differences; Possible non-conformance to OpenGL specs | | |
|---|---|---|---|
| Product: | Mesa | Reporter: | Darius Scerbavicius <darius.scerb> |
| Component: | Mesa core | Assignee: | mesa-dev |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | high | | |
| Version: | 6.5 | | |
| Hardware: | x86 (IA32) | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
| Attachments: | simple testcase showing broken glcolor3/glcolor4 behaviour | | |
Description
Darius Scerbavicius
2006-09-24 02:39:13 UTC
(In reply to comment #0)

> I have a Radeon 9550 and I'm using the OSS radeon driver in Ubuntu Edgy. Mesa is
> 6.5.1~20060817-0ubuntu2. The driver works very well, except for this problem.
> In Doomsday (a 3D Doom/Heretic/Hexen source port with many enhancements), the sky
> looks like a "hall of mirrors". Screenshot: http://darka.liero.be/skyhom.jpg
> Apparently the OpenGL specification says:
> "The Color command has two major variants: Color3 and Color4. The four value
> versions set all four values. The three value versions set R, G, and B to the
> provided values; A is set to 1.0."
> It seems it's not exactly like that in Mesa.
>
> Please see this bug and its discussion:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1499272&group_id=74815&atid=542099

Since other cards are mentioned, I'd assume this happens with tcl_mode=0 (or 1) too?

I tend to think there is some problem when using both glColor3x and glColor4x within the same glBegin/glEnd pair (which Doomsday does). By the way, Doomsday shouldn't do this; I think it was a mistake for the OpenGL API to even allow things like that, since it causes all sorts of headaches for drivers without any benefit. If you start with a glColor3 call, the driver might assume (for optimization) that alpha is always 1.0 and try to assemble the subsequent glColor calls into vec3 values; if not all calls were glColor3, it then needs to go back and promote the old values to vec4. It works similarly the other way round, except there the vec3 values just need to be promoted to vec4. So if you want fast code, don't do that. But I digress; obviously this insanity is allowed by the API, and it's clearly a bug if it doesn't work. I can't quite see where the bug is, though, since the tnl module has code to deal with this. The upgrade to a larger vertex format in particular gets quite complicated, however (it should hit _tnl_fixup_vertex and then _tnl_wrap_upgrade_vertex).

<history-lesson>
Roland commented: "...there is some problem when using both glColor3x and glColor4x within the same glBegin/glEnd pair (which doomsdays does). Btw doomsday shouldn't do this, I think it's a mistake by the opengl api to even allow things like that..."

The reason the API allows that sort of thing is to minimize the amount of data transferred from CPU to GPU (memory cycles are slow compared to either CPU or GPU compute cycles) and to maximize concurrent processing by the CPU and GPU (the CPU decides what data to send; the GPU handles conversion, reformatting, and inserting default values).

All this made more sense before hardware was designed for D3D, which makes different assumptions about batching and data formats on the host than the original OpenGL spec.
</history-lesson>

Created attachment 7146 [details]
simple testcase showing broken glcolor3/glcolor4 behaviour
Ok, here's a simple testcase which shows the broken behaviour. Each of the
"quads" (which are actually tri-strips, to mirror Doomsday more closely) should
have two vertices fully opaque and two which are fully transparent (by the way,
this is a hacked-up fogcoord.c test; yeah, I'm lazy). The bug only seems to
manifest itself if you change the size more than once, and only when going from
glColor4 -> glColor3 -> glColor4 -> glColor3, which from a quick look is exactly
what Doomsday is doing.
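[Editor's note: the following is not the attached fogcoord.c-based program, just a minimal sketch of the call pattern the comment describes: mixing glColor4f and glColor3f inside one glBegin/glEnd pair, changing the color size more than once. It assumes a current legacy GL context with blending enabled; the function name and coordinates are illustrative only.]

```c
#include <GL/gl.h>

/* Draw one "quad" as a tri-strip with the glColor4 -> glColor3 ->
 * glColor4 -> glColor3 size changes described above.  Per the spec,
 * each glColor3f vertex should end up with alpha = 1.0 (opaque) and
 * each glColor4f vertex here is fully transparent. */
static void draw_quad_as_strip(float x, float y, float w, float h)
{
    glBegin(GL_TRIANGLE_STRIP);

    glColor4f(1.0f, 0.0f, 0.0f, 0.0f);   /* 4 components: transparent */
    glVertex2f(x, y);

    glColor3f(1.0f, 0.0f, 0.0f);         /* 3 components: alpha defaults to 1.0 */
    glVertex2f(x + w, y);

    glColor4f(1.0f, 0.0f, 0.0f, 0.0f);   /* back to 4 components */
    glVertex2f(x, y + h);

    glColor3f(1.0f, 0.0f, 0.0f);         /* second size change: where the bug shows */
    glVertex2f(x + w, y + h);

    glEnd();
}
```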
(In reply to comment #3)
> simple testcase showing broken glcolor3/glcolor4 behaviour

Forgot to mention: it fails for indirect rendering and for dri radeon with tcl_mode=0,1 (but works for tcl_mode=2,3, which is not unexpected).

(In reply to comment #2)
> The reason the API allows that sort of thing is to minimize the amount of data
> transferred from CPU to GPU (memory cycles are slow compared to either CPU or
> GPU compute cycles) and to maximize concurrent processing by the CPU and GPU
> (CPU decides what data to send; GPU handles conversion, reformatting, and
> inserting default values).
>
> All this made more sense before hardware was designed for D3D, which makes
> different assumptions about batching and data formats on the host than the
> original OpenGL spec.
> </history-lesson>

I understand it might have made sense originally, though only if you can actually send your GPU per-vertex format data along with the vertices. But in a world where, for performance-oriented apps, your vertex data has to live in VBOs, and immediate mode won't be fast no matter what you do (unless you manage to hack it up in the driver until it looks like it's arranged in vertex arrays), it puts a lot of burden on the driver. Anyway, the fact remains that while Doomsday shouldn't use that feature, it can expect the result to be correct; I just don't think it's reasonable to expect good performance out of it with today's hardware/drivers (though if the vertex count is low it probably doesn't actually matter).

Roland says: "I'll understand it might have made sense originally, though only if you can actually send your gpu per-vertex format data along with the vertices."

The format doesn't have to be specified explicitly. For example, you can assign a range of addresses to your command queue. Color3f data goes to one (or three consecutive) addresses and is interpreted in one way. Color4f data goes to a different address (or consecutive addresses) and is interpreted in a different way. After conversion and insertion of defaults, it all gets queued up in a common format. The hardware logic to do all this is decently small. If the data transfer is implemented with DMA, it needn't pollute the CPU's cache, either.

Fine-grained immediate mode is valuable for some apps (particularly those that do substantial work other than graphics, so their data structures need to be optimized for that other work rather than according to the whims of the graphics API). This was an advantage for OpenGL during the days when it was competing with PEX (which, like D3D, was more oriented toward processing primitives in big batches). Even games want to work in finer-grained mode if possible; that's what the "small batch problem" with DX9 is about.

So, I guess the bottom line is that I agree it's harder to get good performance out of the fine-grained API than the big-batch API, but I think it's probably still worth some effort.

Allen

The vertex building code in the tnl/ module is designed to handle this properly, and with reasonable performance. I'll take a look at the test program & see what's going on.

Turns out I'd spotted this in the i965 driver and had yet to bring the fix to the trunk.
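[Editor's note: as an illustration of the alternative alluded to above ("arranged in vertex arrays", data living in VBOs), here is a hedged sketch in which every color is stored at a fixed 4-component size, with alpha written explicitly where a glColor3f would have been used, so the driver never has to upgrade the vertex format mid-primitive. It assumes a GL 1.5+ context; the struct and function names are hypothetical and not from the bug or from Mesa.]

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <stddef.h>

struct vertex {
    float pos[2];
    float rgba[4];   /* alpha stored explicitly: 1.0 where glColor3f was used */
};

/* Upload four vertices of one "quad" (tri-strip) into a buffer object. */
static GLuint upload_quad(const struct vertex verts[4])
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, 4 * sizeof(struct vertex), verts, GL_STATIC_DRAW);
    return vbo;
}

/* Draw it with fixed-size vertex arrays: position vec2, color vec4. */
static void draw_quad(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(struct vertex),
                    (const void *)offsetof(struct vertex, pos));
    glColorPointer(4, GL_FLOAT, sizeof(struct vertex),
                   (const void *)offsetof(struct vertex, rgba));
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```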