| Summary: | Color3 and Color4 differences; Possible non-conformance to OpenGL specs | | |
|---|---|---|---|
| Product: | Mesa | Reporter: | Darius Scerbavicius <darius.scerb> |
| Component: | Mesa core | Assignee: | mesa-dev |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | high | | |
| Version: | 6.5 | | |
| Hardware: | x86 (IA32) | | |
| OS: | Linux (All) | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
| Attachments: | simple testcase showing broken glcolor3/glcolor4 behaviour | | |
Description
Darius Scerbavicius
2006-09-24 02:39:13 UTC
(In reply to comment #0)

> I have a Radeon 9550 and I'm using the OSS radeon driver in Ubuntu Edgy. Mesa is
> 6.5.1~20060817-0ubuntu2. The driver works very well, except for this problem.
> In Doomsday (a 3D Doom/Heretic/Hexen source port with many enhancements), the sky
> looks like a "hall of mirrors". Screenshot: http://darka.liero.be/skyhom.jpg
> Apparently the OpenGL specification says:
> "The Color command has two major variants: Color3 and Color4. The four value
> versions set all four values. The three value versions set R, G, and B to the
> provided values; A is set to 1.0."
> It seems it's not exactly like that in Mesa.
>
> Please see this bug and its discussion:
> http://sourceforge.net/tracker/index.php?func=detail&aid=1499272&group_id=74815&atid=542099

Since other cards are mentioned, I'd assume this happens with tcl_mode=0 (or 1) too?

I tend to think there is some problem when using both glColor3x and glColor4x within the same glBegin/glEnd pair (which Doomsday does). By the way, Doomsday shouldn't do this; I think it was a mistake for the OpenGL API to even allow things like that, since it causes all sorts of headaches for drivers without any benefit. If you start with a glColor3 call, the driver might assume (for optimization) that alpha is always 1.0 and try to assemble the subsequent glColor calls into vec3 values; if not all calls were glColor3, it then needs to go back and promote the old values to vec4. It works similarly the other way round, except there the vec3 values just need to be promoted to vec4. So if you want fast code, don't do that. But I digress; obviously this insanity is allowed by the API, and it's clearly a bug if it doesn't work. I can't quite see where the bug is, though, since the tnl module has code to deal with this. The upgrade to a larger vertex format in particular gets quite complicated, however (it should hit _tnl_fixup_vertex and then _tnl_wrap_upgrade_vertex).

<history-lesson>
Roland commented: "...there is some problem when using both glColor3x and glColor4x within the same glBegin/glEnd pair (which doomsdays does). Btw doomsday shouldn't do this, I think it's a mistake by the opengl api to even allow things like that..."

The reason the API allows that sort of thing is to minimize the amount of data transferred from CPU to GPU (memory cycles are slow compared to either CPU or GPU compute cycles) and to maximize concurrent processing by the CPU and GPU (the CPU decides what data to send; the GPU handles conversion, reformatting, and inserting default values).

All this made more sense before hardware was designed for D3D, which makes different assumptions about batching and data formats on the host than the original OpenGL spec.
</history-lesson>

Created attachment 7146 [details]
simple testcase showing broken glcolor3/glcolor4 behaviour
Ok, here's a simple testcase which shows the broken behaviour. Each of the
"quads" (which are actually tri-strips, to mirror Doomsday more closely) should
have two vertices fully opaque and two which are fully transparent (by the way,
this is a hacked-up fogcoord.c test; yeah, I'm lazy). The bug only seems to
manifest itself if you change the size more than once, and only when going from
glColor4 -> glColor3 -> glColor4 -> glColor3, which from a quick look is exactly
what Doomsday is doing.
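[Editor's note: the following is not the attached fogcoord.c-based program, just a minimal sketch of the call pattern the comment describes: mixing glColor4f and glColor3f inside one glBegin/glEnd pair, changing the color size more than once. It assumes a current legacy GL context with blending enabled; the function name and coordinates are illustrative only.]

```c
#include <GL/gl.h>

/* Draw one "quad" as a tri-strip with the glColor4 -> glColor3 ->
 * glColor4 -> glColor3 size changes described above.  Per the spec,
 * each glColor3f vertex should end up with alpha = 1.0 (opaque) and
 * each glColor4f vertex here is fully transparent. */
static void draw_quad_as_strip(float x, float y, float w, float h)
{
    glBegin(GL_TRIANGLE_STRIP);

    glColor4f(1.0f, 0.0f, 0.0f, 0.0f);   /* 4 components: transparent */
    glVertex2f(x, y);

    glColor3f(1.0f, 0.0f, 0.0f);         /* 3 components: alpha defaults to 1.0 */
    glVertex2f(x + w, y);

    glColor4f(1.0f, 0.0f, 0.0f, 0.0f);   /* back to 4 components */
    glVertex2f(x, y + h);

    glColor3f(1.0f, 0.0f, 0.0f);         /* second size change: where the bug shows */
    glVertex2f(x + w, y + h);

    glEnd();
}
```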
(In reply to comment #3)
> simple testcase showing broken glcolor3/glcolor4 behaviour

Forgot to mention: it fails for indirect rendering and for dri radeon with tcl_mode=0,1 (but works for tcl_mode=2,3, which is not unexpected).

(In reply to comment #2)
> The reason the API allows that sort of thing is to minimize the amount of data
> transferred from CPU to GPU (memory cycles are slow compared to either CPU or
> GPU compute cycles) and to maximize concurrent processing by the CPU and GPU
> (CPU decides what data to send; GPU handles conversion, reformatting, and
> inserting default values).
>
> All this made more sense before hardware was designed for D3D, which makes
> different assumptions about batching and data formats on the host than the
> original OpenGL spec.
> </history-lesson>

I understand it might have made sense originally, though only if you can actually send your GPU per-vertex format data along with the vertices. But in a world where, for performance-oriented apps, your vertex data has to live in VBOs, and immediate mode won't be fast no matter what you do (unless you manage to hack it up in the driver until it looks like it's arranged in vertex arrays), it puts a lot of burden on the driver. Anyway, the fact remains that while Doomsday shouldn't use that feature, it can expect the result to be correct; I just don't think it's reasonable to expect good performance out of it with today's hardware/drivers (though if the vertex count is low it probably doesn't actually matter).

Roland says: "I'll understand it might have made sense originally, though only if you can actually send your gpu per-vertex format data along with the vertices."

The format doesn't have to be specified explicitly. For example, you can assign a range of addresses to your command queue. Color3f data goes to one (or three consecutive) addresses and is interpreted in one way. Color4f data goes to a different address (or consecutive addresses) and is interpreted in a different way. After conversion and insertion of defaults, it all gets queued up in a common format. The hardware logic to do all this is decently small. If the data transfer is implemented with DMA, it needn't pollute the CPU's cache, either.

Fine-grained immediate mode is valuable for some apps (particularly those that do substantial work other than graphics, so their data structures need to be optimized for that other work rather than according to the whims of the graphics API). This was an advantage for OpenGL during the days when it was competing with PEX (which, like D3D, was more oriented toward processing primitives in big batches). Even games want to work in finer-grained mode if possible; that's what the "small batch problem" with DX9 is about.

So, I guess the bottom line is that I agree it's harder to get good performance out of the fine-grained API than the big-batch API, but I think it's probably still worth some effort.

Allen

The vertex building code in the tnl/ module is designed to handle this properly, and with reasonable performance. I'll take a look at the test program & see what's going on.

Turns out I'd spotted this in the i965 driver and had yet to bring the fix to the trunk.
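[Editor's note: as an illustration of the alternative alluded to above ("arranged in vertex arrays", data living in VBOs), here is a hedged sketch in which every color is stored at a fixed 4-component size, with alpha written explicitly where a glColor3f would have been used, so the driver never has to upgrade the vertex format mid-primitive. It assumes a GL 1.5+ context; the struct and function names are hypothetical and not from the bug or from Mesa.]

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <stddef.h>

struct vertex {
    float pos[2];
    float rgba[4];   /* alpha stored explicitly: 1.0 where glColor3f was used */
};

/* Upload four vertices of one "quad" (tri-strip) into a buffer object. */
static GLuint upload_quad(const struct vertex verts[4])
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, 4 * sizeof(struct vertex), verts, GL_STATIC_DRAW);
    return vbo;
}

/* Draw it with fixed-size vertex arrays: position vec2, color vec4. */
static void draw_quad(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(struct vertex),
                    (const void *)offsetof(struct vertex, pos));
    glColorPointer(4, GL_FLOAT, sizeof(struct vertex),
                   (const void *)offsetof(struct vertex, rgba));
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```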