The r200/rv250 chips natively support GL_ARB_point_parameters, but the driver currently doesn't. I'll attach a patch which fixes that. There are issues, though: the point sprite stuff doesn't work. Texture coordinates are generated, but they are obviously wrong. OK, that's not really a problem, since that's a different extension...

But there is a problem: point size attenuation is always active (when the attenuation factors are not 1/0/0), even if something other than points is drawn. For tcl mode, that means the point size calculation always takes place (not sure, but this might have a performance impact, hopefully nothing more severe), and for sw tcl it means the vertex gets bigger, as it needs to include the attenuated point size (and maybe the calculation there is done on the cpu too, even though it isn't really used). The i915/i830 drivers seem to ignore that problem, but I don't think this is a good solution. There doesn't seem to be an easy way to issue state changes based on the primitive being drawn, or maybe I just haven't found it yet. Ideas?

I also have a suspicious feeling it might break the r200PointsBitmap stuff. Additionally, there are LOTS of values in the vector state related to point sprites / point attenuation which I have no idea what they are good for.

pointblast seems to work OK, not yet tested with glean (as the point attenuation test is not yet in anonymous cvs...)
Created attachment 3509: patch to enable ARB_point_parameters (working) and ARB_point_sprite (broken)
This passes glean's pointAtten test, but only if the code is changed so it won't use large points (i.e. the point sprite primitive) when smooth points are enabled. Using 1.0-sized "smooth" points doesn't strike me as particularly better than simply ignoring the smooth bit in the case of a large point size / point attenuation, but it is apparently what the spec calls for. The smallest epsilon with which the driver still passes glean's pointAtten test is around 0.9 (i.e. it passes with 0.9 but not with 0.8, both for hw and sw tcl).
A smooth point with size 1.0 can still partially overlap multiple pixels on screen. A non-smooth point with size 1.0 will always occupy exactly 1 pixel. It's the same as with line and polygon antialiasing. I suspect that's the difference you're seeing.
Committed (a bit differently) to Mesa cvs. I didn't really solve the "problem", though, that the point size calculation is always done (if point attenuation is "enabled"), regardless of whether points are actually drawn. I'd suspect the performance hit in tcl mode is fairly small. Not sure about swtcl though; I think something should be done about submitting unneeded point sizes. Would something like this make sense? Going through all primitives might be a bit on the expensive side?

diff -u -r1.32 r200_swtcl.c
--- r200_swtcl.c	13 Oct 2006 22:10:05 -0000	1.32
+++ r200_swtcl.c	14 Oct 2006 00:44:59 -0000
@@ -118,8 +118,22 @@
    }
    if (RENDERINPUTS_TEST( index_bitset, _TNL_ATTRIB_POINTSIZE )) {
-      EMIT_ATTR( _TNL_ATTRIB_POINTSIZE, EMIT_1F, R200_VTX_POINT_SIZE );
-      offset += 1;
+      int needptsz = 0;
+      if ((ctx->Polygon.FrontMode == GL_POINT) || (ctx->Polygon.BackMode == GL_POINT))
+         needptsz = 1;
+      else {
+         int i;
+         for (i = 0 ; i < VB->PrimitiveCount ; i++) {
+            if ((VB->Primitive[i].mode & PRIM_MODE_MASK) == GL_POINTS) {
+               needptsz = 1;
+               break;
+            }
+         }
+      }
+      if (needptsz) {
+         EMIT_ATTR( _TNL_ATTRIB_POINTSIZE, EMIT_1F, R200_VTX_POINT_SIZE );
+         offset += 1;
+      }
    }
    rmesa->swtcl.coloroffset = offset;

Note, though, that the actual (unneeded) point size calculation would still take place; the sizes would just not be submitted to the hardware, I think.
The potential hitch is that you need to make sure this test is reevaluated for all primitives drawn. Typically the drivers look at the Mesa state, set hardware state appropriately, and then try to avoid re-checking again and again unless there are actual Mesa state changes. Unfortunately you now need to do that, because you are basing hardware state not just on Mesa state but on what primitives are coming down the pipe, which Mesa doesn't (yet) consider a state change.

At a certain level, if the app requests this state, it's fair enough that it pays for computing the values. Unless you find a real instance where this is a problem - i.e. an app that asks for this state, then doesn't render points *and* becomes noticeably slower as a result - I would consider it not worth optimizing for.
(In reply to comment #5)
> At a certain level, if the app requests this state, it's fair enough that it
> pays for computing the values. Unless you find a real instance where this is a
> problem - i.e. an app that asks for this state, then doesn't render points
> *and* becomes noticeably slower as a result - I would consider it not worth
> optimizing for.

Fair enough. The theoretical maximum performance hit would probably be 25% (which is the vertex size increase caused by submitting the point size, since the minimum otherwise is 3 floats for position and 1 float (4UB) for color0; I'd think the point size calculation itself shouldn't be that expensive), though in practice it should stay well below that. And the additional tests needed in tcl mode for ARB_point_sprite (to disable perspective correction) didn't seem to have a visible performance impact either, though I wonder if that's the reason ati does not support that extension on r200 with any of its drivers (even those for other operating systems).
Mass version move, cvs -> git