Bug 12240 - Regression, maniadrive very slow (rast fallback?) and wrongly rendered
Regression, maniadrive very slow (rast fallback?) and wrongly rendered
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r300
git
x86-64 (AMD64) All
: medium normal
Assigned To: Default DRI bug account
http://maniadrive.raydium.org/
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-08-31 14:16 UTC by Hans de Goede
Modified: 2009-08-24 12:27 UTC (History)
2 users (show)

See Also:


Attachments
PATCH fixing maniadrive being slow with r300 and current mesa (2.27 KB, patch)
2007-12-28 13:21 UTC, Hans de Goede
Details | Splinter Review
PATCH fixing off by one error in tempreg check in rx00 vertprog code (1.96 KB, patch)
2007-12-28 14:09 UTC, Hans de Goede
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Hans de Goede 2007-08-31 14:16:38 UTC
Hi,

I'm using a radeon 9800 pro, x86_64, Fedora 8 test 1. Fedora 8 test1 comes with Mesa-7.0.1 (I choose CVS as version as 7.0.1 isn't in bugzilla yet). I'm a Fedora developer and currently I'm working on packaging maniadrive:
http://maniadrive.raydium.org/

If I downgrade mesa to 6.5.2 (from F-7 updates) and disable shadows maniadrive works well (with shadows it becomes very slow).

But with 7.0.1 it is slow even with shadows disabled, I do get this message: *********************************WARN_ONCE*********************************
File r300_vertprog.c function r300TranslateVertexShader line 1130
Ran out of temps, num temps 13, us 12
***************************************************************************

So I guess the recent r300 vertex prog rewrite is the culprit here.


And after this Fedora update:

* Thu Aug 16 2007 Dave Airlie <airlied@redhat.com> - 7.0.1-3
- mesa-7.0.1-stable-branch.patch - Add patches from stable branch
  includes support for some Intel chipsets
- mesa-7.0-use_master-r300.patch - Add r300 driver from master

It stayed slow but also become wrongly rendered (all textures seem to be missing)

As said I'm a developer / programmer myself, I have almsot no experience with 3d hardware programming, but I'm veru skilled in C and debugging in general. I would really like to see this fixed, so please let me know what I can do to help.


p.s.

I'm afraid maniadrive is a pain to install from sources, let me know if you need help with that, if you use Fedora I can provide rpms if you want.
Comment 1 Guillaume Martres 2007-09-03 01:22:43 UTC
Same problem here with mesa 7.0.1 on Ubuntu 7.10 with ATI drivers 6.7.192.
Comment 2 Guillaume Martres 2007-10-07 08:42:03 UTC
Problem still present with ati driver 6.7.195 .
Comment 3 Hans de Goede 2007-12-28 13:19:45 UTC
Okay,

I've just spend 1 1/2 day debugging this! And I've got both issues fixed.

To be clear the 2 issues I'm talking about are:
1) Wrong rendering with current git (and with Fedora's packages because those
   have the r300 part updated to a recent git snapshot for rs480 support).

2) Maniadrive being slow with mesa >= 7.0 (official release, clean build).


And now the fixes:


1) The wrong rendering is caused by the second chunk of this commit:
http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=commitdiff;h=d7777f45981bf91d2cb31502ba078312b1b4cf13

If you revert the second chunk of this commit, then everything gets properly rendered.


2) The slowness problem is caused by tnl/t_vp_build.c generating vertex programs which use 14 temp variables, and the r300 has exactly 14 temp registers (according to the current driver).

This becomes a problem because:
a) Certain ARB vertex instructions are emulated on the r300, and the emulation
   code needs a temp register too.
b) Even those vertex programs which do not use the troublesome ARB vertex
   instructions will not work because there is an of by one bug in the check if   
   there are enough registers.

I have a patch which fixes the t_vp_build.c code to not require more temp registers then it actually needs, by allocating them only when needed and freeing them asap. It takes 2 of the number of temp registers needed by the
generated code in the case of maniadrive, making maniadrive work even with the
off by one bug still present in the r300 code. I'll attach this patch right away.

I'll write and attach a patch for the off by one bug next.

p.s.

Oliver I've added you to the CC as we have had some mail exchanges about the slowness bug in the past, I wrongly blamed your vertprog rewrite for it :)
Comment 4 Hans de Goede 2007-12-28 13:21:14 UTC
Created attachment 13399 [details] [review]
PATCH fixing maniadrive being slow with r300 and current mesa
Comment 5 Hans de Goede 2007-12-28 13:24:41 UTC
About the just attached patch, the removal of the 2 "release_temp(p, shininess);" calls after 2 "shininess = get_material(...)" calls is intentional, because get_material() does not return a temp register ever (I did a full audit of all temp allocations in t_vp_build.c), if it does then there would be quite a few places in t_vp_build.c missing release_temp() calls after get_material() calls.
Comment 6 Hans de Goede 2007-12-28 14:09:03 UTC
Created attachment 13401 [details] [review]
PATCH fixing off by one error in tempreg check in rx00 vertprog code

This patch makes a slightly bigger change then necessary to fix the bug in order to make it clearer to the reader of the code what is happening to avoid a similar error creeping in in the future.
Comment 7 Roland Scheidegger 2007-12-28 16:51:38 UTC
(In reply to comment #3)
> I have a patch which fixes the t_vp_build.c code to not require more temp
> registers then it actually needs, by allocating them only when needed and
> freeing them asap. It takes 2 of the number of temp registers needed by the
> generated code in the case of maniadrive, making maniadrive work even with the
> off by one bug still present in the r300 code. I'll attach this patch right
> away.

Looks good to me. Ultimately, a better solution would be to have a driver backend which optimizes temp usage, but avoiding unnecessary temps in the first place can't hurt (though might make the code slightly more complicated to read). I think there are more temps which could be avoided in some places. For instance those ambient/diffuse/specular possible temps are only used one after another, though changing that will make the code a bit harder to read and scheduling of the instructions less optimal (if using a backend which doesn't reschedule things). build_sphere_texgen seems to use a unnecessary inv_m temp (could just use tmp instead).
Oh, and if you have a r4xx card you should be able to use more temps, since for PS2.0 the limit was 12, but for PS2.0b it was 32, and I don't think it was upped without having ati hw support for it. Dunno if those are addressed the same or need some special treatment (like requiring different reg class) however.
Comment 8 Hans de Goede 2007-12-28 23:51:01 UTC
(In reply to comment #7)
> Looks good to me. Ultimately, a better solution would be to have a driver
> backend which optimizes temp usage, but avoiding unnecessary temps in the first
> place can't hurt (though might make the code slightly more complicated to
> read).

Agreed.

> I think there are more temps which could be avoided in some places. For
> instance those ambient/diffuse/specular possible temps are only used one after
> another, though changing that will make the code a bit harder to read and
> scheduling of the instructions less optimal (if using a backend which doesn't
> reschedule things).

Well if that leads to less then optimal code, I guess we should leave it as is. Esp since in the maniadrive code, the highest temp usage was reached in the
else of the if (p->state->unit[i].light_eyepos3_is_zero) check at the top of the light loop, it used to cobble 6 of the available 14 temps there with maniadrive, and thats not even the worst case scenario, worst case it would have taken 9, and with my patch still 7 :(

> build_sphere_texgen seems to use a unnecessary inv_m temp
> (could just use tmp instead).

I'll leave fixing that up to you, I'm not into vertex programs at all, I just used my (well developed) common sense to come up with the current fix.

> Oh, and if you have a r4xx card you should be able to use more temps, since for
> PS2.0 the limit was 12, but for PS2.0b it was 32, and I don't think it was
> upped without having ati hw support for it. Dunno if those are addressed the
> same or need some special treatment (like requiring different reg class)
> however.
> 

I don't have an r4xx, I have an r3xx, but it would indeed be nice to support those additional temps! However I wouldn't know where to start.

Any chance you could also take a look at my off by one fix for the temps usage check in rx00_vertprog.c? And a look at the reverting of commit:
http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=commitdiff;h=d7777f45981bf91d2cb31502ba078312b1b4cf13
To fix the wrong rendering of maniadrive?

Comment 9 Michel Dänzer 2008-07-28 02:25:24 UTC
maniadrive works nice and fast here with current Mesa Git. Please reopen if you're still seeing any of the issues.
Comment 10 Hans de Goede 2008-07-28 02:29:44 UTC
Erm,

Yes it does now a days, thanks to a patch I've written which found its way into Mesa through Airlied, sorry for not closing this myself.
Comment 11 ajax at nwnk dot net 2009-08-24 12:27:56 UTC
Mass version move, cvs -> git