Bug 11594

Summary: Assertion fails in r300_vertprog.c:438: r300TranslateVertexShader
Product: Mesa
Reporter: WolfgangKoebler <wk-list>
Component: Drivers/DRI/r300
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: crash backtrace (in gdb)
more verbose crash log (with mesa debug build + DEBUG_VP)
Patch that should make vertex programs with no pos work again.
Fixes a problem with the last patch. This should be better.

Description WolfgangKoebler 2007-07-13 14:46:41 UTC
When executing Planeshift (3.019) I got an assertion failure (AFAIR) just after joining the game (it seems reproducible, but I can't test now as the server is down). The assertion fails in r300_vertprog.c:438: r300TranslateVertexShader in Mesa CVS 2007-07-13 (ati r300 dri).

psc: r300_vertprog.c:438: r300TranslateVertexShader: Assertion `vp->key.OutputsWritten & (1 << 0)' failed.

I ran it in gdb and did a backtrace (see attachment).

I am using a self-compiled Mesa CVS (date 2007-07-13) on Debian Etch with Radeon X300SE (ATI Technologies Inc RV370 5B60).
Comment 1 WolfgangKoebler 2007-07-13 14:47:36 UTC
Created attachment 10716 [details]
crash backtrace (in gdb)
Comment 2 WolfgangKoebler 2007-07-13 16:32:16 UTC
Hint: I just verified that the bug is easily reproducible. The assertion fails as soon as the first 3D data should be displayed. (However, the backtraces differ from time to time.)
Comment 3 WolfgangKoebler 2007-07-14 02:19:57 UTC
I removed the assertion from the code, compiled and installed Mesa, and ran the game. The crash was gone, but the rendering was seriously broken (3D display mainly black). However, I already had such broken rendering with earlier Mesa versions. There I could avoid it by configuring the game via a file called gldrivers.xml. So I changed this file (see below) and got 'normal' rendering again (there are always a few glitches, but not too many). Then I re-enabled the assertion (+compile+install) and ran the game => no crash. I changed the config back again => crash again (assertion failed).

The relevant part of gldrivers.xml is (changed by me to match Mesa 7.x):

    <!--
      * 2006-12-29: S3TC support, texture compression, and AFP seems broken
                    across many video drivers with Mesa 6.5.1 & 6.5.2
      -->
    <rule description="Disable TC and AFP for Mesa 7.x">
      <conditions>
        <regexp string="renderer" pattern="Mesa DRI.*" />
        <regexp string="glversion" pattern=".\.. Mesa 7\.." />
      </conditions>
      <applicable>
        <usecfg>disableTC</usecfg>
        <usecfg>noafp</usecfg>
      </applicable>
    </rule>
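
To make the "glversion" condition in this rule concrete, here is a minimal sketch of how its pattern behaves. It assumes an unanchored regular-expression search (the game's actual matching mode is not stated here), and the version strings are hypothetical examples, not values taken from the logs.

```python
import re

# Pattern copied from the rule's "glversion" condition.
# The unescaped dots match any single character; "\." matches a literal dot.
GLVERSION_PATTERN = re.compile(r".\.. Mesa 7\..")


def rule_applies(glversion: str) -> bool:
    """Return True if the version string satisfies the rule's pattern.

    Assumption: the pattern is applied as an unanchored search.
    """
    return GLVERSION_PATTERN.search(glversion) is not None


# Hypothetical GL version strings, for illustration only.
for candidate in ("1.3 Mesa 7.0", "1.3 Mesa 6.5.2"):
    print(candidate, "->", rule_applies(candidate))
```

Under that assumption the rule would apply to any 7.x version string and would not apply to 6.5.x strings, which is why the description mentions Mesa 7.x.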

Without Disabled TC and AFP:
- Mesa 6.5.x: rendering is seriously broken (3d display mainly black)
- Mesa 7.0: rendering almost works (except for a few strange effects: rainbow colors seem to overlay textures and flicker)
- Mesa CVS 2007-07-13: assertion fails
Then I tried Mesa 7.0 with "Disabled TC and AFP" and got 'normal' rendering (no rainbows). So:
With Disabled TC and AFP:
- Mesa 6.5.x: normal rendering
- Mesa 7.0: normal rendering
- Mesa CVS 2007-07-13: normal rendering

So the assertion is probably triggered by TC or AFP.

Hope this helps.
Comment 4 Roland Scheidegger 2007-07-14 03:41:05 UTC
This can't possibly be triggered by s3tc (which, btw, to my knowledge isn't broken in "many" drivers, just one, among those which support it, and if not the app really should detect that automatically).
Output 0 is the position, it doesn't really make sense if it's not present, so I'm not surprised you get bogus rendering without it...
You'd need to figure out why it's 0 in the first place, presumably "afp" refers to arb_fragment_program, but I've no idea what went wrong. You could try printing out the shaders as received by mesa etc.
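
For readers unfamiliar with the failing expression `vp->key.OutputsWritten & (1 << 0)`, the following is a minimal sketch of the same bit test. It is not driver code: the function name and the example masks are hypothetical, and the only fact taken from the thread is that output 0 is the position.

```python
POSITION_BIT = 1 << 0  # output 0 is the position (see the comment above)


def writes_position(outputs_written: int) -> bool:
    """Mirror of the asserted condition: is bit 0 set in the outputs mask?"""
    return (outputs_written & POSITION_BIT) != 0


# Hypothetical masks: one program writing outputs 1 and 2 only,
# and one that additionally writes output 0.
without_position = (1 << 1) | (1 << 2)
with_position = without_position | POSITION_BIT

print(writes_position(without_position))
print(writes_position(with_position))
```

The assertion in the report fires exactly when this test is false for the vertex program being translated.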
Comment 5 WolfgangKoebler 2007-07-14 09:23:16 UTC
First, thanks for your (quick) response.

Then, maybe I should add a hint about the gldrivers.xml fragment:
Planeshift ships a file called gldrivers.xml, but the fragment is not from Planeshift itself. The quoted fragment is from somewhere on the internet (I do not recall where; AFAIR the link is dead by now). I used it because it helped, but I cannot tell anything beyond that. The comment about "broken across many video drivers" is not my opinion, but just a copy and paste from where I got it.

Third: Sorry, I do not even know what S3TC or AFP are. I also have no idea what the code does or should do. I just found an assertion and reported it. So what should I do now (if I can be of any help with my limited knowledge)? ;-)
The problem is that vp=0 ? And I should find out why it is 0 ? Did I get that right ? And what should I print ?
Comment 6 Roland Scheidegger 2007-07-14 14:34:21 UTC
(In reply to comment #5)
> but I can not tell anything beyond that. The comment about "broken across many
> video" drivers is not my opinion, but just a copy and paste from where I got
> it.
Whoever wrote it, he's probably wrong :-)

> Third: Sorry, I do not even know what S3TC or AFP are. I also have no Idea what
> the code does or should do. I just found an assertion and reported it. So what
> should I do now (if I can be of any help with my limited knowledge) ? ;-)
> The problem is that vp=0 ? And I should find out why it is 0 ? Did I get that
> right ? And what should I print ?
If you have no idea about mesa, it may not be easy to debug this. A start would be to use debug builds (sometimes errors in an application can trigger bugs in the driver, and this will print out those errors rather than just passing them back to the app which may silently ignore it) or enable the DEBUG_VP code to print out the vertex program. Or you can wait for somebody else who has the hardware to look at it...

Comment 7 WolfgangKoebler 2007-07-15 02:05:56 UTC
Created attachment 10735 [details]
more verbose crash log (with mesa debug build + DEBUG_VP)

(In reply to comment #6)

Here you find a more verbose crash log, with a mesa debug build + DEBUG_VP.
Have fun.
Comment 8 Roland Scheidegger 2007-07-15 04:42:39 UTC
(In reply to comment #7)
> Created an attachment (id=10735) [details]
> more verbose crash log (with mesa debug build + DEBUG_VP)
> 
> (In reply to comment #6)
> 
> Here you find a more verbose crash log, with a mesa debug build + DEBUG_VP.
> Have fun.
So the vertex program indeed doesn't write output 0. I suspect though it is a position_invariant vertex program (the driver shouldn't crash in any case, as vertex programs not writing to position are indeed allowed even though results are undefined), and the driver doesn't handle that correctly. It used to AFAIK, but I'd guess it probably got broken by the recent changes in the vp output assignment patches.
Comment 9 Tommy Schultz Lassen 2007-07-15 12:49:34 UTC
Created attachment 10743 [details] [review]
Patch that should make vertex programs with no pos work again.
Comment 10 Rune Petersen 2007-07-15 13:47:38 UTC
Tommy, 

your patch is flawed, vp is set ~5 lines later...

(disclaimer: it's late)
Comment 11 Tommy Schultz Lassen 2007-07-15 21:53:58 UTC
Created attachment 10746 [details] [review]
Fixes a problem with the last patch. This should be better.
Comment 12 Roland Scheidegger 2007-07-16 01:53:58 UTC
(In reply to comment #11)
> Created an attachment (id=10746) [details]
> Fixes a problem with the last patch. This should be better.
This looks like it should fix the problem. The if should probably be dropped though, less code & much faster :-).
Comment 13 Oliver McFadden 2007-07-16 04:43:22 UTC
I've fixed this in Git. I tested with arbvptorus and it's a position invariant bug, so we shouldn't always set the POS bit, we should only set it when the program is position invariant.  This means the position invariant will work correctly, but the user will still get an assertion if they create a non-position invariant program that does not write result.position.  I believe this is the correct behaviour, as (iirc) result.position must be written for the program to be valid.  Please report to confirm it's fixed and I'll close the bug. 
Comment 14 Roland Scheidegger 2007-07-16 05:24:02 UTC
(In reply to comment #13)
> This means the position invariant will work
> correctly, but the user will still get an assertion if they create a
> non-position invariant program that does not write result.position.  I believe
> this is the correct behaviour, as (iirc) result.position must be written for
> the program to be valid.
No, this is not true. Programs are indeed valid if they don't write position (and even if they were not, abort would not be an option). The spec explicitly mentions this, though results are undefined. So either drop to swrast if the driver can't handle it at all, or just write that position and be done with it if it works well enough. As results are undefined, it just must not lock up; it doesn't really matter how the results turn out...
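
The behaviour argued for here can be sketched roughly as follows. This is a hedged illustration, not the actual r300 code: `VertexProgram`, `effective_outputs` and `choose_path` are hypothetical names, and the "software" path simply stands in for a swrast fallback.

```python
from dataclasses import dataclass

POSITION_BIT = 1 << 0  # output 0 is the position


@dataclass
class VertexProgram:
    """Hypothetical stand-in for the driver's vertex program state."""
    outputs_written: int
    is_position_invariant: bool = False


def effective_outputs(vp: VertexProgram) -> int:
    """Outputs mask after accounting for position-invariant programs.

    A position-invariant program does not write the position itself,
    so the position bit is added on its behalf.
    """
    mask = vp.outputs_written
    if vp.is_position_invariant:
        mask |= POSITION_BIT
    return mask


def choose_path(vp: VertexProgram) -> str:
    """Pick a rendering path without ever aborting the application."""
    if effective_outputs(vp) & POSITION_BIT:
        return "hardware"
    # The program is still valid; results are undefined, so fall back
    # instead of asserting.
    return "software"


print(choose_path(VertexProgram(0b10010, is_position_invariant=True)))
print(choose_path(VertexProgram(0b10010)))
```

The other option mentioned above, always writing the position and staying on the hardware path, would amount to setting the position bit unconditionally in `effective_outputs`.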
Comment 15 WolfgangKoebler 2007-07-17 04:09:59 UTC
Thanks for your work.

(In reply to comment #11)
Ok, new test with git 2007-07-17 (debug build and DEBUG_VP). The Assertion is gone in my test case.

However
- rendering is still slightly broken (still rainbow effects, see below).
- (and this changes nothing for comment #14.)

(supplement to comment #3)
Without Disabled TC and AFP:
- Mesa 6.5.x: rendering is seriously broken (3d display mainly black)
- Mesa 7.0: rendering almost works (except for a few strange effects: rainbow
colors seem to overlay textures and flicker)
- Mesa git 2007-07-13: assertion fails
- Mesa git 2007-07-17: just as Mesa 7.0: rendering almost works (except for a few strange effects: rainbow colors seem to overlay textures and flicker)

With Disabled TC and AFP:
- Mesa 6.5.x: normal rendering
- Mesa 7.0: normal rendering
- Mesa git 2007-07-13: normal rendering
- Mesa git 2007-07-17: normal rendering

(Of course Mesa CVS is nonsense by now, should be Mesa git)

BTW: Should I do anything about the rendering problems
(post logfile, provide screenshots, file another bugreport) ? 
Because I just stumbled over this:

Mesa: User error: GL_INVALID_ENUM in glActiveTexture(texture)
____________Vertex program 4 __________
# Vertex Program/Shader
  0: MOV OUTPUT[1], INPUT[3];
  1: MOV OUTPUT[4].xy, INPUT[17];
  2: END
*********************************WARN_ONCE*********************************
File r300_texstate.c function r300SetTexImages line 227
DXT 3/5 suffers from multitexturing problems!
***************************************************************************
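
As a rough aid for reading dumps like the one above, the sketch below collects the destination outputs of each instruction into a bitmask and checks whether output 0 (the position) is among them. The listing format and the regular expression are assumptions based only on the three lines shown; this is not Mesa code.

```python
import re

# Listing copied from the DEBUG_VP output above.
LISTING = """\
  0: MOV OUTPUT[1], INPUT[3];
  1: MOV OUTPUT[4].xy, INPUT[17];
  2: END
"""

# Assumed line shape: "<index>: <OPCODE> OUTPUT[<n>]..." with the
# destination register as the first operand.
DEST_RE = re.compile(r"^\s*\d+:\s+[A-Z0-9_]+\s+OUTPUT\[(\d+)\]")


def outputs_written(listing: str) -> int:
    """Build a bitmask with bit n set for every OUTPUT[n] destination."""
    mask = 0
    for line in listing.splitlines():
        match = DEST_RE.match(line)
        if match:
            mask |= 1 << int(match.group(1))
    return mask


mask = outputs_written(LISTING)
print(bin(mask), "writes position:", bool(mask & 1))
```

For this listing only outputs 1 and 4 are written, so bit 0 is clear.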
Comment 16 WolfgangKoebler 2007-07-17 04:14:15 UTC
erm, of course in comment #15 it should be
(In reply to comment #13)
instead of
(In reply to comment #11)
Comment 17 Maciej Cencora 2009-04-16 10:42:33 UTC
Can you reproduce the bug with mesa 7.4 or master?
Comment 18 WolfgangKoebler 2009-04-16 15:31:52 UTC
Ok, I started PlaneShift again. This time it's PlaneShift 0.4.03 with Mesa 7.4 and Debian Lenny.
I sometimes get crashes that are not related to Mesa.
I did not see a failed assertion. I did not see any rainbow effects.

With the modified gldrivers.xml rendering mostly works. Yet sometimes textures seem to be missing/black and I get hangups with almost 100% CPU usage somewhere in the kernel (top says 9x% sys).
With the original gldrivers.xml hardly anything is displayed in 3D.

So if your question is if I can reproduce the assertion with Mesa 7.4, the answer is "no".
Comment 19 Maciej Cencora 2009-04-16 16:25:41 UTC
(In reply to comment #18)

Good, could you try current mesa master? There're a few fixes (e.g. for fog coords, Wpos in fragment program) that may improve rendering for you.
Comment 20 WolfgangKoebler 2009-04-18 03:31:10 UTC
I upgraded Mesa and libdrm to git (2009-04-18) (and PlaneShift to 0.4.03 + updater 2009-04-17).

With the original gldrivers.xml I did not notice any changes.
With the modified gldrivers.xml hangups (with ~100% CPU in Kernel) seem to be easier to reproduce. I did not notice any changes in rendering, but I did not get past level loading any more. Before level loading there is only a little 3D, but enough to see the major difference between the original and the modified gldrivers.xml .
Comment 21 WolfgangKoebler 2009-04-18 04:51:27 UTC
I thought I would be through with testing for now, but I still tried to run it with RADEON_DEBUG=all. Funnily now I got past level loading and the (slightly later) hangup now happens with 0% CPU usage. Never mind.

I could kill psclient.bin and, as I run it inside gdb, I got a backtrace. It starts with the functions __kernel_vsyscall, ioctl, drmIoctl, drmCommandWrite, radeonWaitIrq, r300UploadTexImages, r300UpdateTextureState, r300UpdateShaderStates, r300RunRender, _tnl_run_pipeline .
The last RADEON_DEBUG=all outputs are repeated r300UploadSubImage, then:
> r300UploadTexImages: Syncing
> r300Flush
So it appears that there is a r300 hangup after uploading some textures.

This reminds me of a very similar behaviour in another game (tremulous). There I got almost the same RADEON_DEBUG=all outputs and then a "Error: R300 timed out... exiting". This happened only while loading some of the levels and I could avoid it by reducing texture detail settings.

The slightly different symptoms may be due to the slightly different dri settings I used. Using driconf I had changed "Method to limit rendering latency" from default "Let the hardware emit a software interrupt and sleep" to AFAIR "Sleep for brief intervals while waiting for the graphics hardware". With the former method I still got the hangups, but I did not get the error message "Error: R300 timed out... exiting", but just a hangup.

Another relevant setting probably is 
> Option          "GARTSize"              "64"
in /etc/X11/xorg.conf .

However, it appears to me that there is a problem with uploading textures and r300 timeouts, maybe in out-of-video-memory conditions. If this info is useful and you need further info, tell me.

(The second issue, that hardly anything is displayed in 3D with the original gldrivers.xml from PlaneShift may be completely unrelated to that.)

[I got the impression that we're drifting away from the original bugreport.]
Comment 22 Michel Dänzer 2009-04-18 09:58:54 UTC
(In reply to comment #21)
> [I got the impression that we're drifting away from the original bugreport.]

Indeed... if the texture upload hangs are with 32 bit processes running on a 64 bit kernel, they might be related to bug 10561.
Comment 23 WolfgangKoebler 2009-04-18 10:54:50 UTC
Linux kernel is 32 bit (debian lenny linux-image-2.6.26-2-686), processor is 64 bit (AMD Athlon(tm) 64 Processor 3200+).
Comment 24 Maciej Cencora 2009-04-18 12:20:21 UTC
I tried newest Planeshift on my rs690 and it seems to be working ok until it just exits with:

Sat Apr 18 21:13:41 2009, <src/client/psmovement.cpp:787 SetRunToPos SEVERE>
Sat Apr 18 21:13:41 2009, Failed to find mesh for SetRunToPos
DistributeLeafObjects failed: !leaf_replaced
  This node contains the following objects:
    0: 'effect_anchor_basic_34' (-1,-1,-1)-(1,1,1)
    1: 'effect_label_13' (nan,nan,nan)-(nan,nan,nan)
    2: 'effect_label_16' (nan,nan,nan)-(nan,nan,nan)
    3: 'effect_label_12' (nan,nan,nan)-(nan,nan,nan)
    4: 'effect_label_3' (nan,nan,nan)-(nan,nan,nan)
    5: 'effect_label_8' (nan,nan,nan)-(nan,nan,nan)
    6: 'effect_label_4' (nan,nan,nan)-(nan,nan,nan)
    7: 'effect_label_7' (nan,nan,nan)-(nan,nan,nan)
    8: 'effect_label_9' (nan,nan,nan)-(nan,nan,nan)
    9: 'effect_label_14' (nan,nan,nan)-(nan,nan,nan)
    10: 'effect_label_15' (nan,nan,nan)-(nan,nan,nan)
    11: 'effect_label_11' (nan,nan,nan)-(nan,nan,nan)
    12: 'effect_label_2' (nan,nan,nan)-(nan,nan,nan)
    13: 'effect_label_5' (nan,nan,nan)-(nan,nan,nan)

so it's a game error.
There're some minor rendering errors but generally it's ok.

Can you try reproducing it on radeon-rewrite Mesa branch and modesetting-gem libdrm branch?
Comment 25 WolfgangKoebler 2009-04-19 05:52:34 UTC
About the game error (!leaf_replaced): I know that one; afaics it happened less frequently after I ran the PlaneShift updater.

About your working configuration with rs690: You might have more video memory, which may be relevant as I assumed the bug is related to uploading Textures / Out-of-memory conditions. AFAICS I have 128MB.

About mesa/radeon-rewrite and drm/modesetting-gem:
With those drivers PlaneShift hangs much earlier, after opening the window and displaying the startup picture. The last debug output is:
> mtu=8
> Activating texture unit 0
> TX_ENABLE: 00000001  last_hw_tmu=0
> radeonEmitState
> Begin dirty state
> r300EmitAOS: nr=2, ofs=0x00000000
> r300Enable( GL_TEXTURE_RECTANGLE_ARB = GL_FALSE )
> radeonFlush 324
> radeonEmitState
> radeonReleaseDmaRegion 0xa260200
> r300Enable( GL_DEPTH_TEST = GL_FALSE )
> r300Enable( GL_STENCIL_TEST = GL_FALSE )
> radeonFlush 324
> radeonEmitState
> Begin dirty state
> radeonReleaseDmaRegion (nil)
Comment 26 Maciej Cencora 2009-04-19 06:02:21 UTC
(In reply to comment #25)

I really have no idea what's going wrong. I have 128MB too (rs690 is an integrated chip and uses system memory).

What's left is to rule out a hardware problem and to check whether it
also happens with the fglrx driver.
Comment 27 Adam Jackson 2009-08-24 12:27:20 UTC
Mass version move, cvs -> git
Comment 28 Maciej Cencora 2009-10-10 03:00:17 UTC
Since the original bug is fixed you should create new reports for the remaining problems. Closing.
