When running PlaneShift (3.019), I got an assertion failure (AFAIR) just after joining the game. It seems reproducible, but I can't test it right now because the server is down. The assertion fails in r300_vertprog.c:438, r300TranslateVertexShader, in Mesa CVS 2007-07-13 (ATI r300 DRI):

psc: r300_vertprog.c:438: r300TranslateVertexShader: Assertion `vp->key.OutputsWritten & (1 << 0)' failed.

I ran it in gdb and took a backtrace (see attachment). I am using a self-compiled Mesa CVS (2007-07-13) on Debian Etch with a Radeon X300SE (ATI Technologies Inc RV370 5B60).
Created attachment 10716: crash backtrace (in gdb)
Hint: I just verified that the bug is easily reproducible. The assertion fails as soon as the first 3D data is about to be displayed. (However, the backtraces differ from run to run.)
I removed the assertion from the code, compiled and installed Mesa, and ran the game. The crash was gone, but the rendering was seriously broken (the 3D display was mostly black). I had already seen such broken rendering with earlier Mesa versions, and there I could avoid it by configuring the game through a file called gldrivers.xml. So I changed this file (see below) and got 'normal' rendering again (there are always a few glitches, but not too many). Then I re-enabled the assertion (recompiled and reinstalled) and ran the game: no crash. I changed the config back, and it crashed again (assertion failed).

The relevant part of gldrivers.xml (changed by me to match Mesa 7.x) is:

<!--
  * 2006-12-29: S3TC support, texture compression, and AFP seems broken across many video drivers with Mesa 6.5.1 & 6.5.2
-->
<rule description="Disable TC and AFP for Mesa 7.x">
  <conditions>
    <regexp string="renderer" pattern="Mesa DRI.*" />
    <regexp string="glversion" pattern=".\.. Mesa 7\.." />
  </conditions>
  <applicable>
    <usecfg>disableTC</usecfg>
    <usecfg>noafp</usecfg>
  </applicable>
</rule>

Without disabling TC and AFP:
- Mesa 6.5.x: rendering is seriously broken (3D display mostly black)
- Mesa 7.0: rendering almost works (except for a few strange effects: rainbow colors seem to overlay textures and flicker)
- Mesa CVS 2007-07-13: assertion fails

Then I tried Mesa 7.0 with TC and AFP disabled and got 'normal' rendering (no rainbows). So, with TC and AFP disabled:
- Mesa 6.5.x: normal rendering
- Mesa 7.0: normal rendering
- Mesa CVS 2007-07-13: normal rendering

So the assertion is probably triggered by TC or AFP. Hope this helps.
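The glversion condition in the rule above is a regular expression, so it is easy to check which version strings it actually catches. A minimal sketch, assuming the game's config parser treats the pattern like a POSIX extended regex with an unanchored search (an assumption; its own regexp engine may behave differently). The helper name glversion_rule_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `glversion` matches the rule's glversion pattern, 0 if not,
 * and -1 if the pattern fails to compile. Assumes POSIX extended regex
 * semantics and an unanchored search (hypothetical stand-in for the
 * game's own matcher). */
int glversion_rule_matches(const char *glversion)
{
    regex_t re;
    /* Pattern copied from gldrivers.xml; backslashes doubled for C. */
    if (regcomp(&re, ".\\.. Mesa 7\\..", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, glversion, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under that assumption, "1.3 Mesa 7.0" and "2.1 Mesa 7.1-devel" match, while "1.3 Mesa 6.5.2" does not; because the pattern is not anchored, anything after the minor version digit is ignored.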
This can't possibly be triggered by S3TC (which, by the way, to my knowledge isn't broken in "many" drivers, just in one of those that support it; and where it isn't supported, the app really should detect that automatically). Output 0 is the position; a vertex program doesn't really make sense without it, so I'm not surprised you get bogus rendering when it's missing... You'd need to figure out why it's 0 in the first place. Presumably "afp" refers to ARB_fragment_program, but I have no idea what went wrong. You could try printing out the shaders as Mesa receives them, etc.
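To make "output 0" concrete: the failing assertion tests bit 0 of OutputsWritten, a bitmask with one bit per vertex program output register. A minimal sketch of how such a mask is built from a program's destination registers and then tested; the function names are hypothetical, not Mesa's code, and only the convention that bit 0 is the position comes from this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Output slot 0 holds the vertex position (the convention described above). */
enum { VP_OUTPUT_POS = 0 };

/* Builds an OutputsWritten-style mask: one bit per OUTPUT[n] register that
 * some instruction writes. `dst_outputs` lists the destination indices of a
 * program's instructions; each index must be below 32. */
unsigned outputs_written_mask(const unsigned *dst_outputs, size_t count)
{
    unsigned mask = 0;
    for (size_t i = 0; i < count; i++)
        mask |= 1u << dst_outputs[i];
    return mask;
}

/* The same bit-0 test the failing assertion performs. */
bool writes_position(unsigned outputs_written)
{
    return (outputs_written & (1u << VP_OUTPUT_POS)) != 0;
}
```

For example, a program whose only writes are OUTPUT[1] and OUTPUT[4] (like the one in the DEBUG_VP dump later in this thread) yields the mask 0x12, so the bit-0 test fails.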
First, thanks for your (quick) response.

Second, I should add a note about the gldrivers.xml fragment: PlaneShift ships a file called gldrivers.xml, but the fragment is not from PlaneShift itself. The quoted fragment is from somewhere on the internet (I do not recall where; AFAIR the link is dead by now). I used it because it helped, but I cannot tell you anything beyond that. The comment about "broken across many video drivers" is not my opinion, just a copy and paste from where I got it.

Third: sorry, I do not even know what S3TC or AFP are. I also have no idea what the code does or should do. I just found an assertion and reported it. So what should I do now (if I can be of any help with my limited knowledge)? ;-) The problem is that vp=0? And I should find out why it is 0? Did I get that right? And what should I print?
(In reply to comment #5)
> but I can not tell anything beyond that. The comment about "broken across many
> video" drivers is not my opinion, but just a copy and paste from where I got
> it.

Whoever wrote it, he's probably wrong :-)

> Third: Sorry, I do not even know what S3TC or AFP are. I also have no Idea what
> the code does or should do. I just found an assertion and reported it. So what
> should I do now (if I can be of any help with my limited knowledge) ? ;-)
> The problem is that vp=0 ? And I should find out why it is 0 ? Did I get that
> right ? And what should I print ?

If you have no idea about Mesa, this may not be easy to debug. A start would be to use a debug build (sometimes errors in an application trigger bugs in the driver, and a debug build prints those errors instead of just passing them back to the app, which may silently ignore them) or to enable the DEBUG_VP code to print out the vertex program. Or you can wait for somebody else who has the hardware to look at it...
Created attachment 10735: more verbose crash log (with Mesa debug build + DEBUG_VP)

(In reply to comment #6)

Here is a more verbose crash log, with a Mesa debug build + DEBUG_VP. Have fun.
(In reply to comment #7)
> Created an attachment (id=10735)
> more verbose crash log (with mesa debug build + DEBUG_VP)
>
> (In reply to comment #6)
>
> Here you find a more verbose crash log, with a mesa debug build + DEBUG_VP.
> Have fun.

So the vertex program indeed doesn't write output 0. I suspect, though, that it is a position_invariant vertex program and that the driver doesn't handle that correctly. (The driver shouldn't crash in any case: vertex programs that don't write position are allowed, even though the results are undefined.) It used to handle it, AFAIK, but I'd guess it got broken by the recent vp output assignment patches.
Created attachment 10743: Patch that should make vertex programs with no pos work again.
Tommy, your patch is flawed: vp is set about 5 lines later... (disclaimer: it's late)
Created attachment 10746: Fixes a problem with the last patch. This should be better.
(In reply to comment #11)
> Created an attachment (id=10746)
> Fixes a problem with the last patch. This should be better.

This looks like it should fix the problem. The "if" should probably be dropped, though: less code and much faster :-).
I've fixed this in Git. I tested with arbvptorus, and it's a position-invariant bug: we shouldn't always set the POS bit, only when the program is position invariant. This means position-invariant programs will work correctly, but the user will still get an assertion if they create a non-position-invariant program that does not write result.position. I believe this is the correct behaviour, as (IIRC) result.position must be written for the program to be valid. Please report back to confirm it's fixed and I'll close the bug.
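The rule described above (add the POS bit only for position-invariant programs) can be sketched as follows. This illustrates the logic under stated assumptions and is not the committed Git change; struct vp_info and hw_outputs are hypothetical names.

```c
#include <stdbool.h>

#define OUT_POS_BIT (1u << 0) /* output 0 = vertex position */

/* Hypothetical summary of a translated vertex program. */
struct vp_info {
    unsigned outputs_written; /* bit n set if the program writes OUTPUT[n] */
    bool position_invariant;  /* program uses OPTION ARB_position_invariant */
};

/* Outputs the hardware must emit. A position-invariant program never writes
 * position itself (it comes from the fixed-function transform), so the POS
 * bit is added only in that case; other programs keep their own mask. */
unsigned hw_outputs(const struct vp_info *vp)
{
    unsigned out = vp->outputs_written;
    if (vp->position_invariant)
        out |= OUT_POS_BIT;
    return out;
}
```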
(In reply to comment #13)
> This means the position invariant will work
> correctly, but the user will still get an assertion if they create a
> non-position invariant program that does not write result.position. I believe
> this is the correct behaviour, as (iirc) result.position must be written for
> the program to be valid.

No, this is not true. Programs are valid even if they don't write position (and even if they weren't, aborting would not be an option). The spec explicitly mentions this, though the results are undefined. So either drop to swrast if the driver can't handle it at all, or just write that position and be done with it if that works well enough. Since the results are undefined, it just must not lock up; it doesn't really matter how the results turn out...
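The two alternatives proposed here, instead of an assertion, amount to a small decision. A sketch with hypothetical names; whether the hardware path can emit a substitute position is left as a parameter, because the thread does not settle it.

```c
#include <stdbool.h>

/* Hypothetical paths a driver could take for a vertex program. */
enum vp_path {
    VP_PATH_HW,           /* position written, or position invariant */
    VP_PATH_HW_DUMMY_POS, /* emit some position anyway; results are undefined */
    VP_PATH_SWRAST        /* hand rendering to the software rasterizer */
};

/* Never aborts: a program that doesn't write position is still valid,
 * so the only question is which path renders it. */
enum vp_path choose_vp_path(unsigned outputs_written, bool position_invariant,
                            bool hw_can_fake_position)
{
    if ((outputs_written & 1u) || position_invariant)
        return VP_PATH_HW;
    return hw_can_fake_position ? VP_PATH_HW_DUMMY_POS : VP_PATH_SWRAST;
}
```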
Thanks for your work.

(In reply to comment #11)

OK, new test with git 2007-07-17 (debug build and DEBUG_VP). The assertion is gone in my test case. However:
- rendering is still slightly broken (still rainbow effects, see below).
- (and this changes nothing about comment #14.)

(Supplement to comment #3)

Without disabling TC and AFP:
- Mesa 6.5.x: rendering is seriously broken (3D display mostly black)
- Mesa 7.0: rendering almost works (except for a few strange effects: rainbow colors seem to overlay textures and flicker)
- Mesa git 2007-07-13: assertion fails
- Mesa git 2007-07-17: same as Mesa 7.0, rendering almost works (except for a few strange effects: rainbow colors seem to overlay textures and flicker)

With TC and AFP disabled:
- Mesa 6.5.x: normal rendering
- Mesa 7.0: normal rendering
- Mesa git 2007-07-13: normal rendering
- Mesa git 2007-07-17: normal rendering

(Of course "Mesa CVS" is nonsense by now; it should be "Mesa git".)

BTW: Should I do anything about the rendering problems (post a log file, provide screenshots, file another bug report)? I ask because I just stumbled over this:

Mesa: User error: GL_INVALID_ENUM in glActiveTexture(texture)
____________Vertex program 4 __________
# Vertex Program/Shader
  0: MOV OUTPUT[1], INPUT[3];
  1: MOV OUTPUT[4].xy, INPUT[17];
  2: END
*********************************WARN_ONCE*********************************
File r300_texstate.c function r300SetTexImages line 227
DXT 3/5 suffers from multitexturing problems!
***************************************************************************
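The "GL_INVALID_ENUM in glActiveTexture(texture)" line means the application passed a value that is not a valid texture-unit enum. A minimal sketch of that check, assuming the limit is a single max_units value (the exact constant Mesa compares against is an assumption here). A common way to hit the error is passing a raw index such as 1 instead of GL_TEXTURE0 + 1, although the log does not show which value PlaneShift actually passed.

```c
#include <stdbool.h>

#define GL_TEXTURE0 0x84C0 /* value from the OpenGL headers */

/* `texture` must be GL_TEXTUREi with 0 <= i < max_units; anything else
 * raises GL_INVALID_ENUM. `max_units` stands in for the implementation's
 * texture unit limit. */
bool active_texture_is_valid(unsigned texture, unsigned max_units)
{
    return texture >= GL_TEXTURE0 && texture - GL_TEXTURE0 < max_units;
}
```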
Erm, of course in comment #15 it should be "(In reply to comment #13)" instead of "(In reply to comment #11)".
Can you reproduce the bug with Mesa 7.4 or master?
OK, I started PlaneShift again. This time it's PlaneShift 0.4.03 with Mesa 7.4 and Debian Lenny. I sometimes get crashes that are not related to Mesa. I did not see a failed assertion. I did not see any rainbow effects.

With the modified gldrivers.xml, rendering mostly works. However, sometimes textures seem to be missing or black, and I get hangs with almost 100% CPU usage somewhere in the kernel (top shows 9x% sys). With the original gldrivers.xml, hardly anything is displayed in 3D.

So if your question is whether I can reproduce the assertion with Mesa 7.4, the answer is "no".
(In reply to comment #18)
> Ok, I started PlaneShift again. This time it's PlaneShift 0.4.03 with Mesa 7.4
> and Debian Lenny.
> I sometimes get crashes that are not related to Mesa.
> I did not see a failed assertion. I did not see any rainbow effects.
>
> With the modified gldrivers.xml rendering mostly works. Yet sometimes textures
> seem to be missing/black and I get hangups with almost 100% CPU usage somewhere
> in the kernel (top says 9x% sys).
> With the original gldrivers.xml hardly anything is displayed in 3D.
>
> So if your question is if I can reproduce the assertion with Mesa 7.4, the
> answer is "no".
>

Good. Could you try current Mesa master? There are a few fixes (e.g. for fog coords and WPOS in fragment programs) that may improve rendering for you.
I upgraded Mesa and libdrm to git (2009-04-18) (and PlaneShift to 0.4.03 + updater 2009-04-17). With the original gldrivers.xml, I did not notice any changes. With the modified gldrivers.xml, the hangs (with ~100% CPU in the kernel) seem to be easier to reproduce. I did not notice any changes in rendering, but I no longer got past level loading. Before level loading there is only a little 3D, but enough to see the major difference between the original and the modified gldrivers.xml.
I thought I was done with testing for now, but I still tried running it with RADEON_DEBUG=all. Oddly, this time I got past level loading, and the (slightly later) hang now happens with 0% CPU usage. Never mind. I could kill psclient.bin and, as I run it inside gdb, I got a backtrace. It starts with the functions __kernel_vsyscall, ioctl, drmIoctl, drmCommandWrite, radeonWaitIrq, r300UploadTexImages, r300UpdateTextureState, r300UpdateShaderStates, r300RunRender, _tnl_run_pipeline. The last RADEON_DEBUG=all outputs are repeated r300UploadSubImage calls, then:
> r300UploadTexImages: Syncing
> r300Flush

So it appears that the r300 hangs after uploading some textures.

This reminds me of very similar behaviour in another game (Tremulous). There I got almost the same RADEON_DEBUG=all output and then "Error: R300 timed out... exiting". This happened only while loading some of the levels, and I could avoid it by reducing the texture detail settings.

The slightly different symptoms may be due to the slightly different DRI settings I used. Using driconf, I had changed "Method to limit rendering latency" from the default "Let the hardware emit a software interrupt and sleep" to, AFAIR, "Sleep for brief intervals while waiting for the graphics hardware". With the former method I still got the hangs, but instead of the error message "Error: R300 timed out... exiting" I just got a hang. Another relevant setting is probably
> Option "GARTSize" "64"
in /etc/X11/xorg.conf.

However, it appears to me that there is a problem with uploading textures and r300 timeouts, maybe in out-of-video-memory conditions. In case this info is useful and you need more, tell me. (The second issue, that hardly anything is displayed in 3D with PlaneShift's original gldrivers.xml, may be completely unrelated.)

[I got the impression that we're drifting away from the original bug report.]
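The "Sleep for brief intervals" latency method mentioned above boils down to polling with a bounded number of sleeps, and a bounded wait like this is one way a "timed out" error can arise. A generic sketch of that pattern, not the r300 driver's code; wait_idle_polling, the is_idle callback, and the example callback are hypothetical stand-ins for whatever the driver actually queries.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Polls `is_idle` up to `max_polls` times, sleeping `sleep_ns` nanoseconds
 * (must be below 1e9) between polls. Returns false if the hardware never
 * became idle, which the caller would report as a timeout. */
bool wait_idle_polling(bool (*is_idle)(void *ctx), void *ctx,
                       long sleep_ns, unsigned max_polls)
{
    struct timespec ts = { 0, sleep_ns };
    for (unsigned i = 0; i < max_polls; i++) {
        if (is_idle(ctx))
            return true;
        nanosleep(&ts, NULL);
    }
    return false;
}

/* Example callback for trying it out: reports idle once it has been
 * polled *(unsigned *)ctx times without success. */
static bool idle_after_countdown(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```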
(In reply to comment #21)
> [I got the impression that we're drifting away from the original bug report.]

Indeed... if the texture upload hangs occur with 32-bit processes running on a 64-bit kernel, they might be related to bug 10561.
The Linux kernel is 32-bit (Debian Lenny linux-image-2.6.26-2-686); the processor is 64-bit (AMD Athlon(tm) 64 Processor 3200+).
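To check the combination asked about in comment #22 (32-bit processes on a 64-bit kernel), the kernel architecture and the process's word size can be queried separately. A minimal POSIX sketch; the helper names are hypothetical. On an x86-64 kernel, uname() normally reports "x86_64" in machine even to a 32-bit process, while the process itself still has 32-bit pointers; uname -m on the command line prints the same kernel string.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/utsname.h>

/* Pointer width of this process in bits: 32 for a 32-bit build. */
unsigned process_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * 8);
}

/* Copies the kernel's machine string (e.g. "i686" or "x86_64") into buf.
 * Returns 0 on success, -1 if uname() fails or buf has no room. */
int kernel_machine(char *buf, size_t len)
{
    struct utsname u;
    if (len == 0 || uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s", u.machine);
    return 0;
}
```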
I tried the newest PlaneShift on my RS690, and it seems to work OK until it just exits with:

Sat Apr 18 21:13:41 2009, <src/client/psmovement.cpp:787 SetRunToPos SEVERE>
Sat Apr 18 21:13:41 2009, Failed to find mesh for SetRunToPos
DistributeLeafObjects failed: !leaf_replaced
This node contains the following objects:
 0: 'effect_anchor_basic_34' (-1,-1,-1)-(1,1,1)
 1: 'effect_label_13' (nan,nan,nan)-(nan,nan,nan)
 2: 'effect_label_16' (nan,nan,nan)-(nan,nan,nan)
 3: 'effect_label_12' (nan,nan,nan)-(nan,nan,nan)
 4: 'effect_label_3' (nan,nan,nan)-(nan,nan,nan)
 5: 'effect_label_8' (nan,nan,nan)-(nan,nan,nan)
 6: 'effect_label_4' (nan,nan,nan)-(nan,nan,nan)
 7: 'effect_label_7' (nan,nan,nan)-(nan,nan,nan)
 8: 'effect_label_9' (nan,nan,nan)-(nan,nan,nan)
 9: 'effect_label_14' (nan,nan,nan)-(nan,nan,nan)
 10: 'effect_label_15' (nan,nan,nan)-(nan,nan,nan)
 11: 'effect_label_11' (nan,nan,nan)-(nan,nan,nan)
 12: 'effect_label_2' (nan,nan,nan)-(nan,nan,nan)
 13: 'effect_label_5' (nan,nan,nan)-(nan,nan,nan)

so it's a game error. There are some minor rendering errors, but generally it's OK. Can you try reproducing it with the radeon-rewrite Mesa branch and the modesetting-gem libdrm branch?
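The dump supports that conclusion: apart from the first object, every bounding box is NaN. Ordered comparisons with NaN are always false, so such boxes presumably cannot be placed consistently in any node of a spatial tree. A minimal sketch of a sanity check that would reject them before insertion; struct bbox and bbox_is_valid are hypothetical, not PlaneShift or engine code.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical axis-aligned bounding box: (min)-(max) as in the dump. */
struct bbox {
    float min[3];
    float max[3];
};

/* A box is usable only if every coordinate is a number and min <= max on
 * each axis. NaN is checked explicitly because "min > max" is false when
 * either side is NaN, so the ordering test alone would let it through. */
bool bbox_is_valid(const struct bbox *b)
{
    for (int i = 0; i < 3; i++) {
        if (isnan(b->min[i]) || isnan(b->max[i]))
            return false;
        if (b->min[i] > b->max[i])
            return false;
    }
    return true;
}
```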
About the game error (!leaf_replaced): I know that one; AFAICS it happened less frequently after I ran the PlaneShift updater.

About your working configuration with the RS690: you might have more video memory, which may be relevant, as I assumed the bug is related to uploading textures / out-of-memory conditions. AFAICS I have 128MB.

About mesa/radeon-rewrite and drm/modesetting-gem: with those drivers PlaneShift hangs much earlier, after opening the window and displaying the startup picture. The last debug output is:
> mtu=8
> Activating texture unit 0
> TX_ENABLE: 00000001 last_hw_tmu=0
> radeonEmitState
> Begin dirty state
> r300EmitAOS: nr=2, ofs=0x00000000
> r300Enable( GL_TEXTURE_RECTANGLE_ARB = GL_FALSE )
> radeonFlush 324
> radeonEmitState
> radeonReleaseDmaRegion 0xa260200
> r300Enable( GL_DEPTH_TEST = GL_FALSE )
> r300Enable( GL_STENCIL_TEST = GL_FALSE )
> radeonFlush 324
> radeonEmitState
> Begin dirty state
> radeonReleaseDmaRegion (nil)
(In reply to comment #25)
> About game error (!leaf_replaced): I know that one, afaics it happened less
> frequently after I ran the PlaneShift updater.
>
> About your working configuration with rs690: You might have more video memory,
> which may be relevant as I assumed the bug is related to uploading Textures /
> Out-of-memory conditions. AFAICS I have 128MB.
>
> About mesa/radeon-rewrite and drm/modesetting-gem:
> With those drivers PlaneShift hangs much earlier, after opening the window and
> displaying the startup-picture. The last debug output is:
> > mtu=8
> > Activating texture unit 0
> > TX_ENABLE: 00000001 last_hw_tmu=0
> > radeonEmitState
> > Begin dirty state
> > r300EmitAOS: nr=2, ofs=0x00000000
> > r300Enable( GL_TEXTURE_RECTANGLE_ARB = GL_FALSE )
> > radeonFlush 324
> > radeonEmitState
> > radeonReleaseDmaRegion 0xa260200
> > r300Enable( GL_DEPTH_TEST = GL_FALSE )
> > r300Enable( GL_STENCIL_TEST = GL_FALSE )
> > radeonFlush 324
> > radeonEmitState
> > Begin dirty state
> > radeonReleaseDmaRegion (nil)
>

I really have no idea what's going wrong. I have 128MB too (the RS690 is an integrated chip and uses system memory). What's left is to rule out a hardware problem and to check whether it also happens with the fglrx driver.
Mass version move, cvs -> git
Since the original bug is fixed, you should create new reports for the remaining problems. Closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.