Bugzilla – Bug 11380
r300 DRI misrenders 3D objects
Last modified: 2007-08-14 09:59:55 UTC
When trying to use DRI enabled applications on an R300 (ATI Mobility Radeon 9600 M10, PCI ID 1002:4e50 subsystem 1025:0046; 1400x1050 at 24 bpp), the application's window is garbled (there is no crash, and it looks like the application is working, just not drawing the correct items).
Neither X logs nor dmesg show any error messages, LIBGL_DEBUG=verbose glxinfo doesn't report anything odd.
2D acceleration is working perfectly, identical software works well (including DRI) with an r200 card.
Mesa 6.5.3 (also tried 7.0 with patched xorg-server, same effect)
libdrm from git
Linux 2.6.22-rc4 (tried both stock kernel and kernel with current DRI git modules; also tried 2.6.21)
The same machine worked well roughly a year ago, with xorg-server 1.1.1, xf86-video-ati 6.6.1, libdrm 2.1, Linux 2.6.18-rc2, glibc 2.4, gcc 4.1.1
Misrenderings depend on the application.
blender just garbles its window (will attach a screenshot). tuxracer/ppracer draw the background and nothing else (possibly the menus etc. are hidden behind the background). glxgears shows just a black window and a pretty high FPS:
9281 frames in 5.0 seconds = 1856.061 FPS
9389 frames in 5.0 seconds = 1877.744 FPS
Created attachment 10459
Screenshot showing garbled blender
This is probably one of my commits... Maybe the vertex shader clean up. I'm compiling blender now and I'll start bisecting this assuming I can reproduce the bug. :-)
(In reply to comment #0)
> The same machine worked well roughly a year ago, with xorg-server 1.1.1,
> xf86-video-ati 6.6.1, libdrm 2.1, Linux 2.6.18-rc2, glibc 2.4, gcc 4.1.1
What version of Mesa was that? Does that version still work with the rest of the system unchanged?
I just tested Blender 2.44 with Mesa from git and it worked fine... Could you try with Mesa from git?
Mesa from today's git does the same thing as 7.0 (and 6.5.3).
The last Mesa version known to work for me is a CVS snapshot from 2006/07/19. I'll try putting that in (it'll take a while to adjust xorg-server 188.8.131.52 to compile against it though) next.
(In reply to comment #5)
> Mesa from today's git does the same thing as 7.0 (and 6.5.3).
> The last Mesa version known to work for me is a CVS snapshot from 2006/07/19.
> I'll try putting that in (it'll take a while to adjust xorg-server 184.108.40.206 to
> compile against it though) next.
Well I don't have such problems with latest mesa.
Nor I. There were problems (which look nothing like the attached screenshot) with blender up till last week, but they were remedied for me and it works flawlessly. I am running Xorg 7.2, so I don't know if that would make any difference.
Going back to the old Mesa while leaving the rest the same doesn't fix things -- chances are this is an interoperability problem with something else.
Is anyone for whom this works using gcc 4.2.x and/or xorg-server 1.3.0?
I've now tried with xorg-server 1.3.0 and it works fine. gcc on this machine is 4.1.2.
(In reply to comment #8)
> Going back to the old Mesa while leaving the rest the same doesn't fix things
> -- chances are this is an interoperability problem with something else.
> Is anyone for whom this works using gcc 4.2.x and/or xorg-server 1.3.0?
xorg-server shouldn't really matter (in particular, you don't need to rebuild it at all or even restart it to try different 3D drivers), so the prime suspect would appear to be gcc - the Mesa codebase is relatively fragile wrt compiler optimizations. Does it work if you build Mesa with the old compiler, or if you build the linux-dri or linux-dri-debug target instead of linux-dri-x86?
I guess a simple first test for this theory would be to try the old Mesa binaries.
(In reply to comment #9)
> I've now tried with xorg-server 1.3.0 and it works fine. gcc on this machine
> is 4.1.2.
And here it works OK with gcc version 4.2.1 20070606 (prerelease).
It's definitely a compiler thing. If I just build it (7.0) with make linux-dri-x86 OPT_FLAGS="-O0" it works perfectly.
I'll try to figure out what flag exactly causes it to break.
The culprit is -O2, it works at -O0 and -O1. Will trace it down to a specific part of -O2 and report it to the gcc guys
(In reply to comment #13)
> The culprit is -O2, it works at -O0 and -O1. Will trace it down to a specific
> part of -O2 and report it to the gcc guys
The Mesa codebase is not strict aliasing safe, make sure you use -fno-strict-aliasing with -O2.
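To illustrate the aliasing concern, here is a minimal sketch in C. The helper name float_bits is hypothetical and is not taken from the Mesa source, and the example assumes a 32-bit IEEE-754 float. The cast shown in the comment is the kind of pattern that -fstrict-aliasing (on by default at -O2) is allowed to miscompile, while the memcpy form is well defined at any optimization level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from Mesa): return the raw bits of a float.
 *
 * The aliasing-unsafe way to write this would be
 *     return *(uint32_t *)&f;
 * which reads a float object through a uint32_t lvalue. With strict
 * aliasing enabled the compiler may assume the two types never refer to
 * the same memory and reorder or drop the access.
 */
uint32_t float_bits(float f)
{
    uint32_t u;

    /* Assumption of this sketch: float is 32 bits wide. */
    assert(sizeof(f) == sizeof(u));

    /* Copying the bytes is well defined for any object type. */
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

Code that cannot be converted to this form is the reason for building with -fno-strict-aliasing.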
I know -- the screenshot is the result of -O2 -march=i686 -fno-strict-aliasing -fweb -frename-registers.
(And the r200 driver and software rendering compiled with those flags works perfectly, it's just r300).
Still getting the same results with -O2 and a number of the optimizations that are usually part of -O2 disabled (-O2 -fno-thread-jumps -fno-crossjumping -fno-optimize-sibling-calls -fno-cse-follow-jumps -fno-cse-skip-blocks -fno-gcse -fno-gcse-lm -fno-expensive-optimizations -fno-strict-aliasing)
Currently building with some more optimizations disabled.
(In reply to comment #15)
> Currently building with some more optimizations disabled.
You could try without the -ffast-math switch. What it does differs depending on whether optimizations are enabled (though -O1 was enough to trigger the different behaviour for me, see bug #9856).
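One reason -ffast-math can change results is that it lets gcc reassociate floating-point arithmetic, and floating-point addition is not associative. A small C sketch of the effect follows. The function names are made up for illustration, and this is not claimed to be what goes wrong in the r300 driver.

```c
#include <assert.h>

/* Hypothetical helpers: the same three-term sum, grouped two ways. */
double sum_left_first(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right_first(double a, double b, double c)
{
    return a + (b + c);
}

/*
 * With strict IEEE-754 double arithmetic (no -ffast-math), the two large
 * terms cancel exactly in the first grouping, so the 1.0 survives. In the
 * second grouping the 1.0 is absorbed by -1e20 before the cancellation.
 */
void demo_grouping(void)
{
    assert(sum_left_first(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right_first(1e20, -1e20, 1.0) == 0.0);
}
```

Once the compiler is free to pick either grouping, the result can depend on which other optimizations happen to run.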
-ftree-vrp (enabled by default at -O2 and higher) is the culprit.
If you compile r300_state.c with -O2 -fno-tree-vrp, everything works as expected.
All other files can be compiled with full -O2.
The gcc guys claim -fno-tree-vrp doesn't generate any significant difference in the asm output, yet that flag definitely fixes it here.
Could this be some strange timing issue triggered by unexpectedly fast execution of r300SetupPixelShader()?
How did you narrow it down to this specific function? I also don't see any differences in the preprocessed source.
The preprocessor output can't be different because -f[no-]tree-vrp doesn't affect the preprocessor -- it affects the generated assembly code.
I narrowed it down to the function by splitting r300_state.c into 2 files, compiling one with -ftree-vrp and one with -fno-tree-vrp, and moving functions between the 2 files (removing any "static" keywords, of course) until a version with only 1 function in the -ftree-vrp file broke.
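The file-splitting approach works with any compiler version. As an aside, newer gcc releases (4.4 and later, so not the gcc 4.2 discussed here) provide a per-function optimize attribute that allows the same kind of bisection without moving code between files. A minimal sketch in C, assuming such a compiler; the macro NO_TREE_VRP and the function clamp_to_range are hypothetical stand-ins and are not part of Mesa:

```c
#include <assert.h>

/* Applied only when building with gcc; elsewhere the macro is empty. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VRP __attribute__((optimize("no-tree-vrp")))
#else
#define NO_TREE_VRP
#endif

/*
 * Hypothetical suspect function. With the attribute, gcc compiles this one
 * function as if -fno-tree-vrp had been given, while the rest of the file
 * keeps the command-line flags (for example full -O2).
 */
NO_TREE_VRP
int clamp_to_range(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

Tagging suspects one at a time and rebuilding narrows the search in the same way as the two-file split. The gcc manual describes this attribute as a debugging aid, not something for production code.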
Can you provide the asm output of the function with and without the -ftree-vrp option? I don't have access to gcc 4.2.1.
I experience the same problem on an RV350 with blender; building with -O2 also triggers a hard lock when using compiz. It works fine when building r300_state.c with -fno-tree-vrp. It also runs fine when building with -O3, or when building with -O2 with gcc 4.3.
I've isolated the code of r300SetupPixelShader() in a separate file, and diffed the assembly code. It shows this is a gcc bug (a teammate will add more info on the upstream gcc bug report).
For reference, I'll attach the preprocessed and assembly files, as well as the assembly diff when building with -fno-tree-vrp, but this bug can probably be closed as INVALID.
Created attachment 11032
preprocessed code of r300SetupPixelShader()
Created attachment 11033
assembly code of r300SetupPixelShader() when built with -O2
Created attachment 11034
diff of assembly code of r300SetupPixelShader() when building with "-O2" and "-O2 -fno-tree-vrp"
NB: the -fno-ivopts option can be used instead of -fno-tree-vrp, since the vrp issue is triggered by an ivopts issue :)
*** Bug 11605 has been marked as a duplicate of this bug. ***