Bug 7371 - tux leaves colored trails in ppracer
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r300
Version: 6.5
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assigned To: Default DRI bug account
Depends on:
Blocks:
 
Reported: 2006-06-30 02:33 UTC by Hans de Goede
Modified: 2006-08-03 01:18 UTC (History)

See Also:


Attachments
Backtrace of ppracer crashing when run under electricfence with mesa 6.5 snapshot (1.77 KB, text/plain)
2006-06-30 13:18 UTC, Hans de Goede

Description Hans de Goede 2006-06-30 02:33:30 UTC
A friend has been so kind as to donate me a Radeon 9800, so now I'm giving this
card a spin with the new open-source r300 DRI support.

I've just tried ppracer and 99% of it works fine; however, Tux leaves a colored
(in 16 bpp mostly red, in 32 bpp mostly yellow) track behind him in the snow.
Tux is supposed to leave a track in the form of an indentation in the snow, but
this track is normally white. Last time I checked with my Radeon 9200 it worked
fine (in other words, this is not an application problem). For the first few
frames things are fine, and then all of a sudden the trail becomes colored.

This might be related to bug 6991, which is about ppracer crashing after a
while. Maybe that crash is caused by memory corruption, and because of the
64-bit nature of my system some different memory gets corrupted?


I'm running a fully up-to-date Fedora development branch (rawhide) x86_64 system,
so I've got pretty much the latest version of everything.

I'm a skilled C programmer and have a bachelor's degree in electronics. I've
written a few kernel drivers and lots of userland code. I am, however, totally
at a loss when it comes to where to begin with debugging OpenGL / DRI problems.

Please let me know what I can do to help debug this. Compiling CVS versions,
adding printfs, running it through a debugger, etc. should all not be a problem.
Comment 1 Hans de Goede 2006-06-30 02:36:39 UTC
I forgot, when running ppracer, just before the race starts it says:
*********************************WARN_ONCE*********************************
File r300_vertexprog.c function t_dst_index line 178
Unknown output 3
***************************************************************************

After this the track left behind is still fine, though; it doesn't become
colored until a couple of seconds after the start of the race.
Comment 2 Hans de Goede 2006-06-30 12:51:01 UTC
I just tried this with a current checkout of CVS instead of the 6.5 snapshot,
and this most definitely is related to bug 6991. With CVS it crashes hard (the
whole machine) at the point where the colored trails usually begin.

I thought that this always happened at the same point, but unfortunately that is
not true. It often happens at the same point, though.

So this is most likely a dup of bug 6991, but it might be an idea to debug this
with the 6.5 snapshot instead of CVS because that seems easier to debug.

I still think this is memory corruption: if I run ppracer under Electric Fence,
it segfaults the instant the game starts.

Comment 3 Hans de Goede 2006-06-30 13:18:25 UTC
Created attachment 6085 [details]
Backtrace of ppracer crashing when run under electricfence with mesa 6.5 snapshot

Hi,

I just ran ppracer under gdb in combination with Electric Fence (a malloc
debugger which checks for out-of-bounds accesses) and I've just attached the
backtrace. Or at least as much of a backtrace as I got before gdb crashed.

The problem is that currently I'm using the following to get electric fence
linked in:
export LD_PRELOAD=libefence.so

Neat trick, but I don't know how to do this without it also influencing gdb,
which then quickly runs out of virtual memory (each malloc causes 2 whole pages
to be allocated by Electric Fence).

When I've got the time I'll rebuild ppracer directly linked against efence and
investigate some more. I think we have something here.
Comment 4 Brian Paul 2006-07-01 08:13:02 UTC
If you suspect a memory error, valgrind is an excellent tool to use.
Run with 'valgrind --tool=memcheck ppracer'
Comment 5 Hans de Goede 2006-07-01 09:09:51 UTC
Ok,

I've managed to fix the segfault when running under Electric Fence:
ppracer has a file called:
/usr/share/ppracer/courses/themes/models/common/tree.png
which measures 256x255; this should be 256x256. I find it strange, though, that
the GLU functions involved start writing out of bounds instead of just
complaining and returning an error. I believe this is a bug; let me know if you
agree, then I'll file a separate bug for this. Unfortunately, with this bug
fixed I still have colored trails.
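
To illustrate the kind of check that would have caught the 256x255 image before it reached the texture code, here is a minimal sketch in C. It assumes the loader expects power-of-two dimensions, as was usual for OpenGL textures at the time; the helper names are hypothetical and are not taken from ppracer, GLU or Mesa.

```c
#include <stdbool.h>

/* True when v is a non-zero power of two (1, 2, 4, 8, ...). */
static bool is_power_of_two(unsigned int v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical pre-upload check (an assumption for illustration only):
 * reject images whose width or height is not a power of two instead of
 * passing them on to code that may assume they are.
 */
static bool texture_size_ok(unsigned int width, unsigned int height)
{
    return is_power_of_two(width) && is_power_of_two(height);
}
```

With this check, a 256x256 image is accepted and the 256x255 tree.png is rejected up front, so the caller can report an error rather than write out of bounds.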

Using Electric Fence isn't helping me any further, because now it gets so far
into the game that EF aborts because it runs out of memory. I'll try valgrind
when I've got some more time.
Comment 6 Hans de Goede 2006-07-01 13:47:36 UTC
Erm,

Yes, using valgrind is very entertaining; it ends with this:
==3017== More than 10000000 total errors detected.  I'm not reporting any more.
==3017== Final error counts will be inaccurate.  Go fix your program!
==3017== Rerun with --error-limit=no to disable this cutoff.  Note
==3017== that errors may occur in your program without prior warning from
==3017== Valgrind, because errors are no longer being displayed.  

I did take a somewhat closer look, but I'm not familiar enough with the r300 code
(or with Mesa / X at all) to get very far. I noticed these 3 warnings many times:
==3017== Conditional jump or move depends on uninitialised value(s)
==3017==    at 0x965FBA2: r300GartOffsetFromVirtual (r300_ioctl.c:902)
==3017==    by 0x96796D3: emit_vector (r300_maos.c:201)
==3017==    by 0x9679E41: r300EmitArrays (r300_maos.c:339)
==3017==    by 0x966B782: r300_run_vb_render (r300_render.c:386)
==3017==    by 0x966C360: r300_run_tcl_render (r300_render.c:575)
==3017==    by 0x9738273: _tnl_run_pipeline (t_pipeline.c:162)   
==3017==    by 0x977A346: _tnl_flush_vtx (t_vtx_exec.c:281)
==3017==    by 0x9773D1C: _tnl_FlushVertices (t_vtx_api.c:873)
==3017==    by 0x96D2866: _mesa_PopMatrix (matrix.c:270)   
==3017==    by 0x40B60C: (within /usr/bin/ppracer)
==3017==    by 0x454C22: (within /usr/bin/ppracer)
==3017==    by 0x45F65A: (within /usr/bin/ppracer)

==3017== Use of uninitialised value of size 8
==3017==    at 0x96792B8: emit_vec8 (r300_maos.c:116)
==3017==    by 0x967974E: emit_vector (r300_maos.c:213)
==3017==    by 0x9679E41: r300EmitArrays (r300_maos.c:339)
==3017==    by 0x966B782: r300_run_vb_render (r300_render.c:386)
==3017==    by 0x966C360: r300_run_tcl_render (r300_render.c:575)
==3017==    by 0x9738273: _tnl_run_pipeline (t_pipeline.c:162)   
==3017==    by 0x977A346: _tnl_flush_vtx (t_vtx_exec.c:281)      
==3017==    by 0x9773D1C: _tnl_FlushVertices (t_vtx_api.c:873)   
==3017==    by 0x96D2866: _mesa_PopMatrix (matrix.c:270)
==3017==    by 0x40B60C: (within /usr/bin/ppracer)
==3017==    by 0x454C22: (within /usr/bin/ppracer)
==3017==    by 0x45F65A: (within /usr/bin/ppracer)

==3017== Conditional jump or move depends on uninitialised value(s)
==3017==    at 0x965FB0E: r300IsGartMemory (r300_ioctl.c:886)
==3017==    by 0x9679821: r300EmitElts (r300_maos.c:237)
==3017==    by 0x966AEEC: r300_render_vb_primitive (r300_render.c:311)
==3017==    by 0x966B98F: r300_run_vb_render (r300_render.c:437)
==3017==    by 0x965A944: radeonDrawElements (radeon_vtxfmt_a.c:335)
==3017==    by 0x9727C29: neutral_DrawElements (vtxfmt_tmp.h:341)   
==3017==    by 0x4479C0: (within /usr/bin/ppracer)
==3017==    by 0x44BF9D: (within /usr/bin/ppracer)
==3017==    by 0x41DF33: (within /usr/bin/ppracer)
==3017==    by 0x41F033: (within /usr/bin/ppracer)
==3017==    by 0x434DB4: (within /usr/bin/ppracer)
==3017==    by 0x45F65A: (within /usr/bin/ppracer)

I think that the emit_vec8 case _may_ be caused by the r300IsGartMemory case, as
emit_vec8 calls r300IsGartMemory and acts on the result when this warning is
triggered.

The other 2 both seem to indicate that
rmesa->radeon.radeonScreen->gartTextures.size is uninitialised.
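
To show why a size field that valgrind considers uninitialised would produce exactly this kind of warning, here is a minimal sketch in C of a bounds check against a mapped region. It is an assumption about the general shape of such a check, not the actual r300IsGartMemory code; the struct and function names are hypothetical, and only the field names (handle, size, map) follow the gartTextures fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped region with handle, size and map. */
struct gart_region {
    unsigned long handle; /* opaque handle for the mapping */
    size_t size;          /* length of the mapping in bytes */
    void *map;            /* start of the mapping in this process */
};

/* True when the byte range [ptr, ptr + len) lies inside the region. */
static bool in_gart_region(const struct gart_region *r,
                           const void *ptr, size_t len)
{
    uintptr_t base = (uintptr_t)r->map;
    uintptr_t p = (uintptr_t)ptr;

    if (p < base)
        return false;

    size_t offset = (size_t)(p - base);

    /*
     * Both comparisons branch on r->size. If memcheck believes r->size
     * was never written, it reports "Conditional jump or move depends
     * on uninitialised value(s)" here.
     */
    return offset <= r->size && len <= r->size - offset;
}
```

Any caller that then dereferences a pointer chosen on the basis of this result would also be flagged, which would fit the emit_vec8 report.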

I've set a breakpoint in gdb on r300GartOffsetFromVirtual, and then printed the
contents of rmesa->radeon.radeonScreen->gartTextures. I'm not sure how
things are supposed to look, but this indeed seems odd:
{handle = 3492814848, size = 5111808, map = 0x2aaab7233000}
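
One quick sanity check on values like these is whether they are page-aligned, since a real mapping normally is. The sketch below is in C and assumes a 4 KiB page size, which is typical on x86 and x86_64 Linux but is not confirmed for this system; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size: 4 KiB is typical on x86 and x86_64 Linux. */
#define ASSUMED_PAGE_SIZE 4096u

/* True when value is a whole multiple of the assumed page size. */
static bool is_page_aligned(uint64_t value)
{
    return (value % ASSUMED_PAGE_SIZE) == 0;
}

/* Number of whole pages covered by a byte count. */
static uint64_t size_in_pages(uint64_t bytes)
{
    return bytes / ASSUMED_PAGE_SIZE;
}
```

Under that assumption, the handle, size and map values are all page-aligned, and the size works out to 1248 pages. That does not prove the values are correct, but it is what filled-in fields would look like rather than random garbage.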

Anyway, I've already spent way too much time on this r300 experiment
without any real progress, so I'm dropping my good old 9250 back in my PC for
now (that's much quieter too!).

Let me know if there is anything else I can do. I do believe that valgrind can
be a helpful tool for debugging this, but it needs to be in the hands of
someone more familiar with the code.
Comment 7 Aapo Tahkola 2006-07-31 16:35:14 UTC
Those errors are normal, because the GART memory address comes from a function
that valgrind has not trapped, and thus it doesn't know it's valid.
I believe this bug relates to vertex programs, as it changes color when moving
Tux left/right. The last time I saw this was more than 6 months ago, so I might
be wrong...
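
For reference, memcheck can be told that bytes filled in behind its back are valid by using a client request. The sketch below is in C and uses the VALGRIND_MAKE_MEM_DEFINED macro from valgrind/memcheck.h when that header is available; the helper name is hypothetical and nothing like it is claimed to exist in the r300 driver.

```c
#include <stddef.h>

/* Use the memcheck client-request header only when it is installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK 1
#  endif
#endif

/*
 * Hypothetical helper: tell memcheck that `size` bytes at `addr` hold
 * initialised data, e.g. a struct filled in by a call that valgrind
 * does not understand. Returns 0 on success, -1 on bad arguments.
 * Outside valgrind the client request does nothing.
 */
static int mark_bytes_defined(void *addr, size_t size)
{
    if (addr == NULL || size == 0)
        return -1;
#ifdef HAVE_MEMCHECK
    (void)VALGRIND_MAKE_MEM_DEFINED(addr, size);
#endif
    return 0;
}
```

Calling such a helper on the structure that receives the GART information should silence warnings of this kind without hiding genuine uninitialised reads elsewhere.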

Can you try again with current cvs?
Comment 8 multinymous 2006-07-31 17:33:06 UTC
Current CVS fixes the colored tracks and tracks-related hang on my box.
Works perfectly now.
Comment 9 Hans de Goede 2006-08-02 13:25:59 UTC
I can confirm that this is fixed in current CVS, and bug 7372 is also fixed.
Very good job guys!

As far as I'm concerned (and I'm the reporter) this bug can be closed. But I
don't know what the usual procedure is for Mesa (maybe you keep bugs open until
a fixed version is published), so I'm leaving this open for now.
Comment 10 Jerome Glisse 2006-08-03 01:18:38 UTC
We simply mark it as fixed :)