Bug 20539 - Segmentation Fault with Radeon (maybe with pixmap command)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r300
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-08 09:04 UTC by Andreas Cord-Landwehr
Modified: 2009-05-17 08:09 UTC



Attachments
Backtrace for Segfault (2.92 KB, text/plain)
2009-03-08 09:04 UTC, Andreas Cord-Landwehr
Valgrind run (132.72 KB, text/plain)
2009-03-13 02:23 UTC, Andreas Cord-Landwehr
Run: output since kwin was started (10.28 KB, text/plain)
2009-03-13 03:37 UTC, Andreas Cord-Landwehr
Another Valgrind output (597.82 KB, text/plain)
2009-03-13 03:38 UTC, Andreas Cord-Landwehr
Another backtrace by Xorg log: /var/log/Xorg.log.old (55.20 KB, text/plain)
2009-03-14 03:05 UTC, Andreas Cord-Landwehr
Valgrind run 2: valgrind output (576.56 KB, text/plain)
2009-03-14 03:43 UTC, Andreas Cord-Landwehr
Valgrind run 2: according Xorg.log (90.85 KB, text/plain)
2009-03-14 03:44 UTC, Andreas Cord-Landwehr
Valgrind run 2: according .xsession (59.63 KB, text/plain)
2009-03-14 03:44 UTC, Andreas Cord-Landwehr
Increase reference count of texture objects referenced in current hardware state (2.88 KB, patch)
2009-04-28 01:52 UTC, Michel Dänzer

Description Andreas Cord-Landwehr 2009-03-08 09:04:58 UTC
Created attachment 23655 [details]
Backtrace for Segfault

On my installation, the radeon driver causes a segfault. The bug has also been reported at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516567
For additional information (X server version, etc.), please look there.

Reproducing the crash is still laborious. By opening Eclipse many times, switching desktops, and switching to other programs, it is possible to trigger a crash within about 10 to 20 minutes. So far I don't see any regularity in these crashes.
Comment 1 Michel Dänzer 2009-03-09 02:34:51 UTC
I suspect it's some kind of memory corruption; if you can run the X server in valgrind, that might give some hints.

This might be related to bug 17358, but let's keep the reports separate for now.
Comment 2 Andreas Cord-Landwehr 2009-03-13 02:23:01 UTC
Created attachment 23827 [details]
Valgrind run

Valgrind run with default options and output written to a file. KDE was started with "export; startkde".
Comment 3 Michel Dänzer 2009-03-13 02:38:16 UTC
Thanks. Unfortunately, all the 'invalid write' errors in RADEON.*CP are false positives due to valgrind not knowing about the GART memory used for transferring commands to the GPU. You'll either have to suppress those somehow (can't help you with that, sorry) or try again with --error-limit=no and see if any other errors appear.

I don't know if the xkbcomp errors are harmless or could be a problem.
Comment 4 Andreas Cord-Landwehr 2009-03-13 03:37:14 UTC
Created attachment 23828 [details]
Run: output since kwin was started

This output was generated from the moment kwin was restarted with "kwin --replace". I think this is the most interesting part. The next file gives the output after kwin crashed.
Comment 5 Andreas Cord-Landwehr 2009-03-13 03:38:25 UTC
Created attachment 23829 [details]
Another Valgrind output

Unfortunately, valgrind overwrites the output file each time it writes a new summary. Here is the last part of the output.
Comment 6 Michel Dänzer 2009-03-13 04:04:59 UTC
(In reply to comment #4)
> This output was generated from the moment, kwin was restarted with "kwin
> --replace".

So, does the problem only happen after restarting kwin? Also, are you using the kwin compositing effects, and if so, with the OpenGL or the XRender backend?

> I think this is the most interesting part.

Yeah, looks like there's a use-after-free in the Mesa r300 driver, which could be a candidate for causing memory corruption. It might be even more useful if there were debugging symbols for r300_dri.so as well.

> The next file give the output after kwin crashed.

I can't see any valgrind output about the X server itself in there, just a lot of '�' characters.

P.S. Please set MIME type text/plain for plain text attachments.
Comment 7 Andreas Cord-Landwehr 2009-03-14 03:05:14 UTC
Created attachment 23845 [details]
Another backtrace by Xorg log: /var/log/Xorg.log.old

As the server just crashed and the log file was not overwritten at reboot, I can finally give you the Xorg.log file. I also upgraded libgl1-mesa-dri from 7.3~lenny0 to 7.3-1 (to get debugging symbols). The upgrade did not change the crashing behavior.
Comment 8 Andreas Cord-Landwehr 2009-03-14 03:14:56 UTC
The reason I emphasized the use of kwin is that kwin and my keyboard did not work while running Xorg under valgrind. I probably just need some parameters to get both of them to work...
But I think the interesting point is: I'm using KDE 4.2 with the fancy (standard) kwin window decorations (e.g. desktop effects enabled with "Improved window management", "Shadows" and "Various animations", as they are called in the KDE desktop settings). If I remember correctly, the crash always occurred while such an effect was in use (fading of a window, shadowing, or switching screens).
Probably those effects and the corresponding hardware functions were not used as often in earlier kwin versions...
Next I will try to get better valgrind output.
Comment 9 Andreas Cord-Landwehr 2009-03-14 03:16:49 UTC
Oh, I forgot: kwin uses OpenGL by default, and so do I.
Comment 10 Andreas Cord-Landwehr 2009-03-14 03:43:38 UTC
Created attachment 23846 [details]
Valgrind run 2: valgrind output
Comment 11 Andreas Cord-Landwehr 2009-03-14 03:44:13 UTC
Created attachment 23847 [details]
Valgrind run 2: according Xorg.log
Comment 12 Andreas Cord-Landwehr 2009-03-14 03:44:39 UTC
Created attachment 23848 [details]
Valgrind run 2: according .xsession
Comment 13 Arnaud 2009-03-16 09:17:04 UTC
Hello,
I have the same problem with my Radeon 9600 when using compiz with KDE 3.5.10:
- with driver 6.11.0 I had random screen freezes about once a day, each time during the fade-in of a new window or the fade-out of a closed window. The machine was still reachable via ssh, but the Xorg process could not be killed, which required a reboot. I don't remember having had real crashes.
- since I updated to driver 6.12.0, I have had one crash, in the same situation, and the server was restarted by kdm. But there was no backtrace info in /var/log/Xorg.log.old

I will try to provide more information the next time this happens.
Comment 14 Michel Dänzer 2009-03-17 05:05:33 UTC
(In reply to comment #9)
> Oh, I forgot: kwin uses by default OpenGL. Also me.

Does the problem also happen with the XRender backend?
Comment 15 Andreas Cord-Landwehr 2009-03-18 14:32:10 UTC
It seems not to crash when using XRender or with effects switched off. But I can't be absolutely sure, because the crash is hard to reproduce.
If needed, I can do a valgrind run with XRender for comparison.
Comment 16 Michel Dänzer 2009-03-19 01:40:11 UTC
The valgrind error below looks like the most likely culprit at this point. Apparently the Mesa r300 driver is accessing texture object memory after it's been freed. I'm not sure if it's the responsibility of the driver or the Mesa core to prevent this; I'm assuming the driver for now.

==27872== Invalid read of size 4
==27872==    at 0x7A7F4BB: r300UpdateTexture (r300_texstate.c:576)
==27872==    by 0x7A7F5A8: r300UpdateTextureState (r300_texstate.c:648)
==27872==    by 0x7A782B9: r300UpdateShaderStates (r300_state.c:2685)
==27872==    by 0x7A7B6D2: r300RunRender (r300_render.c:314)
==27872==    by 0x7B135F3: _tnl_run_pipeline (t_pipeline.c:158)
==27872==    by 0x7B13B64: _tnl_draw_prims (t_draw.c:402)
==27872==    by 0x7B0C823: vbo_exec_vtx_flush (vbo_exec_draw.c:251)
==27872==    by 0x7B07F27: vbo_exec_FlushVertices (vbo_exec_api.c:751)
==27872==    by 0x7B98460: _mesa_PopAttrib (attrib.c:862)
==27872==    by 0x5922AD0: __glXDisp_PopAttrib (indirect_dispatch.c:1445)
==27872==    by 0x594C77E: __glXDisp_Render (glxcmds.c:1783)
==27872==    by 0x5951079: __glXDispatch (glxext.c:523)
==27872==  Address 0x4ec6e00 is 24 bytes inside a block of size 1,572 free'd
==27872==    at 0x4024E3A: free (vg_replace_malloc.c:323)
==27872==    by 0x7AD01AC: _mesa_free (imports.c:85)
==27872==    by 0x7A60767: driDestroyTextureObject (texmem.c:353)
==27872==    by 0x7A7E26D: r300DeleteTexture (r300_tex.c:1000)
==27872==    by 0x7AF2F00: _mesa_reference_texobj (texobj.c:317)
==27872==    by 0x7AF45A0: _mesa_DeleteTextures (texobj.c:852)
==27872==    by 0x5926692: __glXDisp_DeleteTextures (indirect_dispatch.c:2848)
==27872==    by 0x5951079: __glXDispatch (glxext.c:523)
==27872==    by 0x808C51E: Dispatch (dispatch.c:437)
==27872==    by 0x80716F4: main (main.c:397)
Comment 17 Andreas Cord-Landwehr 2009-04-24 00:41:38 UTC
Some additional information: I recently upgraded libgl1-mesa-dri, libgl1-mesa-glx and libglu1-mesa to version 7.4, but the bug described above still exists.
Comment 18 Michel Dänzer 2009-04-28 01:52:19 UTC
Created attachment 25219 [details] [review]
Increase reference count of texture objects referenced in current hardware state

Does this patch help?
Comment 19 Andreas Cord-Landwehr 2009-05-17 08:09:04 UTC
Hello, during the last 7 days I worked with the patched version and no crash occurred. So I'm pretty sure that the bug is fixed. Over the next few days I will try to confirm this with a valgrind run, but everything seems fine.

Thanks to all,
   Andreas

PS: instead of applying the patch myself, I used the current Debian package, which contains the patch.

