17895 – Radeon XPRESS 200M (RC410) - xorg crash after prolonged exercise of compiz


Summary: Radeon XPRESS 200M (RC410) - xorg crash after prolonged exercise of compiz

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Mesa core
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium major
Assignee:	mesa-dev
QA Contact:

URL:
Whiteboard:
Keywords:

Duplicates (3):	15809 17829 20673
Depends on:
Blocks:

Reported:	2008-10-03 15:52 UTC by Alex Villacís Lasso
Modified:	2009-08-21 09:36 UTC
CC List:	4 users

See Also:
i915 platform:
i915 features:

Attachments
Xorg.0.log after X server crashes (35.06 KB, text/plain) 2008-10-03 15:52 UTC, Alex Villacís Lasso
Another log displaying the same crash (32.88 KB, text/plain) 2008-10-17 11:07 UTC, Alex Villacís Lasso
Full gdb log until crash (18.92 KB, text/plain) 2008-10-27 15:53 UTC, Alex Villacís Lasso
Crash with week-old git tree (32.51 KB, text/plain) 2008-12-04 12:28 UTC, Alex Villacís Lasso
Crash with git tree from December 18, 2008 (58.03 KB, text/plain) 2008-12-22 09:12 UTC, Alex Villacís Lasso
GDB log with more detailed backtrace (9.45 KB, text/plain) 2009-01-06 09:20 UTC, Alex Villacís Lasso
Xorg.0.log after X server crashes (58.03 KB, text/plain) 2009-01-06 09:21 UTC, Alex Villacís Lasso
Debug patch with checks for invalid pointers (1.84 KB, patch) 2009-01-20 15:54 UTC, Alex Villacís Lasso
New log file with debug patch, after crash (57.97 KB, text/plain) 2009-01-20 15:56 UTC, Alex Villacís Lasso
logfile after the crash (33.94 KB, text/plain) 2009-08-19 11:15 UTC, flo

Description Alex Villacís Lasso 2008-10-03 15:52:25 UTC

Created attachment 19361
Xorg.0.log after X server crashes

Linux 2.6.26.3-14.fc8 Fedora 8

All code was checked out with the git_xorg.sh script as of 2008-08-22. The radeon kernel driver was also compiled from current git. See bug #17723 for a description of the setup.

I made sure that at least 64 MB of video memory was available; otherwise I hit bug #17723.

Compiz is enabled and works for a while. After a few hours, including at least one Emerald theme change, the X server crashes with a SIGSEGV.

Backtrace:
0: /usr/bin/X(xf86SigHandler+0x79) [0x80b0ed9]
1: [0x110400]
2: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_texture+0x251) [0x3ecc01]
3: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_state_locked+0x801) [0x3d73e1]
4: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0x3d753a]
5: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x258) [0x4b30f8]
6: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x2529f0]
7: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x245470]
8: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x24412a]
9: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x248966]
10: /usr/bin/X(Dispatch+0x32f) [0x8086eaf]
11: /usr/bin/X(main+0x3da) [0x806c79a]
12: /lib/libc.so.6(__libc_start_main+0xe0) [0x978390]
13: /usr/bin/X [0x806bc11]

Fatal server error:
Caught signal 11.  Server aborting


dmesg shows none of the failure messages seen in bug #17723. Xorg.0.log is attached.

It seems that at least one Emerald theme change is required. However, the theme change itself does not trigger the crash; the crash may occur some time later, or not at all. Right before the crash in the attached log, I was exercising a Compiz plugin and then tried to open a dialog in gedit.

Comment 1 Alex Villacís Lasso 2008-10-17 11:07:39 UTC

Created attachment 19724
Another log displaying the same crash

The crashes are occurring again. Log attached.

Comment 2 Michel Dänzer 2008-10-21 10:09:55 UTC

Can you get a backtrace with gdb?

Comment 3 Alex Villacís Lasso 2008-10-27 15:53:12 UTC

(In reply to comment #2)
> Can you get a backtrace with gdb?
> 

I attached gdb to a running X session and directed its logging to a file. After a crash, I got this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209039088 (LWP 6417)]
0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389, 
    width=136110148, height=187747912) at misprite.c:423
423         SCREEN_PROLOGUE (pScreen, SourceValidate);
Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
Quitting: ptrace: No such process.

Comment 4 Alex Villacís Lasso 2008-10-27 15:53:57 UTC

Created attachment 19890
Full gdb log until crash

Comment 5 Michel Dänzer 2008-10-28 01:10:22 UTC

(In reply to comment #3)
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1209039088 (LWP 6417)]
> 0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389, 
>     width=136110148, height=187747912) at misprite.c:423
> 423         SCREEN_PROLOGUE (pScreen, SourceValidate);
> Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
> Quitting: ptrace: No such process.

Did you explicitly detach after the SIGSEGV, or did that happen automatically? If you get a prompt after the SIGSEGV, enter

bt full

to get a detailed backtrace.

Comment 6 Alex Villacís Lasso 2008-12-04 12:28:56 UTC

Created attachment 20818
Crash with week-old git tree

This is a crash from a week-old git compile.

I should add that almost every time I have seen the crash, Emerald quits first, without leaving a backtrace, and has to be restarted.

Comment 7 Alex Villacís Lasso 2008-12-22 09:12:57 UTC

Created attachment 21406
Crash with git tree from December 18, 2008

Comment 8 Alex Villacís Lasso 2009-01-06 09:20:39 UTC

Created attachment 21726
GDB log with more detailed backtrace

This time I managed to attach to the X server via ssh before the crash and log a full backtrace. Backtrace attached.

Comment 9 Alex Villacís Lasso 2009-01-06 09:21:38 UTC

Created attachment 21727
Xorg.0.log after X server crashes

Log file from the crash captured under GDB with the detailed backtrace. Otherwise, it is identical to the previous log.

Comment 10 Alex Villacís Lasso 2009-01-20 15:54:31 UTC

Created attachment 22115
Debug patch with checks for invalid pointers

I wrote this patch to check whether it fixes the bug, or at least prints any messages. Now I am getting a different crash.

Comment 11 Alex Villacís Lasso 2009-01-20 15:56:39 UTC

Created attachment 22116
New log file with debug patch, after crash

This is the log file after the crash, with the debug patch applied.

Is there anything else I should be doing? The previous patch was based on an educated guess about where the first crash occurred. What do you think of this patch?

Comment 12 Brian Paul 2009-01-21 07:13:21 UTC

So, texUnit->CurrentRect is NULL.  That should never happen (unless the context is being torn down/deleted).  The "Current" texture object pointers should never be null.  They should either point to the texture that the user bound with glBindTexture() or should point to the default texture objects in the ctx->Shared state.

I'm afraid the patch is just hiding the real issue elsewhere.

This may be a reference counting bug somewhere.  I could add some assertions to try to narrow it down.  I'll check them into git ASAP.  I probably won't hold up the 7.3 release though unless we can make progress on this today.
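To make the invariant described in comment 12 concrete, here is a minimal C sketch. It uses a hypothetical, heavily simplified model: `struct tex_obj`, `struct tex_unit`, `tex_obj_reference()` and `bind_rect_texture()` are illustrative names, not Mesa's actual structures or functions. Binding "no texture" falls back to a shared default object instead of storing NULL, and every pointer that holds an object owns one reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified texture object; not Mesa's struct gl_texture_object. */
struct tex_obj {
    unsigned name;   /* 0 denotes the shared default texture */
    int ref_count;   /* number of pointers currently holding this object */
};

/* Hypothetical texture unit with a single rectangle-texture binding. */
struct tex_unit {
    struct tex_obj *current_rect;  /* invariant: never NULL while the context is alive */
};

static struct tex_obj *tex_obj_new(unsigned name)
{
    struct tex_obj *t = calloc(1, sizeof *t);
    if (t == NULL)
        abort();
    t->name = name;
    t->ref_count = 1;  /* the creator's reference */
    return t;
}

/* Make *slot point at obj: drop one reference on the old object (freeing it
 * when the count reaches zero) and take one reference on the new object. */
static void tex_obj_reference(struct tex_obj **slot, struct tex_obj *obj)
{
    if (*slot == obj)
        return;
    if (*slot != NULL && --(*slot)->ref_count == 0)
        free(*slot);
    *slot = obj;
    if (obj != NULL)
        obj->ref_count++;
}

/* Binding "no texture" (obj == NULL) falls back to the default object
 * instead of leaving the unit's pointer NULL. */
static void bind_rect_texture(struct tex_unit *unit, struct tex_obj *obj,
                              struct tex_obj *default_rect)
{
    tex_obj_reference(&unit->current_rect, obj != NULL ? obj : default_rect);
    assert(unit->current_rect != NULL);
}
```

In this model, one unbalanced decrement elsewhere would free an object while a texture unit still points at it. Later reads through that dangling pointer could return garbage, including NULL, so a NULL `current_rect` would be a symptom of the bug rather than its cause.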

Comment 13 Michel Dänzer 2009-04-28 00:43:19 UTC

*** Bug 17829 has been marked as a duplicate of this bug. ***

Comment 14 Michel Dänzer 2009-04-28 00:45:39 UTC

*** Bug 15809 has been marked as a duplicate of this bug. ***

Comment 15 Michel Dänzer 2009-04-28 00:46:17 UTC

*** Bug 20673 has been marked as a duplicate of this bug. ***

Comment 16 Michel Dänzer 2009-04-28 00:49:55 UTC

(In reply to comment #12)
> This may be a reference counting bug somewhere.  I could add some assertions to
> try to narrow it down.  I'll check them into git ASAP.

Are those assertions in now? I suppose distribution binaries will usually be built with assertions disabled though... Maybe somebody could try catching the problem with a gdb watchpoint or something like that.

Comment 17 martin 2009-04-28 01:23:33 UTC

Really easy repro steps for this bug are:

1. Boot the Ubuntu Jaunty final release on one of the affected systems.
2. Run "sudo apt-get install compizconfig-settings-manager".
3. Launch the settings manager from System > Preferences and activate the Ring Switcher plugin.
4. Hold down SUPER+TAB so that the ring spins around at full speed.
5. Xorg segfaults after 2-3 seconds at most.

I have triggered the bug using these steps on the following cards:

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV380 [Radeon X600 (PCIE)] [1002:5b62]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150]

PS. I'd be willing to try patches with assertions or whatever and send back results.

Comment 18 Jaro Pitkävirta 2009-04-28 11:16:48 UTC

I'm also experiencing this bug.

My card is: 

02:00.0 VGA compatible controller [0300]: ATI Technologies Inc R430 [Radeon X800 (PCIE)] [1002:554f]

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb7f22400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xa53a6152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xa53a628a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xa54780c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb78a1132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb78932e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb78921a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb7896d6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7af5775]
13: /usr/X11R6/bin/X [0x80717a1]
Saw signal 11.  Server aborting.

Comment 19 Michel Dänzer 2009-04-29 08:29:31 UTC

It's a long shot, but can someone try the Mesa r300 driver patch attached to bug 20539 to see if it helps for this? It should fix a case of using memory after free, which could theoretically cause all sorts of funny behaviour...

Comment 20 Xavier Bestel 2009-04-29 08:39:05 UTC

Just out of curiosity, do you have any idea when this bug was introduced?

Comment 21 martin 2009-04-29 15:54:49 UTC

I've tried the patch suggested by Michel Dänzer in comment #19.

For me it sort of worked, actually, but I accidentally ran "sudo dpkg -i *.deb" on Mesa, which borked my system a bit. Afterwards I think I managed to restore my system (at least I could reproduce the bug again, and then I installed only the patched dri and glx packages). One strange thing remains on my system after this hiccup, though: "glxinfo | grep direct" now says "no" (if anyone knows how to fix this, please tell me).

Also, I uploaded my x86/x64 debs of Ubuntu Jaunty's Mesa 7.4, with Michel Dänzer's patch applied, to this location (if anyone else is using Ubuntu Jaunty, maybe you can use them for testing as well):
http://temp.minimum.se/mesa_with_fixed_ati_bug/

Comment 22 Michel Dänzer 2009-04-30 01:29:34 UTC

(In reply to comment #21)
> For me it sort of worked actually [...]

What does this mean exactly? The steps from comment #17 no longer cause a crash?


> One strange thing remains on my system after this hiccup though and that is
> that "glxinfo | grep direct" now says "no" (if anyone knows how to fix this
> please tell me).

LIBGL_DEBUG=verbose glxinfo

should give more information.

Comment 23 martin 2009-04-30 02:40:26 UTC

Yes, I can spin the ring switcher full speed for 30 seconds straight without crashes. I also tried lots of other things like spinning the cube fast and what not and with this fix I was unable to crash. The only thing that made me unsure was the "direct rendering: no" because I was afraid I had misconfigured my system in such a way that I was no longer hitting the same execution path in the code (and thus just wasn't seeing the bug anymore).

If I run glxinfo with LIBGL_DEBUG=verbose, this is what it says:
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
Also, if I run "env | grep LIBGL" I can see "LIBGL_ALWAYS_INDIRECT=1", but I have no idea where, or by which file or program, this was set. As far as I know, I have not set it myself.

Actually after googling around I found this bug (marked INVALID):
https://bugs.launchpad.net/ubuntu/+source/desktop-effects/+bug/137388
This bug describes my issue pretty accurately: if I launch a gnome-terminal and run "glxinfo | grep direct", it says "direct rendering: No (LIBGL_ALWAYS_INDIRECT set)". However, in the same session on the same computer, if I press ALT-F2, type "xterm" and then run "glxinfo | grep direct", it says "direct rendering: yes".

Comment 24 martin 2009-04-30 02:44:02 UTC

Actually, if I put a gnome-terminal launcher on the GNOME panel and launch it from there, "glxinfo | grep direct" displays "yes". However, if I launch gnome-terminal using my custom keybinding CTRL-ALT-A, "glxinfo | grep direct" prints "No (LIBGL_ALWAYS_INDIRECT set)". I understand why this happens now. I configured that keybinding in gconf under /apps/metacity/global_keybindings, and I think compiz reuses the keybindings from metacity's gconf settings. So compiz is the parent process of my gnome-terminal when I launch it with CTRL-ALT-A, and compiz sets LIBGL_ALWAYS_INDIRECT for its own process when starting up.

God, that made me really confused for a while.
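The mechanism in comment 24, a parent process passing its environment to the programs it launches, can be sketched in C with POSIX `fork()` and `getenv()`. This is a minimal illustration that assumes, as comment 24 suggests, that compiz has LIBGL_ALWAYS_INDIRECT in its own environment before it spawns children; the function name `child_sees_indirect()` is hypothetical, and nothing here is compiz code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and report whether it inherited LIBGL_ALWAYS_INDIRECT.
 * Returns 1 if the child sees the variable, 0 if not, -1 on error. */
static int child_sees_indirect(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: its environment is a copy of the parent's at fork time. */
        _exit(getenv("LIBGL_ALWAYS_INDIRECT") != NULL ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A program started with `execvp()` after the fork would see the variable too, since the exec family (other than `execve()` and friends with an explicit environment) passes the caller's environment to the new program. That is why a terminal launched by compiz reports indirect rendering, while one launched by the GNOME panel does not.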

Comment 25 martin 2009-04-30 03:54:55 UTC

@Dänzer, I've asked a bunch of other Ubuntu users to try my DEBs as well, to see whether they fix the bug for them. See LP bug 368049:
https://bugs.launchpad.net/xserver-xorg-driver-ati/+bug/368049

So far, at least one other Ubuntu user (who has a "Radeon X700 (PCIE)" with the RV410 chipset) has confirmed that this fixes the bug on that machine.

Comment 26 Michel Dänzer 2009-04-30 05:17:00 UTC

Fix pushed to Git master and mesa_7_4_branch, thanks for testing.

Comment 27 Xavier Bestel 2009-05-01 01:52:40 UTC

I tried your debs on my debian/sid system. They apparently fix a problem where constantly resizing a gnome-terminal could hang the server, but they don't fix server hangs when running OpenGL apps, like the Carousel screensaver (see bug #9252).

Comment 28 flo 2009-08-19 11:14:19 UTC

I'm experiencing a similar problem. However, it's not necessary to switch the Emerald theme; instead, I have to switch between window managers or at least reload them (Compiz has to be involved: either switching from Compiz, switching to Compiz, or reloading Compiz). I am not able to reproduce it reliably; it happens randomly.

I have an ATI Radeon Mobility 9800 and I'm using radeon 6.12.2 and Xorg 1.6.3.
I'll attach my X log file.

Maybe the following is helpful, maybe completely irrelevant:
When I don't load the dri2 module, compiz starts, but the screen is completely white and only the mouse pointer is visible. The system stays usable: I can click around and it reacts, but nothing is visible. When I interrupt compiz (Ctrl+C), the white screen disappears and my system is usable again.
When I set the server flag "AIGLX" to "off", compiz doesn't start.

Comment 29 flo 2009-08-19 11:15:49 UTC

Created attachment 28788
logfile after the crash

That's the log of the crash with radeon 6.12.2 and Xorg 1.6.3.

Comment 30 Michel Dänzer 2009-08-21 09:36:14 UTC

(In reply to comment #28)
> i'm experiencing a similar problem. however it's not necessary to switch the
> emerald-theme,

Please always file a new bug unless it's 100% certain it's the same one (which is unlikely here given the above). It's easier to mark separate reports as duplicates than to untangle information about several issues in a single report.

> but i have to switch between the window managers or at least reload them
> (compiz has to be involved: either switch from compiz, switch to compiz or
> reload compiz).

That sounds like an X server issue which has been fixed in http://cgit.freedesktop.org/xorg/xserver/commit/?id=2075d4bf9e53b8baef0b919da6c44771220cd4a5 and http://cgit.freedesktop.org/xorg/xserver/commit/?id=3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 .
