Created attachment 19361 [details]
Xorg.0.log after X server crashes

Linux 2.6.26.3-14.fc8
Fedora 8

All code was checked out with the git_xorg.sh script as of 2008-08-22. The radeon kernel driver was compiled from current git too.

See bug #17723 for a description of the setup. I ensured that at least 64 MB of video memory is available; otherwise I hit bug #17723.

Compiz is enabled, and it works for a while. After a few hours, including at least one change of emerald theme, the X server crashes with a SIGSEGV.

Backtrace:
0: /usr/bin/X(xf86SigHandler+0x79) [0x80b0ed9]
1: [0x110400]
2: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_texture+0x251) [0x3ecc01]
3: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_state_locked+0x801) [0x3d73e1]
4: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0x3d753a]
5: /home/alex/xserver/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x258) [0x4b30f8]
6: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x2529f0]
7: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x245470]
8: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x24412a]
9: /home/alex/xserver/lib/xorg/modules/extensions//libglx.so [0x248966]
10: /usr/bin/X(Dispatch+0x32f) [0x8086eaf]
11: /usr/bin/X(main+0x3da) [0x806c79a]
12: /lib/libc.so.6(__libc_start_main+0xe0) [0x978390]
13: /usr/bin/X [0x806bc11]

Fatal server error:
Caught signal 11. Server aborting

dmesg shows no sign of the failure message described in bug #17723. Xorg.0.log attached.

It seems at least one change of emerald theme is required. However, the emerald theme change itself does not trigger the crash. It might occur some time later, or not at all. Right before the crash in this (attached) log, I was exercising a compiz plugin, and then tried to open a dialog in gedit.
Created attachment 19724 [details] Another log displaying the same crash The crashes are occurring again. Log attached.
Can you get a backtrace with gdb?
(In reply to comment #2)
> Can you get a backtrace with gdb?

I tried attaching gdb to a running X session and directing logging to a file. After a crash, I got this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209039088 (LWP 6417)]
0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389, width=136110148, height=187747912) at misprite.c:423
423     SCREEN_PROLOGUE (pScreen, SourceValidate);
Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
Quitting: ptrace: No such process.
Created attachment 19890 [details] Full gdb log until crash
(In reply to comment #3)
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1209039088 (LWP 6417)]
> 0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389,
> width=136110148, height=187747912) at misprite.c:423
> 423 SCREEN_PROLOGUE (pScreen, SourceValidate);
> Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
> Quitting: ptrace: No such process.

Did you explicitly detach after the SIGSEGV, or did that happen automatically? If you get a prompt after the SIGSEGV, enter "bt full" to get a detailed backtrace.
Created attachment 20818 [details] Crash with week-old git tree This is a crash from a week-old git compile. I must add that almost all the times I have seen the crash, emerald quits first, without leaving any backtrace, and has to be restarted.
Created attachment 21406 [details] Crash with git tree from December 18, 2008
Created attachment 21726 [details] GDB log with more detailed backtrace This time I managed to attach to the X server via ssh before the crash and log a full backtrace. Backtrace attached.
Created attachment 21727 [details] Xorg.0.log after X server crashes Log file from the crash with GDB and detailed trace. Apart from this, it is identical to the previous log.
Created attachment 22115 [details] [review] Debug patch with checks for invalid pointers I tried making this patch to check whether this fixes the bug, or at least shows any messages. Now I am getting a different crash.
Created attachment 22116 [details] New log file with debug patch, after crash This is the log file after the crash, with the debug patch applied. Is there anything else I should be doing? The previous patch was made from an educated guess at where the first crash was. What do you think of this patch?
So, texUnit->CurrentRect is NULL. That should never happen (unless the context is being torn down/deleted). The "Current" texture object pointers should never be null. They should either point to the texture that the user bound with glBindTexture() or should point to the default texture objects in the ctx->Shared state. I'm afraid the patch is just hiding the real issue elsewhere. This may be a reference counting bug somewhere. I could add some assertions to try to narrow it down. I'll check them into git ASAP. I probably won't hold up the 7.3 release though unless we can make progress on this today.
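The invariant described in this comment can be illustrated with a small C sketch. This is not Mesa's code: the types and functions (tex_obj, tex_unit, tex_reference, bind_texture, delete_texture) are hypothetical stand-ins, and the sketch only assumes that a unit's "current" pointer holds a counted reference and falls back to a default object that is never freed. The assertions mark the places where a NULL "current" pointer would be caught.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a texture object and a texture unit. */
struct tex_obj {
    int ref_count;
    int is_default;   /* default objects belong to shared state, never freed here */
};

struct tex_unit {
    struct tex_obj *current_rect;   /* must never be NULL while the context lives */
};

/* Allocate a user texture holding one reference (the creator's). */
static struct tex_obj *new_texture(void)
{
    struct tex_obj *obj = calloc(1, sizeof(*obj));
    if (obj)
        obj->ref_count = 1;
    return obj;
}

/* Make *slot point at obj, taking a reference on obj and dropping the old one. */
static void tex_reference(struct tex_obj **slot, struct tex_obj *obj)
{
    struct tex_obj *old = *slot;
    if (old == obj)
        return;
    if (obj)
        obj->ref_count++;
    *slot = obj;
    if (old && --old->ref_count == 0 && !old->is_default)
        free(old);
}

/* Bind obj to the unit; binding NULL selects the default object instead. */
static void bind_texture(struct tex_unit *unit, struct tex_obj *obj,
                         struct tex_obj *fallback)
{
    tex_reference(&unit->current_rect, obj ? obj : fallback);
    assert(unit->current_rect != NULL);
}

/* Delete a user texture: rebind the default first, then drop the creator's ref. */
static void delete_texture(struct tex_unit *unit, struct tex_obj *obj,
                           struct tex_obj *fallback)
{
    if (unit->current_rect == obj)
        tex_reference(&unit->current_rect, fallback);
    struct tex_obj *user_ref = obj;
    tex_reference(&user_ref, NULL);
    assert(unit->current_rect != NULL);
}
```

In this sketch, a missing reference on bind or an extra release on delete would either leave current_rect dangling or trip one of the assertions, which is the kind of reference-counting bug suspected above.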
*** Bug 17829 has been marked as a duplicate of this bug. ***
*** Bug 15809 has been marked as a duplicate of this bug. ***
*** Bug 20673 has been marked as a duplicate of this bug. ***
(In reply to comment #12)
> This may be a reference counting bug somewhere. I could add some assertions to
> try to narrow it down. I'll check them into git ASAP.

Are those assertions in now? I suppose distribution binaries will usually be built with assertions disabled though... Maybe somebody could try catching the problem with a gdb watchpoint or something like that.
Really easy repro steps for this bug are:

1. boot the final release of Ubuntu jaunty on one of the affected systems
2. run "sudo apt-get install compizconfig-settings-manager"
3. launch the settings manager from system::preferences and activate the ring switcher plugin
4. hold down SUPER+TAB so that the ring spins around at full speed
5. xorg gets a SEGV after 2-3 seconds at most

I have triggered the bug using these steps on the following cards:

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV380 [Radeon X600 (PCIE)] [1002:5b62]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150]

PS. I'd be willing to try patches with assertions or whatever and send back results.
I'm also experiencing this bug. My card is:

02:00.0 VGA compatible controller [0300]: ATI Technologies Inc R430 [Radeon X800 (PCIE)] [1002:554f]

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb7f22400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xa53a6152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xa53a628a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xa54780c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb78a1132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb78932e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb78921a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb7896d6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7af5775]
13: /usr/X11R6/bin/X [0x80717a1]

Saw signal 11. Server aborting.
It's a long shot, but can someone try the Mesa r300 driver patch attached to bug 20539 to see if it helps for this? It should fix a case of using memory after free, which could theoretically cause all sorts of funny behaviour...
Just out of curiosity, do you have an idea when this bug was introduced ?
I've tried the patch suggested by Michel Dänzer in comment #19. For me it sort of worked, actually, but I accidentally ran "sudo dpkg -i *.deb" on mesa, which borked my system a bit. Afterwards I think I managed to restore my system (at least I could repro the bug again, and then I installed only the patched dri and glx packages). One strange thing remains on my system after this hiccup, though: "glxinfo | grep direct" now says "no" (if anyone knows how to fix this, please tell me).

Also, I uploaded my x86/x64 debs of Ubuntu jaunty's mesa 7.4 with Michel Dänzer's patch added to this location (if someone else is using Ubuntu jaunty, maybe you can use this for testing as well?):
http://temp.minimum.se/mesa_with_fixed_ati_bug/
(In reply to comment #21)
> For me it sort of worked actually [...]

What does this mean exactly? The steps from comment #17 no longer cause a crash?

> One strange thing remains on my system after this hiccup though and that is
> that "glxinfo | grep direct" now says "no" (if anyone knows how to fix this
> please tell me).

LIBGL_DEBUG=verbose glxinfo

should give more information.
Yes, I can spin the ring switcher at full speed for 30 seconds straight without crashes. I also tried lots of other things, like spinning the cube fast, and with this fix I was unable to crash the server. The only thing that made me unsure was the "direct rendering: no", because I was afraid I had misconfigured my system in such a way that I was no longer hitting the same execution path in the code (and thus just wasn't seeing the bug anymore).

If I run glxinfo with verbose debugging, this is what it says:

direct rendering: No (LIBGL_ALWAYS_INDIRECT set)

Also, if I do "env | grep LIBGL" I can see "LIBGL_ALWAYS_INDIRECT=1", but I have no idea where this was set, or by what file or program. As far as I know I have not set this myself.

Actually, after googling around I found this bug (marked INVALID), which describes my issue pretty accurately:
https://bugs.launchpad.net/ubuntu/+source/desktop-effects/+bug/137388

If I launch a gnome-terminal and then do "glxinfo | grep direct", it says "direct rendering: No (LIBGL_ALWAYS_INDIRECT set)". However, in the same session on the same computer, if I press ALT-F2, type "xterm" and then do "glxinfo | grep direct", it says "direct rendering: yes".
Actually, if I put a gnome-terminal launcher onto the GNOME panel and launch it from there, "glxinfo | grep direct" displays "yes". However, if I launch gnome-terminal using my custom keybinding "CTRL-ALT-A", then "glxinfo | grep direct" prints "No (LIBGL_ALWAYS_INDIRECT set)".

I understand why this happens now. That keybinding is something I configured in gconf under /apps/metacity/global_keybindings, and I think compiz reuses the keybindings read from metacity's gconf settings. So compiz is the parent process of my gnome-terminal when I launch it with CTRL-ALT-A, and of course compiz sets LIBGL_ALWAYS_INDIRECT for its own process when starting up, which the terminal then inherits. God, that made me really confused for a while.
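The inheritance effect described in this comment can be reproduced outside of compiz with a few lines of C. This is only a sketch under stated assumptions: the helper name child_sees_indirect is hypothetical, /bin/sh is assumed to exist, and the shell child merely stands in for a terminal launched by a process that has LIBGL_ALWAYS_INDIRECT set in its own environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork and exec a shell that checks whether
 * LIBGL_ALWAYS_INDIRECT is "1" in the environment it inherited.
 * Returns 1 if the child saw the value, 0 if not, -1 on error.
 */
static int child_sees_indirect(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the environment is a copy of the parent's at fork time. */
        execl("/bin/sh", "sh", "-c",
              "test \"$LIBGL_ALWAYS_INDIRECT\" = 1", (char *)NULL);
        _exit(127);   /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling setenv("LIBGL_ALWAYS_INDIRECT", "1", 1) in the parent before child_sees_indirect() makes the child see the variable, while a parent without it (as with the panel launcher or ALT-F2) starts children that do not.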
@Dänzer, I've asked a bunch of other Ubuntu users if they could try my DEBs as well, to see if that fixes the bug for them. See LP bug 368049:
https://bugs.launchpad.net/xserver-xorg-driver-ati/+bug/368049

So far, at least one other Ubuntu user (who has a "Radeon X700 (PCIE)" with RV410 chipset) has confirmed that this fixes the bug on that machine.
Fix pushed to Git master and mesa_7_4_branch, thanks for testing.
I tried your debs on my debian/sid system. They apparently fix a problem where constantly resizing a gnome-terminal could hang the server, but they don't fix server hangs when running OpenGL apps, like the Carousel screensaver (see bug #9252).
I'm experiencing a similar problem. However, it's not necessary to switch the emerald theme, but I have to switch between the window managers or at least reload them (compiz has to be involved: either switch from compiz, switch to compiz or reload compiz). I am not able to reproduce it; it happens randomly.

I have an ATI Radeon Mobility 9800 and I'm using radeon V6.12.2 and xorg 1.6.3. I'll attach my X logfile.

Maybe the following is helpful, maybe completely irrelevant:
When I don't load the dri2 module, compiz starts, but the screen is completely white; only the mouse pointer is visible. The system stays usable: I can click around and it reacts, but invisibly. When I interrupt compiz again (ctrl+C), the white disappears and my system is normally usable again.
When I set the server flag "AIGLX" to "off", compiz doesn't start.
Created attachment 28788 [details] logfile after the crash that's the log about the crash with radeon 6.12.2 and xorg 1.6.3
(In reply to comment #28)
> i'm experiencing a similar problem. however it's not necessary to switch the
> emerald-theme,

Please always file a new bug unless it's 100% certain it's the same one (which is unlikely here given the above). It's easier to mark separate reports as duplicates than to untangle information about several issues in a single report.

> but i have to switch between the window managers or at least reload them
> (compiz has to be involved: either switch from compiz, switch to compiz or
> reload compiz).

That sounds like an X server issue which has been fixed in
http://cgit.freedesktop.org/xorg/xserver/commit/?id=2075d4bf9e53b8baef0b919da6c44771220cd4a5
and
http://cgit.freedesktop.org/xorg/xserver/commit/?id=3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 .