Bug 32803

Summary: *** glibc detected *** /usr/bin/Xorg: free(): invalid pointer: 0x0000000000d48bf0 ***
Product: xorg
Reporter: Jeff Mahoney <jeffm>
Component: Server/Acceleration/EXA
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Attachments:
  Xorg.0.log for config info (flags: none)
  Crash captured in gdm log on radeon (flags: none)
  Log from Xorg started with valgrind (flags: none)
  GDB log during crash (flags: none)
  Pad size of system memory copy for 1x1 pixmaps (flags: none)

Description Jeff Mahoney 2011-01-03 13:02:16 UTC
Created attachment 41600
Xorg.0.log for config info

I'm running openSUSE Factory with Xorg 7.6_1.9.3-99.1.x86_64 from the X11:Xorg repository. The kernel is the Kernel-of-the-Day kernel, version 2.6.37-rc8-desktop (openSUSE master git revision 2948ffa9).

I've run into these crash-during-free messages on both nouveau and radeon. I reported some of the crashes in Novell bug 652523[1], which was opened here as bug 32453. That report was closed, but I suspect that, even though we were both hitting glibc malloc-check abort()s, the root causes are different (mine still occur).

The gist is that X crashes on me pretty frequently. On nouveau it was *really* reliable when playing video with totem, but on radeon I'm also running into it fairly often when closing large apps (like a runaway firefox).

I found that one way to catch the crashes was to start GDM with the following environment variables, which get passed to Xorg when it starts up. The glibc abort messages then end up in the gdm log.

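export MALLOC_PERTURB_=69    # fill freed memory with byte 69, new allocations with its complement
export MALLOC_CHECK_=3       # on detected heap corruption: print a diagnostic, then abort()
export LIBC_FATAL_STDERR_=1  # send fatal libc messages to stderr instead of /dev/tty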

[1] https://bugzilla.novell.com/show_bug.cgi?id=652523
Comment 1 Jeff Mahoney 2011-01-03 13:03:43 UTC
Created attachment 41601
Crash captured in gdm log on radeon

This is the crash as I experience it on radeon.
Comment 2 Jeff Mahoney 2011-01-03 13:05:33 UTC
Additional logs, including the ones from my nouveau crashes, are still available in the Novell bug. I can upload them here if needed, but the local copies are gone: I stole the disk out of the nouveau machine to put into the radeon machine, and the logs have rotated out.
Comment 3 Michel Dänzer 2011-01-04 02:18:04 UTC
Can you get a full gdb backtrace from the free() failure(s)? Though as you seem to get them all over the place, the problem might be memory corruption, in which case something like running the X server in valgrind might be in order.
Comment 4 Jeff Mahoney 2011-01-04 07:14:07 UTC
Ok, sure. I've set up gdb to capture the full trace. I agree that it's not likely to be helpful. I'll get valgrind set up for the next time it crashes.
Comment 5 Jeff Mahoney 2011-01-04 08:03:06 UTC
Created attachment 41623
Log from Xorg started with valgrind

Here's the log captured from Xorg starting up with valgrind. Unfortunately, it doesn't capture the problem because the screen was actually blank with no pointer. The log does indicate several issues that it caught while starting up:
==1206== Syscall param ioctl(generic) points to unaddressable byte(s)
==1206==    at 0x65DAA47: ioctl (syscall-template.S:82)
==1206==    by 0x514DBE: linuxMapPci (linuxPci.c:310)
==1206==    by 0x515262: xf86MapLegacyIO (linuxPci.c:447)
==1206==    by 0x46C91A: xf86ClaimPciSlot (xf86pciBus.c:275)
==1206==    by 0x46CB5E: xf86PciProbeDev (xf86pciBus.c:566)
==1206==    by 0x518423: xf86CallDriverProbe (xf86Bus.c:86)
==1206==    by 0x518CBE: xf86BusConfig (xf86Bus.c:130)
==1206==    by 0x46B0AB: InitOutput (xf86Init.c:507)
==1206==    by 0x4259AC: main (main.c:209)
==1206==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1206==
==1206== Syscall param ioctl(generic) points to uninitialised byte(s)
==1206==    at 0x65DAA47: ioctl (syscall-template.S:82)
==1206==    by 0x81FD917: drmIoctl (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x82010E4: drmModeGetCrtc (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x8CFAB49: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x8CF7771: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x46B1A1: InitOutput (xf86Init.c:540)
==1206==    by 0x4259AC: main (main.c:209)
==1206==  Address 0x7ff000760 is on thread 1's stack
==1206==    at 0x65DAA47: ioctl (syscall-template.S:82)
==1206==    by 0x81FD917: drmIoctl (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x82017E9: drmModeGetProperty (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x8CFAFD0: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x8CF7771: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x46B1A1: InitOutput (xf86Init.c:540)
==1206==    by 0x4259AC: main (main.c:209)
==1206==  Address 0x7ff000788 is on thread 1's stack
==1206== Syscall param ioctl(generic) points to uninitialised byte(s)
==1206==    at 0x65DAA47: ioctl (syscall-template.S:82)
==1206==    by 0x81FD917: drmIoctl (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x820133B: drmModeGetEncoder (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x8CFAC21: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x8CF7771: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x46B1A1: InitOutput (xf86Init.c:540)
==1206==    by 0x4259AC: main (main.c:209)
==1206==  Address 0x7ff0007b8 is on thread 1's stack

==1206== Syscall param ioctl(generic) points to uninitialised byte(s)
==1206==    at 0x65DAA47: ioctl (syscall-template.S:82)
==1206==    by 0x81FD917: drmIoctl (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x82017E9: drmModeGetProperty (in /usr/lib64/libdrm.so.2.4.0)
==1206==    by 0x8CF9E7F: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x4897E6: xf86ProbeOutputModes (xf86Crtc.c:1614)
==1206==    by 0x48B2C7: xf86InitialConfiguration (xf86Crtc.c:2369)
==1206==    by 0x8CFADD7: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x8CF7771: ??? (in /usr/lib64/xorg/modules/drivers/radeon_drv.so)
==1206==    by 0x46B1A1: InitOutput (xf86Init.c:540)
==1206==    by 0x4259AC: main (main.c:209)
==1206==  Address 0x7ff0003d8 is on thread 1's stack

(there are more in the log)
Comment 6 Jeff Mahoney 2011-01-04 08:12:13 UTC
Created attachment 41624
GDB log during crash
Comment 7 Michel Dänzer 2011-01-04 08:43:11 UTC
(In reply to comment #4)
> Ok, sure. I've set up gdb to capture the full trace. I agree that it's not
> likely to be helpful.

Thanks, indeed. FWIW though it might be interesting to print *pPixmap and *pExaPixmap from the exaDestroyPixmap_mixed frame.


(In reply to comment #5)
> Here's the log captured from Xorg starting up with valgrind. Unfortunately, it
> doesn't capture the problem because the screen was actually blank with no
> pointer.

Beware that that's the X server's default startup behaviour these days... How exactly did you start the server and clients?

> The log does indicate several issues that it caught while starting up:

Valgrind's complaints related to ioctls are usually false positives, because it doesn't know exactly what the ioctls do. So I'm afraid we still don't have a smoking gun.
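Roughly why valgrind reports "points to uninitialised byte(s)" for these calls: for an ioctl it has no specific model for (the "ioctl(generic)" tag in the log), it falls back to the size and direction encoded in the request number and, for data passed to the kernel, treats every byte of the argument struct as read. That includes fields only the kernel writes and any compiler-inserted padding. Zeroing the struct before filling in the inputs is a common way to keep such reports quiet. A minimal sketch; the struct and helper are hypothetical and are not libdrm's actual types:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl argument, loosely shaped like a DRM "get" request:
 * the caller supplies an ID and the kernel fills in the outputs.
 * On x86_64 the compiler inserts padding (e.g. after `id`). */
struct hypothetical_get_req {
    uint32_t id;      /* input */
    uint64_t value;   /* output, written by the kernel */
    uint32_t count;   /* output, written by the kernel */
};

/* Zero the whole struct, padding included, before setting the inputs,
 * so a checker that treats every byte as read sees defined data. */
static void prepare_get_req(struct hypothetical_get_req *req, uint32_t id)
{
    memset(req, 0, sizeof(*req));
    req->id = id;
}
```

If the kernel side ignores the output fields and padding on input, zeroing changes nothing functionally, which is consistent with these reports usually being false positives.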
Comment 8 Jeff Mahoney 2011-01-05 09:28:46 UTC
(gdb) print *pPixmap
$6 = {drawable = {type = 1 '\001', class = 0 '\000', depth = 8 '\b', 
    bitsPerPixel = 8 '\b', id = 0, x = 0, y = 0, width = 1, height = 1, 
    pScreen = 0x81dd10, serialNumber = 177107}, devPrivates = 0xd67e00, 
  refcnt = 1, devKind = 4, devPrivate = {ptr = 0x0, val = 0, uval = 0, 
    fptr = 0}, screen_x = 0, screen_y = 0, usage_hint = 0}
(gdb) print *pExaPixmap
$7 = {area = 0x0, score = 0, use_gpu_copy = 0, sys_ptr = 0xd67c50 "", 
  sys_pitch = 4, fb_ptr = 0x0, fb_pitch = 256, fb_size = 0, accel_blocked = 0, 
  pDamage = 0xd67ea0, validSys = {extents = {x1 = 0, y1 = 0, x2 = 1, y2 = 1}, 
    data = 0x0}, validFB = {extents = {x1 = 0, y1 = 0, x2 = 1, y2 = 1}, 
    data = 0x0}, driverPriv = 0x0}

I also tried running valgrind with --sim-hints=lax-ioctls to get around the ioctl noise, but the crash still didn't show up under valgrind.
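The dump is internally consistent for a 1x1 pixmap at 8 bits per pixel. Assuming rows are padded to whole 32-bit words (the usual fb convention, and consistent with devKind = 4 and sys_pitch = 4 above), the system-memory copy behind sys_ptr is only 4 bytes. A sketch of that arithmetic; padded_pitch and sys_copy_bytes are hypothetical names:

```c
#include <stddef.h>

/* Row pitch in bytes when each row is padded to a whole 32-bit word
 * (an assumption, but it reproduces devKind = sys_pitch = 4). */
static size_t padded_pitch(unsigned width, unsigned bits_per_pixel)
{
    size_t bits = (size_t)width * bits_per_pixel;
    return ((bits + 31) / 32) * 4;
}

/* Bytes in a tightly allocated system-memory copy of the pixmap. */
static size_t sys_copy_bytes(unsigned width, unsigned height,
                             unsigned bits_per_pixel)
{
    return padded_pitch(width, bits_per_pixel) * height;
}
```

So any access of more than 4 bytes through sys_ptr would run past the heap block, which is the kind of corruption glibc then reports at a later free(); the dump alone doesn't say which code path did that.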
Comment 9 Michel Dänzer 2011-01-05 09:48:41 UTC
Created attachment 41674
Pad size of system memory copy for 1x1 pixmaps

Would this patch happen to help?
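The patch itself is only available as attachment 41674. Purely as an illustration of the idea in its title, not the patch's actual code, the allocation for a 1x1 pixmap's system-memory copy can be rounded up to a minimum size so that a small out-of-bounds access lands inside the block instead of corrupting malloc's bookkeeping. HYPOTHETICAL_MIN_1X1_BYTES (32 here) and both helpers are made-up names:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimum; the real patch may pad by a different amount. */
#define HYPOTHETICAL_MIN_1X1_BYTES 32

/* Size to allocate for a pixmap's system-memory copy: normally
 * pitch * height, but padded up for the 1x1 case. */
static size_t padded_sys_size(size_t pitch, unsigned width, unsigned height)
{
    size_t size = pitch * height;
    if (width == 1 && height == 1 && size < HYPOTHETICAL_MIN_1X1_BYTES)
        size = HYPOTHETICAL_MIN_1X1_BYTES;
    return size;
}

/* Allocate the copy; calloc zeroes it, so stray reads see defined data. */
static void *alloc_sys_copy(size_t pitch, unsigned width, unsigned height)
{
    return calloc(1, padded_sys_size(pitch, width, height));
}
```

The trade-off is a few extra bytes per 1x1 pixmap in exchange for a slightly oversized access not clobbering the next heap chunk.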
Comment 10 Jeff Mahoney 2011-01-05 11:46:22 UTC
(In reply to comment #9)
> Created an attachment (id=41674)
> Pad size of system memory copy for 1x1 pixmaps
> 
> Would this patch happen to help?

Yep. That seems to have fixed it for me. I used to be able to trigger the crash pretty reliably by closing firefox with a bunch of tabs open, and that doesn't do it anymore.
Comment 11 Michel Dänzer 2011-01-06 10:47:29 UTC
Fix has landed in Git master and been nominated for the stable branches, thanks for the report and testing.
