Bug 27380

Summary: X segfault in miCopyRegion
Product: xorg
Component: Server/Acceleration/EXA
Version: 7.4 (2008.09)
Hardware: x86 (IA32)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: major
Priority: high
Keywords: regression
Reporter: Bryce Harrington <bryce>
Assignee: Xorg Project Team <xorg-team>
QA Contact: Xorg Project Team <xorg-team>
CC: gary.pajer, jerrylamos, konstantin, wenzhuo
Attachments (description, flags):
 34552  Xorg.0.log        none
 34553  dmesg.txt         none
 34554  XorgLogOld.txt    none
 34555  XorgLog.txt       none
 34556  dmesg             none
 34557  dmesg #2          none
 34558  CurrentDmesg.txt  none
 34559  XorgLogOld.txt    none
 35173  valgrind report   none
 35200  Backported fix    none

Description Bryce Harrington 2010-03-30 14:25:57 UTC
Forwarding this bug from Ubuntu reporter jerrylamos:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/539772

[Problem]
Several users are reporting an X crash with a backtrace like this one.  It seems to be related to Firefox usage: this reporter hit it while using AOL mail, and another sees it when slowly typing 5-6 characters in the Firefox search bar.

[Original Description]
Was entering data on AOL mail when Lucid crashed and login screen came up.

Jerry


Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x80e880b]
1: /usr/bin/X (0x8048000+0x61aed) [0x80a9aed]
2: (vdso) (__kernel_rt_sigreturn+0x0) [0x1f8410]
3: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x50e000+0xb20f9) [0x5c00f9]
4: /usr/lib/xorg/modules/libexa.so (0xfe4000+0x9490) [0xfed490]
5: /usr/lib/xorg/modules/libexa.so (0xfe4000+0x9558) [0xfed558]
6: /usr/bin/X (miCopyRegion+0x21b) [0x819ac9b]
7: /usr/bin/X (miDoCopy+0x44d) [0x819b1bd]
8: /usr/lib/xorg/modules/libexa.so (0xfe4000+0x7a1a) [0xfeba1a]
9: /usr/bin/X (0x8048000+0xd9e83) [0x8121e83]
10: /usr/bin/X (0x8048000+0x28dd5) [0x8070dd5]
11: /usr/bin/X (0x8048000+0x2a457) [0x8072457]
12: /usr/bin/X (0x8048000+0x1ed3a) [0x8066d3a]
13: /lib/tls/i686/cmov/libc.so.6 (__libc_start_main+0xe6) [0x280bd6]
14: /usr/bin/X (0x8048000+0x1e921) [0x8066921]
Segmentation fault at address 0x6c4



Architecture: i386
Date: Tue Mar 16 14:40:06 2010
DistroRelease: Ubuntu 10.04
DkmsStatus: Error: [Errno 2] No such file or directory
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha i386 (20091214)
Lsusb:
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 002: ID 0a81:0205 Chesen Electronics Corp. PS/2 Keyboard+Mouse Adapter
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: IBM 23736U0
Package: xorg 1:7.5+3ubuntu1
PccardctlIdent:
 Socket 0:
   no product info available
 Socket 1:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
 Socket 1:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-16-generic root=UUID=19c58970-42dc-43e0-bf4f-11e52dbea1d8 ro quiet splash
ProcEnviron:
 LANG=en_US.utf8
 ProcVersionSignature: Ubuntu 2.6.32-16.25-generic
SourcePackage: xorg
Uname: Linux 2.6.32-16-generic i686
dmi.bios.date: 10/13/2005
dmi.bios.vendor: IBM
dmi.bios.version: 1RETDNWW (3.19 )
dmi.board.name: 23736U0
dmi.board.vendor: IBM
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: IBM
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnIBM:bvr1RETDNWW(3.19):bd10/13/2005:svnIBM:pn23736U0:pvrThinkPadT40:rvnIBM:rn23736U0:rvrNotAvailable:cvnIBM:ct10:cvrNotAvailable:
dmi.product.name: 23736U0
dmi.product.version: ThinkPad T40
dmi.sys.vendor: IBM
glxinfo: Error: [Errno 2] No such file or directory
system:
 codename:           lucid
 architecture:       i686
 kernel:             2.6.32-16-generic
Comment 1 Bryce Harrington 2010-03-30 14:29:37 UTC
00:00.0 Host bridge [0600]: Intel Corporation 82855PM Processor to I/O Controller [8086:3340] (rev 03)
	Subsystem: IBM Device [1014:0529]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500] [1002:4c57]
	Subsystem: IBM Device [1014:0530]
Comment 2 Bryce Harrington 2010-03-30 14:35:52 UTC
Created attachment 34552 [details]
Xorg.0.log
Comment 3 Bryce Harrington 2010-03-30 14:36:07 UTC
Created attachment 34553 [details]
dmesg.txt
Comment 4 Bryce Harrington 2010-03-30 14:36:52 UTC
Created attachment 34554 [details]
XorgLogOld.txt
Comment 5 Bryce Harrington 2010-03-30 14:37:08 UTC
Created attachment 34555 [details]
XorgLog.txt
Comment 6 Bryce Harrington 2010-03-30 14:37:23 UTC
Created attachment 34556 [details]
dmesg
Comment 7 Bryce Harrington 2010-03-30 14:38:12 UTC
Created attachment 34557 [details]
dmesg #2
Comment 8 Bryce Harrington 2010-03-30 14:38:34 UTC
Created attachment 34558 [details]
CurrentDmesg.txt
Comment 9 Bryce Harrington 2010-03-30 14:39:05 UTC
Created attachment 34559 [details]
XorgLogOld.txt
Comment 10 Michel Dänzer 2010-03-31 01:18:59 UTC
A gdb backtrace with debugging symbols for the radeon driver (and preferably the X server as well) would be helpful.
Comment 11 Bryce Harrington 2010-04-16 15:32:57 UTC
The reporters say this is all they can get on this bug:

.
Thread 1 (Thread 7382):
#0  0x002b5fb6 in ?? () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#1  0x00000000 in ?? ()
No symbol table info available.
Comment 12 Bryce Harrington 2010-04-16 15:34:10 UTC
There's a coredump on the downstream bug 561433 if you're interested.
Comment 13 Pauli 2010-04-16 21:17:35 UTC
(In reply to comment #11)
> The reporters say this is all they can get on this bug:
> 
> .
> Thread 1 (Thread 7382):
> #0  0x002b5fb6 in ?? () from /lib/tls/i686/cmov/libc.so.6
> No symbol table info available.
> #1  0x00000000 in ?? ()
> No symbol table info available.

It doesn't look like this is caused by the same bug. This looks like a stack overflow.

In any case, valgrind is the best tool for debugging stack overflows. Running the X server under valgrind and collecting the output in a file would be the best way to gather information.

something like:
valgrind X <X params> > /tmp/valgrind.crash.report.txt
Comment 14 Pauli 2010-04-16 22:55:48 UTC
> something like:
> valgrind X <X params> &> /tmp/valgrind.crash.report.txt

The & is an important character that was missing from the original command.
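
The difference between the two commands above is where stderr ends up. Valgrind writes its report to stderr by default, so a plain `>` leaves the report out of the log file. Below is a minimal sketch of that behavior. The function `report_on_stderr` is a hypothetical stand-in for valgrind, not part of any real tool, and the temporary file names are illustrative only.

```shell
# Hypothetical stand-in for a tool that, like valgrind, prints its
# report on stderr and only ordinary program output on stdout.
report_on_stderr() {
    echo "normal output"
    echo "diagnostic report" >&2
}

out_only=$(mktemp)
both=$(mktemp)

# '>' redirects stdout only. The diagnostic line would normally still go
# to the terminal; it is discarded here just to keep the demo quiet.
report_on_stderr > "$out_only" 2>/dev/null

# '> file 2>&1' captures stdout and stderr in the same file.
# In bash, '&> file' is shorthand for exactly this form.
report_on_stderr > "$both" 2>&1

echo "with '>' only:";  cat "$out_only"
echo "with both streams:"; cat "$both"
```

The portable `> file 2>&1` spelling is used in the sketch because `&>` is a bash extension; under a plain POSIX `sh` the `&` would be read as "run in the background". Valgrind also accepts a `--log-file=<file>` option, which avoids relying on shell redirection altogether.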
Comment 15 Wenzhuo Zhang 2010-04-19 17:59:50 UTC
Created attachment 35173 [details]
valgrind report

I am one of the downstream bug reporters. I just collected a valgrind report using the following command:

sudo ls && sudo valgrind Xorg :1.0 &> /tmp/valgrind-X.log & sleep 15; export DISPLAY=:1.0; /etc/X11/Xsession
Comment 16 Michel Dänzer 2010-04-20 02:26:45 UTC
Looks like it might be related to bug 27510. Has Ubuntu backported EXA changes from 1.8 to 1.7? If not, I can help with backporting the fix from that bug for testing.
Comment 17 Bryce Harrington 2010-04-20 10:05:02 UTC
Ubuntu has not backported anything significant for EXA from 1.8 so far, so a backport of what fixes you think should be included would be helpful for this bug.
Comment 18 Michel Dänzer 2010-04-21 01:58:01 UTC
Created attachment 35200 [details] [review]
Backported fix

Does this fix the problem?
Comment 19 Wenzhuo Zhang 2010-04-21 02:42:33 UTC
Anxiously awaiting test packages from Bryce...
Comment 20 Bryce Harrington 2010-04-21 18:18:29 UTC
> Anxiously awaiting test packages from Bryce...

Here you go:
  https://edge.launchpad.net/~bryceharrington/+archive/purple/+packages
Comment 21 Wenzhuo Zhang 2010-04-22 05:03:49 UTC
I've been running the updated X server without crash for about 5 hours already. The problem can no longer be reproduced using my method, i.e. typing in the search bar of Firefox. Fantastic!
Comment 22 Michel Dänzer 2010-04-22 05:24:47 UTC
Fix pushed to server-1.7-nominations Git branch, thanks for testing.
Comment 23 Wenzhuo Zhang 2010-04-22 06:13:10 UTC
I just noticed that when the updated X server is running, the Linux kernel issues a lot of error messages like these:

[ 4955.429783] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0001FEE8)
[ 4955.621867] [drm:radeon_fence_wait] *ERROR* fence(f69361a0:0x0001FEE9) 40ms timeout
[ 4955.621880] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x0001FEE9)
[ 4955.644982] [drm:radeon_fence_wait] *ERROR* fence(f1980c60:0x0001FEEA) 52ms timeout
Comment 24 Michel Dänzer 2010-04-22 06:15:55 UTC
(In reply to comment #23)
> I just noticed that when the updated X server is running, the Linux kernel
> issues a lot of error messages like these:

And those only appear with the patch? Is there any noticeable negative effect other than the messages?
Comment 25 Wenzhuo Zhang 2010-04-22 06:33:21 UTC
I just checked the timestamps of the error messages. They were generated while the random screensaver was running (at dinner time); I had purposely enabled the random screensaver to give the updated X server more testing. I don't know whether the Linux kernel also issues these error messages with the previous version of the X server. It's probably not a side effect of the fix.
Comment 26 Wenzhuo Zhang 2010-04-22 22:04:39 UTC
(In reply to comment #24)

> And those only appear with the patch? Is there any noticeable negative effect
> other than the messages?

I downgraded to the previous version and confirmed that the Linux kernel also issues these error messages when the random Gnome-screensaver is running on the previous version. I am not sure which specific screensaver is causing this. In any case, it's a separate problem from this issue.

No noticeable negative side effect. Thanks!
Comment 27 Michel Dänzer 2010-04-29 02:17:57 UTC
*** Bug 27869 has been marked as a duplicate of this bug. ***
Comment 28 Julien Cristau 2010-05-26 10:16:52 UTC
*** Bug 28262 has been marked as a duplicate of this bug. ***
