33233 – [RADEON:UMS:RS780] radeon computer freezes when moving the mouse

Summary: [RADEON:UMS:RS780] radeon computer freezes when moving the mouse

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Server/General
Version:	7.5 (2009.10)
Hardware:	x86 (IA32) All

Importance:	medium normal
Assignee:	Xorg Project Team
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-01-18 03:08 UTC by Pierre Bailly
Modified:	2011-03-10 09:32 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
Little program to test the bug. (447 bytes, text/x-csrc) 2011-01-18 03:08 UTC, Pierre Bailly	no flags
LSPCI of the first computer (1.99 KB, application/octet-stream) 2011-01-18 03:10 UTC, Pierre Bailly	no flags
LSPCI of the second computer (2.07 KB, application/octet-stream) 2011-01-18 03:10 UTC, Pierre Bailly	no flags
LSPCI of the third computer (2.27 KB, application/octet-stream) 2011-01-18 03:11 UTC, Pierre Bailly	no flags
"perf stat" report when moving the mouse (872 bytes, application/octet-stream) 2011-01-18 03:15 UTC, Pierre Bailly	no flags
"perf stat" report without touching the mouse (828 bytes, application/octet-stream) 2011-01-18 03:16 UTC, Pierre Bailly	no flags
dmesg after bug was reproduced (66.54 KB, text/plain) 2011-02-11 09:47 UTC, Anisse Astier	no flags

Description Pierre Bailly 2011-01-18 03:08:12 UTC

Created attachment 42147
Little program to test the bug.

Hello,
I have a bug on many ATI computers running Debian squeeze with a homemade
2.6.37 kernel. (I have attached the "lspci" output of three of them.)


The bug appears whether I use fglrx or radeon, with or without KMS.
The bug does not appear with the "vesa" driver or on Intel computers.

I think my bug is similar to this one:
https://bugs.freedesktop.org/show_bug.cgi?id=32830

I can reproduce this bug in three different ways:
- Move the mouse and run setxkbmap (no argument is needed).
- Move the mouse, launch OpenOffice with an invalid argument (e.g. "ooffice -lag")
   and wait, still moving the mouse, until OpenOffice has loaded.
- Launch OpenOffice, move the mouse and run a program that uses libXtst
   (see the attachment "xtestfakekey.c").



The symptoms are: the mouse slows down and everything X-related on the
computer freezes (e.g. videos stop). As soon as the mouse stops moving,
everything starts again.

If the mouse keeps moving long enough, X prints the following errors:
	"
Errors from xkbcomp are not fatal to the X server
  [mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x80e7aab]
1: /usr/bin/X (mieqEnqueue+0x1ab) [0x80e73ab]
2: /usr/bin/X (xf86PostMotionEventP+0xd2) [0x80c1882]
3: /usr/bin/X (xf86PostMotionEvent+0x68) [0x80c1a08]
4: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb732c000+0x3508)[0xb732f508]
5: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb732c000+0x5aed)[0xb7331aed]
6: /usr/bin/X (0x8048000+0x6c29f) [0x80b429f]
7: /usr/bin/X (0x8048000+0x11e3c4) [0x81663c4]
8: (vdso) (__kernel_sigreturn+0x0) [0xffffe400]
9: /lib/libpthread.so.0 (fork+0x14) [0xb76f9094]
10: /usr/bin/X (Popen+0x9b) [0x80a9f2b]
11: /usr/bin/X (XkbDDXLoadKeymapByNames+0x1c7) [0x81bb897]
12: /usr/bin/X (0x8048000+0xf1d0c) [0x8139d0c]
13: /usr/bin/X (0x8048000+0x2b027) [0x8073027]
14: /usr/bin/X (0x8048000+0x1e95a) [0x806695a]
15: /lib/libc.so.6 (__libc_start_main+0xe6) [0xb7467c76]
16: /usr/bin/X (0x8048000+0x1e541) [0x8066541]
"

The following message is also printed multiple times:
"
Warning:          Multiple interpretations of "NoSymbol+AnyOf(all)"
                    Using last definition for duplicate fields
"

I tried changing the distribution (Debian squeeze, then Gentoo) and the
window manager (GNOME, then E17), but nothing changed.

I tried newer and older kernel versions.
The bug is a regression introduced between kernel 2.6.32.27 and 2.6.33 and
has not been fixed since.
I tried a git bisect between 2.6.32 and 2.6.33, but another bug between the
two releases prevents the kernel from booting (sometimes with KMS,
sometimes without, sometimes both).

During the bisect, I noticed that the bug appears sooner without KMS.
In kernel 2.6.37, the bug appears in both cases.

Running setxkbmap under strace, the lag appears at the following line:
"poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])"
The fd being polled was created by:
"socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3"

Running setxkbmap under latrace, the lag appears at the following line:
"
20689           xcb_wait_for_reply [/usr/lib/libxcb.so.1]
20689             pthread_mutex_lock [/lib/libc.so.6]
20689             pthread_mutex_unlock [/lib/libc.so.6]
20689             poll [/lib/libc.so.6]                *lag here*
20689             pthread_mutex_lock [/lib/libc.so.6]
20689             read [/lib/libc.so.6]
20689             malloc [/lib/libc.so.6]
20689             memcpy [/lib/libc.so.6]
20689             memmove [/lib/libc.so.6]
20689             read [/lib/libc.so.6]
20689             read [/lib/libc.so.6]
20689             __errno_location [/lib/libc.so.6]
20689             poll [/lib/libc.so.6]
20689             read [/lib/libc.so.6]
20689             malloc [/lib/libc.so.6]
20689             pthread_cond_signal [/lib/libc.so.6]
20689             free [/lib/libc.so.6]
20689             pthread_cond_destroy [/lib/libc.so.6]
20689             pthread_cond_signal [/lib/libc.so.6]
20689             pthread_mutex_unlock [/lib/libc.so.6]
"


After all these tests, I still cannot find a way to avoid this bug.
I have also attached ftrace and perf reports to give you more information.
Ftrace cannot parse hrtimer functions (such as "hrtimer_start"),
and perf shows that "strcmp" uses all of the CPU.



Do you have any ideas?
Regards,

Comment 1 Pierre Bailly 2011-01-18 03:10:23 UTC

Created attachment 42148
LSPCI of the first computer

Comment 2 Pierre Bailly 2011-01-18 03:10:53 UTC

Created attachment 42149
LSPCI of the second computer

Comment 3 Pierre Bailly 2011-01-18 03:11:17 UTC

Created attachment 42150
LSPCI of the third computer

Comment 4 Pierre Bailly 2011-01-18 03:15:30 UTC

Created attachment 42151
"perf stat" report when moving the mouse

Comment 5 Pierre Bailly 2011-01-18 03:16:16 UTC

Created attachment 42152
"perf stat" report without touching the mouse

Comment 6 Pierre Bailly 2011-01-18 03:48:44 UTC

Large files:

"perf record" report when moving the mouse: http://dl.free.fr/iiX4hDed2
"perf record" report without touching the mouse: http://dl.free.fr/bvHAm21ga
"trace record": http://dl.free.fr/v3nXtx5HK

Comment 7 Anisse Astier 2011-02-11 06:56:47 UTC

I'll be taking over for testing procedure, etc.

By "homemade kernel" we just mean that it was tested on a mainline kernel with a homemade .config file.

Comment 8 Anisse Astier 2011-02-11 09:45:43 UTC

I tried the "SWCursor" "true" option: the bug still happens, except that the mouse is frozen instead of laggy (for as long as you keep moving it).

Here is /proc/interrupts after boot (nomodeset mode):

           CPU0       
  0:      15785   IO-APIC-edge      timer
  1:          8   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        172   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 17:        301   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
 42:      10780   PCI-MSI-edge      ahci
 43:        829   PCI-MSI-edge      eth0
NMI:          0   Non-maskable interrupts
LOC:      18550   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0

And after the bug was reproduced:

           CPU0       
  0:      26051   IO-APIC-edge      timer
  1:         12   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:       3280   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 17:        434   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
 42:      12065   PCI-MSI-edge      ahci
 43:       1006   PCI-MSI-edge      eth0
NMI:          0   Non-maskable interrupts
LOC:      34257   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0

Comment 9 Anisse Astier 2011-02-11 09:47:07 UTC

Created attachment 43261
dmesg after bug was reproduced

Here is the dmesg from today's latest Linus tree (HEAD: d2478521afc20227658a10a8c5c2bf1a2aa615b3).

Comment 10 Jerome Glisse 2011-03-08 10:25:24 UTC

We don't support UMS anymore; if it works you are lucky, if it doesn't we don't care. Please test with KMS.

Note that fglrx failing too suggests something is wrong with your hardware; maybe you can test another operating system.

Comment 11 Anisse Astier 2011-03-10 09:32:31 UTC

I can confirm that with today's Linus tree the bug is no longer reproducible using KMS, while it still is using UMS.
We'll switch to KMS then.

Marking as closed.
