Created attachment 42147 [details]
Little program to test the bug.

Hello,

I have a bug on many ATI computers running Debian squeeze with a homemade 2.6.37 kernel. (I am attaching the "lspci" output of three of them.) The bug appears whether I use fglrx or radeon, with or without KMS. It does not appear if I use the "vesa" driver, or on Intel computers.

I think my bug is similar to this one:
https://bugs.freedesktop.org/show_bug.cgi?id=32830

I can reproduce this bug in three different ways:
- Move the mouse and launch setxkbmap (no argument is needed).
- Move the mouse, launch OpenOffice with a bad argument (e.g. "ooffice -lag") and wait (while still moving the mouse) until OpenOffice is loaded.
- Launch OpenOffice, move the mouse and call a program that uses libxtst. (See attachment "xtestfakekey.c".)

The bug can be described as follows: the mouse slows down and everything X-related on the computer freezes (e.g. videos stop). But when you stop moving the mouse, everything starts again.

If we move the mouse long enough, X shows the following errors:
"
Errors from xkbcomp are not fatal to the X server
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x80e7aab]
1: /usr/bin/X (mieqEnqueue+0x1ab) [0x80e73ab]
2: /usr/bin/X (xf86PostMotionEventP+0xd2) [0x80c1882]
3: /usr/bin/X (xf86PostMotionEvent+0x68) [0x80c1a08]
4: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb732c000+0x3508)[0xb732f508]
5: /usr/lib/xorg/modules/input/synaptics_drv.so (0xb732c000+0x5aed)[0xb7331aed]
6: /usr/bin/X (0x8048000+0x6c29f) [0x80b429f]
7: /usr/bin/X (0x8048000+0x11e3c4) [0x81663c4]
8: (vdso) (__kernel_sigreturn+0x0) [0xffffe400]
9: /lib/libpthread.so.0 (fork+0x14) [0xb76f9094]
10: /usr/bin/X (Popen+0x9b) [0x80a9f2b]
11: /usr/bin/X (XkbDDXLoadKeymapByNames+0x1c7) [0x81bb897]
12: /usr/bin/X (0x8048000+0xf1d0c) [0x8139d0c]
13: /usr/bin/X (0x8048000+0x2b027) [0x8073027]
14: /usr/bin/X (0x8048000+0x1e95a) [0x806695a]
15: /lib/libc.so.6 (__libc_start_main+0xe6) [0xb7467c76]
16: /usr/bin/X (0x8048000+0x1e541) [0x8066541]
"

And, multiple times, the following message:
"
Warning: Multiple interpretations of "NoSymbol+AnyOf(all)"
Using last definition for duplicate fields
"

I tried changing distribution (Debian squeeze, then Gentoo) and window manager (GNOME, then E17), but nothing changed.

I tried to resolve the bug with newer and older kernel versions. The bug is a regression introduced between kernel 2.6.32.27 and 2.6.33, and it has not been fixed since. I tried a git bisect between 2.6.32 and 2.6.33, but another bug between the two releases prevents the kernel from booting (sometimes with KMS, sometimes without, sometimes both).

During my git bisect, I noticed that the bug appears sooner without KMS. In kernel 2.6.37, the bug appears in both situations.
I tried running strace on setxkbmap; the lag appears at the following line:
"poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])"

The fd used by this poll is created by:
"socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC, 0) = 3"

I also tried latrace on setxkbmap, and the lag appears at the marked line:
"
20689 xcb_wait_for_reply [/usr/lib/libxcb.so.1]
20689 pthread_mutex_lock [/lib/libc.so.6]
20689 pthread_mutex_unlock [/lib/libc.so.6]
20689 poll [/lib/libc.so.6] *lag here*
20689 pthread_mutex_lock [/lib/libc.so.6]
20689 read [/lib/libc.so.6]
20689 malloc [/lib/libc.so.6]
20689 memcpy [/lib/libc.so.6]
20689 memmove [/lib/libc.so.6]
20689 read [/lib/libc.so.6]
20689 read [/lib/libc.so.6]
20689 __errno_location [/lib/libc.so.6]
20689 poll [/lib/libc.so.6]
20689 read [/lib/libc.so.6]
20689 malloc [/lib/libc.so.6]
20689 pthread_cond_signal [/lib/libc.so.6]
20689 free [/lib/libc.so.6]
20689 pthread_cond_destroy [/lib/libc.so.6]
20689 pthread_cond_signal [/lib/libc.so.6]
20689 pthread_mutex_unlock [/lib/libc.so.6]
"

After all these tests, I still cannot find a way to stop this bug.

I am also attaching reports from ftrace and perf, in order to give you more information. Ftrace cannot parse hrtimer functions (like "hrtimer_start"), and perf shows that "strcmp" takes up all the processor time.

Do you have any idea?

Regards,
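
For readers unfamiliar with the pattern in the strace output, here is a minimal Python sketch of a client blocked in poll() on a Unix stream socket until the peer replies. It assumes that fd 3 in the trace is the client's connection to the X server; the names wait_for_reply, slow_server and demo are made up for this illustration, and the "server" is only a thread that sleeps before answering, not X itself. It runs on Linux (select.poll is not available everywhere).

```python
import select
import socket
import threading
import time


def wait_for_reply(sock):
    """Block in poll() until the socket is readable, then read the reply."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    start = time.monotonic()
    poller.poll(None)  # no timeout, like poll(..., -1) in the strace output
    waited = time.monotonic() - start
    return sock.recv(4096), waited


def slow_server(sock, delay):
    """Hypothetical stand-in for a busy server: read, stall, then answer."""
    sock.recv(4096)
    time.sleep(delay)
    sock.sendall(b"reply")


def demo(delay=0.2):
    # socketpair() gives two connected Unix stream sockets.
    client, server = socket.socketpair()
    worker = threading.Thread(target=slow_server, args=(server, delay))
    worker.start()
    client.sendall(b"request")
    reply, waited = wait_for_reply(client)
    worker.join()
    client.close()
    server.close()
    return reply, waited


if __name__ == "__main__":
    reply, waited = demo()
    print(reply, "after waiting %.2f s in poll()" % waited)
```

Under that assumption, the time spent in poll() is simply the time the other end takes to answer, so the lag measured in setxkbmap would be a symptom of the server being busy rather than of the client itself.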
Created attachment 42148 [details] LSPCI of the first computer
Created attachment 42149 [details] LSPCI of the second computer
Created attachment 42150 [details] LSPCI of the third computer
Created attachment 42151 [details] "perf stat" report when moving the mouse
Created attachment 42152 [details] "perf stat" report without touching the mouse
Big files:
"perf record" report when moving the mouse: http://dl.free.fr/iiX4hDed2
"perf record" report without touching the mouse: http://dl.free.fr/bvHAm21ga
"trace record": http://dl.free.fr/v3nXtx5HK
I'll be taking over the testing procedure, etc.

"Homemade kernel" just means that it was tested on a mainline kernel with a homemade .config file.
I tried the "SWCursor" "true" option; the bug still happens, except that the mouse is frozen instead of laggy (for as long as you move it).

Here is /proc/interrupts after boot (nomodeset mode):

           CPU0
  0:      15785   IO-APIC-edge      timer
  1:          8   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        172   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 17:        301   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
 42:      10780   PCI-MSI-edge      ahci
 43:        829   PCI-MSI-edge      eth0
NMI:          0   Non-maskable interrupts
LOC:      18550   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0

And after the bug was reproduced:

           CPU0
  0:      26051   IO-APIC-edge      timer
  1:         12   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:       3280   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 17:        434   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
 42:      12065   PCI-MSI-edge      ahci
 43:       1006   PCI-MSI-edge      eth0
NMI:          0   Non-maskable interrupts
LOC:      34257   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0
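
To make the two snapshots easier to compare, here is a small Python sketch that computes the per-IRQ difference between two /proc/interrupts dumps. It is only an illustration: the function names are made up, it reads only the first CPU column (these machines report a single CPU0), and the sample data is a four-line excerpt of the counters posted in this comment.

```python
def parse_interrupts(text):
    """Map each IRQ label (e.g. "12", "LOC") to its count on the first CPU."""
    counts = {}
    for line in text.splitlines():
        # Split at the first colon only; device names may contain colons too.
        label, sep, rest = line.strip().partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            counts[label.strip()] = int(fields[0])
    return counts


def interrupt_deltas(before, after):
    """Return {irq: increase} for every IRQ whose count changed."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {irq: a[irq] - b[irq] for irq in a if irq in b and a[irq] != b[irq]}


# Excerpt of the counters from this comment (single CPU0 column).
BEFORE = """\
           CPU0
  0:      15785   IO-APIC-edge      timer
 12:        172   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
"""

AFTER = """\
           CPU0
  0:      26051   IO-APIC-edge      timer
 12:       3280   IO-APIC-edge      i8042
 16:        372   IO-APIC-fasteoi   hda_intel, ath9k
 18:         29   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, radeon@pci:0000:01:05.0
"""

if __name__ == "__main__":
    for irq, delta in sorted(interrupt_deltas(BEFORE, AFTER).items()):
        print("IRQ %s: +%d" % (irq, delta))
```

On this excerpt the only counters that move are the timer (IRQ 0) and IRQ 12 (i8042, conventionally the PS/2 mouse port), while IRQ 18, which radeon shares with the OHCI controllers, stays at 29.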
Created attachment 43261 [details]
dmesg after bug was reproduced

Here is the dmesg from today's Linus tree (HEAD: d2478521afc20227658a10a8c5c2bf1a2aa615b3)
We don't support UMS anymore: if it works, you're lucky; if it doesn't, we don't care. Please test with KMS.

Note that fglrx failing as well suggests something is wrong with your hardware; maybe you can test another operating system.
I can confirm that with today's Linus tree the bug is no longer reproducible with KMS, while it still is with UMS. We'll switch to KMS then. Marking as closed.