Bug 28320

Summary: Xserver hangs in an infinite loop
Product: xorg Reporter: Pavel S. <pav_s>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED DUPLICATE QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high CC: mcepl, picogeyer
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments (description, flags):
dmesg - nothing special here (flags: none)
[mi] overflowing... (flags: none)
dmesg - this looks pomising... (flags: none)
Gdb backtrace of Xorg (flags: none)

Description Pavel S. 2010-05-30 04:06:56 UTC
"[mi] EQ overflowing. The server is probably stuck in an infinite loop."

This is the only meaningful message I find in Xorg.0.log.old after restarting the computer (Lenovo ThinkPad T410 with an nVidia NVS 3100M). After searching the Internet, I know that this message can be triggered by a variety of bugs, so please tell me which information you need to locate the bug.

I am running X.Org X Server 1.8.1 on x86_64 Gentoo with kernel 2.6.34, but I have experienced this bug since 2.6.33 (maybe earlier, too).

[description]
I first thought that this bug was caused by my compositing manager (xcompmgr), but after disabling it the problem persisted. The bug happens spontaneously and completely at random. The X server stops redrawing anything, but the mouse pointer still moves (hwcursor). I could log in through ssh from another computer and attach gdb to the server, and found it inside a tight loop in drmIoctl() (xf86drm.c), calling ioctl() and getting -1 with errno == EINTR every time. After killing and restarting the server I reliably get a hard lock (the ssh server stops answering my requests, the kernel hangs).
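The loop seen in gdb matches the common retry-on-interruption pattern around ioctl(). The following is only a minimal sketch of that pattern, not the libdrm source: `retry_while_interrupted`, `fake_call`, and the `max_tries` bound are hypothetical names introduced here for illustration. It shows why a call that fails with EINTR on every attempt never returns unless the number of retries is bounded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a system call such as ioctl():
 * returns 0 on success, or -1 with errno set on failure. */
typedef int (*call_fn)(void *ctx);

/* Retry the call while it fails with EINTR or EAGAIN, but give up
 * after max_tries attempts. Without the bound, a call that is
 * interrupted every single time would keep the caller spinning. */
static int retry_while_interrupted(call_fn call, void *ctx, int max_tries)
{
    int ret = -1;

    assert(call != NULL);
    assert(max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        ret = call(ctx);
        /* Stop on success or on any error other than an interruption. */
        if (ret != -1 || (errno != EINTR && errno != EAGAIN))
            return ret;
    }
    /* Still interrupted after max_tries attempts: report the failure. */
    return ret;
}

/* Demo call (an assumption for this sketch): fails with EINTR until
 * the counter that ctx points to reaches zero, then succeeds. */
static int fake_call(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

With a counter of 3 and a limit of 10 the wrapper succeeds on the fourth attempt. With a counter larger than the limit it returns -1 with errno still set to EINTR, which is the situation an unbounded loop can never leave.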

[This bug happens in standalone and dualhead mode...]

So far I have not found any way to reproduce this bug reliably.
(I will post some X server logs and dmesg output soon.)

thanks in advance
Comment 1 Pavel S. 2010-05-30 04:07:48 UTC
Created attachment 35950 [details]
dmesg - nothing special here
Comment 2 Pavel S. 2010-05-30 04:08:30 UTC
Created attachment 35951 [details]
 [mi] overflowing...
Comment 3 Pavel S. 2010-05-30 04:18:53 UTC
Created attachment 35953 [details]
dmesg - this looks pomising...

After killing the server I get some messages in dmesg; maybe this helps.
Comment 4 Pico 2010-06-08 07:15:11 UTC
Created attachment 36145 [details]
Gdb backtrace of Xorg

I think I'm having exactly the same issue.
Would this backtrace of Xorg help?
Comment 5 Matej Cepl 2010-06-15 07:09:29 UTC
That " [mi] EQ overflowing" message is completely meaningless. Please, read http://marc.info/?l=fedora-devel-list&m=124101535025331&w=2 and then take a look at the disaster which happens when people start to file this message as a bug at https://bugzilla.redhat.com/show_bug.cgi?id=465884. 

Much more interesting is this backtrace:

Backtrace:
[  5443.624] 0: /usr/bin/X (xorg_backtrace+0x28) [0x46c3c8]
[  5443.624] 1: /usr/bin/X (mieqEnqueue+0x1eb) [0x45b21b]
[  5443.624] 2: /usr/bin/X (xf86PostMotionEventP+0xc8) [0x472828]
[  5443.624] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f29c6ccc000+0x407f) [0x7f29c6cd007f]
[  5443.625] 4: /usr/bin/X (0x400000+0x6f6d7) [0x46f6d7]
[  5443.625] 5: /usr/bin/X (0x400000+0xf647a) [0x4f647a]
[  5443.625] 6: /lib/libpthread.so.0 (0x7f29ca790000+0xedf0) [0x7f29ca79edf0]
[  5443.625] 7: /lib/libc.so.6 (ioctl+0x7) [0x7f29c8f0bf17]
[  5443.625] 8: /usr/lib/libdrm.so.2 (drmIoctl+0x23) [0x7f29c881eeb3]
[  5443.625] 9: /usr/lib/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f29c881f13b]
[  5443.625] 10: /usr/lib/libdrm_nouveau.so.1 (0x7f29c7741000+0x31ed) [0x7f29c77441ed]
[  5443.625] 11: /usr/lib/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xff) [0x7f29c774439f]
[  5443.625] 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f29c7947000+0x5f5d) [0x7f29c794cf5d]
[  5443.625] 13: /usr/lib64/xorg/modules/libexa.so (0x7f29c70de000+0x4637) [0x7f29c70e2637]
[  5443.625] 14: /usr/lib64/xorg/modules/libexa.so (0x7f29c70de000+0x74d2) [0x7f29c70e54d2]
[  5443.625] 15: /usr/lib64/xorg/modules/libexa.so (0x7f29c70de000+0x4817) [0x7f29c70e2817]
[  5443.625] 16: /usr/bin/X (0x400000+0xb6a3a) [0x4b6a3a]
[  5443.625] 17: /usr/bin/X (ValidateGC+0x24) [0x447b54]
[  5443.625] 18: /usr/bin/X (0x400000+0x14758b) [0x54758b]
[  5443.625] 19: /usr/bin/X (0x400000+0x1476b0) [0x5476b0]
[  5443.625] 20: /usr/bin/X (0x400000+0x149d33) [0x549d33]
[  5443.625] 21: /usr/bin/X (ConfigureWindow+0xb22) [0x42a302]
[  5443.625] 22: /usr/bin/X (0x400000+0x378d7) [0x4378d7]
[  5443.625] 23: /usr/bin/X (0x400000+0x3830c) [0x43830c]
[  5443.625] 24: /usr/bin/X (0x400000+0x24de5) [0x424de5]
[  5443.625] 25: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7f29c8e61a26]
[  5443.625] 26: /usr/bin/X (0x400000+0x24999) [0x424999]

Looks like a crash in libdrm (or maybe nouveau driver) to me.
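In the backtrace, frames 0 to 6 are input handling (evdev posting a motion event into mieqEnqueue) running on top of the interrupted ioctl() in frames 7 to 11. A generic fixed-size event queue illustrates why an overflow message appears once the main loop stops draining events. This is a sketch under stated assumptions, not the server's mieq code: `struct event_queue`, `eq_enqueue`, `eq_dequeue`, and the capacity of 8 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8  /* hypothetical capacity, chosen for illustration */

struct event_queue {
    int events[QUEUE_SIZE];
    size_t head;  /* next slot to read (consumer: main loop) */
    size_t tail;  /* next slot to write (producer: input handler) */
};

/* Producer side. Returns false when the queue is full, which is the
 * point where a server could log an "overflowing" warning. One slot
 * stays unused so that a full queue differs from an empty one. */
static bool eq_enqueue(struct event_queue *q, int ev)
{
    assert(q != NULL);
    size_t next = (q->tail + 1) % QUEUE_SIZE;

    if (next == q->head)
        return false;  /* consumer has not drained the queue */
    q->events[q->tail] = ev;
    q->tail = next;
    return true;
}

/* Consumer side. Returns false when there is nothing to read. */
static bool eq_dequeue(struct event_queue *q, int *ev)
{
    assert(q != NULL && ev != NULL);

    if (q->head == q->tail)
        return false;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    return true;
}
```

If the consumer never calls `eq_dequeue` because it is stuck elsewhere, every `eq_enqueue` after the first `QUEUE_SIZE - 1` events fails. The overflow is therefore a symptom of the stalled main loop rather than the fault itself.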
Comment 6 Pavel S. 2010-06-15 07:22:34 UTC
(In reply to comment #5)
> That " [mi] EQ overflowing" message is completely meaningless. Please, read
> http://marc.info/?l=fedora-devel-list&m=124101535025331&w=2 and then take a
> look at the disaster which happens when people start to file this message as a
> bug at https://bugzilla.redhat.com/show_bug.cgi?id=465884. 

Yes, I knew this; that is why I looked in gdb to see where the server was hanging. I am not experienced enough with X debugging to understand this bug, though, so I decided to ask here.

> Much more interesting is this backtrace:

It looks to me as if the server causes a deadlock in the kernel and then hangs hopelessly in that tight loop, trying to get a response from the kernel.
Comment 7 Marcin Kościelnicki 2010-06-16 12:59:17 UTC
The famous Mysterious NVA3-NVA8 Hang.

*** This bug has been marked as a duplicate of bug 26980 ***
