Bug 21931 - xserver freezes, infinite loop in mi/miarc.c, drawQuadrant() (intel graphics), [mi] EQ overflowing. The server is probably stuck in an infinite loop.
Summary: xserver freezes, infinite loop in mi/miarc.c, drawQuadrant() (intel graphics)...
Status: RESOLVED WORKSFORME
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL: http://bugs.gentoo.org/show_bug.cgi?i...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-25 09:31 UTC by Cyp
Modified: 2010-10-20 10:15 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
Trace from yet another crash (11.25 KB, text/plain), 2009-05-31 04:49 UTC, Cyp
Xorg.0.log (48.51 KB, application/x-trash), 2009-06-02 22:47 UTC, Cyp
Another crash (15.92 KB, text/plain), 2009-06-05 22:53 UTC, Cyp
Another freeze (11.26 KB, text/plain), 2009-06-07 04:27 UTC, Cyp
Log file after a crash (124.78 KB, application/x-trash), 2009-06-07 12:28 UTC, rile
Another crash, this time with silken mouse off. (15.92 KB, text/plain), 2009-06-14 12:09 UTC, Cyp
Xorg.0.log with silken mouse off. (34.95 KB, application/x-trash), 2009-06-14 12:10 UTC, Cyp
Another crash, this time with intel driver 2.7.1 (111.64 KB, application/x-trash), 2009-06-15 22:23 UTC, Cyp
Another crash, this time with 1.6.1.901-r3 (26.27 KB, application/x-trash), 2009-06-19 00:22 UTC, Cyp
Crash on starting xine/tvtime with/without kernel modesetting (21.69 KB, application/octet-stream), 2009-06-19 05:42 UTC, Cyp
A new log after a crash (16.17 KB, text/plain), 2010-01-27 02:10 UTC, Vladimír Čunát
Xorg.0.log (26.83 KB, text/plain), 2010-02-18 05:16 UTC, Ján Bednár

Description Cyp 2009-05-25 09:31:57 UTC
Please see bug posted at bugs.gentoo.org.
Attachments with lots of trace there.

http://bugs.gentoo.org/show_bug.cgi?id=266330


Summary:

Sometimes xorg-server freezes. It happens every few days, sometimes doesn't happen for a few days, sometimes happens a few times in the same day. I've only noticed it happening when I was using the keyboard/mouse, but maybe that's just because the screen doesn't update as much when not using the keyboard/mouse.

I inserted trace into mi/miarc.c, showing that there is an infinite loop in drawQuadrant(). The infinite loop is caused by NANs in computeBound().
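
To illustrate why a NaN in a computed bound can turn an otherwise finite loop into an infinite one, here is a minimal sketch. It is not the code from drawQuadrant() or computeBound(); the function names `steps_until_past` and `make_nan` and the safety cap are hypothetical, added only so the demonstration terminates. The point is that every ordered comparison against a NaN is false, so a loop whose exit test depends on such a comparison never sees its exit condition become true.

```c
/* Hypothetical stand-in for a bound-driven loop; NOT the real miarc.c code.
 * Steps y upward from 0 until it passes `bound`.
 * Returns the number of steps taken, or -1 if the safety cap was hit. */
int steps_until_past(double bound, int max_steps)
{
    double y = 0.0;
    int steps = 0;

    /* If bound is NaN, (y > bound) is always false, so the loop
     * condition stays true and only the cap below ends the loop. */
    while (!(y > bound)) {
        if (steps >= max_steps)
            return -1;
        y += 1.0;
        steps++;
    }
    return steps;
}

/* Produce a NaN at run time without needing libm; volatile keeps the
 * compiler from folding the division away. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}
```

With a finite bound such as 3.0 the loop exits after a handful of steps; with `make_nan()` as the bound it runs until the cap, which in uncapped code would be a freeze.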

In one freeze, parc->height was 20 (at one point in time), but def->h was 0 at some point in time after computeAcc was called.

In another freeze, def->h was correctly 10.000000 and def->a1 was 90.000000, but somehow bound->ellipse.max was 0.000000 after the computeBound call (followed by some resulting NANs in the rest of bound).

I can see the values of the variables, and the code that is supposed to change the variables seems correct, so I don't know what's happening.

I would think it was a compiler problem, but I was unable to reproduce it in a standalone version of miarc.c, calling with the same parameters. And adding lots and lots of trace didn't make the problem mysteriously disappear, like I would expect from a compiler problem.
Comment 1 Cyp 2009-05-31 04:49:42 UTC
Created attachment 26316 [details]
Trace from yet another crash

It seems the more trace I add, the more the bug runs around.

The bug has run back into computeAcc, where the "def->h = ((double) tarc->height) / 2.0;" statement fails to execute... (tarc->height is 20, def->h either gets set to 0, or fails to get set.)

As far as I can tell, either another thread has a buffer overflow, and is overwriting the stack with zeroes, or there's some weird compiler problem...

Is xorg multithreaded, by the way?
Comment 2 Julien Cristau 2009-05-31 05:56:43 UTC
> Is xorg multithreaded, by the way?
> 
no.
Comment 3 rile 2009-05-31 11:20:49 UTC
Hi!

I have the same problem here.

If you think it's about the compiler, maybe changing the optimization flags works. My Xorg is compiled with CFLAGS="-O2 -march=prescott -pipe". If other users with this problem have the same (or similar) optimization flags, let's try with -O0 or something.

Does this optimization thing make any sense [I don't know much about the topic :(]?
Comment 4 Cyp 2009-06-02 22:47:29 UTC
Created attachment 26368 [details]
Xorg.0.log

Odd... This time the infinite loop isn't anywhere near miarc. (So I just have the default xorg log backtrace.)

Why do the backtraces contain this?
0: /usr/bin/X(xorg_backtrace+0x26) [0x4f2dc6]
1: /usr/bin/X(mieqEnqueue+0x271) [0x4d3ec1]
2: /usr/bin/X(xf86PostMotionEventP+0xc4) [0x4716e4]
3: /usr/lib64/xorg/modules/input//evdev_drv.so [0x7f9340415c78]
4: /usr/bin/X [0x487b05]
5: /usr/bin/X [0x46fdf6]
6: /lib/libpthread.so.0 [0x7f935744da00]

I don't know about drmIoctl, but there certainly aren't any calls to xf86PostMotionEventP from miarc.c.
Comment 5 Cyp 2009-06-05 22:53:29 UTC
Created attachment 26484 [details]
Another crash

This time, the innerYfromXY function returns a NAN, despite the parameters being valid.

I'm wondering if there could be a kernel problem restoring floating-point registers, or something like that. I've no idea how to debug that or find out if that's the case.
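
One rough way to probe the floating-point-state theory from user space is a small stress test: repeat a trivial floating-point computation with a known exact result while a timer signal keeps interrupting the process, so that floating-point state has to be saved and restored many times. The sketch below assumes a POSIX system; `count_fp_mismatches`, `halve_and_sum`, and the 0.5 ms interval are hypothetical choices, and a clean run does not prove the kernel or hardware is sound, it only fails to show a problem.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose sigaction() and setitimer() */
#endif
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* The handler does floating-point work on purpose, so FP state must be
 * preserved correctly around every interruption of the main loop. */
static void on_alarm(int sig)
{
    volatile double scratch = 1.5;
    (void) sig;
    scratch = scratch * 3.0 + 0.25;
}

/* Repeats the same kind of statement as the one quoted from computeAcc();
 * volatile keeps the compiler from folding the loop into a constant. */
static double halve_and_sum(int height, int rounds)
{
    volatile double h = 0.0;
    double sum = 0.0;
    int i;

    for (i = 0; i < rounds; i++) {
        h = ((double) height) / 2.0;
        sum += h;
    }
    return sum;
}

/* Returns how many of `repeats` runs produced an unexpected sum,
 * or -1 if the handler or timer could not be set up. */
long count_fp_mismatches(int height, int rounds, int repeats)
{
    struct sigaction sa, old_sa;
    struct itimerval timer, stop;
    double expected = ((double) height) / 2.0 * (double) rounds;
    long mismatches = 0;
    int r;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_usec = 500;   /* fire roughly every 0.5 ms */
    timer.it_value.tv_usec = 500;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    for (r = 0; r < repeats; r++) {
        if (halve_and_sum(height, rounds) != expected)
            mismatches++;
    }

    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);   /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);
    return mismatches;
}
```

A call such as `count_fp_mismatches(20, 100000, 200)` should return 0 on a healthy machine, since every intermediate value is exactly representable. Any nonzero count would point away from miarc.c and towards the platform.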

In the case of compiler issues, I would expect the freezes to be more deterministic...

How are the log messages saying the event queue is overflowing (along with a backtrace of the frozen thread) produced, without another thread to monitor the frozen thread?
Comment 6 Cyp 2009-06-05 23:00:12 UTC
With all that talk about kernels and compilers, I should probably have mentioned which version I have...


Processor:
Core2 (x86_64)

Compiler:
gcc (Gentoo 4.3.3-r2 p1.1, pie-10.1.5) 4.3.3

Kernel:
Linux version 2.6.29-gentoo-r1 (root@) (gcc version 4.3.3 (Gentoo 4.3.3-r2 p1.1, pie-10.1.5) ) #1 SMP Thu Apr 23 21:00:15 CEST 2009

Colour of computer case / orientation of computer screen:
white / north (The problem is so strange, I don't know what data is relevant...)
Comment 7 Cyp 2009-06-07 04:27:36 UTC
Created attachment 26504 [details]
Another freeze

Just after the one and only assignment to outer.min in computeBound, the value of bound->outer.min = -9223372036854775808, which doesn't seem valid, but that's not the problem...

Just after the computeBound function returns, *AFTER NO ASSIGNMENTS AT ALL* to bound.outer.min, bound.outer.min = nan.

That is, my variables are getting modified, without any assignments to the variables taking place.

Can anyone tell me how to debug this?
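
One general technique for catching a variable that changes without any visible assignment is to place it on its own memory page and make that page read-only once the value has been set, so that any stray write faults immediately at the offending instruction. The sketch below assumes a POSIX system with mmap() and mprotect(); `struct guarded_bound` and the `guard_*` helpers are hypothetical and are not part of miarc.c. It only catches stray writes to memory; it would not detect a value being corrupted while held in a register.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANONYMOUS */
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the data being corrupted. */
struct guarded_bound {
    double outer_min;
    double outer_max;
};

/* Allocate the struct on a page of its own; returns NULL on failure. */
struct guarded_bound *guard_alloc(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (struct guarded_bound *) p;
}

/* Make the page read-only: any later write raises a fault. */
int guard_lock(struct guarded_bound *b)
{
    return mprotect(b, (size_t) sysconf(_SC_PAGESIZE), PROT_READ);
}

/* Make the page writable again before a legitimate assignment. */
int guard_unlock(struct guarded_bound *b)
{
    return mprotect(b, (size_t) sysconf(_SC_PAGESIZE),
                    PROT_READ | PROT_WRITE);
}

/* Demonstration: a child process performs a stray write to a locked
 * page. Returns the signal that killed the child, 0 if it exited
 * normally, or -1 if the setup failed. */
int stray_write_signal(void)
{
    struct guarded_bound *b = guard_alloc();
    int status = 0;
    pid_t pid;

    if (b == NULL)
        return -1;
    b->outer_min = 1.0;            /* the one legitimate assignment */
    if (guard_lock(b) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ((volatile struct guarded_bound *) b)->outer_min = 0.0;  /* faults */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    munmap(b, (size_t) sysconf(_SC_PAGESIZE));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Run under a debugger, the fault stops execution at the exact instruction doing the unexpected write. If the value still changes while the page is locked and no fault occurs, the corruption is probably not a memory write at all.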
Comment 8 rile 2009-06-07 12:28:39 UTC
Created attachment 26515 [details]
Log file after a crash

Here's my log file after a crash (a minute ago).
I hope it helps.

Does anybody have an idea what can be wrong?
Comment 9 Michel Dänzer 2009-06-08 06:45:59 UTC
(In reply to comment #5)
> I'm wondering if there could be a kernel problem restoring floating-point
> registers, or something like that.

[...]

> How are the log messages saying the event queue is overflowing (along with a
> backtrace of the frozen thread) produced, without another thread to monitor the
> frozen thread?

Mouse events are generated from a SIGIO signal handler.

Does disabling Silken Mouse (see the Xorg manpage) work around the problem? If so, there might indeed be a problem between the signal handler and floating point code.
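
To illustrate how a single-threaded process can still report an overflowing event queue while its main loop is stuck, here is a minimal sketch. It is not the server's mieq code: `QUEUE_SIZE`, `on_input`, and `simulate_stuck_main_loop` are hypothetical, and SIGUSR1 delivered via raise() stands in for SIGIO from an input device. The handler runs on the same thread as the stuck code, on top of its stack, which would also explain why the logged backtraces show input-driver frames rather than miarc.c.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose sigaction() */
#endif
#include <signal.h>
#include <string.h>

#define QUEUE_SIZE 8   /* hypothetical; tiny so overflow is easy to trigger */

static volatile sig_atomic_t queue_len;   /* events waiting for the main loop */
static volatile sig_atomic_t overflows;   /* events dropped because the queue was full */

/* Signal handler: enqueue one event, or record an overflow. A real server
 * would log a message (and possibly a backtrace) at the overflow point. */
static void on_input(int sig)
{
    (void) sig;
    if (queue_len >= QUEUE_SIZE) {
        overflows++;
        return;
    }
    queue_len++;
}

/* Simulates `events` input signals arriving while the main loop never
 * drains the queue. Returns the number of overflows, or -1 on failure. */
int simulate_stuck_main_loop(int events)
{
    struct sigaction sa, old_sa;
    int i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_input;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    queue_len = 0;
    overflows = 0;
    for (i = 0; i < events; i++)
        raise(SIGUSR1);   /* handler runs on this thread before raise() returns */

    sigaction(SIGUSR1, &old_sa, NULL);
    return (int) overflows;
}
```

With a queue of 8 slots, 20 simulated events leave 12 overflows, while 5 events leave none. No second thread is involved at any point.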
Comment 10 rile 2009-06-11 09:19:00 UTC
For the record:
I upgraded to Xorg 1.6.1 and Intel GMA driver 2.7.1 a few days ago and there have been no crashes so far.
Maybe I was lucky, but maybe this bug has gone away ;)
Comment 11 Cyp 2009-06-14 12:09:05 UTC
Created attachment 26775 [details]
Another crash, this time with silken mouse off.

(In reply to comment #9)
> [...]
> Does disabling Silken Mouse (see the Xorg manpage) work around the problem? If
> so, there might indeed be a problem between the signal handler and floating
> point code.

No, it crashed again a moment ago. I'll attach the Xorg.0.log. It says Silken Mouse is disabled.

Now I just upgraded to intel-2.7.1, hoping that helps somehow.
Comment 12 Cyp 2009-06-14 12:10:30 UTC
Created attachment 26776 [details]
Xorg.0.log with silken mouse off.
Comment 13 Cyp 2009-06-15 22:23:33 UTC
Created attachment 26823 [details]
Another crash, this time with intel driver 2.7.1

Upgrading to intel driver 2.7.1 didn't help...

xorg-server-1.6.1 is currently "masked" in gentoo, so I didn't try upgrading to that yet.
Comment 14 Rémi Cardona 2009-06-15 22:35:35 UTC
(In reply to comment #13)
> xorg-server-1.6.1 is currently "masked" in gentoo, so I didn't try upgrading to
> that yet.

Please do try it. I'll probably unmask it by the end of the week, so you might as well test it. You should unmask xkeyboard-config as well.

Thanks
Comment 15 Cyp 2009-06-19 00:22:54 UTC
Created attachment 26946 [details]
Another crash, this time with 1.6.1.901-r3

xorg-server-1.6.1.901-r3 also froze... This time there's no backtrace in the Xorg.0.log, but I assume it's the same problem, whatever it is...
Comment 16 Cyp 2009-06-19 05:42:11 UTC
Created attachment 26957 [details]
Crash on starting xine/tvtime with/without kernel modesetting

I tried upgrading from gentoo-sources-2.6.29-r1 to gentoo-sources-2.6.30-r1, in case that would somehow change something. (Changing some preemption kernel setting fixed some failures in einstein@home, some kernels back.)

I tried enabling "Enable modesetting on intel by default" in the kernel when upgrading.

Now xine and tvtime freeze the graphics when starting. I think they both do some kind of colour keying hack.

I tried disabling "Enable modesetting on intel by default" in gentoo-sources-2.6.30-r1, and they both still crash.

If I switch to a text mode terminal (with Control+Alt+F2), it stays in graphics mode, but some colours get replaced with other colours (as if changing a few colours in the palette, although I think it's 16- or 24-bit colour). I can use the text mode terminal normally, I just can't see it at all. (Same with/without kernel modesetting.)

I assume changing the "Enable modesetting on intel by default" is sufficient to enable/disable kernel modesetting...
The Xorg.0.log with/without kernel modesetting is exactly identical, except for the kernel build time and the log file time.

glxgears seems normal (I see gears, and nothing crashes or explodes).

(Anyone know if there's a point in trying gentoo-sources-2.6.29-r5?)
Comment 17 Cyp 2009-07-29 08:50:27 UTC
This bug seems to have evaporated, I can't reproduce it any more.

I tried upgrading from 2.6.29-gentoo-r1 to 2.6.30, but it didn't work at all (couldn't start graphics, or something like that), so I reverted to 2.6.29-gentoo-r1. (Same kernel, didn't recompile it. But I did run 'make install' again. The newly installed kernel was identical to the old one.)

Since then, there haven't been any crashes (except for one when I was experimenting with infrared and loading/unloading random kernel modules).

I don't know what to resolve this bug as - the closest one would be the WORKSFORME, but there should be a MYSTERIOUSLYVANISHED option. Maybe the bug has decided to haunt some other program instead.

3-4 weeks later, I upgraded the kernel to 2.6.30, and it works now (don't have kernel modesetting enabled). I'll probably upgrade xorg-server and the intel driver again soon, so maybe the bug won't be able to come back, even if it wants to...

My current best theory is that some of the atoms in my processor were doing a random walk, such that the processor started failing sporadically, when running floating point code, and the atoms happened to wander back into place, so that my processor works again. More plausible theories are welcome.

I guess there's no point trying to debug this unless someone else can still reproduce it.

Again, this bug seems to have evaporated, I can't reproduce it any more.
Comment 18 Cyp 2009-07-29 08:51:04 UTC
P.S. Thanks everyone for info and trying to help.
Comment 19 Michel Dänzer 2009-07-29 09:19:38 UTC
I think WONTFIX is close enough, thanks for the update.
Comment 20 Vladimír Čunát 2010-01-27 02:07:22 UTC
Well, unfortunately I've been reproducing these freezes for months. They occur every other day and seem to be unaffected by any upgrades. I have now finally installed debugging support, which got me here.

I'm attaching my Xorg.0.log. If you need more information or a better trace, tell me.

Comment 21 Vladimír Čunát 2010-01-27 02:10:13 UTC
Created attachment 32841 [details]
A new log after a crash
Comment 22 Ján Bednár 2010-02-18 05:14:14 UTC
I'm also experiencing this issue. Nearly every day my xorg-server crashes/freezes. Sometimes, after a freeze, I'm able to move the mouse cursor; sometimes the crash is total. But I have a different hardware configuration (Radeon HD 3650).
My configuration is as follows:
distribution: gentoo
kernel: 2.6.32.8-johny_b #2 SMP PREEMPT Wed Feb 17 08:25:55 CET 2010 i686 Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz GenuineIntel GNU/Linux
xorg-server: 1.7.5
libdrm: 2.4.17 - 2.4.18
mesa: 7.7
xf86-video-ati: git master
gcc: 4.3.4
glibc: 2.10.1-r1
CFLAGS="-march=core2 -O3 -pipe -fomit-frame-pointer"

I'm using an HP EliteBook 8530p with Radeon HD 3650.

I'm attaching Xorg.0.log.
Comment 23 Ján Bednár 2010-02-18 05:16:03 UTC
Created attachment 33384 [details]
Xorg.0.log
Comment 24 Michel Dänzer 2010-02-18 05:20:49 UTC
(In reply to comment #22)
> I'm also experiencing this issue. Nearly everyday my xorg-server
> crashes/freezes. Sometimes, after freeze, I'm able to move mouse cursor,
> sometimes the crash is total. But I have different hardware configuration
> (Radeon HD 3650).

So it's most certainly not the same issue. Please file a new bug. Only follow up on an existing one if there appears to be a 100% match of the setup and symptoms and you can add actual new information about the problem.

