Bug 32242 - EnterVT/LeaveVT: crash with msg: [mi] EQ overflowing. The server is probably stuck in an infinite loop. (bisect attached)
Status: RESOLVED NOTOURBUG
Product: xorg
Classification: Unclassified
Component: Server/General
Version: 7.5 (2009.10)
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assigned To: Xorg Project Team
Xorg Project Team
2011BRB_Reviewed
Depends on:
Blocks: xserver-1.12
Reported: 2010-12-08 14:46 UTC by Jochen Keil
Modified: 2011-12-23 23:45 UTC (History)

Attachments
fix for the crash-on-vtchange-problem (1.20 KB, patch)
2010-12-11 15:29 UTC, Jochen Keil
revert commit d75e8146c414bfd512ba5dbd4a83acb334bbe19b (8.94 KB, patch)
2011-03-08 13:14 UTC, Jochen Keil

Description Jochen Keil 2010-12-08 14:46:27 UTC
The X Server crashes after switching to a VT for the second time (the first time it works without problems) with this backtrace:

Backtrace:
0: X (xorg_backtrace+0x28) [0x49ef08]
1: X (0x400000+0x5ffb9) [0x45ffb9]
2: /lib/libpthread.so.0 (0x7fe58f486000+0xf1c0) [0x7fe58f4951c0]
3: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0xbe19b) [0x7fe58a0c119b]
4: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0xbe7b5) [0x7fe58a0c17b5]
5: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0xbf361) [0x7fe58a0c2361]
6: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0x3c5e64) [0x7fe58a3c8e64]
7: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0x3bc07a) [0x7fe58a3bf07a]
8: X (0x400000+0x162f06) [0x562f06]
9: X (BlockHandler+0x50) [0x431200]
10: X (WaitForSomething+0x10f) [0x4595bf]
11: X (0x400000+0x2cef2) [0x42cef2]
12: X (0x400000+0x2123e) [0x42123e]
13: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fe58e40bc4d]
14: X (0x400000+0x20de9) [0x420de9]
Segmentation fault at address 0x25
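
For readers unfamiliar with this backtrace format: each frame is printed as `module (base+offset) [address]`, where the bracketed address is the load base plus the offset into the module. The offset is the part that stays meaningful if the module is loaded at a different base on another run. The following is only a small sketch of that arithmetic in C, using frames 1 and 3 from the trace above; the struct and function names are made up for illustration and are not part of the X server.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One "module (base+offset) [address]" frame as printed by the server.
 * Hypothetical type, for illustration only. */
struct frame {
    const char *module;
    uint64_t base;     /* where the module is mapped in this run */
    uint64_t offset;   /* position inside the module */
    uint64_t address;  /* absolute address shown in brackets */
};

/* Frames 1 and 3, copied from the backtrace above. */
const struct frame frames[] = {
    { "X",             UINT64_C(0x400000),       UINT64_C(0x5ffb9), UINT64_C(0x45ffb9) },
    { "nvidia_drv.so", UINT64_C(0x7fe58a003000), UINT64_C(0xbe19b), UINT64_C(0x7fe58a0c119b) },
};

/* Recover the module-relative offset from an absolute address. */
uint64_t module_offset(uint64_t base, uint64_t address)
{
    assert(address >= base);
    return address - base;
}

/* Print each frame as "module +0xoffset". */
void print_offsets(void)
{
    for (size_t i = 0; i < sizeof frames / sizeof frames[0]; i++) {
        printf("%s +0x%" PRIx64 "\n", frames[i].module,
               module_offset(frames[i].base, frames[i].address));
    }
}
```

A module-relative offset like this is what one would normally hand to a symbolizer together with the module file. For a binary shipped without symbols, such as the proprietary driver here, that lookup is unlikely to yield function names.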


This is the offending commit according to a bisect:
d75e8146c414bfd512ba5dbd4a83acb334bbe19b is the first bad commit
commit d75e8146c414bfd512ba5dbd4a83acb334bbe19b
Author: Keith Packard <keithp@keithp.com>
Date:   Mon Jul 12 16:01:34 2010 -0700

    Unwrap/rewrap EnterVT/LeaveVT completely, Fixes 28998
    
    Because some EnterVT code needs to remove it self from the
    call chain, we need to fix all of the wrappers to correctly
    unwrap/rewrap during the call chain. This is a follow-on to the fix
    for bug 27114 in commit 68a9ee8370e6f9b38218376ac92d5130a5b0ef1e.
    
    Signed-off-by: Keith Packard <keithp@keithp.com>
    Tested-by: Jesse Barnes <jesse.barnes@intel.com>
    Reviewed-by: Daniel Stone <daniel@fooishbar.org>
    Reviewed-by: Tiago Vignatti <tiago.vignatti@nokia.com>

:040000 040000 a302fa328e4ef3500ef954b57741498979238e74 00e618e04b26cc10d0baf8269bc6e54e4570eaea M      glx
:040000 040000 c7cbcde94f5a2168e841ed037e25137f86dccc40 0ac56be3f456d9e1a7e002abb5b330817d2a535a M      hw


bisect log:
git bisect start
# good: [a71dbc03e65cf7b0654a6eca93ce0bf6a1711ffa] Bump to version 1.8.99.904 (1.9 RC4)
git bisect good a71dbc03e65cf7b0654a6eca93ce0bf6a1711ffa
# bad: [a2c13f0d6548310e3cd115cf486d3e43edf23dcc] Bump to version 1.8.99.905 (1.9 RC5)
git bisect bad a2c13f0d6548310e3cd115cf486d3e43edf23dcc
# good: [c65280ce8df4836bd7424a90482e8aa00ab6f447] Increase advertised RENDER protocol minor version to 11
git bisect good c65280ce8df4836bd7424a90482e8aa00ab6f447
# good: [2307ab5bc9365ebbe04568edb7c7620a23689b70] Merge remote branch 'whot/for-keith'
git bisect good 2307ab5bc9365ebbe04568edb7c7620a23689b70
# good: [b2b9c458a46e9a41c3c76ffe83a2b580a41d0e90] XQuartz: Remove some dead code.
git bisect good b2b9c458a46e9a41c3c76ffe83a2b580a41d0e90
# bad: [0540c46066f938ad5611c56081cfcd8457a9b718] EXA: Finish access to pixmap if it's prepared at destruction time.
git bisect bad 0540c46066f938ad5611c56081cfcd8457a9b718
# bad: [d75e8146c414bfd512ba5dbd4a83acb334bbe19b] Unwrap/rewrap EnterVT/LeaveVT completely, Fixes 28998
git bisect bad d75e8146c414bfd512ba5dbd4a83acb334bbe19b


You might find more information in this thread:
http://www.nvnews.net/vbulletin/showthread.php?p=2361777
Comment 1 Daniel Stone 2010-12-09 10:30:21 UTC
On Wed, Dec 08, 2010 at 02:46:28PM -0800, bugzilla-daemon@freedesktop.org wrote:
> The X Server crashes after switching to a VT for the second time (the first
> time it works without problems) with this backtrace:
> 
> Backtrace:
> 0: X (xorg_backtrace+0x28) [0x49ef08]
> 1: X (0x400000+0x5ffb9) [0x45ffb9]
> 2: /lib/libpthread.so.0 (0x7fe58f486000+0xf1c0) [0x7fe58f4951c0]
> 3: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0xbe19b)
> [0x7fe58a0c119b]
> 4: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0x7fe58a003000+0xbe7b5)
> [0x7fe58a0c17b5]
> [...]
> Segmentation fault at address 0x25
> 
> 
> This is the offending commit according to a bisect:
> d75e8146c414bfd512ba5dbd4a83acb334bbe19b is the first bad commit
> commit d75e8146c414bfd512ba5dbd4a83acb334bbe19b
> Author: Keith Packard <keithp@keithp.com>
> Date:   Mon Jul 12 16:01:34 2010 -0700
> 
>     Unwrap/rewrap EnterVT/LeaveVT completely, Fixes 28998
> 
>     Because some EnterVT code needs to remove it self from the
>     call chain, we need to fix all of the wrappers to correctly
>     unwrap/rewrap during the call chain. This is a follow-on to the fix
>     for bug 27114 in commit 68a9ee8370e6f9b38218376ac92d5130a5b0ef1e.
> 
>     Signed-off-by: Keith Packard <keithp@keithp.com>
>     Tested-by: Jesse Barnes <jesse.barnes@intel.com>
>     Reviewed-by: Daniel Stone <daniel@fooishbar.org>
>     Reviewed-by: Tiago Vignatti <tiago.vignatti@nokia.com>

This would be a bug in the NVIDIA proprietary driver - please take it up
with them.

Cheers,
Daniel
Comment 2 Jochen Keil 2010-12-09 11:09:43 UTC
Hi Daniel,

Why do you think this is an Nvidia-related bug?
The bisect tracks it down precisely to this one commit, which breaks the server. Why should the same Nvidia driver version crash with one X server version but not with another?

Maybe you could give some more details on this.

Thank you in advance.
Comment 3 Daniel Stone 2010-12-10 11:12:46 UTC
On Thu, Dec 09, 2010 at 11:09:44AM -0800, bugzilla-daemon@freedesktop.org wrote:
> why do you think this is a Nvidia related bug?
> I mean the bisect tracks it down perfectly to this one commit which breaks the
> server. Why should the same Nvidia driver version break one X version and
> another not?
> 
> Maybe you could give some more details on this.
> 
> Thank you in advance.

The server change was a correctness fix - previously wrapping of
EnterVT/LeaveVT was wrong all over the place and mostly only worked by
accident.  The NVIDIA driver wraps EnterVT/LeaveVT too, and my guess is
that it's doing it incorrectly.  (Especially given that the backtrace is
in nvidia_drv.so at the time of the crash ...)

Of course, the NVIDIA driver being proprietary, we can't debug it to
find out what the exact problem is; even if it is a server problem, the
NVIDIA guys will need to tell us what it is.
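
To illustrate what "unwrap/rewrap" means here, the following is a minimal sketch in C of the pattern the commit message describes. It does not use the real X server headers: the `Screen` struct, its fields, and all function names are hypothetical stand-ins, and the actual server and driver code will differ. The point it shows is that a wrapper restores the pointer it saved before calling down, then re-reads the pointer before putting itself back, so a lower layer that removes itself from the chain is respected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a per-screen record. */
typedef struct Screen Screen;
typedef bool (*EnterVTProc)(Screen *);

struct Screen {
    EnterVTProc EnterVT;      /* current head of the wrapper chain */
    EnterVTProc layer_saved;  /* proc wrapped by the outer layer */
    EnterVTProc once_saved;   /* proc wrapped by the one-shot layer */
    int driver_calls;
    int once_calls;
};

/* Bottom of the chain: stands in for the driver's own EnterVT. */
static bool driver_enter_vt(Screen *s)
{
    s->driver_calls++;
    return true;
}

/* A wrapper that removes itself from the chain on its first call. */
static bool once_enter_vt(Screen *s)
{
    s->once_calls++;
    s->EnterVT = s->once_saved;   /* unwrap and stay unwrapped */
    return s->EnterVT(s);
}

/* A wrapper that unwraps and rewraps completely. */
static bool layer_enter_vt(Screen *s)
{
    bool ok;

    s->EnterVT = s->layer_saved;  /* unwrap: expose the chain below us */
    ok = s->EnterVT(s);           /* call down through the pointer */
    s->layer_saved = s->EnterVT;  /* pick up any change made below */
    s->EnterVT = layer_enter_vt;  /* rewrap */
    return ok;
}

/* Build the chain: layer -> once -> driver. */
Screen make_screen(void)
{
    Screen s = { 0 };

    s.EnterVT = driver_enter_vt;
    s.once_saved = s.EnterVT;
    s.EnterVT = once_enter_vt;
    s.layer_saved = s.EnterVT;
    s.EnterVT = layer_enter_vt;
    return s;
}
```

In this sketch, calling `s.EnterVT(&s)` twice on a screen from `make_screen()` runs the one-shot wrapper only on the first call, because the outer layer re-reads the pointer after calling down. A wrapper that instead kept calling its originally saved pointer directly would invoke the removed wrapper again on every later VT switch.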
Comment 4 Jochen Keil 2010-12-11 15:29:39 UTC
Created attachment 41022
fix for the crash-on-vtchange-problem
Comment 5 Jochen Keil 2010-12-11 15:32:38 UTC
Comment on attachment 41022
fix for the crash-on-vtchange-problem

(Sorry, had a little browser-bugtracking-system-fight)

Hello guys,

After playing around a bit I came up with a patch for this issue. Since I don't know Xorg in detail, this might be completely wrong (which I suspect, since it more or less restores the old behaviour).

The fix I propose implies that there is probably some pointer or memory corruption. However, that's just a guess, since I do not know what pScrn, CMapScreenPtr and XF86XVScreenPtr are actually used for.

Please have a look at it and tell me whether it's OK or whether I'm just reverting to the old behaviour.
Comment 6 Daniel Stone 2010-12-12 08:36:09 UTC
On Sat, Dec 11, 2010 at 03:32:39PM -0800, bugzilla-daemon@freedesktop.org wrote:
> Please have a look at it and tell me if it's ok or if I'm just kind of
> reverting the old behaviour.

That's pretty much just reverting to the old behaviour, yeah.
Comment 7 Jochen Keil 2010-12-12 11:36:01 UTC
On 12.12.2010 17:36, bugzilla-daemon@freedesktop.org wrote:
>> Please have a look at it and tell me if it's ok or if I'm just kind
>> of reverting the old behaviour.
>
> That's pretty much just reverting to the old behaviour, yeah.

I was afraid of that. Is there something else I could try?
The whole thing looks prone to a race condition, IMHO.

Is there some kind of documentation for this?
What do EnterVT/LeaveVT actually do? What's the purpose of the
XF86XVScreenRec struct?

Another workaround would be to disable the XFree86-VidModeExtension and
XVideo extensions, but I haven't tried that yet.
Comment 8 Jochen Keil 2011-03-08 13:14:56 UTC
Created attachment 44248
revert commit d75e8146c414bfd512ba5dbd4a83acb334bbe19b

This patch reverts the whole commit. It is useful for nvidia users who experience this bug. An nvidia employee said in the nvnews forum that there will be a fix for this in a future driver release. Let's hope for the best. Until then, reverting this particular commit might help.
Comment 9 Jeremy Huddleston 2011-10-31 17:10:58 UTC
Please send your patch to xorg-devel for review.
Comment 10 Keith Packard 2011-12-23 23:45:15 UTC
I'm not interested in any patches not coming from someone with source to the nVidia driver in question. Without sources, there's no way for anyone to know what (if any) fix in the X server would be correct. Given that someone at nVidia has apparently promised a fix at some point, I'm closing this bug with the assumption that it's just a bug in their driver.