Bug 91316

Summary: Xorg crashes a lot
Product: xorg Reporter: Karol Herbst <karolherbst>
Component: Server/DDX/XorgAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: medium CC: bugzilla, fourdan, jlp.bugs, martin.peres, mike, r9ku1q, slyfox, tobias.johannes.klausmann
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
xorg.log
none
xorg.conf
none
os: Fix upper bound for looping over fd_set.fd_mask[]
none
os: make sure the clientsWritable fd_set is initialized before use
none
os: make sure the clientsWritable fd_set is initialized before use none

Description Karol Herbst 2015-07-12 13:23:20 UTC
Created attachment 117074 [details]
xorg.log

Since one or two days Xorg crashes for me in two different situations:

1. loading the nouveau driver
2. switching or closing tabs in chromium

I am using:
xorg-server-1.17.2
linux-4.1.1
mesa-master
x11-drivers/xf86-video-intel-2.99.917 with DRI3 enabled

If I sould add more information, please ask.
Comment 1 Karol Herbst 2015-07-12 13:23:39 UTC
Created attachment 117075 [details]
xorg.conf
Comment 2 Karol Herbst 2015-07-14 12:52:35 UTC
got a better backtrace now:

#0  0x00000000005e3d30 in FlushAllOutput () at /usr/src/debug/x11-base/xorg-server-1.17.2/xorg-server-1.17.2/os/io.c:675
#1  0x00000000005dbcac in WaitForSomething (pClientsReady=pClientsReady@entry=0x2ea41e0) at /usr/src/debug/x11-base/xorg-server-1.17.2/xorg-server-1.17.2/os/WaitFor.c:217
#2  0x000000000043b7f1 in Dispatch () at /usr/src/debug/x11-base/xorg-server-1.17.2/xorg-server-1.17.2/dix/dispatch.c:361
#3  0x0000000000440cb4 in dix_main (argc=10, argv=0x7ffde9059dc8, envp=<optimized out>) at /usr/src/debug/x11-base/xorg-server-1.17.2/xorg-server-1.17.2/dix/main.c:298
#4  0x000000327d620050 in __libc_start_main (main=0x427840 <main>, argc=10, argv=0x7ffde9059dc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffde9059db8) at libc-start.c:289
#5  0x000000000042786e in _start ()
Comment 3 Karol Herbst 2015-07-15 06:43:07 UTC
I should add some information:

distro: Gentoo

video cards:
Intel HD 4600
Nvidia GTX 770M without any outputs
Comment 4 Karol Herbst 2015-07-15 10:59:21 UTC
actually the nouveau related issue may be a completly different one, and backtrace is the one related to the chromium triggered issue
Comment 5 Ilia Mirkin 2015-07-16 21:02:55 UTC
Looks like xproto commit http://cgit.freedesktop.org/xorg/proto/xproto/commit/?id=2c94cdb453bc641246cc8b9a876da9799bee1ce7 increased XFD_SETSIZE to 512, but MAXCLIENTS has remained at 256. This causes out-of-bounds accesses to the clients array.
Comment 6 Chris Wilson 2015-07-16 21:10:26 UTC
That doesn't follow though, since we still never should populate the fd_set above MaxClients and so should never access e.g. ConnectionTranslation[MaxClients] above its bounds.
Comment 7 Karol Herbst 2015-07-16 21:19:16 UTC
will try to see if xproto is the culprit for sure
Comment 8 Tobias Klausmann 2015-07-16 21:24:14 UTC
Another backtrace:

[  6747.078] AllocNewConnection: client index = 37, socket fd = 69
[  6748.073] (EE) 
[  6748.073] (EE) Backtrace:
[  6748.076] (EE) 0: /usr/bin/X (xorg_backtrace+0x41) [0x58df11]
[  6748.076] (EE) 1: /usr/bin/X (0x400000+0x1921a9) [0x5921a9]
[  6748.076] (EE) 2: /lib64/libc.so.6 (0x7f87ca9f6000+0x336c0) [0x7f87caa296c0]
[  6748.076] (EE) 3: /usr/bin/X (FlushAllOutput+0xa6) [0x591786]
[  6748.076] (EE) 4: /usr/bin/X (WaitForSomething+0x6e5) [0x58b395]
[  6748.076] (EE) 5: /usr/bin/X (0x400000+0x3cec1) [0x43cec1]
[  6748.076] (EE) 6: /usr/bin/X (0x400000+0x4108b) [0x44108b]
[  6748.076] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f87caa16790]
[  6748.076] (EE) 8: /usr/bin/X (_start+0x29) [0x42c579]
[  6748.076] (EE) 
[  6748.076] (EE) Segmentation fault at address 0x313ce37f8
[  6748.076] (EE) 
Fatal server error:
[  6748.076] (EE) Caught signal 11 (Segmentation fault). Server aborting
[  6748.076] (EE) 
[  6748.076] (EE) 

As Ilia already noted: it seems to be xproto 7.0.28. I am testing the issue right now (it crashes randomly for me after a while, so it takes a while before i can say for sure it is xproto).
Comment 9 Karol Herbst 2015-07-16 22:31:56 UTC
so far a xproto downgrade helped me here
Comment 10 Chris Wilson 2015-07-16 22:40:11 UTC
*** Bug 91367 has been marked as a duplicate of this bug. ***
Comment 11 Tobias Klausmann 2015-07-16 22:55:39 UTC
(In reply to Karol Herbst from comment #9)
> so far a xproto downgrade helped me here

Yep, it didn't crash for a while now (normally it would have crashed several times). I'm pretty sure now it is the increase of 256 to 512 clients
Comment 12 Mike Lothian 2015-07-17 02:10:41 UTC
Thanks for pointing this out - I was looking in all the wrong places for commits that might be all fault
Comment 13 Chris Wilson 2015-07-17 10:17:49 UTC
Created attachment 117194 [details] [review]
os: Fix upper bound for looping over fd_set.fd_mask[]
Comment 14 Olivier Fourdan 2015-07-17 14:28:08 UTC
(In reply to Chris Wilson from comment #13)
> Created attachment 117194 [details] [review] [review]
> os: Fix upper bound for looping over fd_set.fd_mask[]
> 
> the actual fd_set.fd_mask[] was not increased as that is defined in
> terms of FD_SETSIZE and is used by all iterators inside Xpoll.h.
> 
> X however does iterate over XFD_SETSIZE instead. The result being that
> we increased the upper bound without increasing the array size with the
> result we were reading garbage and subsequently crashing.

I agree with the patch but I wonder if this patch sent upstream would address the mismatch between XFD_SETSIZE and FD_SETSIZE:

http://lists.x.org/archives/xorg-devel/2015-July/046946.html

And fix the issues in xproto instead?
Comment 15 Martin Peres 2015-07-17 14:43:14 UTC
Created attachment 117199 [details] [review]
os: make sure the clientsWritable fd_set is initialized before use

This patch should fix the crashes. Please test it this week end and I will send it for inclusion on Monday!

Thanks to everyone involved for helping tracking this down!
Comment 16 Chris Wilson 2015-07-17 14:58:32 UTC
(In reply to Olivier Fourdan from comment #14)
> I agree with the patch but I wonder if this patch sent upstream would
> address the mismatch between XFD_SETSIZE and FD_SETSIZE:
> 
> http://lists.x.org/archives/xorg-devel/2015-July/046946.html
> 
> And fix the issues in xproto instead?

Since this isn't the bug, we may as well fix the loop iterator in xproto.
Comment 17 Martin Peres 2015-07-17 15:14:24 UTC
Created attachment 117200 [details] [review]
os: make sure the clientsWritable fd_set is initialized before use

Updated the commit message to really explain what is going on!
Comment 18 Tobias Klausmann 2015-07-17 18:43:39 UTC
I'm happy with either change (Chris, Martin). I guess we should push these out to the repo soon (and maybe even make another bugfix release for the xserver)
Comment 19 Karol Herbst 2015-07-17 23:04:16 UTC
okay, with the patch from Martin I don't get crashes anymore, thanks a lot.
Comment 20 Karol Herbst 2015-07-20 22:45:01 UTC
still no crash :)
Comment 21 Timo Aaltonen 2015-09-08 06:03:48 UTC
Neither of these patches made upstream yet, and the crasher also happens with older servers that are built against the new proto. So what's the plan here?
Comment 22 Martin Peres 2015-09-08 06:49:18 UTC
(In reply to Timo Aaltonen from comment #21)
> Neither of these patches made upstream yet, and the crasher also happens
> with older servers that are built against the new proto. So what's the plan
> here?

The problem is fixed upstream. Just not with my patch.

You will need X from git if you want to use the latest xproto.
Comment 23 Timo Aaltonen 2015-09-08 07:06:52 UTC
That's not too helpful for users of earlier servers, or is the proper fix there to revert to earlier xproto?
Comment 24 Timo Aaltonen 2015-09-08 07:08:45 UTC
*** Bug 91607 has been marked as a duplicate of this bug. ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.