Bug 22679

Summary:	Xorg sporadically crashes during first session after boot when running compiz
Product:	xorg	Reporter:	Ben Gamari <bgamari>
Component:	Server/General	Assignee:	Xorg Project Team <xorg-team>
Status:	RESOLVED INVALID	QA Contact:	Xorg Project Team <xorg-team>
Severity:	normal
Priority:	medium	CC:	eric225125
Version:	git
Hardware:	Other
OS:	All

Description Ben Gamari 2009-07-08 19:36:10 UTC

If I run compiz for more than 5 minutes, Xorg will almost certainly crash. This only seems to be true for the first Xorg session after the machine has been started. I have not seen another crash after the initial one, despite running compiz. A rather unhelpful (to me) backtrace is reproduced below.

GDB session:
(gdb) bt
#0  0x00007f529a7948e3 in select () from /lib/libc.so.6
#1  0x0000000000478fb9 in WaitForSomething (pClientsReady=0x46d27b0) at WaitFor.c:230
#2  0x0000000000455048 in Dispatch () at dispatch.c:362
#3  0x00000000004268be in main (argc=8, argv=0x7fff39890158, envp=0x7fff398901a0) at main.c:283
(gdb) print MaxClients
$1 = 256
(gdb) print LastSelectMask
$2 = {fds_bits = {2251799813554858, 0 <repeats 15 times>}}
(gdb) print wt
$3 = (struct timeval *) 0x7fff3988ff40
(gdb) print *wt
$4 = {tv_sec = 0, tv_usec = 252709}
(gdb) list
225		    XFD_COPYSET(&ClientsWriteBlocked, &clientsWritable);
226		    i = Select (MaxClients, &LastSelectMask, &clientsWritable, NULL, wt);
227		}
228		else 
229		{
230		    i = Select (MaxClients, &LastSelectMask, NULL, NULL, wt);
231		}
232		selecterr = GetErrno();
233		WakeupHandler(i, (pointer)&LastSelectMask);
234		SmartScheduleStartTimer ();
(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
0x00007f529a6e9025 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f529a6e9025 in raise () from /lib/libc.so.6
#1  0x00007f529a6eac33 in abort () from /lib/libc.so.6
#2  0x0000000000493e5c in ddxGiveUp () at xf86Init.c:1397
#3  0x0000000000493f5c in AbortDDX () at xf86Init.c:1442
#4  0x000000000046e095 in AbortServer () at log.c:404
#5  0x000000000046e5a3 in FatalError (f=0x616c80 "Caught signal %d (%s). Server aborting\n") at log.c:529
#6  0x000000000047fc06 in OsSigHandler (signo=3, sip=0x7fff3988f9f0, unused=0x7fff3988f8c0) at osinit.c:152
#7  <signal handler called>
#8  0x00007f529a7948e3 in select () from /lib/libc.so.6
#9  0x0000000000478fb9 in WaitForSomething (pClientsReady=0x46d27b0) at WaitFor.c:230
#10 0x0000000000455048 in Dispatch () at dispatch.c:362
#11 0x00000000004268be in main (argc=8, argv=0x7fff39890158, envp=0x7fff398901a0) at main.c:283

Comment 1 Ben Gamari 2009-07-08 23:45:04 UTC

Preliminary systemtap results seem to indicate the signal is from keventd. Here is the siginfo:

Breakpoint 1, OsSigHandler (signo=3, sip=0x7fff9372d1b0, unused=0x7fff9372d080) at osinit.c:127
127       if (OsSigWrapper != NULL) {
(gdb) print sip
$1 = (siginfo_t *) 0x7fff9372d1b0
(gdb) print *sip
$2 = {si_signo = 3, si_errno = 0, si_code = 128, _sifields = {_pad = {0, 0, 4439628, 0, -1, 250, -173712783, 2, 2097155, 1, -1821191168, 32767, 76704400, 0, 95228656, 0,
      76704400, 0, 3, 1, 0, 0, 76704400, 0, -1821191280, 32767, 4440141, 0}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = {
        sival_int = 4439628, sival_ptr = 0x43be4c}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 4439628, sival_ptr = 0x43be4c}}, _sigchld = {si_pid = 0,
      si_uid = 0, si_status = 4439628, si_utime = 1078036791295, si_stime = 12711189105}, _sigfault = {si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 4439628}}}

Comment 2 Ben Gamari 2009-07-08 23:55:12 UTC

Here is the event as traced by systemtap:

[0249 ben@ben-laptop ~] $ sudo stap -vv sigquit.stap
[sudo] password for ben:
SystemTap translator/driver (version 0.9.8/0.141 non-git sources)
Copyright (C) 2005-2009 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
Session arch: x86_64 release: 2.6.31-rc2-ben
Created temporary directory "/tmp/stap7Gig8j"
Searched '/usr/share/systemtap/tapset/x86_64/*.stp', found 3
Searched '/usr/share/systemtap/tapset/*.stp', found 51
Pass 1: parsed user script and 54 library script(s) in 230usr/20sys/502real ms.
probe __send_signal@:-1 kernel reloc=.dynamic section=.text pc=0xffffffff8106b300
probe send_sigqueue@kernel/signal.c:1345 kernel reloc=.dynamic section=.text pc=0xffffffff8106b1a0
probe force_sig@kernel/signal.c:1259 kernel reloc=.dynamic section=.text pc=0xffffffff8106b6d0
probe send_sig@kernel/signal.c:1253 kernel reloc=.dynamic section=.text pc=0xffffffff8106c700
probe send_sig_info@kernel/signal.c:1231 kernel reloc=.dynamic section=.text pc=0xffffffff8106c670
probe force_sig_info@kernel/signal.c:986 kernel reloc=.dynamic section=.text pc=0xffffffff8106b5c0
WARNING: read-only local variable 'pid_name' (alternatives: sig_name sig_pid): identifier 'pid_name' at sigquit.stap:37:15
 source:                                sig_name, pid_name, sig_pid, execname(), uid())
                                                  ^
WARNING: read-only local variable 'sig_pid' (alternatives: sig_name pid_name): identifier 'sig_pid' at :37:25
 source:                                sig_name, pid_name, sig_pid, execname(), uid())
                                                            ^
Pass 2: analyzed script: 6 probe(s), 13 function(s), 18 embed(s), 0 global(s) in 1040usr/2120sys/206343real ms.
Pass 3: using cached /home/ben/.systemtap/cache/76/stapconf_766082f19d0d792182ae7d8592ffdb3e_480.h
Pass 3: using cached /home/ben/.systemtap/cache/a5/stap_a5993b4da2481ff59a7d3fabcfcdf00c_17760.c
Pass 4: using cached /home/ben/.systemtap/cache/a5/stap_a5993b4da2481ff59a7d3fabcfcdf00c_17760.ko
Pass 5: starting run.
Running /usr/bin/staprun -v /tmp/stap7Gig8j/stap_a5993b4da2481ff59a7d3fabcfcdf00c_17760.ko
send_signal: SIGQUIT was sent to X (pid:2787) by events/1 uid:0
 0xffffffff8106b301 : T.649+0x1/0x2c0 [kernel]
 0xffffffff8106b8f3 : __group_send_sig_info+0x13/0x20 [kernel]
 0xffffffff8106c254 : group_send_sig_info+0x54/0x90 [kernel]
 0xffffffff8106c428 : __kill_pgrp_info+0x48/0x80 [kernel]
 0xffffffff8106c4a0 : kill_pgrp+0x40/0x60 [kernel]
 0xffffffff812eab52 : n_tty_receive_buf+0x482/0x12e0 [kernel]
 0xffffffff812ee373 : flush_to_ldisc+0x103/0x1d0 [kernel]
 0xffffffff81070d0a : worker_thread+0x15a/0x280 [kernel]
 0xffffffff81075cbe : kthread+0x9e/0xb0 [kernel]
 0xffffffff8101312a : child_rip+0xa/0x20 [kernel]
 0xffffffff81075c20 : kthread+0x0/0xb0 [kernel] (inexact)
 0xffffffff81013120 : child_rip+0x0/0x20 [kernel] (inexact)

Comment 3 Adam Jackson 2018-06-11 20:43:14 UTC

(In reply to Ben Gamari from comment #2)

> send_signal: SIGQUIT was sent to X (pid:2787) by events/1 uid:0
>  0xffffffff8106b301 : T.649+0x1/0x2c0 [kernel]
>  0xffffffff8106b8f3 : __group_send_sig_info+0x13/0x20 [kernel]
>  0xffffffff8106c254 : group_send_sig_info+0x54/0x90 [kernel]
>  0xffffffff8106c428 : __kill_pgrp_info+0x48/0x80 [kernel]
>  0xffffffff8106c4a0 : kill_pgrp+0x40/0x60 [kernel]
>  0xffffffff812eab52 : n_tty_receive_buf+0x482/0x12e0 [kernel]
>  0xffffffff812ee373 : flush_to_ldisc+0x103/0x1d0 [kernel]

This is the tty layer saying you hit ^\ (or whatever else you have mapped to SIGQUIT with stty). That's almost certainly not X's fault, though there have been cases where the input drivers didn't put the tty far enough into raw mode, so the kernel would still process those key events.

If you think that's what's happening to you, please open a new bug so we can track it there.
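
To illustrate the raw-mode point: the line discipline only turns the INTR, QUIT and SUSP characters into signals while ISIG is set in c_lflag. The following is a minimal sketch of clearing that flag with termios; it is not the X server's or any input driver's actual code, and both function names are hypothetical.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: clear ISIG so the tty line discipline stops
 * translating INTR/QUIT/SUSP characters (e.g. ^\ for SIGQUIT) into
 * signals. Other local-mode flags are left untouched. */
static void termios_clear_signal_chars(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)ISIG;
}

/* Hypothetical wrapper: apply the change to an open tty.
 * If 'saved' is non-NULL, the previous settings are stored there so
 * the caller can restore them later with tcsetattr().
 * Returns 0 on success, -1 on error (errno is set by the termios call). */
static int tty_disable_signal_chars(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    if (saved != NULL)
        *saved = t;
    termios_clear_signal_chars(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```

Running stty -a on the affected VT shows whether isig is currently set and which character is mapped to quit.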
