Bug 77457

Summary:	SIGABRT and SIGSEGV in epoll_wait during wl_event_loop_dispatch
Product:	Wayland	Reporter:	Anu Reddy <anasuyax.r.nannuri>
Component:	weston	Assignee:	Wayland bug list <wayland-bugs>
Status:	VERIFIED NOTABUG	QA Contact:
Severity:	normal
Priority:	medium
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Attachments:	gdb-backtrace1 gdb-backtrace2

Description Anu Reddy 2014-04-15 00:08:09 UTC

When weston receives SIGABRT, it aborts in epoll_wait(). When weston is killed with SIGSEGV, it reports a segmentation fault in epoll_wait().


Steps:
1. Launch: $ weston
2. Execute: $ killall -SIGABRT weston
3. Execute: $ killall -SIGSEGV weston
4. See the attached gdb backtraces


Software Stack
==============
Kernel: 3.13.6-200.fc20.x86_64
Systemd: 212 (rawhide)
wayland (HEAD) 1.4.91-0-g5e2cfd2
drm (HEAD) libdrm-2.4.52-0-g46d451c
mesa (HEAD) mesa-10.1-0-g4a86465
libva (HEAD) libva-1.2.1-0-g88ed1eb
intel-driver (HEAD) 1.2.2-0-g121e70d
cairo (HEAD) heads/1.12-0-g59e2a93
libinput (HEAD) remotes/origin/HEAD-0-gc5c503c
weston (HEAD) 1.4.91-0-g79d5a6e

Comment 1 Anu Reddy 2014-04-15 00:09:57 UTC

Created attachment 97372 [details]
gdb-backtrace1

Comment 2 Anu Reddy 2014-04-15 00:10:33 UTC

Created attachment 97373 [details]
gdb-backtrace2

Comment 3 Pekka Paalanen 2014-04-15 07:22:07 UTC

What would you expect to happen, when a compositor receives a SEGV or ABRT?
What should work differently than it does now?
I mean, what is the problem you see here?

To me it seems like a process getting a SEGV or ABRT should die, so I'm not sure what the bug here is.

Comment 4 U. Artie Eoff 2014-04-15 15:58:22 UTC

Correct.  SIGSEGV and SIGABRT are meant to trigger a segmentation fault and abort, respectively.  The point of testing these signals is to ensure weston or weston-launch exit codes reflect this appropriately.  It would be bad if the exit code returned 0, for instance.
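
The exit-code check described above can be sketched from the test harness side. This is a minimal sketch, not the actual QA tooling: it uses Python's subprocess module and a stand-in child process instead of weston, and the names CHILD, returncode_after_signal and crashed are hypothetical. subprocess reports a child killed by signal N as return code -N, so any death by signal shows up as non-zero.

```python
import signal
import subprocess
import sys

# Stand-in child (assumption): prints "ready" and then blocks, so the
# parent knows when it is safe to send a signal. A real rig would launch
# the compositor instead and would need its own readiness check.
CHILD = [
    sys.executable,
    "-c",
    "import time; print('ready', flush=True); time.sleep(30)",
]


def returncode_after_signal(sig, cmd=CHILD):
    """Start cmd, send it sig once it is running, and return its return code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()  # block until the child has printed "ready"
        proc.send_signal(sig)
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:  # clean up if the child is still alive
            proc.kill()
            proc.wait()
        proc.stdout.close()


def crashed(returncode):
    """A test should record a crash for any non-zero return code."""
    return returncode != 0


if __name__ == "__main__":
    for sig in (signal.SIGABRT, signal.SIGSEGV):
        rc = returncode_after_signal(sig)
        print(sig.name, "return code:", rc, "crashed:", crashed(rc))
```

Note that a shell reports the same situation differently (typically 128 plus the signal number), but the result is non-zero either way.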

Comment 5 Pekka Paalanen 2014-04-15 17:02:41 UTC

I just remembered that weston does have handlers for these two signals. They both execute on_caught_signal(), around http://cgit.freedesktop.org/wayland/weston/tree/src/compositor.c#n3885 which then raises SIGTRAP.

This might affect the process exit code, but surely it won't exit with status 0, right? But the exit code might not reflect ABRT or SEGV properly. Not sure if that could be a problem.
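
To illustrate why the reported signal can differ from the one that was sent, here is a small Python analogue of a handler that raises SIGTRAP when it catches SIGSEGV or SIGABRT. It is not weston's C code: the child script, the handler body and the helper name returncode_with_trap_handler are assumptions made for illustration only.

```python
import signal
import subprocess
import sys

# Hypothetical child script: installs one handler for both signals, and the
# handler raises SIGTRAP, whose default action terminates the process.
CHILD_SOURCE = """
import os, signal, time

def on_caught_signal(signum, frame):
    os.kill(os.getpid(), signal.SIGTRAP)

signal.signal(signal.SIGSEGV, on_caught_signal)
signal.signal(signal.SIGABRT, on_caught_signal)
print("ready", flush=True)  # handlers are installed from here on
time.sleep(30)
"""


def returncode_with_trap_handler(sig):
    """Send sig to the child and return the return code the parent observes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE], stdout=subprocess.PIPE
    )
    try:
        proc.stdout.readline()  # wait until the handlers are installed
        proc.send_signal(sig)
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    for sig in (signal.SIGABRT, signal.SIGSEGV):
        rc = returncode_with_trap_handler(sig)
        print("sent", sig.name, "-> died from", signal.Signals(-rc).name)
```

In this sketch the parent sees a death by SIGTRAP for both signals: still non-zero, but no longer identifying ABRT or SEGV.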

Comment 6 Anu Reddy 2014-04-15 17:30:26 UTC

SIGTRAP, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code. But when weston receives SIGABRT and SIGTRAP signals, I see below message on tty.

Trace/breakpoint trap (core dumped) – (weston)_main

Comment 7 Anu Reddy 2014-04-15 17:33:29 UTC

I mean....

SIGSEGV, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
But when weston receives SIGABRT and SIGSEGV signals, I see below message on
tty.

Trace/breakpoint trap (core dumped) – (weston)_main



(In reply to comment #6)
> SIGTRAP, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
> But when weston receives SIGABRT and SIGTRAP signals, I see below message on
> tty.

Trace/breakpoint trap (core dumped) – (weston)_main

Comment 8 U. Artie Eoff 2014-04-15 17:53:04 UTC

(In reply to comment #5)
> I just remembered that weston does have handlers for these two signals. They
> both execute on_caught_signal(), around
> http://cgit.freedesktop.org/wayland/weston/tree/src/compositor.c#n3885 which
> then raises SIGTRAP.
> 
> This might affect the process exit code, but surely it won't exit with
> status 0, right? But the exit code might not reflect ABRT or SEGV properly.
> Not sure if that could be a problem.

Right. As long as weston exits with any non-zero status for these signals, then it shouldn't be a problem in most cases.  Basically, when doing test automation we want to be sure that a weston crash is detectable with a non-zero exit code so that the test result can reflect that correctly.

Comment 9 Pekka Paalanen 2014-04-16 05:33:37 UTC

(In reply to comment #7)
> I mean....
> 
> SIGSEGV, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
> But when weston receives SIGABRT and SIGSEGV signals, I see below message on
> tty.
> 
> Trace/breakpoint trap (core dumped) – (weston)_main

Yes, that is the current intended behaviour. If you read the comment in on_caught_signal(), you'll see why it's there.

If we were to re-raise the SEGV or ABRT, using gdb might be harder. OTOH, we could have a command line switch for choosing gdb-friendly operation.

But, like Artie said, until someone sees a practical problem here, things should be ok as is.
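
The idea of a switch between the two behaviours can be sketched as follows. This is a hypothetical illustration in Python, not a proposed weston patch: the --reraise flag, the child script and the helper name returncode_for are all invented here. With the flag, the handler restores the default disposition and re-sends the original signal, so the parent sees SIGABRT or SIGSEGV. Without it, the handler raises SIGTRAP.

```python
import signal
import subprocess
import sys

# Hypothetical child script with an invented "--reraise" switch.
CHILD_SOURCE = """
import os, signal, sys, time

RERAISE = "--reraise" in sys.argv

def handler(signum, frame):
    if RERAISE:
        # Restore the default action, then re-send the original signal.
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)
    else:
        # Debugger-friendly mode: stop with SIGTRAP instead.
        os.kill(os.getpid(), signal.SIGTRAP)

for sig in (signal.SIGSEGV, signal.SIGABRT):
    signal.signal(sig, handler)
print("ready", flush=True)
time.sleep(30)
"""


def returncode_for(sig, reraise):
    """Run the child in the chosen mode, send sig, and return its return code."""
    cmd = [sys.executable, "-c", CHILD_SOURCE]
    if reraise:
        cmd.append("--reraise")
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()  # wait until the handlers are installed
        proc.send_signal(sig)
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    for reraise in (False, True):
        rc = returncode_for(signal.SIGABRT, reraise)
        print("reraise =", reraise, "-> died from", signal.Signals(-rc).name)
```

Which mode should be the default is a design choice: re-raising gives test automation the exact signal, while the SIGTRAP path is the one described above as easier to use with gdb.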
