Bug 77457

Summary: SIGABRT and SIGSEGV in epoll_wait during wl_event_loop_dispatch
Product: Wayland Reporter: Anu Reddy <anasuyax.r.nannuri>
Component: weston Assignee: Wayland bug list <wayland-bugs>
Status: VERIFIED NOTABUG QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: gdb-backtrace1
gdb-backtrace2

Description Anu Reddy 2014-04-15 00:08:09 UTC
When weston receives SIGABRT, it aborts in epoll_wait(). When weston is killed with SIGSEGV, it reports a segmentation fault in epoll_wait().


Steps:
1. Launch: $ weston
2. Execute: $ killall -SIGABRT weston
3. Execute: $ killall -SIGSEGV weston
4. See the attached gdb backtraces


Software Stack
==============
Kernel: 3.13.6-200.fc20.x86_64
Systemd: 212 (rawhide)
wayland (HEAD) 1.4.91-0-g5e2cfd2
drm (HEAD) libdrm-2.4.52-0-g46d451c
mesa (HEAD) mesa-10.1-0-g4a86465
libva (HEAD) libva-1.2.1-0-g88ed1eb
intel-driver (HEAD) 1.2.2-0-g121e70d
cairo (HEAD) heads/1.12-0-g59e2a93
libinput (HEAD) remotes/origin/HEAD-0-gc5c503c
weston (HEAD) 1.4.91-0-g79d5a6e
Comment 1 Anu Reddy 2014-04-15 00:09:57 UTC
Created attachment 97372 [details]
gdb-backtrace1
Comment 2 Anu Reddy 2014-04-15 00:10:33 UTC
Created attachment 97373 [details]
gdb-backtrace2
Comment 3 Pekka Paalanen 2014-04-15 07:22:07 UTC
What would you expect to happen, when a compositor receives a SEGV or ABRT?
What should work differently than it does now?
I mean, what is the problem you see here?

To me it seems like a process getting a SEGV or ABRT should die, so I'm not sure what the bug here is.
Comment 4 U. Artie Eoff 2014-04-15 15:58:22 UTC
Correct.  SIGSEGV and SIGABRT are meant to trigger a segmentation fault and abort, respectively.  The point of testing these signals is to ensure weston or weston-launch exit codes reflect this appropriately.  It would be bad if the exit code returned 0, for instance.
Comment 5 Pekka Paalanen 2014-04-15 17:02:41 UTC
I just remembered that weston does have handlers for these two signals. They both execute on_caught_signal(), around http://cgit.freedesktop.org/wayland/weston/tree/src/compositor.c#n3885 which then raises SIGTRAP.

This might affect the process exit code, but surely it won't exit with status 0, right? But the exit code might not reflect ABRT or SEGV properly. Not sure if that could be a problem.
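
The effect on the exit status can be sketched with a stand-in process. This is a minimal illustration in Python, not weston's code: the child script and the helper `terminating_signal` are hypothetical, and the handler only mimics the behaviour described above (catch SIGSEGV or SIGABRT, then raise SIGTRAP).

```python
import signal
import subprocess
import sys

# Stand-in for the compositor (hypothetical, not weston's code): it catches
# SIGSEGV and SIGABRT and raises SIGTRAP from the handler, as described above.
CHILD = r"""
import os, resource, signal

# Best effort: ask the kernel not to write a core file for this demo.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

def on_caught_signal(signum, frame):
    os.kill(os.getpid(), signal.SIGTRAP)

signal.signal(signal.SIGSEGV, on_caught_signal)
signal.signal(signal.SIGABRT, on_caught_signal)
print("ready", flush=True)
while True:
    signal.pause()
"""


def terminating_signal(sent):
    """Send `sent` to the stand-in and return the signal that killed it."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()          # wait until the handlers are installed
    proc.send_signal(sent)
    returncode = proc.wait()
    proc.stdout.close()
    # subprocess reports death by signal N as a return code of -N
    assert returncode < 0, "child was expected to die from a signal"
    return signal.Signals(-returncode)


if __name__ == "__main__":
    for sent in (signal.SIGABRT, signal.SIGSEGV):
        print(sent.name, "->", terminating_signal(sent).name)
```

In this sketch the stand-in dies from SIGTRAP whichever of the two signals it was sent, so the status is non-zero but no longer names the original signal.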
Comment 6 Anu Reddy 2014-04-15 17:30:26 UTC
SIGTRAP, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code. But when weston receives SIGABRT and SIGTRAP signals, I see below message on tty.

Trace/breakpoint trap (core dumped) – (weston)_main
Comment 7 Anu Reddy 2014-04-15 17:33:29 UTC
I mean....

SIGSEGV, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
But when weston receives SIGABRT and SIGSEGV signals, I see below message on
tty.

Trace/breakpoint trap (core dumped) – (weston)_main



(In reply to comment #6)
> SIGTRAP, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
> But when weston receives SIGABRT and SIGTRAP signals, I see below message on
> tty.

Trace/breakpoint trap (core dumped) – (weston)_main
Comment 8 U. Artie Eoff 2014-04-15 17:53:04 UTC
(In reply to comment #5)
> I just remembered that weston does have handlers for these two signals. They
> both execute on_caught_signal(), around
> http://cgit.freedesktop.org/wayland/weston/tree/src/compositor.c#n3885 which
> then raises SIGTRAP.
> 
> This might affect the process exit code, but surely it won't exit with
> status 0, right? But the exit code might not reflect ABRT or SEGV properly.
> Not sure if that could be a problem.

Right. As long as weston exits with any non-zero status for these signals, then it shouldn't be a problem in most cases.  Basically, when doing test automation we want to be sure that a weston crash is detectable with a non-zero exit code so that the test result can reflect that correctly.
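
For automation along these lines, a harness only needs to tell a clean exit apart from everything else. A minimal sketch in Python, assuming the harness launches the process under test itself through `subprocess`; the helper names `describe_exit`, `crashed` and `run` are made up for illustration and are not part of any weston test suite.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        # subprocess reports death by signal N as -N
        return "killed by " + signal.Signals(-returncode).name
    return "exit status %d" % returncode


def crashed(returncode):
    """Any non-zero result counts as a failure for the test report."""
    return returncode != 0


def run(snippet):
    """Run a small Python snippet as a stand-in for the process under test."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


if __name__ == "__main__":
    for snippet in ("pass",
                    "import sys; sys.exit(3)",
                    "import os, signal; os.kill(os.getpid(), signal.SIGTRAP)"):
        code = run(snippet)
        print(describe_exit(code), "| crashed:", crashed(code))
```

When the process is started from a POSIX shell instead, death by signal N shows up as `$?` equal to 128 + N, which is still non-zero and therefore still detectable.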
Comment 9 Pekka Paalanen 2014-04-16 05:33:37 UTC
(In reply to comment #7)
> I mean....
> 
> SIGSEGV, SIGABRT and SIGKILL signals are exiting with 'non zero' exit code.
> But when weston receives SIGABRT and SIGSEGV signals, I see below message on
> tty.
> 
> Trace/breakpoint trap (core dumped) – (weston)_main

Yes, that is the current intended behaviour. If you read the comment in on_caught_signal(), you'll see why it's there.

If we were to re-raise the SEGV or ABRT, using gdb might be harder. OTOH, we could have a command line switch for choosing gdb-friendly operation.

But, like Artie said, until someone sees a practical problem here, things should be ok as is.
