Bug 42285 - plymouth signal handlers modify linked lists resulting in frequent crashes
Summary: plymouth signal handlers modify linked lists resulting in frequent crashes
Status: RESOLVED FIXED
Alias: None
Product: plymouth
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium major
Assignee: Ray Strode [halfline]
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-26 09:33 UTC by James Hunt
Modified: 2012-06-08 12:12 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description James Hunt 2011-10-26 09:33:30 UTC
I've been investigating a couple of Ubuntu bugs where plymouth SIGSEGVs:

https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/849414
https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/553745

After a tip-off from Vorlon in areas to consider, I think we now have the answer: a number of handlers are manipulating linked lists. The problem is that since the signals occur asynchronously:

1) The lists they manipulate could be in *any* state when the signal arrives.
2) Having manipulated potentially inconsistent lists, the handlers return
   execution to the point just before the signal arrived.

So, even if the list did randomly happen to be consistent prior to the signal arriving, the chance of them being inconsistent after the signal has been handled is becomes a lot higher.

An example:

1) src/plugins/splash/script/plugin.c:show_splash_screen() registers a the 'on_interrupt' handler:

  ply_event_loop_watch_signal (plugin->loop,
                               SIGINT,                        
                               (ply_event_handler_t)          
                               on_interrupt, plugin);


2) on_interrupt() calls stop_animation().
3) stop_animation() calls stop_script_animation().
4) stop_script_animation() calls ply_event_loop_stop_watching_for_timeout().
   Trouble ensures here since the function can call
   ply_list_remove_node() which modifies the main loop.


The fix seems to be to ensure that all 'non-fatal' signal handlers only set  flags (such as is done by ply_event_loop_exit()) such that the main loop can take action after the handler has completed. This way, the lists are always guaranteed to be in a consistent state regardless of when signals occur.

To trigger the bug, run the following a number of times:

    sudo kill -SIGINT plymouthd
Comment 1 Ray Strode [halfline] 2011-10-26 10:00:45 UTC
ply_event_loop_watch_signal isn't just a wrapper around signal() or sigaction(). It's a command to make signal handling "safe" by keeping the actual signal handler tiny (basically just does a write() call iirc) and deferring execution of the passed in function until the main loop runs.

Having said that, the SIGINT handlers are bogus code that were copy and pasted around from the early days of plymouth and should be removed.  They won't get run during boot though, so they aren't the source of any of the issues your users are seeing.
Comment 2 Ray Strode [halfline] 2012-06-08 12:12:59 UTC
Okay i've dropped the SIGINT handlers:

commit bb46ffb90ade92d87d82fadcea3a0243c80dea79
Author: Ray Strode <rstrode@redhat.com>
Date:   Fri Jun 8 15:11:52 2012 -0400

    splash: drop SIGINT handlers
    
    They serve no real purpose.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=42285


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.