Bug 28548 - plymouthd crashed with SIGSEGV in ply_event_loop_process_pending_events()
Summary: plymouthd crashed with SIGSEGV in ply_event_loop_process_pending_events()
Status: RESOLVED FIXED
Alias: None
Product: plymouth
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: All Linux (All)
: medium normal
Assignee: Ray Strode [halfline]
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-15 02:52 UTC by Neil Perry
Modified: 2012-04-24 15:11 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
patch (2.12 KB, patch)
2010-06-15 02:52 UTC, Neil Perry
Details | Splinter Review
reference event sources before calling timeout handler to ensure events stay valid for loop iteration. (2.89 KB, patch)
2012-04-17 09:39 UTC, James Hunt
Details | Splinter Review

Description Neil Perry 2010-06-15 02:52:00 UTC
Created attachment 36279 [details] [review]
patch

Inital bug report https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/553745

just upgraded to lucid, first reboot tried to start accounts

ProblemType: Crash
DistroRelease: Ubuntu 10.04
Package: plymouth 0.8.1-4
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
NonfreeKernelModules: wl
Architecture: i386
Date: Thu Apr 1 22:14:11 2010
DefaultPlymouth: /lib/plymouth/themes/kubuntu-logo/kubuntu-logo.plymouth
ExecutablePath: /sbin/plymouthd
MachineType: Dell Inc. Inspiron 1545
ProcCmdLine: root=UUID=73530dc8-e3cc-4147-b801-4725b2f88141 ro quiet splash vga=792
ProcCmdline: /sbin/plymouthd --mode=boot --attach-to-session
ProcEnviron: PATH=(custom, no user)
ProcFB: 0 inteldrmfb
SegvAnalysis:
 Segfault happened at: 0xfb5fe1 <ply_event_loop_process_pending_events+513>: test %eax,0x4(%esi)
 PC (0x00fb5fe1) ok
 source "%eax" ok
 destination "0x4(%esi)" (0x00000004) not located in a known VMA region (needed writable region)!
SegvReason: writing NULL VMA
Signal: 11
SourcePackage: plymouth
StacktraceTop:
 ply_event_loop_process_pending_events ()
 ply_event_loop_run () from /lib/libply.so.2
 ?? ()
 __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
 ?? ()
TextPlymouth: /lib/plymouth/themes/ubuntu-text/ubuntu-text.plymouth
Title: plymouthd crashed with SIGSEGV in ply_event_loop_process_pending_events()
UserGroups:

dmi.bios.date: 07/17/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A10
dmi.board.name: 0G848F
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA10:bd07/17/2009:svnDellInc.:pnInspiron1545:pvr:rvnDellInc.:rn0G848F:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Inspiron 1545
dmi.sys.vendor: Dell Inc.


StacktraceTop:
 ply_event_loop_process_pending_events (
 ply_event_loop_run (loop=0x90280a8) at ply-event-loop.c:1312
 main (argc=3, argv=0xbfeba124) at main.c:1916
Comment 1 Ray Strode 2010-06-15 07:31:26 UTC
Review of attachment 36279 [details] [review]:

::: mountall-2.14/src/mountall.c
@@ +2742,2 @@
 			nih_assert_not_reached ();
 		}

This stuff seems to be unrelated to plymouth

::: plymouth-0.8.2/src/libply/ply-event-loop.c
@@ +1193,3 @@
 
+      if (watch == NULL)
+	{

This shouldn't even happen.  If this is happening there must be a bug some where else in the code.

@@ +1248,3 @@
+      number_of_received_events = epoll_wait (loop->epoll_fd,
+					      events, PLY_EVENT_LOOP_NUM_EVENT_HANDLERS,
+					      timeout);

Ah, this is a nice catch! Hard to believe this bug went unnoticed for so long.  Unfortunately, it just means plymouth was processing less events per iteration than its configured upper limit (8 instead of 64).  I don't think this is related to your crash.  I've committed this part though:  http://cgit.freedesktop.org/plymouth/commit/?id=26bc6f87c0507d0b457b23548094c475e00e75c9

::: plymouth-0.8.2/src/libply/ply-list.c
@@ +225,2 @@
   list->number_of_nodes--;
+  // assert (ply_list_find_node (list, node->data) != node);

again, if this is failing there is a bug somewhere else in the code.
Comment 2 James Hunt 2012-04-17 09:38:51 UTC
I think I've worked out what is going on here (and which caused me to
raise the erroneous bug 42285):

In ply_event_loop_process_pending_events(),
ply_event_loop_handle_timeouts() is being called *after* epoll_wait(),
but ply_event_loop_handle_timeouts() may free event sources.

I can reliabily force plymouthd to SIGSEGV (in various parts of the
code) by running the following:

  plymouth show-splash
  plymouth quit

I'm seeing epoll_wait() return with a single valid fd event.
ply_event_loop_handle_timeouts() then runs, and calls
main.c:on_boot_splash_idle(). This causes the event source object
referred to in the epoll_wait() event set to be freed and its reference
count set to zero. After ply_event_loop_handle_timeouts() finishes, the
now invalid source object pointed to by the epoll event data is
referenced (it now has a reference_count of 1), and the invalid event is
now processed with varying SIGSEGV scenarios ensuing.

Currently ply_event_loop_process_pending_events() can be summarized as:

1. get events.
2. handle timeouts.
3. reference event sources.
4. process events.
5. unreference event sources.

The attached patch changes this slightly to be effectively:

1. get events.
2. reference event sources.
3. handle timeouts.
4. process events.
5. unreference event sources.
Comment 3 James Hunt 2012-04-17 09:39:58 UTC
Created attachment 60179 [details] [review]
reference event sources before calling timeout handler to ensure events stay valid for loop iteration.

If ply_event_loop_handle_timeouts() is called before the events returned
by epoll_wait() are referenced, a timeout handler can free an event
source leading to undefined behaviour (and frequently SIGSEGV crashes)
once ply_event_loop_handle_timeouts() has finished since
ply_event_loop_process_pending_events() continues to try and process the
now invalid event sources. Thanks to cjwatson for a simpler solution to
my original fix.

Save errno values to ensure logic is robust in case handler makes
system call.
Comment 4 Ray Strode [halfline] 2012-04-24 14:36:07 UTC
Hi,

(In reply to comment #2)
> I think I've worked out what is going on here (and which caused me to
> raise the erroneous bug 42285):
great!

> In ply_event_loop_process_pending_events(),
> ply_event_loop_handle_timeouts() is being called *after* epoll_wait(),
> but ply_event_loop_handle_timeouts() may free event sources.
ah ha.  So to be a little more specific, a timeout callback may call ply_event_loop_stop_watching_fd and if that fd became ready at the same time as the callback then crash would result.
 
> I can reliabily force plymouthd to SIGSEGV (in various parts of the
> code) by running the following:
> 
>   plymouth show-splash
>   plymouth quit
In this case plymouth quit queues a callback to get called after the animation gets to an idle state, on_boot_splash_idle calls functions that ultimately stops watching the terminal fd.  if that callback happens in the same iteration of the loop that the terminal fd becomes ready, then boom. Makes sense. Thanks for figuring this out.
Comment 5 Ray Strode [halfline] 2012-04-24 14:50:18 UTC
Comment on attachment 60179 [details] [review]
reference event sources before calling timeout handler to ensure events stay valid for loop iteration.

Review of attachment 60179 [details] [review]:
-----------------------------------------------------------------

::: src/libply/ply-event-loop.c
@@ +1296,5 @@
> +       {
> +         ply_event_source_t *source;
> +         source = (ply_event_source_t *) (events[i].data.ptr);
> +
> +         ply_event_source_take_reference (source);

we can't just take the reference inside the loop, since the loop may get called repeatedly because of EINTR.  We could explicitly check for EINTR and not take the reference in that case, I guess.  We do need to ensure events don't get dispatched after they are no longer being watched, but I think we should still be okay.  The source is going to stick around, now, but the destination will still get free'd at stop_watching_fd time, so the source will in effect be orphaned and harmless.
Comment 6 Ray Strode [halfline] 2012-04-24 15:11:13 UTC
Alright, pushed with minor changes:

http://cgit.freedesktop.org/plymouth/commit/?id=16c7b306b34839d95a05f246f2184a7aab03b180

Thanks for tracking this down!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.