Created attachment 36279 [details] [review] patch Inital bug report https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/553745 just upgraded to lucid, first reboot tried to start accounts ProblemType: Crash DistroRelease: Ubuntu 10.04 Package: plymouth 0.8.1-4 ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1 Uname: Linux 2.6.32-19-generic i686 NonfreeKernelModules: wl Architecture: i386 Date: Thu Apr 1 22:14:11 2010 DefaultPlymouth: /lib/plymouth/themes/kubuntu-logo/kubuntu-logo.plymouth ExecutablePath: /sbin/plymouthd MachineType: Dell Inc. Inspiron 1545 ProcCmdLine: root=UUID=73530dc8-e3cc-4147-b801-4725b2f88141 ro quiet splash vga=792 ProcCmdline: /sbin/plymouthd --mode=boot --attach-to-session ProcEnviron: PATH=(custom, no user) ProcFB: 0 inteldrmfb SegvAnalysis: Segfault happened at: 0xfb5fe1 <ply_event_loop_process_pending_events+513>: test %eax,0x4(%esi) PC (0x00fb5fe1) ok source "%eax" ok destination "0x4(%esi)" (0x00000004) not located in a known VMA region (needed writable region)! SegvReason: writing NULL VMA Signal: 11 SourcePackage: plymouth StacktraceTop: ply_event_loop_process_pending_events () ply_event_loop_run () from /lib/libply.so.2 ?? () __libc_start_main () from /lib/tls/i686/cmov/libc.so.6 ?? () TextPlymouth: /lib/plymouth/themes/ubuntu-text/ubuntu-text.plymouth Title: plymouthd crashed with SIGSEGV in ply_event_loop_process_pending_events() UserGroups: dmi.bios.date: 07/17/2009 dmi.bios.vendor: Dell Inc. dmi.bios.version: A10 dmi.board.name: 0G848F dmi.board.vendor: Dell Inc. dmi.chassis.type: 8 dmi.chassis.vendor: Dell Inc. dmi.modalias: dmi:bvnDellInc.:bvrA10:bd07/17/2009:svnDellInc.:pnInspiron1545:pvr:rvnDellInc.:rn0G848F:rvr:cvnDellInc.:ct8:cvr: dmi.product.name: Inspiron 1545 dmi.sys.vendor: Dell Inc. StacktraceTop: ply_event_loop_process_pending_events ( ply_event_loop_run (loop=0x90280a8) at ply-event-loop.c:1312 main (argc=3, argv=0xbfeba124) at main.c:1916
Review of attachment 36279 [details] [review]: ::: mountall-2.14/src/mountall.c @@ +2742,2 @@ nih_assert_not_reached (); } This stuff seems to be unrelated to plymouth ::: plymouth-0.8.2/src/libply/ply-event-loop.c @@ +1193,3 @@ + if (watch == NULL) + { This shouldn't even happen. If this is happening there must be a bug some where else in the code. @@ +1248,3 @@ + number_of_received_events = epoll_wait (loop->epoll_fd, + events, PLY_EVENT_LOOP_NUM_EVENT_HANDLERS, + timeout); Ah, this is a nice catch! Hard to believe this bug went unnoticed for so long. Unfortunately, it just means plymouth was processing less events per iteration than its configured upper limit (8 instead of 64). I don't think this is related to your crash. I've committed this part though: http://cgit.freedesktop.org/plymouth/commit/?id=26bc6f87c0507d0b457b23548094c475e00e75c9 ::: plymouth-0.8.2/src/libply/ply-list.c @@ +225,2 @@ list->number_of_nodes--; + // assert (ply_list_find_node (list, node->data) != node); again, if this is failing there is a bug somewhere else in the code.
I think I've worked out what is going on here (and which caused me to raise the erroneous bug 42285): In ply_event_loop_process_pending_events(), ply_event_loop_handle_timeouts() is being called *after* epoll_wait(), but ply_event_loop_handle_timeouts() may free event sources. I can reliabily force plymouthd to SIGSEGV (in various parts of the code) by running the following: plymouth show-splash plymouth quit I'm seeing epoll_wait() return with a single valid fd event. ply_event_loop_handle_timeouts() then runs, and calls main.c:on_boot_splash_idle(). This causes the event source object referred to in the epoll_wait() event set to be freed and its reference count set to zero. After ply_event_loop_handle_timeouts() finishes, the now invalid source object pointed to by the epoll event data is referenced (it now has a reference_count of 1), and the invalid event is now processed with varying SIGSEGV scenarios ensuing. Currently ply_event_loop_process_pending_events() can be summarized as: 1. get events. 2. handle timeouts. 3. reference event sources. 4. process events. 5. unreference event sources. The attached patch changes this slightly to be effectively: 1. get events. 2. reference event sources. 3. handle timeouts. 4. process events. 5. unreference event sources.
Created attachment 60179 [details] [review] reference event sources before calling timeout handler to ensure events stay valid for loop iteration. If ply_event_loop_handle_timeouts() is called before the events returned by epoll_wait() are referenced, a timeout handler can free an event source leading to undefined behaviour (and frequently SIGSEGV crashes) once ply_event_loop_handle_timeouts() has finished since ply_event_loop_process_pending_events() continues to try and process the now invalid event sources. Thanks to cjwatson for a simpler solution to my original fix. Save errno values to ensure logic is robust in case handler makes system call.
Hi, (In reply to comment #2) > I think I've worked out what is going on here (and which caused me to > raise the erroneous bug 42285): great! > In ply_event_loop_process_pending_events(), > ply_event_loop_handle_timeouts() is being called *after* epoll_wait(), > but ply_event_loop_handle_timeouts() may free event sources. ah ha. So to be a little more specific, a timeout callback may call ply_event_loop_stop_watching_fd and if that fd became ready at the same time as the callback then crash would result. > I can reliabily force plymouthd to SIGSEGV (in various parts of the > code) by running the following: > > plymouth show-splash > plymouth quit In this case plymouth quit queues a callback to get called after the animation gets to an idle state, on_boot_splash_idle calls functions that ultimately stops watching the terminal fd. if that callback happens in the same iteration of the loop that the terminal fd becomes ready, then boom. Makes sense. Thanks for figuring this out.
Comment on attachment 60179 [details] [review] reference event sources before calling timeout handler to ensure events stay valid for loop iteration. Review of attachment 60179 [details] [review]: ----------------------------------------------------------------- ::: src/libply/ply-event-loop.c @@ +1296,5 @@ > + { > + ply_event_source_t *source; > + source = (ply_event_source_t *) (events[i].data.ptr); > + > + ply_event_source_take_reference (source); we can't just take the reference inside the loop, since the loop may get called repeatedly because of EINTR. We could explicitly check for EINTR and not take the reference in that case, I guess. We do need to ensure events don't get dispatched after they are no longer being watched, but I think we should still be okay. The source is going to stick around, now, but the destination will still get free'd at stop_watching_fd time, so the source will in effect be orphaned and harmless.
Alright, pushed with minor changes: http://cgit.freedesktop.org/plymouth/commit/?id=16c7b306b34839d95a05f246f2184a7aab03b180 Thanks for tracking this down!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.