Summary: | Polkitd: The utils_spawn_data_free reap timeout subprocess did not work resulting in a large number of zombie processes | | |
---|---|---|---|
Product: | PolicyKit | Reporter: | lining916740672 |
Component: | daemon | Assignee: | David Zeuthen (not reading bugmail) <zeuthen> |
Status: | RESOLVED MOVED | QA Contact: | David Zeuthen (not reading bugmail) <zeuthen> |
Severity: | critical | | |
Priority: | high | CC: | dh.herrmann, lining916740672 |
Version: | unspecified | | |
Hardware: | All | | |
OS: | All | | |
Attachments: | 0001-add-child-reaper-thread-to-fix-zombies, 0001-polkitd-make-sure-child-process-exits-will-be-proces.patch, polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch | | |
Description
lining916740672
2018-04-13 05:22:00 UTC
Created attachment 138819: 0001-add-child-reaper-thread-to-fix-zombies

I made a patch to fix this issue. The root cause is:

    static void
    utils_spawn_data_free (UtilsSpawnData *data)
    {
      if (data->child_pid != 0)
        {
          GSource *source;
          kill (data->child_pid, SIGTERM);
          /* OK, we need to reap for the child ourselves - we don't want
           * to use waitpid() because that might block the calling
           * thread (the child might handle SIGTERM and use several
           * seconds for cleanup/rollback).
           *
           * So we use GChildWatch instead.
           *
           * Avoid taking a references to ourselves. but note that we need
           * to pass the GSource so we can nuke it once handled.
           */
          source = g_child_watch_source_new (data->child_pid);
          g_source_set_callback (source,
                                 (GSourceFunc) utils_child_watch_from_release_cb,
                                 source,
                                 (GDestroyNotify) g_source_destroy);
          g_source_attach (source, data->main_context);
          g_source_unref (source);
          data->child_pid = 0;
        }

The GChildWatch source set up in utils_spawn_data_free never fires, because the main loop and main context it is attached to are released by the caller outside this function.

Why not simply turn SIGTERM into SIGKILL and use waitid(2)? I mean, we are dealing with a timeout here, no reason to try to be graceful.

(In reply to David Herrmann from comment #4)
> Why not simply turn SIGTERM into SIGKILL and use waitid(2)? I mean, we are
> dealing with a timeout here, no reason to try to be graceful.

Hi, I made a better patch to fix this. I will send it out next Monday. :)

Created attachment 139417: 0001-polkitd-make-sure-child-process-exits-will-be-proces.patch
Hi, I made a better patch to make sure child process exits are processed. This patch seems to be simpler. I added 3 timeout sources. The first one sends SIGTERM at 10s, the second one sends SIGKILL at 15s, and the last one quits the main loop. Once the child process exits and the child watch source has been processed, the main loop quits. Otherwise we quit the main loop at 20s.

Timer1: 10s send SIGTERM
Timer2: 15s send SIGKILL
Timer3: 20s exit the main loop

0 ~ 10s: child exits normally
10 ~ 15s: child exits by SIGTERM
15 ~ 20s: child exits by SIGKILL
20s ~ : child seems to be abnormal; we quit the main loop

Please give me some comments or suggestions on fixing the issue. :)

I would prefer to send SIGKILL straight away and use waitid(2) to guarantee it is collected.

Anyway, your patch looks fine. Let's see whether a polkit maintainer can apply it.

Created attachment 139459: polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch

This patch seems to be much simpler and better.

(In reply to David Herrmann from comment #8)
> I would prefer to send SIGKILL straight away and use waitid(2) to guarantee
> it is collected.
>
> Anyway, your patch looks fine. Lets see whether a polkit maintainer can
> apply it.

Hi, I posted a new patch; this one seems much simpler. It attaches the source to the global default main context, and with that the child is reaped.

Change:

    - g_source_attach (source, data->main_context);
    + /* attach source to the global default main context */
    + g_source_attach (source, NULL);

(In reply to lining916740672 from comment #10)
> - g_source_attach (source, data->main_context);
> + /* attach source to the global default main context */
> + g_source_attach (source, NULL);

According to the GLib docs, it is safe to use g_source_attach() to attach to other threads. The callback we use is localized to the source itself, so I see no harm in doing that. Furthermore, no threading should be involved here, since the js-authority is executed inline, but I am not entirely sure it is invoked in the main thread.

Regardless: I think this is safe. I still believe sending SIGKILL is the right thing to do. But I also think this patch is the right thing to do to reap children correctly.

Reviewed-by: David Herrmann <dh.herrmann@gmail.com>

Not sure who to ping to pick this up and merge upstream, though.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/polkit/polkit/issues/11.
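For reference, the alternative raised several times in the comments (send SIGKILL straight away and collect the child with waitid(2)) could look roughly like the sketch below. It is not taken from any of the attached patches. The helper names `spawn_stuck_child` and `kill_and_reap` are hypothetical and only stand in for polkit's spawned helper and its cleanup path. The sketch assumes a POSIX system. Since SIGKILL cannot be caught or ignored, the blocking wait is expected to return quickly, which is the concern the original code comment had about waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from polkit): fork a child that ignores SIGTERM
 * and then blocks forever, standing in for a spawned program that has hit
 * its timeout and does not exit on its own. Returns -1 if fork() fails. */
static pid_t
spawn_stuck_child (void)
{
  pid_t pid = fork ();
  if (pid == 0)
    {
      signal (SIGTERM, SIG_IGN);
      for (;;)
        pause ();
    }
  return pid;
}

/* Hypothetical helper: send SIGKILL and reap the child with waitid(2).
 * Returns 0 and fills *info on success, -1 with errno set on failure. */
static int
kill_and_reap (pid_t pid, siginfo_t *info)
{
  assert (pid > 0);

  if (kill (pid, SIGKILL) != 0)
    return -1;

  /* Block until the child has terminated; retry if a signal interrupts us. */
  for (;;)
    {
      if (waitid (P_PID, (id_t) pid, info, WEXITED) == 0)
        return 0;
      if (errno != EINTR)
        return -1;
    }
}
```

After `kill_and_reap()` returns 0, `info->si_code` should be `CLD_KILLED` and `info->si_status` should be `SIGKILL`, and the process no longer shows up as a zombie. The trade-off against the GChildWatch approach is that the child gets no chance to clean up, which the comments above argue is acceptable once a timeout has already fired.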