Bug 106021 - Polkitd: The utils_spawn_data_free reap timeout subprocess did not work resulting in a large number of zombie processes
Status: RESOLVED MOVED
Alias: None
Product: PolicyKit
Classification: Unclassified
Component: daemon
Version: unspecified
Hardware: All All
Importance: high critical
Assignee: David Zeuthen (not reading bugmail)
QA Contact: David Zeuthen (not reading bugmail)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-13 05:22 UTC by lining916740672
Modified: 2018-08-20 21:34 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
0001-add-child-reaper-thread-to-fix-zombies (6.84 KB, patch)
2018-04-13 08:28 UTC, lining916740672

0001-polkitd-make-sure-child-process-exits-will-be-proces.patch (3.18 KB, text/plain)
2018-05-08 02:14 UTC, lining916740672

polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch (1.13 KB, patch)
2018-05-10 09:14 UTC, lining916740672

Description lining916740672 2018-04-13 05:22:00 UTC
Hi,

We found a problem in polkitd.
 
When a subprocess spawned from a rule times out, utils_spawn_data_free() fails to reap it, and this results in a large number of zombie processes.

It can be reproduced on Fedora 27, and upstream has not fixed it.

How to reproduce:
1. Add a debug rule. This rule spawns a process that runs for more than 10 s, which causes a timeout:
[root@localhost ~]# cat /etc/polkit-1/rules.d/01-test.rules
polkit.addRule(function(action, subject) {
    polkit.log("debug start");
    try {
        polkit.spawn(["/usr/bin/sleep", "15"]);
    } catch (error) {
        // polkit.log(error);
    }
});
  
2. Look at the polkitd processes:
[root@localhost ~]# ps -ef | grep polkit | grep -v grep
polkitd   1501     1  0 Mar31 ?        00:02:51 /usr/lib/polkit-1/polkitd --no-debug
polkitd   5060  1501  0 12:37 ?        00:00:00 [sleep] <defunct>
polkitd   5367  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd   5631  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd   5915  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd  14052  1501  0 12:42 ?        00:00:00 sleep 20

[root@localhost ~]# journalctl -fu polkit
-- Logs begin at Sat 2018-03-31 14:36:03 CST. --
Apr 03 12:39:11 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:39:21 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:40:11 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:40:21 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
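Each `[sleep] <defunct>` line above is a zombie: the child has exited, but its parent (polkitd, PID 1501) never collected its exit status with waitpid(2) or a similar call, so the kernel keeps its process-table entry. The following Linux-only C sketch reproduces the effect in isolation. It is not polkit code; `proc_state` and `spawn_without_reaping` are hypothetical helper names. It forks a child that exits at once, deliberately does not reap it, and reads the child's state letter from `/proc/<pid>/stat`, which stays 'Z' until the parent waits for it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the one-letter state from /proc/<pid>/stat (Linux-specific):
 * 'Z' means zombie. Returns '?' if the entry does not exist or is unreadable. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int) pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Format is "pid (comm) state ...": skip past the last ')' because
     * comm itself may contain spaces or parentheses. */
    char *rparen = strrchr(buf, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Fork a child that exits immediately and deliberately never wait for it,
 * like a timed-out helper whose reap is lost. Returns the child PID, or -1. */
static pid_t spawn_without_reaping(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                               /* child: exit at once */
    struct timespec tick = { 0, 10000000 };     /* 10 ms */
    for (int i = 0; i < 100 && proc_state(pid) != 'Z'; i++)
        nanosleep(&tick, NULL);                 /* wait up to ~1 s for the exit */
    return pid;
}
```

Once the parent finally calls `waitpid(pid, &status, 0)`, the `/proc/<pid>` entry disappears, which is exactly the step that never happens for the timed-out helpers in this report.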
Comment 1 lining916740672 2018-04-13 08:28:58 UTC
Created attachment 138819
0001-add-child-reaper-thread-to-fix-zombies
Comment 2 lining916740672 2018-04-13 08:34:40 UTC
I made a patch to fix this issue.

The root cause is in utils_spawn_data_free():


static void
utils_spawn_data_free (UtilsSpawnData *data)
{

  if (data->child_pid != 0)
    {
      GSource *source;
      kill (data->child_pid, SIGTERM);
      /* OK, we need to reap for the child ourselves - we don't want
       * to use waitpid() because that might block the calling
       * thread (the child might handle SIGTERM and use several
       * seconds for cleanup/rollback).
       *
       * So we use GChildWatch instead.
       *
       * Avoid taking a reference to ourselves, but note that we need
       * to pass the GSource so we can nuke it once handled.
       */
      source = g_child_watch_source_new (data->child_pid);
      g_source_set_callback (source,
                             (GSourceFunc) utils_child_watch_from_release_cb,
                             source,
                             (GDestroyNotify) g_source_destroy);
      g_source_attach (source, data->main_context);
      g_source_unref (source);
      data->child_pid = 0;
    }
  /* ... remainder of utils_spawn_data_free() not quoted in this report ... */
}
Comment 3 lining916740672 2018-04-13 08:38:18 UTC
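The GChildWatch source created in utils_spawn_data_free() never fires, because it is attached to data->main_context, and the caller releases that main loop and main context right afterwards, so nothing ever dispatches the source and the child is never reaped.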
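The name of the first attachment describes a child-reaper thread; its contents are not reproduced in this report, so the following is only a hypothetical sketch of that general idea in plain C with POSIX threads, not the patch itself. It removes the dependency on a main context the caller may discard: after signalling the child, the PID is handed to a thread that blocks in waitpid(), so the calling thread is not blocked and the reap does not depend on any GMainContext being iterated. `reap_child` and `terminate_and_reap_async` are illustrative names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: block in waitpid() until the given child is reaped. Blocking
 * here is harmless because it happens off the calling thread. */
static void *reap_child(void *arg)
{
    pid_t pid = (pid_t) (intptr_t) arg;
    int status;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;                                       /* retry if interrupted */
    return NULL;
}

/* Ask the child to terminate, then hand its PID to a reaper thread and return
 * immediately. Returns 0 on success or the pthread_create() error code. */
static int terminate_and_reap_async(pid_t pid, pthread_t *thread)
{
    kill(pid, SIGTERM);
    return pthread_create(thread, NULL, reap_child, (void *) (intptr_t) pid);
}
```

A daemon would normally call pthread_detach() on the returned thread instead of joining it. If the child handles or ignores SIGTERM and never exits, the reaper thread waits indefinitely, so this approach would still want a later SIGKILL.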
Comment 4 David Herrmann 2018-05-03 16:42:35 UTC
Why not simply turn SIGTERM into SIGKILL and use waitid(2)? I mean, we are dealing with a timeout here; there is no reason to try to be graceful.
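The alternative suggested here can be sketched in a few lines of POSIX C; `kill_and_reap` is an illustrative name, not a polkit function. SIGKILL cannot be caught, blocked, or ignored, so a blocking waitid(2) issued right after it is expected to return promptly (a process stuck in uninterruptible sleep is the usual exception), and the child is collected before the function returns.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill a timed-out child outright and reap it synchronously.
 * On success fills *info and returns 0; returns -1 with errno set otherwise. */
static int kill_and_reap(pid_t pid, siginfo_t *info)
{
    /* ESRCH means no such process; waitid() below then reports the error. */
    if (kill(pid, SIGKILL) < 0 && errno != ESRCH)
        return -1;
    for (;;) {
        if (waitid(P_PID, (id_t) pid, info, WEXITED) == 0)
            return 0;                           /* child reaped */
        if (errno != EINTR)
            return -1;                          /* e.g. ECHILD */
    }
}
```

On return, `info->si_code` is `CLD_KILLED` and `info->si_status` is `SIGKILL` when the signal was what ended the child, and no zombie is left behind.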
Comment 5 lining916740672 2018-05-05 12:10:05 UTC
(In reply to David Herrmann from comment #4)
> Why not simply turn SIGTERM into SIGKILL and use waitid(2)? I mean, we are
> dealing with a timeout here, no reason to try to be graceful.

Hi, I made a better patch to fix this. I will send it out next Monday. :)
Comment 6 lining916740672 2018-05-08 02:14:54 UTC
Created attachment 139417
0001-polkitd-make-sure-child-process-exits-will-be-proces.patch
Comment 7 lining916740672 2018-05-08 02:28:37 UTC
Hi, I made a better patch to make sure that child process exits are always processed.
This patch seems to be simpler.

I added three timeout sources:
the first one sends SIGTERM at 10 s,
the second one sends SIGKILL at 15 s,
the last one quits the main loop.
Once the child process exits and the child watch source has been dispatched, the main loop quits. Otherwise, we quit the main loop at 20 s.

Timer1: at 10 s, send SIGTERM.
Timer2: at 15 s, send SIGKILL.
Timer3: at 20 s, quit the main loop.

0 ~ 10 s:  the child exits normally.
10 ~ 15 s: the child exits because of SIGTERM.
15 ~ 20 s: the child exits because of SIGKILL.
20 s ~:    the child seems to be abnormal; we quit the main loop.
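The escalation schedule above can be sketched without GLib as a polling loop over waitpid(WNOHANG). This is a stand-in for the patch's three GLib timeout sources plus child watch, not the patch itself; `supervise` and the outcome names are hypothetical, and the thresholds are parameters so the 10/15/20 s schedule can be scaled down for testing.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcomes matching the four time windows listed above. */
enum child_outcome { EXITED_ON_ITS_OWN, EXITED_AFTER_TERM, EXITED_AFTER_KILL, GAVE_UP };

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll the child: SIGTERM at term_ms, SIGKILL at kill_ms, stop waiting at
 * quit_ms. A child that is still running at quit_ms is left unreaped. */
static enum child_outcome supervise(pid_t pid, long term_ms, long kill_ms, long quit_ms)
{
    struct timespec start, tick = { 0, 5000000 };   /* poll every 5 ms */
    clock_gettime(CLOCK_MONOTONIC, &start);
    int sent = 0;                       /* 0: none, 1: SIGTERM, 2: SIGKILL */
    for (;;) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return sent == 0 ? EXITED_ON_ITS_OWN
                 : sent == 1 ? EXITED_AFTER_TERM : EXITED_AFTER_KILL;
        if (r < 0 && errno != EINTR)
            return GAVE_UP;             /* e.g. ECHILD: not our child */
        long t = elapsed_ms(&start);
        if (t >= quit_ms)
            return GAVE_UP;
        if (t >= kill_ms && sent < 2) {
            kill(pid, SIGKILL);
            sent = 2;
        } else if (t >= term_ms && sent < 1) {
            kill(pid, SIGTERM);
            sent = 1;
        }
        nanosleep(&tick, NULL);
    }
}
```

The GAVE_UP path (the "20 s ~" row) can still leave an unreaped child if it never exits; that is the gap closed by sending SIGKILL followed by a blocking waitid(2), as suggested in comment 4.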

Please give me any comments or suggestions on fixing the issue. :)
Comment 8 David Herrmann 2018-05-09 10:35:05 UTC
I would prefer to send SIGKILL straight away and use waitid(2) to guarantee it is collected.

Anyway, your patch looks fine. Let's see whether a polkit maintainer can apply it.
Comment 9 lining916740672 2018-05-10 09:14:30 UTC
Created attachment 139459
polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch

This patch seems to be much simpler and better.
Comment 10 lining916740672 2018-05-14 08:01:54 UTC
(In reply to David Herrmann from comment #8)
> I would prefer to send SIGKILL straight away and use waitid(2) to guarantee
> it is collected.
> 
> Anyway, your patch looks fine. Lets see whether a polkit maintainer can
> apply it.

Hi, I posted a new patch; this one is much simpler. It attaches the child watch source to the global default main context, and that works.

Change:
-      g_source_attach (source, data->main_context);
+      /* attach source to the global default main context */
+      g_source_attach (source, NULL);
Comment 11 David Herrmann 2018-08-15 11:02:00 UTC
(In reply to lining916740672 from comment #10)
> -      g_source_attach (source, data->main_context);
> +      /* attach source to the global default main context */
> +      g_source_attach (source, NULL);

According to the GLib docs, g_source_attach() is safe to call from any thread, regardless of which thread the context is running in. The callback we use only refers to the source itself, so I see no harm in doing that. Furthermore, no threading should be involved here, since the JS authority is executed inline, but I am not entirely sure it is invoked in the main thread.

Regardless: I think this is safe.

I still believe sending SIGKILL is the right thing to do, but I also think this patch is the right way to reap children correctly.

    Reviewed-by: David Herrmann <dh.herrmann@gmail.com>

Not sure who to ping to pick this up and merge upstream, though.
Comment 12 GitLab Migration User 2018-08-20 21:34:29 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/polkit/polkit/issues/11.

