84567 – reinitialize_after_fork is deadlocky

Bug 84567 - reinitialize_after_fork is deadlocky

Summary: reinitialize_after_fork is deadlocky

Status:	RESOLVED FIXED

Alias:	None

Product:	p11-glue
Classification:	Unclassified
Component:	p11-kit (show other bugs)
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Stef Walter
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-10-01 20:49 UTC by Andy Lutomirski
Modified:	2014-11-20 19:24 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:
i915 features:

Attachments
p11-kit: Use getpid() to detect when process has forked (3.12 KB, patch) 2014-10-02 10:25 UTC, Stef Walter	Details \| Splinter Review
p11-kit: Use getpid() to detect when process has forked (7.65 KB, patch) 2014-10-02 10:56 UTC, Stef Walter	Details \| Splinter Review
p11-kit: Use pthread_atfork() in a safe manner (15.66 KB, patch) 2014-10-03 07:44 UTC, Stef Walter	Details \| Splinter Review
p11-kit: Use pthread_atfork() in a safe manner (15.68 KB, patch) 2014-10-03 18:56 UTC, Stef Walter	Details \| Splinter Review
Test case (2.14 KB, text/plain) 2014-11-20 19:12 UTC, Andy Lutomirski	Details
Show Obsolete (3) View All

Description Andy Lutomirski 2014-10-01 20:49:48 UTC

reinitialize_after_fork calls p11_lock, which is neither async-signal-safe (as required by POSIX for use after fork in a multithreaded process) nor even a remotely good idea (it's blatantly prone to deadlocks).

At least GnuTLS 3.2, and probably many other things, initialize p11-kit even in contexts in which nothing will ever, under any circumstances, initialize a pkcs11 module.  Nonetheless, these processes are prone to deadlocks when forking.  This happens to be multiple times per day.

In p11-kit's case, there isn't even a good reason to take the lock.  That whole function could be replaced with an increment (which doesn't even need to be atomic -- that function is never called with multiple threads running) of a global variable counting the number of forks, and initialize_called could be replaced with some sentinel if the module has never been initialized or the fork count at last initialization.

You'll still deadlock if someone tries to *use* p11-kit after an otherwise deadlocky fork, but most users of fork are about to call exec and will work just fine.

Comment 1 Stef Walter 2014-10-02 10:25:03 UTC

Created attachment 107220 [details] [review]
p11-kit: Use getpid() to detect when process has forked

Comment 2 Stef Walter 2014-10-02 10:25:46 UTC

Thanks for the clear bug report. I agree with your assessment. Could you try the attached patch against git master?

Please let me know if you'd prefer to test a patch against some other version of p11-kit, and I'll try to get one to you. Obviously such a bug is hard to reproduce, so having you be able to reproduce it is worth a lot.

Comment 3 Stef Walter 2014-10-02 10:56:01 UTC

Created attachment 107222 [details] [review]
p11-kit: Use getpid() to detect when process has forked

Comment 4 Andy Lutomirski 2014-10-02 16:39:54 UTC

In the process of re-testing this morning, I realized that I misdiagnosed the issue slightly.  Here's a backtrace:

#0  0x0000003a80032da3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /lib64/libtcmalloc.so.4
#1  0x0000003a80032c67 in SpinLock::SlowLock() () from /lib64/libtcmalloc.so.4
#2  0x0000003a800242c1 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /lib64/libtcmalloc.so.4
#3  0x0000003a800272c5 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
   from /lib64/libtcmalloc.so.4
#4  0x0000003a80035835 in tc_calloc () from /lib64/libtcmalloc.so.4
#5  0x00000033ab227388 in p11_array_new () from /lib64/libp11-kit.so.0
#6  0x00000033ab2210ab in p11_proxy_after_fork () from /lib64/libp11-kit.so.0
#7  0x00000033ab219ba2 in reinitialize_after_fork () from /lib64/libp11-kit.so.0
#8  0x00000035d8abc987 in fork () from /lib64/libc.so.6

The issue that I'm reliably hitting is not actually that p11_lock is deadlocking (although that would also be possible); it's that reinitialize_after_fork is allocating memory, deadlocking tc_malloc (which has its own magic atfork handler to address this, but it hasn't run yet [1]).

Your patch clearly fixes this, too, since this call stack is clearly impossible with your patch applied. :)

I'm running your patch through a few rounds of my test suite, and it hasn't failed yet.

BTW, there's a minor gotcha in your patch: it's remotely possible that a sequence of forks could result in reusing the same PID.

[1] I think that this is a side-effect of an old glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=14379

Comment 5 Andy Lutomirski 2014-10-02 16:46:33 UTC

FWIW, in the comments to your patch, you mention that pthread_atfork is nearly impossible to use cleanly without atomics.  Unless I'm missing something, this is true for the prepare handler, but not for the parent and child handlers -- there are no threads when the parent and child handlers are called.

Comment 6 Stef Walter 2014-10-03 05:52:28 UTC

No it is impossible to use threads + fork() without exec(), and by extension pthread_atfork() cleanly.

I know there's no other threads around, but any lock which happens to be taken at fork time (including those in malloc() and printf()) will cause a deadlock in the child process.

The bug we're fixing here is allowing fork() + exec() to work without deadlocking.

Comment 7 Andy Lutomirski 2014-10-03 06:06:32 UTC

Right.  But fork_count++ in the child handler can't possibly deadlock.  With threads, you can still call exec safely afterwards and, without threads, it'll work.


That being said, your patch is unlikely to cause problems.  At the very least, for the race to hit, a pkcs11-using program will have to fork twice.

Comment 8 Stef Walter 2014-10-03 06:20:22 UTC

(In reply to Andy Lutomirski from comment #7)
> Right.  But fork_count++ in the child handler can't possibly deadlock.  With
> threads, you can still call exec safely afterwards and, without threads,
> it'll work.
> 
> 
> That being said, your patch is unlikely to cause problems.  At the very
> least, for the race to hit, a pkcs11-using program will have to fork twice.

It's a good point. I'll update the patch to use pthread_atfork() with fork_count++ instead of getpid(). getpid() is cached with  glibc but may do a syscall for every invocation in other unixes.

Comment 9 Stef Walter 2014-10-03 07:44:16 UTC

Created attachment 107246 [details] [review]
p11-kit: Use pthread_atfork() in a safe manner

Instead of trying to perform actions in pthread_atfork() which
are not async-signal-safe, just increment a counter so we can
later tell if the process has forked.

Note this does not make it safe to mix threads and forking without
immediately execing. This is a far broader problem that p11-kit,
however we now do the right thing when fork+exec is used from a
thread.

Comment 10 Stef Walter 2014-10-03 07:45:14 UTC

Could you test the latest patch? If it works then I'll merge it into the master and stable branches.

Comment 11 Stef Walter 2014-10-03 18:56:41 UTC

Created attachment 107280 [details] [review]
p11-kit: Use pthread_atfork() in a safe manner

Instead of trying to perform actions in pthread_atfork() which
are not async-signal-safe, just increment a counter so we can
later tell if the process has forked.

Note this does not make it safe to mix threads and forking without
immediately execing. This is a far broader problem that p11-kit,
however we now do the right thing when fork+exec is used from a
thread.

Comment 12 Andy Lutomirski 2014-10-03 20:16:48 UTC

I haven't tested that p11-kit still works, but the deadlock seems to be gone with the latest patch.

Thanks!

Comment 13 Stef Walter 2014-10-04 06:30:42 UTC

Attachment 107280 [details] pushed as 0ecc141 - p11-kit: Use pthread_atfork() in a safe manner

Comment 14 Stef Walter 2014-10-04 06:31:58 UTC

Pushed to master branch and stable branch (with a slight adjustment). Will be part of next releases.

Comment 15 Andy Lutomirski 2014-11-20 19:12:54 UTC

Created attachment 109773 [details]
Test case

Here's a test case for this bug.  Maybe someone else will find it useful.

Comment 16 Stef Walter 2014-11-20 19:24:13 UTC

Thanks. Interesting.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.