Bug 63046 - p11_library_mutex problems on W32
Summary: p11_library_mutex problems on W32
Status: RESOLVED FIXED
Alias: None
Product: p11-glue
Classification: Unclassified
Component: p11-kit (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Windows (All)
: medium major
Assignee: Stef Walter
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 63062
  Show dependency treegraph
 
Reported: 2013-04-02 20:28 UTC by LRN
Modified: 2013-04-03 10:45 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Replaces "/tmp/" with the result of GetTempPath() on W32 (2.75 KB, patch)
2013-04-02 21:13 UTC, LRN
Details | Splinter Review
Uses different path separators depending on the platform (2.18 KB, patch)
2013-04-02 21:36 UTC, LRN
Details | Splinter Review
Fix build on Win32 (703 bytes, patch)
2013-04-03 08:36 UTC, Stef Walter
Details | Splinter Review
Don't use library locks from p11-kit tool (1.36 KB, patch)
2013-04-03 08:39 UTC, Stef Walter
Details | Splinter Review
Separate library init from message code (37.46 KB, patch)
2013-04-03 08:39 UTC, Stef Walter
Details | Splinter Review
Don't use free() on memory allocated by LocalFree() (759 bytes, patch)
2013-04-03 08:43 UTC, Stef Walter
Details | Splinter Review

Description LRN 2013-04-02 20:28:19 UTC
p11-kit.exe crashed on me.
Tried to debug it, discovered that it crashes in p11_lock().
Debugged it further.
Turns out, the value gdb prints for &p11_library_mutex changes
depending on which part of the code it's in. It's one value inside
tool.c and any code called from the .exe file, and another in code
called from DllMain.
Then i've noticed that p11_library_mutex is not even exported, because
it doesn't match -export-symbols-regex. Renamed it to
p11_kit_library_mutex. Now it's exported, but p11-kit.exe doesn't
import it.

At this point i gave up and hacked a function-based interface to
p11_library_mutex. Then set a break point at p11_kit_library_mutex()
function i wrote - and discovered that even with functional interface
it has two versions of that function - one from the .exe, the other -
from the .dll (and, accordingly, breakpoint was set in 2 locations).

Looked up how p11-kit is linked and (oh, the horror!) discovered that
it's linked to libp11-library.a (statically, obviously). The very file
that does all these mutex-related things. And libp11-kit is linked to
it too. Statically. Which is how they get two versions of everything.

So...what to do? And what was your intention, exactly?
Comment 1 LRN 2013-04-02 20:29:13 UTC
-------- Original Message --------
Subject: Re: p11_library_mutex problems on W32
Date: Tue, 02 Apr 2013 21:49:45 +0200
From: Stef Walter
To: LRN
CC: p11-glue@lists.freedesktop.org

Thanks for digging into the issue.

Typically I build mingw builds every so often before major releases,
and fix issues I see. So you may be seeing some win32 related code rot
related to some recent p11-kit refactoring.

On 04/01/2013 08:14 PM, LRN wrote:
> p11-kit.exe crashed on me.

What version? And what platform are you running on? Do the tests (ie:
make check) crash? Can you file a bug?

> Tried to debug it, discovered that it crashes in p11_lock(). 
> Debugged it further.

Hmmm, smells like a mutex is not being initialized on win32. Perhaps
being erroneously used by code that's not in a library.

> Turns out, the value gdb prints for &p11_library_mutex changes 
> depending on which part of the code it's in. It's one value inside
>  tool.c and any code called from the .exe file, and another in code
>  called from DllMain.

The p11-kit library and trust module, each use their own locking and
their own 'big' global mutex, callable through p11_lock() and
p11_unlock().

These are all called the same thing (ie: p11_library_mutex) but are
different instances as they are in separate DLLs. Not meant to be shared.

> Looked up how p11-kit is linked and (oh, the horror!) discovered 
> that it's linked to libp11-library.a (statically, obviously). The 
> very file that does all these mutex-related things. And libp11-kit 
> is linked to it too. Statically. Which is how they get two versions
> of everything.

Yes, the plan is to link common code statically into the various
modules that use it. But it was a regression (due to recent
refactoring) to link the mutex code into p11-kit.

So attached are three patches which I'd be interested if you could
test. 

My guess is that the first patch will fix your problem. The second
patch cleans up the way we build the code, and hopefully makes things
a bit clearer. The third patch fixes another WIN32 build problem.
Comment 2 LRN 2013-04-02 20:40:01 UTC
On 02.04.2013 23:49, Stef Walter wrote:
> On 04/01/2013 08:14 PM, LRN wrote:
>> p11-kit.exe crashed on me.
> 
> What version?
0.17.5, the latest one

> And what platform are you running on?
x86_64 NT 6.1.7601 (that's w32devspeek for "Windows 7 x64")

> Do the tests (ie:
> make check) crash?
AFAIR, i tried to run the testsuite, and that's when crash surfaced.

> 
>> Tried to debug it, discovered that it crashes in p11_lock(). 
>> Debugged it further.
> 
> Hmmm, smells like a mutex is not being initialized on win32. Perhaps
> being erroneously used by code that's not in a library.
Yes, in the end that was it - DllMain() called p11_mutex_init() and initialized its copy of the critical section, while the executable tried to use its own, uninitialized version.

>> Turns out, the value gdb prints for &p11_library_mutex changes 
>> depending on which part of the code it's in. It's one value inside
>>  tool.c and any code called from the .exe file, and another in code
>>  called from DllMain.
> 
> The p11-kit library and trust module, each use their own locking and
> their own 'big' global mutex, callable through p11_lock() and
> p11_unlock().
> 
> These are all called the same thing (ie: p11_library_mutex) but are
> different instances as they are in separate DLLs. Not meant to be shared.
No, in my case one copy belonged to p11-kit.exe (it got that copy from its static libp11-library.a part), and another - to libp11_kit-0.dll (which also got it from libp11-library.a).
p11-kit.exe calls verbose_arg(), which in turn calls p11_kit_be_loud () and p11_message_loud ().
p11_kit_be_loud() is a normal libp11_kit library function, part of the public interface. Inside itself it uses the critical section object of the .dll. That object is initialized, so p11_kit_be_loud() completes without errors.
p11_message_loud() is a libp11-library.a function, and thus .exe and .dll each have their own copy of it. When it does p11_lock(), it tries to lock the .exe version of critical section object - and crashes.

> 
>> Looked up how p11-kit is linked and (oh, the horror!) discovered 
>> that it's linked to libp11-library.a (statically, obviously). The 
>> very file that does all these mutex-related things. And libp11-kit 
>> is linked to it too. Statically. Which is how they get two versions
>> of everything.
> 
> Yes, the plan is to link common code statically into the various
> modules that use it. But it was a regression (due to recent
> refactoring) to link the mutex code into p11-kit.
> 
> So attached are three patches which I'd be interested if you could
> test.
> 
> My guess is that the first patch will fix your problem. The second
> patch cleans up the way we build the code, and hopefully makes things
> a bit clearer. The third patch fixes another WIN32 build problem.

OK, tried these. p11-kit compiled. Tried to run the testsuite right away. Some tests passed, some failed. I won't attach a full log (unless you think it's worth it), because i immediately tried to run p11-kit.exe list-modules -v (the command that i've used before to debug the crash) - and it crashed again, although this time it crashes on deinitialization.

Location of the crash is p11_library_uninit(), line 195: free().
The value of thread_local is 21 (whatever that is), and not equal to TLS_OUT_OF_INDEXES, so data = TlsGetValue (thread_local) gets called, obtains data ((LPVOID) 0x81a628 , hard to tell whether it is valid or not), then tries to free it. Apparently, the data is not valid, and free() crashes.

After checking the place where data is allocated, i think you should use LocalFree() rather than free().

Tried with LocalFree(), p11-kit.exe list-modules  -v doesn't crash now.
Comment 3 LRN 2013-04-02 21:13:23 UTC
Created attachment 77341 [details] [review]
Replaces "/tmp/" with the result of GetTempPath() on W32
Comment 4 LRN 2013-04-02 21:14:32 UTC
Other tests that caught my attention were tools/tests, which appeared to be crashing.

Turns out, it was an assertion failure, in each one of them (by the way, calling abort() on W32 is not a good idea; it's better to call DebugBreak(), then ExitProcess ()).

Apparently, they are all trying to mkdtemp ("/tmp/..."), which obviously fails, since W32 doesn't support POSIX paths.

attachment #77341 [details] [review] should take care of that.
Comment 5 LRN 2013-04-02 21:36:21 UTC
Created attachment 77342 [details] [review]
Uses different path separators depending on the platform

";" on W32, ":" everywhere else.
Fixes remaining problems (with module tests), now all tests pass.
Comment 6 Stef Walter 2013-04-03 08:36:19 UTC
Created attachment 77363 [details] [review]
Fix build on Win32
Comment 7 Stef Walter 2013-04-03 08:39:14 UTC
Created attachment 77364 [details] [review]
Don't use library locks from p11-kit tool
Comment 8 Stef Walter 2013-04-03 08:39:16 UTC
Created attachment 77365 [details] [review]
Separate library init from message code
Comment 9 Stef Walter 2013-04-03 08:43:41 UTC
Created attachment 77366 [details] [review]
Don't use free() on memory allocated by LocalFree()
Comment 10 Stef Walter 2013-04-03 08:44:53 UTC
Attachment 77364 [details] pushed as ae7dd1b - Don't use library locks from p11-kit tool
Attachment 77365 [details] pushed as fcc3a83 - Separate library init from message code
Attachment 77366 [details] pushed as c3f1b0a - Don't use free() on memory allocated by LocalFree()
Comment 11 Stef Walter 2013-04-03 08:46:09 UTC
Attachment 77363 [details] pushed as b7ccd06 - Fix build on Win32
Comment 12 Stef Walter 2013-04-03 10:45:10 UTC
Path handling is a separate bug #63062, so closing this one.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.