p11-kit.exe crashed on me. Tried to debug it, discovered that it crashes in p11_lock(). Debugged it further. Turns out, the value gdb prints for &p11_library_mutex changes depending on which part of the code it's in. It's one value inside tool.c and any code called from the .exe file, and another in code called from DllMain. Then i've noticed that p11_library_mutex is not even exported, because it doesn't match -export-symbols-regex. Renamed it to p11_kit_library_mutex. Now it's exported, but p11-kit.exe doesn't import it. At this point i gave up and hacked a function-based interface to p11_library_mutex. Then set a break point at p11_kit_library_mutex() function i wrote - and discovered that even with functional interface it has two versions of that function - one from the .exe, the other - from the .dll (and, accordingly, breakpoint was set in 2 locations). Looked up how p11-kit is linked and (oh, the horror!) discovered that it's linked to libp11-library.a (statically, obviously). The very file that does all these mutex-related things. And libp11-kit is linked to it too. Statically. Which is how they get two versions of everything. So...what to do? And what was your intention, exactly?
-------- Original Message -------- Subject: Re: p11_library_mutex problems on W32 Date: Tue, 02 Apr 2013 21:49:45 +0200 From: Stef Walter To: LRN CC: p11-glue@lists.freedesktop.org Thanks for digging into the issue. Typically I build mingw builds every so often before major releases, and fix issues I see. So you may be seeing some win32 related code rot related to some recent p11-kit refactoring. On 04/01/2013 08:14 PM, LRN wrote: > p11-kit.exe crashed on me. What version? And what platform are you running on? Do the tests (ie: make check) crash? Can you file a bug? > Tried to debug it, discovered that it crashes in p11_lock(). > Debugged it further. Hmmm, smells like a mutex is not being initialized on win32. Perhaps being erroneously used by code that's not in a library. > Turns out, the value gdb prints for &p11_library_mutex changes > depending on which part of the code it's in. It's one value inside > tool.c and any code called from the .exe file, and another in code > called from DllMain. The p11-kit library and trust module, each use their own locking and their own 'big' global mutex, callable through p11_lock() and p11_unlock(). These are all called the same thing (ie: p11_library_mutex) but are different instances as they are in separate DLLs. Not meant to be shared. > Looked up how p11-kit is linked and (oh, the horror!) discovered > that it's linked to libp11-library.a (statically, obviously). The > very file that does all these mutex-related things. And libp11-kit > is linked to it too. Statically. Which is how they get two versions > of everything. Yes, the plan is to link common code statically into the various modules that use it. But it was a regression (due to recent refactoring) to link the mutex code into p11-kit. So attached are three patches which I'd be interested if you could test. My guess is that the first patch will fix your problem. The second patch cleans up the way we build the code, and hopefully makes things a bit clearer. The third patch fixes another WIN32 build problem.
On 02.04.2013 23:49, Stef Walter wrote: > On 04/01/2013 08:14 PM, LRN wrote: >> p11-kit.exe crashed on me. > > What version? 0.17.5, the latest one > And what platform are you running on? x86_64 NT 6.1.7601 (that's w32devspeek for "Windows 7 x64") > Do the tests (ie: > make check) crash? AFAIR, i tried to run the testsuite, and that's when crash surfaced. > >> Tried to debug it, discovered that it crashes in p11_lock(). >> Debugged it further. > > Hmmm, smells like a mutex is not being initialized on win32. Perhaps > being erroneously used by code that's not in a library. Yes, in the end that was it - DllMain() called p11_mutex_init() and initialized its copy of the critical section, while the executable tried to use its own, uninitialized version. >> Turns out, the value gdb prints for &p11_library_mutex changes >> depending on which part of the code it's in. It's one value inside >> tool.c and any code called from the .exe file, and another in code >> called from DllMain. > > The p11-kit library and trust module, each use their own locking and > their own 'big' global mutex, callable through p11_lock() and > p11_unlock(). > > These are all called the same thing (ie: p11_library_mutex) but are > different instances as they are in separate DLLs. Not meant to be shared. No, in my case one copy belonged to p11-kit.exe (it got that copy from its static libp11-library.a part), and another - to libp11_kit-0.dll (which also got it from libp11-library.a). p11-kit.exe calls verbose_arg(), which in turn calls p11_kit_be_loud () and p11_message_loud (). p11_kit_be_loud() is a normal libp11_kit library function, part of the public interface. Inside itself it uses the critical section object of the .dll. That object is initialized, so p11_kit_be_loud() completes without errors. p11_message_loud() is a libp11-library.a function, and thus .exe and .dll each have their own copy of it. When it does p11_lock(), it tries to lock the .exe version of critical section object - and crashes. > >> Looked up how p11-kit is linked and (oh, the horror!) discovered >> that it's linked to libp11-library.a (statically, obviously). The >> very file that does all these mutex-related things. And libp11-kit >> is linked to it too. Statically. Which is how they get two versions >> of everything. > > Yes, the plan is to link common code statically into the various > modules that use it. But it was a regression (due to recent > refactoring) to link the mutex code into p11-kit. > > So attached are three patches which I'd be interested if you could > test. > > My guess is that the first patch will fix your problem. The second > patch cleans up the way we build the code, and hopefully makes things > a bit clearer. The third patch fixes another WIN32 build problem. OK, tried these. p11-kit compiled. Tried to run the testsuite right away. Some tests passed, some failed. I won't attach a full log (unless you think it's worth it), because i immediately tried to run p11-kit.exe list-modules -v (the command that i've used before to debug the crash) - and it crashed again, although this time it crashes on deinitialization. Location of the crash is p11_library_uninit(), line 195: free(). The value of thread_local is 21 (whatever that is), and not equal to TLS_OUT_OF_INDEXES, so data = TlsGetValue (thread_local) gets called, obtains data ((LPVOID) 0x81a628 , hard to tell whether it is valid or not), then tries to free it. Apparently, the data is not valid, and free() crashes. After checking the place where data is allocated, i think you should use LocalFree() rather than free(). Tried with LocalFree(), p11-kit.exe list-modules -v doesn't crash now.
Created attachment 77341 [details] [review] Replaces "/tmp/" with the result of GetTempPath() on W32
Other tests that caught my attention were tools/tests, which appeared to be crashing. Turns out, it was an assertion failure, in each one of them (by the way, calling abort() on W32 is not a good idea; it's better to call DebugBreak(), then ExitProcess ()). Apparently, they are all trying to mkdtemp ("/tmp/..."), which obviously fails, since W32 doesn't support POSIX paths. attachment #77341 [details] [review] should take care of that.
Created attachment 77342 [details] [review] Uses different path separators depending on the platform ";" on W32, ":" everywhere else. Fixes remaining problems (with module tests), now all tests pass.
Created attachment 77363 [details] [review] Fix build on Win32
Created attachment 77364 [details] [review] Don't use library locks from p11-kit tool
Created attachment 77365 [details] [review] Separate library init from message code
Created attachment 77366 [details] [review] Don't use free() on memory allocated by LocalFree()
Attachment 77364 [details] pushed as ae7dd1b - Don't use library locks from p11-kit tool Attachment 77365 [details] pushed as fcc3a83 - Separate library init from message code Attachment 77366 [details] pushed as c3f1b0a - Don't use free() on memory allocated by LocalFree()
Attachment 77363 [details] pushed as b7ccd06 - Fix build on Win32
Path handling is a separate bug #63062, so closing this one.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.