Bug 76252

Summary: Dynamic loading/unloading of opengl32.dll results in a deadlock
Product: Mesa Reporter: cgerlach42
Component: Mesa core Assignee: mesa-dev
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: medium CC: brianp, jfonseca
Version: 10.1   
Hardware: x86-64 (AMD64)   
OS: Windows (All)   
Whiteboard:

Description cgerlach42 2014-03-17 07:39:35 UTC
I'm using llvmpipe for software rendering under Windows. To switch easily between hardware and software rendering, I loaded opengl32.dll dynamically. Unloading the hardware OpenGL library works fine, but unloading Mesa results in a deadlock.

Stacktrace:
opengl32.dll!lp_rast_destroy(lp_rasterizer * rast) Line 948	C
opengl32.dll!llvmpipe_destroy_screen(pipe_screen * _screen) Line 439	C
opengl32.dll!stw_cleanup() Line 180	C
opengl32.dll!DllMain(HINSTANCE__ * hinstDLL, unsigned long fdwReason, void * lpReserved) Line 168	C
opengl32.dll!__DllMainCRTStartup(void * hDllHandle, unsigned long dwReason, void * lpreserved) Line 508	C
ntdll.dll!LdrpUnloadDll()	Unknown
ntdll.dll!LdrUnloadDll()	Unknown
KernelBase.dll!FreeLibrary()	Unknown
QtCore4.dll!QLibraryPrivate::unload_sys() Line 157	C++
QtCore4.dll!QLibraryPrivate::unload() Line 488	C++

Setting LP_NUM_THREADS=0 works around this problem.
Comment 1 cgerlach42 2014-03-26 07:19:56 UTC
A short test program that provokes the deadlock:

#include <QtCore/QLibrary>
#include <cstdio>  // for printf

int main (int, char **)
{
  // Placeholder: replace with the path to the llvmpipe build of opengl32.dll.
  QLibrary library ("some path to llvmpipe opengl32.dll");

  library.load ();
  printf ("loaded: %d\n", library.isLoaded ());

  library.unload ();
  printf ("loaded: %d\n", library.isLoaded ());

  return 0;
}

The unload call never returns.
Comment 2 Jose Fonseca 2014-03-26 13:45:48 UTC
The stack trace looks sensible.  Something must be preventing the llvmpipe worker threads from shutting down normally.

Could you please provide the stack trace of all threads? (So we can see what the llvmpipe worker threads are doing.)
Comment 3 cgerlach42 2014-03-27 06:28:18 UTC
I set LP_NUM_THREADS=1 to reduce the number of stack traces and used Mesa 10.1 with LLVM 3.4. If you need more information or there is something I should test, please feel free to ask.

Thread 1:
 	ntdll.dll!NtWaitForSingleObject()	Unknown
 	KernelBase.dll!WaitForSingleObjectEx()	Unknown
 	opengl32.dll!lp_rast_destroy(lp_rasterizer * rast=0x000000000035ee90) Line 948	C
 	opengl32.dll!llvmpipe_destroy_screen(pipe_screen * _screen=0x000007fedbc18c70) Line 439	C
 	opengl32.dll!stw_cleanup() Line 180	C
 	opengl32.dll!DllMain(HINSTANCE__ * hinstDLL=0x000007fe00000003, unsigned long fdwReason=24, void * lpReserved=0x000000000034de00) Line 168	C
 	opengl32.dll!__DllMainCRTStartup(void * hDllHandle=0x000007fedaac0000, unsigned long dwReason=0, void * lpreserved=0x0000000000000000) Line 508	C
 	ntdll.dll!LdrpUnloadDll()	Unknown
 	ntdll.dll!LdrUnloadDll()	Unknown
 	KernelBase.dll!FreeLibrary()	Unknown
 	QtCore4.dll!QLibraryPrivate::unload_sys() Line 157	C++
 	QtCore4.dll!QLibraryPrivate::unload() Line 488	C++
>	TomGRT-test_unload.exe!main(int __formal=0, char * * __formal=0x01cf4983fe91acb2) Line 18	C++
 	TomGRT-test_unload.exe!__tmainCRTStartup() Line 536	C
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

Thread 2:
>	ntdll.dll!NtWaitForSingleObject()	Unknown
 	ntdll.dll!RtlpWaitOnCriticalSection()	Unknown
 	ntdll.dll!RtlEnterCriticalSection()	Unknown
 	ntdll.dll!LdrShutdownThread()	Unknown
 	ntdll.dll!RtlExitUserThread()	Unknown
 	msvcr110.dll!_endthreadex(unsigned int retcode=0) Line 408	C
 	msvcr110.dll!_callthreadstartex() Line 354	C
 	msvcr110.dll!_threadstartex(void * ptd=0x000000000035f020) Line 332	C
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown
Comment 4 Jose Fonseca 2014-03-27 14:57:36 UTC
Thanks.

The problem is that DllMain calls and thread creation/destruction are serialized (there are several mentions of this in MSDN and StackOverflow), hence the deadlock.

I'm not sure of the best way to solve this though.  The difficulty here is that WGL has no clear "Shutdown OpenGL" entry point, other than DllMain.  I can't see anywhere else we can shut down.

The alternative to deadlock is leaking resources...


It might be worth trying a different approach: instead of loading/unloading llvmpipe's opengl32.dll and/or the system opengl32.dll, I'd recommend you rename llvmpipe's opengl32.dll to "llvmpipe.dll" and keep it loaded all the time, and simply switch between the system's entry points and llvmpipe's entry points.

This still has some difficulties, as GDI32.dll invokes opengl32.dll directly....


I really don't see an easy way out...


One alternative would be to replace in lp_rast_destroy() the following code

   /* Wait for threads to terminate before cleaning up per-thread data */
   for (i = 0; i < rast->num_threads; i++) {
      pipe_thread_wait(rast->threads[i]);
   }

with a signal, i.e., instead of waiting for the threads to finish, simply wait for the threads to signal that they are ready to finish, which should happen concurrently with DllMain.
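
A minimal sketch of that idea in portable C++, for illustration only. This is not Mesa code: PoolState, worker_main, pool_create and pool_destroy are hypothetical names, and std::thread stands in for pipe_thread. The workers share the state through a std::shared_ptr, so the destroyer does not have to join them before returning.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the rasterizer's shared state (not Mesa code).
struct PoolState {
    std::mutex mutex;
    std::condition_variable cv;
    bool stop = false;      // set by the destroyer to ask workers to exit
    unsigned finished = 0;  // workers that have signalled "ready to finish"
};

// Worker body: idle until asked to stop, then signal and return.
static void worker_main(std::shared_ptr<PoolState> state) {
    std::unique_lock<std::mutex> lock(state->mutex);
    state->cv.wait(lock, [&] { return state->stop; });
    ++state->finished;       // last touch of the shared counters
    state->cv.notify_all();  // tell the destroyer this worker is done
}

// Start num_threads detached workers that share one PoolState.
std::shared_ptr<PoolState> pool_create(unsigned num_threads) {
    auto state = std::make_shared<PoolState>();
    for (unsigned i = 0; i < num_threads; ++i)
        std::thread(worker_main, state).detach();
    return state;
}

// Shutdown without joining: wait for the "ready to finish" signal from every
// worker instead of waiting for the OS-level thread exit.
unsigned pool_destroy(const std::shared_ptr<PoolState>& state,
                      unsigned num_threads) {
    std::unique_lock<std::mutex> lock(state->mutex);
    state->stop = true;
    state->cv.notify_all();
    state->cv.wait(lock, [&] { return state->finished == num_threads; });
    return state->finished;
}
```

Calling pool_destroy(pool_create(4), 4) returns once all four workers have signalled, whether or not their threads have fully exited. The sketch only removes the wait on thread exit; whether the remaining instructions of a worker can safely run while the DLL is being unloaded is a separate question it does not address.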


Still, this business of shutting down in DllMain seems unsustainable in the long term, though I really don't see a better alternative.
Comment 5 Patrick Baggett 2014-03-27 15:11:04 UTC
José,

Can you explain the problem more? Thread 2's stack trace seems like it is waiting on ... something, but it isn't clear what or why. Is this the NT subsystem pausing it during thread unload? If so, why does this cause a problem, i.e. why does lp_rast_destroy() hang instead of proceed if there is only one thread that is accessing the lock?

Patrick
Comment 6 Jose Fonseca 2014-03-27 16:52:48 UTC
(In reply to comment #5)
> Can you explain the problem more? Thread 2's stack trace seems like it is
> waiting on ... something, but it isn't clear what or why. 

> Is this the NT
> subsystem pausing it during thread unload? 

I can only guess, but yes, this is some internal DllMain-related global lock.

> If so, why does this cause a
> problem, i.e. why does lp_rast_destroy() hang instead of proceed if there is
> only one thread that is accessing the lock?

I believe this lock is also held by the main thread.  From what I gather, the deadlock is as follows:

 - the main thread took a lock before entering DllMain(DLL_PROCESS_DETACH), and is now waiting for the llvmpipe worker threads to finish

 - the llvmpipe worker threads are trying to get the same lock so they can call DllMain(DLL_THREAD_DETACH) for all DLLs, so they can never finish
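
The two steps above can be modelled in portable C++. This is only a model under stated assumptions: loader_lock is a hypothetical stand-in for the internal Windows lock (an ordinary std::mutex here), and the wait is given a timeout so the model reports the hang instead of hanging itself.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if the worker managed to exit while the "loader lock" was
// still held by the main thread. loader_lock is a hypothetical stand-in,
// not a real Windows API.
bool worker_exits_while_lock_held(std::chrono::milliseconds timeout) {
    std::mutex loader_lock;
    std::promise<void> exited;
    std::future<void> exited_future = exited.get_future();

    // Main thread: the lock is taken before DllMain(DLL_PROCESS_DETACH) runs.
    std::unique_lock<std::mutex> in_dllmain(loader_lock);

    // Worker thread: exiting needs the same lock for DLL_THREAD_DETACH.
    std::thread worker([&] {
        std::lock_guard<std::mutex> thread_detach(loader_lock);
        exited.set_value();
    });

    // lp_rast_destroy() would wait forever at this point; the model gives up
    // after the timeout instead.
    bool finished =
        exited_future.wait_for(timeout) == std::future_status::ready;

    // Leaving "DllMain" releases the lock; only then can the worker exit.
    in_dllmain.unlock();
    worker.join();
    return finished;
}
```

In this model the worker can never finish before the timeout, because it cannot acquire the lock until the main thread leaves the simulated DllMain.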

See also:

 - http://support.microsoft.com/default.aspx?scid=kb;EN-US;142243

 - http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx

   "Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result."

 - http://stackoverflow.com/questions/2603583/boost-thread-hanging-on-endthreadex
 - http://stackoverflow.com/questions/353038/endthreadex0-hangs
 - http://stackoverflow.com/questions/10441048/exit-thread-upon-deleting-static-object-during-unload-dll-causes-deadlock
Comment 7 cgerlach42 2014-03-28 09:57:47 UTC
José,

thanks for the feedback. 

Our first try was to name the DLL llvmpipe.dll, and we ran into exactly the issues you mentioned regarding GDI. Therefore we don't see this as an option for us.

A colleague had another idea:
We export stw_cleanup and call it before unloading the DLL. This seems to work very well and shuts down the threads as expected.

If you don't see any problems with this approach, we could live with this workaround.
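
A rough sketch of that workaround on the application side, for illustration only. It assumes a patched Mesa build that exports stw_cleanup taking no arguments and returning nothing; unload_mesa and stw_cleanup_fn are hypothetical names. The code is Windows-only, so it is wrapped in an _WIN32 guard.

```cpp
#ifdef _WIN32
#include <windows.h>

// Assumption: the DLL was rebuilt so that stw_cleanup is exported with a
// void(void) signature; an unpatched build may not export it.
typedef void (*stw_cleanup_fn)(void);

// Shut llvmpipe down explicitly, outside DllMain, then unload the DLL.
// Returns true if the library was freed. If the export is missing, the DLL
// is left loaded, because unloading it without the cleanup may deadlock.
bool unload_mesa(HMODULE mesa) {
    if (!mesa)
        return false;
    stw_cleanup_fn cleanup = reinterpret_cast<stw_cleanup_fn>(
        GetProcAddress(mesa, "stw_cleanup"));
    if (!cleanup)
        return false;
    cleanup();  // worker threads are shut down here, before FreeLibrary
    return FreeLibrary(mesa) != 0;
}
#endif
```

A caller would pass the handle returned by LoadLibrary for the llvmpipe opengl32.dll, in place of a plain FreeLibrary call.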
Comment 8 Jose Fonseca 2014-04-01 14:24:20 UTC
(In reply to comment #7)
> A colleague had another idea:
> We export stw_cleanup and call it before unloading the dll. This seems to
> work very well and shuts down the threads as expected.
> 
> If you don't see any problems with this approach, we could live with this
> workaround.

Yes, that should work, but I think that my other solution:

  "One alternative would be to replace in lp_rast_destroy() the following code

     /* Wait for threads to terminate before cleaning up per-thread data */
     for (i = 0; i < rast->num_threads; i++) {
        pipe_thread_wait(rast->threads[i]);
     }

  with a signal, i.e., instead of waiting for the threads to finish, simply wait for the threads to signal they are ready to finish, which should happen concurrently with DllMain."

is more general, and would avoid leaking internal implementation details to the outside.
