Created attachment 131599
Sample code for reproducing the bug
We found this issue in our GPU driver tests. It happens when many threads are created and every thread creates its own wl_display object and calls eglGetDisplay(). Some of the threads then get EGL_NO_DISPLAY.
I wrote sample code to reproduce the bug (main.c). The issue occurs very often when I run the binary with 300 threads. I tested with Mesa 11.2, but the relevant code upstream is the same.
I debugged the Mesa code and found that a thread gets EGL_NO_DISPLAY only when _eglGetNativePlatform() returns _EGL_INVALID_PLATFORM.
The native_platform variable is declared static in _eglGetNativePlatform(), so all threads share the same variable. When I remove the static keyword and rebuild Mesa, the issue never occurs.
On my platform, the EGL_PLATFORM and EGL_DISPLAY environment variables are never defined, so _eglGetNativePlatformFromEnv() always returns _EGL_INVALID_PLATFORM.
I think the issue occurs when one thread is at the end of _eglGetNativePlatform(), just about to return the native_platform it found, and another thread overwrites this static variable with the result of _eglGetNativePlatformFromEnv().
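To make the suspected race concrete, here is a simplified sketch of the pattern. It is not the actual Mesa code: platform_t, detect_from_env(), detect_from_display(), get_platform_racy() and get_platform_safe() are hypothetical names used only for illustration, and the "safe" variant is just one possible way to avoid the problem (detect into a local variable, then publish only a valid result with a C11 atomic compare-and-exchange).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the platform enum; not Mesa's real definition. */
typedef enum { PLATFORM_INVALID = 0, PLATFORM_WAYLAND = 1 } platform_t;

/* Mimics a system where no platform environment variable is set:
 * detection from the environment always fails. */
static platform_t detect_from_env(void)
{
    return PLATFORM_INVALID;
}

/* Mimics successful autodetection from the native display handle. */
static platform_t detect_from_display(void *native_display)
{
    (void)native_display;
    return PLATFORM_WAYLAND;
}

/* Racy pattern: intermediate results are written to the shared static. */
static platform_t get_platform_racy(void *native_display)
{
    static platform_t cached = PLATFORM_INVALID;

    if (cached == PLATFORM_INVALID) {
        /* A second thread that passed the check above can store
         * PLATFORM_INVALID here after a first thread already stored
         * a valid value. */
        cached = detect_from_env();
        if (cached == PLATFORM_INVALID)
            cached = detect_from_display(native_display);
    }
    /* The first thread may now read back the second thread's
     * PLATFORM_INVALID and return it. */
    return cached;
}

/* Safer pattern: work on a local copy, publish only a final valid value. */
static platform_t get_platform_safe(void *native_display)
{
    static _Atomic int cached = PLATFORM_INVALID;
    int result = atomic_load(&cached);

    if (result == PLATFORM_INVALID) {
        result = detect_from_env();
        if (result == PLATFORM_INVALID)
            result = detect_from_display(native_display);

        int expected = PLATFORM_INVALID;
        /* The first valid result wins. If another thread published
         * first, 'expected' receives its value and we return that. */
        if (result != PLATFORM_INVALID &&
            !atomic_compare_exchange_strong(&cached, &expected, result))
            result = expected;
    }
    return (platform_t)result;
}
```

In the safe variant the shared variable only ever changes from invalid to a valid value, so a thread that has already detected a platform can never see its result replaced by a failed environment lookup from another thread.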
Most likely the issue was introduced by this commit: 7adb9b094894a512c019b3378eb9e3c69d830edc
Thanks for the report, Emre.
Should be fixed in master as of
Author: Eric Engestrom <email@example.com>
Date: Thu Jun 15 23:53:55 2017 +0100
egl/display: make platform detection thread-safe
Please give it a try and reopen if the issue persists.
Meanwhile we'll pick those for the stable release (might be 17.1.4, since the 17.1.3 queue is already out).