Summary: | dlopen'ing libudev.so.1 from static library initializer corrupts TLS state | | |
---|---|---|---|
Product: | Mesa | Reporter: | Timo R. <timo> |
Component: | Mesa core | Assignee: | mesa-dev |
Status: | RESOLVED MOVED | QA Contact: | mesa-dev |
Severity: | major | | |
Priority: | high | CC: | eero.t.tamminen |
Version: | git | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Attachments: | hack fix | | |
Description
Timo R.
2015-08-15 17:05:16 UTC
The problem seems to originate from here: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/clover/api/platform.cpp#n29

The platform object is created during static initialization, and somewhere in the class hierarchy a dlopen() is triggered.

Since _clover_platform only seems to be used in clGetPlatformIDs(), my idea would be to make this object static and put it into the function itself, so it only gets initialized when the function is first called.

Created attachment 117708
hack fix

Untested hack/fix that is also not thread-safe.

(In reply to Tobias Jakobi from comment #2)
> Created attachment 117708
> hack fix
>
> Untested hack/fix that is also not thread-safe.

That's unlikely to work, static local variables are no different to globals regarding initialization order, and, yeah, it seems like a hack because pipe_loader_probe() shouldn't be doing anything that could corrupt the TLS state when called at initialization time.

It looks like this might be a regression from the series de5c2b6f2b53924bceab6f4b8255d8e9dcad21b4..cc32d25454c382a971e81ae584a4296fdf492e70 (which are indeed not part of any released version yet), you may want to bisect which change introduced the problem.

> That's unlikely to work, static local variables are no different to globals
> regarding initialization order,

To my knowledge, static local variables are initialized on the first call to the function, whereas global variables are initialized in the library's early static initializer, which runs during library load.

> and, yeah, it seems like a hack because
> pipe_loader_probe() shouldn't be doing anything that could corrupt the TLS
> state when called at initialization time.

The simple act of calling dlopen on libudev.so.1 from within the early static initializer is enough to corrupt the TLS state, but only if some later library also links against libudev.so.1.
So initializing the structure on the first function call, rather than on library load, might actually help.

> It looks like this might be a regression from the series
> de5c2b6f2b53924bceab6f4b8255d8e9dcad21b4..
> cc32d25454c382a971e81ae584a4296fdf492e70 (which are indeed not part of any
> released version yet), you may want to bisect which change introduced the
> problem.

Not sure when I'll get to this, but I'll see what I can do.

(In reply to Timo R. from comment #4)
> > That's unlikely to work, static local variables are no different to globals
> > regarding initialization order,
>
> To my knowledge, static local variables are initialized on the first call to
> the function, whereas global variables are initialized in the library's
> early static initializer, which runs during library load.

IIRC static local variables are allowed to be initialized statically under roughly the same set of conditions in which global variables are. That said, because the platform constructor has side effects, it looks like you're right and the platform will necessarily have to be initialized dynamically the first time the function is run.

> > and, yeah, it seems like a hack because
> > pipe_loader_probe() shouldn't be doing anything that could corrupt the TLS
> > state when called at initialization time.
>
> The simple act of calling dlopen on libudev.so.1 from within the early
> static initializer is enough to corrupt the TLS state, but only if some
> later library also links against libudev.so.1.
> So initializing the structure on the first function call, rather than on
> library load, might actually help.

The thing is, you have no guarantee that the function it's now being initialized from will not itself be called from a static-storage variable initializer, so assuming that the conditions you describe are enough to corrupt the TLS state, this will only be hiding the problem.

> > It looks like this might be a regression from the series
> > de5c2b6f2b53924bceab6f4b8255d8e9dcad21b4..
> > cc32d25454c382a971e81ae584a4296fdf492e70 (which are indeed not part of any
> > released version yet), you may want to bisect which change introduced the
> > problem.
>
> Not sure when I'll get to this, but I'll see what I can do.

Hello,

This reminds me of something similar that happened to ocl-icd, see https://bugzilla.redhat.com/show_bug.cgi?id=1219646

The latrace tool could tell something useful: http://people.redhat.com/jolsa/latrace/

Did anyone find the time to bisect? I won't mind reverting any of my commits, but I'd like to know which one, as I cannot really test this here.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/990.
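For readers following the static-initialization discussion in the comments above, here is a minimal sketch of the difference between a namespace-scope object and a function-local static. It is not the attached hack fix and not clover's code: `Platform`, `event_log()`, `eager_platform` and `lazy_platform()` are hypothetical names, and the constructor only records that it ran instead of probing devices or calling dlopen().

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only to observe when constructors run.
// It is a function-local static itself so that it is ready before any
// namespace-scope object in this file tries to use it.
std::vector<std::string> &event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a platform class whose constructor has side
// effects. Here the side effect is just a log entry.
struct Platform {
    explicit Platform(const char *name) {
        event_log().push_back(std::string("construct ") + name);
    }
};

// Namespace-scope object: its constructor runs as part of the static
// initializers of the binary or shared library, i.e. at load time.
Platform eager_platform("eager");

// Function-local static: its constructor runs the first time control
// passes through the declaration. Since C++11 this initialization is
// thread-safe.
Platform &lazy_platform() {
    static Platform p("lazy");
    return p;
}
```

Calling `lazy_platform()` for the first time from ordinary code constructs the object at that point, and later calls reuse it. As noted in the thread, this only moves the construction: if `lazy_platform()` is itself reached from another static initializer, the constructor still runs during load.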