Bug 107092

Summary:	Thread leak when changing context size
Product:	Mesa	Reporter:	Cory Quammen <cory.quammen>
Component:	Drivers/Gallium/llvmpipe	Assignee:	mesa-dev
Status:	RESOLVED MOVED	QA Contact:	mesa-dev
Severity:	normal
Priority:	medium	CC:	chuck.atkins, cory.quammen
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Attachments:	apitrace output from the example VTK Python script showing the thread leak

Description Cory Quammen 2018-07-02 21:04:02 UTC

Created attachment 140440 [details]
apitrace output from the example

I have observed a thread leak in Linux when using offscreen rendering in VTK via llvmpipe. A minimal VTK example code is attached, as is the output of running apitrace.

One can observe the thread leak by doing the following:

* down ParaView-5.5.2-Qt5-MPI-Linux-64bit.tar.gz from www.paraview.org/download
* tar xzf ParaView-5.5.2-Qt5-MPI-Linux-64bit.tar.gz
* cd ParaView-5.5.2-Qt5-MPI-Linux-64bit
* Run

  bin/pvbatch --mesa-llvm vtk-mesa-threads-growth.py

In this example, we are repeatedly resizing the render window. Each time the window is resized, the current context is destroyed with glXDestroyContext. A new context is created with the new desired size, but looking at the output of htop, the threads used from the previous render never rejoin the main thread, causing the total number of threads to grow each time the render window size is changed. For long-running animation creation, this situation can lead to a major system slow down.

If this is not a mesa bug and it looks like I am doing something not correct in the trace, please let me know.

Comment 1 Cory Quammen 2018-07-02 21:04:40 UTC

Created attachment 140441 [details]
VTK Python script showing the thread leak

Comment 2 Cory Quammen 2018-07-02 21:06:39 UTC

Please note that the apitrace output is from a single iteration of the loop in vtk-mesa-threads-growth.py

Comment 3 Roland Scheidegger 2018-07-03 02:02:19 UTC

I'm near certain this is a bug in the xlib state tracker rather than llvmpipe. swr will leak as well, you just don't notice because it doesn't create additional threads.
glXChooseFBConfig will call choose_visual(), which will call xmesa_init(), which will create the pipe screen (via xmesa_init_display()). (llvmpipe threads are per screen, not per context.)
(mesa/src/gallium/state_trackers/glx/xlib/ xm_api.c and glx_api.c)

Note that there's 2 places where the screens would get destroyed (one in xmesa_close_display(), reached via XCloseDisplay(), the other is in XMesaDestroyContext(), reached via glXDestroyContext()). Both are commented out and marked as FIXME, since apparently it's not safe to destroy the screen at this point since surfaces can still be around...
This looks apparently broken but it's not obvious to me how to fix it. At first glance it looks like screen destruction should be delayed until there's neither contexts nor surfaces referencing it are around but I guess it isn't that easy...

Comment 4 Cory Quammen 2018-07-03 14:19:59 UTC

Thank you for your quick reply and insight, Roland. Creating and destroying glX contexts is a bit of a peculiarity in VTK that we may be able to do away with - I'm not sure why VTK does that, so maybe we can work around this issue.

With regards to freeing the screen, I'm afraid I'm nearly completely unversed in the Mesa library, so I'm at a loss as to an appropriate fix. Could a first step be to free the screen if all surfaces and contexts referencing it are free? Is it possible to check that?

Comment 5 Roland Scheidegger 2018-07-04 02:28:02 UTC

(In reply to Cory Quammen from comment #4)
> Thank you for your quick reply and insight, Roland. Creating and destroying
> glX contexts is a bit of a peculiarity in VTK that we may be able to do away
> with - I'm not sure why VTK does that, so maybe we can work around this
> issue.
> 
> With regards to freeing the screen, I'm afraid I'm nearly completely
> unversed in the Mesa library, so I'm at a loss as to an appropriate fix.
> Could a first step be to free the screen if all surfaces and contexts
> referencing it are free? Is it possible to check that?
I think the problem is precisely that we don't know which surfaces and contexts reference a screen. Surfaces itself aren't tied to contexts, and nearly all surfaces are created by pipe screen methods and have to be destroyed by the equivalent destroy methods, but this isn't true for the surfaces created by X (and you can and will get things like glXSwapBuffers without any glx context around any longer).
(At some point we actually did free the screens in xmesa_close_display(), but this caused crashes, and the commit trading the crashes back for the leaks was:
ommit feb71117aebc0932a96b548b4c402b010a008b2d
Author: George Kyriazis <george.kyriazis@intel.com>
Date:   Fri Mar 4 12:26:00 2016 -0700

    st/xlib: Don't destroy screen on XCloseDisplay()
    
    screen may still be used by other resources that are not yet freed.
    To correctly fix this there will be a need to account for resources
    differently, but this quick fix is not any worse than the original
    code that leaked screens anyway.
    
    Reviewed-by: Brian Paul <brianp@vmware.com>
)

I am not entirely sure what actually can still be around if the display is closed though, my knowledge of the glx code there isn't all that great - but certainly this is an old issue which should really be addressed.

Comment 6 GitLab Migration User 2019-09-18 18:33:32 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/248.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.