There is a drm problem having processes with multiple drm file descriptors open,
since a deadlock may occur when DRM tries to grab the lock for the kernel context.
This is, AFAICT only done for drm drivers implementing reclaim_buffers_locked, but
in general the kernel needs to grab the lock for the kernel context when it needs to
wait for engine idle for some reason.
In the X server case, we have the following situation:
1) The 2D driver opens a drm file descriptor.
2) The 2D driver obtains the hardware lock.
3) AIGLX opens a drm file descriptor.
4) AIGLX closes the file descriptor.
Point 4 triggers a reclaim_buffers_locked() call, and the kernel tries to grab
the lock. It then
senses that the lock is not associated with the closing file descriptor and
waits for the lock to be
released. This will never happen since the X server is waiting for 4) to
complete, and a deadlock
To solve this, we could either try to make DRM smarter by tracking the thread
holding the lock, or we can restrict DRM open file descriptors to one per
thread. Currently any 2D driver using a DRM driver implementing
reclaim_buffers_locked() will deadlock when AIGLX is enabled.
The DRM locking mechanism is currently assuming that for a given context, the hw
lock should only be obtained using a single file descriptor. Otherwise, DRM will
lose track of the file descriptor used to obtain the lock.
I've been thinking a bit about this:
reclaim_buffers_locked() probably only needs the lock to be able to wait for engine idle without waiting forever. (The discussion is based on this).
We need a similar mechanism for fence object implementations on simple hardware that only wait for engine idle and after that declares all active fence objects as signaled.
Wit this in mind, we could probably attach a kernel reference counter to the hardware lock that, if non-zero, indicates that the kernel wants the lock if it is released. The lock itself is marked contended if held by somebody else.
Now, when the lock is released it is grabbed by the kernel and released when the refcount goes to zero.
The following situations can occur:
1) Engine is idle without the kernel ever grabbing the lock. All is fine, fences can be expired and buffers can be reclaimed, and the refcount will be decreased. This is the situaton when the current process holds the lock, and since we've been waiting for idle, no new command submissions can occur anyway.
2) The kernel succeeds in grabbing the lock. All is well.
This would get rid of some ugly code and save us from restricting the number of open file descriptors.
Isn't it just a bug that AIGLX opens/closes DRM file descriptors while holding the lock?
That said, the patch you posted to the dri-devel list probably has other merits though.
Yes, but there's more to the problem. It will occur with any app that has more than one file descriptor open to drm. If it grabs the lock and then dies, the file descriptors will be closed in an arbitrary order, and possibly a file descriptor not holding the lock will be closed first and a deadlock will occur.
So while AIGLX could be fixed, the problem will probably pop up in the future.
Do you still experience this issue with newer soft ?
Please check the status of your issue.