Bug 109102 - At dual monitor intel_do_flush_locked failed: Resource deadlock avoided
Summary: At dual monitor intel_do_flush_locked failed: Resource deadlock avoided
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: 18.2
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-19 14:29 UTC by Gert vd Kraats
Modified: 2019-09-18 19:41 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
stacktrace (17.49 KB, text/plain)
2018-12-19 14:29 UTC, Gert vd Kraats
ubuntu-cogl-patch (3.07 KB, text/plain)
2018-12-19 14:31 UTC, Gert vd Kraats
libdrm-patch (461 bytes, patch)
2019-01-26 10:52 UTC, Gert vd Kraats
fix intel_blit.c (4.98 KB, patch)
2019-02-19 23:43 UTC, Gert vd Kraats
libdrm_ignore_deadlock (773 bytes, patch)
2019-02-19 23:46 UTC, Gert vd Kraats

Description Gert vd Kraats 2018-12-19 14:29:06 UTC
Created attachment 142855
stacktrace

This is a copy of Ubuntu bug 1797882; it should be submitted directly to Mesa.

Using Ubuntu 18.10 with Wayland and dual monitors, positioned one above the other.
The dock is configured on both displays, with auto-hide disabled.
Icons for "Terminal" and "LibreOffice Writer" are present in the dock.

Start a terminal on the primary screen by clicking its dock icon.
Start LibreOffice Writer on the primary screen by clicking its dock icon.
Close LibreOffice Writer by clicking X.
Repeat starting and closing Writer.

The login screen appears. Syslog shows:
Oct 15 00:09:51 Gert2 org.gnome.Shell.desktop[5892]: intel_do_flush_locked failed: Resource deadlock avoided
Oct 15 00:09:51 Gert2 gnome-terminal-[6439]: Error reading events from display: Connection reset by peer
Oct 15 00:09:51 Gert2 systemd[5755]: gnome-terminal-server.service: Main process exited, code=exited, status=1/FAILURE

This problem does not occur in a dual-monitor session without Wayland.
It also does not occur if only one monitor is used with Wayland.
It also does not occur if the dock is present only on the primary screen.
It also does not occur if the second application started is present in the dock and does not add an extra icon to it (as LibreOffice does).

All other dual monitor/dock configurations seem to have this problem.

I am not sure whether this is relevant, but the graphics card in use does not support OpenGL 2.1:
Oct 15 00:02:32 Gert2 org.gnome.Shell.desktop[4426]: Require OpenGL version 2.1 or later.
Oct 15 00:02:32 Gert2 org.gnome.Shell.desktop[4426]: Failed to initialize glamor
Oct 15 00:02:32 Gert2 org.gnome.Shell.desktop[4426]: Failed to initialize glamor, falling back to sw

glxinfo:
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.2.2
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.2.2
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.2.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Extra info:

Without a fix for this problem, Wayland cannot be used after logon with dual monitors. The session stops very easily and often, work is lost, and you have to log on again. For example, simply switching between 2 overlapping windows ends the session.

I changed src/mesa/drivers/dri/i915/intel_batchbuffer.c to force a core dump in this case. The stack trace is attached to this bug report.
Line numbers may deviate slightly because of the added tracing code.
The deadlock always occurs when cogl_onscreen_swap_buffers calls cogl_flush(), at the first journal batch flush for the offscreen framebuffer.
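The debugging change above can be sketched roughly as follows. This is a hypothetical, simplified stand-in for the error path in intel_batchbuffer.c, not the driver's actual code: it assumes the failed submission returns a negative errno value, prints the same message seen in syslog, and calls abort() instead of exiting, so that a core dump (and thus a stack trace) can be written if core dumps are enabled. With glibc, strerror(EDEADLK) is the text "Resource deadlock avoided". The helper names are invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format the syslog message for a failed batch
 * submission. ret is a negative errno value, e.g. -EDEADLK. */
static int format_flush_error(char *buf, size_t len, int ret)
{
    assert(ret < 0);
    return snprintf(buf, len, "intel_do_flush_locked failed: %s",
                    strerror(-ret));
}

/* Hypothetical error path: print the message, then abort() instead of
 * exiting, which raises SIGABRT so a core dump can be produced. */
static void handle_flush_result(int ret)
{
    char msg[128];

    if (ret == 0)
        return;                 /* submission succeeded */
    format_flush_error(msg, sizeof msg, ret);
    fprintf(stderr, "%s\n", msg);
    abort();
}
```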

The deadlock disappears as soon as cogl is compiled with batching disabled!
To limit how much batching is disabled, I made a very dirty but working patch, which is attached to this bug.
In clutter_stage_cogl_redraw_view, the routine paint_stage is called for the "Unclipped stage paint".
This call is modified to immediately flush the first journal entry of the default onscreen framebuffer.
This is done by misusing and modifying cogl_framebuffer_set_viewport and by modifying _cogl_journal_flush.
Apparently this early flush causes a lock to be taken, which avoids the deadlock.
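To illustrate the difference between batched and early flushing, here is a minimal, hypothetical model of a draw journal. It is not cogl's implementation; all names (struct journal, journal_log, journal_flush, flush_first_entry) are invented for this sketch. Entries are normally accumulated and submitted together at the end of the frame; with the flag set, the very first entry is submitted on its own immediately, which is roughly what the patch arranges for the default onscreen framebuffer.

```c
#include <assert.h>
#include <stdbool.h>

#define JOURNAL_MAX 64

/* Hypothetical draw journal: records entries and submits them in
 * batches. Each non-empty journal_flush() counts as one submission. */
struct journal {
    int pending;             /* entries recorded but not yet submitted */
    int submissions;         /* batches submitted so far */
    int logged;              /* total entries logged since creation */
    bool flush_first_entry;  /* hack: submit the first entry at once */
};

static void journal_flush(struct journal *j)
{
    if (j->pending == 0)
        return;              /* nothing to submit */
    j->submissions++;
    j->pending = 0;
}

static void journal_log(struct journal *j)
{
    assert(j->pending < JOURNAL_MAX);
    j->pending++;
    j->logged++;
    if (j->flush_first_entry && j->logged == 1)
        journal_flush(j);    /* early flush of the first entry */
    else if (j->pending == JOURNAL_MAX)
        journal_flush(j);    /* journal full: submit what we have */
}
```

In this model, logging 5 entries and flushing at the end produces one submission with batching, and two submissions when flush_first_entry is set.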

I do not know how to see which locks are held and by which process, so I cannot fix the root cause of the problem, but I assume some extra lock must be taken in this case to avoid the deadlock.

With this dirty patch, combined with the other (simple) patches suggested in #1790525, #1795774 and #1795760, Wayland runs without any problems on dual monitors with an "old" Intel graphics card.
It performs better than lightdm, especially if the monitors are positioned side by side.
Comment 1 Gert vd Kraats 2018-12-19 14:31:37 UTC
Created attachment 142856
ubuntu-cogl-patch
Comment 2 Chris Wilson 2018-12-19 14:35:07 UTC
-EDEADLOCK = userspace tried to use more fence registers than the HW supports.
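This explanation can be made concrete with a small arithmetic model. It is a hypothetical sketch, not libdrm or kernel code: it assumes 16 hardware fence registers (the gen 3 count given in comment 3) and a per-batch userspace budget of 14 (the limit mentioned later in this report), and treats the registers held by the display as a parameter. Userspace compares the fences it declares against its budget and flushes early if the budget would be exceeded; the kernel, however, has to find registers for the fences the batch actually needs, and if none is free the submission fails with EDEADLK ("Resource deadlock avoided").

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of fence-register accounting on a gen3 i915 GPU.
 * The constants are assumptions taken from this report. */
#define HW_FENCE_REGS  16   /* fence registers in the hardware */
#define FENCE_BUDGET   14   /* fences userspace allows a single batch */

/* Userspace pre-check: if the *declared* fence count exceeds the budget,
 * report -ENOSPC so the caller flushes the batch first. */
static int check_declared_fences(int declared)
{
    assert(declared >= 0);
    return declared > FENCE_BUDGET ? -ENOSPC : 0;
}

/* Kernel-side outcome: the fences the batch *actually* needs plus those
 * held by the display must fit in the hardware, otherwise no register can
 * be found and the submission fails with -EDEADLK. */
static int submit_batch(int actually_used, int pinned_by_display)
{
    assert(actually_used >= 0 && pinned_by_display >= 0);
    return actually_used + pinned_by_display > HW_FENCE_REGS ? -EDEADLK : 0;
}
```

In this model a batch that declares 14 fences passes the userspace check, but if it really needs 15 (one undeclared) while the display holds 2, the kernel sees 17 > 16 and rejects it. The display counts are illustrative, not measured.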
Comment 3 Gert vd Kraats 2019-01-26 10:50:53 UTC
I am using an Intel i915 (gen 3) with 16 fence registers.
When printing the variable total_fences
in drm_intel_gem_check_aperture_space (libdrm-2.4.95/intel/intel_bufmgr_gem.c),
the deadlock abort occurs at total_fences == 16,
while the previous print shows total_fences == 14,
so there is an increment of 2 fences instead of the usual increment of 1.
Apparently the deadlock occurs in the "freeing fences phase", which starts when total_fences is too high.
I lowered available_fences from 14 to 13 (see the proposed patch).
This solved the problem.
After this patch I still sometimes saw total_fences == 16,
while the previous print showed total_fences == 13, an increment of 3 fences. This did not cause a deadlock in the "freeing fences phase".
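The effect of lowering available_fences from 14 to 13 can be checked with a similar hypothetical model; the function and parameter names are invented for this sketch and the register counts are assumptions. A budget one lower leaves room for one fence that a batch uses without declaring it, so in the worst case such a batch still fits in the 16 hardware registers when the display is assumed to hold 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: fences a batch may declare, given the hardware
 * total, the registers assumed held by the display, and an extra safety
 * margin (the proposed patch corresponds to margin = 1). */
static int fence_budget(int hw_regs, int reserved, int margin)
{
    int budget = hw_regs - reserved - margin;
    return budget > 0 ? budget : 0;
}

/* Does a batch that declares `declared` fences but really uses
 * `declared + undeclared` still fit in the hardware? */
static bool batch_fits(int hw_regs, int display_pinned,
                       int declared, int undeclared)
{
    return declared + undeclared + display_pinned <= hw_regs;
}
```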
Comment 4 Gert vd Kraats 2019-01-26 10:52:47 UTC
Created attachment 143231
libdrm-patch
Comment 5 Gert vd Kraats 2019-02-19 23:39:44 UTC
Some more investigation and understanding, plus 2 other possible fixes for the problem.
On Ubuntu 18.10 the problem occurs only with gdm3 running Wayland on dual monitors.
It does not occur with Wayland on a single monitor.
It does not occur at all if gdm3 is not using Wayland.
On Ubuntu 18.04 the same problem exists with Wayland, but it occurs less often.

On Ubuntu 18.10 with Wayland and dual monitors, the user session aborts immediately if 16 or more favorites are pinned to the dock.
For some reason Wayland seems to use one extra fence register on dual monitors.
On dual monitors, gdm3 with Wayland makes 2 calls to clutter_stage_cogl_redraw_view (one per logical screen) for every call to clutter_stage_cogl_redraw; gdm3 without Wayland makes only one call to clutter_stage_cogl_redraw_view, for a single logical screen.

The crash occurs in intel_batchbuffer_flush. The last batch in such a flush always comes from the functions emit_copy_blit and intelClearWithBlit in src/mesa/drivers/dri/i915/intel_blit.c. In their call to dri_bufmgr_check_aperture_space they declare that they use zero fence registers, but in fact they use one. This violates the limit of 14, causing a crash on dual monitors.
So the call to dri_bufmgr_check_aperture_space must be postponed until the needed fence register has been added, in the middle of batch generation. If the check then fails, the batch commands must be undone, a flush issued, and the batch regenerated.
For this purpose the function intel_batchbuffer_emit_reset is added.
The same is done for intel_miptree_set_alpha_to_one, although I never saw this function being called.
See fix error3_patch2.txt.
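The approach described above (emit first, check afterwards, and on failure rewind, flush and re-emit) can be sketched with a toy batch buffer. All names here (struct batch, batch_save, batch_reset, batch_flush, emit_blit) are hypothetical and only illustrate the control flow; they are not the signatures of intel_batchbuffer_emit_reset or of the attached patch. The fence used by the blit is counted while emitting, so the budget check sees the real count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_WORDS   1024
#define BLIT_WORDS    6      /* assumed size of one blit command */
#define FENCE_BUDGET  14     /* assumed per-batch fence limit */

/* Toy batch buffer: command words plus the fences they need. */
struct batch {
    uint32_t words[BATCH_WORDS];
    size_t used;             /* words emitted so far */
    int fences;              /* fences needed by the emitted commands */
    int flushes;             /* batches submitted (for illustration) */
};

/* Snapshot taken before emitting, so the commands can be undone. */
struct batch_mark {
    size_t used;
    int fences;
};

static struct batch_mark batch_save(const struct batch *b)
{
    struct batch_mark m = { b->used, b->fences };
    return m;
}

static void batch_reset(struct batch *b, struct batch_mark m)
{
    b->used = m.used;        /* drop words emitted after the mark */
    b->fences = m.fences;
}

static void batch_flush(struct batch *b)
{
    if (b->used == 0)
        return;
    b->flushes++;            /* a real driver would submit to the kernel */
    b->used = 0;
    b->fences = 0;
}

static void emit_blit_commands(struct batch *b, int fences)
{
    for (int i = 0; i < BLIT_WORDS; i++)
        b->words[b->used++] = 0;       /* placeholder command words */
    b->fences += fences;               /* counted while emitting */
}

/* Emit a blit, check the fence count afterwards, and on failure undo
 * the blit, flush the earlier commands and emit the blit again. */
static void emit_blit(struct batch *b, int fences)
{
    assert(fences <= FENCE_BUDGET);    /* must fit in an empty batch */
    if (b->used + BLIT_WORDS > BATCH_WORDS)
        batch_flush(b);                /* no room left: start a new batch */

    struct batch_mark mark = batch_save(b);
    emit_blit_commands(b, fences);
    if (b->fences <= FENCE_BUDGET)
        return;                        /* fits: keep the commands */

    batch_reset(b, mark);              /* undo this blit */
    batch_flush(b);                    /* submit everything before it */
    emit_blit_commands(b, fences);     /* regenerate in the empty batch */
}
```

With this structure, 14 single-fence blits fill one batch; the 15th is rewound, the first batch is flushed, and the blit is re-emitted as the start of a new batch.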

Another (dirty) solution is to decrement the availability of fence registers and just continue.
This causes a failure somewhere, but I could not see any broken layout.
See fix error3_patch3.txt.
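The idea behind this second workaround can be modeled in the same way. This only illustrates the idea as described (treat the deadlock error as a sign that one fewer fence register is available, lower the budget, and carry on); it is not the contents of the attached libdrm_ignore_deadlock patch, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical buffer-manager state: the per-batch fence budget. */
struct bufmgr_model {
    int available_fences;
};

/* Instead of treating -EDEADLK as fatal, shrink the budget by one so
 * later batches are split earlier, and report success to the caller.
 * In this model the failed batch is not resubmitted. */
static int handle_exec_result(struct bufmgr_model *mgr, int ret)
{
    if (ret == -EDEADLK) {
        if (mgr->available_fences > 0)
            mgr->available_fences--;
        return 0;
    }
    return ret;              /* success and other errors pass through */
}
```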
Comment 6 Gert vd Kraats 2019-02-19 23:43:09 UTC
Created attachment 143414
fix intel_blit.c

fix error_patch2
Comment 7 Gert vd Kraats 2019-02-19 23:46:01 UTC
Created attachment 143415
libdrm_ignore_deadlock
Comment 8 Gert vd Kraats 2019-07-28 22:43:01 UTC
I am currently using Debian 10 Buster with Wayland.
This problem no longer exists in this release.
Wayland no longer uses an extra fence register when dual monitors are used.

The incorrect reservation of fence registers in intel_blit.c still exists, but does no harm, because the limit of 14 usable fence registers leaves a safe margin.
A limit of 15 might be possible if the reservation in intel_blit.c were fixed.
Comment 9 GitLab Migration User 2019-09-18 19:41:27 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/790.

