Bug 81139 - Rendering sometimes halts in waiting for back buffers with dri3 & xwayland
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: GLX
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
Reported: 2014-07-10 02:32 UTC by Boyan Ding
Modified: 2014-07-16 02:35 UTC

Attachments
A simple program that can effectively trigger the halt on my machine (2.56 KB, text/plain)
2014-07-10 03:41 UTC, Boyan Ding

Description Boyan Ding 2014-07-10 02:32:53 UTC
Xwayland now uses DRI3 and glamor for acceleration, but rendering sometimes halts at random while waiting for a back buffer (see Bug 80963 in Wayland). glxgears (and probably other programs that render to a single buffer) is affected.

The stack trace at the time of the halt is listed below. The program is stopped at:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

#0  0x00007ffff70bd800 in __poll_nocancel () from /usr/lib/libc.so.6
#1  0x00007ffff55fc992 in ?? () from /usr/lib/libxcb.so.1
#2  0x00007ffff55fddc9 in xcb_wait_for_special_event ()
   from /usr/lib/libxcb.so.1
#3  0x00007ffff7bbc495 in dri3_find_back (c=0x605790, priv=0xa346e0)
    at dri3_glx.c:1103
#4  0x00007ffff7bbc51b in dri3_get_buffer (driDrawable=0x6a48e0, format=4107, 
    buffer_type=dri3_buffer_back, loaderPrivate=0xa346e0) at dri3_glx.c:1127
#5  0x00007ffff7bbc92e in dri3_get_buffers (driDrawable=0x6a48e0, format=4107, 
    stamp=0x6a4910, loaderPrivate=0xa346e0, buffer_mask=1, 
    buffers=0x7fffffffe640) at dri3_glx.c:1274
#6  0x00007ffff3563c10 in intel_update_image_buffers (brw=0x7ffff7fd4040, 
    drawable=0x6a48e0) at brw_context.c:1395
#7  0x00007ffff3563436 in intel_update_renderbuffers (context=0x6a4020, 
    drawable=0x6a48e0) at brw_context.c:1087
#8  0x00007ffff35634cd in intel_prepare_render (brw=0x7ffff7fd4040)
    at brw_context.c:1108
#9  0x00007ffff3556969 in brw_clear (ctx=0x7ffff7fd4040, mask=18)
    at brw_clear.c:234
#10 0x00007ffff31b875e in _mesa_Clear (mask=16640) at main/clear.c:226
#11 0x00000000004027fa in draw () at gears.c:183
#12 0x00007ffff76b1ac4 in ?? () from /usr/lib/libglut.so.3
#13 0x00007ffff76b5329 in fgEnumWindows () from /usr/lib/libglut.so.3
#14 0x00007ffff76b207d in glutMainLoopEvent () from /usr/lib/libglut.so.3
#15 0x00007ffff76b28e5 in glutMainLoop () from /usr/lib/libglut.so.3
#16 0x00000000004031af in main (argc=1, argv=0x7fffffffeb78) at gears.c:405
Comment 1 Axel Davy 2014-07-10 03:05:07 UTC
Could you tell me which distribution you are using?

My guess is that it is a libxcb bug.
Debian packages have the fix (which is http://cgit.freedesktop.org/xcb/libxcb/commit/?id=3b72a2c9d1d656c74c691a45689e1d637f669e3a)
Comment 2 Boyan Ding 2014-07-10 03:18:34 UTC
(In reply to comment #1)
> Could you tell me which distribution you are using?
I'm using Arch Linux, in which almost everything is vanilla. 

> My guess is that it is a libxcb bug.
> Debian packages have the fix (which is
> http://cgit.freedesktop.org/xcb/libxcb/commit/?id=3b72a2c9d1d656c74c691a45689e1d637f669e3a)
Thanks, I'll give it a try.
Comment 3 Boyan Ding 2014-07-10 03:41:54 UTC
Created attachment 102515
A simple program that can effectively trigger the halt on my machine

The problem persists with that patch.

This is a simple program that I accidentally found reliably triggers the halt on my machine. The behavior is the same after applying that patch.
Comment 4 Boyan Ding 2014-07-11 01:43:22 UTC
(In reply to comment #1)
> My guess is that it is a libxcb bug.
> Debian packages have the fix (which is
> http://cgit.freedesktop.org/xcb/libxcb/commit/?id=3b72a2c9d1d656c74c691a45689e1d637f669e3a)

I tried the git versions of libxcb and xcb-proto today, and the problem remains.
Comment 5 Boyan Ding 2014-07-11 03:12:25 UTC
I don't think this is an xcb bug.
I turned on DebugPresent output, and this is what I get when running glxgears:
q 1 0x2bf5a30   351258: 00400006 -> 00400002 (crtc (nil))
	e 1 ust 5854416007 msc 351258
	c 0x2bf5a30   351258: 00400006 -> 00400002
	i 00400006
	d 1 0x2bf5a30   351258: 00400006 -> 00400002
q 2 0x35e42a0 111514530873345: 00400006 -> 00400002 (crtc (nil))
	e 2 ust 5854416646 msc 351258
	c 0x35e42a0   351258: 00400006 -> 00400002
	i 00400006
	d 2 0x35e42a0 111514530873345: 00400006 -> 00400002
q 3 0x35e42a0 111514530873346: 00400006 -> 00400002 (crtc (nil))
	e 3 ust 5854417065 msc 351258
	c 0x35e42a0   351258: 00400006 -> 00400002
	i 00400006
	d 3 0x35e42a0 111514530873346: 00400006 -> 00400002
q 4 0x35e42a0 463856467970: 00400006 -> 00400002 (crtc (nil))
(both the output and the animation stop here)

That means the event being waited for is never even sent.
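
The large numbers in the q lines above have a pattern that is easy to miss in decimal. As a side calculation (an observation on the logged numbers only, not a diagnosis made in this report), splitting each value into 32-bit halves gives a small low half and an unrelated-looking high half. The helper names below are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit value into its two 32-bit halves. */
static uint32_t msc_high32(uint64_t msc) { return (uint32_t)(msc >> 32); }
static uint32_t msc_low32(uint64_t msc)  { return (uint32_t)(msc & 0xffffffffu); }

/* Values copied from the "q 2", "q 3" and "q 4" lines of the log above. */
static const uint64_t logged_msc[] = {
    111514530873345ULL,   /* high half 25964, low half 1 */
    111514530873346ULL,   /* high half 25964, low half 2 */
    463856467970ULL,      /* high half 108,   low half 2 */
};
```

Calling msc_high32() and msc_low32() on each entry yields the pairs noted in the comments: the low halves look like ordinary small counters, while the high halves do not.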
Comment 6 Axel Davy 2014-07-11 03:27:16 UTC
I tested the program and confirmed that I have the same problem with Mesa git.

However, on this branch, https://github.com/axeldavy/mesa/tree/submit4,
I don't get the bug. This is quite strange.

Could you test it and say whether you get the bug with it too?
Comment 7 Axel Davy 2014-07-11 03:45:49 UTC
Hmm... strangely, after testing again, it no longer works with this branch either.

Perhaps the bug is on the X server side.
Comment 8 Boyan Ding 2014-07-11 04:00:35 UTC
(In reply to comment #7)
> Perhaps the bug is on the X server side.

I guess that may be the case. But where in the X server is the problem: Xwayland, glamor, or something else? If it is not in Xwayland, there is possibly a way to reproduce it outside of Xwayland. But I'm not an expert in that.
Comment 9 Axel Davy 2014-07-11 04:06:15 UTC
For Xwayland, Present uses the Present fallback implementation.

If present_fake_queue_vblank is ever called with an msc that is too low (already passed), then perhaps it is going to wait forever. My guess is that, for an unknown reason, this function is called with a past msc.
Comment 10 Boyan Ding 2014-07-11 09:37:56 UTC
(In reply to comment #9)
> If present_fake_queue_vblank is ever called with an msc that is too low
> (already passed), then perhaps it is going to wait forever. My guess is that,
> for an unknown reason, this function is called with a past msc.

I can see target_msc change drastically when running glxgears, when traced at present_pixmap. In apps that work okay, target_msc only increases by one.
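
To get a feel for what a drastic jump in target_msc means for the wait, here is a rough calculation. It assumes a 60 Hz vblank rate (the thread does not state the rate used by the fallback Present path) and reads the third field of the q lines in comment 5 as the queued target, which is an interpretation of that log. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed vblank rate; not taken from the thread. */
#define ASSUMED_HZ 60

/* Frames left until target_msc; 0 if the target has already passed. */
static uint64_t frames_until(uint64_t current_msc, uint64_t target_msc)
{
    return target_msc > current_msc ? target_msc - current_msc : 0;
}

/* Whole seconds needed for that many frames at ASSUMED_HZ. */
static uint64_t seconds_until(uint64_t current_msc, uint64_t target_msc)
{
    return frames_until(current_msc, target_msc) / ASSUMED_HZ;
}
```

With the values from comment 5 (msc 351258, queued value 463856467970), seconds_until() returns roughly 7.7 billion seconds, which is well over two centuries. A target one frame ahead needs a single frame.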
Comment 11 Boyan Ding 2014-07-11 14:11:15 UTC
I found the following:
1. When things are right, the window_msc argument of present_pixmap is always a small number (often 1, sometimes 2), but when things start to go wrong, it can be very big.

2. When things go wrong (window_msc is very big), the present_pixmap call originates directly from dri3_swap_buffers in dri3_glx.c in Mesa, which glXSwapBuffers calls as follows:
   (*pdraw->psc->driScreen->swapBuffers)(pdraw, 0, 0, 0, flush)
                                                ^
3. target_msc (which becomes window_msc in present_pixmap) is initially 0 in dri3_swap_buffers (note the mark on the previous line), so it is recalculated with the following expression:
    target_msc = priv->msc + priv->swap_interval * (priv->send_sbc - priv->recv_sbc);
and priv->msc is responsible for the big value (it seems it should normally be 0 or 1).

How can priv->msc change?
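
The expression in item 3 can be tried out in isolation. The sketch below is not Mesa code: struct demo_drawable and demo_target_msc are hypothetical stand-ins that only carry the four fields named in the expression, with field types chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields used in the expression; not Mesa's struct. */
struct demo_drawable {
    uint64_t msc;
    int      swap_interval;
    uint64_t send_sbc;
    uint64_t recv_sbc;
};

/* Same arithmetic as the expression quoted in item 3. */
static uint64_t demo_target_msc(const struct demo_drawable *priv)
{
    assert(priv->swap_interval >= 0);
    assert(priv->send_sbc >= priv->recv_sbc);
    return priv->msc
         + (uint64_t)priv->swap_interval * (priv->send_sbc - priv->recv_sbc);
}
```

With msc = 1, swap_interval = 1 and one swap outstanding (send_sbc = 3, recv_sbc = 2) the result is 2. With the same inputs but msc = 111514530873345, the result is 111514530873346, so a bad priv->msc passes straight through to target_msc.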
Comment 12 Axel Davy 2014-07-11 23:00:05 UTC
Have you rebuilt Mesa and Xwayland after installing libxcb from git?

I think I had accidentally overwritten my libxcb installation with the Arch package, but installing libxcb from git doesn't seem to be sufficient: I rebuilt Mesa and Xwayland, and I can't reproduce the bug anymore.
Comment 13 Boyan Ding 2014-07-12 01:02:44 UTC
(In reply to comment #12)
> Have you rebuilt Mesa and Xwayland after installing libxcb from git?
Oh, I didn't.

Things now work like a charm after the rebuild. The ABI has changed, so rebuilding is necessary.
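
The thread does not say which structure changed, so the following is only a generic sketch of how a layout difference between the headers a client was built against and the library actually installed can turn a small field into a wrong value. Both structs are hypothetical and are NOT the real libxcb definitions; __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts for illustration only; NOT the real libxcb structures. */
struct demo_event_natural {      /* compiler is free to insert padding before msc */
    uint64_t ust;
    uint32_t full_sequence;
    uint64_t msc;
};

struct demo_event_packed {       /* no padding: msc sits at byte offset 12 */
    uint64_t ust;
    uint32_t full_sequence;
    uint64_t msc;
} __attribute__((packed));

/* "Library" side: store an event into buf using the packed layout. */
static void demo_store_packed(unsigned char *buf, size_t len, uint64_t msc)
{
    struct demo_event_packed ev = { 0, 0, msc };
    assert(len >= sizeof ev);
    memcpy(buf, &ev, sizeof ev);
}

/* "Client" side: read 8 bytes at the offset its own headers told it to use. */
static uint64_t demo_load_msc(const unsigned char *buf, size_t offset)
{
    uint64_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

Storing msc = 2 into a 32-byte buffer pre-filled with 0xAA and loading it at offsetof(struct demo_event_packed, msc) returns 2. Loading at offsetof(struct demo_event_natural, msc) returns 2 only where both layouts agree. On a typical x86-64 build that offset is 16 rather than 12, so the load picks up neighbouring bytes. Rebuilding the client against the current headers makes both sides use the same offsets again.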