Bug 94108

Summary: hang waiting for a reply message (xcb_wait_for_reply)
Product: xorg
Component: Server/General
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: Bp_at_wind <bertrand.piolin>
Assignee: Xorg Project Team <xorg-team>
QA Contact: Xorg Project Team <xorg-team>
CC: arun.preetham, chris, jaffer.intel, psychon, zoran.stojsavljevic
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
    Xorg.log, none
    bt of all app running threads, none

Description Bp_at_wind 2016-02-12 09:07:46 UTC
Created attachment 121702 [details]
Xorg.log

The issue is that the graphics stack hangs while waiting for a reply message (xcb_wait_for_reply).
As a result the front panel UI (based on Qt5) freezes.

At this stage we cannot pinpoint the root cause. We tried to analyse buffers, kernel logs, DRI and libdrm but could not find anything obvious.
The GPU seems to be working fine after the issue occurs.
We also tried upgrading parts of the graphics stack (MesaLib 9.1.3 to MesaLib 10.4.4, libdrm 2.4.40 to libdrm 2.4.58, xf86-video-intel 2.9.902 to xf86-video-intel 2.99.917), and the issue still occurs.
Comment 1 Bp_at_wind 2016-02-12 09:10:33 UTC
The backtrace starts like this:
/home/views/qt/lib/liblfpCore.so.1(_Z13dumpBacktraceP10sigcontexti+0x30)[0x7faec6c0da10]
[0x623c00]
/lib64/libpthread.so.0(pthread_cond_wait+0xbd)[0x31e200c02d]
/usr/lib64/libxcb.so.1[0x363ba0a749]
/usr/lib64/libxcb.so.1[0x363ba0b9ff]
/usr/lib64/libxcb.so.1(xcb_wait_for_reply+0x62)[0x363ba0bb02]
/usr/lib64/libX11.so.6(_XReply+0x107)[0x31e1040c57]
/usr/lib64/libGL.so.1(+0x44f53)[0x7faec30d5f53]
/usr/lib64/libGL.so.1(+0x4270f)[0x7faec30d370f]
/usr/lib64/dri/i965_dri.so(+0x324510)[0x7faeaf293510]
/usr/lib64/dri/i965_dri.so(+0x3248cd)[0x7faeaf2938cd]
/usr/lib64/dri/i965_dri.so(+0x318ff7)[0x7faeaf287ff7]
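
Several frames above are unresolved: they show only a module and an offset, such as libGL.so.1(+0x44f53). The following is a minimal sketch, assuming the lines follow the usual glibc backtrace_symbols() layout, of how such frames could be split into fields and turned into addr2line invocations. The helper names parse_frame and addr2line_command are hypothetical, and addr2line only gives useful output if debug symbols for the module are installed.

```python
import re

# Frame layouts handled (glibc backtrace_symbols() style):
#   module(symbol+0xoff)[0xaddr]   module(+0xoff)[0xaddr]
#   module[0xaddr]                 [0xaddr]
# The closing bracket is optional so a truncated last line still parses.
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[\s]*)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]?$"
)


def parse_frame(line):
    """Split one backtrace line into module, symbol, offset and address."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    # Empty captures (e.g. no symbol name) are normalised to None.
    return {key: (value or None) for key, value in match.groupdict().items()}


def addr2line_command(frame):
    """Build an addr2line command for frames that have an offset but no symbol."""
    if frame["module"] and frame["offset"] and not frame["symbol"]:
        # -f prints the function name, -C demangles C++, -e selects the binary.
        return ["addr2line", "-f", "-C", "-e", frame["module"], frame["offset"]]
    return None
```

Frames that carry only an absolute address, such as the libxcb.so.1[0x363ba0a749] lines, would additionally need the load address of the library (for example from /proc/<pid>/maps) before they can be resolved; that step is not covered here.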
Comment 2 Uli Schlachter 2016-02-12 17:00:22 UTC
Do you also have backtraces for other threads? If libxcb waits on a pthread_cond_wait(), some other thread should be waiting in poll().

You didn't mention your libxcb version. Could you also try libxcb from git? (I'm asking because of https://cgit.freedesktop.org/xcb/libxcb/commit/?id=5b40681c887192307f3ae147d2158870aa79c05f and https://cgit.freedesktop.org/xcb/libxcb/commit/?id=f85661c3bca97faa72431df92a3867be39a74e23 which both fix hangs involving xcb_wait_for_special_event())
Comment 3 Bp_at_wind 2016-02-15 13:11:47 UTC
I am working on backtraces for the other threads.
In the meantime we tried the fixes you mentioned, and they do not help.
The first commit was already integrated, and with the second the issue still occurs.
Comment 4 Zoran Stojsavljevic 2016-03-08 07:12:01 UTC
Hello Bertrand (Piolin). I am missing some information that I need to understand this problem better.

I need to know which HW you are using and what the closest Intel RVP/CRB is, so that I can try to reproduce this bug.

I also need the type of panel you are using, and which GFX interface (HDMI, DP, eDP, DVI, VGA).

I also need your kernel version (I vaguely remember that for WRL6 it was 3.10, but I am not sure whether it was 3.10.88 or a similar number). If you do not want to change kernels, you should adopt the kernel.org longterm kernel 3.10.99 (released 2016-03-03).
_______

For your info, I did the following:

HW used: my D-1500 (former BDW-DE ES2 V1 HW) – Camelback Mountain CRB, CPUID 0x50662
KERNEL used: Linux localhost.localdomain 4.4.3-300.fc23.x86_64 #1 SMP Fri Feb 26 18:45:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Flat Panel used: HP 19" (around 7 years old), VGA I/F from the CM CRB platform.

My data in red (please note that the Fedora 23 kernel is already at 4.4.3; I will switch to F24 soon after the May 2016 Beta release):
    Mesa - 11.0.4 (11.1.0)
    Libdrm - 2.4.65 (2.4.65)
    Libva - 1.6.2 (1.6.2)
    vaapi intel-driver - 1.6.2 (I have gstreamer-vaapi 1.6.3 for most packages)
    Cairo - 1.14.4 (1.14.2)
    Xorg Xserver - 1.18.0 (1.18.0)
    Intel-gpu-tools - 1.13 (2.99.217)

Just to mention, no xcb_wait_for_reply message was found in the dmesg logs over several (at least 3) retries.

Thank you,
Zoran
Comment 5 Bp_at_wind 2016-03-08 07:26:55 UTC
Hello Zoran
To reproduce the issue you must use the customer hardware and UI application.
With this setup we have a test script that triggers the issue.

I can provide that setup for you; just let me know where to ship it.

Best regards
Bertrand
Comment 6 Bp_at_wind 2016-03-09 11:42:02 UTC
Hi Zoran
The issue could be reproduced with WRL8, so here are some details of that setup:
> I need to understand which HW you are using, and what is the closest INTEL RVP/CRB, so I can also try to repeat/reproduce this error/bug?
The hardware is based on the Flathead Creek CRB with an i7-4770TE CPU @ 2.30GHz.
The panel has a custom LVDS connector, and the BIOS is custom.
> I also need your kernel version 
it is a 

and the rest of the graphics stack:
Mesa 10.4.4
X.Org X Server 1.14.0
pixman: 0.30.2
xf86-video-intel: module version = 2.99.917
Comment 7 Bp_at_wind 2016-03-09 11:44:19 UTC
Sorry, the kernel version is Linux localhost 4.1.15
Comment 8 Bp_at_wind 2016-03-11 21:38:44 UTC
(In reply to Uli Schlachter from comment #2)
> Do you also have backtraces for other threads? If libxcb waits on a
> pthread_cond_wait(), some other thread should be waiting in poll().
The other running threads are basically part of the UI. How would you recommend debugging this exchange of messages?
Is there any debug flag we could activate to track all these messages and find which one had no reply?
> 
> You didn't mention your libxcb version. Could you also try libxcb from git?
> (I'm asking because of
> https://cgit.freedesktop.org/xcb/libxcb/commit/
> ?id=5b40681c887192307f3ae147d2158870aa79c05f and
> https://cgit.freedesktop.org/xcb/libxcb/commit/
> ?id=f85661c3bca97faa72431df92a3867be39a74e23 which both fix hangs involving
> xcb_wait_for_special_event())
This test was done and the issue still occurs.
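
On the question above of finding which request never received a reply: in the X11 protocol every reply and error carries the sequence number of the request it answers, so a captured request/reply stream can in principle be matched up. A minimal sketch follows, assuming a capture already reduced to (kind, sequence, name) tuples. That tuple format and the function name unanswered_requests are hypothetical and not the output of any particular tool, and the request names in the usage example are purely illustrative.

```python
def unanswered_requests(events):
    """Return (sequence, name) pairs of requests that never got a reply or error.

    Assumptions (hypothetical input format):
      * events is an iterable of (kind, sequence, name) tuples in wire order;
      * kind is "request", "reply" or "error";
      * only requests that expect a reply are listed as "request";
      * sequence numbers are already widened, i.e. they do not wrap around.
    """
    pending = {}
    for kind, sequence, name in events:
        if kind == "request":
            pending[sequence] = name
        elif kind in ("reply", "error"):
            # A reply or an error both complete the request they refer to.
            pending.pop(sequence, None)
    return sorted(pending.items())


# Illustrative capture: request 2 is never answered.
example = [
    ("request", 1, "GetGeometry"),
    ("reply", 1, None),
    ("request", 2, "QueryTree"),
    ("request", 3, "GetInputFocus"),
    ("reply", 3, None),
]
stuck = unanswered_requests(example)
```

With the illustrative capture above, stuck contains only the entry for sequence number 2. A protocol capture tool would still be needed to produce the real stream of requests and replies.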
Comment 9 Uli Schlachter 2016-03-12 09:57:25 UTC
Let's try again: What are the backtraces of the other threads? Is any other thread inside libxcb-code?

Also, does the application use libGL / libX11 from multiple threads? Is XInitThreads() called at startup?

According to the only backtrace you provided, libX11 asks for a reply that we don't have yet and some other thread "grabbed" the socket. Could you get a debug build of libxcb and libX11 so that the backtraces provide more information?
Comment 10 Bp_at_wind 2016-03-15 15:29:35 UTC
Created attachment 122328 [details]
bt of all app running threads
Comment 11 Bp_at_wind 2016-03-15 15:31:52 UTC
(In reply to Uli Schlachter from comment #9)
> Let's try again: What are the backtraces of the other threads? Is any other
> thread inside libxcb-code?
Now attached.
> 
> Also, does the application use libGL / libX11 from multiple threads? Is
> XInitThreads() called at startup?
(work in progress)
> 
> According to the only backtrace you provided, libX11 asks for a reply that
> we don't have yet and some other thread "grabbed" the socket. Could you get
> a debug build of libxcb and libX11 so that the backtraces provide more
> information?
How do you recommend activating debug output for these two? A debug build of libxcb does not produce any log.
Comment 12 Bp_at_wind 2016-03-16 09:02:06 UTC
(In reply to Uli Schlachter from comment #9)
> Let's try again: What are the backtraces of the other threads? Is any other
> thread inside libxcb-code?
> 
> Also, does the application use libGL / libX11 from multiple threads? 
According to the Qt documentation (we also checked this by enabling some debug information and attaching the debugger), a basic renderer is used for Mesa drivers.
This renderer is single-threaded.
We are using Qt 5.4.0. You may find it convenient to check out the Qt codebase.
You will find the system integration layer at:
    qtbase/src/plugins/platforms/xcb/
And most of the OpenGL usage at:
    qtdeclarative/src/quick/scenegraph

> Is XInitThreads() called at startup?
Yes, it is. 

> 
> According to the only backtrace you provided, libX11 asks for a reply that
> we don't have yet and some other thread "grabbed" the socket. Could you get
> a debug build of libxcb and libX11 so that the backtraces provide more
> information?
Comment 13 Uli Schlachter 2016-03-16 17:51:50 UTC
So the backtraces (which are a mess) say that there are just two threads running code inside of libxcb. One of them is sitting in xcb_wait_for_event() -> poll(), the other inside of xcb_wait_for_reply() -> pthread_cond_wait(). This seems valid and thus I guess that the bug is inside of the server. Let's see what they think.
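
The thread state described above (one thread reading the socket, another blocked on a condition variable until its reply shows up) can be illustrated with a toy model. This is a sketch under stated assumptions and not libxcb's actual implementation: ToyConnection and demo are hypothetical names, a queue stands in for the X server socket, a dedicated reader thread stands in for whichever thread happens to be in poll(), and a timeout is added only so that the example terminates, whereas the wait in the backtrace has none.

```python
import queue
import threading


class ToyConnection:
    """Toy model: one thread reads the 'socket', others wait on a condition."""

    def __init__(self):
        self.socket = queue.Queue()   # stands in for the X server socket
        self.cond = threading.Condition()
        self.replies = {}             # sequence number -> reply payload

    def reader_loop(self):
        # Role of the thread blocked in poll(): it alone reads the socket.
        while True:
            message = self.socket.get()
            if message is None:       # sentinel: connection closed
                return
            sequence, payload = message
            with self.cond:
                self.replies[sequence] = payload
                self.cond.notify_all()  # wake threads waiting for a reply

    def wait_for_reply(self, sequence, timeout):
        # Role of the thread blocked in pthread_cond_wait().
        with self.cond:
            arrived = self.cond.wait_for(lambda: sequence in self.replies, timeout)
            return self.replies.pop(sequence) if arrived else None


def demo():
    conn = ToyConnection()
    reader = threading.Thread(target=conn.reader_loop)
    reader.start()
    conn.socket.put((1, "reply to request 1"))  # the "server" answers request 1 only
    answered = conn.wait_for_reply(1, timeout=5.0)
    missing = conn.wait_for_reply(2, timeout=0.2)  # request 2 is never answered
    conn.socket.put(None)
    reader.join()
    return answered, missing
```

In this model both threads behave correctly, yet the wait for request 2 only ends because of the artificial timeout. Without it the waiter would block indefinitely whenever the other side never sends the reply, which is consistent with the client-side state looking valid.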
Comment 14 Kaveh 2016-03-16 18:28:19 UTC
The Qt main page is recommending that Qt 5.5 be used (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be used to make sure the bug is not fixed with that update?
Comment 15 Bp_at_wind 2016-03-17 14:41:47 UTC
(In reply to Kaveh from comment #14)
> The Qt main page is recommending that Qt 5.5 be used
> (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be
> used to make sure the bug is not fixed with that update?
That is a good point. Under investigation.
Comment 16 Bp_at_wind 2016-03-18 15:37:59 UTC
(In reply to Kaveh from comment #14)
> The Qt main page is recommending that Qt 5.5 be used
> (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be
> used to make sure the bug is not fixed with that update?
After checking the 5.5 changeset we cannot see any changes relevant to the problem we are reporting. The issue here is not a deadlock in blocking xcb_wait_for_reply / xcb_wait_for_event calls inside the Qt layer, but a deadlock using these primitives inside the i965_dri.so OpenGL implementation.

Also, Qt recommends this version because it is the latest stable release, but note that it is only a single minor version upgrade (we are currently on Qt 5.4.0).
Comment 17 Bp_at_wind 2016-03-18 15:45:57 UTC
(In reply to Bp_at_wind from comment #12)
> (In reply to Uli Schlachter from comment #9)
> > Let's try again: What are the backtraces of the other threads? Is any other
> > thread inside libxcb-code?
> > 
> > Also, does the application use libGL / libX11 from multiple threads? 
> According Qt documentation (we also checked it enabling some debug
> information and attaching the debugger..), a basic renderer is used for Mesa
> drivers. 
> This renderer is single-threaded.
> Actually we are using Qt 5.4.0. You may find convenient to check out Qt
> codebase. 
> You will find system integration layer at:
>                         qtbase/src/plugins/platforms/xcb/
> And most of openGL usage at:
> qtdeclarative/src/quick/scenegraph
> 
Some clarification: besides the basic renderer, which is single-threaded, there is another thread handled by Qt that acts as a listener for xcb events. So the answer is yes, there may be several threads inside libxcb. I don't know whether this contributes to the issue, as each one holds its own xcb_connection (maybe the physical layer is shared among the connections between the same process and the X server?).
Comment 18 gatis.paeglis 2018-09-14 10:32:44 UTC
I think this issue is the same as the one I reported in https://bugs.freedesktop.org/show_bug.cgi?id=91448
Comment 19 GitLab Migration User 2018-12-13 22:35:24 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/497.
