| Summary: | hang waiting for a reply message (xcb_wait_for_reply) | | |
|---|---|---|---|
| Product: | xorg | Reporter: | Bp_at_wind <bertrand.piolin> |
| Component: | Server/General | Assignee: | Xorg Project Team <xorg-team> |
| Status: | RESOLVED MOVED | QA Contact: | Xorg Project Team <xorg-team> |
| Severity: | normal | | |
| Priority: | medium | CC: | arun.preetham, chris, jaffer.intel, psychon, zoran.stojsavljevic |
| Version: | unspecified | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | | | |
| i915 platform: | | i915 features: | |
| Attachments: | | | |

The backtrace starts like this:

/home/views/qt/lib/liblfpCore.so.1(_Z13dumpBacktraceP10sigcontexti+0x30)[0x7faec6c0da10]
[0x623c00]
/lib64/libpthread.so.0(pthread_cond_wait+0xbd)[0x31e200c02d]
/usr/lib64/libxcb.so.1[0x363ba0a749]
/usr/lib64/libxcb.so.1[0x363ba0b9ff]
/usr/lib64/libxcb.so.1(xcb_wait_for_reply+0x62)[0x363ba0bb02]
/usr/lib64/libX11.so.6(_XReply+0x107)[0x31e1040c57]
/usr/lib64/libGL.so.1(+0x44f53)[0x7faec30d5f53]
/usr/lib64/libGL.so.1(+0x4270f)[0x7faec30d370f]
/usr/lib64/dri/i965_dri.so(+0x324510)[0x7faeaf293510]
/usr/lib64/dri/i965_dri.so(+0x3248cd)[0x7faeaf2938cd]
/usr/lib64/dri/i965_dri.so(+0x318ff7)[0x7faeaf287ff7

Do you also have backtraces for other threads? If libxcb waits on a pthread_cond_wait(), some other thread should be waiting in poll().

You didn't mention your libxcb version. Could you also try libxcb from git? (I'm asking because of https://cgit.freedesktop.org/xcb/libxcb/commit/?id=5b40681c887192307f3ae147d2158870aa79c05f and https://cgit.freedesktop.org/xcb/libxcb/commit/?id=f85661c3bca97faa72431df92a3867be39a74e23 which both fix hangs involving xcb_wait_for_special_event())

I am working on backtraces for the other threads. In the meantime we tried the fixes you mentioned and they do not help. The first commit was already integrated, and with the second the issue still occurs.

Hello Bertrand (Piolin).

I am missing some information that would help me understand this problem better. I need to know which HW you are using, and what the closest INTEL RVP/CRB is, so I can try to reproduce this bug. I also need the type of panel you are using, and which GFX interface (HDMI, DP, eDP, DVI, VGA).

I also need your kernel version (I vaguely remember that for WRL6 it was 3.10, not sure if 3.10.88 or a similar number). If you do not want to change kernel, you should adopt the kernel.org kernel: longterm: 3.10.99 (2016-03-03 release date).
_______

For your info, I did the following:

HW used: my D-1500 (former BDW-DE ES2 V1 HW), Camelback Mountain CRB, CPUID 0x50662
KERNEL used: Linux localhost.localdomain 4.4.3-300.fc23.x86_64 #1 SMP Fri Feb 26 18:45:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Flat Panel used: HP 19" (old, around 7 years), VGA I/F from CM CRB platform.

My data in red (please do note that the Fedora 23 kernel is already at 4.4.3; I’ll very soon switch to F24, after the May 2016 Beta release):

Mesa - 11.0.4 (11.1.0)
Libdrm - 2.4.65 (2.4.65)
Libva - 1.6.2 (1.6.2)
vaapi intel-driver - 1.6.2 (I have gstreamer-vaapi 1.6.3 for most packages)
Cairo - 1.14.4 (1.14.2)
Xorg Xserver - 1.18.0 (1.18.0)
Intel-gpu-tools - 1.13 (2.99.217)

Just to mention, no xcb_wait_for_reply message was found in the dmesg logs over several (at least 3) retries.

Thank you,
Zoran

Hello Zoran

In order to reproduce the issue you must use the customer hardware and UI application. With this setup we have a test script that triggers the issue. I can provide that setup for you; just let me know where to ship it.

Best regards
Bertrand

Hi Zoran

The issue could be reproduced with wrl8, so here are some details of that setup:

> I need to understand which HW you are using, and what is the closest INTEL RVP/CRB, so I can also try to repeat/reproduce this error/bug?

The hardware is based on the Flathead Creek CRB with an i7-4770TE CPU @ 2.30GHz. The panel has a custom LVDS connector. And it is a custom BIOS.

> I also need your kernel version

it is a and rest of the graphics stack:

Mesa 10.4.4
X.Org X Server 1.14.0
pixman: 0.30.2
xf86-video-intel: module version = 2.99.917

Sorry, the kernel version is Linux localhost 4.1.15

(In reply to Uli Schlachter from comment #2)
> Do you also have backtraces for other threads? If libxcb waits on a
> pthread_cond_wait(), some other thread should be waiting in poll().

The other running threads are basically part of the UI. How would you recommend debugging that exchange of messages? Is there any debug flag we could activate to track all these messages and find which one had no reply?

>
> You didn't mention your libxcb version. Could you also try libxcb from git?
> (I'm asking because of
> https://cgit.freedesktop.org/xcb/libxcb/commit/?id=5b40681c887192307f3ae147d2158870aa79c05f and
> https://cgit.freedesktop.org/xcb/libxcb/commit/?id=f85661c3bca97faa72431df92a3867be39a74e23 which both fix hangs involving
> xcb_wait_for_special_event())

This test was done and the issue still occurs.

Let's try again: What are the backtraces of the other threads? Is any other thread inside libxcb-code?

Also, does the application use libGL / libX11 from multiple threads? Is XInitThreads() called at startup?

According to the only backtrace you provided, libX11 asks for a reply that we don't have yet and some other thread "grabbed" the socket. Could you get a debug build of libxcb and libX11 so that the backtraces provide more information?

Created attachment 122328 [details]
bt of all app running threads

(In reply to Uli Schlachter from comment #9)
> Let's try again: What are the backtraces of the other threads? Is any other
> thread inside libxcb-code?

Now attached.

>
> Also, does the application use libGL / libX11 from multiple threads? Is
> XInitThreads() called at startup?

(work in progress)

>
> According to the only backtrace you provided, libX11 asks for a reply that
> we don't have yet and some other thread "grabbed" the socket. Could you get
> a debug build of libxcb and libX11 so that the backtraces provide more
> information?

How do you recommend activating debug output for these two? A debug build of libxcb does not produce any log.

(In reply to Uli Schlachter from comment #9)
> Let's try again: What are the backtraces of the other threads? Is any other
> thread inside libxcb-code?
>
> Also, does the application use libGL / libX11 from multiple threads?

According to the Qt documentation (we also checked it by enabling some debug information and attaching the debugger), a basic renderer is used for Mesa drivers. This renderer is single-threaded. We are actually using Qt 5.4.0. You may find it convenient to check out the Qt codebase. You will find the system integration layer at:
qtbase/src/plugins/platforms/xcb/
And most of the OpenGL usage at:
qtdeclarative/src/quick/scenegraph

> Is XInitThreads() called at startup?

Yes, it is.

>
> According to the only backtrace you provided, libX11 asks for a reply that
> we don't have yet and some other thread "grabbed" the socket. Could you get
> a debug build of libxcb and libX11 so that the backtraces provide more
> information?

So the backtraces (which are a mess) say that there are just two threads running code inside of libxcb. One of them is sitting in xcb_wait_for_event() -> poll(), the other inside of xcb_wait_for_reply() -> pthread_cond_wait(). This seems valid and thus I guess that the bug is inside of the server. Let's see what they think.

The Qt main page is recommending that Qt 5.5 be used (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be used to make sure the bug is not fixed with that update?

(In reply to Kaveh from comment #14)
> The Qt main page is recommending that Qt 5.5 be used
> (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be
> used to make sure the bug is not fixed with that update?

That is a good point. Under investigation.

(In reply to Kaveh from comment #14)
> The Qt main page is recommending that Qt 5.5 be used
> (https://wiki.qt.io/Main_Page). Is there a reason that version cannot be
> used to make sure the bug is not fixed with that update?

After checking the changeset of 5.5 we cannot see relevant changes related to the problem we are pointing at. Here the issue is not about deadlocks using blocking xcb_wait_for_reply / xcb_wait_for_event calls inside the Qt layer but instead a deadlock using these primitives inside the i965_dri.so OpenGL implementation. Also, Qt is recommending this version because it is the latest stable release, but note that it is only a single minor version upgrade (we are actually on Qt 5.4.0).

(In reply to Bp_at_wind from comment #12)
> (In reply to Uli Schlachter from comment #9)
> > Let's try again: What are the backtraces of the other threads? Is any other
> > thread inside libxcb-code?
> >
> > Also, does the application use libGL / libX11 from multiple threads?
> According Qt documentation (we also checked it enabling some debug
> information and attaching the debugger..), a basic renderer is used for Mesa
> drivers.
> This renderer is single-threaded.
> Actually we are using Qt 5.4.0. You may find convenient to check out Qt
> codebase.
> You will find system integration layer at:
> qtbase/src/plugins/platforms/xcb/
> And most of openGL usage at:
> qtdeclarative/src/quick/scenegraph
>

Some clarification: besides the basic renderer, which is single-threaded, there’s another thread handled by Qt which is a kind of listener for xcb events. So the answer is yes, there may be several threads inside libxcb. I don’t know if this may contribute to the issue, as each one holds its own xcb_connection (maybe the physical layer is shared among the connections between the same process and the X server?).

I think this issue is the same as the one I reported in https://bugs.freedesktop.org/show_bug.cgi?id=91448

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/497.
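
For readers following the thread, the two-thread picture described in the comments (one thread reading the socket in poll(), another blocked on a condition variable until its reply shows up) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not libxcb's actual implementation: `ReplyTable`, `deliver`, `wait_for_reply` and `demo` are hypothetical names, and the timeout exists only so the example terminates. As far as I know, the real xcb_wait_for_reply() takes no timeout, which is why a reply that never arrives shows up as a hang.

```python
import threading


class ReplyTable:
    """Toy model: one reader delivers replies, waiters block on a condition."""

    def __init__(self):
        self._cond = threading.Condition()
        self._replies = {}  # sequence number -> reply payload

    def deliver(self, seq, payload):
        # Called by the single "reader" thread (the one that would sit in poll()).
        with self._cond:
            self._replies[seq] = payload
            self._cond.notify_all()

    def wait_for_reply(self, seq, timeout=None):
        # Called by a "waiter" thread (analogous to blocking in pthread_cond_wait()).
        # Returns the payload, or None if the timeout expires first.
        with self._cond:
            arrived = self._cond.wait_for(lambda: seq in self._replies, timeout=timeout)
            return self._replies.pop(seq) if arrived else None


def demo():
    table = ReplyTable()
    # Hypothetical scenario: the "server" answers request 1 but never request 2.
    reader = threading.Thread(target=table.deliver, args=(1, "reply-1"))
    reader.start()
    answered = table.wait_for_reply(1, timeout=1.0)
    # With timeout=None this second call would block forever, like the reported hang.
    missing = table.wait_for_reply(2, timeout=0.2)
    reader.join()
    return answered, missing
```

In this model both threads look healthy in a backtrace (one reading, one waiting), which matches the observation above that the client-side state "seems valid" even though the waiter never wakes up.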
Created attachment 121702 [details]
Xorg.log

The issue is that the graphics stack hangs waiting for a reply message (xcb_wait_for_reply). As a result the front panel UI (based on Qt5) freezes. At this stage we cannot pinpoint the root cause; we tried to analyse buffers, kernel logs, DRI and libdrm but could not find anything obvious. The GPU seems to be working fine after the issue occurs. We also tried to upgrade parts of the graphics stack, from Mesalib-9.1.3 to MesaLib-10.4.4.tar.bz2, libdrm-2.4.40 to libdrm-2.4.58, and xf86-video-intel-2.9.902 to xf86-video-intel-2.99.917, and the issue still occurs.
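
One way to approach the question raised in the comments ("find which one had no reply") is to pair requests with replies by sequence number once a protocol trace is available. The sketch below assumes such a trace has already been captured and parsed by some other means; the tuple layout, the `unanswered` helper and the sample entries are all hypothetical and are not taken from this bug.

```python
def unanswered(requests, replies):
    """Return (seq, name) for each request that expects a reply but never got one.

    requests: iterable of (seq, name, expects_reply) tuples
    replies:  iterable of sequence numbers for which a reply was seen
    """
    seen = set(replies)
    return [(seq, name) for seq, name, expects in requests if expects and seq not in seen]


# Invented sample data, for illustration only (not from the reporter's system).
requests_log = [
    (1, "GetGeometry", True),     # expects a reply
    (2, "MapWindow", False),      # no reply expected
    (3, "GetInputFocus", True),   # expects a reply
]
replies_log = [1]

pending = unanswered(requests_log, replies_log)
print(pending)
```

Any request left in `pending` at the moment of the freeze would be a candidate for the reply that xcb_wait_for_reply is still waiting on.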