Bug 22082 - X Application hangs randomly, has reproduce step
Summary: X Application hangs randomly, has reproduce step
Status: RESOLVED NOTOURBUG
Alias: None
Product: XCB
Classification: Unclassified
Component: Library
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assignee: xcb mailing list dummy
QA Contact: xcb mailing list dummy
Duplicates: 21850
Reported: 2009-06-04 03:59 UTC by Zeng Zhaoming
Modified: 2011-03-16 12:07 UTC (History)

Attachments
xorg.log (119.82 KB, application/octet-stream)
2009-06-04 03:59 UTC, Zeng Zhaoming

Description Zeng Zhaoming 2009-06-04 03:59:28 UTC
Created attachment 26431 [details]
xorg.log

env:
  Xserver:  X.Org X Server 1.6.1  Release Date: 2009-4-14
  Linux Kernel: Linux 2.6.28.9 i686
  mesa: OpenGL renderer string: Mesa DRI Intel(R) 965GM GEM 20090114 x86/MMX/SSE2
        OpenGL version string: 2.1 Mesa 7.6-devel. UXA accelerate
  glut: freeglut 3.8.0

Description:
  I have written an OpenGL application.
  A render thread updates the texture area and handles input events such as mouse movement and key presses.
  Another thread runs xrandr.

  The application hangs. The X server itself does not appear to hang, because xset and xrandr run from a terminal still respond.

  A backtrace of the application gives output like:

Thread 29 (Thread -1479996528 (LWP 4631)):
#0  0xb7f42424 in __kernel_vsyscall ()
#1  0xb7d7d236 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xb7177fab in _xcb_conn_wait () from /opt/xorg/lib/libxcb.so.1
#3  0xb717a25a in xcb_wait_for_reply () from /opt/xorg/lib/libxcb.so.1
#4  0xb7341128 in _XReply () from /opt/xorg/lib/libX11.so.6
#5  0xb7928278 in DRI2GetBuffers () from /opt/xorg/lib/libGL.so.1
#6  0xb79275a5 in dri2GetBuffers () from /opt/xorg/lib/libGL.so.1
#7  0xa722139a in intel_update_renderbuffers ()
#8  0xa7221b2f in intelMakeCurrent () from /opt/xorg/lib/dri/i965_dri.so
#9  0xa7215fda in driBindContext () from /opt/xorg/lib/dri/i965_dri.so
#10 0xb7926e6c in dri2BindContext () from /opt/xorg/lib/libGL.so.1
#11 0xb79035ad in glXMakeCurrentReadSGI () from /opt/xorg/lib/libGL.so.1
#12 0xb7903933 in glXMakeCurrent () from /opt/xorg/lib/libGL.so.1
#13 0xb78c9d12 in fgSetWindow () from /opt/xorg/lib/libglut.so.3
#14 0xb78c5689 in glutMainLoopEvent () from /opt/xorg/lib/libglut.so.3
#15 0xb7962af1 in loop () from /opt/nvr/libnvr_opengl.so
#16 0xb7962eca in render_task () from /opt/nvr/libnvr_opengl.so
#17 0xb7d7943b in start_thread () from /lib/libpthread.so.0
#18 0xb776afde in clone () from /lib/libc.so.6

Sometimes the backtrace is instead:
#0  0xb7f42424 in __kernel_vsyscall ()
#1  0xb7d7d236 in poll()...
#2  0xb7177fab in _xcb_conn_wait () from /opt/xorg/lib/libxcb.so.1
#3  0xb717a25a in xcb_wait_for_reply () from /opt/xorg/lib/libxcb.so.1
#4  0xb7341128 in _XReply () from /opt/xorg/lib/libX11.so.6
#5  0xb7928278 in DRI2GetBuffers () from /opt/xorg/lib/libGL.so.1
#6  0xb79275a5 in dri2GetBuffers () from /opt/xorg/lib/libGL.so.1
#7  0xa722139a in intel_update_renderbuffers ()
#8  0xa7221b2f in intelMakeCurrent () from /opt/xorg/lib/dri/i965_dri.so
#9  0xa7215fda in driBindContext () from /opt/xorg/lib/dri/i965_dri.so
#10 0xb7926e6c in dri2BindContext () from /opt/xorg/lib/libGL.so.1
#11 0xb79035ad in glXMakeCurrentReadSGI () from /opt/xorg/lib/libGL.so.1
#12 0xb7903933 in glXMakeCurrent () from /opt/xorg/lib/libGL.so.1
#13 0xb78c9d12 in fgSetWindow () from /opt/xorg/lib/libglut.so.3
#14 0xb78c5689 in glutMainLoopEvent () from /opt/xorg/lib/libglut.so.3
#15 0xb7962af1 in loop () from /opt/nvr/libnvr_opengl.so
#16 0xb7962eca in render_task () from /opt/nvr/libnvr_opengl.so
#17 0xb7d7943b in start_thread () from /lib/libpthread.so.0
#18 0xb776afde in clone () from /lib/libc.so.6

This looks like a deadlock on the connection lock.
Comment 1 Zeng Zhaoming 2009-06-04 04:07:18 UTC
Steps to reproduce:
  1. In the mouse-click hook, update the texture area. Then click the mouse.
  2. Meanwhile, another thread calls fork() and execve() to run xrandr.
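
Step 2 can be sketched in C as follows. This is only an illustration, not the reporter's code: `run_command` is a hypothetical helper, and it uses execvp() rather than execve() so that the program is looked up in PATH. Because the child is a separate process, xrandr opens its own connection to the X server rather than sharing the application's.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, exec the given command in it,
 * and wait for it to finish.
 * Returns the child's exit status, or -1 if fork/wait failed or the
 * child was killed by a signal. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: replace the process image with the command. */
        execvp(argv[0], argv);
        _exit(127);                /* only reached if exec failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the scenario described, the second thread would call something like `run_command((char *const[]){ "xrandr", "--auto", NULL })`. The xrandr arguments here are an assumption, since the report does not give them.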
  
Comment 2 Zeng Zhaoming 2009-06-04 04:17:23 UTC
The same steps can also produce a backtrace like:

#18 0xb771f8c9 in exit () from /lib/libc.so.6
#19 0xb7390b00 in _XDefaultError () from /opt/xorg/lib/libX11.so.6
#20 0xb7390bd3 in _XError () from /opt/xorg/lib/libX11.so.6
#21 0xb73981f1 in _XReply () from /opt/xorg/lib/libX11.so.6
#22 0xb797f278 in DRI2GetBuffers () from /opt/xorg/lib/libGL.so.1
#23 0xb797e5a5 in dri2GetBuffers () from /opt/xorg/lib/libGL.so.1
#24 0xa723839a in intel_update_renderbuffers ()
#25 0xa7238b2f in intelMakeCurrent () from /opt/xorg/lib/dri/i965_dri.so
#26 0xa722cfda in driBindContext () from /opt/xorg/lib/dri/i965_dri.so
#27 0xb797de6c in dri2BindContext () from /opt/xorg/lib/libGL.so.1
#28 0xb795a5ad in glXMakeCurrentReadSGI () from /opt/xorg/lib/libGL.so.1
#29 0xb795a933 in glXMakeCurrent () from /opt/xorg/lib/libGL.so.1
#30 0xb7920d12 in fgSetWindow () from /opt/xorg/lib/libglut.so.3
#31 0xb791c656 in glutMainLoopEvent () from /opt/xorg/lib/libglut.so.3
#32 0xb79b9af1 in loop () from /opt/nvr/libnvr_opengl.so
#33 0xb79b9eca in render_task () from /opt/nvr/libnvr_opengl.so
#34 0xb7dd043b in start_thread () from /lib/libpthread.so.0
#35 0xb77c1fde in clone () from /lib/libc.so.6

The application then exits.
Comment 3 Zeng Zhaoming 2009-06-04 04:24:11 UTC
Full backtrace for the exit() case:

(gdb) bt -18 full
#18 0xb771f8c9 in exit () from /lib/libc.so.6
No symbol table info available.
#19 0xb7390b00 in _XDefaultError (dpy=0xb9697d0, event=0xa7c9feec) at XlibInt.c:2875
No locals.
#20 0xb7390bd3 in _XError (dpy=0xb9697d0, rep=0xd184bf8) at XlibInt.c:2924
        rtn_val = <value optimized out>
        event = {type = 0, xany = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161}, xkey = {type = 0, 
    serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, root = 3073757916, subwindow = 3074875480, time = 2815033152, 
    x = -1208321342, y = -1220091376, x_root = 0, y_root = 1, state = 1, keycode = 0, same_screen = -1221195121}, xbutton = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, window = 3070923161, root = 3073757916, subwindow = 3074875480, time = 2815033152, x = -1208321342, 
    y = -1220091376, x_root = 0, y_root = 1, state = 1, button = 0, same_screen = -1221195121}, xmotion = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, window = 3070923161, root = 3073757916, subwindow = 3074875480, time = 2815033152, x = -1208321342, 
    y = -1220091376, x_root = 0, y_root = 1, state = 1, is_hint = 0 '\0', same_screen = -1221195121}, xcrossing = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, window = 3070923161, root = 3073757916, subwindow = 3074875480, time = 2815033152, x = -1208321342, 
    y = -1220091376, x_root = 0, y_root = 1, mode = 1, detail = 0, same_screen = -1221195121, focus = -1221230592, state = 1127008}, xfocus = {type = 0, 
    serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, mode = -1221209380, detail = -1220091816}, xexpose = {type = 0, 
    serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, x = -1221209380, y = -1220091816, width = -1479934144, 
    height = -1208321342, count = -1220091376}, xgraphicsexpose = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, 
    drawable = 3070923161, x = -1221209380, y = -1220091816, width = -1479934144, height = -1208321342, count = -1220091376, major_code = 0, 
    minor_code = 1}, xnoexpose = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, drawable = 3070923161, major_code = -1221209380, 
    minor_code = -1220091816}, xvisibility = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, 
    state = -1221209380}, xcreatewindow = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, parent = 3070923161, window = 3073757916, 
    x = -1220091816, y = -1479934144, width = -1208321342, height = -1220091376, border_width = 0, override_redirect = 1}, xdestroywindow = {type = 0, 
    serial = 194418640, send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916}, xunmap = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916, from_configure = -1220091816}, xmap = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916, override_redirect = -1220091816}, xmaprequest = {type = 0, 
    serial = 194418640, send_event = 4199772, display = 0x4d65, parent = 3070923161, window = 3073757916}, xreparent = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916, parent = 3074875480, x = -1479934144, y = -1208321342, 
    override_redirect = -1220091376}, xconfigure = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, event = 3070923161, 
    window = 3073757916, x = -1220091816, y = -1479934144, width = -1208321342, height = -1220091376, border_width = 0, above = 1, override_redirect = 1}, 
  xgravity = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916, x = -1220091816, 
    y = -1479934144}, xresizerequest = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, width = -1221209380, 
    height = -1220091816}, xconfigurerequest = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, parent = 3070923161, 
    window = 3073757916, x = -1220091816, y = -1479934144, width = -1208321342, height = -1220091376, border_width = 0, above = 1, detail = 1, 
    value_mask = 0}, xcirculate = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, event = 3070923161, window = 3073757916, 
    place = -1220091816}, xcirculaterequest = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, parent = 3070923161, 
    window = 3073757916, place = -1220091816}, xproperty = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, 
    atom = 3073757916, time = 3074875480, state = -1479934144}, xselectionclear = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, 
    window = 3070923161, selection = 3073757916, time = 3074875480}, xselectionrequest = {type = 0, serial = 194418640, send_event = 4199772, 
    display = 0x4d65, owner = 3070923161, requestor = 3073757916, selection = 3074875480, target = 2815033152, property = 3086645954, time = 3074875920}, 
  xselection = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, requestor = 3070923161, selection = 3073757916, target = 3074875480, 
    property = 2815033152, time = 3086645954}, xcolormap = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, 
    colormap = 3073757916, new = -1220091816, state = -1479934144}, xclient = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, 
    window = 3070923161, message_type = 3073757916, format = -1220091816, data = {b = "@??zú·\020?\000\000\000\000\001\000\000", s = {-192, -22583, 
        31426, -18438, -7664, -18618, 0, 0, 1, 0}, l = {-1479934144, -1208321342, -1220091376, 0, 1}}}, xmapping = {type = 0, serial = 194418640, 
    send_event = 4199772, display = 0x4d65, window = 3070923161, request = -1221209380, first_keycode = -1220091816, count = -1479934144}, xerror = {
    type = 0, display = 0xb9697d0, resourceid = 4199772, serial = 19813, error_code = 153 '\231', request_code = 145 '\221', minor_code = 10 '\n'}, 
  xkeymap = {type = 0, serial = 194418640, send_event = 4199772, display = 0x4d65, window = 3070923161, 
    key_vector = "?5·X?@??zú·\020?\000\000\000\000\001\000\000\000\001\000\000"}, pad = {0, 194418640, 4199772, 19813, -1224044135, -1221209380, 
    -1220091816, -1479934144, -1208321342, -1220091376, 0, 1, 1, 0, -1221195121, -1221230592, 1127008, -1221209380, -1220106172, 0, 194418640, -1479934056, 
    -1208298944, 0}}
        async = (_XAsyncHandler *) 0xa7c9feec
        next = <value optimized out>
#21 0xb73981f1 in _XReply (dpy=0xb9697d0, rep=0xa7c9ffc8, extra=0, discard=0) at xcb_io.c:506
        ext = (_XExtension *) 0x0
        ret_code = <value optimized out>
        error = (xcb_generic_error_t *) 0xd184bf8
        c = (xcb_connection_t *) 0xb969d10
        current = (PendingRequest *) 0xcfa1130
        __PRETTY_FUNCTION__ = "_XReply"
#22 0xb797f278 in DRI2GetBuffers (dpy=0xb9697d0, drawable=4194306, width=0xbcbf5f4, height=0xbcbf5f8, attachments=0xa7ca00a4, count=2, outCount=0xa7ca00d4)
    at dri2.c:256
        info = (XExtDisplayInfo *) 0xb84c728
        rep = {type = 0 '\0', pad1 = 153 '\231', sequenceNumber = 19813, length = 4199772, width = 143720458, height = 179897152, count = 136167516, 
  pad2 = 3219591016, pad3 = 134829076, pad4 = 179897152}
        buffers = <value optimized out>
        repBuffer = {attachment = 2815033352, name = 2804822056, pitch = 32993, cpp = 194621072, flags = 3080188331}
        i = 1
#23 0xb797e5a5 in dri2GetBuffers (driDrawable=0xbcbf5d0, width=0xbcbf5f4, height=0xbcbf5f8, attachments=0xa7ca00a4, count=2, out_count=0xa7ca00d4, 
    loaderPrivate=0xbcbf538) at dri2_glx.c:336
        pdraw = <value optimized out>
        buffers = <value optimized out>
#24 0xa723839a in intel_update_renderbuffers (context=0xb97bdb8, drawable=0xbcbf5d0) at intel_context.c:264
        rb = <value optimized out>
        region = <value optimized out>
        depth_region = <value optimized out>
        intel = (struct intel_context *) 0xb9868b0
        buffers = <value optimized out>
        screen = (__DRIscreen *) 0xb97bdf0
        i = 0
        count = <value optimized out>
        attachments = {0, 1, 3080042386, 3080344544, 3, 2815033592, 2900445812, 194494860, 1074029664, 2815033576}
        name = <value optimized out>
        region_name = <value optimized out>
        __func__ = "intel_update_renderbuffers"
#25 0xa7238b2f in intelMakeCurrent (driContextPriv=0xb97bdb8, driDrawPriv=0xbcbf5d0, driReadPriv=0xbcbf5d0) at intel_context.c:880
        intel = (struct intel_context *) 0xb9868b0
        intel_fb = (struct intel_framebuffer *) 0xbcbf640
        psp = (__DRIscreenPrivate *) 0xb97bdf0
#26 0xa722cfda in driBindContext (pcp=0xb97bdb8, pdp=0xbcbf5d0, prp=0xbcbf5d0) at ../common/dri_util.c:200
        psp = (__DRIscreenPrivate *) 0xb97bdf0
#27 0xb797de6c in dri2BindContext (context=0xb974250, draw=0xbcbf538, read=0xbcbf538) at dri2_glx.c:100
No locals.
#28 0xb795a5ad in MakeContextCurrent (dpy=0xb9697d0, draw=4194306, read=4194306, gc=0xb981200) at glxcurrent.c:418
        pdraw = (__GLXDRIdrawable *) 0xbcbf538
        pread = (__GLXDRIdrawable *) 0x1
        reply = {type = 56 '8', unused = 157 '\235', sequenceNumber = 30781, length = 2815033784, contextTag = 3079925828, pad2 = 2815033832, 
  pad3 = 3079798015, pad4 = 194461472, pad5 = 2815033868, pad6 = 2815033896}
        oldGC = (const GLXContext) 0xb981200
        opcode = <value optimized out>
        oldOpcode = 151 '\227'
        bindReturnValue = <value optimized out>
        state = <value optimized out>
Comment 4 Zeng Zhaoming 2009-06-04 05:13:21 UTC
When the application exits, the following X error is printed:

X Error of failed request:  BadRegion (invalid Region parameter)
  Major opcode of failed request:  145 (XFIXES)
  Minor opcode of failed request:  10 (XFixesDestroyRegion)
  Serial number of failed request:  6284
  Current serial number in output stream:  6284
Comment 5 Zeng Zhaoming 2009-06-04 20:00:04 UTC
*** Bug 21850 has been marked as a duplicate of this bug. ***
Comment 6 Jamey Sharp 2009-10-09 23:00:19 UTC
I'm not sure where to start troubleshooting this report, so let's start with some easy questions. What version of libxcb and libX11 are you using? Did you compile them yourself, or are you using binaries from a distro? If you're using distro binaries, which distribution are you using?

Thanks for reassigning this, Alan. At first glance it looks plausible that this could be an XCB or Xlib bug, but I don't have any guesses yet.
Comment 7 xp@kedacom 2011-01-26 18:11:46 UTC
what's the new progress of this bug now?
Comment 8 Josh Triplett 2011-01-26 21:10:55 UTC
(In reply to comment #7)
> what's the new progress of this bug now?

IIRC, we tried reproducing it and couldn't with the test case on the current version, so we asked for more information on the versions of libraries in use.  I don't see any activity on this report since that point.
Comment 9 xp@kedacom 2011-01-26 23:33:09 UTC
Thanks for your response. I'm a colleague of Mr. Zeng Zhaoming, and I'm now in charge of this bug.

We compiled X components ourselves:
x-server: 1.6.1
Xlib: 1.1.2
libxcb: 1.3
mesa: 7.7.1
intel_driver: 2.7.99

I know it may be hard to find the exact cause of this bug. However, based on the information above, I think the X server sends a "BadRegion" error to the client, and the client then calls exit() while handling the X error. The BadRegion error is generated in ProcDRI2CopyRegion() in the X server, and I don't know why.

Need your help, thx!
Comment 10 Julien Danjou 2011-01-27 01:35:15 UTC
Your library versions are quite outdated, so you should start by upgrading them. There's a chance that the bug (if there's one) is already fixed.
Comment 11 xp@kedacom 2011-01-27 03:18:56 UTC
(In reply to comment #10)
> Your library versions are quite outdated, so you should start by upgrading
> them. There's a chance that the bug (if there's one) is already fixed.

You are right, it should be upgraded, but there is really quite a lot of works to do for upgrading. So I wish (just wish) that somebody can have a look at this and figure out what causes this bug.

In the other hand, to avoid exit(), I think maybe I can use XSetErrorHandler() to replace the default X error handling function.
Comment 12 martijnwijns 2011-02-11 02:07:52 UTC
I think we encountered a similar problem, google for:

Xlib/XCB in multi-threaded situation results in deadlock 

to see the workaround that we created for it. Hope this helps.

(In reply to comment #11)
> (In reply to comment #10)
> > Your library versions are quite outdated, so you should start by upgrading
> > them. There's a chance that the bug (if there's one) is already fixed.
> 
> You are right, it should be upgraded, but there is really quite a lot of works
> to do for upgrading. So I wish (just wish) that somebody can have a look at
> this and figure out what causes this bug.
> 
> In the other hand, to avoid exit(), I think maybe I can use XSetErrorHandler()
> to replace the default X error handling function.
Comment 13 Jamey Sharp 2011-03-14 17:47:04 UTC
(In reply to comment #12)
> I think we encountered a similar problem, google for:
> 
> Xlib/XCB in multi-threaded situation results in deadlock 
> 
> to see the workaround that we created for it. Hope this helps.

I'm pretty sure this is a different bug, because apparently the second thread forks and runs the 'xrandr' command line program, rather than issuing the equivalent X requests on the same connection.

It isn't obvious to me that there's even an X bug here. Does DRI2 let you continue using the same context across screen reconfiguration events? If that's supposed to work, then perhaps this is a server bug?
Comment 14 xp@kedacom 2011-03-16 02:52:39 UTC
Hi all,

I have recently found some new evidence about this bug, and I now think Jamey may be right that this is a different bug.

In our program, we call glutMainLoopEvent() in a display thread. As mentioned before, we run xrandr from another thread to change the resolution when needed. BUT immediately after changing the resolution, we also call the function "glutWarpPointer".

I had found that if either "glutMainLoopEvent" or "glutWarpPointer" is deleted, then this bug would never show up. So I think maybe it's a freeglut multi-threads problem or something else. Also I have tried to put "glutMainLoopEvent" and "glutWarpPointer" into one thread, in that case, our program is running well.

Does anybody know what happens in freeglut?
Comment 15 Jamey Sharp 2011-03-16 12:07:43 UTC
(In reply to comment #14)
> I had found that if either "glutMainLoopEvent" or "glutWarpPointer" is deleted,
> then this bug would never show up. So I think maybe it's a freeglut
> multi-threads problem or something else. Also I have tried to put
> "glutMainLoopEvent" and "glutWarpPointer" into one thread, in that case, our
> program is running well.

Ahh. A quick Google search suggests that yes, freeglut is not aware of threads. In that case, it sounds like you need to either make sure that all your glut calls are from the same thread, or use a mutex to ensure that you only call glut from one thread at a time.

Assuming that's actually your problem, I'm marking the bug resolved. If you find evidence that this really is an X bug, feel free to reopen this.
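
The mutex approach suggested above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `fake_glut_call` stands in for real freeglut calls such as glutMainLoopEvent() or glutWarpPointer(), and `glut_lock`, `locked_glut_call`, and `run_two_threads` are hypothetical names, not part of freeglut.

```c
#include <pthread.h>

/* One process-wide lock guarding every call into the library
 * (hypothetical name, not a freeglut symbol). */
static pthread_mutex_t glut_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for library state that is not thread-safe. */
static long shared_state = 0;

/* Stand-in for a real glut call: a plain read-modify-write that
 * could lose updates if two threads ran it at the same time. */
static void fake_glut_call(void)
{
    long tmp = shared_state;
    shared_state = tmp + 1;
}

/* Every thread goes through this wrapper instead of calling the
 * library directly, so only one thread is inside it at a time. */
void locked_glut_call(void)
{
    pthread_mutex_lock(&glut_lock);
    fake_glut_call();
    pthread_mutex_unlock(&glut_lock);
}

/* Thread body: make n serialized calls. */
static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        locked_glut_call();
    return NULL;
}

/* Run a "render" thread and a "resolution change" thread that each
 * make n calls; returns the total number of calls recorded. */
long run_two_threads(long n)
{
    pthread_t render, resize;
    shared_state = 0;
    pthread_create(&render, NULL, worker, &n);
    pthread_create(&resize, NULL, worker, &n);
    pthread_join(render, NULL);
    pthread_join(resize, NULL);
    return shared_state;
}
```

In a real program, every call into freeglut from any thread would have to go through the same lock. Whether a lock alone is sufficient depends on what the calls do internally; keeping all glut calls in one thread, which comment 14 reports as working, is the simpler option.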

