Bug 1976

Summary:

libXRandR has deadlock problems with threads enabled

Product:

xorg

Reporter:

Bastien Nocera <bugzilla>

Component:

Lib/other

Assignee:

Xorg Project Team <xorg-team>

Status:

RESOLVED FIXED

QA Contact:

Severity:

normal

Priority:

high

CC:

dberkholz, erik.andren, gajownik, jg, keithp, roland.mainz

Version:

6.8.1

Hardware:

x86 (IA32)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
test case	none
xrandr-deadlock-fix.patch	bugzilla: 6.8-branch?

Description Bastien Nocera 2004-11-30 17:33:29 UTC

I hit this hang when adding XRandR support to Totem:
#0  0x00baf7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0011d21e in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x00119dcf in _L_mutex_lock_32 () from /lib/tls/libpthread.so.0
#3  0x00bc5650 in ?? ()
#4  0x00000042 in ?? ()
#5  0x00ddf59c in ?? () from /usr/X11R6/lib/libX11.so.6
#6  0x092cccd8 in ?? ()
#7  0x00000001 in ?? ()
#8  0xfeef6448 in ?? ()
#9  0x00d6d49c in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#10 0x00d6d49c in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#11 0x00d47948 in XQueryExtension () from /usr/X11R6/lib/libX11.so.6
#12 0x00d3d3eb in XInitExtension () from /usr/X11R6/lib/libX11.so.6
#13 0x009b4813 in XextAddDisplay () from /usr/X11R6/lib/libXext.so.6
#14 0x00a0ed1f in XRRFindDisplay () from /usr/X11R6/lib/libXrandr.so.2
#15 0x00a0f390 in XRRQueryVersion () from /usr/X11R6/lib/libXrandr.so.2
#16 0x00a0f9f5 in XRRGetScreenInfo () from /usr/X11R6/lib/libXrandr.so.2
#17 0x080608ba in bacon_resize_init () at bacon-resize.c:63
#18 0x0805bad5 in bacon_video_widget_realize (widget=0x93beb20)
    at bacon-video-widget-xine.c:1162
#19 0x001c50ae in g_cclosure_marshal_VOID__VOID ()
   from /usr//lib/libgobject-2.0.so.0
#20 0x001ad6b2 in g_cclosure_new_swap () from /usr//lib/libgobject-2.0.so.0
#21 0x001ad347 in g_closure_invoke () from /usr//lib/libgobject-2.0.so.0
#22 0x001c2616 in g_signal_has_handler_pending ()
   from /usr//lib/libgobject-2.0.so.0
#23 0x001c49bc in g_signal_emit_valist () from /usr//lib/libgobject-2.0.so.0
#24 0x001c4c5a in g_signal_emit () from /usr//lib/libgobject-2.0.so.0
#25 0x00585adc in gtk_widget_realize () from /usr//lib/libgtk-x11-2.0.so.0
#26 0x00586091 in gtk_widget_set_parent () from /usr//lib/libgtk-x11-2.0.so.0
#27 0x003d21c7 in gtk_box_pack_start () from /usr//lib/libgtk-x11-2.0.so.0
#28 0x003d248a in gtk_box_pack_start_defaults ()
   from /usr//lib/libgtk-x11-2.0.so.0
#29 0x003d2dea in gtk_box_reorder_child () from /usr//lib/libgtk-x11-2.0.so.0
#30 0x001c5c3f in g_cclosure_marshal_VOID__OBJECT ()
   from /usr//lib/libgobject-2.0.so.0
#31 0x001ad6b2 in g_cclosure_new_swap () from /usr//lib/libgobject-2.0.so.0
#32 0x001ad347 in g_closure_invoke () from /usr//lib/libgobject-2.0.so.0
#33 0x001c2616 in g_signal_has_handler_pending ()
   from /usr//lib/libgobject-2.0.so.0
#34 0x001c49bc in g_signal_emit_valist () from /usr//lib/libgobject-2.0.so.0
#35 0x001c4c5a in g_signal_emit () from /usr//lib/libgobject-2.0.so.0
#36 0x0040c148 in gtk_container_add () from /usr//lib/libgtk-x11-2.0.so.0
#37 0x080663e0 in video_widget_create (totem=0x92ebe98) at totem.c:3524
#38 0x0806846e in main (argc=1, argv=0xfeef7304) at totem.c:3956

This is similar to the old hang in libXi:
http://bugs.xfree86.org/show_bug.cgi?id=260

Let me know if a test case is required.

Comment 1 Bastien Nocera 2004-12-01 05:51:56 UTC

Created attachment 1436 [details]
test case

Compile with:
gcc -g -o test test.c `pkg-config --libs --cflags gtk+-2.0` -L/usr/X11R6/lib
-lXrandr -lXrender -lpthread

It will hang with this backtrace:
#9  0x00bf649c in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#10 0x00bf649c in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#11 0x00bd0948 in XQueryExtension () from /usr/X11R6/lib/libX11.so.6
#12 0x00bc63eb in XInitExtension () from /usr/X11R6/lib/libX11.so.6
#13 0x00c70813 in XextAddDisplay () from /usr/X11R6/lib/libXext.so.6
#14 0x001a7d1f in XRRFindDisplay () from /usr/X11R6/lib/libXrandr.so.2
#15 0x001a8390 in XRRQueryVersion () from /usr/X11R6/lib/libXrandr.so.2
#16 0x001a89f5 in XRRGetScreenInfo () from /usr/X11R6/lib/libXrandr.so.2
#17 0x08048894 in main (argc=1, argv=0xfeeaf2b4) at test.c:23

Comment 2 Daniel Stone 2004-12-03 08:11:46 UTC

I looked at the source for this for all possible deadlock spots, and found none.
 So I compiled the testcase, and:
daniels@catsby:~/canonical/xorg/arch/pristine/xorg-6.8.1/build-tree/xc/lib/Xrandr%
./foo
daniels@catsby:~/canonical/xorg/arch/pristine/xorg-6.8.1/build-tree/xc/lib/Xrandr%
echo $?
0

Comment 3 Daniel Stone 2004-12-03 08:18:27 UTC

Erh, I missed the '1' in the bug number, so thought this was far older than it
was, sorry.  I can't hit the deadlock personally, and it looks like the
functions being called are fairly lock-safe, but my knowledge of X locking is,
er, unspectacular.  Please reopen if I'm on crack.

Well, if I'm wrong.

Comment 4 Daniel Stone 2004-12-03 08:36:11 UTC

Yeah, I'm on crack.

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x4050f64b in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x4050c8a5 in _L_mutex_lock_24 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0x00000000 in ?? ()
#4  0x00000000 in ?? ()
#5  0x4072b094 in ?? () from /usr/X11R6/lib/libXext.so.6
#6  0x4072a460 in ?? () from /usr/X11R6/lib/libXext.so.6
#7  0x4071be08 in ?? () from /usr/X11R6/lib/libX11.so.6
#8  0x0806d880 in ?? ()
#9  0x406a467a in _XUnregisterFilter () from /usr/X11R6/lib/libX11.so.6
#10 0x4067d515 in XQueryExtension () from /usr/X11R6/lib/libX11.so.6
#11 0x4067306b in XInitExtension () from /usr/X11R6/lib/libX11.so.6
#12 0x4072bb64 in XextAddDisplay () from /usr/X11R6/lib/libXext.so.6
#13 0x4071e169 in XRRFindDisplay () from /usr/X11R6/lib/libXrandr.so.2
#14 0x4071e90f in XRRQueryVersion () from /usr/X11R6/lib/libXrandr.so.2
#15 0x4071ecc8 in XRRGetScreenInfo () from /usr/X11R6/lib/libXrandr.so.2
#16 0x08048907 in main (argc=1, argv=0xbffff994) at foo.c:20

The deadlock is happening where I *imagine* it would happen, having looked at
the code.  Which makes me feel a little better.

Comment 5 Daniel Stone 2004-12-03 08:37:09 UTC

... and with debugging:
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x4050f64b in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x4050c8a5 in _L_mutex_lock_24 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0x00000000 in ?? ()
#4  0x00000000 in ?? ()
#5  0x4072b094 in ?? () from /usr/X11R6/lib/debug/libXext.so.6
#6  0x4072a460 in ?? () from /usr/X11R6/lib/debug/libXext.so.6
#7  0x4071be08 in __JCR_LIST__ () from /usr/X11R6/lib/debug/libX11.so.6
#8  0x0806d880 in ?? ()
#9  0x406a467a in _XLockDisplay (dpy=0x4050f64b) at locking.c:478
#10 0x4067d515 in XQueryExtension (dpy=0x806d880, name=0x40720208 "RANDR",
major_opcode=0xfffffffc, first_event=0xfffffffc, 
    first_error=0xfffffffc) at QuExt.c:55
#11 0x4067306b in XInitExtension (dpy=0x806d880, name=0x40720208 "RANDR") at
InitExt.c:46
#12 0x4072bb64 in XextAddDisplay (extinfo=0x407203c0, dpy=0x806d880,
ext_name=0xfffffffc <Address 0xfffffffc out of bounds>, 
    hooks=0x40720220, nevents=1, data=0xfffffffc <Address 0xfffffffc out of
bounds>) at extutil.c:106
#13 0x4071e169 in XRRFindDisplay (dpy=0x806d880) at Xrandr.c:135
#14 0x4071e90f in _XRRGetScreenInfo (dpy=0x806d880, window=4294967292) at
Xrandr.c:433
#15 0x4071ecc8 in XRRGetScreenInfo (dpy=0x806d880, window=4294967292) at
Xrandr.c:578
#16 0x08048907 in main (argc=1, argv=0xbffff964) at foo.c:20

Comment 6 Daniel Stone 2004-12-03 10:11:48 UTC

The problem here is that XextAddDisplay and friends are *totally* not
threadsafe, so we can't call them with the lock held.  The really quick and
nasty hack:
XRRScreenConfiguration *XRRGetScreenInfo (Display *dpy, Window window)
{
  XRRScreenConfiguration *config;
  XRRFindDisplay(dpy);
  LockDisplay (dpy);
  config = _XRRGetScreenInfo(dpy, window);
  UnlockDisplay (dpy);
  SyncHandle ();
  return config;
}

(note the new XRRFindDisplay call, so AddDisplay gets called without the lock
being held, if needed.)

I suspect the better long-term solution would be to do like
Xrender/Xcomposite/et al, and re-implement these functions ourselves; Keith,
Jim, input?

Comment 7 Keith Packard 2004-12-03 10:49:57 UTC

If the 'nasty hack' is threadsafe, please commit it.  We can consider redoing
this extension along the Xrender lines at some point in the future, the purpose
there is to avoid using libXext in every application.

Comment 8 Bastien Nocera 2005-06-10 03:37:45 UTC

Created attachment 2865 [details] [review]
Backport from current CVS

Well, it's even worse than that, it doesn't need display locking at all.

Comment 9 Daniel Stone 2005-06-10 07:09:04 UTC

committed to HEAD

Comment 10 Bastien Nocera 2005-07-08 21:53:26 UTC

Could the fix also be committed to the XORG-6_8-branch in CVS?

Comment 11 Donnie Berkholz 2005-07-09 07:10:51 UTC

Comment on attachment 2865 [details] [review]
Backport from current CVS

Sounds like a good idea.

Comment 12 Donnie Berkholz 2005-07-09 07:14:22 UTC

Comment on attachment 2865 [details] [review]
Backport from current CVS

Err, that wasn't the patch. Anyone care to attach the actual committed patch
and switch the flag?

Comment 13 Bastien Nocera 2005-07-09 21:19:15 UTC

Created attachment 3056 [details]
source-surface-scale-paint-image-out.png

The patch that was committed to HEAD.

Comment 14 Mike A. Harris 2005-08-30 00:50:46 UTC

The attachment in comment #13 claiming to be a "patch" is a PNG image.
In order for this patch to make it into the 6.8.x branch (and thus into future
6.8.x) releases, someone needs to attach the patch to the bug report, confirm
the patch is attached, and the correct mimetype, and then edit the
attachment, and set the nomination flag to "?" to nominate it for 6.8.x
inclusion.

Hope this helps.

Comment 15 Bastien Nocera 2005-08-30 08:29:25 UTC

I can swear I never uploaded the patch in comment #13. I never even saw that
filename. I uploaded a patch called "xrandr-deadlock-fix.patch", not an image
called "source-surface-scale-paint-image-out.png".

Comment 16 Daniel Stone 2005-08-30 08:35:17 UTC

To be fair, it would have actually been a patch before the root filesystem kind
of tanked.

Comment 17 Bastien Nocera 2005-09-11 15:54:53 UTC

Created attachment 3231 [details] [review]
xrandr-deadlock-fix.patch

Haha! found it again!

Comment 18 Jim Gettys 2005-09-11 16:55:44 UTC

Comment on attachment 3231 [details] [review]
xrandr-deadlock-fix.patch

Looks sensible to me.

Comment 19 Erik Andren 2006-04-02 23:51:20 UTC

Is the last patch commited to HEAD or not?

Comment 20 Erik Andren 2006-08-02 12:26:41 UTC

Ping!

Comment 21 Jim Gettys 2006-08-02 12:29:47 UTC

One way to make sure it doesn't get ignored is to mark the bug as a blocker for
the next release.

Comment 22 Daniel Stone 2006-08-03 01:16:01 UTC

yes, that patch was exactly what was committed to HEAD over a year ago, per
comment #9; (re)closing given no-one cares about 6.8.x.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.