Bug 30450 - [bisected]mesa xdemo/glthreads get some black windows
Status: ASSIGNED
Product: xorg
Classification: Unclassified
Component: Lib/Xlib
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assigned To: Jamey Sharp
Xorg Project Team
2011BRB_Reviewed
Duplicates: 20708 32261
Depends on:
Blocks:
Reported: 2010-09-29 02:17 UTC by fangxun
Modified: 2015-05-16 13:21 UTC (History)

Attachments
xorg log (21.56 KB, text/plain)
2010-09-29 02:17 UTC, fangxun
Wake up _XReadEvents when _XReply might need a turn. (2.72 KB, patch)
2011-03-18 23:17 UTC, Jamey Sharp
Wake up _XReadEvents when _XReply might need a turn. (2.83 KB, patch)
2011-03-19 16:29 UTC, Jamey Sharp
Test program that spins instead of sleeping in XNextEvent with the above patch (160 bytes, text/x-csrc)
2011-03-22 08:50 UTC, Jamey Sharp

Description fangxun 2010-09-29 02:17:55 UTC
Created attachment 39041 [details]
xorg log

System Environment:
--------------------------
Platform:   pineview
Libdrm:     (master)2.4.21-21-g7ec9a1effa4f551897f91f3b017723a8adf011d9
Mesa:       (master)9476efe77ff196993937c3aa2e5bca725ceb0b41
Xserver: (master)xorg-server-1.9.0-71-gc768cdda92696b636c10bb2df64167d5274b4b99
Xf86_video_intel: (master)2.12.0-87-g08c2caca48323d6d5701dcef3486f850619d7905
Kernel: (master)9fe6206f400646a2322096b56c59891d530e8d51

Bug detailed description:
-------------------------
After startx, running the mesa xdemo glthreads shows some black windows instead of rotating cubes. Sometimes the cubes are drawn but do not move. Bisecting shows the regression is caused by libX11.
933aee1d5c53b0cc7d608011a29188b594c8d70b is the first bad commit.
commit 933aee1d5c53b0cc7d608011a29188b594c8d70b
Author: Jamey Sharp <jamey@minilop.net>
Date:   Fri Apr 16 20:18:28 2010 -0700

    Fix Xlib/XCB for multi-threaded applications (with caveats).

    Rather than trying to group all response processing in one monolithic
    process_responses function, let _XEventsQueued, _XReadEvents, and
    _XReply each do their own thing with a minimum of code that can all be
    reasoned about independently.

    Tested with `ico -threads 20`, which seems to be able to make many
    icosahedrons dance at once quite nicely now.

    Caveats:

    - Anything that was not thread-safe in Xlib before XCB probably still
      isn't. XListFontsWithInfo, for instance.

    - If one thread is waiting for events and another thread tries to read a
      reply, both will hang until an event arrives. Previously, if this
      happened it might work sometimes, but otherwise would trigger either
      an assertion failure or a permanent hang.

    - Versions of libxcb up to and including 1.6 have a bug that can cause
      xcb_wait_for_event or xcb_wait_for_reply to hang if they run
      concurrently with xcb_writev or other writers. So you'll want that fix
      as well.



Reproduce steps:
----------------
1. xinit &
2. ./glthreads -n 6
Comment 1 fangxun 2011-01-05 23:26:11 UTC
This issue still happens with the following commits:
Libdrm:      (master)2.4.23-4-gbad5242a59aa8e31cf10749e2ac69b3c66ef7da0
Mesa:        (7.10)4e8f123f14e4a5bbd47c8cf7ec0c02d4ee6efd2d
Xserver:        (server-1.9-branch)xorg-server-1.9.3
Xf86_video_intel: (master)2.13.903-1-g22d7b61791c382088a6c0df5dce3a15405d6c495
Kernel: (master)3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
Comment 2 meng 2011-03-07 03:55:22 UTC
This issue still happens with the following commits:
-------------------------------------------------------------------
Libdrm:		(master)2.4.24-6-g3b04c73650b5e9bbcb602fdb8cea0b16ad82d0c0
Mesa:		(master)6538b5824e298eaebede2d9686c7607c44ab446
Kernel:	(drm-intel-fixes) 91355834646328e7edc6bd25176ae44bcd7386c7
Comment 3 Jamey Sharp 2011-03-14 17:39:22 UTC
*** Bug 32261 has been marked as a duplicate of this bug. ***
Comment 4 Jamey Sharp 2011-03-18 23:17:12 UTC
Created attachment 44609 [details] [review]
Wake up _XReadEvents when _XReply might need a turn.

This patch should fix this bug. I've tested this patch using `glthreads -n 6` (which was an excellent test case for this bug, thanks!) as well as `ico -threads 16` and various single-threaded clients. I don't think it introduces any regressions and I think it fully fixes this bug.

The bad news is that it depends on new libxcb API, which means we need a new libxcb release before this patch can go in, and libxcb master currently has reported regressions. So I don't know when that will happen.

In the meantime, if you could test against libxcb master (commit 2415c11dec5e5adb0c17f98aa52fbb371a4f8f23) and libX11-1.4.2 or newer plus this patch, and report whether it solves the problem for you, I'd sure appreciate it.
Comment 5 Jamey Sharp 2011-03-19 16:29:03 UTC
Created attachment 44625 [details] [review]
Wake up _XReadEvents when _XReply might need a turn.

The same patch, but made with git format-patch instead of git show. Not sure what I was thinking...
Comment 6 Jamey Sharp 2011-03-22 08:50:10 UTC
Created attachment 44715 [details]
Test program that spins instead of sleeping in XNextEvent with the above patch

I have to retract the above patch. Here's a correct (if pointless) single-threaded Xlib app that should block waiting for an event, but with the patch it instead spins, using 100% CPU. Uli had posted a multi-threaded test case that worked on unpatched Xlib, but fails like this program does with this patch too.

I need help getting this right.
Comment 7 Sven Gothel 2011-06-28 00:12:17 UTC
IMHO this is a duplicate of bug 20708.
Comment 8 Jeremy Huddleston 2011-10-03 20:53:02 UTC
*** Bug 20708 has been marked as a duplicate of this bug. ***
Comment 9 Guang Yang 2011-11-04 02:06:24 UTC
This issue still happens with the following commits:
Libdrm:         (master)2.4.27-1-g961bf9b5c2866ccb4fedf2b45b29fb688519d0db
Mesa:           (7.11)b95767a57ad499a2ed7431e8b0b52966c6dc0a45
Kernel:         (master)c3b92c8787367a8bb53d57d9789b558f1295cc96
Comment 10 ye.tian 2013-02-22 03:27:32 UTC
The issue still exists with the following commits:
---------------------------------
Kernel_version:     3.8
Libdrm:             2.4.42
Mesa:               (9.1)9.1-rc2
Xserver:            (server-1.13-branch)xorg-server-1.13.2
Xf86_video_intel:   (master)2.21.0
Cairo:              (master)1.12.12
Libva:              staging-20130205
Libva_intel_driver: staging-20130205
Comment 11 cancan,feng 2013-03-11 05:57:17 UTC
The problem still exists on the driver.

environment
--------------------
Libdrm:  (master)libdrm-2.4.42
Mesa:    (9.1)mesa-9.1(git-17493b8)
Xserver: (server-1.13-branch)xorg-server-1.13.2.902
Xf86_video_intel:(master)2.21.3
Cairo:   (master)1.12.14
Libva:   (master)libva-1.1.0
Libva_intel_driver: (master)00f65b78e6de520a4820702207ce098c6b073724
Kernel:  3.8
Comment 12 Christian König 2013-05-08 10:51:12 UTC
I'm also running into this problem now and spent the last day analyzing it before stumbling over this bug report.

Basically it's a classic deadlock situation where different locking objects are acquired in different order depending on the code path.

I'm going to try to fix it, but can't promise anything.
Comment 13 shui yangwei 2013-06-14 05:45:48 UTC
This issue also exists in the environment below:
-----------------------------------
Kernel: 3.9.5
Mesa: 9.1.3 (almost the same as RC1)
Libdrm: 2.4.45 (even older than RC1)
Xf86-video-intel: 2.21.9
Libva: master (to be 1.2)
Libva-intel-driver: master (to be 1.2)
Cairo: 1.12.14
Xserver: 1.14
Comment 14 Mike Blumenkrantz 2015-05-16 13:21:39 UTC
Still a problem.

Mesa: 10.5.4
Intel driver: 2.99.917
Xorg: 1.17
xcb: 1.11