Bug 29170 - [regression] Far Cry (in Wine) hangs on level load
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: GLX
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
 
Reported: 2010-07-20 06:04 UTC by Sven Arvidsson
Modified: 2010-07-26 07:39 UTC

Attachments
backtrace from hang (2.22 KB, text/plain)
2010-07-20 06:04 UTC, Sven Arvidsson
Make global GLX mutex recursive. (927 bytes, patch)
2010-07-20 10:22 UTC, Nick Bowler
Test case. (2.57 KB, text/plain)
2010-07-20 17:35 UTC, Nick Bowler

Description Sven Arvidsson 2010-07-20 06:04:33 UTC
Created attachment 37237
backtrace from hang

The game Far Cry (running in Wine) hangs when a level loads. Bisecting (and confirming by reverting) leads to this commit:

f8d81c31cee30821da3aab331a57f484f6a07a5d is the first bad commit
commit f8d81c31cee30821da3aab331a57f484f6a07a5d
Author: Nick Bowler <nbowler@draconx.ca>
Date:   Wed Jul 14 12:01:49 2010 -0400

    dri2: Track event mask in client code.

    When direct rendering is being used, DRI2 BufferSwapComplete events are
    sent unconditionally to clients, even if they haven't been requested.
    This causes error messages to be printed by every freeglut application
    of the form

      freeglut (./gears): Unknown X event type: 104

    and might confuse other clients.

    This is a fixed up version of the patch by Jesse Barnes, which drops
    BufferSwapComplete events if they are not requested by clients.

    Fixes fdo bug 27962.

    Signed-off-by: Nick Bowler <nbowler@draconx.ca>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 43b22479000d40e4034e467746bda73544d1ef4f a7c81f4433d420249b67bee1a16bc047a45141a0 M    src

System environment:
-- system architecture: 32-bit
-- Linux distribution: Debian unstable
-- GPU: RV570
-- Model: Asus EAX1950Pro 256MB
-- Display connector: DVI
-- xf86-video-ati: cdeb1949c820242f05a8897d3ddd0718f204dacf
-- xserver: 1.8.99.904 (1.9.0 RC 4)
-- mesa: c1cbdbfde0a1f016f9d3f23a39cb7bc0b9825e12
-- drm: 6ea2bda5f5ec8f27359760ce580fdad3df0464df
-- kernel: 2.6.35-rc5
Comment 1 Nick Bowler 2010-07-20 10:22:49 UTC
Created attachment 37248
Make global GLX mutex recursive.

Sigh.  Here's a horrible, horrible hack that should get the game working until
someone who understands mesa locking comes up with a better idea.

Looks like if a DRI2 context is destroyed while a swap event is pending (this
seems to happen easily if there are two contexts in one application; maybe it
only happens in this case?), we get a deadlock.  The garbage collector is
called while holding the global GLX mutex; it calls XSync, which tries to
handle the pending swap event, which in turn tries to take the same mutex -->
boom.

Looking at the code, it doesn't seem like this is entirely a new problem,
either: invalidate events also take the global mutex in DRI2WireToEvent and I
imagine they can cause a similar deadlock.

There's nothing radeon-specific about this, every DRI2 driver should be
affected.
Comment 2 Sven Arvidsson 2010-07-20 11:08:04 UTC
(In reply to comment #1)
> 
> Sigh.  Here's a horrible, horrible hack that should get the game working until
> someone who understands mesa locking comes up with a better idea.

The patch works fine here, thanks for taking the time to fix this!
 
> There's nothing radeon-specific about this, every DRI2 driver should be
> affected.

Thanks for confirming. Where should it be reassigned to, Mesa core?
Comment 3 Nick Bowler 2010-07-20 17:35:33 UTC
Created attachment 37255
Test case.

A test case which reliably reproduces this issue for me.  Since there's
a race involved, the amount of time required to lock up varies.

It works by creating two contexts for subwindows of a larger parent window,
then repeatedly drawing on each and destroying/recreating one of the contexts
every iteration.
Comment 4 Nick Bowler 2010-07-26 07:33:18 UTC
Can you try again with latest git?  I think this might be fixed now.
Comment 5 Sven Arvidsson 2010-07-26 07:39:52 UTC
Yes, seems to work fine now, thanks!

