Bug 26050 - DRI2 and compositing locks up
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/DRI
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Jesse Barnes
QA Contact: Xorg Project Team
Reported: 2010-01-14 12:29 UTC by Simon Thum
Modified: 2010-02-05 14:35 UTC

Attachments
full log including backtrace (26.80 KB, text/plain)
2010-01-14 12:29 UTC, Simon Thum
fixup pending swap drawable destruction (1.87 KB, patch)
2010-01-15 08:50 UTC, Jesse Barnes

Description Simon Thum 2010-01-14 12:29:44 UTC
Created attachment 32645
full log including backtrace

Hi,

I'm on KMS + Radeon, kernel 2.6.32.2, and things worked just fine for the last few weeks. Today I rebuilt my stack (X, Mesa, ...; all Gentoo live packages), but I get strange behaviour when starting up KDE.

Without compositing in kwin, all is fine.

But with kwin's opengl compositing, I get this trace (and a complete lockup):

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x3b) [0x80b3e4b]
1: /usr/bin/X (0x8048000+0x60645) [0x80a8645]
2: (vdso) (__kernel_rt_sigreturn+0x0) [0xb780a40c]
3: /usr/bin/X (dixLookupPrivate+0x44) [0x809d774]
4: /usr/lib/xorg/modules/extensions/libdri2.so (0xb7398000+0x1394) [0xb7399394]
5: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2DestroyDrawable+0x3c) [0xb739941c]
6: /usr/lib/xorg/modules/extensions/libglx.so (0xb73ab000+0x41ae8) [0xb73ecae8]
7: /usr/lib/xorg/modules/extensions/libglx.so (0xb73ab000+0x365f0) [0xb73e15f0]
8: /usr/bin/X (FreeClientResources+0xee) [0x806e9fe]
9: /usr/bin/X (CloseDownClient+0x6f) [0x809470f]
10: /usr/bin/X (0x8048000+0x51578) [0x8099578]
11: /usr/bin/X (0x8048000+0x1ee5a) [0x8066e5a]
12: /lib/libc.so.6 (__libc_start_main+0xfa) [0xb74da74a]


This is only reproducible on startup; if I switch compositing on via the GUI instead, X somehow survives, though kwin does not. There is no trace in that case.

I don't know, maybe it is related to the recently added DRI2 feature:
(II) AIGLX: enabled GLX_INTEL_swap_event

(I mention it because the extension carries an Intel name and I'm on Radeon.)
Comment 1 Julien Cristau 2010-01-14 15:56:05 UTC
Reassigning to Jesse.
Comment 2 Jesse Barnes 2010-01-14 16:22:08 UTC
I have another report of this too, I'm trying to reproduce now.
Comment 3 Jesse Barnes 2010-01-14 17:38:07 UTC
Ok, I've reproduced it.  compiz/gnome works fine, but kwin fails and causes a server SEGV somehow.  Looks like a drawable that was already freed got freed again, causing the server crash.  I must have broken something in the DRI2/drawable lifetime rules, but we've had problems here before so maybe I just exposed a new one...

Kristian, any ideas?
Comment 4 Michel Dänzer 2010-01-15 07:57:17 UTC
One thing that looks fishy is the DRI2DrawableRec lifetime related to DRI2SwapComplete. The code added to DRI2DestroyDrawable() claims that the data structure will be used and freed in DRI2SwapComplete(), but I don't know how the latter could get to it given the dixSetPrivate() calls at the end of DRI2DestroyDrawable().

Conversely, DRI2SwapComplete() might free the data structure but not call dixSetPrivate(), so subsequent DRI2GetDrawable() calls would return a pointer to freed memory.
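
The hazard described here can be illustrated with a minimal, self-contained sketch. It is not the server code: SketchPriv, SketchDrawable, and the sketch_* functions are hypothetical stand-ins, with a plain struct field playing the role of the slot that dixLookupPrivate() and dixSetPrivate() manage. The only point it makes is the ordering rule: clear the slot before freeing, so that no later lookup can return a dead pointer and a repeated destroy does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-drawable DRI2 state. */
typedef struct {
    int swaps_pending;
} SketchPriv;

/* Hypothetical stand-in for a drawable with one private slot.
 * NULL in the slot means "no DRI2 state attached". */
typedef struct {
    SketchPriv *priv_slot;
} SketchDrawable;

/* Number of allocations not yet freed, so callers can check for leaks
 * and double frees without touching freed memory. */
static int sketch_live_privs = 0;

/* Lookup: returns whatever the slot holds, like a private lookup would. */
static SketchPriv *sketch_lookup(SketchDrawable *d)
{
    return d->priv_slot;
}

/* Allocate the state and publish it in the slot. Returns 1 on success. */
static int sketch_attach(SketchDrawable *d)
{
    SketchPriv *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return 0;
    d->priv_slot = p;
    sketch_live_privs++;
    return 1;
}

/* Clear the slot first, then free. A second call sees NULL and returns,
 * so the same state can never be freed twice through this path. */
static void sketch_detach(SketchDrawable *d)
{
    SketchPriv *p = d->priv_slot;
    if (p == NULL)
        return;
    d->priv_slot = NULL;
    free(p);
    sketch_live_privs--;
}
```

With this ordering, sketch_lookup() after sketch_detach() returns NULL rather than a stale pointer, which is the property both failure modes above lack.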
Comment 5 Jesse Barnes 2010-01-15 08:50:57 UTC
Created attachment 32661
fixup pending swap drawable destruction

Oh good catch Michel, that does look bogus.  This patch prevents the crash for me, and seems more correct.
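
As a rough illustration of the general idea of deferring destruction while a swap is pending (not the contents of the attached patch, which is not reproduced here), the sketch below keeps the state reachable until the last outstanding swap completes. All names (DeferPriv, DeferDrawable, defer_*) are hypothetical, and it simplifies by assuming the drawable struct itself outlives its DRI2 state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-drawable state with a pending-swap count. */
typedef struct {
    int swaps_pending;      /* swaps scheduled but not yet completed */
    int destroy_requested;  /* destroy arrived while swaps were pending */
} DeferPriv;

/* Hypothetical drawable; assumed to outlive its state in this sketch. */
typedef struct {
    DeferPriv *priv_slot;
} DeferDrawable;

static int defer_live_privs = 0;  /* allocations not yet freed */

static int defer_attach(DeferDrawable *d)
{
    DeferPriv *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return 0;
    d->priv_slot = p;
    defer_live_privs++;
    return 1;
}

/* The single place that frees: unpublish the pointer, then free it. */
static void defer_release(DeferDrawable *d)
{
    DeferPriv *p = d->priv_slot;
    d->priv_slot = NULL;
    free(p);
    defer_live_privs--;
}

static void defer_schedule_swap(DeferDrawable *d)
{
    if (d->priv_slot != NULL)
        d->priv_slot->swaps_pending++;
}

/* Destroy request: free now if idle, otherwise only mark it. */
static void defer_destroy(DeferDrawable *d)
{
    DeferPriv *p = d->priv_slot;
    if (p == NULL)
        return;
    if (p->swaps_pending > 0) {
        p->destroy_requested = 1;
        return;
    }
    defer_release(d);
}

/* Swap completion: the last completion performs a deferred destroy. */
static void defer_swap_complete(DeferDrawable *d)
{
    DeferPriv *p = d->priv_slot;
    if (p == NULL || p->swaps_pending == 0)
        return;
    p->swaps_pending--;
    if (p->swaps_pending == 0 && p->destroy_requested)
        defer_release(d);
}
```

Because the slot stays populated until defer_release() runs, the completion handler can still find the state, and because defer_release() is the only function that frees, the state is freed exactly once whichever of the two events arrives last.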
Comment 6 Simon Thum 2010-01-16 01:20:05 UTC
Thanks for looking into this!

However there are two things worrying me:

1) I was unable to use KMS's proposed goodness, namely console switching (which works fine in general). I also saw no oops. Instead, my LCD went off after a few seconds. Doesn't this point at additional kernel issues?

2) is GLX_INTEL_swap_event intentionally cross-vendor?
Comment 7 Jesse Barnes 2010-01-18 18:46:19 UTC
(In reply to comment #6)
> Thanks for looking into this!
> 
> However there are two things worrying me:
> 
> 1) I was unable to use KMS's proposed goodness, namely console switching (which
> works fine in general). I also saw no oops. Instead, my LCD went off after a
> few seconds. Doesn't this point at additional kernel issues?

I'm not sure what you mean; it went off when trying to use the new kernel bits with page flipping enabled?  Or you can't get KMS going in general?

> 2) is GLX_INTEL_swap_event intentionally cross-vendor?

Yeah, it's vendor-neutral for the most part.  I think pretty much any driver could implement it.
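
Since the extension is not tied to one vendor, a client would normally test for it by name rather than by driver. GLX extension strings are space-separated lists of names, so the check reduces to a whole-token match. The helper below is a hypothetical sketch (ext_list_has is not a GLX or Xlib function) that works on any such string, however the caller obtained it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` occurs in `list` as a complete space-separated
 * token, 0 otherwise. A bare strstr() is not enough, because it would
 * also match a longer name that merely starts with `name`. */
static int ext_list_has(const char *list, const char *name)
{
    const char *p;
    size_t len;

    if (list == NULL || name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    p = list;
    while ((p = strstr(p, name)) != NULL) {
        /* Token must begin at the start of the list or after a space. */
        int starts = (p == list) || (p[-1] == ' ');
        /* Token must end at the end of the list or before a space. */
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
        p++;  /* keep scanning after a partial match */
    }
    return 0;
}
```

For example, ext_list_has(extensions, "GLX_INTEL_swap_event") answers whether swap events are advertised, regardless of which driver is in use.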
Comment 8 Simon Thum 2010-01-19 01:18:50 UTC
> > 1) I was unable to use KMS's proposed goodness, namely console switching (which
> > works fine in general). I also saw no oops. Instead, my LCD went off after a
> > few seconds. Doesn't this point at additional kernel issues?
> 
> I'm not sure what you mean; it went off when trying to use the new kernel bits
> with page flipping enabled?  Or you can't get KMS going in general?

No, it was and is working fine. But that bug caused a complete lockup: I saw no oops (being able to see them from X was, IIRC, one of the reasons for KMS), I couldn't switch VTs any more, and even sysrq didn't help. I'm not exactly sure about that last point any more [too lazy to re-crash it], but in general, since KMS works, I expected either to see an oops or to still be able to switch VTs.

Instead, the machine was pretty much dead. Since this seems to be merely an X server issue so far, I concluded there must be more bugs lurking below.

> 
> > 2) is GLX_INTEL_swap_event intentionally cross-vendor?
> 
> Yeah, it's vendor-neutral for the most part.  I think pretty much any driver
> could implement it.
Thanks!
Comment 9 Jesse Barnes 2010-01-19 20:55:41 UTC
On Tue, 19 Jan 2010 01:18:50 -0800 (PST)
bugzilla-daemon@freedesktop.org wrote:

> http://bugs.freedesktop.org/show_bug.cgi?id=26050
> --- Comment #8 from Simon Thum <simon.thum@gmx.de>  2010-01-19
> 01:18:50 PST ---
> > > 1) I was unable to use KMS's proposed goodness, namely console
> > > switching (which works fine in general). I also saw no oops.
> > > Instead, my LCD went off after a few seconds. Doesn't this point
> > > at additional kernel issues?
> > 
> > I'm not sure what you mean; it went off when trying to use the new
> > kernel bits with page flipping enabled?  Or you can't get KMS going
> > in general?
> 
> No, it was or is working fine. But that bug caused a complete lockup,
> I saw no oops (which IIRC was one of the reasons for kms to see them
> from X), I couldn't switch vt's any more, even sysrq didn't help. I'm
> not exactly sure any more about the last point  [too lazy to re-crash
> it], but in general since kms works I expected to see either an oops
> or I ought to still be able to switch vt's.
> 
> Instead, the machine was pretty much dead. And so far, since it seems
> to be merely a X server issue, I concluded there must be more bugs
> lurking from below.

Ah ok, that lockup may actually be a kernel issue.  I posted a page
flipping related fix to intel-gfx that should help (fixed the hang for
me anyway).
Comment 10 Simon Thum 2010-01-20 02:31:56 UTC
(In reply to comment #9)
> Ah ok, that lockup may actually be a kernel issue.  I posted a page
> flipping related fix to intel-gfx that should help (fixed the hang for
> me anyway).
Is this again vendor-independent or should radeon ppl be notified somehow?
Comment 11 Jesse Barnes 2010-01-25 09:23:12 UTC
AFAIK the radeon people are aware of it already; they mainly need to implement support for the new DRI2 hooks in order to get support for the extension (it's mostly server side).

Btw, just sent this fix out to xorg-devel for Keith to apply.
Comment 12 Jesse Barnes 2010-02-05 14:35:54 UTC
commit 711e26466ae04ae93ff4c48d377d83d68a6320e9
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Jan 25 09:21:51 2010 -0800

    DRI2: handle drawable destruction properly at DRI2SwapComplete time

