17358 – server stuck in damageDestroyPixmap

Summary: server stuck in damageDestroyPixmap

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Server/DDX/Xorg
Version:	git
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Xorg Project Team
QA Contact:	Xorg Project Team

Duplicates (2):	18906 19560
Depends on:
Blocks:	xserver-1.6.1

Reported:	2008-08-29 16:21 UTC by Mathieu Bérard
Modified:	2009-05-05 04:34 UTC
CC List:	4 users

Attachments
gdb backtrace (1.59 KB, text/plain) 2008-08-29 16:21 UTC, Mathieu Bérard
damage debugging patch (1.02 KB, patch) 2009-01-30 18:01 UTC, Eric Anholt
inlined damageDestroyPixmap function (1.25 KB, text/plain) 2009-02-13 09:07 UTC, Fabio Scaccabarozzi

Description Mathieu Bérard 2008-08-29 16:21:06 UTC

Created attachment 18583 [details]
gdb backtrace

From time to time (around once a week, maybe more) the server just stops
responding when a window is closed.
gdb from a remote session shows that the server is spinning forever in damageDestroyPixmap:

        while ((pDamage = *pPrev))
        {
            damageRemoveDamage (pPrev, pDamage);
            if (!pDamage->isWindow)
                DamageDestroy (pDamage);
        }

The graphics stack is a git master checkout from 26 August 2008,
using the radeon driver and compiz.

A gdb "bt full" backtrace is attached.

Comment 1 Adam Jackson 2008-09-12 12:12:25 UTC

Does this happen with the 1.5 server?

Comment 2 Mathieu Bérard 2008-09-12 12:28:46 UTC

(In reply to comment #1)
> Does this happen with the 1.5 server?
> 

nope, git master

Comment 3 Mathieu Bérard 2008-09-12 12:29:36 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > Does this happen with the 1.5 server?
> > 
> 
> nope, git master
> 

To be more correct: I haven't tested server 1.5;
the bug is observed on git master.

Comment 4 Laurent Pinchart 2008-11-05 03:22:08 UTC

X server 1.5.2 (2:1.5.2-2ubuntu3) suffers from the same problem. The gdb backtrace is identical to the one reported by Mathieu.

The graphic stack comes from a freshly installed Ubuntu 8.10 running KDE4. Graphic hardware is a

01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]

using the radeon driver.

Comment 5 Michel Dänzer 2008-11-06 03:35:42 UTC

Can you trace the control flow with gdb to see why it doesn't terminate?

Comment 6 Laurent Pinchart 2008-11-06 03:59:18 UTC

pDamage->pNext points to pDamage, leading to an infinite loop.

(gdb) print *pPrev
$3 = (DamagePtr) 0x81dc968
(gdb) print *pDamage
$4 = {pNext = 0x81dc968, pNextWin = 0x81dc96c, damage = {extents = {x1 = 0, y1 = 0, x2 = 0, y2 = 0}, data = 0x0}, damageLevel = DamageReportRawRegion, isInternal = 135749696, closure = 0x8175fb0, isWindow = 135749408, pDrawable = 0x8175ea0, damageReport = 0x8175e00 <damageChangeClip>, damageDestroy = 0x8175d80 <damageDestroyClip>, reportAfter = 135748848, pendingDamage = {extents = {x1 = 0, y1 = 0, x2 = -26848, y2 = 2071}, data = 0x81794c0}}
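
The self-link above (pNext equal to the node's own address) is the condition that stops a "walk until NULL" loop from ever terminating. As a minimal sketch of how such a link could be detected, the helper below runs Floyd's cycle check over a singly linked list. `Node` and `list_has_cycle` are hypothetical names used for illustration only; they are not part of the X server, and only the link field of the real structure is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the damage record; only the link matters here. */
typedef struct Node {
    struct Node *pNext;
} Node;

/*
 * Floyd's tortoise-and-hare check: returns true if following pNext from
 * head ever revisits a node. A node whose pNext points to itself is the
 * smallest possible cycle and is reported on the first step.
 */
static bool list_has_cycle(const Node *head)
{
    const Node *slow = head;
    const Node *fast = head;

    while (fast != NULL && fast->pNext != NULL) {
        slow = slow->pNext;          /* advance one node  */
        fast = fast->pNext->pNext;   /* advance two nodes */
        if (slow == fast)
            return true;
    }
    return false;                    /* reached NULL, so the list ends */
}
```

A check of this kind only reports that the list is cyclic; it says nothing about what overwrote the link.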

Comment 7 Michel Dänzer 2008-11-06 05:09:13 UTC

(In reply to comment #6)
> pDamage->pNext points to pDamage, leading to an infinite loop.

Hmm, does rebuilding miext/damage/damage.c with DAMAGE_VALIDATE_ENABLE defined to 1 give a clue as to how this comes to be?

Comment 8 Laurent Pinchart 2008-11-06 07:28:54 UTC

Unfortunately DAMAGE_VALIDATE_ENABLE doesn't help. No error message is logged, and the abort() calls are not reached. Should I try setting DAMAGE_DEBUG_ENABLE to 1, or will it generate far too many log messages (my X server usually runs for about an hour before it freezes)?

Comment 9 Michel Dänzer 2008-12-05 19:09:03 UTC

*** Bug 18906 has been marked as a duplicate of this bug. ***

Comment 10 Michel Dänzer 2009-01-14 09:44:35 UTC

*** Bug 19560 has been marked as a duplicate of this bug. ***

Comment 11 Eric Anholt 2009-01-30 18:01:17 UTC

Created attachment 22405 [details] [review]
damage debugging patch

Could anyone that can reproduce the bug try running with this patch?  I note that 3/3 reports so far are radeon, but I couldn't identify anything clearly bad in radeon.  Is anybody doing this rotating with randr?

Comment 12 Mathieu Bérard 2009-01-31 06:26:18 UTC

(In reply to comment #11)
> Created an attachment (id=22405) [details]
> damage debugging patch
> 
> Could anyone that can reproduce the bug try running with this patch?  I note
> that 3/3 reports so far are radeon, but I couldn't identify anything clearly
> bad in radeon.  Is anybody doing this rotating with randr?
> 

Sorry, but my radeon-powered laptop died a month ago, and my current system,
which uses nouveau, doesn't have this bug.
But for the record, I never used randr rotation.

Comment 13 Laurent Pinchart 2009-02-11 03:38:26 UTC

(In reply to comment #11)
> Created an attachment (id=22405) [details]
> damage debugging patch
> 
> Could anyone that can reproduce the bug try running with this patch?

Sorry for the late reply, I had to reproduce the issue.

The FatalError call isn't reached.

> I note that 3/3 reports so far are radeon, but I couldn't identify anything
> clearly bad in radeon.  Is anybody doing this rotating with randr?

I'm not.

Comment 14 Fabio Scaccabarozzi 2009-02-13 09:06:32 UTC

(In reply to comment #6)
> pDamage->pNext points to pDamage, leading to an infinite loop.
> 
> (gdb) print *pPrev
> $3 = (DamagePtr) 0x81dc968
> (gdb) print *pDamage
> $4 = {pNext = 0x81dc968, pNextWin = 0x81dc96c, damage = {extents = {x1 = 0, y1
> = 0, x2 = 0, y2 = 0}, data = 0x0}, damageLevel = DamageReportRawRegion,
> isInternal = 135749696, closure = 0x8175fb0, isWindow = 135749408, pDrawable =
> 0x8175ea0, damageReport = 0x8175e00 <damageChangeClip>, damageDestroy =
> 0x8175d80 <damageDestroyClip>, reportAfter = 135748848, pendingDamage =
> {extents = {x1 = 0, y1 = 0, x2 = -26848, y2 = 2071}, data = 0x81794c0}}
> 

Looking at the inlined code might help a little (see attachment).
The gdb output shows that *pPrev and pDamage match, as they should after statement (2). damageRemoveDamage then gets called, but the abort() call (enabled if DAMAGE_VALIDATE_ENABLE is defined) is never hit, because the assignment pDamage = *pPrev is executed *before* damageRemoveDamage is called; statement (4) is therefore *always true*, without any need to call damageRemoveDamage to check. Apart from this, damageDestroyPixmap would still work (avoiding the infinite loop caused by assignment (2)) if the only conditional, statement (5), were executed. Here that never happens: (5) is executed only if pDamage->isWindow == 0, while gdb reports pDamage->isWindow = 135749408 (a corrupted value, since isWindow is supposed to be a boolean).
Chances are that:
a) statement (1) gets executed and somehow isWindow gets corrupted when executing this statement on some particular configurations (most likely)
b) random corruption of values stored in pPixmap in statement (1) on occasional basis (leading to corrupted isWindow) due to "damage external software"/hardware faults (other xserver components, radeon driver, kernel, ram) (unlikely)
c) buggy code produced by the compiler (likely)
Personally, I've been running radeon, radeonhd, on xserver>1.5 for the last five months without ever hitting this bug.
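
To make the control flow easier to follow, here is a simplified model of the loop quoted in the description. It is a sketch under assumptions, not the code from damage.c: `Damage` is a trimmed-down stand-in for the real record, `remove_damage` is a hypothetical search-and-unlink helper written for this example, and `drain_list` adds an iteration cap (`max_iter`) that the real loop does not have, so that the stuck case can be run safely.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the damage record. */
typedef struct Damage {
    struct Damage *pNext;
    int isWindow;
} Damage;

/* Hypothetical unlink helper: find pDamage in the list and splice it out. */
static void remove_damage(Damage **pPrev, Damage *pDamage)
{
    while (*pPrev) {
        if (*pPrev == pDamage) {
            /* For a self-linked node this stores pDamage right back. */
            *pPrev = pDamage->pNext;
            return;
        }
        pPrev = &(*pPrev)->pNext;
    }
}

/*
 * Same shape as the loop in the description, but capped at max_iter
 * passes. Returns the number of passes executed.
 */
static unsigned drain_list(Damage **pPrev, unsigned max_iter)
{
    Damage *pDamage;
    unsigned passes = 0;

    while (passes < max_iter && (pDamage = *pPrev) != NULL) {
        remove_damage(pPrev, pDamage);
        /* The real loop calls DamageDestroy() here when !isWindow;
         * omitted because this sketch does not allocate its nodes. */
        passes++;
    }
    return passes;
}
```

With a well-formed two-node list the loop finishes after two passes and leaves the head NULL. With a single node whose pNext points to itself, every pass writes the same pointer back into the head, so the loop only stops when the cap is reached.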

Comment 15 Fabio Scaccabarozzi 2009-02-13 09:07:47 UTC

Created attachment 22903 [details]
inlined damageDestroyPixmap function

Shows execution flow through damageDestroyPixmap

Comment 16 Michel Dänzer 2009-02-15 09:14:23 UTC

(In reply to comment #14)
> a) statement (1) gets executed and somehow isWindow gets corrupted when
> executing this statement on some particular configurations (most likely)

Actually that seems quite unlikely to me, as getPixmapDamageRef just boils down to dixLookupPrivateAddr.

> b) random corruption of values stored in pPixmap in statement (1) on occasional
> basis (leading to corrupted isWindow) due to "damage external
> software"/hardware faults (other xserver components, radeon driver, kernel,
> ram) (unlikely)
> c) buggy code produced by the compiler (likely)

Looking at the printed *pDamage again, actually it doesn't look like a DamageRec at all but like the static GCFuncs damageGCFuncs from line 438 of damage.c. So it does look like some kind of memory corruption, but I'd tend to consider b) more likely than c). If someone could reproduce the problem with the X server running in valgrind or at least gdb with something like Electric Fence, that might give a hint.
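
One way to see why the printed record looks like a table of function pointers is to convert the three "integer" fields from comment 6 to hexadecimal and compare them with the two symbol addresses gdb resolved. The sketch below does only that arithmetic. The helper names (`near_address`, `looks_like_code_pointer`, `print_suspect_fields`) and the 0x1000-byte window are assumptions chosen for this example; the numeric values are copied from the gdb output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Addresses gdb resolved to symbols in comment 6. */
static const uint32_t damage_report_fn  = 0x8175e00u; /* <damageChangeClip>  */
static const uint32_t damage_destroy_fn = 0x8175d80u; /* <damageDestroyClip> */

/* Returns true if value lies within window bytes of anchor. */
static bool near_address(uint32_t value, uint32_t anchor, uint32_t window)
{
    uint32_t diff = value > anchor ? value - anchor : anchor - value;
    return diff <= window;
}

/* Heuristic only: is value close to either of the known function addresses? */
static bool looks_like_code_pointer(uint32_t value)
{
    const uint32_t window = 0x1000u; /* arbitrary window, an assumption */
    return near_address(value, damage_report_fn, window) ||
           near_address(value, damage_destroy_fn, window);
}

/* Prints the fields gdb showed in decimal, this time in hexadecimal. */
static void print_suspect_fields(void)
{
    const struct { const char *name; uint32_t value; } fields[] = {
        { "isInternal",  135749696u },
        { "isWindow",    135749408u },
        { "reportAfter", 135748848u },
    };

    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        printf("%-12s = 0x%08x%s\n", fields[i].name,
               (unsigned)fields[i].value,
               looks_like_code_pointer(fields[i].value)
                   ? "  (near known function addresses)" : "");
}
```

The three values come out as 0x8176040, 0x8175f20 and 0x8175cf0, all within a few hundred bytes of damageChangeClip and damageDestroyClip. That is consistent with the memory holding function pointers rather than booleans, although it does not show how the pointer ended up referring to it.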

> Personally, I've been running radeon, radeonhd, on xserver>1.5 for the last
> five months without ever hitting this bug.

Same here, but for even longer.

Comment 17 Michel Dänzer 2009-04-30 07:29:35 UTC

A fix for a use-after-free in the r300 driver just went into Mesa Git master and mesa_7_4_branch. It would be great if you could check whether it helps with this problem as well.

Comment 18 Laurent Pinchart 2009-05-05 01:50:11 UTC

I haven't been able to reproduce the problem for quite some time now. It must have been fixed by an Ubuntu upgrade (probably Xorg or KDE).

Comment 19 Alexander Hunziker 2009-05-05 02:22:14 UTC

I second Laurent: on Ubuntu Jaunty, I have never been able to reproduce the problem. It cannot be a change in KDE though, since I am a GNOME-only user.

Comment 20 Michel Dänzer 2009-05-05 04:34:44 UTC

Assuming it's been fixed, reopen if you can still reproduce with current bits.
