Bug 27767

Summary: [bisected] bottom panel of gnome desktop is wrong
Product: xorg Reporter: fangxun <xunx.fang>
Component: Server/GeneralAssignee: Kristian Høgsberg <krh>
Status: VERIFIED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: medium CC: b.brachaczek, bugs, maximlevitsky, otaylor, suka.hiroaki
Version: git   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
screenshot of wrong bottom panel
none
Fix XID confusion in DRI2
none
fix the fix
none
Fix the resource counting in FreeResourceByType()
none
roll-up patch none

Description fangxun 2010-04-21 00:05:33 UTC
Created attachment 35194 [details]
screenshot of wrong bottom panel 

System Environment:
--------------------------
Platform:        G45
Mesa:           (7.8)db3b34219ef1dbf9ebf5e524d3e459f9ad9571b5
Xserver:          (master)a92b2c2c8dd1e86ee852168146f01bdf72bfe2d0
Xf86_video_intel:  (master)9494f4e91f8c8c7a0f8d61c6883c9bfceb2cec46
Libdrm:         (master)c7650003c52ee29b7fa5ebf20dd134079f0b8488
Kernel: (master)01bf0b64579ead8a82e7cfc32ae44bc667e7ad0f

Bug detailed description:
-------------------------
Bottom panel is covered with top panel(can be seen in the attached screenshot). Though bottom panel can not be seen, it's function still 
works. This issue happens on all platforms. With bisect and find 1da1f33f2dd5b437dd56cd9f5d6782de4ad5a1bc is first bad commit.

commit 1da1f33f2dd5b437dd56cd9f5d6782de4ad5a1bc
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Fri Apr 16 05:55:34 2010 -0400

    DRI2: Track DRI2 drawables as resources, not privates

    The main motivation here is to have the resource system clean up the
    DRI2 drawable automatically so glx doesn't have to.  Right now, the
    glx drawable resource must be destroyed before the X drawable, so that
    calling DRI2DestroyDrawable doesn't crash.  By making the DRI2
    drawable a resource, GLX doesn't have to worry about that and the
    resource destruction order becomes irrelevant.

    https://bugs.freedesktop.org/show_bug.cgi?id=26394

    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
    Signed-off-by: Keith Packard <keithp@keithp.com>

Reproduce steps:
----------------
1.xinit&
2.gnome-session&
Comment 1 Kristian Høgsberg 2010-04-29 17:37:40 UTC
Created attachment 35338 [details] [review]
Fix XID confusion in DRI2

This only happens with a compositing manager, right?  Can you try applying the attached patch to the xserver and see if that fixes the problem?
Comment 2 Kristian Høgsberg 2010-04-30 15:33:50 UTC
Created attachment 35350 [details] [review]
fix the fix

Owen pointed out that my patch was broken and didn't actually do what it said it did.  This should work better.
Comment 3 Owen Taylor 2010-05-01 05:38:58 UTC
The more recent patch worked well in light testing, but more stressful testing (
starting and killing an application under a compositing manager in a tight loop) revealed X server crashes along the lines of:

Program received signal SIGSEGV, Segmentation fault.
FreeResource (id=21023285, skipDeleteFuncType=0) at resource.c:546
546		    if (res->id == id)
(gdb) bt
#0  FreeResource (id=21023285, skipDeleteFuncType=0) at resource.c:546
#1  0x00000000004287ca in ProcFreePixmap (client=0x1143920) at dispatch.c:1502
#2  0x000000000042c32c in Dispatch () at dispatch.c:439
#3  0x00000000004219ca in main (argc=<value optimized out>, argv=0x7fff79de5a78, 
    envp=<value optimized out>) at main.c:286
(gdb) frame 1
#1  0x00000000004287ca in ProcFreePixmap (client=0x1143920) at dispatch.c:1502
1502		FreeResource(stuff->id, RT_NONE);

The problem here, as I see it, is the code in DRI2DrawableGone() is calling FreeResourceByType(), but the resource management code isn't reentrant - if the drawable XID and the fake DRI drawable ID end up in the same hash bucket, then if DRIDrawableGone deletes a resource, the "prev" pointer can end up as garbage when DRIDrawableGone returns and FreeResource continues, causing crashes.

I'm not sure I understand the function of the fake DRI drawable ID - is the point to avoid having a manual hook on client disconnect and piggyback on the normal resource cleanup code? It seems otherwise like we could just use the client ID rather than the faked-up resource ID for the purposes of maintaining the mapping.
Comment 4 Kristian Høgsberg 2010-05-01 07:53:52 UTC
(In reply to comment #3)
> The more recent patch worked well in light testing, but more stressful testing
> (
> starting and killing an application under a compositing manager in a tight
> loop) revealed X server crashes along the lines of:
> 
> Program received signal SIGSEGV, Segmentation fault.
> FreeResource (id=21023285, skipDeleteFuncType=0) at resource.c:546
> 546            if (res->id == id)
> (gdb) bt
> #0  FreeResource (id=21023285, skipDeleteFuncType=0) at resource.c:546
> #1  0x00000000004287ca in ProcFreePixmap (client=0x1143920) at dispatch.c:1502
> #2  0x000000000042c32c in Dispatch () at dispatch.c:439
> #3  0x00000000004219ca in main (argc=<value optimized out>,
> argv=0x7fff79de5a78, 
>     envp=<value optimized out>) at main.c:286
> (gdb) frame 1
> #1  0x00000000004287ca in ProcFreePixmap (client=0x1143920) at dispatch.c:1502
> 1502        FreeResource(stuff->id, RT_NONE);
> 
> The problem here, as I see it, is the code in DRI2DrawableGone() is calling
> FreeResourceByType(), but the resource management code isn't reentrant - if the
> drawable XID and the fake DRI drawable ID end up in the same hash bucket, then
> if DRIDrawableGone deletes a resource, the "prev" pointer can end up as garbage
> when DRIDrawableGone returns and FreeResource continues, causing crashes.

AFAIK, it's supposed to be reentrant.  If you look at FreeResource() in dix/resource.c, you can see that it compares the value of clientTable[cid].elements before and after the callback.  If something is added or deleted in the callback, the element count changes and FreeResources() restarts the search by setting prev = head.

However, I just looked at FreeResourceByType, to make sure it adjusted the element count when it deleted the resources... and it doesnt!  It's a little hard to believe that this kind of bug could have stayed around for so long in the server, but I guess the only consequence (aside from this kind of bug) is that the resource hash tables grow a little bigger than they need to.

> I'm not sure I understand the function of the fake DRI drawable ID - is the
> point to avoid having a manual hook on client disconnect and piggyback on the
> normal resource cleanup code? It seems otherwise like we could just use the
> client ID rather than the faked-up resource ID for the purposes of maintaining
> the mapping.

Yeah, it's just piggybacking on the resource cleanup that happens at client disconnect.  Pretty standard technique in the server.  We could subscribe to the ClientStateCallback instead and delete all the refs from a client when it disconnects, but it would be more work and the resource code was designed to handle this kind of case.
Comment 5 Kristian Høgsberg 2010-05-01 07:55:08 UTC
Created attachment 35363 [details] [review]
Fix the resource counting in FreeResourceByType()

This fixes the resource counting when deleting by type.  I still can't belive this bug was in there...
Comment 6 Kristian Høgsberg 2010-05-01 09:14:36 UTC
Created attachment 35364 [details] [review]
roll-up patch

Another update with the following changes:

  1) Rolls up the FreeResourceByType() fix
  2) Undoes the FreeResourceByType() change in glxcmds.c:DoDestroyDrawable()
  3) Plugs a leak of 'ref' in DRI2DrawableGone()
Comment 7 maximlevitsky 2010-05-01 13:57:09 UTC
I applied the latest patch, and it seems to bring compiz back to life.

However, I noticed that I can crash the X in following way:

start gnome-screensaver-properties
switch between few screensavers.

X dies in DRI2WaitSwap, 

'if ((pPriv->swapsPending)'


I have nvidia geforce 8400M + nouveau + git tips of everything
(libdrm, mesa, xserver, nouveau ddx, ...)

With xserver-1.8 branch everything works just fine, including absence of above crash.
Comment 8 maximlevitsky 2010-05-03 17:43:57 UTC
Maybe it was a coincidence, because now after few updates, everything works just fine
Comment 9 Gordon Jin 2010-05-04 19:49:58 UTC
(In reply to comment #6)
> Created an attachment (id=35364) [details]
> roll-up patch
> 
> Another update with the following changes:
> 
>   1) Rolls up the FreeResourceByType() fix
>   2) Undoes the FreeResourceByType() change in glxcmds.c:DoDestroyDrawable()
>   3) Plugs a leak of 'ref' in DRI2DrawableGone()

Yes, this patch fixes on my side (on behalf of Xun - the original reporter, who is on vacation this week).
Comment 10 Kristian Høgsberg 2010-05-11 09:05:15 UTC
The patch series just go pulled into xserver master, bug should be fixed.  Please verify and close if it works.
Comment 11 maximlevitsky 2010-05-11 15:34:09 UTC
Works just fine!
Comment 12 fangxun 2010-05-13 03:19:54 UTC
It works well. Verified.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.