Bug 7447

Summary: X crashes when 32-bit windows are resized on a 16-bit desktop
Product: xorg Reporter: Aaron Plattner <aplattner>
Component: Server/GeneralAssignee: Keith Packard <keithp>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: high CC: dberkholz, jcristau, njustn
Version: 7.1 (2006.05)   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 8888    
Attachments:
Description Flags
Proposed fix
none
Use Composite when depths don't match
none
testcase for crasher none

Description Aaron Plattner 2006-07-06 14:21:00 UTC
RHEL4u3's crappy version of Metacity creates a 32-bit parent window for every
top-level window.  Start the X server at depth 16 and run xcompmgr -c, metacity,
and xterm.  Resize the xterm.  Eventually, the server will crash with a
backtrace that looks like this:

(gdb) r
Starting program: /home/aaron/X/7.1-root/bin/X -logfile /var/log/Xorg.0.log
-depth 16 -config xorg.conf.nv

X Window System Version 7.0.0
Release Date: 21 December 2005
X Protocol Version 11, Revision 0, Release 7.0
Build Operating System: Linux 2.6.16-gentoo-r7 i686 
Current Operating System: Linux tenor 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53
EST 2006 i686
Build Date: 05 July 2006
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(++) Log file: "/var/log/Xorg.0.log", Time: Thu Jul  6 14:09:04 2006
(++) Using config file: "/etc/X11/xorg.conf.nv"
[tcsetpgrp failed in terminal_inferior: Operation not permitted]
(EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)

Program received signal SIGABRT, Aborted.
0x0054c7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0054c7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x005907f5 in raise () from /lib/tls/libc.so.6
#2  0x00592199 in abort () from /lib/tls/libc.so.6
#3  0x005c44ea in __libc_message () from /lib/tls/libc.so.6
#4  0x005cc8c2 in _int_realloc () from /lib/tls/libc.so.6
#5  0x005cd786 in realloc () from /lib/tls/libc.so.6
#6  0x081e7679 in Xrealloc (ptr=0x98b5980, amount=48) at utils.c:1457
#7  0x0813f18f in miRegionValidate (badreg=0xbfe666b0, pOverlap=0xbfe6667c)
    at miregion.c:1516
#8  0x0814446f in miValidateTree (pParent=0x987c4d0, pChild=0x9876918, kind=VTOther)
    at mivaltree.c:703
#9  0x0814b01e in miSlideAndSizeWindow (pWin=0x9876918, x=215, y=329, w=214, h=290, 
    pSib=0x0) at miwindow.c:764
#10 0x081db368 in compResizeWindow (pWin=0x9876918, x=5, y=21, w=214, h=290,
pSib=0x0)
    at compwindow.c:397
#11 0x080761fd in ConfigureWindow (pWin=0x9876918, mask=12, vlist=0x986a380, 
    client=0x986b560) at window.c:2479
#12 0x08084b3c in ProcConfigureWindow (client=0x986b560) at dispatch.c:761
#13 0x0808409c in Dispatch () at dispatch.c:459
#14 0x0806e326 in main (argc=7, argv=0xbfe66e34, envp=0xbfe66e54) at main.c:447

Tested with both the nv and nvidia drivers on X.org 6.8.2, 7.1, and git commit
704e645207d88a2d0a372cf69f6abd778ed4c30b.
Comment 1 Daniel Stone 2007-02-27 01:32:50 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 2 Adam Jackson 2007-04-08 13:38:09 UTC
Move to 7.3 tracker.
Comment 3 Aaron Plattner 2007-08-16 18:00:44 UTC
The issue appears to be caused by compNewPixmap.  I was under the impression that this function copied the contents of the old pixmap into the new one.  However, what it actually does is copy the contents of the *parent window* into the new pixmap.  When the parent window is depth 32 and the child pixmap is depth 16, this overruns the pixmap by a factor of 2.
Comment 4 Aaron Plattner 2007-08-16 19:49:36 UTC
Created attachment 11157 [details] [review]
Proposed fix

I was wrong: It ends up in fb24_32CopyMtoN so it overruns the destination pixmap by a factor of 3/2 rather than 2.
Comment 5 Daniel Stone 2007-08-16 21:54:46 UTC
Anyone got any objections to an embargo until August 29th? Might as well do it at the same time.
Comment 6 Daniel Stone 2007-08-16 22:09:48 UTC
And here I was thinking I was getting more competent with Bugzilla and security.
Comment 7 Aaron Plattner 2007-08-17 01:22:54 UTC
Comment on attachment 11157 [details] [review]
Proposed fix

That patch is totally busted.  I don't know what I was thinking.  I'll fix it tomorrow.
Comment 8 Eric Anholt 2007-08-17 10:49:48 UTC
Basically, the copyarea just has to be replaced with a composite, right?
Comment 9 Aaron Plattner 2007-08-17 10:51:59 UTC
That would work, though it would be slower.
Comment 10 Keith Packard 2007-08-31 22:01:42 UTC
Created attachment 11368 [details] [review]
Use Composite when depths don't match

We'll use Composite when the depths don't match. This patch seems obvious enough. I haven't tested it myself though.
Comment 11 Aaron Plattner 2007-09-01 19:10:56 UTC
That patch works for me.
Comment 12 Daniel Stone 2007-09-02 01:53:23 UTC
well, the patch is now in git, so i guess we should drop any pretense of secrecy.
Comment 13 Daniel Stone 2007-09-02 01:53:52 UTC
does anyone feel like backporting the fix?
Comment 14 Daniel Stone 2007-09-03 02:58:28 UTC
i'm having what i believe to be the same problem, caught internally (we use a 16-bit server):
STEPS TO REPRODUCE

launch xcompmgr &
run attached testcase and provide it '-f' as parameter.

------- Comment #5 From Stone Daniel (Nokia) 2007-08-17 09:49 [reply] -------

Strange, this doesn't happen in Xephyr.

------- Comment #6 From Palli Tapani (Nokia) 2007-08-17 10:24 [reply] -------

(In reply to comment #5)
> Strange, this doesn't happen in Xephyr.
> 

Yep, does not happen in Xephyr for me aswell :/ ... please try on target.

------- Comment #8 From Tamminen Eero (Nokia) 2007-08-17 11:22 [reply] -------

(In reply to comment #6)
> > Strange, this doesn't happen in Xephyr.
> 
> Yep, does not happen in Xephyr for me aswell :/

If you Valgrind Xephyr, do you get any errors?
(the error could be there, but causing crash only on arm)

------- Comment #9 From Stone Daniel (Nokia) 2007-08-17 13:43 [reply] -------

This one's not going to be much fun.  We die inside ValidateGC (specifically,
the composite wrapper), despite the ShmPutImage arguments being exactly the
same.  When setup comes, we do everything right:
Breakpoint 2, cwSetWindowPixmap (pWindow=0x2be9a0, pPixmap=0x2a7ef8)
    at ../../../miext/cw/cw.c:610
610     in ../../../miext/cw/cw.c
(gdb) print pWindow->drawable.id
$4 = 4194316 [note: this is 0x40000c]

But by the time we come down into ShmPutImage, we've lost out badly:
shm put image: format 2, depth 16, drawable 40000c, gc 40000d, shmseg 40000e, 
offset (nil), width 800, height 480, srcx 0, srcy 0, srcw 800, srch 480
Program received signal SIGSEGV, Segmentation fault.
0x001cc390 in cwGetBackingDrawable (pDrawable=0x2be9a0, x_off=0xbeb97f98,
    y_off=0xbeb97f94) at ../../../miext/cw/cw.c:96
96      in ../../../miext/cw/cw.c
(gdb) print pDrawable.id
$9 = 4194316
(gdb) print ((WindowPtr) pDrawable)->devPrivates[cwWindowIndex].ptr             
$28 = (pointer) 0x404ce008

(Note that x_off and y_off are pointers for return values, not offsets outside
the co-ordinate limits.)

 ------- Comment #11 From Stone Daniel (Nokia)  2007-09-03 12:56  [reply] -------

only this:
==17151== 
==17151== Conditional jump or move depends on uninitialised value(s)
==17151==    at 0x403D64F: pixman_region_translate (pixman-region.c:2017)
==17151==    by 0x80D556A: miTranslateRegion (miregion.c:1586)
==17151==    by 0x8388002: compSetRedirectBorderClip (compwindow.c:647)
==17151==    by 0x80D9BCC: miComputeClips (mivaltree.c:258)
==17151==    by 0x80DA67F: miValidateTree (mivaltree.c:770)
==17151==    by 0x80869E4: MapWindow (window.c:2841)
==17151==    by 0x8096975: ProcMapWindow (dispatch.c:740)
==17151==    by 0x80F8440: XaceCatchDispatchProc (xace.c:281)
==17151==    by 0x809712E: Dispatch (dispatch.c:502)
==17151==    by 0x807F2E4: main (main.c:452)

only happens with testcase -f, not testcase.  (note that the offending code is
in miTranslateRegion in our codebase, not pixman_translate_region.)
Comment 15 Daniel Stone 2007-09-03 03:00:17 UTC
Created attachment 11397 [details]
testcase for crasher

Running with no arguments and xcompmgr running on a 16bpp server is fine, but using -f causes the server to die.
Comment 16 Eric Anholt 2007-09-04 13:36:25 UTC
cherry-picked to server-1.4

daniels, I failed to reproduce your issue, with or without the cherry-pick, with a 16-bit XAA server.
Comment 17 Julien Cristau 2007-09-09 07:36:36 UTC
This has been assigned CVE-2007-4730, according to http://lists.debian.org/debian-security-announce/debian-security-announce-2007/msg00134.html
Comment 18 Tom S. 2008-04-22 05:24:26 UTC
I realize this bug is fixed, but I have to ask if there could be another way to close the security hole.  As it is now, what Aaron Plattner said "That would work, though it would be slower." is far beyond true.  It's to the point where it's impractical to use any composite aware terminal with compositing on.  

Is there any way to avoid using compiz for the case of differing depths, or possibly to set the default depth for all windows to be 32 when compiz is active even if the window is not using all 32 bits?

If need be I can provide more information, but the heart of it is that I remember a time when a composited terminal was faster than non, but now my newer system with much faster graphics (geforce 8800gt vs geforce fx5200) can't display a terminal emulator without lagging... doesn't seem to sit right somehow.

I'd be happy to work on developing a patch, but I'm admittedly new to X11 development, so I wanted to ask you, the experts, whether there's something which can be done before I sink time into it.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.