Summary: | Xvfb with the composite extension gets segmentation fault
---|---
Product: | xorg
Component: | Server/DDX/Xvfb
Status: | RESOLVED WORKSFORME
Severity: | normal
Priority: | high
Version: | 6.8.2
Hardware: | x86 (IA32)
OS: | Linux (All)
Reporter: | Irek Szczesniak <ijs>
Assignee: | Xorg Project Team <xorg-team>
CC: | ajax
Attachments: | a proposed fix (attachment 2820)
**Description** (Irek Szczesniak, 2005-05-27 10:40:28 UTC)
I got this additional information on GNU/Linux 2.6.11-1.27_FC3.

I get this bug when I compile the whole X.org tree with `make World`. However, Xvfb works without the bug if I compile the tree with `make CDEBUGFLAGS=-g World`. To make sure that the debug symbols themselves do not influence the bug, I ran `strip Xvfb`. The stripped Xvfb (previously compiled with debug symbols) works without the bug, so it seems like a workaround for this problem, and I believe it can be my last resort.

For both versions of Xvfb (the clean one and the dirty one) I ran:

```
strace -o output.txt ./Xvfb -ac :1 +extension Composite
```

Then I diffed the outputs and got:

```
3c3
< brk(0) = 0x84c2000
---
> brk(0) = 0x83a1000
44,45c44,45
< brk(0) = 0x84c2000
< brk(0x84e3000) = 0x84e3000
---
> brk(0) = 0x83a1000
> brk(0x83c2000) = 0x83c2000
49c49
< getpgrp() = 5194
---
> getpgrp() = 5226
51,52c51,52
< getpid() = 5195
< write(0, "      5195\n", 11) = 11
---
> getpid() = 5227
> write(0, "      5227\n", 11) = 11
57c57
< rt_sigaction(SIGALRM, {0x808fb8e, [ALRM], SA_RESTORER, 0x5d28c8}, NULL, 8) = 0
---
> rt_sigaction(SIGALRM, {0x807ba44, [ALRM], SA_RESTORER, 0x5d28c8}, NULL, 8) = 0
101,103c101,103
< rt_sigaction(SIGHUP, {0x808ddd9, [HUP], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
< rt_sigaction(SIGINT, {0x808de05, [INT], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
< rt_sigaction(SIGTERM, {0x808de05, [TERM], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
---
> rt_sigaction(SIGHUP, {0x807a4d4, [HUP], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
> rt_sigaction(SIGINT, {0x807a4f4, [INT], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
> rt_sigaction(SIGTERM, {0x807a4f4, [TERM], SA_RESTORER, 0x5d28c8}, {SIG_DFL}, 8) = 0
107,108c107,108
< getppid() = 5194
< gettimeofday({1117465649, 345025}, NULL) = 0
---
> getppid() = 5226
> gettimeofday({1117465837, 960885}, NULL) = 0
111c111
< gettimeofday({1117465649, 345963}, NULL) = 0
---
> gettimeofday({1117465837, 961411}, NULL) = 0
114c114
< time(NULL) = 1117465649
---
> time(NULL) = 1117465837
191c191
< brk(0x8505000) = 0x8505000
---
> brk(0x83e4000) = 0x83e4000
204,205c204,205
< brk(0x8526000) = 0x8526000
< gettimeofday({1117465649, 771449}, NULL) = 0
---
> brk(0x8405000) = 0x8405000
> gettimeofday({1117465838, 98478}, NULL) = 0
212c212
< brk(0x8547000) = 0x8547000
---
> brk(0x8426000) = 0x8426000
226c226
< brk(0x856a000) = 0x856a000
---
> brk(0x8449000) = 0x8449000
268c268
< brk(0x858b000) = 0x858b000
---
> brk(0x846a000) = 0x846a000
284c284
< brk(0x85ad000) = 0x85ad000
---
> brk(0x848c000) = 0x848c000
310c310
< brk(0x85db000) = 0x85db000
---
> brk(0x84ba000) = 0x84ba000
322,324c322,324
< gettimeofday({1117465649, 826551}, NULL) = 0
< gettimeofday({1117465649, 826713}, NULL) = 0
< gettimeofday({1117465649, 826883}, NULL) = 0
---
> gettimeofday({1117465838, 129578}, NULL) = 0
> gettimeofday({1117465838, 129615}, NULL) = 0
> gettimeofday({1117465838, 129668}, NULL) = 0
328,333c328,329
< close(0) = 0
< close(1) = 0
< unlink("/tmp/.X11-unix/X1") = 0
< unlink("/tmp/.X1-lock") = 0
< munmap(0xb7ea9000, 1314816) = 0
< exit_group(0) = ?
---
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
```

We can see that both versions call the same system functions in the same order, and I believe the differing arguments to those calls are fine, so the bug does not appear to be related to system calls. I was wondering whether the "-g" option passed via CDEBUGFLAGS influences the source files, i.e. whether there are some "#define" statements that depend on "-g". Therefore I had to compare the output of the preprocessor.
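A minimal sketch of such a per-file preprocessor comparison, assuming gcc; `$XCFLAGS` is only a placeholder for the include paths and defines the real imake build would pass, and `colormap.c` is just an example file, not the reporter's exact procedure:

```sh
# Preprocess the same source with the optimized and the debug flag sets,
# then compare the results.  $XCFLAGS stands in for the real build's
# -I/-D options (placeholder, not an actual build variable).
gcc -E $XCFLAGS -O2 colormap.c -o colormap.opt.irek
gcc -E $XCFLAGS -g  colormap.c -o colormap.dbg.irek
diff -u colormap.opt.irek colormap.dbg.irek
```

Much of the noise in such diffs tends to come from preprocessor linemarkers; gcc's `-P` option suppresses them, which can shrink the output considerably.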
I modified the build system so that whenever a file is compiled, the preprocessor is run on it first and the output is written to a file with some extension (I chose .irek, after my name). Then I built X.org with and without the "-g" option. As a result I got a lot of *.irek files, which I then diffed. The problem was that the differences were huge and produced a huge file; the diff on xc/programs/Xserver/composite/ alone gave me a file 80 KB large. When the differences are that large, you don't know what to look for. Bottom line: this method is not useful for debugging this problem.

More information for GNU/Linux: I extracted X.org into two separate directories. I built one with "make World" and the other with "make CDEBUGFLAGS=-g World". After the builds were complete I looked at the differences between the text files in the two directories and found only some unimportant ones. I was hoping to see differences in the Makefiles, but found none.

More information for GNU/Linux: the -O2 flag during the compilation of X.org causes Xvfb to crash. If you compile X.org with "make CDEBUGFLAGS=-g World", the "-g" flag is used. If you compile with "make World", the default flags for compiling sources are "-O2 -fno-strength-reduce -fno-strict-aliasing", among other options. I played with these three options and noticed that if I remove "-O2", the compiled Xvfb doesn't crash. A workaround that works for me on GNU/Linux is putting a host.def file into xc/config/cf with these contents:

```
# define DefaultGcc2i386Opt -fno-strength-reduce GccAliasingArgs
```

This file overrides this setting:

```
# define DefaultGcc2i386Opt -O2 -fno-strength-reduce GccAliasingArgs
```

That is, it removes the optimization flag.

Using the -g flag or dropping the -O2 flag is not a solution; they merely influence the problem. When I change the code in some place, I get the crash even though I compiled Xvfb with -g or without -O2. My guess is that these flags arrange data/code in memory differently, so that even when we reference some wrong memory (which is what the segmentation fault is about), we are lucky enough to hit memory that still belongs to us. Otherwise (i.e. when I modify the code, or play with the -g and -O2 flags), we reference memory owned by others and Xvfb gets SIGSEGV.

At least I got an Xvfb with debug symbols that crashes, so I get the exact line of code that causes the SIGSEGV, at xc/programs/Xserver/dix/colormap.c:450:

```c
if (pent->fShared)
{
    if (--pent->co.shco.red->refcnt == 0)   /* IT'S HERE */
        xfree(pent->co.shco.red);
    if (--pent->co.shco.green->refcnt == 0)
        xfree(pent->co.shco.green);
    if (--pent->co.shco.blue->refcnt == 0)
        xfree(pent->co.shco.blue);
}
```

Created attachment 2820 [details] [review]: a proposed fix

This is the comment for the proposed fix above. I found one way of fixing the bug. I am not sure that this fix is right and that it won't cause other problems, but when I compile and run Xvfb, it doesn't crash. THE FIX: I added this line at xc/programs/Xserver/dix/colormap.c:456:

```c
pent->fShared = FALSE;
```

I am attaching the patch for this file. I came up with this fix when I saw nearly the same block of code at xc/programs/Xserver/dix/colormap.c:735; the two blocks differ only by that line. I added the line, and it works.

The bug fix I submitted is wrong. Unfortunately I don't have a bug fix. But when I request 16-bit color depth, the problem is gone, because the colormaps are handled differently and Xvfb doesn't crash. For me that's good enough.

I can no longer reproduce this with CVS head.
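A minimal sketch of confirming the crash site in a debugger, assuming a -g build of Xvfb that still reproduces the crash, as the reporter describes; the display number is illustrative:

```sh
# Run the debug-built server under gdb and grab a backtrace when it faults.
$ gdb ./Xvfb
(gdb) run -ac :1 +extension Composite
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
```

The innermost frame of the backtrace should point at dix/colormap.c around line 450, the line the reporter identified.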
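For reference, a sketch of the 16-bit-depth workaround mentioned above; the screen geometry and display number are illustrative, and the relevant part is the trailing x16 depth in the -screen argument:

```sh
# Ask Xvfb for a 16-bit-deep default screen instead of the default depth;
# per the reporter, the colormap path that crashes is then avoided.
Xvfb :1 -screen 0 1024x768x16 -ac +extension Composite
```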