Bug 6533

Summary: gcc 4.1.x triggers xorg-server bug or is being miscompiled by it
Product: xorg
Reporter: Matthias Dahl <ua_bugzilla_freedesktop>
Component: Server/DDX/Xorg
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED DUPLICATE
Severity: major
Priority: high
CC: erik.andren
Version: 7.0.0
Hardware: Other
OS: Linux (All)
Attachments:
Description                                              Flags
xorg.log (with nvidia proprietary driver)                none
xorg.conf                                                none
strace of Xorg server (with nv driver)                   none
strace of Xorg server (with proprietary nvidia driver)   none

Description Matthias Dahl 2006-04-08 17:48:49 UTC
A few weeks ago I switched over to gcc 4.1.0 from 3.4.5. I recompiled my entire 
system to get rid of unnecessary dependencies and stuff. From that point on, 
xorg-server started to make trouble.

If xorg-server is compiled with anything above -O0, every switch to a VT after 
the first one results in a black screen. I can still enter commands but I 
don't see anything. I can even switch back to Xorg without any trouble. If I 
end Xorg by stopping it from the black VT, the VT is restored, although some 
parts of it are garbled. This is with the proprietary nvidia driver, by the 
way.

The Xorg native nv driver shows a different behavior. It starts fine and one 
switch to the VT works too, but switching back to Xorg results in a hang, even 
though the magic SysRq keys still work. At that point the screen has not 
finished drawing: most parts of it are there, but some widgets are missing 
and so on.

All of the problems disappear if Xorg is being compiled with -O0 or with any 
optimization flag in combination with gcc 3.4.5.

I narrowed it down far enough to see that just replacing the /usr/bin/Xorg 
binary with one compiled at -O0 fixes this too.

For what it's worth: the vesa driver always works just fine. And those problems 
only occur with gcc 4.1.x.

I tried to find the culprit in all of this and I spent a lot of time on it, but 
I was still unable to track it down for good. I think gcc triggers some kind of 
timing problem or something like it. But I am not that familiar with the Xorg 
server sources, so I don't know where to look. I would really appreciate any 
help I can get.

So, my system is Athlon64 3500+ (Winchester core), Gentoo 64bit, Xorg server 
1.0.2, Geforce 6600 GT (PCIe, TDH Extreme) and kernel 2.6.16. Tried up to date 
gcc and kernel snapshots but no luck. I recompiled my system several times with 
different CFLAGS to rule out that one... no luck either.

A more detailed description of this can be found in my gentoo bug report over 
here: https://bugs.gentoo.org/show_bug.cgi?id=127608.

I am more than willing to help locate and fix this bug... I just need help with 
it. :-)

Thanks for anything in advance.
Comment 1 Erik Andren 2006-04-09 20:34:34 UTC
Please post your xorg.conf and your xorg.log
A backtrace from when the problem occurs would also be handy.
Comment 2 Matthias Dahl 2006-04-10 06:48:13 UTC
Created attachment 5242 [details]
xorg.log (with nvidia proprietary driver)
Comment 3 Matthias Dahl 2006-04-10 06:50:00 UTC
Created attachment 5243 [details]
xorg.conf
Comment 4 Matthias Dahl 2006-04-10 06:51:33 UTC
Created attachment 5244 [details]
strace of Xorg server (with nv driver)

strace of Xorg server (with nv driver). Hangs after switch from VT back to X.
Killed by magic sysrq key. Console never gets restored. Reboot required.
Comment 5 Matthias Dahl 2006-04-10 06:53:27 UTC
Created attachment 5245 [details]
strace of Xorg server (with proprietary nvidia driver)

Second switch to VT results in a black screen even though commands can still be
(blindly) entered there. Switching back to X always works. After X has been
terminated from a VT, the console usually gets restored even though some VTs
are in a garbled state.
Comment 6 Matthias Dahl 2006-04-10 06:58:34 UTC
As you can see, I attached more detailed information. Unfortunately I wasn't 
able to get anything useful out of gdb yet, so I hope the strace logs are 
sufficient and help. If there is anything else you need or anything I could 
test, please let me know.

If it helps, I could also provide you with the whole compiled xorg-server 
package (unstripped, with gdb debug info) or just the Xorg binary.
Comment 7 Matthias Dahl 2006-04-13 02:34:42 UTC
Ok, I am now debugging the Xorg binary via a remote connection with gdb. Due 
to the fact that I don't know much of the Xorg code and that I don't have any 
idea what I am looking for, this is moving along very (!) slowly. Any hint 
would be greatly appreciated.

Nevertheless I figured that when one switches from X to the console, at some 
point xf86VTSwitch() from xf86Events.c is being invoked. And in there, the call 
to xf86VTSwitchAway() is pretty interesting. Right after that little function 
has been executed (to be more precise, after the ioctl() call in there), the VT 
contents have been restored. This works one time. Every later switch goes 
through the same process, but after xf86VTSwitchAway() the console is still 
black. (The resolution change happens prior to that call, by the way.)

I have come across something strange too with the O1/O2 compiled binary: 
according to gdb, right after the ioctl() call in xf86VTSwitchAway(), the line 
prior to it (xf86Info.vtRequestsPending = FALSE;) is executed again (as if 
something tampered with the IP) and it runs through the whole function. At the 
end, it doesn't return to its caller but resumes from one line prior to the 
ioctl() call, runs through again and then returns to its caller without ever 
reaching a return line. At least that's what gdb shows me. I disassembled that 
part of the code and it doesn't look right to me. I haven't touched any 
assembler in ages, so I will post it here and maybe someone can have a look at 
it.

xf86VTSwitchAway() from O2 compiled Xorg binary:

0x00000000004a3580 <xf86VTSwitchAway+0>:        sub    $0x8,%rsp
0x00000000004a3584 <xf86VTSwitchAway+4>:        mov    2121285(%rip),%rax        # 0x6a93d0 <_DYNAMIC+4576>
0x00000000004a358b <xf86VTSwitchAway+11>:       mov    $0x1,%edx
0x00000000004a3590 <xf86VTSwitchAway+16>:       mov    $0x5605,%esi
0x00000000004a3595 <xf86VTSwitchAway+21>:       mov    0x18(%rax),%edi
0x00000000004a3598 <xf86VTSwitchAway+24>:       movl   $0x0,0x94(%rax)
0x00000000004a35a2 <xf86VTSwitchAway+34>:       xor    %eax,%eax
0x00000000004a35a4 <xf86VTSwitchAway+36>:       callq  0x430c08 <ioctl@plt>
0x00000000004a35a9 <xf86VTSwitchAway+41>:       shr    $0x1f,%eax
0x00000000004a35ac <xf86VTSwitchAway+44>:       add    $0x8,%rsp
0x00000000004a35b0 <xf86VTSwitchAway+48>:       xor    $0x1,%eax
0x00000000004a35b3 <xf86VTSwitchAway+51>:       retq

Note the missing tests of the if condition above.

xf86VTSwitchAway() from O0 compiled Xorg binary:

0x00000000004e1928 <xf86VTSwitchAway+0>:        push   %rbp
0x00000000004e1929 <xf86VTSwitchAway+1>:        mov    %rsp,%rbp
0x00000000004e192c <xf86VTSwitchAway+4>:        sub    $0x10,%rsp
0x00000000004e1930 <xf86VTSwitchAway+8>:        mov    2595145(%rip),%rax        # 0x75b280 <_DYNAMIC+4576>
0x00000000004e1937 <xf86VTSwitchAway+15>:       movl   $0x0,0x94(%rax)
0x00000000004e1941 <xf86VTSwitchAway+25>:       mov    2595128(%rip),%rax        # 0x75b280 <_DYNAMIC+4576>
0x00000000004e1948 <xf86VTSwitchAway+32>:       mov    0x18(%rax),%edi
0x00000000004e194b <xf86VTSwitchAway+35>:       mov    $0x1,%edx
0x00000000004e1950 <xf86VTSwitchAway+40>:       mov    $0x5605,%esi
0x00000000004e1955 <xf86VTSwitchAway+45>:       mov    $0x0,%eax
0x00000000004e195a <xf86VTSwitchAway+50>:       callq  0x430cf8 <ioctl@plt>
0x00000000004e195f <xf86VTSwitchAway+55>:       test   %eax,%eax
0x00000000004e1961 <xf86VTSwitchAway+57>:       jns    0x4e196c <xf86VTSwitchAway+68>
0x00000000004e1963 <xf86VTSwitchAway+59>:       movl   $0x0,0xfffffffffffffffc(%rbp)
0x00000000004e196a <xf86VTSwitchAway+66>:       jmp    0x4e1973 <xf86VTSwitchAway+75>
0x00000000004e196c <xf86VTSwitchAway+68>:       movl   $0x1,0xfffffffffffffffc(%rbp)
0x00000000004e1973 <xf86VTSwitchAway+75>:       mov    0xfffffffffffffffc(%rbp),%eax
0x00000000004e1976 <xf86VTSwitchAway+78>:       leaveq
0x00000000004e1977 <xf86VTSwitchAway+79>:       retq
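
A note for readers comparing the two listings: the -O2 version has no test/jns pair, but as far as the return value goes it appears to compute the same thing without a branch. "shr $0x1f,%eax" leaves only the sign bit of the ioctl() result, and "xor $0x1,%eax" inverts it, giving 0 for a negative result and 1 otherwise, which is what the -O0 branches store. The request code 0x5605 in both listings is VT_RELDISP in linux/vt.h. The C below is only a sketch of that equivalence, not the server source; the function names are made up for illustration and it assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>

/* Shape of the -O0 listing: test %eax,%eax / jns, then store 0 or 1. */
static int result_branchy(int ioctl_ret)
{
    if (ioctl_ret < 0)
        return 0;   /* FALSE */
    return 1;       /* TRUE */
}

/* Shape of the -O2 listing: shr $0x1f,%eax / xor $0x1,%eax.
 * Assumes int is 32 bits wide, as on the reporter's x86-64 system. */
static int result_branchless(int ioctl_ret)
{
    unsigned int sign = (unsigned int)ioctl_ret >> 31; /* 1 if negative */
    return (int)(sign ^ 1u);                           /* invert it */
}

/* Compare both forms on a few representative ioctl() return values. */
static void check_equivalence(void)
{
    const int samples[] = { INT_MIN, -1, 0, 1, INT_MAX };
    unsigned int i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(result_branchy(samples[i]) == result_branchless(samples[i]));
}
```

Calling check_equivalence() from a test program should pass without tripping the assert. Separately, the line jumps seen in gdb are common when stepping through optimized code, because the compiler may reorder instructions belonging to different source lines, so on their own they may not indicate a miscompilation of this function.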

That's basically what I have so far... maybe I am totally on the wrong track 
here, but at least it's something.

Any ideas and/or suggestions?
Comment 8 Matthias Dahl 2006-04-14 09:27:55 UTC
Ok, I installed gcc 4.0.3 and a 4.2 snapshot today: everything works fine with 
4.0.3 (like with 3.4.5) but starting with 4.1, the problem occurs. So this 
really looks like a gcc bug to me, even though it seems hard to prove.

I filed a bug over at the gcc bugzilla, let's see where this leads...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27152
Comment 9 Matthias Dahl 2006-04-15 03:21:59 UTC
Just to keep this up to date: I managed to find which part of the Xorg server 
is causing all the trouble: readKernelMapping() from 
hw/xfree86/os-support/linux/lnx_KbdMap.c (and probably the arrays defined 
there). If just this file is compiled with -O0 and the rest with -O2, 
everything works just fine.

More details can be found at the gcc bug report. (link above)
Comment 10 Michel Dänzer 2006-04-15 23:18:20 UTC
Could this be a duplicate of bug 6472?
Comment 11 Matthias Dahl 2006-04-16 00:38:32 UTC
I spent weeks trying to prove that something was wrong... I am glad that my 
honor is now restored. :-)

I tried the suggested fix in bug 6472 and it fixes things. So yes, it's the 
real thing. :-)
Comment 12 Michel Dänzer 2006-04-16 01:15:34 UTC

*** This bug has been marked as a duplicate of 6472 ***
