Bug 14430 - [g33] desktop environment breakage after VT switch
Summary: [g33] desktop environment breakage after VT switch
Status: RESOLVED WORKSFORME
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.3 (2007.09)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Jesse Barnes
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 13493 15000
Reported: 2008-02-08 11:55 UTC by Alan W. Irwin
Modified: 2008-08-20 12:39 UTC (History)

See Also:
i915 platform:
i915 features:


Attachments
X log file (103.94 KB, text/plain), 2008-02-08 11:57 UTC, Alan W. Irwin
xorg.conf (2.53 KB, text/plain), 2008-02-08 11:58 UTC, Alan W. Irwin
Xorg.0.log with exa method (104.24 KB, text/plain), 2008-02-11 10:58 UTC, Alan W. Irwin
xorg.conf with default exa (2.53 KB, text/plain), 2008-02-11 10:59 UTC, Alan W. Irwin
.xsession-errors for X session that froze (1.93 KB, text/plain), 2008-03-27 17:50 UTC, Alan W. Irwin
file resulting from startx >& startx.out2 on X session that froze (2.12 KB, text/plain), 2008-03-27 17:52 UTC, Alan W. Irwin
xorg.conf used for X session that froze (used EXA for this test) (2.53 KB, text/plain), 2008-03-27 17:54 UTC, Alan W. Irwin
log file taken for working X session after freeze. (Sorry, I forgot to preserve the one for the freeze). (94.74 KB, text/plain), 2008-03-27 17:56 UTC, Alan W. Irwin

Description Alan W. Irwin 2008-02-08 11:55:44 UTC
For the latest Intel driver (packaged for Debian unstable as xserver-xorg-video-intel version 2:2.2.0.90-3) startx seems to work well and give me a reliable KDE 2D desktop with 3D games such as foobillard working well also.  Switching to the console (with ctrl-alt-F1) works well.  However, any attempt to return from console back to X/KDE (with alt-F7) yields an unreliable desktop.  New tasks cannot be launched from the kicker GUI at the bottom of the KDE desktop, and the screen soon starts to be corrupted (as if an eraser were following the cursor).  Attempts to do a normal exit from KDE hang, but I can exit from X with brute force using ctrl-alt-BS with lots of killing of hung KDE tasks required afterward.

There have been a number of other bug reports about problems switching to/from the console for slightly older versions of xf86-video-intel, but I was asked by one of the intel driver developers to start a new bug report since they were under the impression all these problems had been solved by the latest version of the intel driver. 

System environment:

Chipset: g33 (ASUS P5K-V MB)
System architecture: x86_64
Debian unstable package versions:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.2.0.90-3
xserver: packaged as xserver-xorg version 1:7.3+10
mesa: a number of different mesa-related packages with version 7.0.2-4
drm: packaged as libdrm2 version 2.3.0-4
kernel version: 2.6.23-1-amd64
Linux distribution: Combination of Debian unstable (for X and kernel) and Debian testing
Machine or mobo model: ASUS P5K-V

Reproduce steps: cycle to console and back with ctrl-alt-F1 and (from console) alt-F7.  Problem occurs 100 per cent of the time.
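
The reproduction steps above can also be scripted. The following is a minimal sketch and is not taken from this report: it assumes the text console is on VT 1 and X is on VT 7 (matching ctrl-alt-F1 and alt-F7), that "chvt" from the kbd package is installed, and that the script runs as root. The function name vt_cycle and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: switch from the X VT to a text console and back, mirroring the
# manual ctrl-alt-F1 / alt-F7 sequence.
# Assumptions: console on VT 1, X on VT 7, chvt available, run as root.
# With DRY_RUN=1 the commands are only printed, nothing is switched.

vt_cycle() {
    console_vt=${1:-1}   # text console VT (assumed default)
    x_vt=${2:-7}         # VT running X (assumed default)
    pause=${3:-2}        # seconds to stay on each VT

    for vt in "$console_vt" "$x_vt"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "chvt $vt"
        else
            chvt "$vt" || return 1
            sleep "$pause"
        fi
    done
}
```

Running "DRY_RUN=1 vt_cycle" prints the two chvt commands without touching the display; "vt_cycle 1 7 5" would perform one real round trip with a five second pause on each VT.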
Comment 1 Alan W. Irwin 2008-02-08 11:57:17 UTC
Created attachment 14221 [details]
X log file
Comment 2 Alan W. Irwin 2008-02-08 11:58:09 UTC
Created attachment 14222 [details]
xorg.conf
Comment 3 Wang Zhenyu 2008-02-11 07:54:16 UTC
What about using the default EXA acceleration method?
Comment 4 Alan W. Irwin 2008-02-11 10:57:39 UTC
I just tried the default exa method (new log file and xorg.conf to be attached), and for my desktop needs it seems to be okay so far.  Thus, I will stick with it unless and until I run into any trouble with it.  However, the original reported issue with switching to the console and back still persists with exactly the same symptoms.  So I suspect when you fix it for exa, the problem will also go away for xaa. 
Comment 5 Alan W. Irwin 2008-02-11 10:58:58 UTC
Created attachment 14269 [details]
Xorg.0.log with exa method
Comment 6 Alan W. Irwin 2008-02-11 10:59:53 UTC
Created attachment 14270 [details]
xorg.conf with default exa
Comment 7 Alan W. Irwin 2008-02-11 14:27:54 UTC
EXA locked up after an hour of light desktop use.  Details at http://bugs.freedesktop.org/show_bug.cgi?id=14464 which I am keeping separate because it is likely a separate issue.
Comment 8 Alan W. Irwin 2008-02-20 16:42:36 UTC
I have made this a blocker bug for 13493 (release of the next version of Intel driver) because this bug completely disrupts the use of the Linux console with X for the case where the remaining kernel and X components are either at the latest released version or close to it (e.g., the mix of software versions available on Debian unstable as given above).  I have heard in a different forum that the Intel driver team have so far been unable to replicate this bug for their G33 equipment, but the bug can be quickly and easily reproduced every time on my system.  Thus, the question arises whether the Intel driver team are using a kernel and X software version mix similar to that for Debian unstable for their testing or some more exotic git bleeding-edge versions of same.
Comment 9 Jesse Barnes 2008-02-20 17:07:45 UTC
It seems strange that this problem could be caused by the driver... do older versions not exhibit this behavior?  If so, maybe you can bisect things down to the offending commit?
Comment 10 Alan W. Irwin 2008-02-21 02:17:55 UTC
Normally, I switch to the console at least once per startx (to background that task and to logout before switching back to X), and that worked without problems for the Debian testing version of X from 2007-11-01.  From an old Debian bug report concerning how to set modelines for modern X that I kept from then, the X-related package versions were the following:

xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.1.1-4 (unstable is currently 2:2.2.0.90-3)

xserver: packaged as xserver-xorg-core version 2:1.4-3 (unstable is currently 2:1.4.1~git20080131-1, improved version information I should have previously supplied above with my initial bug report)

drm: packaged as libdrm2 version 2.3.0-4 (unstable has the same version as testing)

kernel version: 2.6.22-2-amd64 (my recent failing X tests for this bug report have been done with kernel version 2.6.23-1-amd64)

So there is quite a bit of change between the older working X and the current one that exhibits this bug, and it is difficult to pin down which component change is the culprit.  

An even more interesting question, though, is why does this bug show so obviously for me, but not for anybody yet from the Intel driver team?  I predict that if you try the versions mentioned in my original report, i.e., the Debian unstable version of X you will immediately see this bug.
Comment 11 Jesse Barnes 2008-02-21 08:57:27 UTC
Yeah, it may be worth setting up something similar so we can see it.  Gordon, is this something you guys can do?

Now that I think about it a little, we've had some similar issues in the past.  They involved xrandr clients connecting to the server while it was VT switched away.  It's possible something similar is going on here, resulting in memory corruption rather than a crash like we saw in previous bugs.

Zhenyu, does that ring a bell at all?
Comment 12 Jesse Barnes 2008-02-21 09:08:42 UTC
One way to narrow it down would be start a minimal X environment (say just an xterm), do your switch away then switch back, then do 'startkde' from there.  If that works fine, we'll know it's one of the KDE desktop apps starting up that's causing problems.
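
A minimal sketch of such a stripped-down session, under stated assumptions: startx (via xinit) accepts a client program as its first argument when the path begins with a slash or a period, so a tiny executable script that only runs xterm can stand in for the normal session. The function name write_minimal_session and the file name used below are hypothetical.

```shell
#!/bin/sh
# Sketch: create an executable client script that starts a single xterm
# and nothing else, for use as the client argument to startx.

write_minimal_session() {
    script=$1
    # Two-line script: shebang, then replace the shell with xterm.
    printf '%s\n' '#!/bin/sh' 'exec xterm' > "$script"
    chmod +x "$script"
}
```

Possible usage: run write_minimal_session "$HOME/xterm-only.sh", then startx "$HOME/xterm-only.sh", switch to the console and back, and only then type startkde in the xterm.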
Comment 13 Alan W. Irwin 2008-02-21 13:55:38 UTC
I can no longer reproduce this bug.  This is surprising since no kernel, drm, mesa, or X changes have been made on this system since my original report where the bug occurred with 100 per cent reliability (several tries with both XAA and EXA and with a simplified KDE desktop [typically two xterms, and that was it]).

I absolutely don't know what to think now.  The system has been rebooted at least once since the bug report, and that may have made some difference.  BTW, it was probably a warm reboot with "shutdown -r now" rather than a cold boot done with "shutdown -h now".  I have heard the chipset hardware is initialized differently in those two cases.  Anyhow, I will keep close track of this for a week or so, and if the problem has disappeared completely I will close the bug report at that point.
Comment 14 Alan W. Irwin 2008-02-25 10:09:27 UTC
The bad symptoms have now shown up again so I will keep this bug open. In this case there was no problem switching back and forth from the console for several days, but for the last switch I tried this morning, the reported symptoms immediately showed up again.  There have been no system upgrades or reboots or even restarts of X during this testing time.

In sum, this bug shows immediately after some but not all attempts to switch from the console back to X.  In some cases X may have to be run a couple days before switching to console and back triggers the bug, but in other cases the problem occurs right after a startx, ctrl-alt-F1, alt-F7 sequence.
Comment 15 Jesse Barnes 2008-03-18 18:33:20 UTC
Eric fixed some SDVO programming problems recently.  It would be weird if they caused this failure, but it's worth checking the git tree (use the 2.3 branch if you don't want to be exposed to some of the more experimental code).
Comment 16 Jesse Barnes 2008-03-25 15:10:28 UTC
What versions of everything are you using now?  Does your KDE session log show any errors?  Anything on the VT you started X from that might give us a clue?
Comment 17 Alan W. Irwin 2008-03-25 16:45:18 UTC
When Debian unstable changed from xserver-xorg-video-intel version 2:2.2.0.90-3 to the current 2:2.2.1-1, I verified the reported bug continued.  I plan to test again when your latest version of the Intel driver hits Debian unstable since I don't have the knowledge to build git versions.  

To answer your other question, in my previous tests of the bug I noticed nothing special in .xsession-errors or the Linux console output.  I will look harder in the next test, though that may be a while (see above).
Comment 18 Jesse Barnes 2008-03-26 10:30:43 UTC
Ok, thanks.  I do remember seeing similar problems when running development X servers, but they were intermittent and I never nailed them down.  So this really seems like it's probably an X server issue as opposed to a driver issue.  But either way, we'll have to get more info in order to track down the problem.  If the KDE stuff is failing badly, there must be a log of what's happening somewhere, if not in .xsession-errors, then in some other kdm or gdm log file possibly?
Comment 19 Jesse Barnes 2008-03-26 10:33:16 UTC
Updating summary.
Comment 20 Alan W. Irwin 2008-03-26 10:47:37 UTC
I use the startx method so kdm and/or xdm log files are not relevant.  I will include .xsession-errors and captured startx output (as well as the usual xorg.conf and log files) the next time I have a problem to report.

Note 2.2.99.901 is now available from Debian experimental so I should be starting this and other tests shortly, but it sometimes takes a couple days of my normal light production desktop use to see the errors so the tests are probably going to take a while before I have something to report.
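
One way to capture the startx output mentioned above is a small wrapper that sends both stdout and stderr to a timestamped file, so that a later session does not overwrite the evidence. This is only a sketch; the function name run_logged and the log naming scheme are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: run a command with stdout and stderr redirected to
# <prefix>.<timestamp>.out, print the log file name, and return the
# command's exit status.

run_logged() {
    prefix=$1
    shift
    log="$prefix.$(date +%Y%m%d-%H%M%S).out"
    "$@" > "$log" 2>&1
    status=$?
    echo "$log"
    return "$status"
}
```

For example, run_logged "$HOME/startx" startx would write something like startx.20080327-175205.out in the home directory (the timestamp shown is only illustrative).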
Comment 21 Alan W. Irwin 2008-03-27 17:47:44 UTC
I confirm this bug still occurs for 2.2.99.901.  I was "lucky" on first try after I installed the Debian experimental version of the Intel driver on top of Debian unstable X.  To confirm the bug I first verified on the console with "ps auxww" that no old X executables were still running from anything before (I always do this now before every startx to make sure I get a clean start).  Then I ran startx which brought up KDE with two xterms going from my previous logout from KDE, but no other applications were running.  Then I immediately hit ctrl-alt-F1 to get the first Linux console, then alt-F7 to get back into X which froze immediately (no mouse click would work) verifying the bug still exists.  I could only exit X by brute-force (ctrl-alt-BS).  I then immediately went through the same steps again, but this time there was no X freeze (and in fact it is still working) so the bug has an intermittent effect.
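
The "ps auxww" check before each startx could be automated along the following lines. This is a hedged sketch: the list of process names (Xorg, X, xinit, startkde, kdeinit, artsd) is an assumption about what counts as a leftover on this kind of setup, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report leftover X/KDE processes before a fresh startx.
# filter_stale reads "pid command" lines on stdin and prints the ones
# whose command name matches the assumed list; it exits 0 if any matched
# and 1 otherwise.

filter_stale() {
    awk '$2 ~ /^(Xorg|X|xinit|startkde|kdeinit|artsd)$/ { print; found = 1 }
         END { exit !found }'
}

# stale_x_procs feeds the live process table (pid and command name only)
# into the filter.
stale_x_procs() {
    ps -eo pid=,comm= | filter_stale
}
```

A possible use is "if stale_x_procs; then echo 'clean up first'; else startx; fi", which refuses to start a new session while old processes are still around.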

System environment (note most Debian X-related software has been considerably updated since the initial report):

Chipset: g33 (ASUS P5K-V MB)
System architecture: x86_64
Debian unstable package versions:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.2.99.901-1 (from Debian experimental)
xserver: packaged as xserver-xorg version 1:7.3+10 (X.Org X Server 1.4.0.90)
mesa: a number of different mesa-related packages with version 7.0.3~rc2-1
drm: packaged as libdrm2 version 2.3.0-4
kernel version: 2.6.24-1-amd64
Linux distribution: Combination of Debian experimental (for Intel driver) Debian unstable (for X and kernel) and Debian testing
Machine or mobo model: ASUS P5K-V

More information to follow as attachments.
Comment 22 Alan W. Irwin 2008-03-27 17:50:56 UTC
Created attachment 15524 [details]
.xsession-errors for X session that froze
Comment 23 Alan W. Irwin 2008-03-27 17:52:05 UTC
Created attachment 15525 [details]
file resulting from startx >& startx.out2 on X session that froze
Comment 24 Alan W. Irwin 2008-03-27 17:54:43 UTC
Created attachment 15526 [details]
xorg.conf used for X session that froze (used EXA for this test)
Comment 25 Alan W. Irwin 2008-03-27 17:56:16 UTC
Created attachment 15527 [details]
log file taken for working X session after freeze.  (Sorry, I forgot to preserve the one for the freeze).
Comment 26 Jesse Barnes 2008-03-27 19:33:50 UTC
So according to your .xsession-errors file, it looks like some KDE stuff is failing:
X Error: BadWindow (invalid Window parameter) 3
  Major opcode:  20
  Minor opcode:  0
  Resource id:  0x0
kded: Fatal IO error: client killed
ksmserver: Fatal IO error: client killed
ICE default IO error handler doing an exit(), pid = 25054, errno = 0
ICE default IO error handler doing an exit(), pid = 25073, errno = 0
ICE default IO error handler doing an exit(), pid = 25069, errno = 0
ICE default IO error handler doing an exit(), pid = 25074, errno = 0
ICE default IO error handler doing an exit(), pid = 25084, errno = 0
ICE default IO error handler doing an exit(), pid = 25072, errno = 0
ICE default IO error handler doing an exit(), pid = 25056, errno = 0
ICE default IO error handler doing an exit(), pid = 25052, errno = 0
GOT SIGHUP
startkde: Shutting down...
klauncher: Fatal IO error: client killed
DCOP aborting call from 'akregator' to 'klauncher'
DCOP aborting call from 'kicker' to 'klauncher'
kicker: Fatal IO error: client killed
akregator: Fatal IO error: client killed
unix_connect: can't connect to server (unix:/tmp/ksocket-irwin/localhost.localdomain-61e3-47ec360f)
startkde: Running shutdown scripts...
xprop:  unable to open display ':0'
startkde: Done.

Which would probably explain why your KDE desktop stops working...

Does the problem only happen if you VT switch before the desktop is fully up?  Or can you be running for a long time, VT switch, and then see the problem?  You may not have seen this the second time if everything was cached from the first run, making the desktop start faster...
Comment 27 Alan W. Irwin 2008-03-27 21:27:33 UTC
Unfortunately, this .xsession-errors includes the effects of my clean-ups preparing for the next X session.  This included killing artsd first followed by killall of kdeinit so it is probable everything after the first mention of artsd has nothing to do with the bug.  IOW, there is probably nothing relevant in .xsession-errors since the first mention of artsd is near the top of the file.
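
As a rough illustration of that cut-off, the part of a log that precedes the first mention of a marker such as artsd can be extracted as below. This is a sketch only; the function name lines_before is made up, and the marker is matched as a plain substring.

```shell
#!/bin/sh
# Sketch: print the lines of a file that come before the first line
# containing the given marker string.

lines_before() {
    marker=$1
    file=$2
    # index() does a literal substring search; stop at the first hit.
    awk -v m="$marker" 'index($0, m) { exit } { print }' "$file"
}
```

For instance, lines_before artsd "$HOME/.xsession-errors" would show only the output written before the clean-up steps started to affect the file.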

> Does the problem only happen if you VT switch before the desktop is fully up?
> Or can you be running for a long time, VT switch, and then see the problem?
> You may not have seen this the second time if everything was cached from the
> first run, making the desktop start faster...

The desktop was fully up before I switched to console for this recent test that failed.  For example, my two standard initial xterms were launched and were ready for input.  However, I had done no actual work (no typing in either of those xterms or launching of any additional applications) with that desktop before I tried the test.  In the past, I have also seen the problem for a test after several days of doing work with the desktop.

Sorry my tests have not been much use to you to help pin down this elusive problem.  I suspect the only way you are going to get to the bottom of it is to reproduce it yourself with similar (i.e., cutting edge but not bleeding edge git) xserver, etc., versions to what I have on my system (which is why I tried to carefully document all relevant version numbers for this latest test).
Comment 28 Jesse Barnes 2008-05-21 13:40:45 UTC
Well, I suspect this is some sort of server problem rather than a driver problem, but either way we'll need to reproduce it to fix it.

Gordon, would it be possible for you to reproduce irwin's setup?
Comment 29 Gordon Jin 2008-05-22 20:19:57 UTC
VT switching works fine on my G33 and G35, with both master tip and 2_3_branch tip.
Comment 30 Steeve McCauley 2008-06-11 07:22:16 UTC
I'm seeing something like this with Fedora 9, when using the gnome fast user switch to switch between graphical VTs.  I can log in with one user, then 'switch' to another user, but attempting to switch back locks up the system.  If I try changing VTs using Ctrl-Alt-F7 or Ctrl-Alt-F9 the screen locks up, and I can no longer access the system.  If I try to change to a text console and then back to graphics it locks up the system.  The problem only happens when the system is configured with the 'intel' driver; switching to the vesa driver eliminates the problem.

xorg-x11-server-common-1.4.99.901-29.20080415.fc9.x86_64

This is the output from lspci for the integrated intel graphics controller,

00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Unknown device 302e
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fe780000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at dc00 [size=8]
        Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Memory at fe600000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Capabilities: [d0] Power Management version 2
        Kernel modules: i915

Comment 31 Gordon Jin 2008-06-13 01:01:05 UTC
(In reply to comment #30)
> I'm seeing something like this with Fedora 9, when using the gnome fast user
> switch to switch between graphical VTs.  

The fast user switch bug may be fixed by this xf86-video-intel patch:

commit 36ec93300926084fb2951d69b001e4c67bc6ff79
Author: Eric Anholt <eric@anholt.net>
Date:   Tue May 6 18:48:20 2008 -0700

    Bug #15807: Fix use of the ring while VT-switched, hit by fast user switching.

    The fix for flushing at blockhandler with no DRI on 965 was broken and would
    try to flush the chip even when the driver wasn't in control of the VT.
    Hilarity ensued.
Comment 32 Jesse Barnes 2008-08-20 12:39:21 UTC
This one is just weird.  I'm going to close it though since we haven't heard about it recently; it was likely fixed by one of the several server fixes related to VT switching.

