Summary: | [g33] desktop environment breakage after VT switch | ||
---|---|---|---|
Product: | xorg | Reporter: | Alan W. Irwin <Alan.W.Irwin1234> |
Component: | Driver/intel | Assignee: | Jesse Barnes <jbarnes> |
Status: | RESOLVED WORKSFORME | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | ||
Priority: | medium | CC: | Alan.W.Irwin1234, gordon.jin, steeve, zhenyu.z.wang |
Version: | 7.3 (2007.09) | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 13493, 15000 | ||
Attachments: |
Description
Alan W. Irwin
2008-02-08 11:55:44 UTC
Created attachment 14221 [details]
X log file
Created attachment 14222 [details]
xorg.conf
What about using the default EXA accel method?

I just tried the default EXA method (new log file and xorg.conf to be attached), and for my desktop needs it seems to be okay so far. Thus, I will stick with it unless and until I run into any trouble with it. However, the originally reported issue with switching to the console and back still persists with exactly the same symptoms. So I suspect that when you fix it for EXA, the problem will also go away for XAA.

Created attachment 14269 [details]
Xorg.0.log with exa method
Created attachment 14270 [details]
xorg.conf with default exa
EXA locked up after an hour of light desktop use. Details at http://bugs.freedesktop.org/show_bug.cgi?id=14464 which I am keeping separate because it is likely a separate issue.

I have made this a blocker bug for 13493 (release of the next version of the Intel driver) because this bug completely disrupts the use of the Linux console with X for the case where the remaining kernel and X components are either at the latest released version or close to it (e.g., the mix of software versions available on Debian unstable as given above). I have heard in a different forum that the Intel driver team have so far been unable to replicate this bug on their G33 equipment, but the bug can be quickly and easily reproduced every time on my system. Thus, the question arises whether the Intel driver team are using a kernel and X software version mix similar to that of Debian unstable for their testing, or some more exotic bleeding-edge git versions of the same.

It seems strange that this problem could be caused by the driver... do older versions not exhibit this behavior? If so, maybe you can bisect things down to the offending commit?

Normally, I switch to the console at least once per startx (to background that task and to log out before switching back to X), and that worked without problems for the Debian testing version of X from 2007-11-01.
From an old Debian bug report concerning how to set modelines for modern X that I kept from then, the X-related package versions were the following:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.1.1-4 (unstable is currently 2:2.2.0.90-3)
xserver: packaged as xserver-xorg-core version 2:1.4-3 (unstable is currently 2:1.4.1~git20080131-1, improved version information I should have previously supplied above with my initial bug report)
drm: packaged as libdrm2 version 2.3.0-4 (unstable has the same version as testing)
kernel version: 2.6.22-2-amd64 (my recent failing X tests for this bug report have been done with kernel version 2.6.23-1-amd64)
So there is quite a bit of change between the older working X and the current one that exhibits this bug, and it is difficult to pin down which component change is the culprit. An even more interesting question, though, is why this bug shows so obviously for me, but not yet for anybody from the Intel driver team. I predict that if you try the versions mentioned in my original report, i.e., the Debian unstable version of X, you will immediately see this bug.

Yeah, it may be worth setting up something similar so we can see it. Gordon, is this something you guys can do?

Now that I think about it a little, we've had some similar issues in the past. They involved xrandr clients connecting to the server while it was VT switched away. It's possible something similar is going on here, resulting in memory corruption rather than a crash like we saw in previous bugs. Zhenyu, does that ring a bell at all?

One way to narrow it down would be to start a minimal X environment (say just an xterm), do your switch away then switch back, then do 'startkde' from there. If that works fine, we'll know it's one of the KDE desktop apps starting up that's causing problems.

I can no longer reproduce this bug.
This is surprising since no kernel, drm, mesa, or X changes have been made on this system since my original report, where the bug occurred with 100 per cent reliability (several tries with both XAA and EXA and with a simplified KDE desktop [typically two xterms, and that was it]). I absolutely don't know what to think now. The system has been rebooted at least once since the bug report, and that may have made some difference. BTW, it was probably a warm reboot with "shutdown -r now" rather than a cold boot done with "shutdown -h now". I have heard the chipset hardware is initialized differently in those two cases. Anyhow, I will keep close track of this for a week or so, and if the problem has disappeared completely I will close the bug report at that point.

The bad symptoms have now shown up again so I will keep this bug open. In this case there was no problem switching back and forth from the console for several days, but for the last switch I tried this morning, the reported symptoms immediately showed up again. There have been no system upgrades or reboots or even restarts of X during this testing time. In sum, this bug shows immediately after some but not all attempts to switch from the console back to X. In some cases X may have to be run a couple of days before switching to the console and back triggers the bug, but in other cases the problem occurs right after a startx, ctrl-alt-F1, alt-F7 sequence.

Eric fixed some SDVO programming problems recently. It would be weird if they caused this failure, but it's worth checking the git tree (use the 2.3 branch if you don't want to be exposed to some of the more experimental code). What versions of everything are you using now? Does your KDE session log show any errors? Anything on the VT you started X from that might give us a clue?

When Debian unstable changed from xserver-xorg-video-intel version 2:2.2.0.90-3 to the current 2:2.2.1-1, I verified the reported bug continued.
I plan to test again when your latest version of the Intel driver hits Debian unstable since I don't have the knowledge to build git versions. To answer your other question, in my previous tests of the bug I noticed nothing special in .xsession-errors or the Linux console output. I will look harder in the next test, but that may be a while, see above.

Ok, thanks. I do remember seeing similar problems when running development X servers, but they were intermittent and I never nailed them down. So this really seems like it's probably an X server issue as opposed to a driver issue. But either way, we'll have to get more info in order to track down the problem. If the KDE stuff is failing badly, there must be a log of what's happening somewhere, if not in .xsession-errors, then in some other kdm or gdm log file possibly?

Updating summary.

I use the startx method so kdm and/or xdm log files are not relevant. I will include .xsession-errors and captured startx output (as well as the usual xorg.conf and log files) the next time I have a problem to report. Note 2.2.99.901 is now available from Debian experimental so I should be starting this and other tests shortly, but it sometimes takes a couple of days of my normal light production desktop use to see the errors, so the tests are probably going to take a while before I have something to report.

I confirm this bug still occurs for 2.2.99.901. I was "lucky" on the first try after I installed the Debian experimental version of the Intel driver on top of Debian unstable X. To confirm the bug I first verified on the console with "ps auxww" that no old X executables were still running from anything before (I always do this now before every startx to make sure I get a clean start). Then I ran startx, which brought up KDE with two xterms going from my previous logout from KDE, but no other applications were running.
Then I immediately hit ctrl-alt-F1 to get the first Linux console, then alt-F7 to get back into X, which froze immediately (no mouse click would work), verifying the bug still exists. I could only exit X by brute force (ctrl-alt-BS). I then immediately went through the same steps again, but this time there was no X freeze (and in fact it is still working), so the bug has an intermittent effect.
System environment (note most Debian X-related software has been considerably updated since the initial report):
Chipset: g33 (ASUS P5K-V MB)
System architecture: x86_64
Debian unstable package versions:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.2.99.901-1 (from Debian experimental)
xserver: packaged as xserver-xorg version 1:7.3+10 (X.Org X Server 1.4.0.90)
mesa: a number of different mesa-related packages with version 7.0.3~rc2-1
drm: packaged as libdrm2 version 2.3.0-4
kernel version: 2.6.24-1-amd64
Linux distribution: Combination of Debian experimental (for Intel driver), Debian unstable (for X and kernel), and Debian testing
Machine or mobo model: ASUS P5K-V
More information to follow as attachments.

Created attachment 15524 [details]
.xsession-errors for X session that froze
Created attachment 15525 [details]
file resulting from startx >& startx.out2 on X session that froze
Created attachment 15526 [details]
xorg.conf used for X session that froze (used EXA for this test)
Created attachment 15527 [details]
log file taken for working X session after freeze. (Sorry, I forgot to preserve the one for the freeze).
So according to your .xsession-errors file, it looks like some KDE stuff is failing:
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 20
Minor opcode: 0
Resource id: 0x0
kded: Fatal IO error: client killed
ksmserver: Fatal IO error: client killed
ICE default IO error handler doing an exit(), pid = 25054, errno = 0
ICE default IO error handler doing an exit(), pid = 25073, errno = 0
ICE default IO error handler doing an exit(), pid = 25069, errno = 0
ICE default IO error handler doing an exit(), pid = 25074, errno = 0
ICE default IO error handler doing an exit(), pid = 25084, errno = 0
ICE default IO error handler doing an exit(), pid = 25072, errno = 0
ICE default IO error handler doing an exit(), pid = 25056, errno = 0
ICE default IO error handler doing an exit(), pid = 25052, errno = 0
GOT SIGHUP
startkde: Shutting down...
klauncher: Fatal IO error: client killed
DCOP aborting call from 'akregator' to 'klauncher'
DCOP aborting call from 'kicker' to 'klauncher'
kicker: Fatal IO error: client killed
akregator: Fatal IO error: client killed
unix_connect: can't connect to server (unix:/tmp/ksocket-irwin/localhost.localdomain-61e3-47ec360f)
startkde: Running shutdown scripts...
xprop: unable to open display ':0'
startkde: Done.
Which would probably explain why your KDE desktop stops working... Does the problem only happen if you VT switch before the desktop is fully up? Or can you be running for a long time, VT switch, and then see the problem? You may not have seen this the second time if everything was cached from the first run, making the desktop start faster...

Unfortunately, this .xsession-errors includes the effects of my clean-ups preparing for the next X session. This included killing artsd first followed by killall of kdeinit so it is probable everything after the first mention of artsd has nothing to do with the bug. IOW, there is probably nothing relevant in .xsession-errors since the first mention of artsd is near the top of the file.
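One way to look only at the part of the log written before the clean-up is to cut it off at the first mention of artsd. This is a minimal sketch, not something anyone in this report ran: the function name `session_log_before_cleanup` is hypothetical, and it assumes the first artsd line really does mark the start of the manual clean-up.

```shell
#!/bin/sh
# Sketch only: the function name is made up for illustration.

# Print a session log up to and including the first line that mentions
# artsd, then stop. If artsd never appears, the whole file is printed.
session_log_before_cleanup() {
    sed '/artsd/q' "$1"
}
```

It could be called as `session_log_before_cleanup ~/.xsession-errors`, and anything it prints ahead of the artsd line would be the portion worth reading for this bug.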
> Does the problem only happen if you VT switch before the desktop is fully up?
> Or can you be running for a long time, VT switch, and then see the problem?
> You may not have seen this the second time if everything was cached from the
> first run, making the desktop start faster...
The desktop was fully up before I switched to console for this recent test that failed. For example, my two standard initial xterms were launched and were ready for input. However, I had done no actual work (no typing in either of those xterms or launching of any additional applications) with that desktop before I tried the test. In the past, I have also seen the problem for a test after several days of doing work with the desktop.
Sorry my tests have not been much use to you to help pin down this elusive problem. I suspect the only way you are going to get to the bottom of it is to reproduce it yourself with similar (i.e., cutting edge but not bleeding edge git) xserver, etc., versions to what I have on my system (which is why I tried to carefully document all relevant version numbers for this latest test).
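For anyone trying to reproduce this, the ctrl-alt-F1 / alt-F7 sequence can be scripted with `chvt`. The following is only a sketch under several assumptions: the function names `vt_roundtrip` and `vt_stress` are hypothetical, the console is assumed to be VT 1 and X to be on VT 7 as in the report, the 2-second delay is arbitrary, `chvt` normally needs root, and `DISPLAY` must point at the running server. The `xdpyinfo` probe only shows that the server still answers requests; a desktop that ignores mouse clicks might still pass it.

```shell
#!/bin/sh
# Sketch only: VT numbers, delay, and the xdpyinfo probe are assumptions.

# Switch to the text console and back, then ask the X server for a reply.
# Returns 0 if the server answered, 1 if it did not, 2 if chvt failed.
vt_roundtrip() {
    console_vt=${1:-1}   # ctrl-alt-F1 in the report
    x_vt=${2:-7}         # alt-F7 in the report
    delay=${3:-2}        # seconds to wait after each switch (arbitrary)

    chvt "$console_vt" || return 2
    sleep "$delay"
    chvt "$x_vt" || return 2
    sleep "$delay"

    # xdpyinfo exits non-zero if it cannot talk to the server on $DISPLAY.
    xdpyinfo >/dev/null 2>&1 || return 1
    return 0
}

# Repeat the round trip and stop at the first failure.
vt_stress() {
    rounds=${1:-10}
    i=1
    while [ "$i" -le "$rounds" ]; do
        vt_roundtrip 1 7 2
        status=$?
        if [ "$status" -eq 2 ]; then
            echo "round $i: chvt failed"
            return 2
        elif [ "$status" -ne 0 ]; then
            echo "round $i: X did not answer after switching back"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $rounds round trips without a failed probe"
}
```

Since the failure is intermittent, a clean run of `vt_stress 20` would not prove the bug is gone; it only makes the "sometimes on the first switch" case cheaper to hunt for.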
Well, I suspect this is some sort of server problem rather than a driver problem, but either way we'll need to reproduce it to fix it. Gordon, would it be possible for you to reproduce irwin's setup?

VT switching works fine on my G33 and G35, with both master tip and 2_3_branch tip.

I'm seeing something like this with Fedora 9, when using the gnome fast user switch to switch between graphical VTs. I can login with one user, then 'switch' to another user, but attempting to switch back locks up the system. If I try changing VTs using Ctrl-Alt-F7 or Ctrl-Alt-F9 the screen locks up, and I can no longer access the system. If I try to change to a text console and then back to graphics it locks up the system. The problem only happens when the system is configured with the 'intel' driver; switching to the vesa driver eliminates the problem.
xorg-x11-server-common-1.4.99.901-29.20080415.fc9.x86_64
This is the output from lspci for the integrated Intel graphics controller:
00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
Subsystem: Lenovo Unknown device 302e
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fe780000 (32-bit, non-prefetchable) [size=512K]
I/O ports at dc00 [size=8]
Memory at d0000000 (32-bit, prefetchable) [size=256M]
Memory at fe600000 (32-bit, non-prefetchable) [size=1M]
Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
Capabilities: [d0] Power Management version 2
Kernel modules: i915

(In reply to comment #30)
> I'm seeing something like this with Fedora 9, when using the gnome fast user
> switch to switch between graphical VTs.
The fast user switch bug may be fixed by this xf86-video-intel patch:
commit 36ec93300926084fb2951d69b001e4c67bc6ff79
Author: Eric Anholt <eric@anholt.net>
Date: Tue May 6 18:48:20 2008 -0700
Bug #15807: Fix use of the ring while VT-switched, hit by fast user switching.
The fix for flushing at blockhandler with no DRI on 965 was broken and would try to flush the chip even when the driver wasn't in control of the VT. Hilarity ensued.

This one is just weird. I'm going to close it though since we haven't heard about it recently; it was likely fixed by one of the several server fixes related to VT switching.