Created attachment 15417 [details]
Motherboard: DG965SS (with G965 onboard graphics)
Outputs: onboard VGA connector is unused, and a Samsung 770P monitor is connected to the DVI-D output of a HP ADD2 board.
Known to work: xf86-video-intel-2.1.1
Known to fail: xf86-video-intel-2.2.x
Problem: when I press Ctrl+Alt+F1 to switch from xorg to the text console, the screen gets blanked, and no text is visible. The same applies when I terminate the X server. However, if I return to X and then back to console, the text mode is restored properly (i.e., the bug affects only the first attempt to restore text mode).
Created attachment 15418 [details]
Good log from old driver
Created attachment 15419 [details]
Log with this bug from new driver
The bad log describes the following actions:
1) start X
2) wait for the Xfce desktop to show up
3) Ctrl+Alt+F1 (look at the blank screen for a few seconds)
4) Ctrl+Alt+F7 (look at the nice desktop)
5) Log out from Xfce session (text mode is restored properly)
Patrakov, does it work with the on-board VGA connector, using the same monitor?
Created attachment 15436 [details]
Log with VGA connector - there is no bug
I can't reproduce this one with the git bits on my G965 desktop using a DVI connected LCD. The text console takes a long time to come up after the VT switch, but it eventually does (1-3 seconds).
Gordon, have you seen this bug in your testing?
(In reply to comment #5)
> Gordon, have you seen this bug in your testing?
The ADD2-R card (with DVI interface) works worse on my G965 (Dell): The text mode is totally broken, even no display in system bootup, while works in X. Eric said the ADD2-R card is for BTX (SDV), and might not work well on ATX (production machine), although the size fits.
The ADD2-R card works quite well in the Cantiga SDV: switching between text mode and X is fine.
Since nobody among developers can reproduce the issue exactly, I think it makes sense for me to bisect this. However, I don't know how to bisect git if a working release and a non-working release are on the different branches. Please provide instructions.
(In reply to comment #7)
> Since nobody among developers can reproduce the issue exactly, I think it makes
> sense for me to bisect this. However, I don't know how to bisect git if a
> working release and a non-working release are on the different branches. Please
> provide instructions.
Yes, bisect will be helpful. Just bisecting on master branch should be fine.
Thanks Alexander, yeah a bisect might help us isolate things. To get around bad commits, I think you'll have to use git reset --hard (see the "Avoiding to test a commit" section of "git help bisect"), or git bisect skip. Hope that helps...
eecd3ccedee6c4acf101591f7e60673660379e62 is first bad commit
Author: Jesse Barnes <firstname.lastname@example.org>
Date: Thu Nov 8 09:31:08 2007 -0800
Check DPLL status before writing PIPEnCONF regs
If the DPLL isn't enabled or is in VGA mode, writing the PIPEnCONF registers
may cause a hang or crash. So ensure the DPLL is in the proper state before
Another excellent fix from Peter Clifton.
:040000 040000 725c7783ac4d35124a439db0ace74f8614b0cba5 89f4958f2c7b0f5fb57ae387770541bb6547c4bc M src
...and this result is obviously useless, because the code touched by this commit is no longer in 126.96.36.1991, but the bug is still there. What to do next?
Well the code did change a little. See the commit diff for "Unconditionally restore pipe configuration". You may be able to change that code a little to fix your problem, but afaict it's correct: first we restore the DPLLs, then the pipe config regs, then the plane regs. We may be missing a vblank wait or posting read somewhere though...
It seems that bisection failed because I confused two similar bugs:
1) this bug that manifests itself just as a blank screen with apparently-valid synchronization,
2) another bug that manifests itself as a monitor not finding any signal and displaying "Analog" and 'Digital" labels to indicate where it is searching.
Ah, ok, that makes sense. For awhile we had a bug where we wouldn't restore the pipe configuration correctly, which resulted in no monitor sync after VT switch. But your "blank screen" problem has a sync'd screen just no display? Can you capture a register dump when the screen is blanked and then again on a subsequent VT switch when it's working?
Tomorrow I will try playing with the RestoreHWState() function, undoing the change between 188.8.131.521 and 2.1.1 drivers part by part.
Please provide instructions on register dumping. I have no piiaccess-enabled X server, and no second computer to ssh from.
The register dumper doesn't depend on X, just libpciaccess, so if you have a recent enough version installed you should be able to build it. It's in src/reg_dumper in the driver tree. Just run it as root (I guess you'll have to do it blind after you VT switch) and redirect its output to a file...
Created attachment 15511 [details]
Dump after the 1st VT switch
Created attachment 15512 [details]
Dump after the 2nd VT switch
Hm, no interesting differences. Which means there's a timing or ordering problem somewhere in the VT switch code that's causing your blank screen problem... but somehow the chip is ok in subsequent switches.
After reverting RestoreHWState() implementation in the 184.108.40.2061 driver to a copy of the code in 2.1.1, I still get the bug. So it is some other function.
Created attachment 15534 [details] [review]
Hack to made old i830_driver.c compile with new driver
I have taken one more debugging step: in driver version 220.127.116.111, replaced i830_driver.c with its version from 2.1.1, modified just enough to compile (see the patch that I applied to the 2.1.1 version of i830_driver.c). The bug reproduced itself (blank screen on the first attempt to switch to text mode, works on subsequent attempts). So either the "backport" patch touches the code that is directly responsible for the bug (unlikely), or the bug is not in i830_driver.c.
Any other idea how to debug this? maybe an ordered list of suspected files to selectively revert?
Hm, it's unfortunate that those other reverts didn't help. There could be some stuff in i830_crtc.c and/or i830_sdvo.c that changed and caused this behavior as well (though nothing in the latter jumped out at me when I just checked).
Reverting both i830_crtc.c and i830_sdvo.c helped. Reverting i830_sdvo.c alone didn't help. I haven't tried yet to revert only i830_crtc.c, or revert partially.
Bisected (sort of). While testing, git suggested one revision where the text mode has been successfully restored, albeit much slower than normally. If I take this "slow" revision as "good":
$ git bisect log
# bad: [8a733ccbe054d9f7c731d9a6e1121e65ba367171] Bump version 18.104.22.1681
# good: [69ea37ebf72b92d842aa80dfa3578e328c7ceaa3] Bump driver version to 2.1.1
git-bisect start 'xf86-video-intel-22.214.171.1241' 'xf86-video-intel-2.1.1' '--' 'src/i830_crt.c'
# bad: [e720ae4476c3f986f623ce0f0ab9775b8b9b7e05] CRT hotplug detection improvements
git-bisect bad e720ae4476c3f986f623ce0f0ab9775b8b9b7e05
# good: [00f4587025a3879626623135b0a153fcdb906719] Ensure pipe/output active before doing load detection.
git-bisect good 00f4587025a3879626623135b0a153fcdb906719
# good: [ff2be3995d33f9e4b7f63b380f166b6168c9b9c6] Remove hard-coded CRT blanking frobbing for load detection.
git-bisect good ff2be3995d33f9e4b7f63b380f166b6168c9b9c6
All suggested revisions were tested by copying the src/i830_crt.c file from the bisect tree to the copy of the released 126.96.36.1991 driver version, compiling, installing, rebooting and switching to VT1.
And the first bad commit is:
Author: Jesse Barnes <email@example.com>
Date: Mon Dec 10 13:00:14 2007 -0800
CRT hotplug detection improvements
Patch from Hong Liu.
Fixup CRT detection by making sure the pipe is enabled before CRT
detection actually occurs. Fixes bugs Hong was seeing on G35 and other
:040000 040000 efbf7c7388616ed581133a48d78b372090d4519e ba3260fedc2ff4be0c813dadd0ca2b469635d45f M src
I will try to bisect again, marking the "slow" revision as "bad" now.
The "slow" thing was not reproducible, so I again ended up with e720ae4476c3f986f623ce0f0ab9775b8b9b7e05 as the first bad commit. Reverting it and fixing the trivial conflict gave me a driver without this bug.
Wow, thanks a lot for the detailed bisection and work you've put into this. I'm adding Hong to the cc list (it was his patch originally).
What the patch does is: assign PIPEA to VGA output and enable PIPEA, do a hotplug detection, then disable PIPEA. Later PIPEA will be assigend to DVI output.
I can't think of anything what this patch does can impact the VGA restore.
Jesse, what's your idea?
Would you please try the following patch (just a test patch to put VGA on PIPEB to see if the problem still occurs)?
diff --git a/src/i830_crt.c b/src/i830_crt.c
index 2a99f9c..30ecec8 100644
@@ -442,6 +442,7 @@ i830_crt_init(ScrnInfoPtr pScrn)
i830_output->pipe_mask = (1 << 0);
i830_output->pipe_mask = ((1 << 0) | (1 << 1));
+ i830_output->pipe_mask = (1 << 1);
i830_output->clone_mask = ((1 << I830_OUTPUT_ANALOG) |
(1 << I830_OUTPUT_DVO_TMDS));
With the debugging patch by Hong Liu, the problem still exists.
Created attachment 15613 [details] [review]
Don't allocate a pipe for hotplug detection
Assuming there's something wrong with the load detect pipe code, this patch might help...
(In reply to comment #31)
> Created an attachment (id=15613) [details]
> Don't allocate a pipe for hotplug detection
> Assuming there's something wrong with the load detect pipe code, this patch
> might help...
This patch should help, but it will reintroduce the problem we see in bug 13252.
The patch indeed helps against this bug. I can't reproduce bug 13252, so I can't judge whether the patch reintroduces it.
Do I understand correctly that in my situation, after the patch, the i830GetLoadDetectPipe() + i830ReleaseLoadDetectPipe() pair is simply not called? How can I debug which part of the card's state gets corrupted by this pair?
(In reply to comment #33)
> The patch indeed helps against this bug. I can't reproduce bug 13252, so I
> can't judge whether the patch reintroduces it.
It used to happen on our test machines, maybe not reproducable on yours.
> Do I understand correctly that in my situation, after the patch, the
> i830GetLoadDetectPipe() + i830ReleaseLoadDetectPipe() pair is simply not
> called? How can I debug which part of the card's state gets corrupted by this
Yes, the patch omits the Get/ReleaseLoadDetecctPipe call. We usually use intel_reg_dump to dump the state of card, but in your problem, we can't find any difference in good and bad case.
Then let's try other debugging techniques. E.g., start with your patch, unconditionally add a dummy CopyOf_i830GetLoadDetectPipe()/CopyOf_i830ReleaseLoadDetectPipe() pair at the top of i830_crt_detect(), where CopyOf... functions are initially just verbatim copies of the originals. Then, step by step, delete matching stuff from the end of CopyOf_i830GetLoadDetectPipe() and the beginning of CopyOf_i830ReleaseLoadDetectPipe() to see what part of the register programming breaks text mode restoration. However, as I know nothing about the card and also am too lazy to read long PDF manuals, it would be nice if you send some debugging patches of this kind (just for me to be sure that the deleted lines really match).
Other debugging suggestions are also welcome.
P.S. I am currently at work, with ATI card, so you'll have to wait while I return home to test your suggestions.
Hong, we need a load detect pipe allocated even for the hotplug bit? I thought it was just needed for DDC & load detect...
If so, at least we know where the problem is. When I traced through the code
today it seemed to restore all the registers correctly and such, but we may be
violating the ordering somehow, between pipe & output enable/disable...
Looks like the load detect code will do DPMS on of the CRTC before the output if it finds that an output already has a crtc assigned. The main mode setting code though always sets the CRTC mode before the output, so that's one difference. But you shouldn't be hitting that path since the VGA output doesn't have a crtc assigned yet...
However, it kind of makes sense that you'd only see this on the first VT switch after startup, since the detect routines (where we do the load detect pipe allocation stuff) only happens when the driver starts or we receive an xrandr request (might be interesting to see if you can reproduce the problem after running xrandr).
What's weird is that even in the startup path, we run the detect code but then restore all the hardware state to what it was before we probed, so anything done by i830(Get|Release)LoadDetectPipe should have been undone before we saved hardware state again at EnterVT time... And nothing in the logs indicates that there are important register differences between your good & bad cases. Which seems to imply that either RestoreHWState gets sabotaged by what the load detect pipe code does (not only that but sabotaged twice) or something else in the LeaveVT code needs to get run once before RestoreHWState will do its job right...
Can you try the patch in #15168 comment #13? Looks like we're missing some posted write compensation in the RestoreHWState code...
The blank screen indeed appears one time if I switch to VT1 after "xrandr --auto". As for the patch in bug 15168 comment #13, it doesn't help.
(In reply to comment #36)
> Hong, we need a load detect pipe allocated even for the hotplug bit? I thought
> it was just needed for DDC & load detect...
Yes, I remember we need to do load detect pipe allocation in order to get a correct output status.
I retried our Q965 machine without load detect pipe allocation today, the bug doesn't occur. I am rethinking bug 13252, it seems after several X restart, the pipe is disabled (not sure why, and this causes the detecting problem in that bug). In theory, pipeA connected to VGA output should be enabled by vbios when entering X server.
Maybe we can commit your patch now, and wait to see if bug 13525 will rehappen, at that time we need to find why pipe is disabled.
> If so, at least we know where the problem is. When I traced through the code
> today it seemed to restore all the registers correctly and such, but we may be
> violating the ordering somehow, between pipe & output enable/disable...
Alexander, did you ever get a chance to try the debugging you suggested in comment 35? It sounds reasonable to me; I'd really like to figure out exactly what's wrong with the GetLoadDetectPipe function...
No, I will try tomorrow. But the problem is still that I can't identify matching parts of these functions, so, without your debugging patches, I will have to guess.
If I comment out the call to xf86CrtcSetMode (crtc, mode, RR_Rotate_0, 0, 0) in CopyOf_i830GetLoadDetectPipe(), text mode restoration works. Not surprising, since this is the only call that touches the hardware in this function.
So: should I debug further into xf86CrtcSetMode(), or is this information enough to highlight some sort of reentrancy problem?
P.S. I will do nothing without your guidance.
Thanks Alexander, that does narrow it down a little... I'll try to think of some more debugging you can do, maybe we can capture some register dumps in between calls or something.
Created attachment 16669 [details] [review]
Fixup ordering and wait for vblank
Alexander, I did find a couple of problems in our VT switch code, can you try this patch?
The patch (as applied to 2.3.0) made things much worse: instead of the desktop, I now get a flashing white screen. Switching back to the text console doesn't show text, even from the second attempt.
Created attachment 16673 [details]
Completely failed log after attachment 16669 [details] [review]
BTW, instead of guessing blindly, coule you please attempt to obtain an exact copy of my hardware?
Motherboard: Intel BLKDG965SSCK Desktop Board
ADD2 adapter: HP ADD2 SH DVI Adapter DY-674A
Monitor: Samsung 770P
Uhh, no. We can't get a version of every bit of hardware out there.
On my new Debian Lenny amd64 system with gdm as a display manager, I noticed that the bug often manifests itself, but sometimes it doesn't. Looks like some timing issue.
P.S. The bug has the NEEDINFO tag without the description what info is needed.
Yeah, it could be timing related, but given that we have a few other VT switch bugs open it may also be a logic bug somewhere... Still looking, thanks for your patience.
Still happens on my 32-bit Xfce system & custom autologin script with 2.3.2 driver. Hard to reproduce on x86-64 with KDE & KDM.
The autologin script is:
[ -f /etc/default/autologin ] || exit 0
[ -n "$USER" ] || exit 0
USERHOME=`eval "echo ~$USER"`
rm -f $USERHOME/.Xauthority
/bin/su - $USER -c "xauth add :0 . `mcookie`" 2>/dev/null
HOME=$USERHOME exec xinit /bin/su - $USER -c "DISPLAY=:0 /bin/sh /etc/X11/xinit/xinitrc" -- /etc/X11/X vt7 -nolisten tcp -auth $USERHOME/.Xauthority >/dev/null 2>/dev/null
Sorry for the previous unclear comment. Even on x86-64 with KDM, the issue is triggered reliably after running "xrandr" without parameters.
The bug can no longer be triggered in 2.4.1 + "avoid duplicate mode set in lvds" + "avoid duplicate SaveHWState" + "reduce intel driver bootup time" (patches by Shaohua Li). The "reduce intel driver bootup time" patch contains a hunk that looks relevant, but I am not sure whether it really fixes or just hides the problem.
I think it duplicates the logic from the patch in comment #31. If it works ok we can push it; Gordon can you see if you can reproduce the hotplug CRTC allocation bug? If not I think we should go with something like Shaohua's patch.
Fix pushed as 7b6f4d22211d71480caf6335a3eacaacff369371