Bug 15187 - [G965, DVI] intel driver fails to restore text mode
Summary: [G965, DVI] intel driver fails to restore text mode
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Jesse Barnes
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-24 04:01 UTC by Alexander E. Patrakov
Modified: 2008-08-20 14:41 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
xorg.conf (1.91 KB, text/plain)
2008-03-24 04:01 UTC, Alexander E. Patrakov
no flags Details
Good log from old driver (108.52 KB, text/plain)
2008-03-24 04:02 UTC, Alexander E. Patrakov
no flags Details
Log with this bug from new driver (119.09 KB, text/plain)
2008-03-24 04:04 UTC, Alexander E. Patrakov
no flags Details
Log with VGA connector - there is no bug (91.20 KB, text/plain)
2008-03-24 21:01 UTC, Alexander E. Patrakov
no flags Details
Dump after the 1st VT switch (6.85 KB, text/plain)
2008-03-27 10:08 UTC, Alexander E. Patrakov
no flags Details
Dump after the 2nd VT switch (6.85 KB, text/plain)
2008-03-27 10:09 UTC, Alexander E. Patrakov
no flags Details
Hack to made old i830_driver.c compile with new driver (9.75 KB, patch)
2008-03-28 03:08 UTC, Alexander E. Patrakov
no flags Details | Splinter Review
Don't allocate a pipe for hotplug detection (1.26 KB, patch)
2008-04-01 12:15 UTC, Jesse Barnes
no flags Details | Splinter Review
Fixup ordering and wait for vblank (15.13 KB, patch)
2008-05-21 13:51 UTC, Jesse Barnes
no flags Details | Splinter Review
Completely failed log after attachment 16669 (118.14 KB, text/plain)
2008-05-21 20:01 UTC, Alexander E. Patrakov
no flags Details

Description Alexander E. Patrakov 2008-03-24 04:01:16 UTC
Created attachment 15417 [details]
xorg.conf

Motherboard: DG965SS (with G965 onboard graphics)
Outputs: onboard VGA connector is unused, and a Samsung 770P monitor is connected to the DVI-D output of a HP ADD2 board.

Known to work: xf86-video-intel-2.1.1
Known to fail: xf86-video-intel-2.2.x

Problem: when I press Ctrl+Alt+F1 to switch from xorg to the text console, the screen gets blanked, and no text is visible. The same applies when I terminate the X server. However, if I return to X and then back to console, the text mode is restored properly (i.e., the bug affects only the first attempt to restore text mode).
Comment 1 Alexander E. Patrakov 2008-03-24 04:02:05 UTC
Created attachment 15418 [details]
Good log from old driver
Comment 2 Alexander E. Patrakov 2008-03-24 04:04:34 UTC
Created attachment 15419 [details]
Log with this bug from new driver

The bad log describes the following actions:

1) start X
2) wait for the Xfce desktop to show up
3) Ctrl+Alt+F1  (look at the blank screen for a few seconds)
4) Ctrl+Alt+F7  (look at the nice desktop)
5) Log out from Xfce session (text mode is restored properly)
Comment 3 Michael Fu 2008-03-24 18:00:34 UTC
Patrakov, does it work with the on-board VGA connector, using the same monitor?
Comment 4 Alexander E. Patrakov 2008-03-24 21:01:32 UTC
Created attachment 15436 [details]
Log with VGA connector - there is no bug
Comment 5 Jesse Barnes 2008-03-25 14:52:52 UTC
I can't reproduce this one with the git bits on my G965 desktop using a DVI connected LCD.  The text console takes a long time to come up after the VT switch, but it eventually does (1-3 seconds).

Gordon, have you seen this bug in your testing?
Comment 6 Gordon Jin 2008-03-26 19:25:09 UTC
(In reply to comment #5)
> Gordon, have you seen this bug in your testing?

No.

The ADD2-R card (with DVI interface) works worse on my G965 (Dell): The text mode is totally broken, even no display in system bootup, while works in X. Eric said the ADD2-R card is for BTX (SDV), and might not work well on ATX (production machine), although the size fits.

The ADD2-R card works quite well in the Cantiga SDV: switching between text mode and X is fine.
Comment 7 Alexander E. Patrakov 2008-03-26 21:20:56 UTC
Since nobody among developers can reproduce the issue exactly, I think it makes sense for me to bisect this. However, I don't know how to bisect git if a working release and a non-working release are on the different branches. Please provide instructions.
Comment 8 Gordon Jin 2008-03-27 05:52:34 UTC
(In reply to comment #7)
> Since nobody among developers can reproduce the issue exactly, I think it makes
> sense for me to bisect this. However, I don't know how to bisect git if a
> working release and a non-working release are on the different branches. Please
> provide instructions.

Yes, bisect will be helpful. Just bisecting on master branch should be fine.
Comment 9 Jesse Barnes 2008-03-27 08:51:53 UTC
Thanks Alexander, yeah a bisect might help us isolate things.  To get around bad commits, I think you'll have to use git reset --hard (see the "Avoiding to test a commit" section of "git help bisect"), or git bisect skip.  Hope that helps...
Comment 10 Alexander E. Patrakov 2008-03-27 08:57:38 UTC
eecd3ccedee6c4acf101591f7e60673660379e62 is first bad commit
commit eecd3ccedee6c4acf101591f7e60673660379e62
Author: Jesse Barnes <jbarnes@hobbes.virtuousgeek.org>
Date:   Thu Nov 8 09:31:08 2007 -0800

    Check DPLL status before writing PIPEnCONF regs
    
    If the DPLL isn't enabled or is in VGA mode, writing the PIPEnCONF registers
    may cause a hang or crash.  So ensure the DPLL is in the proper state before
    writing them.
    
    Another excellent fix from Peter Clifton.

:040000 040000 725c7783ac4d35124a439db0ace74f8614b0cba5 89f4958f2c7b0f5fb57ae387770541bb6547c4bc M	src
Comment 11 Alexander E. Patrakov 2008-03-27 09:11:43 UTC
...and this result is obviously useless, because the code touched by this commit is no longer in 2.2.99.901, but the bug is still there. What to do next?
Comment 12 Jesse Barnes 2008-03-27 09:28:11 UTC
Well the code did change a little.  See the commit diff for "Unconditionally restore pipe configuration".  You may be able to change that code a little to fix your problem, but afaict it's correct:  first we restore the DPLLs, then the pipe config regs, then the plane regs.  We may be missing a vblank wait or posting read somewhere though...
Comment 13 Alexander E. Patrakov 2008-03-27 09:31:13 UTC
It seems that bisection failed because I confused two similar bugs:

1) this bug that manifests itself just as a blank screen with apparently-valid synchronization,

2) another bug that manifests itself as a monitor not finding any signal and displaying "Analog" and 'Digital" labels to indicate where it is searching.
Comment 14 Jesse Barnes 2008-03-27 09:41:45 UTC
Ah, ok, that makes sense.  For awhile we had a bug where we wouldn't restore the pipe configuration correctly, which resulted in no monitor sync after VT switch.  But your "blank screen" problem has a sync'd screen just no display?  Can you capture a register dump when the screen is blanked and then again on a subsequent VT switch when it's working?
Comment 15 Alexander E. Patrakov 2008-03-27 09:43:26 UTC
Tomorrow I will try playing with the RestoreHWState() function, undoing the change between 2.2.99.901 and 2.1.1 drivers part by part.
Comment 16 Alexander E. Patrakov 2008-03-27 09:44:11 UTC
Please provide instructions on register dumping. I have no piiaccess-enabled X server, and no second computer to ssh from.
Comment 17 Jesse Barnes 2008-03-27 09:47:48 UTC
The register dumper doesn't depend on X, just libpciaccess, so if you have a recent enough version installed you should be able to build it.  It's in src/reg_dumper in the driver tree.  Just run it as root (I guess you'll have to do it blind after you VT switch) and redirect its output to a file...
Comment 18 Alexander E. Patrakov 2008-03-27 10:08:50 UTC
Created attachment 15511 [details]
Dump after the 1st VT switch
Comment 19 Alexander E. Patrakov 2008-03-27 10:09:19 UTC
Created attachment 15512 [details]
Dump after the 2nd VT switch
Comment 20 Jesse Barnes 2008-03-27 10:13:47 UTC
Hm, no interesting differences.  Which means there's a timing or ordering problem somewhere in the VT switch code that's causing your blank screen problem... but somehow the chip is ok in subsequent switches.
Comment 21 Alexander E. Patrakov 2008-03-27 21:09:18 UTC
After reverting RestoreHWState() implementation in the 2.2.99.901 driver to a copy of the code in 2.1.1, I still get the bug. So it is some other function.
Comment 22 Alexander E. Patrakov 2008-03-28 03:08:04 UTC
Created attachment 15534 [details] [review]
Hack to made old i830_driver.c compile with new driver

I have taken one more debugging step: in driver version 2.2.99.901, replaced i830_driver.c with its version from 2.1.1, modified just enough to compile (see the patch that I applied to the 2.1.1 version of i830_driver.c). The bug reproduced itself (blank screen on the first attempt to switch to text mode, works on subsequent attempts). So either the "backport" patch touches the code that is directly responsible for the bug (unlikely), or the bug is not in i830_driver.c.
Comment 23 Alexander E. Patrakov 2008-03-28 09:58:34 UTC
Any other idea how to debug this? maybe an ordered list of suspected files to selectively revert?
Comment 24 Jesse Barnes 2008-03-28 11:34:32 UTC
Hm, it's unfortunate that those other reverts didn't help.  There could be some stuff in i830_crtc.c and/or i830_sdvo.c that changed and caused this behavior as well (though nothing in the latter jumped out at me when I just checked).
Comment 25 Alexander E. Patrakov 2008-03-29 22:53:40 UTC
Reverting both i830_crtc.c and i830_sdvo.c helped. Reverting i830_sdvo.c alone didn't help. I haven't tried yet to revert only i830_crtc.c, or revert partially.
Comment 26 Alexander E. Patrakov 2008-03-29 23:41:45 UTC
Bisected (sort of). While testing, git suggested one revision where the text mode has been successfully restored, albeit much slower than normally. If I take this "slow" revision as "good":

$ git bisect log
# bad: [8a733ccbe054d9f7c731d9a6e1121e65ba367171] Bump version 2.2.99.901
# good: [69ea37ebf72b92d842aa80dfa3578e328c7ceaa3] Bump driver version to 2.1.1
git-bisect start 'xf86-video-intel-2.2.99.901' 'xf86-video-intel-2.1.1' '--' 'src/i830_crt.c'
# bad: [e720ae4476c3f986f623ce0f0ab9775b8b9b7e05] CRT hotplug detection improvements
git-bisect bad e720ae4476c3f986f623ce0f0ab9775b8b9b7e05
# good: [00f4587025a3879626623135b0a153fcdb906719] Ensure pipe/output active before doing load detection.
git-bisect good 00f4587025a3879626623135b0a153fcdb906719
# good: [ff2be3995d33f9e4b7f63b380f166b6168c9b9c6] Remove hard-coded CRT blanking frobbing for load detection.
git-bisect good ff2be3995d33f9e4b7f63b380f166b6168c9b9c6

All suggested revisions were tested by copying the src/i830_crt.c file from the bisect tree to the copy of the released 2.2.99.901 driver version, compiling, installing, rebooting and switching to VT1.

And the first bad commit is:

commit e720ae4476c3f986f623ce0f0ab9775b8b9b7e05
Author: Jesse Barnes <jbarnes@jbarnes-mobile.amr.corp.intel.com>
Date:   Mon Dec 10 13:00:14 2007 -0800

    CRT hotplug detection improvements
    
    Patch from Hong Liu.
    
    Fixup CRT detection by making sure the pipe is enabled before CRT
    detection actually occurs.  Fixes bugs Hong was seeing on G35 and other
    machines.

:040000 040000 efbf7c7388616ed581133a48d78b372090d4519e ba3260fedc2ff4be0c813dadd0ca2b469635d45f M	src

I will try to bisect again, marking the "slow" revision as "bad" now.
Comment 27 Alexander E. Patrakov 2008-03-30 00:06:28 UTC
The "slow" thing was not reproducible, so I again ended up with e720ae4476c3f986f623ce0f0ab9775b8b9b7e05 as the first bad commit. Reverting it and fixing the trivial conflict gave me a driver without this bug.
Comment 28 Jesse Barnes 2008-03-31 14:43:17 UTC
Wow, thanks a lot for the detailed bisection and work you've put into this.  I'm adding Hong to the cc list (it was his patch originally).
Comment 29 Hong Liu 2008-04-01 01:24:46 UTC
What the patch does is: assign PIPEA to VGA output and enable PIPEA, do a hotplug detection, then disable PIPEA. Later PIPEA will be assigend to DVI output.

I can't think of anything what this patch does can impact the VGA restore.
Jesse, what's your idea?

Would you please try the following patch (just a test patch to put VGA on PIPEB to see if the problem still occurs)?

diff --git a/src/i830_crt.c b/src/i830_crt.c
index 2a99f9c..30ecec8 100644
--- a/src/i830_crt.c
+++ b/src/i830_crt.c
@@ -442,6 +442,7 @@ i830_crt_init(ScrnInfoPtr pScrn)
 	i830_output->pipe_mask = (1 << 0);
     else
 	i830_output->pipe_mask = ((1 << 0) | (1 << 1));
+    i830_output->pipe_mask = (1 << 1);
     i830_output->clone_mask = ((1 << I830_OUTPUT_ANALOG) |
 			       (1 << I830_OUTPUT_DVO_TMDS));
     
Comment 30 Alexander E. Patrakov 2008-04-01 08:22:05 UTC
With the debugging patch by Hong Liu, the problem still exists.
Comment 31 Jesse Barnes 2008-04-01 12:15:01 UTC
Created attachment 15613 [details] [review]
Don't allocate a pipe for hotplug detection

Assuming there's something wrong with the load detect pipe code, this patch might help...
Comment 32 Hong Liu 2008-04-01 18:21:53 UTC
(In reply to comment #31)
> Created an attachment (id=15613) [details]
> Don't allocate a pipe for hotplug detection
> 
> Assuming there's something wrong with the load detect pipe code, this patch
> might help...
> 

This patch should help, but it will reintroduce the problem we see in bug 13252.
Comment 33 Alexander E. Patrakov 2008-04-01 21:09:54 UTC
The patch indeed helps against this bug. I can't reproduce bug 13252, so I can't judge whether the patch reintroduces it.

Do I understand correctly that in my situation, after the patch, the i830GetLoadDetectPipe() + i830ReleaseLoadDetectPipe() pair is simply not called? How can I debug which part of the card's state gets corrupted by this pair?
Comment 34 Hong Liu 2008-04-01 22:29:54 UTC
(In reply to comment #33)
> The patch indeed helps against this bug. I can't reproduce bug 13252, so I
> can't judge whether the patch reintroduces it.
> 

It used to happen on our test machines, maybe not reproducable on yours.

> Do I understand correctly that in my situation, after the patch, the
> i830GetLoadDetectPipe() + i830ReleaseLoadDetectPipe() pair is simply not
> called? How can I debug which part of the card's state gets corrupted by this
> pair?
> 

Yes, the patch omits the Get/ReleaseLoadDetecctPipe call. We usually use intel_reg_dump to dump the state of card, but in your problem, we can't find any difference in good and bad case.
Comment 35 Alexander E. Patrakov 2008-04-01 22:44:50 UTC
Then let's try other debugging techniques. E.g., start with your patch, unconditionally add a dummy CopyOf_i830GetLoadDetectPipe()/CopyOf_i830ReleaseLoadDetectPipe() pair at the top of i830_crt_detect(), where CopyOf... functions are initially just verbatim copies of the originals. Then, step by step, delete matching stuff from the end of CopyOf_i830GetLoadDetectPipe() and the beginning of CopyOf_i830ReleaseLoadDetectPipe() to see what part of the register programming breaks text mode restoration. However, as I know nothing about the card and also am too lazy to read long PDF manuals, it would be nice if you send some debugging patches of this kind (just for me to be sure that the deleted lines really match).

Other debugging suggestions are also welcome.

P.S. I am currently at work, with ATI card, so you'll have to wait while I return home to test your suggestions.
Comment 36 Jesse Barnes 2008-04-01 22:46:28 UTC
Hong, we need a load detect pipe allocated even for the hotplug bit?  I thought it was just needed for DDC & load detect...

If so, at least we know where the problem is.  When I traced through the code
today it seemed to restore all the registers correctly and such, but we may be
violating the ordering somehow, between pipe & output enable/disable...
Comment 37 Jesse Barnes 2008-04-03 16:01:04 UTC
Looks like the load detect code will do DPMS on of the CRTC before the output if it finds that an output already has a crtc assigned.  The main mode setting code though always sets the CRTC mode before the output, so that's one difference.  But you shouldn't be hitting that path since the VGA output doesn't have a crtc assigned yet...

However, it kind of makes sense that you'd only see this on the first VT switch after startup, since the detect routines (where we do the load detect pipe allocation stuff) only happens when the driver starts or we receive an xrandr request (might be interesting to see if you can reproduce the problem after running xrandr).

What's weird is that even in the startup path, we run the detect code but then restore all the hardware state to what it was before we probed, so anything done by i830(Get|Release)LoadDetectPipe should have been undone before we saved hardware state again at EnterVT time...  And nothing in the logs indicates that there are important register differences between your good & bad cases.  Which seems to imply that either RestoreHWState gets sabotaged by what the load detect pipe code does (not only that but sabotaged twice) or something else in the LeaveVT code needs to get run once before RestoreHWState will do its job right...
Comment 38 Jesse Barnes 2008-04-04 14:56:48 UTC
Can you try the patch in #15168 comment #13?  Looks like we're missing some posted write compensation in the RestoreHWState code...
Comment 39 Alexander E. Patrakov 2008-04-04 22:53:23 UTC
The blank screen indeed appears one time if I switch to VT1 after "xrandr --auto". As for the patch in bug 15168 comment #13, it doesn't help.
Comment 40 Hong Liu 2008-04-07 02:21:19 UTC
(In reply to comment #36)
> Hong, we need a load detect pipe allocated even for the hotplug bit?  I thought
> it was just needed for DDC & load detect...
> 
Yes, I remember we need to do load detect pipe allocation in order to get a correct output status.

I retried our Q965 machine without load detect pipe allocation today, the bug doesn't occur. I am rethinking bug 13252, it seems after several X restart, the pipe is disabled (not sure why, and this causes the detecting problem in that bug). In theory, pipeA connected to VGA output should be enabled by vbios when entering X server.

Maybe we can commit your patch now, and wait to see if bug 13525 will rehappen, at that time we need to find why pipe is disabled.



> If so, at least we know where the problem is.  When I traced through the code
> today it seemed to restore all the registers correctly and such, but we may be
> violating the ordering somehow, between pipe & output enable/disable...
> 

Comment 41 Jesse Barnes 2008-05-07 15:49:51 UTC
Alexander, did you ever get a chance to try the debugging you suggested in comment 35?  It sounds reasonable to me; I'd really like to figure out exactly what's wrong with the GetLoadDetectPipe function...
Comment 42 Alexander E. Patrakov 2008-05-07 20:11:20 UTC
No, I will try tomorrow. But the problem is still that I can't identify matching parts of these functions, so, without your debugging patches, I will have to guess.
Comment 43 Alexander E. Patrakov 2008-05-08 21:32:09 UTC
If I comment out the call to xf86CrtcSetMode (crtc, mode, RR_Rotate_0, 0, 0) in CopyOf_i830GetLoadDetectPipe(), text mode restoration works. Not surprising, since this is the only call that touches the hardware in this function.

So: should I debug further into xf86CrtcSetMode(), or is this information enough to highlight some sort of reentrancy problem?

P.S. I will do nothing without your guidance.
Comment 44 Jesse Barnes 2008-05-08 23:33:31 UTC
Thanks Alexander, that does narrow it down a little...  I'll try to think of some more debugging you can do, maybe we can capture some register dumps in between calls or something.
Comment 45 Jesse Barnes 2008-05-21 13:51:32 UTC
Created attachment 16669 [details] [review]
Fixup ordering and wait for vblank

Alexander, I did find a couple of problems in our VT switch code, can you try this patch?
Comment 46 Alexander E. Patrakov 2008-05-21 19:59:41 UTC
The patch (as applied to 2.3.0) made things much worse: instead of the desktop, I now get a flashing white screen. Switching back to the text console doesn't show text, even from the second attempt.
Comment 47 Alexander E. Patrakov 2008-05-21 20:01:09 UTC
Created attachment 16673 [details]
Completely failed log after attachment 16669 [details] [review]
Comment 48 Alexander E. Patrakov 2008-05-21 20:05:47 UTC
BTW, instead of guessing blindly, coule you please attempt to obtain an exact copy of my hardware?

Motherboard: Intel BLKDG965SSCK Desktop Board
ADD2 adapter: HP ADD2 SH DVI Adapter DY-674A
Monitor: Samsung 770P
Comment 49 Jesse Barnes 2008-05-21 22:56:23 UTC
Uhh, no.  We can't get a version of every bit of hardware out there.
Comment 50 Alexander E. Patrakov 2008-06-05 21:38:20 UTC
On my new Debian Lenny amd64 system with gdm as a display manager, I noticed that the bug often manifests itself, but sometimes it doesn't. Looks like some timing issue.

P.S. The bug has the NEEDINFO tag without the description what info is needed.
Comment 51 Jesse Barnes 2008-06-06 13:20:04 UTC
Yeah, it could be timing related, but given that we have a few other VT switch bugs open it may also be a logic bug somewhere...  Still looking, thanks for your patience.
Comment 52 Alexander E. Patrakov 2008-06-17 21:05:06 UTC
Still happens on my 32-bit Xfce system & custom autologin script with 2.3.2 driver. Hard to reproduce on x86-64 with KDE & KDM.

The autologin script is:

#!/bin/sh
[ -f /etc/default/autologin ] || exit 0
. /etc/default/autologin
[ -n "$USER" ] || exit 0
USERHOME=`eval "echo ~$USER"`
rm -f $USERHOME/.Xauthority
/bin/su - $USER -c "xauth add :0 . `mcookie`" 2>/dev/null
HOME=$USERHOME exec xinit /bin/su - $USER -c "DISPLAY=:0 /bin/sh /etc/X11/xinit/xinitrc" -- /etc/X11/X vt7 -nolisten tcp -auth $USERHOME/.Xauthority >/dev/null 2>/dev/null
Comment 53 Alexander E. Patrakov 2008-06-17 22:21:56 UTC
Sorry for the previous unclear comment. Even on x86-64 with KDM, the issue is triggered reliably after running "xrandr" without parameters.
Comment 54 Alexander E. Patrakov 2008-08-17 04:59:51 UTC
The bug can no longer be triggered in 2.4.1 + "avoid duplicate mode set in lvds" + "avoid duplicate SaveHWState" + "reduce intel driver bootup time" (patches by Shaohua Li). The "reduce intel driver bootup time" patch contains a hunk that looks relevant, but I am not sure whether it really fixes or just hides the problem.
Comment 55 Jesse Barnes 2008-08-18 10:05:56 UTC
I think it duplicates the logic from the patch in comment #31.  If it works ok we can push it; Gordon can you see if you can reproduce the hotplug CRTC allocation bug?  If not I think we should go with something like Shaohua's patch.
Comment 56 Jesse Barnes 2008-08-20 14:41:30 UTC
Fix pushed as 7b6f4d22211d71480caf6335a3eacaacff369371


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.