Bug 14249

Summary: [G35] VGA not restored after S3 resume on Asus P5E-VM
Product: xorg
Reporter: Benjamin Pineau <ben.pineau>
Component: Driver/intel
Assignee: Jesse Barnes <jbarnes>
Status: RESOLVED NOTOURBUG
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: hong.liu, k00_fol, michael.fu
Version: git
Keywords: NEEDINFO
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Attachments:

Description	Flags
2.6.24 kernel's .config	none
dmesg	none
lsmod	none
lspci -vvvxxxx before suspend-to-ram	none
lspci -vvvxxxx after suspend-to-ram	none
Xorg.0.log	none
xorg.conf	none
intel_reg_dumper's output just before suspending	none
intel_reg_dumper's output just after suspending	none
intel_reg_dumper's output final, after resumed and vbetool post	none
dmesg with drm debug=1, and doing a x11, suspend, resume vbetool post cycle	none
Re-enable pipes on resume	none
intel_reg_dump before suspend, running patched drm	none
intel_reg_dump after resume, running patched drm	none
commented dmesg logs from drm (debug=1) during switch to vt and back	none
2.6.24 kernel .config (if I didn't mess, framebuffer should be disabled)	none
reg dump before suspend, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)	none
reg dump after resume, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)	none
Save/restore MGGC register	none
lspci -xxx, after recompiling drm with the two attached patches and suspend+resuming	none
Read & write VGA regs via MMIO instead of port I/O	none

Description Benjamin Pineau 2008-01-25 07:55:54 UTC

Created attachment 13941 [details]
2.6.24 kernel's .config

On my Asus P5E-VM desktop (Intel G35, ICH9R), switching to a virtual console
(ctrl-alt-f1) doesn't work if the machine has been suspended-to-ram before. In
that case, the monitor displays "no input signal" and stays black (until I
switch back to X11 via ctrl-alt-f7).

lspci -vvvxxxx output differs before and after the machine has been suspended.
Many of the graphics card's registers differ (see attached lspci outputs), while
they stay consistent across reboots (and conversely, the lspci output always
looks the same after an s2r).

Other symptoms I have on the machine when it has been s2r :
- If I stop X11 (ie. /etc/init.d/gdm stop), the console won't properly
  restore. X11 won't restart either (ie. /etc/init.d/gdm restart from ssh).
  Looks like the problem in bz #14218
  http://bugs.freedesktop.org/show_bug.cgi?id=14218
- The console is completely garbled when I reboot (and the reboot seems
  frozen at some point).

All of this works well if the machine hasn't been suspended before
(switching to console works, reboot works, stopping/restarting xorg works).
The only way I found to fix the problem is to reboot (reset) the machine.

After a non-working ctrl-alt-f1, these new lines appeared in Xorg.0.log :
(II) AIGLX: Suspending AIGLX clients for VT switch
(II) intel(0): xf86UnbindGARTMemory: unbind key 0
(II) intel(0): xf86UnbindGARTMemory: unbind key 1
(II) intel(0): xf86UnbindGARTMemory: unbind key 2
(II) intel(0): xf86UnbindGARTMemory: unbind key 3

Hardware :
  Asus P5E-VM HDMI motherboard (Intel G35 + ICH9R based)
  Belinea 101920 LCD screen (19'' 4/3, with DVI and VGA inputs)
Software versions :
  ubuntu hardy heron (testing version)
  kernel - 2.6.24, running 32bit
  xf86-video-intel - git head from today (January 25th 2008)
  xserver - 7.3 / 1.4.1 (Ubuntu provided git 20080118 snap)
  mesa - 7.0.2
  consolekit - 0.2.3
  libdrm2 - 2.3.0
  libxrandr - 1.2.2

Attached Xorg.0.log, lsmod, xorg.conf, kernel .config and :
- lspci -vvvxxxx before suspending
- lspci -vvvxxxx after suspending

Comment 1 Benjamin Pineau 2008-01-25 07:56:25 UTC

Created attachment 13942 [details]
dmesg

Comment 2 Benjamin Pineau 2008-01-25 07:56:41 UTC

Created attachment 13943 [details]
lsmod

Comment 3 Benjamin Pineau 2008-01-25 07:57:12 UTC

Created attachment 13944 [details]
lspci -vvvxxxx before suspend-to-ram

Comment 4 Benjamin Pineau 2008-01-25 07:57:32 UTC

Created attachment 13945 [details]
lspci -vvvxxxx after suspend-to-ram

Comment 5 Benjamin Pineau 2008-01-25 07:57:57 UTC

Created attachment 13946 [details]
Xorg.0.log

Comment 6 Benjamin Pineau 2008-01-25 07:58:19 UTC

Created attachment 13947 [details]
xorg.conf

Comment 7 Jesse Barnes 2008-01-27 06:35:23 UTC

Yeah, this is a known issue.  The fix is to use updated (i.e. from git) DRM bits that suspend/resume VGA state in addition to graphics state.  Can you give that a try and make sure it works for you?

Comment 8 Benjamin Pineau 2008-01-28 08:35:35 UTC

I updated drm and libdrm to git head, it didn't fix the issue.

I only updated drm kernel modules (drm.ko and i915.ko) and
libdrm (and already have a fairly recent xf86-video-intel git
snap). Should I also update the mesa lib or xserver ?

I went through my distro's suspend scripts to reproduce the problem
(and test several options) manually. Worth noting:

A pure and simple "echo mem > /sys/power/state" won't work
at all. I mean, when resumed, the screen remains black even
under X (with my distro scripts, only the vt remains black
but the X session and display are properly restored). To
get the display back under xorg, I must do a "vbetool post".
I guess this is not worth a bug report, since it is handled by the
quirks in vendors'/distros' suspend-resume scripts.

Doing "vbetool vgamode set 3" during resume (my distro's scripts
try to do this), either with or without latest git drm, outputs
"Function not supported".
I get the same error message when I do a "vbetool vbestate restore <
/var/lib/acpi-support/vbestate" during resume (this file,
/var/lib/acpi-support/vbestate, is generated before suspend
with a "vbetool vbestate save", without any error message).
I don't know if they fail completely, but those two commands are
not useful here (they don't improve the "console broken" situation,
and removing them does not prevent the xorg display from restoring
properly), even though they are executed by default by my distro's
scripts (they are part of pm-utils, which handles suspend-resume on
Fedora and Ubuntu).
Doing a "vbetool dpms on" at resume didn't help either.

I also tried several "sysctl -w kernel.acpi_video_flags=x"
(setting it to 0, 1, 2 and 3 before suspend) with no success.

Also tested, without success (same "broken console" problem) :
- a 2.6.24 kernel without framebuffer (with and without drm from git)
- a 2.6.22 kernel with default included drm
- XAA instead of EXA
- with a DVI attached monitor instead of VGA
- not loading (blacklisting) drm and i915 modules

So, the minimal suspend-resume script to reproduce the problem
here (and without breaking x11 after resume) is :

#!/bin/sh
VT=$(fgconsole)
chvt 1
echo -n mem > /sys/power/state
vbetool post </dev/tty0
chvt $VT

Comment 9 Jesse Barnes 2008-01-28 12:17:18 UTC

This may not be a suspend/resume problem per se then.  We've had reports of mode setting in general being flaky on some of these types of machines, maybe the timing is just right at suspend time to hit that bug most or all of the time.

The updated DRM bits (with suspend/resume hooks) are supposed to eliminate the need for vbetool stuff in your suspend/resume scripts.  If possible, it would be good if you could get intel_reg_dumper output (it's in src/reg_dumper in the xf86-video-intel tree) from before the suspend and then after the resume, possibly from a network console.  Since X is switched away from before suspend, you should capture state prior to the suspend but after doing a VT switch to a text terminal.  Then on resume, try to capture it again before you try to switch back to X.

Comment 10 Benjamin Pineau 2008-01-28 15:31:42 UTC

Created attachment 13993 [details]
intel_reg_dumper's output just before suspending

Comment 11 Benjamin Pineau 2008-01-28 15:32:14 UTC

Created attachment 13994 [details]
intel_reg_dumper's output just after suspending

Comment 12 Benjamin Pineau 2008-01-28 15:32:53 UTC

Created attachment 13995 [details]
intel_reg_dumper's output final, after resumed and vbetool post

Comment 13 Benjamin Pineau 2008-01-28 15:34:27 UTC

Created attachment 13996 [details]
dmesg with drm debug=1, and doing a x11, suspend, resume vbetool post cycle

Comment 14 Benjamin Pineau 2008-01-28 15:35:47 UTC

I did 3 intel_reg_dumper's dumps (while running git's drm.ko and
i915.ko): one just before the suspend, one just at resume, and the
last (wasn't asked for but...) after a vbetool post. This means:

#!/bin/sh
VT=$(fgconsole)
chvt 1
  intel_reg_dumper > ~/regdump_just_before
echo -n mem > /sys/power/state
  intel_reg_dumper > ~/regdump_just_after
vbetool post </dev/tty0
  intel_reg_dumper > ~/regdump_final_after_post
chvt $VT

Strangely the dump just after resume is identical to the final dump
after vbe post (although this post made enough of a difference to
restore the x11 display back).

While at it, also booted and loaded drm with debug=1. Then did a
classic gdm start, suspend, resume, vbetool post. Output in the
last attached dmesg file.

Your comment about a possible race condition reminded me that once,
when I tried resuming without the vbetool post quirk, I saw the
xorg session display come back for just a tiny fraction of a second
before the screen went black for good.

Comment 15 Michael Fu 2008-01-28 19:01:52 UTC

I'm ccing Hong. Not sure if it's related to a weird blanking screen bug "resolved" by touching/reading all regs again...

Comment 16 Jesse Barnes 2008-02-06 15:20:16 UTC

Created attachment 14182 [details] [review]
Re-enable pipes on resume

Can you give this patch a try?  It should apply to the git version of DRM and correctly re-enable your pipes (I noticed in the reg dumper output that they were disabled).

Comment 17 Benjamin Pineau 2008-02-07 02:28:38 UTC

(In reply to comment #16)
> Created an attachment (id=14182) [details]
> Re-enable pipes on resume
> 
> Can you give this patch a try?  It should apply to the git version of DRM and
> correctly re-enable your pipes (I noticed in the reg dumper output that they
> were disabled).

Well done, this patch is a net improvement!

It does not fix the console brokenness after resuming, but it removes the need for the "vbetool post" workaround (which was needed to get the X11 display back).
 
With this patch applied, I can suspend & resume with just a pure "echo -n mem > /sys/power/state" and no other quirk at all, that's impressive.

Comment 18 Jesse Barnes 2008-02-07 09:31:30 UTC

Hm, now I wonder if you're seeing 14236... can you get some pre- and post-resume register dumps now that you're running the patched DRM?  I'm curious what differences there are that might account for your corruption.  A screenshot or photo would also be nice.

Comment 19 Benjamin Pineau 2008-02-07 11:09:00 UTC

Created attachment 14199 [details]
intel_reg_dump before suspend, running patched drm

Comment 20 Benjamin Pineau 2008-02-07 11:11:03 UTC

Created attachment 14200 [details]
intel_reg_dump after resume, running patched drm

Comment 21 Benjamin Pineau 2008-02-07 11:12:20 UTC

Created attachment 14201 [details]
commented dmesg logs from drm (debug=1) during switch to vt and back

Comment 22 Benjamin Pineau 2008-02-07 11:17:07 UTC

Yes, I've seen #14236. The main reason why I opened a new bug: I
don't have any problem after hibernation (the reporter of bug #14236
says he has a similar problem after both s2r and s2d). Suspend-to-disk
(hibernation) works here, and doesn't break the VT.

For the screenshot: that would be just a boring black screen; I have
no display distortion/corruption at all. When I switch to console
after a suspend-resume cycle, the screen behaves exactly like it does
when I shut down the computer or pull out the cable: it blanks, writes
out "No input connection" for a few seconds, and remains blank.

I looked at the differences in dmesg logs (with drm debug=1) when
switching to console, and when switching back to X11, with both a sane
system (that hasn't been suspended before), and a broken-console system.
The logs are almost totally identical; the only visible difference shows
up when I switch back to xorg : this, on a sane/working system:
 [drm:drm_unlocked_ioctl] pid=5482, cmd=0x4018641b, nr=0x1b, dev 0xe200, auth=1
 [drm:drm_unlocked_ioctl] ret = -22
becomes this, on a previously suspended system:
 [drm:drm_unlocked_ioctl] pid=5482, cmd=0x4004644d, nr=0x4d, dev 0xe200, auth=1
 [drm:drm_unlocked_ioctl] pid=5482, cmd=0x40446440, nr=0x40, dev 0xe200, auth=1
Attached the compiled and commented relevant parts of dmesg. It probably
does not matter, but just in case...
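
To make the ioctl numbers in the log lines above easier to compare, here is a minimal sketch that splits each `cmd` value into its fields. It assumes the generic Linux ioctl encoding used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction) and the DRM convention that numbers from 0x40 up to (but not including) 0xa0 are driver-specific; the helper name `decode_ioctl` is hypothetical and is not part of the DRM tree.

```python
# Assumption: asm-generic ioctl layout (x86): nr[0:8] type[8:16] size[16:30] dir[30:32].
DRM_COMMAND_BASE = 0x40  # first driver-specific DRM ioctl number
DRM_COMMAND_END = 0xA0   # first number past the driver-specific range

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split an ioctl command number into nr, type, size and direction."""
    return {
        "nr": cmd & 0xFF,
        "type": chr((cmd >> 8) & 0xFF),
        "size": (cmd >> 16) & 0x3FFF,
        "dir": DIRECTIONS[(cmd >> 30) & 0x3],
    }


# The three cmd values quoted from dmesg in this comment.
for cmd in (0x4018641B, 0x4004644D, 0x40446440):
    fields = decode_ioctl(cmd)
    in_driver_range = DRM_COMMAND_BASE <= fields["nr"] < DRM_COMMAND_END
    scope = "driver-specific" if in_driver_range else "core DRM"
    print(f"{cmd:#010x}: nr={fields['nr']:#04x} type={fields['type']!r} "
          f"size={fields['size']} dir={fields['dir']} ({scope})")
```

Decoded this way, the call seen on the working system has a number below 0x40, while both calls seen on the previously suspended system fall in the driver-specific range. Which concrete ioctls these are is not stated in this report and would need to be checked against the DRM headers in use.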

Comment 23 Jesse Barnes 2008-02-07 11:26:40 UTC

The register dumps look strange, it's as if the VGA registers are being completely clobbered... are you sure you're running DRM modules from git as of today (I just checked in a couple of fixes)?  Or maybe your suspend/resume scripts are doing a 'vbetool post' or similar?  Doesn't look like you have any fb drivers builtin or loaded...

Stuff like this:

-(II):                 CR00: 0x5f
-(II):                 CR01: 0x4f
-(II):                 CR02: 0x50
-(II):                 CR03: 0x82
-(II):                 CR04: 0x55
-(II):                 CR05: 0x81
-(II):                 CR06: 0xbf
-(II):                 CR07: 0x1f
+(II):                 CR00: 0x00
+(II):                 CR01: 0x00
+(II):                 CR02: 0x00
+(II):                 CR03: 0x80
+(II):                 CR04: 0x00
+(II):                 CR05: 0x00
+(II):                 CR06: 0x00
+(II):                 CR07: 0x00

definitely shouldn't happen in the latest code, since it explicitly saves and restores these registers.  But differences like these would definitely explain your VT corruption on resume.
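
A comparison like the one above can be produced mechanically. The following is a minimal sketch that assumes every register line in the intel_reg_dumper output looks like the `(II):   NAME: 0xVALUE` lines quoted in this comment; the function names are hypothetical, and the sample data is just the first four CRT registers from the excerpt.

```python
import re

# Assumed line shape, taken from the excerpt: "(II):                 CR00: 0x5f".
# Anything after the hex value (e.g. decoded flags) is ignored.
LINE_RE = re.compile(r"^\(II\):\s*(?P<name>\S+):\s*(?P<value>0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Return {register name: integer value} for every line that matches."""
    regs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            regs[match.group("name")] = int(match.group("value"), 16)
    return regs


def changed_registers(before_text, after_text):
    """Return {name: (before, after)} for registers present in both dumps that differ."""
    before, after = parse_dump(before_text), parse_dump(after_text)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


BEFORE = """\
(II):                 CR00: 0x5f
(II):                 CR01: 0x4f
(II):                 CR02: 0x50
(II):                 CR03: 0x82
"""
AFTER = """\
(II):                 CR00: 0x00
(II):                 CR01: 0x00
(II):                 CR02: 0x00
(II):                 CR03: 0x80
"""

for name, (old, new) in changed_registers(BEFORE, AFTER).items():
    print(f"{name}: {old:#04x} -> {new:#04x}")
```

For real dumps, the two strings would be read from the files saved before suspend and after resume.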

Comment 24 Benjamin Pineau 2008-02-07 11:41:32 UTC

The attached dumps were done with the drm git tip from a few hours ago
(before your two last commits, I'm at 76748efae2f51409813eeb6b91b783c73cb2845e)
plus the attached patch. I didn't use vbetool (thanks to your patch).

Also I'm not 100% sure I've disabled all the necessary things to remove
framebuffer totally from kernel. So I attach my .config for verification 
(that's for a vanilla 2.6.24).

I'll update my drm to latest git and repost the register dumps in a few minutes.

Comment 25 Benjamin Pineau 2008-02-07 11:42:35 UTC

Created attachment 14203 [details]
2.6.24 kernel .config (if I didn't mess, framebuffer should be disabled)

Comment 26 Benjamin Pineau 2008-02-07 11:59:44 UTC

Created attachment 14204 [details]
reg dump before suspend, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)

Comment 27 Benjamin Pineau 2008-02-07 12:00:36 UTC

Created attachment 14205 [details]
reg dump after resume, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)

Comment 28 Benjamin Pineau 2008-02-07 12:08:01 UTC

Just for the record, the two latest register dumps (attachments 14204 and 14205)
were generated with a 2.6.24 vanilla kernel (compiled with the .config in attachment 14203 [details]), and using git tip drm as of now
(6f19473191ae543fcc199d252c5865c0734d38ad) plus the patch attached to this bug.
I did exactly this to dump the registers (from an xterm) :

#!/bin/sh
VT=$(fgconsole)
chvt 1 
intel_reg_dumper > /tmp/reg_dump_patched_drm_before_susp
echo -n mem > /sys/power/state 
intel_reg_dumper > /tmp/reg_dump_patched_drm_after_susp
chvt $VT

Comment 29 Jesse Barnes 2008-02-07 15:20:25 UTC

Created attachment 14207 [details] [review]
Save/restore MGGC register

Given that the VGA registers don't seem to be restored, I wonder if VGA routing on your bridge is disabled for some reason...

Can you try out this patch?

Comment 30 Benjamin Pineau 2008-02-07 15:45:08 UTC

Created attachment 14208 [details]
lspci -xxx, after recompiling drm with the two attached patches and suspend+resuming

lspci -xxx with the two patches applied on top of today's drm git tip, and after suspending and resuming.

attachment 14207 [details] [review] didn't fix the problem.
The intel_reg_dumper outputs (before and after suspend) are identical to the previously attached versions.

Comment 31 Jesse Barnes 2008-02-07 16:03:57 UTC

Hm, no, looks like we can't really save/restore that register w/o resetting the chip altogether, since its RO status is controlled by the SMRAM reg.  But the fact that post-suspend its value is 0x0030 and post-resume it's 0x0002 makes it seem like the BIOS did something bad...
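
To show why those two values matter, here is a minimal sketch that decodes them. The bit layout is an assumption based on the Linux GMCH control definitions (bit 1 as the IGD VGA disable flag, bits 6:4 as the pre-allocated graphics memory select field) and should be checked against the G35 datasheet; `decode_mggc` and its field names are hypothetical.

```python
# Assumed bit layout of the GMCH graphics control (MGGC) register.
IVD_BIT = 1 << 1       # assumed: IGD VGA Disable (1 = legacy VGA decoding off)
GMS_SHIFT = 4
GMS_MASK = 0x7 << GMS_SHIFT  # assumed: pre-allocated graphics memory select


def decode_mggc(value):
    """Decode the two fields of interest from a 16-bit MGGC value."""
    return {
        "vga_disabled": bool(value & IVD_BIT),
        "gms": (value & GMS_MASK) >> GMS_SHIFT,
    }


# The two values quoted in the comment above.
for value in (0x0030, 0x0002):
    print(f"{value:#06x}: {decode_mggc(value)}")
```

Under these assumptions, 0x0030 has VGA decoding enabled with a non-zero memory selection, while 0x0002 has the VGA disable bit set and the memory selection cleared, which would be consistent with VGA access being turned off.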

Comment 32 Jesse Barnes 2008-02-07 17:03:18 UTC

Created attachment 14209 [details] [review]
Read & write VGA regs via MMIO instead of port I/O

Ok, here's a crazy and totally untested patch.  It may cause resume to just hard hang, but it may also give you a console back (I doubt it'll still work though).

Comment 33 Benjamin Pineau 2008-02-11 01:06:56 UTC

I already reported the results to Jesse Barnes directly, but for
the record, and in case we or someone else takes a look
at this bug later: the above patch (attachment 14209 [details] [review]) prevents
the system from suspending (when trying to suspend, I'm left
with a perpetually blinking cursor on a black terminal).
So, being unable to suspend, I can't tell if it helps to resume.

Something else, if the motherboard BIOS may be the culprit: I
use the latest Asus provided BIOS as of now (version 0405).

There's also a BIOS "Repost video on S3 resume" option, but that
seems totally ineffective for this problem (and was ineffective
at removing the need for "vbetool post" on resume before that other
bug was fixed by Jesse Barnes with the now committed attachment 14182 [details] [review]).

Comment 34 Jesse Barnes 2008-02-13 09:40:02 UTC

Updating summary.  Would be interesting to find out if other G35 users have the same problem.  If they do, it might be a bug in the Intel provided BIOS bits for G35 based systems, rather than an Asus specific problem.

Comment 35 Jesse Barnes 2008-02-13 09:41:27 UTC

Also, I reported this issue to Asus, since it really looks like the MGGC GMCH register is set to the wrong value on resume, disabling VGA access entirely.

Comment 36 Joel 2008-03-02 14:42:40 UTC

For what it's worth, this seems to be an issue for another platform as well
http://vip.asus.com/forum/view.aspx?id=20080205223624640&board_id=1&model=P5E-VM+HDMI&page=1&SLanguage=en-us

One reports that it's working with an older processor, but not with new. Don't know much, but I would like to attribute that to a hardware or bios bug.

Comment 37 Jesse Barnes 2008-03-02 15:51:26 UTC

Ah, interesting, thanks for the link.  Sounds like it may be a BIOS issue (possibly related to a certain CPU in the board).  Hope Asus finds & fixes it soon...

Comment 38 Benjamin Pineau 2008-03-02 23:43:33 UTC

According to the above link, this similar bug occurs on Vista with E8400 Wolfdale CPUs. I have a Kentsfield, Intel Core 2 Quad Q6600 revision G0.

Comment 39 Joel 2008-03-06 14:34:32 UTC

I can also confirm this bug on:
Celeron 420 (stepping 1), Asus P5E-VM HDMI (Bios rev 0301)
Using intel driver version 2.2.1-, on 
Kernel 2.6.24-ARCH #1 SMP PREEMPT, x86_64

(II) Module intel: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 2.2.1
        Module class: X.Org Video Driver
        ABI class: X.Org Video Driver, version 2.0

Also, after triggering the bug, something freaky happened to me after playing with screen rotation.
Instead of rotating the screen back, my computer seemed to black out. No video output in X, no video in console. I could tell it switched, because my num-lock status changed.

I tried to trigger a reboot, but nothing seemed to happen. (My computer is virtually silent, and it's almost impossible to hear it work.) After around a minute of waiting, I still didn't have VGA output, so I powered it down, and did a cold boot.

STILL, VGA output was not restored. (I'm not sure it even reached POST.) I tried pressing F1 (that's what I usually do, since the BIOS halts and complains that my system does not have a master IDE drive). All this to no avail.

I was unable to get any life from it until I removed the main power cable and replugged it. After that the machine booted up, but all CMOS data was gone. The clock, but not the date, was also reset. I'm not too keen on trying to reproduce this, but I will report it to Asus, after I get some sleep.

Comment 40 Joel 2008-03-06 14:42:09 UTC

p.s.

An afterthought:
The BIOS being cleared could be the result of not being able to POST a few times and loading fail-safe defaults. Me pressing F1 blindly might have approved it. Or maybe it was restored through Asus's safety net for BIOS corruption. I have no idea.

-Joel F

Comment 41 Michael Fu 2008-03-14 06:07:07 UTC

well, can I mark this as NOTOURBUG now?

Comment 42 Jesse Barnes 2008-03-14 09:53:29 UTC

Yeah, probably.  I've been talking with Asus about it and they seem to have an idea of what's going on, but I don't know if they've released an update to fix the problem yet.  Ben, have you checked their website recently?  Do you still see this problem with the latest BIOS bits?

Comment 43 Benjamin Pineau 2008-03-15 03:12:47 UTC

Asus hasn't released any new BIOS version since January; I'm already using the latest one (0405).

And yes, it seems clear now that this is a plain NOTOURBUG (esp. since we know Windows has similar problems with this motherboard).

Thanks for your help in hunting this bug (and also, thank you for fixing another bug in the process, i.e. the drm patch removing the need for "vbetool post").

Comment 44 Joel 2008-03-20 05:44:36 UTC

New bios 0503 is supposed to fix this, I'll attempt the upgrade now and test it for linux.

http://vip.asus.com/forum/view.aspx?SLanguage=en-us&id=20080205223624640&board_id=1&model=P5E-VM%20HDMI&page=3&count=21

Wish me luck! =)

Comment 45 Joel 2008-03-20 06:26:41 UTC

It did not seem to help me at all. I lost vga outside X, and did not get it back until after a cold boot. :(
