Description
Benjamin Pineau
2008-01-25 07:55:54 UTC
Created attachment 13942 [details]
dmesg
Created attachment 13943 [details]
lsmod
Created attachment 13944 [details]
lspci -vvvxxxx before suspend-to-ram
Created attachment 13945 [details]
lspci -vvvxxxx after suspend-to-ram
Created attachment 13946 [details]
Xorg.0.log
Created attachment 13947 [details]
xorg.conf
Yeah, this is a known issue. The fix is to use updated (i.e. from git) DRM bits that suspend/resume VGA state in addition to graphics state. Can you give that a try and make sure it works for you?

I updated drm and libdrm to git head; it didn't fix the issue. I only updated the drm kernel modules (drm.ko and i915.ko) and libdrm (and I already have a fairly recent xf86-video-intel git snapshot). Should I also update the mesa lib or xserver?

I went through my distro's suspend scripts to reproduce (and test several options) manually.

Worth noting: a pure and simple "echo mem > /sys/power/state" doesn't work at all. I mean, when resumed, the screen remains black even under X (with my distro scripts, only the VT remains black, but the X session and display are properly restored). To get the display back under Xorg, I must do a "vbetool post". I guess this is not worth a bug report, since it is handled by the quirks in vendors'/distros' suspend-resume scripts.

Doing "vbetool vgamode set 3" during resume (my distro's scripts try to do this), either with or without the latest git drm, outputs "Function not supported". I get the same error message when I do a "vbetool vbestate restore < /var/lib/acpi-support/vbestate" during resume (this file, /var/lib/acpi-support/vbestate, is generated before suspend with a "vbetool vbestate save", without any error message). I don't know if they completely fail, but those two commands are not useful here (they don't improve the "console broken" situation, and removing them does not prevent the Xorg display from restoring properly), even though they are executed by default by my distro scripts (that's part of pm-utils, which handles suspend-resume on Fedora and Ubuntu).

Doing a "vbetool dpms on" at resume didn't help either. I also tried several "sysctl -w kernel.acpi_video_flags=x" (setting it to 0, 1, 2 and 3 before suspend) with no success.
Also tested, without success (same "broken console" problem):
- a 2.6.24 kernel without framebuffer (with and without drm from git)
- a 2.6.22 kernel with default included drm
- XAA instead of EXA
- with a DVI attached monitor instead of VGA
- not loading (blacklisting) drm and i915 modules

So, the minimal suspend-resume script to reproduce the problem here (and without breaking x11 after resume) is:
#!/bin/sh
VT=$(fgconsole)
chvt 1
echo -n mem > /sys/power/state
vbetool post </dev/tty0
chvt $VT

This may not be a suspend/resume problem per se then. We've had reports of mode setting in general being flaky on some of these types of machines; maybe the timing is just right at suspend time to hit that bug most or all of the time. The updated DRM bits (with suspend/resume hooks) are supposed to eliminate the need for vbetool stuff in your suspend/resume scripts.

If possible, it would be good if you could get intel_reg_dumper output (it's in src/reg_dumper in the xf86-video-intel tree) from before the suspend and then after the resume, possibly from a network console. Since X is switched away from before suspend, you should capture state prior to the suspend but after doing a VT switch to a text terminal. Then on resume, try to capture it again before you try to switch back to X.

Created attachment 13993 [details]
intel_reg_dumper's output just before suspending
Created attachment 13994 [details]
intel_reg_dumper's output just after suspending
Created attachment 13995 [details]
intel_reg_dumper's output final, after resumed and vbetool post
Created attachment 13996 [details]
dmesg with drm debug=1, and doing a x11, suspend, resume vbetool post cycle
I did 3 intel_reg_dumper dumps (while running git's drm.ko and i915.ko): one just before the suspend, one just at resume, and the last (wasn't asked for but...) after a vbetool post. This means:
#!/bin/sh
VT=$(fgconsole)
chvt 1
intel_reg_dumper > ~/regdump_just_before
echo -n mem > /sys/power/state
intel_reg_dumper > ~/regdump_just_after
vbetool post </dev/tty0
intel_reg_dumper > ~/regdump_final_after_post
chvt $VT

Strangely, the dump just after resume is identical to the final dump after vbe post (although this post made enough of a difference to restore the x11 display).

While at it, I also booted and loaded drm with debug=1, then did a classic gdm start, suspend, resume, vbetool post. The output is in the last attached dmesg file.

Your comment about a possible race condition reminded me that once, when I tried resuming without the vbetool post quirk, I saw the xorg session display resume just a tiny fraction of a second before the screen went definitively black.

I'm ccing Hong. Not sure if it's related to a weird blanking screen bug "resolved" by touching/reading all regs again...

Created attachment 14182 [details] [review]
Re-enable pipes on resume
Can you give this patch a try? It should apply to the git version of DRM and correctly re-enable your pipes (I noticed in the reg dumper output that they were disabled).

(In reply to comment #16)
> Created an attachment (id=14182) [details]
> Re-enable pipes on resume
>
> Can you give this patch a try? It should apply to the git version of DRM and
> correctly re-enable your pipes (I noticed in the reg dumper output that they
> were disabled).
Well done, this patch is a net improvement! It does not fix the console brokenness after resuming, but it removes the need for the "vbetool post" workaround (which was needed to get the X11 display back). With this patch applied, I can suspend & resume with just a pure "echo -n mem > /sys/power/state" and no other quirk at all; that's impressive.
Hm, now I wonder if you're seeing 14236... can you get some pre- and post-resume register dumps now that you're running the patched DRM? I'm curious what differences there are that might account for your corruption. A screenshot or photo would also be nice.

Created attachment 14199 [details]
intel_reg_dump before suspend, running patched drm
Created attachment 14200 [details]
intel_reg_dump after resume, running patched drm
Created attachment 14201 [details]
commented dmesg logs from drm (debug=1) during switch to vt and back
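A note for anyone reading the drm debug=1 logs: each drm_unlocked_ioctl line carries both a cmd= value and an nr= value, and the second is just the low byte of the first. The sketch below splits a cmd value into its fields. It assumes the generic Linux _IOC bit layout used on x86 (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31); decode_ioctl is a hypothetical helper name, not something shipped with drm.

```shell
#!/bin/sh
# Hypothetical helper: split a Linux ioctl command number into its fields.
# Assumes the generic _IOC layout (as on x86):
#   bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
decode_ioctl() {
    cmd=$(( $1 ))
    printf 'dir=%d size=%d type=0x%02x nr=0x%02x\n' \
        $(( (cmd >> 30) & 0x3 )) \
        $(( (cmd >> 16) & 0x3fff )) \
        $(( (cmd >> 8) & 0xff )) \
        $(( cmd & 0xff ))
}

# Usage: decode_ioctl 0x4018641b
```

With cmd=0x4018641b this prints "dir=1 size=24 type=0x64 nr=0x1b": type 0x64 is the ASCII letter 'd' used for DRM ioctls, and dir=1 means the argument is copied from userspace to the kernel in this layout.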
Yes, I've seen #14236. The main reason why I opened a new bug: I don't have any problem after hibernation (the reporter of bug #14236 says he has a similar problem after both s2r and s2d). Suspend-to-disk (hibernation) works here, and doesn't break the VT.

For the screenshot: that would be just a boring black screen; I have no display distortion/corruption at all. When I switch to the console after a suspend-resume cycle, the screen behaves exactly like it does when I shut down the computer or pull out the cable: it blanks, writes out "No input connection" for a few seconds, and remains blank.

I looked at the differences in the dmesg logs (with drm debug=1) when switching to the console and when switching back to X11, on both a sane system (that hasn't been suspended before) and a broken-console system. The logs are almost totally identical; the only visible difference shows up when I switch back to xorg. This, on a sane/working system:
[drm:drm_unlocked_ioctl] pid=5482, cmd=0x4018641b, nr=0x1b, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] ret = -22
becomes this, on a previously suspended system:
[drm:drm_unlocked_ioctl] pid=5482, cmd=0x4004644d, nr=0x4d, dev 0xe200, auth=1
[drm:drm_unlocked_ioctl] pid=5482, cmd=0x40446440, nr=0x40, dev 0xe200, auth=1
I attached the compiled and commented relevant parts of dmesg. It probably does not matter, but just in case...

The register dumps look strange; it's as if the VGA registers are being completely clobbered... are you sure you're running DRM modules from git as of today (I just checked in a couple of fixes)? Or maybe your suspend/resume scripts are doing a 'vbetool post' or similar? Doesn't look like you have any fb drivers builtin or loaded...
Stuff like this:
-(II): CR00: 0x5f
-(II): CR01: 0x4f
-(II): CR02: 0x50
-(II): CR03: 0x82
-(II): CR04: 0x55
-(II): CR05: 0x81
-(II): CR06: 0xbf
-(II): CR07: 0x1f
+(II): CR00: 0x00
+(II): CR01: 0x00
+(II): CR02: 0x00
+(II): CR03: 0x80
+(II): CR04: 0x00
+(II): CR05: 0x00
+(II): CR06: 0x00
+(II): CR07: 0x00
definitely shouldn't happen in the latest code, since it explicitly saves and restores these registers. But differences like these would definitely explain your VT corruption on resume.

The attached dumps were done with the drm git tip from a few hours ago (before your two last commits, I'm at 76748efae2f51409813eeb6b91b783c73cb2845e) + the attached patch. I didn't use vbetool (thanks to your patch). Also, I'm not 100% sure I've disabled everything necessary to remove the framebuffer totally from the kernel, so I attach my .config for verification (that's for a vanilla 2.6.24). I'll update my drm to latest git and repost the register dumps in a few minutes.

Created attachment 14203 [details]
2.6.24 kernel .config (if I didn't mess, framebuffer should be disabled)
Created attachment 14204 [details]
reg dump before suspend, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)
Created attachment 14205 [details]
reg dump after resume, drm git from now + patch (git 6f19473191ae543fcc199d252c5865c0734d38ad)
Just for the record, the two latest register dumps (attachments 14204 and 14205)
were generated with a 2.6.24 vanilla kernel (compiled with the .config in attachment 14203 [details]), using git tip drm as of now
(6f19473191ae543fcc199d252c5865c0734d38ad) plus the patch attached to this bug.
I did exactly this to dump the registers (from an xterm) :
#!/bin/sh
VT=$(fgconsole)
chvt 1
intel_reg_dumper > /tmp/reg_dump_patched_drm_before_susp
echo -n mem > /sys/power/state
intel_reg_dumper > /tmp/reg_dump_patched_drm_after_susp
chvt $VT
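A small sketch for comparing the two dumps produced by the script above, in case it helps others who want to check their own machine. It assumes the dump lines start with "(II)" as in the excerpts quoted in this bug; reg_diff is a hypothetical helper name, and the file names are the ones from the script.

```shell
#!/bin/sh
# Hypothetical helper: print only the register lines that differ between
# two intel_reg_dumper outputs. Lines from the first file are prefixed
# with "-", lines from the second with "+".
reg_diff() {
    # diff -U0 emits changed lines without context; grep keeps the
    # register lines and drops the "---", "+++" and "@@" headers.
    diff -U0 "$1" "$2" | grep -E '^[-+]\(II\)'
}

# Usage:
#   reg_diff /tmp/reg_dump_patched_drm_before_susp \
#            /tmp/reg_dump_patched_drm_after_susp
```

If the two dumps are identical, nothing is printed and the function returns a non-zero status (that is just how grep reports "no match").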
Created attachment 14207 [details] [review]
Save/restore MGGC register
Given that the VGA registers don't seem to be restored, I wonder if VGA routing on your bridge is disabled for some reason... Can you try out this patch?

Created attachment 14208 [details]
lspci -xxx, after recompiling drm with the two attached patches and suspend+resuming
lspci -xxx with the two patches applied on top of today's drm git tip, and after suspending and resuming.

attachment 14207 [details] [review] didn't fix the problem. The intel_register_dump outputs (before and after suspend) are identical to the previously attached versions.

Hm, no, looks like we can't really save/restore that register w/o resetting the chip altogether, since its RO status is controlled by the SMRAM reg. But the fact that post-suspend its value is 0x0030 and post-resume it's 0x0002 makes it seem like the BIOS did something bad...

Created attachment 14209 [details] [review]
Read & write VGA regs via MMIO instead of port I/O
Ok, here's a crazy and totally untested patch. It may cause resume to just hard hang, but it may also give you a console back (I doubt it'll still work though).

I already told the results to James Barnes directly, but for the record, and in case we or someone else take a look at this bug later: the above patch (attachment 14209 [details] [review]) prevents the system from suspending (when trying to suspend, I'm left with a perpetual blinking cursor on a black terminal). So, being unable to suspend, I can't tell if it helps to resume.

Something else, in case the motherboard BIOS is the culprit: I use the latest Asus provided BIOS as of now (version 0405). There's also a BIOS "Repost video on S3 resume" option, but that seems totally ineffective for this problem (and was ineffective at removing the need for "vbetool post on resume" before that other bug was fixed by James Barnes with - now committed - attachment 14182 [details] [review]).

Updating summary.
Would be interesting to find out if other G35 users have the same problem. If they do, it might be a bug in the Intel provided BIOS bits for G35 based systems, rather than an Asus specific problem. Also, I reported this issue to Asus, since it really looks like the MGGC GMCH register is set to the wrong value on resume, disabling VGA access entirely.

For what it's worth, this seems to be an issue on another platform as well:
http://vip.asus.com/forum/view.aspx?id=20080205223624640&board_id=1&model=P5E-VM+HDMI&page=1&SLanguage=en-us
One person reports that it works with an older processor, but not with a new one. I don't know much, but I would attribute that to a hardware or BIOS bug.

Ah, interesting, thanks for the link. Sounds like it may be a BIOS issue (possibly related to a certain CPU in the board). Hope Asus finds & fixes it soon...

According to the above link, this similar bug occurs on Vista with E8400 Wolfdale CPUs. I have a Kentsfield, Intel Core 2 Quad Q6600 revision G0.

I can also confirm this bug on:
Celeron 420 (stepping 1), Asus P5E-VM HDMI (Bios rev 0301)
Using intel driver version 2.2.1-, on Kernel 2.6.24-ARCH #1 SMP PREEMPT, x86_64
(II) Module intel: vendor="X.Org Foundation"
compiled for 1.4.0.90, module version = 2.2.1
Module class: X.Org Video Driver
ABI class: X.Org Video Driver, version 2.0

Also, after triggering the bug, something freaky happened to me after playing with screen rotation. Instead of rotating the screen back, my computer seemed to black out. No video output in X, no video in console. I could tell it switched, because my num-lock status changed. I tried to trigger a reboot, but nothing seemed to happen. (My computer is virtually silent, and it's almost impossible to hear it work.) After around a minute of waiting, I still didn't have VGA output, so I powered it down and did a cold boot. STILL, VGA output was not restored.
(I'm not sure it even reached POST.) I tried pressing F1 (since that's what I usually do, as the BIOS halts and complains that my system does not have a master IDE drive), all to no avail. I was unable to get any life from it until I removed the main power cable and replugged it. After that the machine booted up, but all CMOS data was gone. The clock, but not the date, was also reset. I'm not too keen on trying to reproduce this, but I will report it to Asus, after I get some sleep.

p.s. An afterthought: the BIOS being cleared could be the result of failing to POST a few times and loading fail-safe defaults. Me pressing F1 blindly might have approved it. Or maybe it was restored through the Asus safety net for BIOS corruption. I have no idea.
-Joel F

Well, can I mark this as NOTOURBUG now?

Yeah, probably. I've been talking with Asus about it and they seem to have an idea of what's going on, but I don't know if they've released an update to fix the problem yet. Ben, have you checked their website recently? Do you still see this problem with the latest BIOS bits?

Asus hasn't released any new BIOS version since January; I'm already using the latest one (0405). And yes, it seems clear now that this is a plain NOTOURBUG (esp. since we know Windows has similar problems with this motherboard). Thanks for your help in hunting this bug (and also, thank you for fixing another bug in the process, i.e. the drm patch removing the need for "vbetool post").

New BIOS 0503 is supposed to fix this; I'll attempt the upgrade now and test it on Linux.
http://vip.asus.com/forum/view.aspx?SLanguage=en-us&id=20080205223624640&board_id=1&model=P5E-VM%20HDMI&page=3&count=21
Wish me luck! =)

It did not seem to help me at all. I lost VGA outside X, and did not get it back until after a cold boot. :(