13609 – [855GM] EXA fails after resuming, not enough 3D state restored?

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/intel
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Wang Zhenyu
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	13493 15000

Reported:	2007-12-11 15:55 UTC by Andre
Modified:	2008-12-15 09:36 UTC
CC List:	7 users

See Also:
i915 platform:
i915 features:

Attachments
Xorg.0.log (58.09 KB, text/plain) 2007-12-11 15:58 UTC, Andre
Regdump from the Console with X running, before S3 (7.94 KB, text/plain) 2008-02-24 08:27 UTC, Andre
Regdump from the Console with X running, after S3 (7.94 KB, text/plain) 2008-02-24 08:39 UTC, Andre
Don't emit 3D state at EnterVT time (388 bytes, patch) 2008-03-18 17:51 UTC, Jesse Barnes
This is how it is supposed to look (53.25 KB, image/jpeg) 2008-03-18 20:29 UTC, Andre
This is how it looks after S3 (116.38 KB, image/jpeg) 2008-03-18 20:33 UTC, Andre
The failed XAA display (610.53 KB, image/jpeg) 2008-03-19 11:56 UTC, Andre
Broken XAA after first S3 (168.29 KB, image/jpeg) 2008-03-19 12:06 UTC, Andre
This regdump is universal. It does no more change through any suspends or even reboots. (7.94 KB, text/plain) 2008-05-06 16:05 UTC, Andre
dmesg after a successful resume from S3 (14.73 KB, application/octet-stream) 2008-05-06 16:07 UTC, Andre
dmesg after resume from s2disk in the successful series (14.72 KB, application/octet-stream) 2008-05-06 16:09 UTC, Andre
dmesg after an unsuccessful resume from S3 (14.75 KB, application/octet-stream) 2008-05-06 16:10 UTC, Andre
Xorg log from the successful run, debugging enabled (100.68 KB, application/octet-stream) 2008-05-06 16:11 UTC, Andre
Xorg log from a failed run, debugging enabled (59.31 KB, application/octet-stream) 2008-05-06 16:12 UTC, Andre
Xorg.0.log after s2ram on xorg-server master and intel-dri2 branch with lots of debug output (126.88 KB, text/plain) 2008-11-17 04:44 UTC, Andre

Description Andre 2007-12-11 15:55:36 UTC

This Bug renews bug 13367.

Since commit b8770f710729d616b3ac72544aa522161a78f819
from November 11, 2007, a resume from S3 always leaves
me with a corrupted X screen.
It can be "repaired" by suspending to disk.

The commit only adds
   IntelEmitInvarientState(pScrn);
to i830_driver.c

Before this commit, S3 failed (with
these same symptoms) only when the machine had
not been suspended to disk beforehand, and
never after an s2disk.

Some examples:

The behaviour before this commit:

boot
-> resume from S3 (gives corrupted screen)
-> resume from S3 (same corruption, can be used repeatedly)
-> resume from s2disk (restores screen)
-> resume from S3 (works fine now)
-> resume from S3 (keeps working)

or:
boot
-> resume from s2disk (fine)
-> resume from S3 (fine)
...


And after the commit:

boot
-> resume from S3 (gives corrupted screen)
-> resume from S3 (same corruption, can be used repeatedly)
-> resume from s2disk (restores screen)
-> resume from S3 (gives corrupted screen)
...

S3 always fails. You get the picture :-)
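
The two sequences above can be condensed into a tiny model. This is only a restatement of the observations in this description, not of anything the driver does; the function name and its arguments are made up for illustration.

```python
def s3_resume_ok(earlier_resumes, after_commit):
    """Predict whether an S3 resume gives a clean screen, per the report above.

    earlier_resumes: resumes since boot, in order, each "S3" or "s2disk"
                     (hypothetical encoding of the sequences listed above).
    after_commit:    True for drivers that contain commit b8770f71...,
                     False for drivers from before it.
    """
    if after_commit:
        # Observed after the commit: S3 always leaves a corrupted screen.
        return False
    # Observed before the commit: S3 is fine once an s2disk has happened.
    return "s2disk" in earlier_resumes
```

For example, `s3_resume_ok(["S3", "S3", "s2disk"], after_commit=False)` is true, while the same history with `after_commit=True` is false.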

=====

So, at that point things go from bad to
worse rather instructively, I hope :-)

On driver version 1.7.4, S3 works flawlessly,
on 2.0.0 it gives the above pattern
of working only after resuming from s2disk.

I cannot easily git-bisect that one, because
compilation of the driver fails when approaching
2.0.0 (from present) already. I would need
some instruction (or at least prodding) how
to proceed. I'm not even sure if that one's too
antique to be of any help anyway.

====

The behaviour does NOT change:
- when suspending from X or from a console.
  boot -> go to X -> go to console -> S3
  -> go to X (shows corrupted screen)
- with versions 1.3 or 1.4 of the xorg server
- with other kernel versions
- with EXA or XAA, the pattern does not change,
what changes is what's left on the screen (see below).

====

A "corrupt screen" can fall into two flavors:

With XAA acceleration, the consoles are invisible,
the background is invisible, the WM (fluxbox and Window Maker)
elements (menu, taskbar) are visible, there is no blanking, though.

With EXA acceleration, the exact opposite is left on the screen, AFAICT.

====

There are a series of register dumps over at bug 13367,
the following is some Info on my system, I will attach
an X log with some debug output.


The Machine is a TP R50e 1634
running Gentoo
xorg-server-1.4-r2
mesa-7.0.2
libdrm-2.3.0
and
linux-2.6.24-rc5

====

From xorg.conf:

Section "Module"
        Load  "extmod"
        Load  "dri"
        Load  "glx"
        Load  "dbe"
        Load  "record"
        Load  "xtrap"
        Load  "type1"
        Load  "freetype"
EndSection

Section "Device"
        Identifier  "Card0"
        Driver    "intel"
        VendorName  "Intel Corp."
        BoardName   "82852/855GM Integrated Graphics Device"
        BusID       "PCI:0:2:0"
        Option      "DRI"               "true"
        Option      "ModeDebug"         "true"
        # andrem: other sreen fuckup
        # Option      "AccelMethod"       "XAA"
EndSection

====

From lspci -nvv:

02:00.0 0607: 104c:ac56
        Subsystem: 1014:0512
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at b0000000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=02, secondary=03, subordinate=04, sec-latency=176
        Memory window 0: f0000000-f3fff000 (prefetchable)
        Memory window 1: d4000000-d7fff000 (prefetchable)
        I/O window 0: 00003000-000030ff
        I/O window 1: 00003400-000034ff
        BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt+ PostWrite-
        16-bit legacy interface ports at 0001

Comment 1 Andre 2007-12-11 15:58:34 UTC

Created attachment 13044
Xorg.0.log

Comment 2 Jesse Barnes 2007-12-11 17:09:16 UTC

I don't understand what you mean when you say b8770f710729d616b3ac72544aa522161a78f819 changed things... can you clarify?

Before that commit S3 worked but only until you did an s2disk?

Or did you mean that after that commit S3 fails consistently (i.e. s2disk doesn't fix it)?

Comment 3 Andre 2007-12-11 19:30:45 UTC

I seem to be unable to explain, sorry.

Before the said commit: 
S3 will not work correctly before I do an s2disk.
After resuming from an s2disk, I can S3 just fine.
Every time.
Until I shutdown and reboot.
After "proper" boot: failure; 
after powering on again from s2disk: all is well.

After the said commit: 
S3 will not work correctly, no matter what.
It is no more dependent on the above difference, 
just fails unconditionally.

Comment 4 Jesse Barnes 2007-12-11 20:29:30 UTC

Ok thanks Andre, that clears things up.  So it really does seem to be related to 3D state somehow...

Comment 5 Jesse Barnes 2008-01-02 10:54:09 UTC

I could see that restoring the logical context might be an issue if the CCID registers were somehow clobbered (the set context instruction will try to save the current context as well), but in S3 that shouldn't happen.  Maybe there's something wrong with the actual 3d state restore; can you try commenting out the call to I830EmitInvarientState to see if that gets you back to the old behavior?  That'll at least narrow it down to context save/restore or 3d state load...

Comment 6 Jesse Barnes 2008-01-02 11:26:16 UTC

Also, you're not using any framebuffer drivers right (e.g. intelfb, vesafb, uvesafb)?  That could potentially cause all sorts of problems...

Comment 7 Andre 2008-01-02 12:47:52 UTC

I have no framebuffer support in the kernel at all, nor the module lying around. 

Commenting out the "if" statement calling
# I830EmitInvarientState(pScrn);
in src/i830_driver.c
from today's git
does not give me back the ability to S3 after a s2disk.

But it leaves me with a little less left on the screen after resuming from S3: only the dock border, window borders and console fonts (without blanking) are left.
To that extent, the call makes some difference.

Comment 8 Jesse Barnes 2008-01-02 12:58:58 UTC

Ok, good to know, thanks for testing.  I'd expect you to lose some of your display w/o the 3d state restore, but it sounds like the real problem is with the context save/restore somehow... If only I could get my hands on one of these laptops.

Comment 9 Andre 2008-01-02 13:20:09 UTC

Would an ssh access do any good already?

Comment 10 Jesse Barnes 2008-01-09 10:16:07 UTC

Zhenyu, is this something you could look at?

Thanks,
Jesse

Comment 11 Wang Zhenyu 2008-01-09 17:07:45 UTC

We have one HP 855GM, on which S3 was fine the last time I tried, and one Sony Vaio which has an S3 issue even without X; some ACPI quirks have been tried but it still fails. I'll ask if we can get a ThinkPad R50 here.

Comment 12 Jesse Barnes 2008-02-05 16:47:31 UTC

Zhenyu, any luck on that laptop?  Any ideas wrt the 3D context restore?

Comment 13 Wang Zhenyu 2008-02-21 18:43:48 UTC

I have tested on two 855GM machines: one is an HP nx5000, the other a Sony Vaio.
Both machines can resume from S3 nicely in X, although they have other, different problems. But none leads to screen corruption after resume.

The HP has a problem when switching VT consoles, which gives a white screen.
The Sony has a problem with a dim screen after resume; xbacklight has no effect.

And with Eric's mail, it seems we need I830EmitInvarientState for now, and can use hw context instead in future.

Comment 14 Andre 2008-02-22 02:57:18 UTC

ok, /me trying to get up to speed here...

The bugs you describe for the hp and sony do not appear here,
those are fine (unless I miss some finer bits about
the "dim" screen).

<Quote:>
 And with Eric's mail, it seems we need I830EmitInvarientState for now, and can
use hw context instead in future.
</Quote>

Here is where I don't get to speed.

The function is defined in the driver, 
and it gets called conditionally at line 2341 (in current git):

   if (!IS_I965G(pI830))
   {
      if (IS_I9XX(pI830))
         I915EmitInvarientState(pScrn);
      else
         I830EmitInvarientState(pScrn);
   }

Is this where I might give calling I830EmitInvarientState(pScrn)
unconditionally a try, for testing your clue?
Or do I get things all wrong?
(I'm not too bold in trying things I don't grok on the driver, sorry.)

Comment 15 Jesse Barnes 2008-02-22 09:19:48 UTC

According to the logic, you should be seeing I830EmitInvarientState(pScrn) now, so I don't think there's any need to remove the conditions around it...
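
The branch logic quoted in comment 14 can be modelled as follows. This is a sketch only: the two boolean flags stand in for the driver's IS_I965G and IS_I9XX checks, and treating the 855GM as a part for which both are false is an assumption based on it being an i8xx-class chip.

```python
def invariant_state_emitter(is_i965g, is_i9xx):
    """Name the emit function the quoted C branch would call, or None.

    is_i965g, is_i9xx: hypothetical stand-ins for the driver's
    IS_I965G(pI830) and IS_I9XX(pI830) macro results.
    """
    if is_i965g:
        # The outer "if (!IS_I965G(pI830))" skips both calls.
        return None
    if is_i9xx:
        return "I915EmitInvarientState"
    return "I830EmitInvarientState"
```

With both flags false, as assumed here for the 855GM, the result is `"I830EmitInvarientState"`, so removing the conditions would not change which function runs.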

Maybe Eric has ideas about this one?

Comment 16 Jesse Barnes 2008-02-22 15:59:38 UTC

Andre, can you get register dumps again with the latest tree, both before suspend and after resume (both from the console)?  There's a bit in the CACHE_MODE_0 register that may explain this behavior, I'm curious to see if it changed.

Comment 17 Andre 2008-02-24 08:27:18 UTC

Created attachment 14540
Regdump from the Console with X running, before S3

Comment 18 Andre 2008-02-24 08:39:10 UTC

Created attachment 14541
Regdump from the Console with X running, after S3

Here you are. 

Here are a couple of diffs between dumps:

1. The diff of the attached files:
regdump before S3 and after S3

# diff regdump_2008-02-24.beforeS3onC regdump_2008-02-24.afterS3onC 
164,165c164,165
< (II):                 CR0e: 0x03
< (II):                 CR0f: 0xc0
---
> (II):                 CR0e: 0x04
> (II):                 CR0f: 0x60


2. Diffs from regdumps before and after S3 from _within_ X

# diff regdump_2008-02-24.beforeS3inX regdump_2008-02-24.afterS3inX 
119c119
< (II):                 SR00: 0x03
---
> (II):                 SR00: 0x00
164,165c164,165
< (II):                 CR0e: 0x03
< (II):                 CR0f: 0xd0
---
> (II):                 CR0e: 0x00
> (II):                 CR0f: 0x00


3. Diff of console before starting X and console after starting X (both before S3) 

# diff regdump_2008-02-24.beforeXonC regdump_2008-02-24.beforeS3onC 
8c8
< (II):       RENCLK_GATE_D1: 0x00000000
---
> (II):       RENCLK_GATE_D1: 0x00000001
128c128
< (II):                  ARX: 0x20
---
> (II):                  ARX: 0x30
164,165c164,165
< (II):                 CR0e: 0x02
< (II):                 CR0f: 0x80
---
> (II):                 CR0e: 0x03
> (II):                 CR0f: 0xc0
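
For comparing dumps by register name rather than by line number, a small helper along these lines may be handy. It is a sketch that assumes every register line looks like the `(II): NAME: 0x...` lines in the diffs above; the function names are invented here and are not part of any existing tool.

```python
import re

# Matches lines such as "(II):                 CR0e: 0x03".
REG_LINE = re.compile(r"^\(II\):\s*(\w+):\s*(0x[0-9a-fA-F]+)")

def parse_regdump(text):
    """Map register name -> integer value for every matching line."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line.strip())
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs

def changed_registers(before_text, after_text):
    """Return {name: (before, after)} for registers whose value differs."""
    before = parse_regdump(before_text)
    after = parse_regdump(after_text)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }
```

Fed the two console dumps from diff 1, this would report only CR0e and CR0f.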

Comment 19 Jesse Barnes 2008-03-18 14:28:42 UTC

Ok, the regs don't show us anything interesting.  Anyway I still suspect some problem with the logical context.  I'm putting together a debug patch for you now so we can compare working & broken logical contexts.

Comment 20 Jesse Barnes 2008-03-18 17:51:11 UTC

Created attachment 15275
Don't emit 3D state at EnterVT time

Andre, can you confirm that this patch gets you back to the old behavior with the latest driver?  Also, can you clarify (ideally with screen shots) the different types of corruption you see with XAA vs. EXA?  On re-reading this bug that's one thing that confuses me... If we're missing some 3D state programming in I830EmitInvarientState it seems like that would only affect EXA, not XAA, since the latter just uses software rendering...

Comment 21 Jesse Barnes 2008-03-18 18:05:43 UTC

Andre, Zhenyu also committed some fixes for 3D state restore after the 2.2.0 release.  Can you try 2.2.1 or the git tree?

Comment 22 Andre 2008-03-18 20:21:31 UTC

The problem remained with 2.2.1, and still remains
on current git.

I've tried your patch against current git,
but I can't see any change with or without the patch.

All of this happens while running exa. xaa comes next,
but that may be tomorrow. It's so late it's rather 
early already...

Exactly what a corrupted screen looks like
has changed a little again, according to my
notes. I cannot possibly tell you why I never
tried to take a screenshot of the corrupted
screen... but it actually works, to my surprise.

I will attach a shot of the normal screen and the corrupt one
on exa.

The signs for unresponsiveness: 
- no active cursor is indicated in 
the shell, 
- all text is printed but never wiped when deleted or 
overwritten.
- The mark for the active window (blue header) no longer moves
with the active window. 
- The fluxbox menu is invisible (but works well :-)
- The active elements of the toolbar (arrows on the left) 
are gone, text is not erased as well.

I hope this isn't too abstract, or I shall have to fetch 
a more colorful theme, heaven forbid :-)

I'm using gentoo's current ~x86 xorg, 
that is 1.4.0.90.

Comment 23 Andre 2008-03-18 20:29:29 UTC

Created attachment 15283
This is how it is supposed to look

This is my normal screen.

Comment 24 Andre 2008-03-18 20:33:07 UTC

Created attachment 15285
This is how it looks after S3

This is the corrupted version.

A finer/further point in difference: The font rendering
of fluxbox fonts has gone awkward, but the 
console font is fine.

Comment 25 Jesse Barnes 2008-03-18 20:50:11 UTC

Ok, thanks for the screenshots.  So things still won't work even after an s2disk?  If you can get the XAA screenshots eventually that would be nice too.  Thanks.

Comment 26 Jesse Barnes 2008-03-18 20:50:35 UTC

Oops, accidentally reassigned.

Comment 27 Andre 2008-03-18 20:55:13 UTC

I haven't tested the S3 after s2ram yet,
I will have to wait for tomorrow anyway,
because in XAA, the screenshot fails:

# xwd -root -out screen_corrupt-xaa.xwd
X Error of failed request:  BadValue (integer parameter out of range for
operation)
  Major opcode of failed request:  91 (X_QueryColors)
  Value in failed request:  0x30007fe
  Serial number of failed request:  664
  Current serial number in output stream:  664


So I have to find me someone with a digicam or
cellphone or something :-)

Comment 28 Andre 2008-03-19 11:47:34 UTC

Ok, I did the testing suite
boot -> S3 -> s2disk -> S3 [-> S3]*
again.

1. Vanilla intel git / EXA: S3 always fails.
2. Vanilla intel git / XAA: S3 always fails.
3. Intel git with above patch / EXA: S3 always fails.
4. Intel git with above patch / XAA: You caught the spot.

The old behaviour is back when using XAA.
In the test suite this means that second and consecutive
S3s will resume just fine.

Not in EXA, though.
That is inconsistent with my above report... :-(
Which is really odd, because I did double-check
(and including the git-bisect checks that adds up).

I will do a round of testing on older git versions
to find the flaw...
...unless you stop me because it's a perfectly reasonable
thing to happen :-)

In my testing run with the patched XAA, another failure
did occur: It went back to sleep right after resuming,
repeatedly. I will try and reproduce this one as well...
...after repairing sysklogd which is borken since then.
Oh well.

Comment 29 Andre 2008-03-19 11:56:54 UTC

Created attachment 15309
The failed XAA display

This is what the screen looks like when resuming from S3 with XAA.
I added a spot of color to this one :-)
but then forgot to display the menu: The menu entries are displayed,
but not the border.

Comment 30 Andre 2008-03-19 12:06:30 UTC

Created attachment 15310
Broken XAA after first S3

Resending, because I grabbed the wrong resolution pic.

Comment 31 Andre 2008-03-19 17:56:15 UTC

Resuming from S3 with the patch and XAA failed in yet
another way, this time it was many colorful horizontal
lines... so I gave up reproducing and shall be lucky
it worked once...

I did another round of checking everything else
for consistent behaviour, and it keeps coming out
as described.

So I went back to commit 5f92b4c2db9,
the last "working" commit before complete S3 failure.

Using XAA, the same pattern as with the patched
version occurs.
Using EXA, I get consistent failures.
This still is at odds with my testing
and the initial description.

I did the initial description with 2.1.0,
so that is where I will go next...

I am skimming through my notes for any
help there... my first terrible hunch
of a test failure is that the switch
from XAA to EXA as default is only few
commits after the IntelEmitInvariantState addition.
My xorg.conf excerpt in the initial description
suggests that I used to define XAA but not
EXA explicitly. Thereby identifying an
XAA bug.

On the other hand I am pretty sure
to have double-checked this with
Carlos and did quite a few
EXA/XAA comparisons myself.
Not quite prepared to think I screwed them
all wrong all the time while busy grepping
logs...

Anyway, from what I see now
it looks like I had better go through the
cornerstones of the git-bisection again --
and prepare to shovel some ashes
on my head...

Comment 32 Jesse Barnes 2008-03-25 15:15:08 UTC

Heh, no problem.  These sorts of problems can be hard to nail down, let us know when you've run your tests again...  Thanks.

Comment 33 Andre 2008-03-26 16:09:20 UTC

Ok, I got to check through my notes and test some.

I got it wrong, as I feared, by not asking for "EXA"
in xorg.conf explicitly. EXA becomes the default three 
commits later.

I had actually described the suspicious bit, but
did not take the clue... in the original bug I wrote:

> At least commit e784e152a8e84b6e447b55a5c7019e7b47e17621
> (18 minutes after the offender) still shows the old corruption
> pattern (as described for 2.1 versions) while already failing
> constantly on S3. 

Oh well.

So the suspicious behaviour I describe is XAA only,
while EXA is consistently just failing to resume 
from S3 properly. This behaviour is stable from
5f92b4c (just before the XAA regression) until now.

So there is no regression for EXA, just a plain bug :-)

So I guess I should just file a bug for the 
EXA issue while forgetting about the XAA failure.

That said, I did all of the rechecking above
on 2.6.24.3. On current 2.6.25-rc{6,7}, 
we may forget about the EXA bug as well, 
probably... at least for the moment,
as the current kernel does not come back from 
suspend at all. No X, no intel driver needed,
no EXA vs. XAA, just plain "won't work".  :-)

Let's see to that instead and see what EXA
behaves like when THAT regression is healed.
At least, I know how to bisect now, for my 
next task :-)

Anyway, sorry for screwing this one wrong. 
If there's any followup you'd like to get,
just ask for it. Thanks.

Comment 34 Jesse Barnes 2008-03-26 16:14:35 UTC

Yeah, 855GM suspend/resume with DRM is broken at this point...  still trying to fix it.

Thanks for checking everything else though, good to know we haven't regressed EXA at least.

Comment 35 Andre 2008-03-28 16:27:30 UTC

Is there a bug (or twelve) for the linux DRM issue?
I did not find anything I could relate to
in the 2.6.25 regressions and suspend issues 
tracker bugs nor on lkml nor here...

Anything short of following up on the 
kernel logs?

Thanks!

Comment 36 Jesse Barnes 2008-03-28 17:22:45 UTC

One of the 855 failures is being tracked in #15158, the other was an Ubuntu reported bug; I don't think we have an upstream one open for it yet (it affected one platform that was known to be unstable in other ways as well).

Comment 37 Andre 2008-04-21 14:16:27 UTC

Some news here:
I got the current kernel git (3925e6fc) to resume from S3 again.
Let me start slightly off-topic on the console...

S3 works when DRM and DRM_I915 (or DRM_I830, no difference) are built into the kernel. It fails when they are built as modules, and fails as well when not built at all. All the following are not working:

CONFIG_DRM=y
# CONFIG_DRM_I830 is not set
# CONFIG_DRM_I915 is not set

CONFIG_DRM=y
# CONFIG_DRM_I830 is not set
CONFIG_DRM_I915=m

CONFIG_DRM=m
# CONFIG_DRM_I830 is not set
CONFIG_DRM_I915=m

# CONFIG_DRM is not set
# CONFIG_DRM_I830 is not set
# CONFIG_DRM_I915 is not set

Given that the module is loaded by the X server, which is not running, this is somewhat odd.
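
To check a given .config against this pattern, something like the following sketch could be used. It only encodes the observation above (S3 resumes when DRM and one of DRM_I915 / DRM_I830 are built in); the helper names are invented for illustration.

```python
def config_value(config_text, symbol):
    """Return the value of CONFIG_<symbol> ('y', 'm', ...) or 'n' if unset."""
    prefix = f"CONFIG_{symbol}="
    for line in config_text.splitlines():
        line = line.strip()
        # Lines like "# CONFIG_DRM is not set" do not match the prefix.
        if line.startswith(prefix):
            return line[len(prefix):]
    return "n"

def drm_built_in(config_text):
    """True if DRM and at least one of DRM_I915 / DRM_I830 are '=y'."""
    core = config_value(config_text, "DRM") == "y"
    chip = "y" in (
        config_value(config_text, "DRM_I915"),
        config_value(config_text, "DRM_I830"),
    )
    return core and chip
```

All four configurations listed above give `False` here, matching the failing cases; a config with both `CONFIG_DRM=y` and `CONFIG_DRM_I915=y` gives `True`.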

Actually, I stumbled over it, more like. I decided on a bisection with a monolithic kernel after some attempts to bisect it failed miserably... As I started with a monolithic and very minimal kernel, I can hereby grant that this pattern is consistent over a dozen configs, some of which I may post on request...

And here is where I am open to any pointers, and specifically:

Do you want this reported as a kernel bug? Do I assign it to you or some neighbour of yours? :-)
Wishes for another attempt to bisect? (I failed before, but have more to go on, now.)

=====

Symptoms:

The failures to resume have a very consistent pattern. On resume, the screen hangs after powering up, with a cursor on top left. (This is replaced with the proper screen after a second when it works, but on failure, it stays right there.) The "invisible" console is fully functional. When shutting down, the machine powers off cleanly, but the screen stays on. The screen turns off only after a hard reset.

Back in X:

Giving X a try, now with DRM built into the kernel, resume is failing in the same way it did before, on first sight. The cursor and some stuff is missing, screen blanking won't work. So, basically, back to where I was, using gentoos 2.2.99.901 ebuild.

Comment 38 Andre 2008-04-24 14:08:25 UTC

Back on topic,

I did get S3 to restore X fine, repeatedly.
Sad thing is, I cannot reproduce it after a reboot.

Anyway, it did work after I made my way through the
debug states in /sys/power/pm_test, from freezer to core.

Now, probably the important note here is that
the machine comes back fine from core, so is nearly suspended.
The screen is never physically powered down in the process, though.

Anyway, even using the voodoo of repeating my session by
restarting same apps and redoing the tests by
# echo freezer > /sys/power/pm_test && echo mem > /sys/power/state && sleep 5  && echo devices > /sys/power/pm_test  && echo mem > /sys/power/state && sleep 5  && echo platform > /sys/power/pm_test  && echo mem > /sys/power/state && sleep 5  && echo processors  > /sys/power/pm_test && echo mem > /sys/power/state && sleep 5  && echo  core > /sys/power/pm_test && echo mem > /sys/power/state && sleep 5  && echo none > /sys/power/pm_test

does not help any. I only did a few compiles the original time around,
but I cannot see how any of that may help.
So, throwing any hunches at me is very welcome.
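
The one-liner above walks through the pm_test levels in order. The same sequence can be written as a small script, sketched here with a dry-run default so that nothing is written to /sys unless asked for. The function names are made up; only the sysfs paths and level names are taken from the command above.

```python
import time

PM_TEST = "/sys/power/pm_test"
PM_STATE = "/sys/power/state"
LEVELS = ["freezer", "devices", "platform", "processors", "core"]

def build_plan(levels=LEVELS):
    """List the (path, value) writes in the order of the one-liner above."""
    plan = []
    for level in levels:
        plan.append((PM_TEST, level))   # select the test level
        plan.append((PM_STATE, "mem"))  # start the (partial) suspend
    plan.append((PM_TEST, "none"))      # back to normal suspend behaviour
    return plan

def run_plan(plan, pause=5, dry_run=True):
    """Print the writes, or perform them (needs root) when dry_run is False."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
            continue
        with open(path, "w") as f:
            f.write(value)
        if path == PM_STATE:
            time.sleep(pause)  # same 5 s pause as in the shell version
```

Calling `run_plan(build_plan())` just prints the eleven writes; passing `dry_run=False` as root would carry them out.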

While testing all this, I find a new fluke to suspending
(and also when suspending to some shallow state using /sys/power/pm_test):
after resume, the current mouse selection is pasted to the current console.

This happens with intel driver 2c135ef8a (last week's git tip)
and linux 3dc50637 (yesterday's git tip). I don't know yet
which one is responsible here, but will post when I know.

===

On the kernel DRM issue, I am afraid another round of going
backwards in git to find a working version fails all the
way back to .22, I know it did work back then, though.
I seem to be more stupid than git.

Comment 39 Andre 2008-05-06 16:03:47 UTC

So, some updates... I still cannot reproduce the working condition in X. It "just worked" on two occasions, both after several hours uptime. Note that uptime is not the key in itself. It could go after the moon or after your neighbor's dog just as well.

WHEN it decides to just work, it just keeps working reproducibly until I reboot. In more detail, I can say the following:

- On both occasions when it happened to work, it was after more than six hours uptime, no s2disk till then. This may mean nothing.
- It reproduces when working, I did a handful of cycles for this and the following.
- An s2disk does no harm, S3 works just fine thereafter.
- Suspending from the console works just as well, the running X session is restored just fine.
- After a fresh boot, it fails again. Consecutively.
- When suspending again from the corrupted screen, leaving garbled letters in the console, I find them in the console displaying my "blind" moves just right, like it should have. Which prolly just means really everything is fully functional.
- When suspending from the console BEFORE starting X, I can afterwards start a functional X.
- When suspending from the console with X running, X is borken in exactly the same way as when suspending from within X.
- When stopping the corrupted X server and starting it again, X does not come up cleanly, only giving me the menu border and the arrows I actually miss in the corrupted state. The background is corrupted, an xterm "blends in" with the corruption. Again, all is fully functional. Suspending again from there, the arrows are gone, and the corruption seems unchanged. s2disk does not restore this one.

So much for visual appraisal.

Regdumps I made through all of the states (fresh boot, failed state, after resuming properly, when coming back from s2disk) are all identical now. Somebody has obviously nailed the regs properly in the meantime. This unchanging, robust regdump is attached.

The dmesg output also looks very consistent to me. There are some sporadic differences though, so I will attach two from late in the "always works" run, one taken after an S3, one after an s2disk. The third one is taken after a failing resume after reboot. I have plenty more, should the need arise :-)

The first occurrence was on kernel and intel versions like above, the second on kernel git afa26be86b6 and intel git a0ced923. That is to say, both are jolly recent.

====

This seems to be a different bug, but I'm not sure, so... As marked above, in X the current mouse selection is sent to the active xterm. In the beginning it was preceded by two newlines:

<quote>
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 # echo mem > /sys/power/st
ate

leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 #
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 #
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 # echo hallo
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 # echo mem > /sys/power/st
ate

leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 #
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 #
leisereiter /home/andre/bug-i810/testing_2.6.26-rc1-3 # echo hallo
</quote>

After the first s2disk, the two newlines no longer occurred, but the "paste" still happened. Now it behaves just like the middle mouse button.

Comment 40 Andre 2008-05-06 16:05:56 UTC

Created attachment 16395
This regdump is universal. It does no more change through any suspends or even reboots.

Comment 41 Andre 2008-05-06 16:07:57 UTC

Created attachment 16396
dmesg after a successful resume from S3

Comment 42 Andre 2008-05-06 16:09:19 UTC

Created attachment 16397
dmesg after resume from s2disk in the successful series

Comment 43 Andre 2008-05-06 16:10:05 UTC

Created attachment 16398
dmesg after an unsuccessful resume from S3

Comment 44 Andre 2008-05-06 16:11:40 UTC

Created attachment 16399
Xorg log from the successful run, debugging enabled

Comment 45 Andre 2008-05-06 16:12:22 UTC

Created attachment 16400
Xorg log from a failed run, debugging enabled

Comment 46 Andre 2008-05-07 00:12:46 UTC

Ok, not true enough, again. I had another successful S3 some hours later, 
but on the next cycle, resume was broken, with a slightly different
pattern than usual. So the thesis of repeatability just hit a snag.

Comment 47 Gordon Jin 2008-06-16 23:00:10 UTC

clearing "NEEDINFO"

Comment 48 Michael Fu 2008-07-24 21:27:35 UTC

re-assign to zhenyu, since he has got the TP R50e.

Comment 49 Andre 2008-08-19 13:40:59 UTC

Behaviour of resuming changes for the better in the 2.4 series.
But not all is well, though.

I do get failures in resume every so often, but I also got a very long series of about 40 successful resumes with intel git b0b0998b5d5 from 30 July and linux-2.6.27-rc2. I use the gentoo distro driver in the 2.4 versions right now.

While not seeing proper patterns, here are some observations on the 2.4ish behaviour:

- I never got it through a suspend cycle successfully shortly after boot.

- It often works, but I can stress-test them to death in 4 to 40 attempts.

- Together, this causes the impression, that resume functionality comes and goes on its own timescale. Something not volatile enough. But that's just how it feels.

- That feeling got support from my attempts at git-bisecting the behavioural changes. Some modern patterns I have just reproduced on older driver versions, where I had never seen them before. Rather confusing, and so I left it at that back then...

- I experience two kinds of failures on resume:
1. The screen is not fully restored, the system is unresponsive, it does another blink (even an endless series of blinks with late 2.3 versions) and is off. Proper shutdown via ACPI.
2. The screen is not fully restored, but fully functional. More suspend cycles from X or the console work out, but I never saw it restore to correct operation. This failure is the most common one. Once I saw it degrade further on continued suspend cycles.

- I have seen both success in restoring the screen when doing another suspend cycle from the console and not doing so. Never saw it "repairing" to correct state after another resume from WITHIN the X-server. And, by impression, it does not seem to come back to functional if it was not "repaired" at once, i.e. on the next suspend cycle.

- Very rarely, I get a suspend failure (total freeze of the system, and the screen shows morphing, oily patterns). The most entertaining of failures I know of :-)

- I found the screen brightness not restoring to the previous level, even not to some fixed level, but coming up "somewhere". I still need to inquire into this one.

Comment 50 Diego Escalante Urrelo 2008-10-21 04:08:59 UTC

I have an R50e with an Intel 855GM card. Using the intel driver with kernel 2.6.27, resume takes me back to a non-refreshing X. In plain English: none of the windows redraw at all.
Example:

1. suspend
2. resume
3. gnome-screensaver password prompt appears as a solid grey rectangle
4. hit esc
5. g-s password prompt disappears, everything still black
6. input your correct password, hit enter, everything still black
7. switch to another vt, switch back, everything now as "big solid rectangles", imagine a normal desktop but with the "contents" being solid color areas, like the title bar a solid blue rectangle, this browser having a solid grey square instead of menus, buttons, etc

Everything works, but the screen does not redraw anything, so if you want to see what happened you can switch to a vt and then back, but that won't help much since you will still only see big rectangles.
Funny, your wallpaper is perfect :-)

Until last week this was working with kernel 2.6.24 (2.6.27 was already broken) and last week's xorg bundle from ubuntu ibex. It is now broken with both 2.6.24 and 2.6.27.

I'm available for debugging, let me know.

Comment 51 Andre 2008-11-14 13:41:32 UTC

Good news here!

Starting off with the problem of X freezing at once in the 1.5 series of the server with linux-2.6.28-rcX, I worked my way to a functional setup.

Most recent working solution is:
libX11 git head
libdrm git head
mesa git head
xorg-server-1.5.2 (a heavily patched version -r1 from gentoo x11 overlay is fine, too)
xf86-video-intel git head
running on linux-2.6.28-rc4-00322-g58e20d8

The xorg-server from git leaves me with an unresponsive system, but not a frozen one; I can shutdown cleanly with my power button. I will try and find out about that failure in the next couple days.

Now, with this setup, I can reliably suspend to ram and to disk, it seems. From within X, from the consoles with X running, all comes back nicely.

The system announces direct rendering, but Mesa uses the software renderer:
OpenGL renderer string: Software Rasterizer
OpenGL version string: 2.1 Mesa 7.3-devel

So, I cannot be sure this still is the same bug, but it definitely is
a fully functional system, minus some known rendering issues in firefox.

As a side note, I gave the new intelfb a largely unsuccessful try. For one thing, the VESA modes will not display at all, and it breaks resume in X. But that definitely is another bug report at another time :-)

Comment 52 Andre 2008-11-17 04:40:36 UTC

Even more good news!

The setup described above survives some stress testing of 
about 30 cycles of s2ram with interspersed suspends from
console and/or to disk.

Upgrade to xorg-server master was not too hard, I needed
to amend my xorg.conf to not rely on hal but instead use
my mouse and keyboard definitions. 

I had some corruption issues with master, but no lockups.
Switching the intel driver to the dri2 branch got me back in business.

To clarify: When saying "git" for all the components (see last post)
I reference the gentoo x11 git overlay, so there are 
some patches against pure git master.

So, with xorg-server master and intel driver on dri2 branch,
I get a functional system. Mesa relies on the software rasterizer 
still, but it is supposed to, if I followed things correctly.

A couple of suspends to ram from X and the consoles, and an s2disk
all give me my nice, functional system back. Hooray! 

I still need to do some more stress testing, and will complain on 
failure. 

I do attach an Xorg.0.log after a couple suspends with the 
dri2 setup. Feel free to ask for more info or testing.

So, this may be called "works for me", as far as I can tell.

Comment 53 Andre 2008-11-17 04:44:08 UTC

Created attachment 20367 [details]
Xorg.0.log after s2ram on xorg-server master and intel-dri2 branch with lots of debug output

Comment 54 Eric Anholt 2008-12-15 09:36:20 UTC

reporter says this is fixed with current code
