Bug 25658

Summary: Move to loadable firmware breaks suspend on nVidia 9800M
Product: xorg
Reporter: Tavian Barnes <tavianator>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:

Description Tavian Barnes 2009-12-15 11:25:54 UTC
Since the move to loadable firmware ctx voodoo, suspend is broken on my nVidia 9800M (NV92).  Upon running pm-suspend, or hitting the keyboard shortcut, I get a flashing cursor on a black screen, but it sits there forever instead of suspending.  There's no response to magic SysRq or ssh -- I have to hard-reset it.  To be sure, I bisected it down to the commits starting with 6303a1a6ab13da61b0352101e7a974ca446d6a36 (use fw loader interface for ctxprog/ctxvals).  Anything before that seems to work; anything after c2f85058e99c542a82cfc893fbe5ebd2b86c666e (add back the ctxprog/ctxvals we have as loadable firmware) is broken.
Comment 1 Tavian Barnes 2009-12-15 11:27:43 UTC
I guess I should add that I'm using KMS and X.  Suspend without starting X does still work.
Comment 2 Mark Carey 2009-12-15 15:47:49 UTC
What disk partition does nouveau leave firmware on, or is it built into the initrd?

If the partition containing the firmware isn't mounted when nouveau tries to reload it on resume, won't there be problems?

Comment 3 Tavian Barnes 2009-12-15 16:01:21 UTC
(In reply to comment #2)
> What disk partition does nouveau leave firmware on, or is it built into the initrd?
> 
> If the partition containing the firmware isn't mounted when nouveau tries
> to reload it on resume, won't there be problems?
> 

The firmware is in /lib/firmware/nouveau on both my root partition and the initrd.  But I don't think that's the issue anyway, since it never suspends in the first place; it just sits there blinking.  I'm wondering if there was a mistake in extracting the firmware from the C source, or if there's something broken about the firmware interface and suspend.  Does it work with other (non-NV92) cards?
Comment 4 Xavier 2009-12-15 16:21:29 UTC
I also suspected loadable firmware initially, but then I quickly moved to suspecting all the ttm changes.
It's the last merge that broke it for me:
commit 12b59e64a4df9f4298a2abb8b331074adea113ed

curro pointed me to the following fix on dri-devel by airlied:
http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg45533.html

That seems to have fixed suspend for me. But I still have trouble resuming: I get pixmap corruption in X. After reloading X, it is fine.

However, there is also a known bug in the nouveau code, fixed by curro:
http://lists.freedesktop.org/archives/nouveau/2009-December/004266.html

This last patch seems to correct a very obvious bug/oversight. But it causes other problems for me when resuming:
Dec 15 18:50:06 xps-m1530 kernel: [   42.706924] [drm] nouveau 0000:01:00.0: Reinitialising engines...
Dec 15 18:50:06 xps-m1530 kernel: [   42.709976] [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Dec 15 18:50:06 xps-m1530 kernel: [   42.742509] [drm] nouveau 0000:01:00.0: Restoring mode...
Dec 15 18:50:06 xps-m1530 kernel: [   42.742513] [drm] nouveau 0000:01:00.0: bo ffff88011d2b1400 pinned elsewhere: 0x00000002 vs 0x00000004
Dec 15 18:50:06 xps-m1530 kernel: [   42.746882] [drm] nouveau 0000:01:00.0: bo ffff88011d2b1400 pinned elsewhere: 0x00000002 vs 0x00000004
Dec 15 18:50:06 xps-m1530 kernel: [   42.746885] [drm:drm_helper_resume_force_mode] *ERROR* failed to set mode on crtc ffff88011c51c000

The screen stays black. Reloading nouveau blindly fixes it, though.
At other times, resuming caused a hard lock, but I suspect it's related to this same bo/ttm problem.

Just give me another year and I might find something :)
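
For anyone comparing resume logs across kernels, the "pinned elsewhere" lines quoted above can be pulled out of dmesg or syslog output mechanically. The following is only a minimal sketch: the function name parse_pinned_elsewhere and the field names are made up for illustration, the pattern is inferred solely from the two log lines in this comment, and no meaning is assigned to the two hex values beyond what the message prints.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not taken from the driver source): "bo <addr> pinned elsewhere: 0x.. vs 0x.."
PINNED_RE = re.compile(
    r"bo (?P<bo>[0-9a-f]+) pinned elsewhere: "
    r"(?P<first>0x[0-9a-f]+) vs (?P<second>0x[0-9a-f]+)"
)


def parse_pinned_elsewhere(line):
    """Return (bo_address, first_value, second_value), or None if no match.

    The address is kept as a string; the two hex values are converted to ints
    in the order they appear in the message.
    """
    match = PINNED_RE.search(line)
    if match is None:
        return None
    return (
        match.group("bo"),
        int(match.group("first"), 16),
        int(match.group("second"), 16),
    )


if __name__ == "__main__":
    sample = (
        "Dec 15 18:50:06 xps-m1530 kernel: [   42.742513] [drm] nouveau "
        "0000:01:00.0: bo ffff88011d2b1400 pinned elsewhere: "
        "0x00000002 vs 0x00000004"
    )
    print(parse_pinned_elsewhere(sample))
```

Running it over a saved log, one line at a time, gives a quick list of which buffer objects hit the message and with which pair of values, which makes it easier to tell whether a later kernel still triggers it.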
Comment 5 Tavian Barnes 2009-12-15 22:44:52 UTC
(In reply to comment #4)
> I also suspected loadable firmware initially, but then I quickly moved to
> suspecting all the ttm changes.

That occurred to me too actually, but for some reason there's no nouveau module when you check out the commits that got merged, so I couldn't properly bisect it.  I guess I could just add the nouveau code back myself and then compile.

> It's the last merge that broke it for me:
> commit 12b59e64a4df9f4298a2abb8b331074adea113ed
> 
> curro pointed me to the following fix on dri-devel by airlied:
> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg45533.html
> 
> That seems to have fixed suspend for me. But I still have trouble resuming: I get
> pixmap corruption in X. After reloading X, it is fine.
> 
> However, there is also a known bug in the nouveau code, fixed by curro:
> http://lists.freedesktop.org/archives/nouveau/2009-December/004266.html
> 
> This last patch seems to correct a very obvious bug/oversight. But it causes
> other problems for me when resuming:
> Dec 15 18:50:06 xps-m1530 kernel: [   42.706924] [drm] nouveau 0000:01:00.0: Reinitialising engines...
> Dec 15 18:50:06 xps-m1530 kernel: [   42.709976] [drm] nouveau 0000:01:00.0: Restoring GPU objects...
> Dec 15 18:50:06 xps-m1530 kernel: [   42.742509] [drm] nouveau 0000:01:00.0: Restoring mode...
> Dec 15 18:50:06 xps-m1530 kernel: [   42.742513] [drm] nouveau 0000:01:00.0: bo ffff88011d2b1400 pinned elsewhere: 0x00000002 vs 0x00000004
> Dec 15 18:50:06 xps-m1530 kernel: [   42.746882] [drm] nouveau 0000:01:00.0: bo ffff88011d2b1400 pinned elsewhere: 0x00000002 vs 0x00000004
> Dec 15 18:50:06 xps-m1530 kernel: [   42.746885] [drm:drm_helper_resume_force_mode] *ERROR* failed to set mode on crtc ffff88011c51c000
> 
> The screen stays black. Reloading nouveau blindly fixes it, though.
> At other times, resuming caused a hard lock, but I suspect it's related to this
> same bo/ttm problem.

Similarly for me, except I get a BUG on resume.  No black screen though; I could see the BUG perfectly.  I can reproduce it and write down what it said if it's important.  I'll try with the dri-devel patch next.

> Just give me another year and I might find something :)

Haha.  Yeah, to me that nouveau bug you mentioned was about the least obvious "obvious" bug I've ever seen.  I really should read up on driver programming.
Comment 6 Ben Skeggs 2009-12-15 23:09:50 UTC
Latest nouveau git should fix the issues mentioned here.
Comment 7 Tavian Barnes 2009-12-15 23:54:59 UTC
(In reply to comment #6)
> Latest nouveau git should fix the issues mentioned here.

Indeed it does, thanks.  I noticed the slew of new commits right after my last comment.
Comment 8 Xavier 2009-12-16 03:40:48 UTC
The bad side effects of curro's patch were apparently fixed by:
    drm/nouveau: fix bug causing pinned buffers to lose their NO_EVICT flag

So all is well... almost :)
There is still some corruption in X after resuming, but it is significantly less severe than before (when I had only airlied's ttm fix, and not the nouveau ones).
Now the corruption seems to be mostly font/cursor related.

I have a screenshot of the corrupted fonts, but in the screenshot of the corrupted cursor, the cursor looks right.
