Bug 40572

Summary: [nvac] Nouveau drm causes failure to resume from suspend with any kernel newer than 2.6.36
Product: xorg
Reporter: Daniel Lindgren <dali.spam>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED INVALID
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
  lspci -vvv (flags: none)
  git bisect (flags: none)
  dmesg (flags: none)
  netconsole with kernel panic (flags: none)

Description Daniel Lindgren 2011-09-02 00:03:22 UTC
Created attachment 50834
lspci -vvv

Hello.

I have problems resuming from suspend on my machine, which has an XFX GeForce 9300 motherboard, with any kernel later than 2.6.36.

I've done a git bisect and this is the final result:

----

02c30ca0a1d6d8b878fc32f47b3b25192ef4a8ef is the first bad commit
commit 02c30ca0a1d6d8b878fc32f47b3b25192ef4a8ef
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Thu Sep 16 16:17:35 2010 +1000

   drm/nv50: import initial clock get/set routines + hook up pm engine

   This will make nouveau_pm attempt to report the card's current performance
   level both during bootup, and through sysfs.

   This is a very initial implementation, and can be improved a *lot*

   Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 626c66f9899b5542726bff4d994cdeab5c6c52af 8ec449fdc267d205dd33ba4f4ca51347e6e6b7a0 M      drivers

----

The problem is 100% repeatable and goes like this:

- Boot any kernel later than 2.6.36 (while testing, I boot the kernel's recovery mode option).
- Suspend the machine (while testing, I run pm-suspend).
- Attempt resume.
- The machine sort of wakes, but the screen is still in power save mode and the keyboard doesn't light up. After roughly 30 seconds, the machine reboots.
- Nothing in the logs after reboot.

I am currently running Debian Wheezy, but I've had the same problem with Debian Squeeze (with backported kernel 2.6.38.2) and Fedora 15.

Resume works as expected with kernel 2.6.36.4 and older. If I run a working kernel with "nomodeset" added to the kernel command line, the machine behaves much as it does when the problem occurs, except that it doesn't reboot after 30 seconds; I have to reset it manually.

At the moment I'm running 2.6.35.14, which is the latest currently maintained kernel that doesn't have any issues on my machine. Hopefully it won't be the last usable kernel on this machine; the hardware isn't very old.

I'd be happy to supply any information needed; I've attached the lspci output and the git bisect log to start with. Please note that the last 4-5 commits marked good during the bisect actually had a different issue: during boot, the screen went blank when modesetting kicked in, so I had to log in blindly and run "pm-suspend". After a successful suspend/resume, the screen started working, both on the console and in X.

Cheers,
Daniel
Comment 1 Daniel Lindgren 2011-09-02 00:04:18 UTC
Created attachment 50835
git bisect
Comment 2 Marcin Slusarz 2011-09-02 09:48:01 UTC
Can you comment out the nouveau_pm_init / nouveau_pm_fini calls in drivers/gpu/drm/nouveau/nouveau_state.c on, e.g., a 3.0 kernel and see if it fixes resume?

And please attach dmesg output.
Comment 3 Daniel Lindgren 2011-09-03 00:26:45 UTC
Created attachment 50856
dmesg
Comment 4 Daniel Lindgren 2011-09-03 00:27:55 UTC
(In reply to comment #2)
> Can you comment out the nouveau_pm_init / nouveau_pm_fini calls in
> drivers/gpu/drm/nouveau/nouveau_state.c on, e.g., a 3.0 kernel and see if
> it fixes resume?
> 
> And please attach dmesg output.

Tried it on 3.0.4; no difference. I've attached dmesg, but be aware that I can't get any information from the computer after a failed resume until it has been rebooted.
Comment 5 Marcin Slusarz 2011-09-03 01:53:15 UTC
Ok. So this commit is unlikely to be the culprit.
Two other ideas:
- does suspend/resume work at all without nouveau?
- can netconsole catch anything after failed resume?
Comment 6 Daniel Lindgren 2011-09-03 03:25:33 UTC
(In reply to comment #5)
> Ok. So this commit is unlikely to be the culprit.
> Two other ideas:
> - does suspend/resume work at all without nouveau?
> - can netconsole catch anything after failed resume?

With nouveau blacklisted on 3.0.4, the screen is still black after resume, but the machine is alive and I can ssh into it. No reboot.

Netconsole didn't produce anything after resume. I verified that it worked by inserting a USB stick before resume, but nothing was logged after resume. No ping responses either; the network's probably gone.

Starting netconsole did, however, change the behaviour a bit: the machine reboots a little faster, about 20 seconds after I press the power button to resume. Without netconsole, it takes 40 seconds until it reboots.
Comment 7 Daniel Lindgren 2011-09-03 03:49:58 UTC
> Netconsole didn't produce anything after resume. Verified that it worked by
> inserting a USB stick before resume, but nothing logged after. No ping
> responses either; the network's probably gone.

I finally managed to get some output and there is a kernel panic, see attached netconsole.log.
Comment 8 Daniel Lindgren 2011-09-03 03:50:34 UTC
Created attachment 50865
netconsole with kernel panic
Comment 9 Marcin Slusarz 2011-09-03 07:33:46 UTC
1) If you cannot resume properly without nouveau, then your problems with nouveau are not nouveau's fault. Maybe you could fiddle with some BIOS options and see how they affect suspend/resume.

2) Machine Check Exceptions report *hardware errors* - your hardware is probably dying.

I have no idea why it started with kernel upgrade.
Comment 10 Daniel Lindgren 2011-09-03 09:27:32 UTC
(In reply to comment #9)
> 1) If you cannot resume properly without nouveau, then your problems with
> nouveau are not nouveau's fault. Maybe you could fiddle with some BIOS
> options and see how they affect suspend/resume.

I did that in the past, when the problems started with kernel 2.6.37, but couldn't find anything that improved the situation.

> 2) Machine Check Exceptions report *hardware errors* - your hardware is
> probably dying.
> 
> I have no idea why it started with kernel upgrade.

1) Suspend/resume with any kernel older than 2.6.37 works without a hitch, with no MCE.
2) Besides the resume problems, the machine is rock solid with both older and newer kernels.
3) When nouveau is blacklisted, there is no MCE. The machine wakes but doesn't activate the monitor correctly; I can, however, ssh into the machine.
4) I have no problems with suspend/resume when running Windows 7 on the machine, and no crashes either.

It seems unlikely to me that, if the hardware truly is faulty, the only situation in which it ever shows up is when nouveau is loaded during a resume from suspend with a kernel newer than 2.6.36.
Comment 11 Ilia Mirkin 2013-08-18 18:09:21 UTC
It appears that this bug report has lain dormant for quite a while. Sorry we haven't gotten to it. Since we fix bugs all the time, chances are pretty good that your issue has been fixed with the latest software. Please give it a shot. (Linux kernel 3.10.7, xf86-video-nouveau 1.0.9, mesa 9.1.6, or their git versions.) If upgrading to the latest isn't an option for you, your distro's bugzilla is probably the right destination for your bug report.

In an effort to clean up our bug list, we're pre-emptively closing all bugs that haven't seen updates since 2011. If the original issue remains, please make sure to provide fresh info, see http://nouveau.freedesktop.org/wiki/Bugs/ for what we need to see, and re-open this one.

Thanks,

The Nouveau Team
