Bug 86115 - [NV84] gpu hang on resume from suspend to ram
Summary: [NV84] gpu hang on resume from suspend to ram
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2014-11-10 14:53 UTC by Lars Müller
Modified: 2015-01-25 20:14 UTC

Attachments
kernel ooops trace (3.16.6) (4.53 KB, text/plain)
2014-11-10 15:37 UTC, Lars Müller
Retest S3 after calling: echo 1 > /sys/power/pm_async (6.04 KB, text/plain)
2014-11-10 15:57 UTC, Lars Müller
Retest S3 after calling: echo 0 > /sys/power/pm_async (10.89 KB, text/plain)
2014-11-10 19:05 UTC, Lars Müller
dmesg output as requested with comment #6 (53.59 KB, text/plain)
2014-11-10 19:14 UTC, Lars Müller

Description Lars Müller 2014-11-10 14:53:21 UTC
The system in question has a NVIDIA Corporation G84 [GeForce 8600 GT] (rev a1) graphics card installed.

Triggering suspend to memory works, but the resume fails and leaves the shift lock and scroll lock keyboard LEDs blinking.

This is a fresh install, no upgrade.

With the same hardware suspend to memory worked with openSUSE 13.1.

I've also tested kernel-vanilla 3.18.rc3.next.20141106-1.1.g2af2e06 and it shows the same defect as with openSUSE 13.2.

Via netconsole, with no_console_suspend set, I was able to capture an even more detailed trace.  Thanks to Takashi Iwai for his debugging hints.
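
For anyone reproducing the capture, here is a minimal sketch of a netconsole setup, assuming two machines on the same LAN. The IP addresses, the interface name eth0, the port numbers, and the helper name netconsole_param are placeholders for illustration, not values taken from this report.

```shell
# Build the netconsole module parameter, which has the form
#   netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
# netconsole_param is a hypothetical helper; all addresses are examples.
netconsole_param() {
    # $1 = local IP, $2 = local interface, $3 = receiver IP,
    # $4 = receiver MAC (optional; left empty if not given)
    printf 'netconsole=6665@%s/%s,6666@%s/%s\n' "$1" "$2" "$3" "${4:-}"
}

# On the machine under test (as root), booted with no_console_suspend
# on the kernel command line:
#   modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.1)"
# On the receiver, listen for UDP on port 6666 (netcat syntax varies by flavor):
#   nc -u -l 6666
```

The helper only assembles the parameter string, so it can be checked without loading the module.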

This was initially reported in the SUSE bug tracker, as it was first seen with the openSUSE 13.2 kernel-default.
https://bugzilla.suse.com/show_bug.cgi?id=904483
Comment 1 Ilia Mirkin 2014-11-10 15:03:36 UTC
(In reply to Lars Müller from comment #0)
> With the same hardware suspend to memory worked with openSUSE 13.1.

What kernel is that?
Comment 2 Lars Müller 2014-11-10 15:30:39 UTC
(In reply to Ilia Mirkin from comment #1)
> (In reply to Lars Müller from comment #0)
> > With the same hardware suspend to memory worked with openSUSE 13.1.
> 
> What kernel is that?

3.11.10
Comment 3 Lars Müller 2014-11-10 15:37:53 UTC
Created attachment 109224
kernel ooops trace (3.16.6)

Adding the missing kernel oops trace and more detailed kernel RPM information.

kernel-default-3.16.6-2.1.x86_64

Source Timestamp: 2014-10-20 15:47:22 +0200
GIT Revision: feb42eacae8d76252ab69a58d05a0be2cebd8a08
GIT Branch: openSUSE-13.2
Comment 4 Takashi Iwai 2014-11-10 15:45:19 UTC
In order to make things a bit more straight, could you retest S3 after doing:
  echo 1 > /sys/power/pm_async
?
Comment 5 Lars Müller 2014-11-10 15:57:10 UTC
Created attachment 109227
Retest S3 after calling: echo 1 > /sys/power/pm_async
Comment 6 Ilia Mirkin 2014-11-10 15:58:10 UTC
There was a change which fixed fbcon acceleration after resume -- before it effectively switched into software mode on resume, but that got fixed. Unfortunately if the GPU is unhappy on resume, that means that the console also hangs.

Specifically it was commit ecf24de071f4f6cea79ecef5d990794df5875ee1. Could you test a few kernels to see whether it's the same issue you're seeing?

e.g. does 3.14 work while 3.15 fails? If so, can you do a bisect to confirm (and if not, also do a bisect to figure out what broke it).

git bisect start v3.16 v3.13 -- drivers/gpu/drm/nouveau

should cover the whole range you mentioned (3.13 = good, 3.16 = bad).
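
The bisect loop that follows from the command above could look roughly like this. It is a sketch, assuming a Linux kernel git checkout that carries the v3.13 and v3.16 tags; start_nouveau_bisect and mark_result are hypothetical wrapper names, not git commands.

```shell
# Run these from the top of a kernel git checkout.
start_nouveau_bisect() {
    # Bad revision first, then good; only commits touching nouveau are considered.
    git bisect start v3.16 v3.13 -- drivers/gpu/drm/nouveau
}

mark_result() {
    # $1 = "good" if S3 resume worked on the kernel just tested, "bad" otherwise.
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: mark_result good|bad" >&2; return 2 ;;
    esac
}

# Typical round, repeated until git names the first bad commit:
#   1. build and install the revision git checked out, then reboot into it
#   2. suspend to RAM and resume
#   3. mark_result good    (or: mark_result bad)
# Afterwards:
#   git bisect log      # record of the decisions, useful to attach here
#   git bisect reset    # return to the original branch
```

Restricting the bisect to the driver path keeps the number of rounds small, but it would miss a regression introduced outside drivers/gpu/drm/nouveau.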

Separately, could you attach a full dmesg after boot? That should answer a number of basic questions about some of your setup's specifics. Additionally, how are your monitor(s) connected?
Comment 7 Takashi Iwai 2014-11-10 16:15:10 UTC
FYI, there are a few kernel packages available in OBS home:tiwai:kernel:3.13, home:tiwai:kernel:3.14, and home:tiwai:kernel:3.15 repos.

For 3.12, you can use OBS Kernel:SLE12.  Kernel:stable is for 3.17, and Kernel:HEAD is for 3.18-rc.
Comment 8 Takashi Iwai 2014-11-10 16:20:02 UTC
(In reply to Takashi Iwai from comment #4)
> In order to make things a bit more straight, could you retest S3 after doing:
>   echo 1 > /sys/power/pm_async
> ?

Doh, sorry, I wanted to see the result with *disabled* async PM.

Lars, please retest with

   echo 0 > /sys/power/pm_async

Certainly there is a GPU hang at PM, but I'd like to know whether it's the real culprit of the whole hang.
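
Put together, the retest could be scripted as below. This is a sketch that must run as root; the PM_ASYNC and PM_STATE variables and the function name s3_without_async are illustrative additions, there only so the paths can be overridden for a dry run.

```shell
# Default to the real sysfs files; override both variables to dry-run the script.
PM_ASYNC=${PM_ASYNC:-/sys/power/pm_async}
PM_STATE=${PM_STATE:-/sys/power/state}

s3_without_async() {
    # 0 = suspend/resume devices one after another instead of in parallel
    echo 0 > "$PM_ASYNC" || return 1
    # "mem" requests suspend to RAM (S3); the write returns after resume
    echo mem > "$PM_STATE"
}

# Usage (as root):
#   s3_without_async
#   dmesg > after-resume.txt    # capture the log once the system is back
```

The pm_async setting does not persist across reboots, so it has to be written again before each test.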
Comment 9 Lars Müller 2014-11-10 19:05:37 UTC
Created attachment 109236
Retest S3 after calling: echo 0 > /sys/power/pm_async

With /sys/power/pm_async set to 0 the system no longer crashes on resume.

I'm able to ssh into the system but x.org doesn't come back.  The two screens stay black.
Comment 10 Lars Müller 2014-11-10 19:14:05 UTC
Created attachment 109237
dmesg output as requested with comment #6
Comment 11 Takashi Iwai 2014-11-10 19:36:07 UTC
(In reply to Lars Müller from comment #9)
> Created attachment 109236
> Retest S3 after calling: echo 0 > /sys/power/pm_async
> 
> With /sys/power/pm_async set to 0 the system no longer crashes on resume.
> 
> I'm able to ssh into the system but x.org doesn't come back.  The two
> screens stay black.

OK, then I guess you're seeing two distinct issues.  One is the graphics broken by S3, and the other is the kernel panic due to the stall of disk PM.  Disabling async PM seems to cure the latter while the graphics remain broken.

Could you try to boot with the nomodeset boot option in runlevel 3, and try S3, but without the pm_async adjustment?  The graphics may be broken in this case after S3, but if the system stays alive, you should be able to log in remotely.

If S3 crashes even without nouveau, we can check the two issues above completely separately.  If the S3 crash happens only with nouveau, it implies that the disk problem is somehow related to the nouveau problem.
Comment 12 Lars Müller 2014-11-25 19:13:36 UTC
(In reply to Takashi Iwai from comment #7)
> FYI, there are a few kernel packages available in OBS
> home:tiwai:kernel:3.13, home:tiwai:kernel:3.14, and home:tiwai:kernel:3.15
> repos.

kernel                                  s2disk                 s2ram

kernel-default-3.12.33-2.1.g26c7845     suspend yes            resume works
                                        resume => fresh boot   display is dead

kernel-default-3.13.7-4.1.ga68bc7c      suspend yes            oops
                                        resume => fresh boot

kernel-default-3.14.6-1.1               suspend yes            oops
                                        resume => fresh boot

kernel-default-3.15.8-1.1               suspend yes            oops
                                        resume => fresh boot

kernel-default-3.16.6-2.1               suspend yes            oops
                                        resume => fresh boot

kernel-default-3.17.4-1.1.gd50009e      ok                     oops

kernel-default-3.18.rc4-3.1.g0521fb3    suspend yes            oops
                                        resume fails
                                        no net; no X
Comment 13 Lars Müller 2014-11-25 21:35:59 UTC
(In reply to Takashi Iwai from comment #11)
> 
> Could you try to boot with the nomodeset boot option in runlevel 3, and try
> S3, but without the pm_async adjustment?  The graphics may be broken in this
> case after S3, but if the system stays alive, you should be able to log in
> remotely.
> 
> If S3 crashes even without nouveau, we can check the two issues above
> completely separately.  If the S3 crash happens only with nouveau, it implies
> that the disk problem is somehow related to the nouveau problem.

It crashes even without nouveau when booted with nomodeset set.
Comment 14 Lars Müller 2014-12-12 16:31:32 UTC
Is more testing or additional feedback required from my side to move this bug forward?
Comment 15 Tobias Klausmann 2014-12-12 19:11:46 UTC
(In reply to Lars Müller from comment #14)
> Is more testing or additional feedback required from my side to move this
> bug forward?

You could try to boot and test with

modprobe.blacklist=nouveau to make sure the module isn't loaded.
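
After booting with that option, a quick check that the module really stayed out could look like this. It is a sketch; the MODULES_FILE variable and the function name nouveau_status are illustrative, and the variable exists only so the check can be tried against a sample file.

```shell
# /proc/modules lists one loaded module per line, with the module name first.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

nouveau_status() {
    if grep -q '^nouveau ' "$MODULES_FILE"; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Usage:
#   cat /proc/cmdline    # confirm modprobe.blacklist=nouveau was really passed
#   nouveau_status       # should print "not loaded" before testing S3
```

If this prints "loaded", the S3 result says nothing about whether the crash is independent of nouveau.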

If you really have time on your hands, try to bisect the problem: 3.11 as a good starting point and 3.13 or 3.14 as bad, but that will really eat time :)

