Bug 59168 - [nvc1/Quadro1000M|nvc3/Quadro2000M] graphic garbage/corruption/noise on resume
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assigned To: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 59858
Reported: 2013-01-09 14:36 UTC by michael.weirauch
Modified: 2014-12-08 01:27 UTC

Attachments
dmesg 3.8.0-rc2 gdm resume garbage/noise nvc3 nouveau.debug=trace (806.21 KB, text/plain)
2013-01-09 14:36 UTC, michael.weirauch
gdm 3.8.0-rc2 resume garbage/noise nvc3 (1.66 MB, image/jpeg)
2013-01-09 14:39 UTC, michael.weirauch
dmesg 3.8.0-rc5 resume garbage/noise nvc3 (275.01 KB, text/plain)
2013-02-01 10:31 UTC, michael.weirauch
dmesg 3.8.0-rc5 resume garbage/noise nvc3 (900.70 KB, text/plain)
2013-02-07 09:05 UTC, michael.weirauch
fix suspend bug in nvc0 fence implementation (1.74 KB, patch)
2013-02-19 19:53 UTC, Marcin Slusarz
dmesg 3.7.9, suspend / resume, garbage/noise NVidia GF106GLM [Quadro 2000M] (169.40 KB, text/plain)
2013-02-20 08:32 UTC, Petr Stastny
Complete dmesg from suspend/resume on 3.8.0 (246.51 KB, text/plain)
2013-02-26 16:14 UTC, Petr Stastny
kernel log with nouveau.debug=trace (90.21 KB, text/plain)
2013-03-12 11:49 UTC, Rolf Offermanns
Remove NVDEV_ENGINE_COPY1 from GF116 to mirror GF106 (842 bytes, text/plain)
2014-03-03 02:08 UTC, Laurence Lee

Description michael.weirauch 2013-01-09 14:36:39 UTC
Created attachment 72725
dmesg 3.8.0-rc2 gdm resume garbage/noise nvc3 nouveau.debug=trace

As a follow-up to bug 50121:

Steps to reproduce:
* suspend from gdm
* resume shows total garbage/noise in gdm login screen
* mouse cursor visible and movable (machine does not seem to be under load)
* switching to tty1 possible
* switching to gdm possible (still noise)
* restarting X freezes machine (hard reboot required)

Side Notes:
* I should note that an experimental "nouveau.config=DEVINIT=NvForcePost=1" leaves me with a black screen on boot. (no plymouth)
  (Dunno if related. If so, I can upload a dmesg as well.)
* A perhaps related issue (if not related to nouveau driver) from the Radeon driver is bug 57774 showing a similar garbage/noise on resume.

Setup:
ThinkPad W520 4276CTO NVC3 (2000M)
openSUSE 12.2 + updated gnome3.6/xorg/mesa/xorg/xf86-video-nouveau
$ zypper search -si --match-exact xorg-x11-driver-video-nouveau xorg-x11-server Mesa libdrm_nouveau2 kernel
S | Name                          | Type    | Version                          | Arch
--+-------------------------------+---------+----------------------------------+-------
i | Mesa                          | package | 9.0.1-202.2                      | x86_64
i | kernel                        | package | 3.8.0_rc2_1_desktop_nouveau01+-1 | x86_64
i | libdrm_nouveau2               | package | 2.4.40-99.1                      | x86_64
i | xorg-x11-driver-video-nouveau | package | 1.0.6-58.1                       | x86_64
i | xorg-x11-server               | package | 7.6_1.13.1-215.1                 | x86_64

dmesg-excerpt on resume:
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1396]]
nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC3/MP: 0x001beff2 0x0000000f
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1396]]
nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00005a0000 [PAGE_NOT_PRESENT] from PGRAPH/(unknown enum 0x00000010) on channel 0x007fd18000 [gnome-shell[1396]]
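
Editor's sketch: the excerpt above mixes PGRAPH and PFIFO errors, and it can help to count them per unit before comparing logs across kernel versions. The following is a minimal Python sketch, assuming the message layout seen in this report (`nouveau`, an optional level letter, a bracketed unit name, an optional bracketed PCI address). The names `NOUVEAU_LINE` and `errors_by_unit` are made up for illustration and are not part of nouveau or any tool.

```python
import re
from collections import Counter

# Matches nouveau kernel messages such as
#   nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
# The level letter and the PCI address are treated as optional (assumption
# based on the lines quoted in this report).
NOUVEAU_LINE = re.compile(
    r"nouveau\s+(?P<level>[A-Z]?)\[\s*(?P<unit>\w+)\]"
    r"(?:\[(?P<pci>[0-9a-fA-F:.]+)\])?\s*(?P<msg>.*)"
)


def errors_by_unit(lines):
    """Count error-level ('E') nouveau messages per hardware unit."""
    counts = Counter()
    for line in lines:
        match = NOUVEAU_LINE.search(line)
        if match and match.group("level") == "E":
            counts[match.group("unit")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1396]]",
        "nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC3/MP: 0x001beff2 0x0000000f",
        "nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00005a0000 [PAGE_NOT_PRESENT]",
    ]
    for unit, count in errors_by_unit(sample).items():
        print(unit, count)
```

Because the pattern is applied with `search`, lines that carry a leading dmesg timestamp or a syslog prefix should be handled the same way.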
Comment 1 michael.weirauch 2013-01-09 14:39:27 UTC
Created attachment 72726
gdm 3.8.0-rc2  resume garbage/noise nvc3
Comment 2 michael.weirauch 2013-01-17 08:59:47 UTC
Issue still present on 3.8.0_rc3 nouveau as of 2013-01-17 and updated drm:

S | Name                          | Type    | Version                          | Arch  
--+-------------------------------+---------+----------------------------------+-------
i | Mesa                          | package | 9.0.1-202.7                      | x86_64
i | kernel                        | package | 3.8.0_rc3_1_desktop_nouveau01+-7 | x86_64
i | libdrm_nouveau2               | package | 2.4.41-105.1                     | x86_64
i | xorg-x11-driver-video-nouveau | package | 1.0.6-58.2                       | x86_64
i | xorg-x11-server               | package | 7.6_1.13.1-218.5                 | x86_64

Does anybody have an idea about this issue, or some hints on what I should try?

Btw, this is with the ThinkPad running closed in the dock and external monitor attached via DVI to DP-3. Opening the lid will not turn on the laptop panel and will also disable the external signal to DP-3. This is another story I think, though.

$ xrandr | grep "connect"
LVDS-1 unknown connection (normal left inverted right x axis y axis)
VGA-1 disconnected (normal left inverted right x axis y axis)
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 518mm x 324mm
Comment 3 michael.weirauch 2013-02-01 10:31:57 UTC
Created attachment 74037
dmesg 3.8.0-rc5 resume garbage/noise nvc3

Still the same on 3.8.0-rc5.

dmesg excerpt:
[  124.859583] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1472]]
[  124.859603] nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC2/MP: 0x001beff2 0x0000000f
[  124.860408] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1472]]
[  124.860414] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
[  125.169451] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1472]]
[  125.169460] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
[  125.169536] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1472]]
[  125.169544] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e

Components:
Name                          | Type    | Version
------------------------------+---------+-----------------------------------
Mesa                          | package | 9.0.2-210.1
kernel                        | package | 3.8.0_rc5_1_desktop_nouveau01+-15
libdrm_nouveau2               | package | 2.4.41-105.1
xorg-x11-driver-video-nouveau | package | 1.0.6+git@2013-01-29
xorg-x11-server               | package | 7.6_1.13.2-223.1
Comment 4 michael.weirauch 2013-02-01 10:38:12 UTC
*** Bug 59858 has been marked as a duplicate of this bug. ***
Comment 5 michael.weirauch 2013-02-07 09:05:44 UTC
Created attachment 74325
dmesg 3.8.0-rc5 resume garbage/noise nvc3

Regular report with updated companion components:

* Still same garbage as before. System luckily didn't lock up when killing X.
* Can use system after restarting X.
* Garbage/Noise switched to white after "Tab"ing and moving mouse a bit on gdm screen.

dmesg excerpt:
[  197.939969] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1476]]
[  197.939992] nouveau E[  PGRAPH][0000:01:00.0] GPC0/TPC1/MP: 0x001beff2 0x0000000f
[  197.940851] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1476]]
[  197.940858] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
[  197.952710] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1476]]
[  197.952722] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e
[  197.952805] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x007fd18000 gnome-shell[1476]]
[  197.952816] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa204021e

Components:
Name                          | Type    | Version
------------------------------+---------+-----------------------------------
Mesa                          | package | 9.0.98.82-221.1
kernel                        | package | 3.8.0_rc5_1_desktop_nouveau01+-15
libdrm_nouveau2               | package | 2.4.42-109.1
xorg-x11-driver-video-nouveau | package | 1.0.6+git@2013-02-07
xorg-x11-server               | package | 7.6_1.13.99.901-226.1
Comment 6 Tom Callaway 2013-02-07 16:44:14 UTC
For what it is worth, I can reproduce this as well on Fedora 18, ThinkPad W520 4270CTO - NVIDIA Corporation GF108GLM [Quadro 1000M]
Comment 7 michael.weirauch 2013-02-13 11:52:09 UTC
Still present on nouveau 3.8.0-rc7 as of 2013-02-13.

Name                          | Type    | Version
------------------------------+---------+----------------------------------
Mesa                          | package | 9.0.98.83-222.1
kernel                        | package | 3.8.0_rc7_1_desktop_nouveau01+-19
libdrm_nouveau2               | package | 2.4.42-109.1
xorg-x11-driver-video-nouveau | package | 1.0.6-58.5
xorg-x11-server               | package | 7.6_1.13.99.901-226.2
Comment 8 Andrew Meredith 2013-02-13 12:09:10 UTC
I have switched to using the proprietary nvidia drivers as this issue was too debilitating for daily use. I can confirm:

1 - Both suspend and resume are now flawless.

2 - The issue I had been having with gnome-shell repeatedly crashing has now apparently stopped.

Sorry to be the bearer of bad news.

Dell Latitude 6530
kernel-3.7.6-201.fc18.x86_64
xorg-x11-drv-nouveau-1.0.6-1.fc18.x86_64
mesa-dri-drivers-9.0.1-4.fc18.i686
Comment 9 Petr Stastny 2013-02-19 10:17:53 UTC
I am experiencing the same problem on my W520 with Nvidia GF106 [Quadro 2000M]. I get the same symptoms / messages.

I noticed that suspend / resume works if using libdrm-nouveau1a only (without libdrm-nouveau2); however, the mouse cursor gets lost on resume. If using libdrm-nouveau2, I get the problems. Maybe the problem is somehow related to dri2?
Comment 10 Ronald 2013-02-19 10:22:29 UTC
Michael Weirauch,

I noticed your witty comment on bug #50121 and thought that I should report my debugging findings on issues like these. I have a comparable bug with my 7300GT (NV4B) at bug #23223. The card suspends fine, but resumes with a lot of garbage.

- The two cards are not the same; in fact, the differences are probably huge.
- I'm not a developer or expert in any of this.
I'm just sharing my knowledge here so you can hopefully help the developers do their magic.

First:
- Marcin Slusarz mentions a script limiter in bug #23223 comment 18. Maybe you can try it. I report my results in comment 19 of that same bug. Please note the off-by-one error mentioned in comment 20, which you should take into account if you start using it. But it works fine.

This method probably helps you, as "nouveau.config=DEVINIT=NvForcePost=1" also gave garbage in your case. Which means that these scripts are doing something wrong or unexpected/unknown.

Do this with the latest git kernel plus the latest nouveau tree.

Second:
- Comment 8 in this bug mentioned that switching to the proprietary driver fixed his issue. This is not bad news per se, as it gives hope of gaining more intel by doing an mmiotrace of a working state:

http://nouveau.freedesktop.org/wiki/MmioTrace

Also check bug #23223 on my findings while doing this. It's best to not use X or Wayland, but only to enable udev. You have to use mmiotrace across a suspend/resume cycle. I think without using X is the best way to do it. Try to compress the resulting file with 'xz --best'. I did that, and was able to upload it to the bug itself.

Do this with the latest git kernel plus the latest nouveau tree.
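
Editor's sketch: for the compression step mentioned above, the same result as `xz --best` can be obtained from Python's standard `lzma` module, since `--best` is an alias for preset 9. This is only a minimal sketch; the function name `compress_trace` and the file names in the usage line are hypothetical.

```python
import lzma
import shutil


def compress_trace(src_path, dst_path):
    """Write an .xz copy of src_path at preset 9 (the level `xz --best` selects)."""
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb", preset=9) as dst:
        # Stream in chunks so that large trace files need not fit in memory.
        shutil.copyfileobj(src, dst)
```

Usage, with placeholder file names: `compress_trace("mmiotrace.log", "mmiotrace.log.xz")`.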

Third and final:
- I made a VBIOS dump to aid in debugging, which was also requested by Marcin Slusarz.

https://bugs.freedesktop.org/show_bug.cgi?id=23223#c14

You have to use a v3.6(.9) kernel to use the mechanism I used, since that piece of infrastructure was not ported to later kernel versions during the big rework in v3.7. It is also said that you can retrieve a VBIOS from an mmiotrace, but I'm not sure, so it might be a good idea to post it separately.

The above procedures should keep you occupied on a nice rainy Sunday afternoon. If you are going to do this, please mention it on IRC. Someone might get interested, as more information has become available.
Comment 11 Petr Stastny 2013-02-19 16:08:48 UTC
Suspend/resume is working on my w520 with NV30 using kernel 3.4.32!

After trying many different vanilla kernel versions (3.2.x/3.4.x/3.6.x/3.7.x/3.8.x) I finally found out that with 3.4.32 suspend/resume works flawlessly!

As a side note: starting with the kernel versions 3.2.x up to 3.4.32 there is a dedicated nv30x.c code (along with the nv40) in the kernel. Starting with 3.7 the code file for nv30 disappeared, however, the nv40 is still present.

This is somewhat mysterious to me, as the nv30 family is a bit different from nv40. Does anybody have a clue about this?
Comment 12 A. Bikadorov 2013-02-19 17:27:42 UTC
same problem here, really annoying.

NVC1 (GT540M)
kernel 3.7.7-1-ARCH

mesa 9.0.2-1
nouveau-dri 9.0.2-1
xf86-video-nouveau 1.0.6-1
xorg-server-1.13.2-1
Comment 13 Marcin Slusarz 2013-02-19 19:53:02 UTC
Created attachment 75129
fix suspend bug in nvc0 fence implementation

Guys, try this patch by Maarten Lankhorst.
Comment 14 Petr Stastny 2013-02-19 20:59:49 UTC
Marcin, do you think the patch could also help on nv30?
Comment 15 Marcin Slusarz 2013-02-19 21:09:13 UTC
GF106 is nvc3, not nv30 - no idea where you got that from...
so: no, it won't help for nv30, and yes, it will help for nvc3
Comment 16 Petr Stastny 2013-02-19 21:14:07 UTC
My post #11 is about my NV30 - so I thought that it may help too.

What can I do about my NV30? Do you have some suggestions / recommendations?
Comment 17 Marcin Slusarz 2013-02-19 21:37:55 UTC
You cannot have nv30 card (produced ~8 years ago) in 1-2 years old laptop. That would be insane. If "w520" (which I assume is ThinkPad W520) was mentioned as a mistake, then please open new bug report.

http://nouveau.freedesktop.org/wiki/Bugs
http://nouveau.freedesktop.org/wiki/CodeNames
Comment 18 Petr Stastny 2013-02-19 21:58:27 UTC
Ok, that must be my mistake. lspci says:

01:00.0 VGA compatible controller: NVIDIA Corporation GF106GLM [Quadro 2000M] (rev a1)

yes, I have a Thinkpad W520

(In reply to comment #17)
> You cannot have nv30 card (produced ~8 years ago) in 1-2 years old laptop.
> That would be insane. If "w520" (which I assume is ThinkPad W520) was
> mentioned as a mistake, then please open new bug report.
> 
> http://nouveau.freedesktop.org/wiki/Bugs
> http://nouveau.freedesktop.org/wiki/CodeNames
Comment 19 Petr Stastny 2013-02-20 08:32:47 UTC
Created attachment 75164
dmesg 3.7.9, suspend / resume, garbage/noise NVidia GF106GLM [Quadro 2000M]
Comment 20 Petr Stastny 2013-02-20 08:36:27 UTC
Comment on attachment 75164
dmesg 3.7.9, suspend / resume, garbage/noise NVidia GF106GLM [Quadro 2000M]

I patched kernel 3.7.9 with the patch recommended by Marcin yesterday and tried the suspend / resume cycle with the patched kernel. Unfortunately, the garbage is still there and the error messages

[  100.755954] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fce1000]
[  100.758599] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e

keep on repeating until I stop X. One difference though - the error messages seem to repeat faster now than before the patch..

Any ideas?
Comment 21 Legitimate 2013-02-21 07:09:34 UTC
Same problem occurs for me, yet another person, on resume (ArchLinux)

01:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti] (rev a1)

- linux 3.7.9-1
- xorg-server 1.13.2-1
- mesa 9.0.2-1
- nouveau-dri 9.0.2-1
- xf86-video-nouveau 1.0.6-1
- gnome 3.6
Comment 22 Ryan Turner 2013-02-21 21:03:32 UTC
I'm seeing what I think is the same problem on my Thinkpad T530.

When I resume the system, xscreensaver's lock screen always seems to look and behave normally, but when I unlock the screen, either the whole screen is corrupted, or all the previously opened windows will be drawn incorrectly. I can even lock the screen again and xscreensaver still looks fine. When unlocked, the system can still be used semi-blindly, and if I can feel my way to a terminal, I can start some SDL and OpenGL apps that work fine, while newly-opened GTK/etc apps are corrupted. Logging out or otherwise restarting X gets everything back to normal.

Much like others that have tested, the proprietary nvidia driver works fine.

dmesg:
 [85334.271612] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x001fcfa000]
 [85334.271621] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
 *repeats numerous times*

lspci: (Thought I would chime in as I have a chip that hasn't been mentioned yet.)
 01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [Quadro NVS 5400M] (rev a1)

Packages: (Gentoo/~amd64) CFLAGS="-O2 -pipe -march=native"
 gentoo-sources      3.7.8    USE="-build -deblob -symlink"
 mesa                9.1_rc2  USE="classic egl gallium gles1 gles2 llvm nptl openvg osmesa pax_kernel shared-glapi xa xorg xvmc -bindist -debug -gbm -pic -r600-llvm-compiler (-selinux) -vdpau -wayland"
 libdrm              2.4.42   USE="libkms -static-libs"
 xorg-server         1.13.2   USE="ipv6 kdrive nptl suid udev xorg -dmx -doc -minimal (-selinux) -static-libs -tslib -xnest -xvfb"
 xf86-video-nouveau  1.0.6
Comment 23 Tom Callaway 2013-02-22 01:06:54 UTC
(In reply to comment #13)
> Created attachment 75129
> fix suspend bug in nvc0 fence implementation
> 
> Guys, try this patch by Maarten Lankhorst.

Tried that patch with git head 3.9, suspend still comes back to garbage + mouse cursor, dmesg logs are filled with this, repeating endlessly:

[  302.118755] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  302.119858] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  302.120869] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  302.121951] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  302.122961] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  302.126800] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  302.127434] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  302.128113] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  302.128726] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.138071] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.145020] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  314.146127] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.147142] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  314.148232] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.149266] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.153112] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.153780] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  314.154518] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.155163] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.155873] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.156526] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.158712] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.159432] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  314.160207] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.160919] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.160995] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.161001] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.164003] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.164622] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  314.165791] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.166415] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.167099] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.167713] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  314.168395] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  314.169017] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
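
Editor's sketch: a log that repeats like the one above is easier to compare between kernels once the timestamps are dropped and identical messages are tallied. The following minimal Python sketch does that, assuming the standard dmesg `[seconds.microseconds]` prefix; `TIMESTAMP` and `summarize` are illustrative names only.

```python
import re
from collections import Counter

# Leading dmesg timestamp, e.g. "[  302.118755] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize(log_text):
    """Return (message, count) pairs, most frequent first, ignoring timestamps."""
    counts = Counter(
        TIMESTAMP.sub("", line).strip()
        for line in log_text.splitlines()
        if line.strip()
    )
    return counts.most_common()


if __name__ == "__main__":
    excerpt = """\
[  302.118755] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  302.119858] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000]
[  302.120869] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040a1e
[  302.122961] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
"""
    for message, count in summarize(excerpt):
        print(count, message)
```

Applied to the full excerpt, this should reduce the output to three distinct messages: the TRAP line and the two SHADER status values, 0xa0040a1e and 0xa004021e.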
Comment 24 Petr Stastny 2013-02-22 07:59:14 UTC
I compiled the nouveau kernel yesterday in the hope that the newest nouveau source would fix the suspend / resume issue and help with hibernation. Nevertheless, it does not help.
It is even a bit worse, since normal work is impossible with that kernel: the card seems to be very slow and locks up.

However, I was able to use 3.4.32 for normal suspend / resume without a glitch for the last 5 days (resuming from hibernation results in a black screen, though).

So it seems to me that the card was already working but stopped working, probably in the course of the 3.5.x refactoring.
Comment 25 Petr Stastny 2013-02-22 12:05:36 UTC
I was digging in the nouveau code a bit, looking for where the endlessly repeating messages come from:

[   66.090322] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fce1000]
[   66.090358] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[   66.090457] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fce1000]
[   66.090485] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e

It looks to me like something should be initialized but is not.

I got to the source code file nouveau/core/engine/graph/nvc0.c:

        switch (nvc0_graph_class(priv)) {
        case 0x9097:
                nv_engine(priv)->sclass = nvc0_graph_sclass;
                break;
        case 0x9197:
                nv_engine(priv)->sclass = nvc1_graph_sclass;
                break;
        case 0x9297:
                nv_engine(priv)->sclass = nvc8_graph_sclass;
                break;
        }
What I am curious about: I do not see any case for nvc3 here. Is this maybe the problem, or is nvc3 already handled by one of the three cases listed here?

When I find some time, I could debug this more. Can somebody point me in the right direction? Is there any documentation for the implementation? I would like to fix the problem :)

BTW: the code above is from 3.8-rc7
Comment 26 Petr Stastny 2013-02-26 14:15:55 UTC
Just compiled 3.8.0 and tested suspend/resume. Although resume is still not working, there are some changes. The error messages no longer repeat endlessly; instead I get:

[   97.941502] PM: Finishing wakeup.
[   97.942484] usb 2-1.4: USB disconnect, device number 3
[   97.941503] Restarting tasks ... done.
[   97.942949] video LNXVIDEO:01: Restoring backlight state
[   97.943075] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.943120] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.944405] cdc_ncm 2-1.4:1.6 wwan0: unregister 'cdc_ncm' usb-0000:00:1d.0-1.4, Mobile Broadband Network Device
[   97.944885] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.944897] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.944976] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.944985] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.945063] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.945071] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.945155] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.945164] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.945237] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.945246] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   97.945429] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x007fe00000]
[   97.945435] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa0040800
[   98.168304] usb 2-1.4: new high-speed USB device number 4 using ehci-pci
[   98.303689] cdc_acm 2-1.4:1.1: ttyACM0: USB ACM device
[   98.307171] cdc_acm 2-1.4:1.3: ttyACM1: USB ACM device
[   98.315544] cdc_wdm 2-1.4:1.5: cdc-wdm0: USB WDM device
[   98.331090] usb 2-1.4: MAC-Address: 02:80:37:ec:02:00
[   98.331590] cdc_ncm 2-1.4:1.6 wwan0: register 'cdc_ncm' at usb-0000:00:1d.0-1.4, Mobile Broadband Network Device, 02:80:37:ec:02:00
[   98.332666] cdc_wdm 2-1.4:1.8: cdc-wdm1: USB WDM device
[   98.333153] cdc_acm 2-1.4:1.9: ttyACM2: USB ACM device
[   98.500834] e1000e 0000:00:19.0: irq 53 for MSI/MSI-X
[   98.603498] e1000e 0000:00:19.0: irq 53 for MSI/MSI-X
[   98.604001] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   99.289242] IPv6: ADDRCONF(NETDEV_UP): wwan0: link is not ready
[   99.291621] cdc_ncm: wwan0: network connection: disconnected
[  104.358920] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[  104.359024] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  134.407906] nouveau E[    1268] failed to idle channel 0xcccc0000

Notice the "failed to idle channel" message.
Restarting X after this does not work: nouveau complains about some non-existent pages. I do not have the dmesg for this now; I will try to get it later.

Coming from hibernate I get:

[  365.858983] nouveau E[   PDISP][0000:01:00.0][0xc000857b][ffff88022c468e00] timeout1: 0x00000000
[  365.858985] nouveau E[   PDISP][0000:01:00.0][0xc000857b][ffff88022c468e00] init failed, -16
[  365.858993] nouveau E[     DRM] 0xdddddddd:0xd1500000 init failed with -16
[  365.859383] nouveau E[     DRM] 0xffffffff:0xdddddddd init failed with -16
[  365.859682] nouveau E[     DRM] 0xffffffff:0xffffffff init failed with -16
[  365.859695] nouveau  [   VBIOS][0000:01:00.0] running init tables
[  365.865846] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[  367.937496] nouveau  [     DRM] resuming display...

Should I go and try with the current nouveau from git?
Comment 27 Petr Stastny 2013-02-26 16:14:13 UTC
Created attachment 75581
Complete dmesg from suspend/resume on 3.8.0
Comment 28 Petr Stastny 2013-02-27 09:58:11 UTC
Just found out that suspend and resume works on vanilla 3.7.9 and 3.8 when nouveau.noaccel=1 is set.

So it seems the problem with the resume is directly related to the acceleration..

When hibernating on 3.7.9 the computer reboots automatically, hibernate on 3.8 works, but resuming from hibernation results in a black screen..
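
Editor's sketch: when comparing results between machines, it helps to confirm which nouveau options a kernel was actually booted with. The following minimal Python sketch pulls `nouveau.*` options out of a kernel command line string such as the contents of /proc/cmdline. The function name `nouveau_params` is made up for illustration, and options set through modprobe configuration rather than the boot line would not show up here.

```python
def nouveau_params(cmdline):
    """Extract nouveau.* options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("nouveau."):
            # Split on the first "=" only, so that values such as
            # "DEVINIT=NvForcePost=1" stay intact.
            name, _, value = token[len("nouveau."):].partition("=")
            params[name] = value
    return params


if __name__ == "__main__":
    # On a running system the string could be read from /proc/cmdline;
    # a fixed example line is used here so the sketch runs anywhere.
    example = "root=/dev/sda2 quiet nouveau.noaccel=1 nouveau.debug=trace"
    print(nouveau_params(example))
```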
Comment 29 Rolf Offermanns 2013-03-12 07:54:42 UTC
My results from 3.9.0rc1:
Suspend
-------
Mar 11 18:07:46 rof-lap kernel: [  222.304276] nouveau  [     DRM] suspending fbcon...
Mar 11 18:07:46 rof-lap kernel: [  222.304298] nouveau  [     DRM] suspending display...
Mar 11 18:07:46 rof-lap kernel: [  222.304310] nouveau  [     DRM] unpinning framebuffer(s)...
Mar 11 18:07:46 rof-lap kernel: [  222.304409] nouveau  [     DRM] evicting buffers...
Mar 11 18:07:46 rof-lap kernel: [  222.517459] sd 0:0:0:0: [sda] Stopping disk
Mar 11 18:07:46 rof-lap kernel: [  222.652940] nouveau  [     DRM] suspending client object trees...
Mar 11 18:07:46 rof-lap kernel: [  222.653078] nouveau W[   PFIFO][0000:01:00.0] INTR 0x00000001: 0x00000004
Mar 11 18:07:46 rof-lap kernel: [  222.653254] nouveau W[   PFIFO][0000:01:00.0] INTR 0x00000001: 0x00000004
[...]

Resume
------
Mar 11 18:07:46 rof-lap kernel: [  224.980347] nouveau  [     DRM] re-enabling device...
Mar 11 18:07:46 rof-lap kernel: [  224.980367] nouveau  [     DRM] resuming client object trees...
Mar 11 18:07:46 rof-lap kernel: [  224.980374] nouveau  [   VBIOS][0000:01:00.0] running init tables
Mar 11 18:07:46 rof-lap kernel: [  225.167093] nouveau  [  PTHERM][0000:01:00.0] programmed thresholds [ 90(3), 95(3), 105(5), 135(5) ]
Mar 11 18:07:46 rof-lap kernel: [  225.168112] nouveau  [     DRM] resuming display...
[...]
Mar 11 18:07:47 rof-lap kernel: [  227.558681] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.558690] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.559484] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.559491] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.559645] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.559651] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.559795] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.559801] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.559946] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.559952] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.560097] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.560102] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.560247] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.560253] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.560403] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.560409] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.561045] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
Mar 11 18:07:47 rof-lap kernel: [  227.561052] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Mar 11 18:07:47 rof-lap kernel: [  227.561201] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x003fe10000 X[920]]
:
[...]


Restarting Xorg resolves the problem until next suspend.

01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev a1)
Comment 30 Rolf Offermanns 2013-03-12 11:49:13 UTC
Created attachment 76393
kernel log with nouveau.debug=trace

Boot -> suspend -> resume -> restart Xorg
Comment 31 Bogdan Rădulescu 2013-03-20 09:57:24 UTC
Also happens here with Linux 3.8.3. It used to happen with all the 3.7 versions I tested.

The card as reported by lspci is:
01:00.0 VGA compatible controller: nVidia Corporation GF108 [GeForce GT 540M] (rev a1)
        Subsystem: Sony Corporation Device 9089
        Kernel driver in use: nouveau

My messages in dmesg are:
[  487.986709] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 3 [0x003fb8c000]
[  487.986718] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa2040a04

Hope this nouveau bug will get fixed soon as it's really annoying.
Comment 32 rcoe 2013-03-29 14:30:47 UTC
I have this problem also. After suspend/resume, Firefox, gvim, and other graphical applications do not display: the window shows only its border, with no graphics inside.

I tried the patch without success.
The workaround 'nouveau.noaccel=1' makes suspend/resume work. With it, some screen corruption I had seen in a graphical application also no longer occurs.

lspci 
01:00.0 VGA compatible controller: NVIDIA Corporation Device 0dfc (rev a1)
01:00.0 0300: 10de:0dfc (rev a1)

[  237.125651] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0c1e00a1
[  237.125659] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF108 (NVC1)
[  237.125664] nouveau  [  DEVICE][0000:01:00.0] Family : NVC0

libdrm_nouveau2-2.4.42-1.1.1.x86_64
xorg-x11-driver-video-nouveau-1.0.6-2.1.1.x86_64
kernel 3.7.10
Comment 33 michael.weirauch 2013-04-03 07:02:23 UTC
Still present on nouveau 3.9.0-rc4 (master@git) as of 2013-04-03.

Name                          | Type    | Version                         
------------------------------+---------+---------------------------------
Mesa                          | package | 9.1.1-247.1                     
kernel                        | package | 3.9.0_rc4_1_desktop_nouveau01+-3
libdrm_nouveau2               | package | 2.4.43-118.1                    
xorg-x11-driver-video-nouveau | package | 1.0.7@git-2013-04-03
xorg-x11-server               | package | 7.6_1.14.0-243.8         

@Marcin Slusarz, if you are reading this:

Do you think it's worthwhile trying your script limiter patch from bug 23223 comment 18 mentioned by Ronald in conjunction with "nouveau.config=DEVINIT=NvForcePost=1"?
Comment 34 michael.weirauch 2013-04-04 12:12:39 UTC
Tested the bios script limiting patch a bit:

Plain excerpt:
[    2.601923] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[    2.684360] nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
[    2.684362] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[    2.684476] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
[    2.684478] nouveau  [   VBIOS][0000:01:00.0] version 70.06.33.00.04
[    2.684480] nouveau D[   VBIOS][0000:01:00.0] created
[    2.684665] nouveau D[   VBIOS][0000:01:00.0] reset
[    2.684666] nouveau D[   VBIOS][0000:01:00.0] initialised
[    2.684671] nouveau  [   VBIOS][0000:01:00.0] executing script 0, offset: 54347
[    2.704681] nouveau  [   VBIOS][0000:01:00.0] executing script 1, offset: 56127
[    2.704701] nouveau  [   VBIOS][0000:01:00.0] executing script 2, offset: 61021
[    2.704702] nouveau  [   VBIOS][0000:01:00.0] executing script 3, offset: 61031
[    2.704705] nouveau  [   VBIOS][0000:01:00.0] executing script 4, offset: 61500
[    2.704706] nouveau  [   VBIOS][0000:01:00.0] executing special script, offset: 61601


Booting with nouveau.config=DEVINIG=NvForcePost=1 and nouveau.minscript=0, and with maxscript set in turn to each of [999,4,3,2,1,0], I always get a black screen where the plymouth dm_crypt unlock passphrase splash should appear after loading the initrd.
System seems "fine", though. (Can't access the box via ssh at that stage. No network set up.)
Comment 35 michael.weirauch 2013-05-31 13:58:04 UTC
There might be good news ahead, at least for me.

Since the rebase on 3.10-rc2 some days ago (2013-05-24) I can suspend and resume fine from within a gnome-session and gdm-login. No graphics distortion whatsoever.

A while before that, I had downgraded to the stable X11/Mesa repo instead of the bleeding-edge git versions. That shouldn't be what made the difference, because I had only switched to the git variants in order to see their effect on the "bug" here.

Name                          | Type    | Version                    
------------------------------+---------+----------------------------
Mesa                          | package | 9.1.3-240.1                
kernel                        | package | 3.10.0_rc2_2.24_desktop+-22
libdrm_nouveau2               | package | 2.4.45-110.1               
xorg-x11-driver-video-nouveau | package | 1.0.7-60.3                 
xorg-x11-server               | package | 7.6_1.14.1-234.3           


It's almost frightening: I can now safely take the laptop (ThinkPad W520) out of the dock (it would usually freeze with high system load before), and the display (LVDS-1) turns on automagically. On putting it back in, it switches back to the DVI-attached LCD and the ThinkPad display turns off.

Haven't tested behaviour on taking out the laptop with closed lid and opening afterwards, though.

Can somebody confirm that their issues are gone, too?
Comment 36 Thomas H.P. Andersen 2013-05-31 21:51:39 UTC
Confirming that suspend works on Quadro 1000M.

This is on current rawhide (kernel 3.10.0-0.rc3)
Comment 37 michael.weirauch 2013-06-04 09:59:51 UTC
Still works with recent 3.10-rc4 merge on nouveau-master.

Just to answer myself: taking the ThinkPad out of the dock with the lid closed and opening it afterwards also works as expected.

Many other issues seem to be gone now as well, in addition to this resume blocker.
Comment 38 michael.weirauch 2013-08-19 06:45:05 UTC
I'd consider this bug resolved/fixed for some time now.
Comment 39 Laurence Lee 2013-09-03 01:05:02 UTC
I hate to resurrect a zombie thread, but I am still affected by this suspend/resume issue. My system also reports "SHADER 0xa004021e" as a common point of failure.

On Fedora 18 (kernel 3.10.10-100.fc18.x86_64), my screen resumes from hibernation with garbage on-screen as described in this ticket, and messages like these are repeated in the logs:

Sep  2 14:23:38 ufo-laptop kernel: [   66.074252] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 1 [0x005fbb1000 X[846]]
Sep  2 14:23:38 ufo-laptop kernel: [   66.074264] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e


lspci -nnv:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1251] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: CLEVO/KAPOK Computer Device [1558:5102]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
	Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Memory at f0000000 (64-bit, prefetchable) [size=64M]
	I/O ports at e000 [size=128]
	Expansion ROM at f6000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau


The only thing that works at the moment is setting nouveau.noaccel=1. Setting nouveau.config=DEVINIG=NvForcePost=1 causes bootup to Plymouth with a black, unlit screen.

I would really like to have a long-term solution to this, and am willing to act upon any suggestions or patches you may have to reach that goal. Thanks!
Comment 40 Ilia Mirkin 2013-09-03 01:19:54 UTC
Try and see if 3.11 helps, a bunch of init stuff was changed for nvcx. If not, please attach the relevant logs and reopen the bug. However if it's sufficiently different from the original, it may be less confusing to just open a fresh one.
Comment 41 Doug Brunner 2014-01-09 09:55:49 UTC
Unfortunately still present in 3.11. On Ubuntu Saucy with 3.11.0-15-generic on x86_64, suspend/resume results in the same corrupted screen. I see vague shapes of windows that can be moved with alt-drag, but most visual elements including text are absent. Console and syslog have messages:

Jan  9 00:28:24 codex kernel: [   80.883153] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Jan  9 00:28:24 codex kernel: [   80.883195] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]

repeating ad infinitum until I log into a console and restart lightdm (and thus X server). The machine then operates normally (can use X as expected) until the next reboot. Setting noaccel=1 works around the problem, but obviously with reduced graphics performance.

I also tried kernel 3.12 from the Ubuntu kernel PPA - same behavior.

From lspci -nnv:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF116M [GeForce GT 560M] [10de:1251] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: CLEVO/KAPOK Computer Device [1558:7100]
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
        Memory at e8000000 (64-bit, prefetchable) [size=128M]
        Memory at f0000000 (64-bit, prefetchable) [size=64M]
        I/O ports at e000 [size=128]
        Expansion ROM at f6000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau

From grep nouveau /var/log/syslog:
Jan  9 00:27:19 codex kernel: [   18.225269] fb: conflicting fb hw usage nouveaufb vs simple - removing generic driver
Jan  9 00:27:19 codex kernel: [   18.227186] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0cf880a1
Jan  9 00:27:19 codex kernel: [   18.227189] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF116 (NVCF)
Jan  9 00:27:19 codex kernel: [   18.227192] nouveau  [  DEVICE][0000:01:00.0] Family : NVC0
Jan  9 00:27:19 codex kernel: [   18.231456] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
Jan  9 00:27:20 codex kernel: [   18.334067] nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
Jan  9 00:27:20 codex kernel: [   18.334071] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
Jan  9 00:27:20 codex kernel: [   18.334222] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
Jan  9 00:27:20 codex kernel: [   18.334227] nouveau  [   VBIOS][0000:01:00.0] version 70.26.29.00.06
Jan  9 00:27:20 codex kernel: [   18.358154] nouveau  [     MXM][0000:01:00.0] BIOS version 3.0
Jan  9 00:27:20 codex kernel: [   18.361036] nouveau  [     MXM][0000:01:00.0] MXMS Version 3.0
Jan  9 00:27:20 codex kernel: [   18.361084] nouveau  [     PFB][0000:01:00.0] RAM type: GDDR5
Jan  9 00:27:20 codex kernel: [   18.361087] nouveau  [     PFB][0000:01:00.0] RAM size: 1536 MiB
Jan  9 00:27:20 codex kernel: [   18.361089] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
Jan  9 00:27:20 codex kernel: [   18.402251] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
Jan  9 00:27:20 codex kernel: [   18.402262] nouveau  [  PTHERM][0000:01:00.0] fan management: disabled
Jan  9 00:27:20 codex kernel: [   18.402268] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
Jan  9 00:27:20 codex kernel: [   18.438029] nouveau  [     DRM] VRAM: 1536 MiB
Jan  9 00:27:20 codex kernel: [   18.438030] nouveau  [     DRM] GART: 1048576 MiB
Jan  9 00:27:20 codex kernel: [   18.438034] nouveau  [     DRM] TMDS table version 2.0
Jan  9 00:27:20 codex kernel: [   18.438036] nouveau  [     DRM] DCB version 4.0
Jan  9 00:27:20 codex kernel: [   18.438038] nouveau  [     DRM] DCB outp 00: 01000313 00010034
Jan  9 00:27:20 codex kernel: [   18.438040] nouveau  [     DRM] DCB outp 07: 08013382 00020030
Jan  9 00:27:20 codex kernel: [   18.438041] nouveau  [     DRM] DCB outp 08: 040383b6 0f220014
Jan  9 00:27:20 codex kernel: [   18.438043] nouveau  [     DRM] DCB outp 11: 02027362 00020010
Jan  9 00:27:20 codex kernel: [   18.438044] nouveau  [     DRM] DCB outp 13: 02013380 00000000
Jan  9 00:27:20 codex kernel: [   18.438046] nouveau  [     DRM] DCB conn 00: 00000040
Jan  9 00:27:20 codex kernel: [   18.438047] nouveau  [     DRM] DCB conn 01: 00001161
Jan  9 00:27:20 codex kernel: [   18.438049] nouveau  [     DRM] DCB conn 02: 00001231
Jan  9 00:27:20 codex kernel: [   18.438050] nouveau  [     DRM] DCB conn 03: 01000330
Jan  9 00:27:20 codex kernel: [   18.438052] nouveau  [     DRM] DCB conn 04: 01000446
Jan  9 00:27:20 codex kernel: [   18.438053] nouveau  [     DRM] DCB conn 05: 02000546
Jan  9 00:27:20 codex kernel: [   18.438054] nouveau  [     DRM] DCB conn 06: 00010661
Jan  9 00:27:20 codex kernel: [   18.438055] nouveau  [     DRM] DCB conn 07: 00010761
Jan  9 00:27:20 codex kernel: [   18.438057] nouveau  [     DRM] DCB conn 08: 00020847
Jan  9 00:27:20 codex kernel: [   18.438059] nouveau  [     DRM] DCB conn 09: 00000900
Jan  9 00:27:20 codex kernel: [   18.439274] nouveau  [     DRM] ACPI backlight interface available, not registering our own
Jan  9 00:27:20 codex kernel: [   18.439495] nouveau  [     DRM] 3 available performance level(s)
Jan  9 00:27:20 codex kernel: [   18.439500] nouveau  [     DRM] 0: core 50MHz shader 101MHz memory 135MHz voltage 820mV
Jan  9 00:27:20 codex kernel: [   18.439504] nouveau  [     DRM] 1: core 202MHz shader 405MHz memory 324MHz voltage 820mV
Jan  9 00:27:20 codex kernel: [   18.439507] nouveau  [     DRM] 3: core 775MHz shader 1550MHz memory 1250MHz voltage 1000mV
Jan  9 00:27:20 codex kernel: [   18.439511] nouveau  [     DRM] c: core 202MHz shader 405MHz memory 324MHz voltage 1000mV
Jan  9 00:27:20 codex kernel: [   18.445776] nouveau  [     DRM] MM: using COPY1 for buffer copies
Jan  9 00:27:20 codex kernel: [   18.705449] nouveau  [     DRM] allocated 1920x1080 fb: 0x60000, bo ffff88060a4e8000
Jan  9 00:27:20 codex kernel: [   18.706813] fbcon: nouveaufb (fb0) is primary device
Jan  9 00:27:21 codex kernel: [   20.089762] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Jan  9 00:27:21 codex kernel: [   20.089768] nouveau 0000:01:00.0: registered panic notifier
Jan  9 00:27:21 codex kernel: [   20.089773] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
Jan  9 00:28:24 codex kernel: [   74.287528] nouveau  [     DRM] suspending display...
Jan  9 00:28:24 codex kernel: [   74.287549] nouveau  [     DRM] unpinning framebuffer(s)...
Jan  9 00:28:24 codex kernel: [   74.287678] nouveau  [     DRM] evicting buffers...
Jan  9 00:28:24 codex kernel: [   74.669560] nouveau  [     DRM] waiting for kernel channels to go idle...
Jan  9 00:28:24 codex kernel: [   74.669589] nouveau  [     DRM] suspending client object trees...
Jan  9 00:28:24 codex kernel: [   74.670070] nouveau  [     DRM] suspending kernel object tree...
Jan  9 00:28:24 codex kernel: [   78.372083] nouveau  [     DRM] re-enabling device...
Jan  9 00:28:24 codex kernel: [   78.372094] nouveau  [     DRM] resuming kernel object tree...
Jan  9 00:28:24 codex kernel: [   78.372100] nouveau  [   VBIOS][0000:01:00.0] running init tables
Jan  9 00:28:24 codex kernel: [   78.599853] nouveau  [     DRM] resuming client object trees...
Jan  9 00:28:24 codex kernel: [   78.600083] nouveau  [     DRM] resuming display...
Jan  9 00:28:24 codex kernel: [   80.727990] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]
Jan  9 00:28:24 codex kernel: [   80.728043] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Jan  9 00:28:24 codex kernel: [   80.728120] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]
Jan  9 00:28:24 codex kernel: [   80.728168] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Jan  9 00:28:24 codex kernel: [   80.728251] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]
Jan  9 00:28:24 codex kernel: [   80.728297] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
Jan  9 00:28:24 codex kernel: [   80.728434] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]
Jan  9 00:28:24 codex kernel: [   80.728476] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
<and so on>
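
Since the TRAP/SHADER pair repeats identically, it can be easier to compare reports by counting the distinct pairs rather than reading the raw flood. The following is a minimal sketch of that idea, assuming log lines shaped like the ones pasted in this report; the regular expressions and the `summarize` helper are illustrative and not part of any nouveau tooling.

```python
import re
from collections import Counter

# Patterns assumed from the log lines quoted in this report. The process
# name and pid after the address are optional (some reports lack them).
TRAP_RE = re.compile(r"TRAP ch (\d+) \[(0x[0-9a-f]+)(?: ([^\[\]]+)\[(\d+)\])?\]")
SHADER_RE = re.compile(r"SHADER (0x[0-9a-f]+)")


def summarize(lines):
    """Count PGRAPH traps per (channel, process) and SHADER status codes."""
    traps, shaders = Counter(), Counter()
    for line in lines:
        if "PGRAPH" not in line:
            continue
        m = TRAP_RE.search(line)
        if m:
            traps[(int(m.group(1)), m.group(3))] += 1
            continue
        m = SHADER_RE.search(line)
        if m:
            shaders[m.group(1)] += 1
    return traps, shaders


SAMPLE = [
    "[   80.727990] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]",
    "[   80.728043] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e",
    "[   80.728120] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 Xorg[4341]]",
    "[   80.728168] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e",
]

traps, shaders = summarize(SAMPLE)
print(dict(traps))    # {(2, 'Xorg'): 2}
print(dict(shaders))  # {'0xa004021e': 2}
```

In practice the input would be the lines of a saved dmesg or syslog file instead of the inline sample.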
Comment 42 Laurence Lee 2014-03-03 02:08:08 UTC
Created attachment 94996 [details]
Remove NVDEV_ENGINE_COPY1 from GF116 to mirror GF106

Michael Weirauch reports that this issue has been resolved for his card (0xc3, according to his dmesg uploads), while it remains on Doug Brunner's card and mine (both 0xcf). Since both cards once used the same code base and functioned correctly in the Linux 3.4 kernels, I went on a hunch.

By changing the 0xc3 definition to 0xcf (forcing my card to be recognized as a GF106), I was able to reproduce Michael's suspend/resume success with my own card.

Having discovered that THAT works, it became a mission to figure out what was different between the 0xc3 (GF106, working) and 0xcf (GF116, non-working).

By removing the declaration of NVDEV_ENGINE_COPY1 from the 0xcf case, as this patch does, the suspend/resume issue no longer affects my card.

I have submitted this patch for your review, and sincerely hope it is accepted. Thanks!
Comment 43 Ilia Mirkin 2014-03-03 02:16:07 UTC
(In reply to comment #42)
> Created attachment 94996 [details]
> Remove NVDEV_ENGINE_COPY1 from GF116 to mirror GF106
> 
> Since Michael Weirauch reports this issue has been resolved for his card
> (0xc3, according to his dmesg uploads), and the issue remains on Doug
> Brunner and my card (0xcf), and that both cards once used the same code-base
> and functioned correctly in the Linux 3.4 Kernels, I went on a hunch.
> 
> By changing the 0xc3 definition to 0xcf (forcing my card to be recognized as
> a GF106), I was able to reproduce Michael's suspend/resume success with my
> own card.
> 
> Having discovered that THAT works, it became a mission to figure out what
> was different between the 0xc3 (GF106, working) and 0xcf (GF116,
> non-working).
> 
> By removing the declaration of NVDEV_ENGINE_COPY1 from the 0xcf case, as
> this patch does, the suspend/resume issue no longer affects my card.
> 
> I have submitted this patch for your review, and sincerely hope it is
> accepted. Thanks!

Can you check if the issue still occurs with 3.14-rcX (but without your patch)? Some changes were made to better respect the engine disables in register 22500. (Speaking of which, can you grab envytools and do a "nvapeek 22500"... and also 22580)
Comment 44 Laurence Lee 2014-03-03 03:38:13 UTC
Configured and compiled linux-3.14-rc4.tar.xz as found on kernel.org, and the suspend/resume issue still exists.


"nvapeek 22500" and "nvapeek 22580" just yielded "...", so I'm posting the contents of "nvapeek 22400 400" for a wider view of that area:

00022400: 00000000 00000000 00000000 00000002
00022410: 00000000 00000000 30000000 00000000
00022420: 00000000 00000000 00000800 00000000
00022430: 00000001 00000004 00000003 00000000
...
00022600: 0000001c 00000000 00000000 00000000
...
00022680: 8000001c 00000000 00000000 00000000
...
Comment 45 Ilia Mirkin 2014-03-03 03:48:05 UTC
(In reply to comment #44)
> Configured and compiled linux-3.14-rc4.tar.xz as found on kernel.org, and
> the suspend/resume issue still exists.
> 
> 
> "nvapeek 22500" and "nvapeek 22580" just yielded "...", so I'm posting the

OK, well "..." means "0" -- a little confusing, but oh well. So none of the DISABLE bits are set, which means that 3.14-rcX will not help you. If you want, you can achieve the same effect with nouveau.config=PCE1=0 . Not sure why enabling it causes a resume issue.
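
To make the "..." convention concrete, here is a small sketch that reads a register out of a pasted dump like the one in comment #44, treating elided rows as zero. The parser and the `read_reg` helper are hypothetical, written only against the output format shown in this report; note that it also returns 0 for addresses outside the dumped range, so it should only be asked about addresses the dump actually covers.

```python
DUMP = """\
00022400: 00000000 00000000 00000000 00000002
00022410: 00000000 00000000 30000000 00000000
00022420: 00000000 00000000 00000800 00000000
00022430: 00000001 00000004 00000003 00000000
...
00022600: 0000001c 00000000 00000000 00000000
...
00022680: 8000001c 00000000 00000000 00000000
...
"""


def parse_peek(dump):
    """Map register address -> value for every word printed in the dump."""
    regs = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue  # "..." stands for rows that are entirely zero
        addr_text, _, words = line.partition(":")
        base = int(addr_text, 16)
        for i, word in enumerate(words.split()):
            regs[base + 4 * i] = int(word, 16)  # one 32-bit word per 4 bytes
    return regs


def read_reg(regs, addr):
    """Return the value at addr, treating elided (unprinted) rows as 0."""
    return regs.get(addr, 0)


regs = parse_peek(DUMP)
print(hex(read_reg(regs, 0x22500)))  # 0x0
print(hex(read_reg(regs, 0x22580)))  # 0x0
```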
Comment 46 Laurence Lee 2014-03-03 05:27:24 UTC
Thanks for the command-line tip, that does indeed have the desired effect on standard-built Fedora kernels, and is much better than setting nouveau.noaccel=1 (which, when activated, is now leading to some font-rendering glitches).

Actually, it does kind of make sense that an invalid copy engine would cause screen garbage to be displayed, and that this would be triggered by the rapid screen repainting that happens on Resume.

I'll leave it up to more knowledgeable minds whether to act upon this patch or not; but it's worth noting that only 4 other devices out of the 9 declared in nvc0.c have an entry for a secondary, "COPY1" engine -- the GF100, GF104, GF110, and GF114.

I'm just thankful to have a usable solution on my rig in the meantime. Thanks!
Comment 47 Doug Brunner 2014-03-04 07:18:00 UTC
Setting config=PCE1=0 in my modprobe .conf file also fixes the issue for me; I was able to remove noaccel=1 and still suspend and resume without problems. I updated to Ubuntu's backported kernel 3.12.2 since my last post (to fix an unrelated Ethernet issue). I haven't tried that kernel with no nouveau options, can do so if it would be helpful; I suspect not though, since Laurence Lee found the issue still existed in 3.14.

(In reply to comment #46)
> Thanks for the command-line tip, that does indeed have the desired effect on
> standard-built Fedora kernels, and is much better than setting
> nouveau.noaccel=1 (which, when activated, is now leading to some
> font-rendering glitches).
> 
> Actually, it does kind of make sense that an invalid copy engine would cause
> screen garbage to be displayed, and that this would be triggered by the
> rapid screen repainting that happens on Resume.
> 
> I'll leave it up to more knowledgeable minds whether to act upon this patch
> or not; but it's worth noting that only 4 other devices out of the 9
> declared in nvc0.c have an entry for a secondary, "COPY1" engine -- the
> GF100, GF104, GF110, and GF114.
> 
> I'm just thankful to have a usable solution on my rig in the meantime.
> Thanks!
Comment 48 Anssi Hannula 2014-04-04 11:28:49 UTC
I am also still experiencing the same issue on NVCF (nouveau for-next from earlier today), but it is _NOT_ limited to resume, at least in my case. I can also confirm that nouveau.config=PCE1=0 seems to work around/fix the issue.

The corruption does not happen immediately, but after using the system for a long time (days) and/or after running some graphics-intensive games corruptions start to slowly appear (old random data in various windows).
Comment 49 cdep.illabout+freedesktop 2014-04-11 13:40:07 UTC
I was also experiencing the problem of corruption after a suspend/resume. Adding "nouveau.config=PCE1=0" seems to have fixed it.

I am on Arch Linux.  Here is my lspci -v:

01:00.0 VGA compatible controller: NVIDIA Corporation GF116M [GeForce GT 560M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: CLEVO/KAPOK Computer Device 5102
	Flags: bus master, fast devsel, latency 0, IRQ 54
	Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
	Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Memory at f0000000 (64-bit, prefetchable) [size=64M]
	I/O ports at e000 [size=128]
	Expansion ROM at f6000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

I got the same errors in my dmesg as everyone else.  These two lines basically repeat forever:

[   74.912769] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x005fba0000 X[2244]]
[   74.912788] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e

Here are my Arch Linux package versions:

extra/nouveau-dri 10.1.0-4
local/nouveau-fw 325.15-1
extra/xf86-video-nouveau 1.0.10-2

extra/libdrm 2.4.52-1

core/linux 3.14-4 (base)
core/linux-api-headers 3.13.2-1
core/linux-firmware 20140316.dec41bc-1
core/linux-headers 3.14-4

extra/mesa 10.1.0-4
extra/mesa-demos 8.1.0-1
extra/mesa-libgl 10.1.0-4

Xorg.0.log doesn't have anything particularly interesting in it.
Comment 50 Christian Costa 2014-08-17 14:30:04 UTC
Same problem here after suspend/resume. The console is flooded with the same messages below for a short period, then a white screen appears with slight noise. Ctrl-Alt-F1 does not work, so I have to reboot.

[  392.485883] nouveau E[  PGRAPH][0000:01:00.0] SHADER 0xa004021e
[  392.489922] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 2 [0x00bfa80000 Xorg[1154]]

I created a file in /etc/modprobe.d with the line "nouveau.config=PCE1=0" but that does not help. Is this correct? Or do I need a more recent kernel?

I use Ubuntu 14.04 64-bit. Here is my system information:

lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GF116M [GeForce GT 560M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 204a
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at f2000000 (32-bit, non-prefetchable) [size=32M]
	Memory at e0000000 (64-bit, prefetchable) [size=128M]
	Memory at e8000000 (64-bit, prefetchable) [size=64M]
	I/O ports at d000 [size=128]
	Expansion ROM at f4000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau

kernel: 3.13.0-34-generic
libdrm: 2.4.52-1
mesa: 10.1.3-0ubuntu0.1 (same with 10.2.5 compiled from git)
xorg: 1.15.1-0ubuntu2.1
xorg-video-nouveau: 1.0.10-1ubuntu2
Comment 51 Christian Costa 2014-08-17 18:37:31 UTC
Well, while updating the initramfs, the command complained about the bad syntax of "nouveau.config=PCE1=0" in conf files. I then realized that this form is for the kernel command line, so I used "options nouveau config=PCE1=0" in my conf file instead.

Indeed, the flood of nouveau messages and the white screen are gone. The login screen is shown and I can move the mouse cursor, but I cannot interact at all and Ctrl-Alt-F1 does not work. The mouse cursor disappears after a while. I cannot do anything apart from rebooting.

I also tried kernel 3.16.1 with same result.
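
For anyone else tripping over the two syntaxes: the kernel command line uses the form module.param=value, while a file under /etc/modprobe.d uses "options module param=value". A minimal sketch of the conversion, with `cmdline_to_modprobe` as a hypothetical helper name:

```python
def cmdline_to_modprobe(param):
    """Convert 'module.param=value' into an 'options module param=value' line."""
    module, sep, rest = param.partition(".")
    # Require a module name before the first dot and an assignment after it.
    if not sep or not module or "=" in module or "=" not in rest:
        raise ValueError(f"not a module.param=value string: {param!r}")
    return f"options {module} {rest}"


print(cmdline_to_modprobe("nouveau.config=PCE1=0"))  # options nouveau config=PCE1=0
print(cmdline_to_modprobe("nouveau.noaccel=1"))      # options nouveau noaccel=1
```

Only the first dot is treated as the separator, so values that themselves contain "=" (as in config=PCE1=0) pass through unchanged.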
Comment 52 Ilia Mirkin 2014-12-08 01:27:59 UTC
This bug covered a lot of different issues over its lifetime. The last one of them is an NVCF issue where the second copy engine does not appear to be there. We've stopped nouveau from attempting to use it on any NVCF, and the patch is in 3.18 (and being backported to stable trees).

If you feel like you still have an issue related to this bug, open a new one, do not under any circumstances reopen this one, as it has been too polluted by unrelated issues and comments.