Bug 67878

Summary: [NV98] [BISECTED] Hardware freeze after resume from suspend
Product: xorg
Reporter: Ben Gamari <bgamari>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: medium
CC: bugzilla, computersforpeace, hans, patrik.lundquist, pontus.fuchs
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=67597
https://bugs.freedesktop.org/show_bug.cgi?id=62835
Attachments:
Description | Flags
dmesg output from boot to attempted resume | none
End of log from resume with debug=trace | none
Bisection log between v3.7 and v3.8 | none
dmesg output from successful suspend/resume with 4f6029da^ | none
dmesg output from a few failed suspend/resume attempts with 4f6029da | none

Description Ben Gamari 2013-08-07 21:48:12 UTC
Created attachment 83798 [details]
dmesg output from boot to attempted resume

On a Dell Latitude E6400 with an nVidia G98M and a 3.10 kernel, the machine reproducibly resumes from suspend with a non-responsive X session. After several attempts I can eventually get to a somewhat functional VT. dmesg output attached.
Comment 1 Ben Gamari 2013-08-07 21:49:13 UTC
lspci gives the following device details,

01:00.0 VGA compatible controller: NVIDIA Corporation G98M [Quadro NVS 160M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 0233
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f2000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at df00 [size=128]
	[virtual] Expansion ROM at f4000000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
Comment 2 Ben Gamari 2013-08-07 21:50:00 UTC
It seems that Bug #62835 (https://bugs.freedesktop.org/show_bug.cgi?id=62835) might be related.
Comment 3 Ben Gamari 2013-08-07 21:58:19 UTC
This thread might also be related: http://www.spinics.net/lists/dri-devel/msg32782.html
Comment 4 Ben Gamari 2013-08-07 22:23:06 UTC
Created attachment 83799 [details]
End of log from resume with debug=trace
Comment 5 Ben Gamari 2013-08-09 01:59:54 UTC
It seems 3D/compositing might be to blame here; suspending while running metacity seems to resume correctly.
Comment 6 Ben Gamari 2013-08-09 02:09:50 UTC
The same issue can be reproduced with 3.8.

Interestingly, 3.5 seems to work correctly, even with compiz.
Comment 7 Ben Gamari 2013-08-09 02:24:17 UTC
From discussions with xexaxo in #nouveau, it seems that this might be a similar regression to what happened in 3.8 (described in bug #59057). My plan is as follows,

 1) Verify that 3.8.11 works
 1.1) If not, verify that 3.8.1 works and bisect to find the broken release
 1.2) If so, check whether 3.9 works
 2) Bisect backwards from the first broken release to 3.8.11 (or whichever release was tested to work)
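
The plan above boils down to a binary search over kernel history. As a rough illustration only (not what git does internally, since real history is a graph rather than a list), here is a minimal Python sketch over a hypothetical linear list of commits. The commit names and the `resume_hangs` predicate are made up for the example; in practice each "test" is a kernel build plus a suspend/resume attempt.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    the bug never disappears again once it has appeared.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug already present here, look earlier
        else:
            lo = mid  # still fine here, look later
    return commits[hi]


# Hypothetical history: nine commits, bug assumed to appear at "c5".
history = [f"c{i}" for i in range(9)]
tested = []


def resume_hangs(commit: str) -> bool:
    """Stand-in for 'build this kernel, suspend, resume, see if X hangs'."""
    tested.append(commit)
    return int(commit[1:]) >= 5


culprit = first_bad(history, resume_hangs)
print(culprit, "found after", len(tested), "tests")
```

The number of tests grows with the logarithm of the number of commits in the range, which is why narrowing the range to adjacent releases first (steps 1, 1.1 and 1.2) keeps the number of kernel builds manageable.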
Comment 8 Ben Gamari 2013-08-09 14:58:28 UTC
The 3.8.11 kernel fails in the same way that 3.10 does; trying 3.8.1 next.
Comment 9 Ben Gamari 2013-08-09 16:13:55 UTC
Rereading bugs #59057 and #62835, it's not entirely clear whether the bug was actually ever fixed; it may be that the reporter simply worked around it. Comment #33 of Bug #59057 (https://bugs.freedesktop.org/show_bug.cgi?id=59057#c33) actually refers to commit e5a58edc94a20a7ef4b7db67c166c4ca0588bad0 (46c13c131d3b73080aa0f50f45e834a9ab3c0e71 in Linus's tree) as working. Going to test this and surrounding commits to verify this.
Comment 10 Ben Gamari 2013-08-09 17:27:50 UTC
Tested 46c13c131d3b73080aa0f50f45e834a9ab3c0e71. Things appear to fail in a similar way to the resume failure upon starting compiz.
Comment 11 Ben Gamari 2013-08-09 17:36:53 UTC
One factor that I've neglected to mention thus far is that I've been using my own mesa build in the above tests,

    $ glxinfo
    ...
    OpenGL version string: 2.1 Mesa 9.2.0-devel (git-5a7bdd4)

After reverting to Ubuntu's packaged mesa,

    $ glxinfo
    ...
    OpenGL version string: 2.1 Mesa 9.1.4

Resume appears to work as expected.
Comment 12 Ben Gamari 2013-08-09 17:39:45 UTC
The mesa tests mentioned above were conducted on a 3.10 kernel. In the mesa 9.1 case there is nothing interesting spit out by nouveau to dmesg. Only a few status messages,

    [  312.871019] nouveau  [     DRM] suspending fbcon...
    [  312.871023] nouveau  [     DRM] suspending display...
    [  312.871049] nouveau  [     DRM] unpinning framebuffer(s)...
    [  312.871108] nouveau  [     DRM] evicting buffers...
    [  313.133858] nouveau  [     DRM] waiting for kernel channels to go idle...
    [  313.133883] nouveau  [     DRM] suspending client object trees...
    [  313.134682] nouveau  [     DRM] suspending kernel object tree...
    ...
    [  317.178155] nouveau  [     DRM] re-enabling device...
    [  317.178170] nouveau  [     DRM] resuming kernel object tree...
    [  317.178176] nouveau  [   VBIOS][0000:01:00.0] running init tables
    [  317.287651] serial 00:08: activated
    [  317.357172] nouveau  [     DRM] resuming client object trees...
    [  317.357691] nouveau  [     DRM] resuming display...
Comment 13 Ben Gamari 2013-08-09 17:45:35 UTC
I can confirm that mesa 9.1.4 on a 3.10 kernel can successfully resume, even while running glxgears on compiz. The messages mentioned in Comment 12 are the only things produced by nouveau in dmesg.
Comment 14 Ben Gamari 2013-08-09 18:03:09 UTC
Confirmed that 72916698b056d0559263e84372bb45cd83a1c2c2 is bad. Unfortunately this is a merge base. Here is the bisection log,

    git bisect start
    # bad: [5a7bdd4b4173958c53109517b7c95f1039623e7e] docs: Add items for GL4.4
    git bisect bad 5a7bdd4b4173958c53109517b7c95f1039623e7e
    # good: [e64febb4b71475b35765f0dc168df22655444a7f] docs: 9.1.4 release notes
    git bisect good e64febb4b71475b35765f0dc168df22655444a7f
    # bad: [72916698b056d0559263e84372bb45cd83a1c2c2] r600g: fix segfault with old kernel
    git bisect bad 72916698b056d0559263e84372bb45cd83a1c2c2
Comment 15 Ben Gamari 2013-08-10 16:01:54 UTC
Unfortunately I'm now having trouble reproducing the working conditions with mesa 9.1.4.

Returning to mapping out kernel version. It appears that 3.6 works correctly.
Comment 16 Ben Gamari 2013-08-10 21:17:45 UTC
It seems that a 3.7 kernel will resume correctly as well.
Comment 17 Ben Gamari 2013-08-10 22:36:24 UTC
Confirmed that a clean 3.8 build exhibits the issue.

Starting a bisection between 3.7 and 3.8.
Comment 18 Ben Gamari 2013-08-11 07:01:05 UTC
Linux commit 992956189de58cae9f2be40585bc25105cd7c5ad is bad.
Comment 19 Ilia Mirkin 2013-08-11 07:08:29 UTC
(In reply to comment #18)
> Linux commit 992956189de58cae9f2be40585bc25105cd7c5ad is bad.

That seems thoroughly unlikely.

commit 992956189de58cae9f2be40585bc25105cd7c5ad
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Dec 17 17:19:36 2012 -0800

    efi: Fix the build with user namespaces enabled.

This doesn't apply to your situation on many levels... this is a build fix... to efi vars...

I bet if you checkout to 992956189de58cae9f2be40585bc25105cd7c5ad^ then you will still have a bad kernel. You should probably redo the bisect, but look at the bisect log (git bisect log) and keep all your "bad" commits, since they are likely indeed bad. But you may have been a bit too eager in calling out the "good" kernels. (See git help bisect for how to start with a bunch of bad commits.) While you're at it, you may want to re-test whether 3.7 really is good.
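
For anyone wanting to follow this advice, a small Python sketch of the bookkeeping: it reads `git bisect log` output, separates the good and bad marks, and produces a copy of the log with the "good" entries dropped. The sample text is the mesa log quoted in comment 14, used only because it is the one log pasted inline in this report. The helper names are made up, and the assumption that the trimmed log is what you would feed back to `git bisect replay` should be checked against `git help bisect` before relying on it.

```python
import re

# Matches the command lines that `git bisect log` records.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def split_marks(log_text):
    """Collect the commit hashes marked good and bad in a bisect log."""
    marks = {"good": [], "bad": []}
    for line in log_text.splitlines():
        m = MARK.match(line.strip())
        if m:
            marks[m.group(1)].append(m.group(2))
    return marks


def drop_good_marks(log_text):
    """Return the log without 'good' entries, keeping start and bad marks."""
    kept = []
    for line in log_text.splitlines():
        s = line.strip()
        if s.startswith("git bisect good") or s.startswith("# good:"):
            continue
        kept.append(s)
    return "\n".join(kept)


# The mesa bisection log from comment 14.
LOG = """\
git bisect start
# bad: [5a7bdd4b4173958c53109517b7c95f1039623e7e] docs: Add items for GL4.4
git bisect bad 5a7bdd4b4173958c53109517b7c95f1039623e7e
# good: [e64febb4b71475b35765f0dc168df22655444a7f] docs: 9.1.4 release notes
git bisect good e64febb4b71475b35765f0dc168df22655444a7f
# bad: [72916698b056d0559263e84372bb45cd83a1c2c2] r600g: fix segfault with old kernel
git bisect bad 72916698b056d0559263e84372bb45cd83a1c2c2
"""

marks = split_marks(LOG)
trimmed = drop_good_marks(LOG)
print(marks["bad"])
print(trimmed)
```

The "good" marks are the ones to re-test by hand, since a resume that happened to work once is weaker evidence than a resume that hung.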
Comment 20 Ben Gamari 2013-08-11 08:15:53 UTC
@Ilia, I should have been more specific. These next few comments are largely just notes recording the state of my bisection. By "bad" I mean that I have tested the commit and it exhibits the issue, not that it is the first bad commit. Currently I have around 10 more bisection steps to go before the culprit is hopefully identified.
Comment 21 Ben Gamari 2013-08-11 08:16:26 UTC
Linux commit 2b8318881ddbcb67c5e8d2178b42284749442222 appears to work.
Comment 22 Ben Gamari 2013-08-11 08:21:26 UTC
Linux kernel 3c2e81ef344a90bb0a39d84af6878b4aeff568a2 exhibits the issue.
Comment 23 Ben Gamari 2013-08-11 14:33:32 UTC
640631d04cd2cfbb4792d6a8fc5fcab14ee273a5 is bad.
9fabd4eedeb904173d05cb1ced3c3e6b9d2e8137 is good.
Comment 24 Ben Gamari 2013-08-11 15:00:05 UTC
74b6685089591fa275929109f7b839bf386890a0 is good.
bd3b49f25a3eae2d91432247b7565489120b6bcf is bad.
Comment 25 Ben Gamari 2013-08-11 15:08:50 UTC
2d8b9ccbcee694c9ce681ec596df642e52ddcb15 is bad.
b6e4ad200a726a32c7083f491383713bc8680f86 is good.
Comment 26 Ben Gamari 2013-08-11 15:18:11 UTC
47057302f075578618ea36fc3c4c97a5a6f97f00 is good.
4f6029da58ba9204c98e33f4f3737fe085c87a6f is bad.
Comment 27 Ben Gamari 2013-08-11 15:24:11 UTC
647bf61d0399515c526c125450cadaade79b1988 is good.
f9887d091149406de5c8b388f7e0bb6932dd621b is good.
Comment 28 Ben Gamari 2013-08-11 15:24:40 UTC
Created attachment 83940 [details]
Bisection log between v3.7 and v3.8
Comment 29 Ben Gamari 2013-08-11 15:25:09 UTC
According to the bisection,

4f6029da58ba9204c98e33f4f3737fe085c87a6f is the first bad commit
commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri Nov 16 11:54:31 2012 +1000

    drm/nv50-nvc0: switch to common disp impl, removing previous version
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 9daeb0bd5ed3e9b22b53c21fab853bd2e392f6ed 4bdbb1d96e57d3f254affb8812788f04b7474bf7 M	drivers
Comment 30 Ben Gamari 2013-08-11 15:47:23 UTC
Created attachment 83942 [details]
dmesg output from successful suspend/resume with 4f6029da^
Comment 31 Ben Gamari 2013-08-11 15:51:55 UTC
Created attachment 83943 [details]
dmesg output from a few failed suspend/resume attempts with 4f6029da

In this case I logged in and suspended the machine. Upon resuming, the X session was frozen with the cursor being updated occasionally. After several attempts I was able to get to a VT. Shortly thereafter the X server died, causing lightdm to respawn a greeter which functioned correctly, presumably because it doesn't require acceleration. I could log in again, although the X session would freeze before getting to a functional desktop (presumably upon compiz starting). Again, with a few tries I could get back to a VT, at which point lightdm would start a greeter. With every freeze a "failed to idle channel 0xcccc0000" message would be dumped to dmesg.
Comment 32 Ilia Mirkin 2013-08-30 23:07:48 UTC
One idea I just randomly had was that there might be a difference in the teardown process. For example, in the removed code, nv50_display_fini did stuff. In the current code, it's basically empty (well, some small bits in nouveau_display_fini).

It looks like the old code

(a) blanked each crtc
(b) sent out a EVO_UPDATE command
(c) waited for each crtc to hit a vblank
(d) did something with the cursor (cleared it?)
(e) waited for some sort of DPMS thing

It could well be that this now happens elsewhere, but I just wanted to put that thought down on "paper".
Comment 33 Pontus Fuchs 2013-12-08 17:08:31 UTC
Bug still present on 3.13-rc3
Comment 34 Pontus Fuchs 2013-12-08 20:15:13 UTC
(In reply to comment #32)
> One idea I just randomly had was that there might be a difference in the
> teardown process. For example, in the removed code, nv50_display_fini did
> stuff. In the current code, it's basically empty (well, some small bits in
> nouveau_display_fini).
> 
> It looks like the old code
> 
> (a) blanked each crtc
> (b) sent out a EVO_UPDATE command
> (c) waited for each crtc to hit a vblank
> (d) did something with the cursor (cleared it?)
> (e) waited for some sort of DPMS thing
> 
> It could well be that this now happens elsewhere, but I just wanted to put
> that thought down on "paper".

I tried this idea by doing the following:

1) Checked out 4f6029da58ba9204c98e33f4f3737fe085c87a6f^1 (= f9887d091149406de5c8b388f7e0bb6932dd621b)
2) Deleted everything in nv50_display_fini

With that change suspend/resume works so I guess the problem is elsewhere.
Comment 35 Pontus Fuchs 2013-12-19 20:18:26 UTC
Some new observations made while investigating this issue:

* Without X started suspend/resume works fine
* With NoAccel set in xorg conf file suspend/resume works fine
* If X is stopped before suspending the resume works ok but if I try to start X a second time after resume, X hangs.
Comment 36 Brian 2014-01-30 08:28:54 UTC
Re-tested on drm-next, at:

commit ef64cf9d06049e4e9df661f3be60b217e476bee1
Merge: 279b9e0cc300 f3980dc50c51
Author: Dave Airlie <airlied@redhat.com>
Date:   Thu Jan 30 10:46:06 2014 +1000

    Merge branch 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next

Still reproducible:

  nouveau E[Xorg[1140]] failed to idle channel 0xcccc0000 [Xorg[1140]]
  nouveau E[     PFB][0000:01:00.0] trapped read at 0x002001e020 on channel 0x0001fb14 [unknown] SEMAPHORE_BG/PFIFO_READ/00 reason: PAGE_NOT_PRESENT
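
When comparing traps across kernels it can help to pull the fields out of the PFB line. Below is a minimal Python sketch; the regular expression is inferred from the single line quoted above, so the field names (`access`, `addr`, `channel`, `owner`, `unit`, `reason`) are my own labels and other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the one PFB trap line in this comment.
TRAP = re.compile(
    r"trapped (?P<access>read|write) at (?P<addr>0x[0-9a-fA-F]+) "
    r"on channel (?P<channel>0x[0-9a-fA-F]+) \[(?P<owner>[^\]]*)\] "
    r"(?P<unit>\S+) reason: (?P<reason>\S+)"
)


def parse_trap(line):
    """Return the trap fields as a dict, or None if the line does not match."""
    m = TRAP.search(line)
    return m.groupdict() if m else None


line = (
    "nouveau E[     PFB][0000:01:00.0] trapped read at 0x002001e020 "
    "on channel 0x0001fb14 [unknown] SEMAPHORE_BG/PFIFO_READ/00 "
    "reason: PAGE_NOT_PRESENT"
)
trap = parse_trap(line)
print(trap)
```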
Comment 37 Brian 2014-02-08 06:20:35 UTC
I've been following this ticket and attempting to poke around a bit. I just tested my hardware with various points in the 3.6, 3.7, and 3.7-rc kernels, and all of those still gave me a non-responsive screen with messages like the following after resume. e.g., on Linux 3.6:

    [  161.192867] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
    [  164.355897] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.

or on Linux 3.7.9:

    [  336.337207] nouveau E[    3134] failed to idle channel 0xcccc0000

None of these builds give me a PAGE_NOT_PRESENT error, though. This makes it hard to bisect, as I can't find any working point to test...
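
Since the wording of the message differs between kernels (capitalised with a small channel number on 3.6, lower case with a hex id on 3.7.9), a case-insensitive match is needed to compare logs from different builds. A minimal Python sketch, assuming only the three message shapes that appear in this report:

```python
import re
from collections import Counter

# Hex ids are tried first so that "0xcccc0000" is not read as channel "0".
IDLE_FAIL = re.compile(
    r"failed to idle channel (0x[0-9a-f]+|\d+)", re.IGNORECASE
)


def idle_failures(dmesg_text):
    """Count 'failed to idle channel' messages per channel id."""
    return Counter(m.group(1) for m in IDLE_FAIL.finditer(dmesg_text))


# Lines copied from this comment (3.6 and 3.7.9 formats).
sample = """\
[  161.192867] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[  164.355897] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.
[  336.337207] nouveau E[    3134] failed to idle channel 0xcccc0000
"""

counts = idle_failures(sample)
print(counts)
```

An empty result after a resume does not prove the resume was healthy; it only says this particular message was not logged.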

My hardware:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT218M [NVS 3100M] [10de:0a6c] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: Dell Device [1028:040a]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at e0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 7000 [size=128]
	Expansion ROM at e3000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
Comment 38 Ilia Mirkin 2014-02-08 06:30:32 UTC
(In reply to comment #37)
> I've been following this ticket and attempting to poke around a bit. I just
> tested my hardware with various points in the 3.6, 3.7, and 3.7-rc kernels,
> and all of those still gave me a non-responsive screen with messages like
> the following after resume. e.g., on Linux 3.6:

Do you have any reason to believe that you have the same problem? This one was bisected to commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f which appeared in v3.8 (which means that 3.7.x should all be fine).

Of course you have the additional problem of having a nva8 (this bug has nv98 hw users, although that doesn't exclude you from having the same issue -- the bisected commit was fairly generic), which is still unstable for some users, but was even more unstable in earlier kernels. As for not seeing a PAGE_NOT_PRESENT -- are you sure that the kernels in question had code to emit the error in the first place?
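
Whether a given kernel tag actually contains the bisected commit can be checked in a kernel git checkout with `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. A minimal Python wrapper follows as a sketch; the function name, the injectable `run` parameter and the checkout path in the usage comment are assumptions made for illustration.

```python
import subprocess


def contains_commit(repo, commit, ref, run=subprocess.run):
    """Return True if `commit` is an ancestor of `ref` in the checkout at `repo`.

    `run` defaults to subprocess.run and is a parameter only so the function
    can be exercised without a real repository.
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref]
    )
    if result.returncode not in (0, 1):
        # Any other status means git itself failed (e.g. unknown commit).
        raise RuntimeError(f"git failed with exit status {result.returncode}")
    return result.returncode == 0


# Example call (the path is hypothetical):
#   contains_commit("/path/to/linux",
#                   "4f6029da58ba9204c98e33f4f3737fe085c87a6f", "v3.7")
```

If this reports that a kernel showing the hang does not contain the commit, then that hang has some other cause and belongs in a separate report.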
Comment 39 Brian 2014-02-08 07:20:42 UTC
(In reply to comment #38)
> Do you have any reason to believe that you have the same problem? This one
> was bisected to commit 4f6029da58ba9204c98e33f4f3737fe085c87a6f which
> appeared in v3.8 (which means that 3.7.x should all be fine).

Not necessarily, although initially the bug symptoms were rather similar, and it's a similar family of hardware. I'm only now checking the bisection myself, and it seems that that particular commit is not my only problem.

> Of course you have the additional problem of having a nva8 (this bug has
> nv98 hw users, although that doesn't exclude you from having the same issue
> -- the bisected commit was fairly generic), which is still unstable for some
> users, but was even more unstable in earlier kernels.

Well, that could complicate my ability to fix things here. I suspect that this particular regression is one of several issues on my hardware, then.

> As for not seeing a
> PAGE_NOT_PRESENT -- are you sure that the kernels in question had code to
> emit the error in the first place?

The code seems to be present. For instance, I'm trying 3.6, where I see nv50_fb_vm_trap() (drivers/gpu/drm/nouveau/nv50_fb.c) has the same "VM: trapped write at 0x...." log message. So I presume that if I was still experiencing the fault in 3.6, it would appear in the log.

BTW, I noticed Ben Gamari's earlier comments about mesa versioning, so I downgraded to 9.1.4, and I still experience the same behavior.
Comment 40 Jochen 2014-05-27 15:28:34 UTC
I hit this bug upon upgrade from Ubuntu 12.04 to 14.04. I can confirm nouveau is working in Ubuntu kernel 3.7.10 and it freezes on resume in 3.13.0.
My hardware:

02:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9400M] (rev b1) (prog-if 00 [VGA controller])
	Subsystem: Apple Inc. MacBook5,1
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at d2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at d0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 1000 [size=128]
	Expansion ROM at d3000000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Kernel driver in use: nouveau

[    3.918406] nouveau 0000:02:00.0: setting latency timer to 64
[    3.919370] nouveau  [  DEVICE][0000:02:00.0] BOOT0  : 0x0ac180b1
[    3.919374] nouveau  [  DEVICE][0000:02:00.0] Chipset: MCP79/MCP7A (NVAC)
[    3.919376] nouveau  [  DEVICE][0000:02:00.0] Family : NV50
[    3.930101] nouveau  [   VBIOS][0000:02:00.0] checking PRAMIN for image...
[    3.995688] nouveau  [   VBIOS][0000:02:00.0] ... appears to be valid
[    3.995692] nouveau  [   VBIOS][0000:02:00.0] using image from PRAMIN
[    3.995827] nouveau  [   VBIOS][0000:02:00.0] BIT signature found
[    3.995831] nouveau  [   VBIOS][0000:02:00.0] version 62.79.40.00
[    4.069047] nouveau  [     MXM][0000:02:00.0] no VBIOS data, nothing to do
[    4.147303] nouveau  [     PFB][0000:02:00.0] RAM type: stolen system memory
[    4.147309] nouveau  [     PFB][0000:02:00.0] RAM size: 256 MiB
[    4.938049] nouveau  [     DRM] VRAM: 256 MiB
[    4.938053] nouveau  [     DRM] GART: 512 MiB
[    4.938057] nouveau  [     DRM] BIT BIOS found
[    4.938061] nouveau  [     DRM] Bios version 62.79.40.00
[    4.938065] nouveau  [     DRM] TMDS table version 2.0
[    4.938068] nouveau  [     DRM] DCB version 4.0
[    4.938071] nouveau  [     DRM] DCB outp 00: 01000123 00010014
[    4.938073] nouveau  [     DRM] DCB outp 01: 02021232 00000010
[    4.938076] nouveau  [     DRM] DCB outp 02: 02021286 0f220010
[    4.938078] nouveau  [     DRM] DCB conn 00: 00000040
[    4.938081] nouveau  [     DRM] DCB conn 01: 0000a146
[    6.473715] nouveau  [     DRM] 4 available performance level(s)
[    6.473720] nouveau  [     DRM] 0: core 100MHz shader 200MHz voltage 900mV fanspeed 100%
[    6.473724] nouveau  [     DRM] 1: core 150MHz shader 300MHz voltage 900mV fanspeed 100%
[    6.473728] nouveau  [     DRM] 2: core 350MHz shader 800MHz voltage 900mV fanspeed 100%
[    6.473731] nouveau  [     DRM] 3: core 450MHz shader 1100MHz voltage 1010mV fanspeed 100%
[    6.473734] nouveau  [     DRM] c:
[    6.498971] nouveau  [     DRM] MM: using M2MF for buffer copies
[    6.593834] nouveau  [     DRM] allocated 1280x800 fb: 0x50000, bo ffff88013692ac00
[    6.593930] fbcon: nouveaufb (fb0) is primary device
[    6.862546] fb0: nouveaufb frame buffer device
[    6.863165] [drm] Initialized nouveau 1.1.0 20120801 for 0000:02:00.0 on minor 0
Comment 41 Ilia Mirkin 2014-05-27 15:33:47 UTC
(In reply to comment #40)
> I hit this bug upon upgrade from Ubuntu 12.04 to 14.04. I can confirm
> nouveau is working in Ubuntu kernel 3.7.10 and it freezes on resume in
> 3.13.0.
> My hardware:
> 
> 02:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9400M]
> (rev b1) (prog-if 00 [VGA controller])
> 	Subsystem: Apple Inc. MacBook5,1

Did you verify that the same commit is responsible? If not, please do a bisect (you can cheat and just assume this is the same issue and test the commit and its parent). If it's the same commit, please let us know. If not, open a separate issue.
Comment 42 Jochen 2014-05-27 16:07:10 UTC
Ilia, I don't know how to do that.
Comment 43 Ilia Mirkin 2014-05-27 16:10:23 UTC
(In reply to comment #42)
> Ilia, I don't know how to do that.

Use your favourite search engine to see how to use 'git bisect'. You may even be able to find some guide specific to your distro. If you can't narrow the problem down, we definitely won't be able to help.
Comment 44 Jochen 2014-05-27 16:59:59 UTC
I'm sorry Ilia, I'm a normal user, no kernel developer.
Comment 45 Ilia Mirkin 2014-05-27 17:02:25 UTC
(In reply to comment #44)
> I'm sorry Ilia, I'm a normal user, no kernel developer.

If you're unable/unwilling/whatever-the-reason to do some amount of debugging, you will be best-served by your distribution's support channels.
Comment 46 Pontus Fuchs 2014-07-15 17:30:23 UTC
I tried to fix this a few months ago but failed. If someone with the right skills and time wants to have a look at this problem, I'd be happy to give away a laptop with this chipset. Shipping cost is on me. I can prepare a Linux installation with sources of the kernel at the regression point. Contact me if you are up to the task.
Comment 47 Peter Hurley 2014-07-15 18:10:57 UTC
(In reply to comment #46)
> I tried to fix this a few months ago but failed. If someone with the right
> skills and time want to have a look at this problem, I'd be happy to give
> away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> installation with sources of the kernel at the regression point. Contact me
> if you are up to the task.

Don't do that.

Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
If you can follow those Ubuntu-specific instructions, that will narrow down the problem.

If you can't follow those instructions, have you filed a Launchpad bug? If so, please post the link or bug # (double-check that it's a public bug or say it's private).
Comment 48 Pontus Fuchs 2014-07-15 18:23:22 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > I tried to fix this a few months ago but failed. If someone with the right
> > skills and time want to have a look at this problem, I'd be happy to give
> > away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> > installation with sources of the kernel at the regression point. Contact me
> > if you are up to the task.
> 
> Don't do that.
> 
> Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
> If you can follow those Ubuntu-specific instructions, that will narrow down
> the problem.
> 
> If you can't follow those instructions, have you filed a Launchpad bug? If
> so, please post the link or bug # (double-check that it's a public bug or
> say it's private).

The problem is already bisected. The commit that introduces the regression switches nv50 to use nvc0's disp implementation. The commit basically deletes all the nv50 code and changes a few function pointers to use the nvc0 implementation. I tried pinpointing what the problem was (see comment 34) but I was not able to fix it.
Comment 49 Peter Hurley 2014-07-15 18:51:34 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > I tried to fix this a few months ago but failed. If someone with the right
> > > skills and time want to have a look at this problem, I'd be happy to give
> > > away a laptop with this chipset. Shipment cost on me. I can prepare a linux
> > > installation with sources of the kernel at the regression point. Contact me
> > > if you are up to the task.
> > 
> > Don't do that.
> > 
> > Read here: https://wiki.ubuntu.com/Kernel/KernelBisection
> > If you can follow those Ubuntu-specific instructions, that will narrow down
> > the problem.
> > 
> > If you can't follow those instructions, have you filed a Launchpad bug? If
> > so, please post the link or bug # (double-check that it's a public bug or
> > say it's private).
> 
> The problem is already bisected. The commit that introduces the regression
> switches nv50 to use nvc0's disp implementation. The commit is basically
> deleting all the nv50 code and changing a few function pointers to use the
> nvc0 implementation. I tried pin pointing what the problem was (see comment
> 34) but I was not able to fix the problem.

I saw Ben Gamari's bisection log (from 3.7->3.8), but I didn't realize that you had duplicated the bisection; I only saw your bump 4 months later, on 3.13.

The OP reported this for a Dell Latitude E6400 with G98; are you running similar hardware?
Comment 50 Pontus Fuchs 2014-07-15 19:55:02 UTC
> 
> I saw Ben Gamari's bisection log (from 3.7->3.8), but I didn't realize that
> you had duplicated the bisection; I only saw your bump 4 months later, on
> 3.13.
> 
> The OP reported this for a Dell Latitude E6400 with G98; are you running
> similar hardware?

HW is not identical. I have a Dell XPS 1330 with 8400M GS (10de:0427)

I originally reported "my" problem in bug 62835. After bisection I found this issue with the same offending commit and identical symptom.

Donation offer still valid. I have two of these machines lying around collecting dust.
Comment 51 BobbyJ 2014-10-17 14:12:21 UTC
I also have this problem using the nouveau driver in Ubuntu 14.04.1 LTS 64-bit. I have a Dell D630C with an nvidia 135M graphics card. After power management puts the laptop to sleep, the laptop becomes unusable after resuming. The login screen appears and the mouse moves for a short time, then freezes. Nothing works except to completely power down and restart the computer. I would love to see a fix for this issue. Has there been any further progress?
Comment 52 Ilia Mirkin 2014-10-17 14:38:26 UTC
To anyone looking to pile on with a "me too" comment: Only do so *AFTER* verifying that commit 4f6029da is the first bad commit for you.

[Also, check that it's still happening with the latest kernel... 3.17 at the time of writing.]

BobbyJ: I'm guessing you didn't do that. File your own bug with all the relevant info, and we can take it from there.
Comment 53 Nate Homier 2015-10-18 23:50:58 UTC
What's the status of this bug? The last activity is from a year ago, in November 2014. I think I'm affected by this bug.

I filed this bug report at Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1378881

Which was marked as a duplicate of this bug:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1111884

My desktop is unusable. I know you're all busy with life, but a little reassurance that this is going to get fixed would be appreciated.

Thanks, Nate.

P.S. And if I can help out in any way, I will. I used to build from source years ago before Git came around, and I could quickly learn how to apply patches to the source if necessary, if you teach me.
Comment 54 Martin Peres 2019-12-04 08:35:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/50.
