Bug 54437

Summary: [NVC8] linux-nouveau2.6 (3.6.0-rc4) : GTX580 : Xorg freezes when using accel
Product: xorg Reporter: Eric <3rik.gm>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: high CC: bryce, doityourselfteam, leifer
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=860477
https://launchpad.net/bugs/1039202
Attachments:
Description                  Flags
kernel log of card freeze    none
kernel log of card freeze    none
dmesg from GPU lockup        none

Description Eric 2012-09-03 12:53:10 UTC
Hello,

I've compiled linux-nouveau-3.6.0rc4 for x86_64 (made a package for Arch). With it, the system is stable for about a minute; then, when I run compiz or cairo-dock, it freezes.

Here are the dmesg messages:

[    0.000000] Linux version 3.6.0-1-nouveau (bioman@pumpkin) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1 SMP PREEMPT Sun Sep 2 22:33:28 CEST 2012
[    1.236259] fb: conflicting fb hw usage nouveaufb vs VESA VGA - removing generic driver
[    1.237326] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0c8000a1
[    1.237328] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF110 (NVC8)
[    1.237329] nouveau  [  DEVICE][0000:01:00.0] Family : NVC0
[    1.238288] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[    1.339740] nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
[    1.339741] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[    1.339822] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
[    1.339823] nouveau  [   VBIOS][0000:01:00.0] version 70.10.17.00
[    1.340038] nouveau  [     MXM][0000:01:00.0] no VBIOS data, nothing to do
[    1.362195] nouveau  [     PFB][0000:01:00.0] RAM type: GDDR5
[    1.362196] nouveau  [     PFB][0000:01:00.0] RAM size: 1536 MiB
[    1.378219] nouveau  [     DRM][0000:01:00.0] VRAM: 1536MiB
[    1.378221] nouveau  [     DRM][0000:01:00.0] GART: 512MiB
[    1.378224] nouveau  [     DRM][0000:01:00.0] BIT BIOS found
[    1.378226] nouveau  [     DRM][0000:01:00.0] Bios version 70.10.17.00
[    1.378229] nouveau  [     DRM][0000:01:00.0] TMDS table version 2.0
[    1.378231] nouveau  [     DRM][0000:01:00.0] DCB version 4.0
[    1.378233] nouveau  [     DRM][0000:01:00.0] DCB outp 00: 02000300 00000000
[    1.378235] nouveau  [     DRM][0000:01:00.0] DCB outp 01: 01000302 00020030
[    1.378237] nouveau  [     DRM][0000:01:00.0] DCB outp 02: 04011380 00000000
[    1.378239] nouveau  [     DRM][0000:01:00.0] DCB outp 03: 08011382 00020030
[    1.378241] nouveau  [     DRM][0000:01:00.0] DCB outp 04: 02022362 00020010
[    1.378243] nouveau  [     DRM][0000:01:00.0] DCB conn 00: 00001030
[    1.378246] nouveau  [     DRM][0000:01:00.0] DCB conn 01: 00010130
[    1.378248] nouveau  [     DRM][0000:01:00.0] DCB conn 02: 00002261
[    1.406991] nouveau  [     DRM][0000:01:00.0] 0 available performance level(s)
[    1.406993] nouveau  [     DRM][0000:01:00.0] c: core 50MHz shader 101MHz memory 135MHz voltage 963mV fanspeed 40%
[    1.421196] nouveau  [     DRM][0000:01:00.0] MM: using COPY0 for buffer copies
[    1.713751] nouveau  [     DRM][0000:01:00.0] allocated 1920x1200 fb: 0x60000, bo ffff88031f77b400
[    1.713938] fbcon: nouveaufb (fb0) is primary device
[    1.741008] fb0: nouveaufb frame buffer device
[    1.741010] [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0
[    1.741023] nouveau 0000:02:00.0: enabling device (0004 -> 0007)
[    1.741572] nouveau  [  DEVICE][0000:02:00.0] BOOT0  : 0x0c8000a1
[    1.741574] nouveau  [  DEVICE][0000:02:00.0] Chipset: GF110 (NVC8)
[    1.741576] nouveau  [  DEVICE][0000:02:00.0] Family : NVC0
[    1.742577] nouveau  [   VBIOS][0000:02:00.0] checking PRAMIN for image...
[    1.752297] nouveau  [   VBIOS][0000:02:00.0] ... signature not found
[    1.752300] nouveau  [   VBIOS][0000:02:00.0] checking PROM for image...
[    2.013316] nouveau  [   VBIOS][0000:02:00.0] ... appears to be valid
[    2.013318] nouveau  [   VBIOS][0000:02:00.0] using image from PROM
[    2.013401] nouveau  [   VBIOS][0000:02:00.0] BIT signature found
[    2.013403] nouveau  [   VBIOS][0000:02:00.0] version 70.10.17.00
[    2.013871] nouveau  [     MXM][0000:02:00.0] no VBIOS data, nothing to do
[    2.013886] nouveau  [ DEVINIT][0000:02:00.0] adaptor not initialised
[    2.013887] nouveau  [   VBIOS][0000:02:00.0] running init tables
[    2.123350] nouveau  [     PFB][0000:02:00.0] RAM type: GDDR5
[    2.123352] nouveau  [     PFB][0000:02:00.0] RAM size: 1536 MiB
[    2.139055] nouveau  [     DRM][0000:02:00.0] VRAM: 1536MiB
[    2.139056] nouveau  [     DRM][0000:02:00.0] GART: 512MiB
[    2.139057] nouveau  [     DRM][0000:02:00.0] BIT BIOS found
[    2.139059] nouveau  [     DRM][0000:02:00.0] Bios version 70.10.17.00
[    2.139060] nouveau  [     DRM][0000:02:00.0] TMDS table version 2.0
[    2.139061] nouveau  [     DRM][0000:02:00.0] DCB version 4.0
[    2.139062] nouveau  [     DRM][0000:02:00.0] DCB outp 00: 02000300 00000000
[    2.139063] nouveau  [     DRM][0000:02:00.0] DCB outp 01: 01000302 00020030
[    2.139064] nouveau  [     DRM][0000:02:00.0] DCB outp 02: 04011380 00000000
[    2.139065] nouveau  [     DRM][0000:02:00.0] DCB outp 03: 08011382 00020030
[    2.139066] nouveau  [     DRM][0000:02:00.0] DCB outp 04: 02022362 00020010
[    2.139067] nouveau  [     DRM][0000:02:00.0] DCB conn 00: 00001030
[    2.139068] nouveau  [     DRM][0000:02:00.0] DCB conn 01: 00010130
[    2.139069] nouveau  [     DRM][0000:02:00.0] DCB conn 02: 00002261
[    2.146174] nouveau  [     DRM][0000:02:00.0] 0 available performance level(s)
[    2.146176] nouveau  [     DRM][0000:02:00.0] c: core 50MHz shader 101MHz memory 135MHz voltage 963mV fanspeed 40%
[    2.160303] nouveau  [     DRM][0000:02:00.0] MM: using COPY0 for buffer copies
[    2.468525] nouveau  [     DRM][0000:02:00.0] allocated 1024x768 fb: 0x60000, bo ffff88031ec1b800
[    2.468602] fb1: nouveaufb frame buffer device
[    2.468604] [drm] Initialized nouveau 1.1.0 20120801 for 0000:02:00.0 on minor 1
[  293.317841] nouveau W[   PFIFO][0000:01:00.0] unknown status 0x40000000
[  311.737359] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x4294000000 [PT_NOT_PRESENT] from PGRAPH/CTXCTL on channel 0x005fcbb000
[  311.737368] nouveau W[   PFIFO][0000:01:00.0] unknown status 0x40000000
[  318.253976] nouveau E[     DRM][0000:01:00.0] GPU lockup - switching to software fbcon
[  321.258010] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0001
[  323.258060] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  326.255500] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[  326.257004] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  328.255524] nouveau E[   PFIFO][0000:01:00.0] channel 1 kick timeout
[  328.257004] nouveau  [   PFIFO][0000:01:00.0] unknown status 0x00000100
[  330.255526] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  330.256872] nouveau ![   PFIFO][0000:01:00.0] unhandled status 0x00000001
[  332.549469] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  334.585660] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  347.004925] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0001
[  349.004914] nouveau E[   PFIFO][0000:01:00.0] channel 2 kick timeout
[  351.005021] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  354.004593] nouveau E[     DRM][0000:01:00.0] failed to idle channel 0xcccc0000
[  356.004823] nouveau E[   PFIFO][0000:01:00.0] channel 1 kick timeout
[  358.004887] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  360.284163] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
[  362.320275] nouveau E[   PFIFO][0000:01:00.0] playlist update failed
Comment 1 Eric 2012-09-03 17:23:55 UTC
Please also fix these bugs for linux-nouveau 3.4.10.

Best regards,

Eric
Comment 2 Eric 2012-09-29 14:32:38 UTC
Hello,

Using the external GTX 580 firmware helped!

Eric
Comment 3 Kelly Doran 2013-08-19 18:20:17 UTC
This bug persists, and I consider it a serious one. Anyone with an nvc8 card who runs Gnome or Unity will have the card freeze within seconds, although it happens somewhat less frequently with KDE. Running Firefox freezes the card immediately. Ubuntu and Fedora both ship live CDs with nouveau enabled by default, so people cannot even run the installer for more than a minute, and most will have no idea what went wrong. Note that most nvc0-family cards run just fine; I even have an nvc4 in another computer that runs perfectly. It is specifically nvc8 that has this problem, and very few devs (if any?) seem to possess one. Feel free to google "nouveau gtx 580" to see that this is hitting a decent number of people.

The card works just fine when using the extracted firmware, but this is a poor solution. I have been reading the various envytools/hwdocs material on fuc and trying to investigate this, but I have hit a wall; this issue is just too difficult for me to handle. I am fairly sure the solution is to do whatever Ben did to get the nvd7/nvd9 cards working, which looked like adding chipset-specific firmware data to the loading code, but I don't know nearly enough to do this myself. If anyone has advice, please let me know; I would really like to see this bug closed.
Comment 4 Kelly Doran 2013-08-19 18:21:19 UTC
Created attachment 84279 [details]
kernel log of card freeze
Comment 5 Kelly Doran 2013-08-19 18:22:08 UTC
Created attachment 84280 [details]
kernel log of card freeze

Another dmesg log of the freeze.  Note that the read fault is not always at the same address.
Comment 6 Ilia Mirkin 2013-08-19 18:29:32 UTC
Worth trying 3.11-rc6. A bunch of changes went into 3.11-rc1 related to register setup on nvc0+ cards.
Comment 7 Kelly Doran 2013-08-19 19:16:58 UTC
These logs are from the nouveau git after those changes hit.  I've been tracking the git changes pretty carefully and they looked promising but alas, didn't work.
Comment 8 Martin Peres 2013-08-19 23:55:41 UTC
*** Bug 45517 has been marked as a duplicate of this bug. ***
Comment 9 Leif Gruenwoldt 2013-08-22 17:33:56 UTC
(In reply to comment #3)
> The card works just fine if using the extracted firmware

Can you elaborate on how to do that? This sounds like a better solution than using the proprietary nvidia driver.
Comment 11 Dmitriy 2014-07-21 16:28:08 UTC
*** Bug 81614 has been marked as a duplicate of this bug. ***
Comment 12 Dmitriy 2014-07-21 16:29:35 UTC
This bug is affecting me as well; see the most recent duplicate bug. Is there any progress on fixing this? Is some help with testing needed, for example?
Comment 13 Ilia Mirkin 2014-07-21 16:42:31 UTC
(In reply to comment #12)
> This bug is affecting me also, see the last duplicated bug. Any progress in
> fixing this? Maybe some help in testing (for ex.) required?

It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said that it does work just fine on his NVC8, although he has the less powerful versions. Could be something with high ROP/TPC/GPC counts not being handled. (Or multiple PARTs?)

That might actually be an interesting experiment -- before loading nouveau, mask out a bunch of the units and see if it helps. If it does, find the "breaking" point.

This is the code that computes that stuff:

http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330

	priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
	priv->gpc_nr =  nv_rd32(priv, 0x409604) & 0x0000001f;
	for (i = 0; i < priv->gpc_nr; i++) {
		priv->tpc_nr[i]  = nv_rd32(priv, GPC_UNIT(i, 0x2608));
		priv->tpc_total += priv->tpc_nr[i];
	}

Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the per-GPC TPC counts).
Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it does, figure out which of the values matter and where the breaking points are.

If it doesn't help, perhaps the units need to be disabled a little harder, e.g. by setting 0x22584/0x22588.
Comment 14 Dmitriy 2014-07-21 18:35:16 UTC
(In reply to comment #13)

Could you describe in more detail what I need to do? I'm afraid I'm not yet advanced enough to understand everything in your comment. Perhaps not in the comments, but by e-mail: doityourselfteam@gmail.com
Comment 15 Ilia Mirkin 2014-07-22 11:12:16 UTC
*** Bug 81614 has been marked as a duplicate of this bug. ***
Comment 16 Kelly Doran 2014-07-28 18:36:53 UTC
(In reply to comment #13)

Here are the printed values:
[    3.185455] Rop nr: 6
[    3.185457] Gpc nr: 4
[    3.185460] Tpc nr for gpc 0: 4
[    3.185463] Tpc nr for gpc 1: 4
[    3.185466] Tpc nr for gpc 2: 4
[    3.185469] Tpc nr for gpc 3: 4

I tried setting them all to 1: the card froze almost immediately after logging into KWin (which is when I suspect OpenGL rendering starts), although oddly enough there was no read fault in dmesg. I also tried setting them all to 2; it froze fairly quickly too, and the machine became completely unrecoverable. Note that I also tried the blob firmware with all values set to 2, so I think running with anything other than the natural counts simply pisses the card off. I didn't try disabling units directly via 0x22584/0x22588; I'm not entirely sure where I would even do that.
Comment 17 t.jp 2014-08-01 21:27:46 UTC
Created attachment 103833 [details]
dmesg from GPU lockup
Comment 18 t.jp 2014-08-01 21:36:44 UTC
I'm affected by this bug as well. Card is ASUS ENGTX580 DCII.

Distro: Arch Linux
X.Org: 1.16
mesa: 10.2.4
xf86-video-nouveau: 1.0.10

I have a Korean monitor, so there are some errors about a missing EDID in dmesg, but even without an xorg.conf nouveau detected the 2560x1440 resolution, and Gnome 3 looked fine until it locked up.

The lockup happened while I was typing in a terminal in Gnome 3.12.2; it came out of nowhere. I attached my dmesg log.

I rebooted and got another lockup very quickly, but this time without the long list of nouveau E[   PDISP] messages. The channel value is the same, but the process is Xorg.bin rather than mutter-launch.

read fault at 0x4391800000 [PT_NOT_PRESENT] from PGRAPH/CTXCTL on channel 0x005fb79000 [Xorg.bin[909]]
Comment 19 Kelly Doran 2014-08-11 13:29:33 UTC
My computer survived the night with the latest patchset that made it into 3.17, so I am marking this as fixed.
