Bug 94990 - [GM204] GTX 970 + 4GB VRAM fails at secboot (v4.6+)
Summary: [GM204] GTX 970 + 4GB VRAM fails at secboot (v4.6+)
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 97066 98611

Reported: 2016-04-18 13:51 UTC by mirkoroller
Modified: 2017-05-23 18:29 UTC
CC List: 17 users

Attachments
dmesg (53.99 KB, text/plain) - 2016-04-18 13:51 UTC, mirkoroller
Config File (101.22 KB, text/plain) - 2016-04-19 18:11 UTC, mirkoroller
sys.log after booting with nvidia gtx970 only (129.02 KB, text/plain) - 2016-04-20 19:31 UTC, mirkoroller
Logout with msi kernel bug (1.43 MB, text/plain) - 2016-04-23 09:59 UTC, mirkoroller
logout 2 (1.40 MB, text/plain) - 2016-04-23 09:59 UTC, mirkoroller
logout 3 (1.29 MB, text/plain) - 2016-04-23 10:00 UTC, mirkoroller
Logout1 (1.43 MB, image/jpeg) - 2016-04-23 10:02 UTC, mirkoroller
Logout2 (1.40 MB, image/jpeg) - 2016-04-23 10:02 UTC, mirkoroller
Logout3 (1.29 MB, image/jpeg) - 2016-04-23 10:03 UTC, mirkoroller
/lib/firmware moved, bootlog (54.25 KB, text/plain) - 2016-04-23 10:08 UTC, mirkoroller
dmesg after patch (422.45 KB, text/plain) - 2016-04-26 17:48 UTC, mirkoroller
dmesg after patch, with pcimsi=off (552.98 KB, text/plain) - 2016-04-26 17:55 UTC, mirkoroller
dmesg from kernel with debug and NvForcePost=1 pci_msi=off (1.46 MB, text/plain) - 2016-06-25 10:04 UTC, H.Habighorst
DMESG with pci_msi=off nouveau.config=NvForcePost=1 nouveau.debug=trace (261.34 KB, text/plain) - 2016-06-25 13:09 UTC, H.Habighorst
dmesg for latest linux-next + patches for secboot (389.66 KB, text/plain) - 2016-06-26 11:00 UTC, H.Habighorst
DMESG with pci_msi=off nouveau.config=NvForcePost=1 nouveau.debug=trace (14.94 MB, text/x-log) - 2016-07-22 23:29 UTC, Ilia Guterman
attachment-8952-0.html (7.50 KB, text/html) - 2016-08-09 18:17 UTC, Efrem McCrimon
0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch (1.61 KB, patch) - 2016-08-09 23:07 UTC, Ilia Guterman
dmesg with patch 125656 applied. everything else fedora stock... (250.41 KB, text/plain) - 2016-08-16 22:23 UTC, Florian Mickler
attachment-14215-0.html (2.02 KB, text/html) - 2016-08-23 18:11 UTC, Efrem McCrimon
attachment-18907-0.html (2.39 KB, text/html) - 2016-08-24 20:01 UTC, Efrem McCrimon
attachment-28359-0.html (2.06 KB, text/html) - 2016-08-24 21:14 UTC, Efrem McCrimon
attachment-20368-0.html (2.74 KB, text/html) - 2016-08-26 02:12 UTC, Efrem McCrimon
limit ram to 3 bars (1006 bytes, patch) - 2016-10-24 06:56 UTC, Ilia Guterman
boot log kernel 4.8.4-gentoo (21.61 KB, text/plain) - 2016-10-27 16:47 UTC, Wojciech Arabczyk
dmesg after applying patch to limit bar to 3GB (3.09 KB, text/plain) - 2016-10-27 17:20 UTC, Wojciech Arabczyk
untested initial attempt at a solution (1.68 KB, text/plain) - 2016-11-21 00:46 UTC, Ben Skeggs
untested initial attempt at a solution (take 2) (1.69 KB, patch) - 2016-11-21 01:07 UTC, Ben Skeggs
Log with patch 128091 (27.30 KB, text/plain) - 2016-11-21 05:58 UTC, Alexandre Courbot
attachment-14321-0.html (2.25 KB, text/html) - 2016-11-24 21:33 UTC, Efrem McCrimon
Xorg log on GTX970 4GB with 4.9.0-rc7 and xf86-video-nouveau from HEAD (55.51 KB, text/x-log) - 2016-12-04 20:25 UTC, Bernie Innocenti
dmesg on GTX970 4GB with 4.9.0-rc7 and xf86-video-nouveau from HEAD (500.35 KB, text/plain) - 2016-12-04 20:26 UTC, Bernie Innocenti
Disable last 512 MiB of VRAM on GTX970 (1.12 KB, patch) - 2017-03-07 00:01 UTC, Lyude Paul
nouveau-fdo94990-linux-4.10.patch (33.78 KB, patch) - 2017-05-03 13:16 UTC, Hans Petter Jansson

Description mirkoroller 2016-04-18 13:51:38 UTC
Created attachment 123026 [details]
dmesg

Hi, I am testing the latest 4.6-rc4 kernel and the nouveau kernel module. I installed the latest /lib/firmware from git. After rebooting, the screen goes black and the monitor goes to sleep.

Hardware ASUS Z97 Mainboard
Nvidia GTX970


[    2.587816] nouveau 0000:01:00.0: fifo: read fault at 00ffba0000 engine 1f [] client 12 [PMU] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
[    2.687724] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/base.c:145/nvkm_secboot_falcon_run()!
[    2.687736] nouveau 0000:01:00.0: fifo: write fault at 000030d000 engine 05 [BAR3] client 08 [HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [00ffbf5000 unknown]
[    2.687752] nouveau 0000:01:00.0: fifo: write fault at 0000011000 engine 05 [BAR3] client 08 [HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [00ffbf5000 unknown]


Full dmesg output:
http://pastebin.com/pWD1wuTL
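
For anyone checking whether their own boot log matches this report, a minimal sketch follows. It only looks for the two strings quoted above (the `nvkm_secboot_falcon_run()` timeout and a `REGION_VIOLATION` fault). The function name `has_secboot_failure` and the sample file are made up for illustration and are not part of nouveau.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a saved dmesg file contains both
# the secboot timeout and a REGION_VIOLATION fault from nouveau.
has_secboot_failure() {
    grep -q 'timeout at .*nvkm_secboot_falcon_run()' "$1" &&
        grep -q 'nouveau .*REGION_VIOLATION' "$1"
}

# Demo on a temporary file holding two lines in the format quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[    2.587816] nouveau 0000:01:00.0: fifo: read fault at 00ffba0000 engine 1f [] client 12 [PMU] reason 0d [REGION_VIOLATION] on channel -1 [0000000000 unknown]
[    2.687724] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/base.c:145/nvkm_secboot_falcon_run()!
EOF

if has_secboot_failure "$sample"; then
    echo "secboot failure signature found"
fi
rm -f "$sample"
```

On a live system one would first save the log, for example with `dmesg > boot.log`, and pass that file to the function.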
Comment 1 Ilia Mirkin 2016-04-18 13:54:07 UTC
Does it work fine on a cold boot, but only fails on a warm boot? (Is that what you mean by "after rebooting"?)
Comment 2 mirkoroller 2016-04-18 13:55:08 UTC
(In reply to Ilia Mirkin from comment #1)
> Does it work fine on a cold boot, but only fails on a warm boot? (Is that
> what you mean by "after rebooting"?)

No, it never works.
Comment 3 Alexandre Courbot 2016-04-18 14:20:02 UTC
Just to make sure, can you give us a md5sum of the firmware files in /lib/firmware/nvidia/gm204/acr and /lib/firmware/nvidia/gm204/gr?

$ md5sum /nvidia/gm204/acr/* /lib/firmware/nvidia/gm204/gr/*

should do it.
Comment 4 mirkoroller 2016-04-18 14:24:42 UTC
(In reply to Alexandre Courbot from comment #3)
> Just to make sure, can you give us a md5sum of the firmware files in
> /lib/firmware/nvidia/gm204/acr and /lib/firmware/nvidia/gm204/gr?
> 
> $ md5sum /nvidia/gm204/acr/* /lib/firmware/nvidia/gm204/gr/*
> 
> should do it.

 md5sum /nvidia/gm204/acr/* /lib/firmware/nvidia/gm204/gr/*
md5sum: '/nvidia/gm204/acr/*': No such file or directory
2218d5ef13ae6f4ff1b69169325b6e79  /lib/firmware/nvidia/gm204/gr/fecs_bl.bin
138a1521acec4d0a896ac4d71b5e7435  /lib/firmware/nvidia/gm204/gr/fecs_data.bin
bc2670a4d52798347d6bd6e3cea4f269  /lib/firmware/nvidia/gm204/gr/fecs_inst.bin
11f07571b6b39c3dcd720aa28710f4d9  /lib/firmware/nvidia/gm204/gr/fecs_sig.bin
66b91964bd2d7875b971790a505eea0a  /lib/firmware/nvidia/gm204/gr/gpccs_bl.bin
4f98e6fe4a5e3dfe8a3bcbc2b6a56f39  /lib/firmware/nvidia/gm204/gr/gpccs_data.bin
982a0a0382c8aa6e417bfb36b6db6942  /lib/firmware/nvidia/gm204/gr/gpccs_inst.bin
adac8235d4d31e041fcce466862e9d61  /lib/firmware/nvidia/gm204/gr/gpccs_sig.bin
4cc8aa5cb5a8d9caad4541447c28cd02  /lib/firmware/nvidia/gm204/gr/sw_bundle_init.bin
d2c1c761da60b9127517936267bf5289  /lib/firmware/nvidia/gm204/gr/sw_ctx.bin
cbb1feb005ef01043ccbfb4fbcb53667  /lib/firmware/nvidia/gm204/gr/sw_method_init.bin
cf141703e24099c4c03102b870befd41  /lib/firmware/nvidia/gm204/gr/sw_nonctx.bin
Comment 5 Alexandre Courbot 2016-04-18 14:26:18 UTC
Oops sorry, the command I gave you was incorrect and we are missing the ACR files. Can you now run:

$ md5sum /lib/firmware/nvidia/gm204/acr/*

Thanks!
Comment 6 mirkoroller 2016-04-18 14:28:39 UTC
Firmware comes from:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

(In reply to Alexandre Courbot from comment #5)
> Oops sorry, the command I gave you was incorrect and we are missing the ACR
> files. Can you now run:
> 
> $ md5sum /lib/firmware/nvidia/gm204/acr/*
> 
> Thanks!

 md5sum /lib/firmware/nvidia/gm204/acr/*
a7b5607ac96761fee269fdec7709d28c  /lib/firmware/nvidia/gm204/acr/bl.bin
f4a9e768efe4a42c7b0768e9f1b63d50  /lib/firmware/nvidia/gm204/acr/ucode_load.bin
f8bccb173c87408a84d31862d8e86178  /lib/firmware/nvidia/gm204/acr/ucode_unload.bin
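
A comparison against known-good checksums can be scripted with `md5sum -c`, which reads a list in the same "hash, two spaces, path" format shown above. This is only a sketch: the helper name `verify_firmware` and the demo files are hypothetical, and `--quiet` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: check every file named in a checksum list
# (md5sum output format) and fail if any file does not match.
verify_firmware() {
    md5sum -c --quiet "$1"
}

# Demo on scratch files instead of the real firmware directory.
dir=$(mktemp -d)
printf 'demo' > "$dir/bl.bin"
md5sum "$dir/bl.bin" > "$dir/reference.md5"

if verify_firmware "$dir/reference.md5"; then
    echo "firmware matches reference"
fi
rm -rf "$dir"
```

For this bug, the reference list would be the checksum lines posted here, saved to a file on a machine with known-good firmware.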
Comment 7 mirkoroller 2016-04-18 14:29:20 UTC
Firmware comes from:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
Comment 8 Alexandre Courbot 2016-04-19 06:50:11 UTC
Ok, md5sums are correct, so your firmware is not corrupted.

I will try to reproduce on the same setup as you (GTX970/4.6-rc4) - can you confirm that you are using pure 4.6-rc4 without any extra patch? Also, can you run

$ cat /proc/config.gz >config.gz

and attach the created config.gz file?

Thanks!
Comment 9 mirkoroller 2016-04-19 07:42:48 UTC
(In reply to Alexandre Courbot from comment #8)
> Ok, md5sums are correct, so your firmware is not corrupted.
> 
> I will try to reproduce on the same setup as you (GTX970/4.6-rc4) - can you
> confirm that you are using pure 4.6-rc4 without any extra patch? Also, can
> you run
> 
> $ cat /proc/config.gz >config.gz
> 
> and attach the created config.gz file?
> 
> Thanks!

Yes, it's a clean 4.6-rc4 from kernel.org.

I will post the .config in 10 hours.
Comment 10 mirkoroller 2016-04-19 18:11:30 UTC
Created attachment 123065 [details]
Config File

The .config File.
Comment 11 Alexandre Courbot 2016-04-20 04:19:39 UTC
Successfully booted with your config on 4.6-rc4 and a GM206 (GTX 960). I am trying to get my hands on a GTX970 to see if this is specific to the card.

Are you able/willing to compile custom Nouveau modules in order to get more debug information?
Comment 12 Alexandre Courbot 2016-04-20 04:32:16 UTC
This line in the log looks kinda suspicious to me:

[    2.278551] nouveau 0000:01:00.0: enabling device (0000 -> 0003)

I am not seeing it on my end; could this mean that the GeForce card is a secondary display device, and that the integrated Intel graphics is the main one?

In this case maybe you need this patch, which is not in -rc4: https://lists.freedesktop.org/archives/nouveau/2016-April/024523.html

Also I am surprised by the report that the screen goes blank after this - failure to initialize GR should not interfere with the ability to display a framebuffer (only acceleration will be disabled).

Let's run another experiment: can you move /lib/firmware/nvidia/gm204 somewhere else, reboot (you will see a complaint about missing firmware files in the boot log), and tell us whether the display is active after that?

Thanks!
Comment 13 mirkoroller 2016-04-20 11:07:31 UTC
Yes, I am booting with both the Intel GPU and the Nvidia GTX 970 card activated. There is a picture on the Intel screen. If I disable the Intel GPU in the BIOS, there is no output on the Nvidia screen.

I will try to ssh into the machine with the Intel GPU disabled.

I have a modified BIOS on the GTX 970. Could this be the answer? I disabled the fan on the GTX 970 below 50 °C. But everything works on Windows 10 and on kernel 4.5.0 with the nouveau driver.
Comment 14 mirkoroller 2016-04-20 11:09:14 UTC
I removed the /lib/firmware directory, still black screen.
Comment 15 mirkoroller 2016-04-20 19:31:19 UTC
Created attachment 123096 [details]
sys.log after booting with nvidia gtx970 only
Comment 16 Alexandre Courbot 2016-04-22 06:24:15 UTC
Hi,

Your latest comments seem to indicate that the issue is not related to firmware - it just happens to manifest itself at this stage as well when firmware is loaded.

2 things to try:

1) Can you try loading Nouveau with the "config=NvForcePost=1" option? Adding "nouveau.config=NvForcePost=1" to your kernel command line should do it.

2) Can I get a full log of you booting with the /lib/firmware directory removed, or can you confirm that you can see Nouveau complaining about the firmware when you boot in this configuration?

Thanks!
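
Since a mistyped option is easy to miss, here is a small sketch for confirming that a parameter really made it onto the kernel command line. The helper name `cmdline_has` and the demo string are made up; on a running system the string would come from /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given command line string
# contains the parameter as a whole whitespace-separated word.
cmdline_has() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Demo with a made-up command line; on a live system one would pass
# "$(cat /proc/cmdline)" instead.
demo='BOOT_IMAGE=/vmlinuz root=/dev/sda1 nouveau.config=NvForcePost=1'
if cmdline_has "$demo" 'nouveau.config=NvForcePost=1'; then
    echo "NvForcePost is set"
fi
```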
Comment 17 mirkoroller 2016-04-23 09:59:31 UTC
Created attachment 123172 [details]
Logout with msi kernel bug
Comment 18 mirkoroller 2016-04-23 09:59:50 UTC
Created attachment 123173 [details]
logout 2
Comment 19 mirkoroller 2016-04-23 10:00:15 UTC
Created attachment 123174 [details]
logout 3
Comment 20 mirkoroller 2016-04-23 10:02:10 UTC
Created attachment 123175 [details]
Logout1
Comment 21 mirkoroller 2016-04-23 10:02:34 UTC
Created attachment 123176 [details]
Logout2
Comment 22 mirkoroller 2016-04-23 10:03:09 UTC
Created attachment 123177 [details]
Logout3
Comment 23 mirkoroller 2016-04-23 10:08:19 UTC
Created attachment 123178 [details]
/lib/firmware moved, bootlog
Comment 24 Alexandre Courbot 2016-04-26 07:53:31 UTC
Hi,

Not sure why you enabled MSI - I asked if you could set NvForcePost, not NvMSI? Or maybe you posted on the wrong bug?

My gut feeling is more and more that for some reason devinit is not run by the BIOS and the heuristic in 4.6 does not detect this (which is why I wanted to try NvForcePost). If you are compiling your own kernel, could you try applying the following patch and see if things improve?

https://lists.freedesktop.org/archives/nouveau/2016-April/024523.html
Comment 25 mirkoroller 2016-04-26 17:47:35 UTC
(In reply to Alexandre Courbot from comment #24)
> Hi,
> 
> Not sure why you enabled MSI - I asked if you could set NvForcePost, not
> NvMSI? Or maybe you posted on the wrong bug?
> 
> My gut feeling is more and more that for some reason devinit is not ran by
> the bios and the heuristic in 4.6 does not detect this (which is why I
> wanted to try NvForcePost). If you are compiling your own kernel, could you
> try applying the following patch and see if things are improving?
> 
> https://lists.freedesktop.org/archives/nouveau/2016-April/024523.html


Added the patch and NvForcePost=1.

The result is the same. I did not enable MSI by boot parameter; it looks like it is enabled by default.
Comment 26 mirkoroller 2016-04-26 17:48:06 UTC
Created attachment 123280 [details]
dmesg after patch
Comment 27 mirkoroller 2016-04-26 17:55:53 UTC
Created attachment 123281 [details]
dmesg after patch, with pcimsi=off
Comment 28 mirkoroller 2016-05-16 08:59:19 UTC
The bug is still not solved on the new 4.6 kernel.
Comment 29 Andrei Dziahel 2016-06-06 08:45:09 UTC
Hi,

Ok, so it seems I'm affected too: my desktop with GTX970 doesn't boot kernel 4.6 (comes with recent openSUSE Tumbleweed snapshot release). Although it does boot 4.5.4 kernel, from which I'm writing this comment.

I've found this issue by googling "timeout at drivers/gpu/drm/nouveau/nvkm/subdev/secboot/base.c:145" message from my logs.

Relevant log parts are pretty much identical for me, I can post them here, if needed. 

Firmware checksums are identical. Haven't tried NvForcePost=1 and pci_msi=off yet, will try later if needed.

Anyway, +1 here.

Motherboard: Gigabyte GA-MA770
Videocard: GTX 970 by MSI
UEFI and SecureBoot are enabled.
Comment 30 H.Habighorst 2016-06-25 10:04:14 UTC
Created attachment 124715 [details]
dmesg from kernel with debug and NvForcePost=1 pci_msi=off

I have only removed a lot of repeated messages from the end (FIFO message spams), whole beginning was left as is.
Comment 31 H.Habighorst 2016-06-25 10:10:31 UTC
Booted the last kernel from tumbleweed with debug on (4.6.2-1).

Firmware MD5's are the same as mentioned.

The dmesg output is posted above. As mentioned, I left the beginning intact but cut the rest off because the file was very large (65 MB; it never stops spamming the fifo message).

If you'll need more info / something specific, just mention it, will follow the bug.
Comment 32 H.Habighorst 2016-06-25 13:06:33 UTC
Comment on attachment 124715 [details]
dmesg from kernel with debug and NvForcePost=1 pci_msi=off

- obsolete, wrong kernel parameters
Comment 33 H.Habighorst 2016-06-25 13:09:44 UTC
Created attachment 124716 [details]
DMESG with pci_msi=off nouveau.config=NvForcePost=1 nouveau.debug=trace

Sorry, first time nouveau debugging and just realized I completely messed up the kernel options - this time should be correct.
Comment 34 H.Habighorst 2016-06-26 11:00:32 UTC
Created attachment 124724 [details]
dmesg for latest linux-next + patches for secboot

Found these patches on the mailing list, thought I could try it out, but no luck.

https://lists.freedesktop.org/archives/nouveau/2016-June/025336.html

I've appended the dmesg, the kernel source was linux-next as of 2016-06-24 with the patches mentioned applied.
Comment 35 Yann Golanski 2016-07-20 13:05:28 UTC
Same symptoms, running either `kernel-4.6.3-300.fc24.x86_64` or `kernel-4.6.4-301.fc24.x86_64` on Fedora 24.

Do you need some more information from me?
Comment 36 Ilia Guterman 2016-07-22 23:29:04 UTC
Created attachment 125265 [details]
DMESG with pci_msi=off nouveau.config=NvForcePost=1 nouveau.debug=trace

Asus Z77 and GTX970. It seems the first fifo fault is caused by https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/nouveau/nvkm/subdev/secboot/base.c#L142

changing the value to 0x0 allowed the machine to boot, still with lots of errors.
Comment 37 Ilia Mirkin 2016-07-24 17:26:14 UTC
All 3 users who posted dmesg in this bug have 4GB of VRAM.
Comment 38 Alexandre Courbot 2016-07-24 23:30:21 UTC
Looks like we are narrowing it down. This may be a bug in the firmware itself when dealing with high addresses. Let me check on my side at which address the ACR is loaded, and whether I can force a lower address for it.
Comment 39 Alexandre Courbot 2016-07-25 05:00:42 UTC
Tried on a 4GB GM206 / 4.6.4, it passed without any issue:

[   12.410603] nouveau 0000:01:00.0: NVIDIA GM206 (1262f0a1)
[   12.493772] nouveau 0000:01:00.0: bios: version 84.06.01.0c.00
[   12.494306] nouveau 0000:01:00.0: disp: dcb 15 type 8 unknown
[   12.506916] nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
[   12.512991] INST: ffbd5000
[   12.513137]  PGD: ffbcd000
[   12.526018]  WPR: ffba0000
[   12.528160] BLOB: ffbca000
[   12.532053] BLOB: ffbc9000
[   12.532180] [TTM] Zone  kernel: Available graphics memory: 1024460 kiB

... and then I could run glmark2 without any error.

Tried messing with the WPR address to make it invalid and see if I could get the same error. It happens to be different:

[   31.729804] nouveau 0000:01:00.0: priv: GPC0: 419df4 00000000 (1b40820e)
[   31.729846] nouveau 0000:01:00.0: priv: GPC1: 419df4 00000000 (1b40820e)
[   31.729894] nouveau 0000:01:00.0: priv: GPC0: 505884 c0000000 (1a40820e)
[   31.729924] nouveau 0000:01:00.0: priv: GPC1: 50d884 c0000000 (1a40820e)
[   33.731990] nouveau 0000:01:00.0: timeout at ../drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1469/gf100_gr_init_ctxctl()!
[   33.732035] nouveau 0000:01:00.0: gr: init failed, -16

So not sure where this REGION_VIOLATION comes from. Guess I will need to ping the fw people at NVIDIA...
Comment 40 Alexandre Courbot 2016-07-25 05:05:51 UTC
I should mention that even though the GR message is the same, I am not getting any REGION_VIOLATION nor timeout in nvkm_secboot_falcon_run() when running secure boot with an invalid WPR address.
Comment 41 Ben Skeggs 2016-07-25 05:21:25 UTC
Another interesting experiment would be for someone hitting this to:

- Boot with "nouveau.modeset=0 3" on the kernel commandline.
- Suspend the machine with "echo mem > /sys/power/state"
- Resume the machine (there will be no display at this point, so you'll need ssh access or to type the rest blindly)
- Execute "modprobe nouveau modeset=1" and see if the situation is any better.
Comment 42 Ben Skeggs 2016-07-25 05:24:23 UTC
(In reply to Ben Skeggs from comment #41)
> Another interesting experiment would be for someone hitting this to:
> 
> - Boot with "nouveau.modeset=0 3" on the kernel commandline.
> - Suspend the machine with "echo mem > /sys/power/state"
> - Resume the machine (there will be no display at this point, so you'll need
> ssh access or to type the rest blindly)
> - Execute "modprobe nouveau modeset=1" and see if the situation is any
> better.

Before suspending, also run "modprobe -r nouveau" ;)  (Thanks Ilia!)
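
Put together, the experiment could be sketched as the script below. It is a dry run by default and only prints the commands. The `RUN` variable and both function names are inventions for this sketch, and it assumes the machine was already booted with "nouveau.modeset=0 3" and that the script runs as root, ideally over ssh.

```shell
#!/bin/sh
# Sketch of the suspend/resume experiment; prints the commands unless
# RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        echo "would run: $1"
    fi
}

suspend_experiment() {
    run 'modprobe -r nouveau'           # unload the module before suspending
    run 'echo mem > /sys/power/state'   # suspend to RAM; returns after resume
    run 'modprobe nouveau modeset=1'    # reload with modesetting enabled
}

suspend_experiment
```

Keeping the dry run as the default avoids suspending a machine by accident while reading through the steps.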
Comment 43 Alexandre Courbot 2016-07-25 05:48:12 UTC
Got a GTX970/4GB, been able to reproduce the error on the same machine that booted a GM206/4GB without any issue.
Comment 44 Alexandre Courbot 2016-07-25 06:43:34 UTC
Allocating instmem from the lower VRAM partition (<3.5GB) results in a successful boot. Looks like this is the fix.
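
To see how the addresses quoted earlier in this bug relate to that boundary, here is a small arithmetic sketch. It assumes that "3.5GB" means 3.5 GiB and that the WPR and INST values printed in the debug logs (ffba0000, ffbd5000) are byte offsets into VRAM. The helper name `in_upper_partition` and the low example address are made up.

```shell
#!/bin/sh
# Boundary of the lower VRAM partition, taken from the "<3.5GB" remark
# and assumed here to mean 3.5 GiB.
LOWER_LIMIT=$((3584 * 1024 * 1024))

# Hypothetical helper: succeeds if a hex VRAM offset lies at or above
# the boundary, i.e. in the upper partition.
in_upper_partition() {
    [ "$((0x$1))" -ge "$LOWER_LIMIT" ]
}

# ffba0000 and ffbd5000 come from the logs; 10000000 is a made-up
# low address for comparison.
for addr in ffba0000 ffbd5000 10000000; do
    if in_upper_partition "$addr"; then
        echo "$addr: upper partition"
    else
        echo "$addr: lower partition"
    fi
done
```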
Comment 45 Yann Golanski 2016-08-01 07:26:59 UTC
Definitely neither rushing nor pestering you but is there an ETA for this?
Comment 46 Alexandre Courbot 2016-08-01 07:28:33 UTC
I think Ben mentioned he was working on a fix for this. Ben, do you need more data from me?
Comment 47 Ivan 2016-08-08 05:38:06 UTC
Hi. i5-4670, ASUS H87-pro, Nvidia GTX970. I have the same bug on kernel 4.6.* and on 4.7.0 too. Just black screen on boot... On kernel 4.5.* everything works just fine.
Comment 48 Ivan 2016-08-08 06:32:12 UTC
(In reply to Ivan from comment #47)
> Hi. i5-4670, ASUS H87-pro, Nvidia GTX970. I have the same bug on kernel
> 4.6.* and on 4.7.0 too. Just black screen on boot... On kernel 4.5.*
> everything works just fine.

By the way, I tried booting the Arch live CD (kernel 4.6.4) with the option "nouveau.modeset=0" and it works.
Comment 49 Zach Wolfe 2016-08-08 19:09:51 UTC
For the record, I had dealt with this problem too, boots fine on 4.5.* but not 4.6 (as of today). I have also tried a GTX 960 and a GTX 980 (4GB) (now my current). Fedora 24 boots fine with each of those cards on 4.6. The EVGA GTX 970 (4GB) did not boot (at all) on kernel 4.6. However it will boot under 4.5 but the monitor(s) connected to the 970 card will not work. (I run two cards).
Comment 50 Yann Golanski 2016-08-09 08:23:31 UTC
Could some kind soul write up how to modify the grub2 config to add `nouveau.modeset=0`, so we could test whether that works on our setups?
Comment 51 Tobias Klausmann 2016-08-09 12:20:15 UTC
(In reply to Yann Golanski from comment #50)
> Could some kind soul write a how to modify grub2 configs to add the
> `nouveau.modeset=0` so we could test is that work on our setup?

Add it in /boot/grub2/grub.cfg to the kernel commandline for the kernel you want to boot, or change the boot config inside grub2 for a oneshot (F12). But you should have just googled that ;-)
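
For a change that survives regenerating grub.cfg, one option is to append the parameter to the `GRUB_CMDLINE_LINUX` line instead. The sketch below assumes a distribution that keeps that variable in a file such as /etc/default/grub or /etc/sysconfig/grub, and it assumes GNU sed. The helper name `append_kernel_param` is made up, and the demo edits a scratch file rather than the real configuration.

```shell
#!/bin/sh
# Hypothetical helper: append one parameter inside the quotes of the
# GRUB_CMDLINE_LINUX="..." line of the given file (GNU sed assumed).
append_kernel_param() {
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"|\1 $2\"|" "$1"
}

# Demo on a scratch copy rather than the real configuration file.
cfg=$(mktemp)
echo 'GRUB_CMDLINE_LINUX="rhgb quiet"' > "$cfg"
append_kernel_param "$cfg" 'nouveau.modeset=0'
cat "$cfg"
rm -f "$cfg"
```

After editing the real file, grub.cfg has to be regenerated, for example with `grub2-mkconfig -o /boot/grub2/grub.cfg` on systems that ship that tool.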
Comment 52 Ilia Guterman 2016-08-09 14:37:15 UTC
I tested with 'nouveau.modeset=0' on kernels 4.5 and 4.6 on debian/sid and got
the same results: the screen flickers and it seems like nothing inside the
nouveau driver is getting called.
Comment 53 Pierre Moreau 2016-08-09 16:00:55 UTC
(In reply to Ilia Guterman from comment #52)
> i tested with 'nouveau.modeset=0' on kernel 4.5 and 4.6 on debian/sid and got
> the same results of screen flickering and seems like nothing inside of the
> nouveau driver getting called.

Right, `nouveau.modeset=0` won’t prevent Nouveau from loading; however, the driver is in a "disabled" state and won’t do a thing:

> modeset
> Whether the driver should be enabled. 0 for disabled, 1 for enabled, 2 for headless

(taken from the Nouveau wiki: https://nouveau.freedesktop.org/wiki/KernelModuleParameters/#modeset)
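
As a sketch of how to check which of these states is in effect, the snippet below maps a value to the meanings quoted from the wiki. The sysfs path is an assumption (module parameters are normally exposed under /sys/module, and reading this one may require root), and `describe_modeset` is a made-up name.

```shell
#!/bin/sh
# Map a modeset value to the meaning given in the wiki text quoted above.
describe_modeset() {
    case "$1" in
        0) echo disabled ;;
        1) echo enabled ;;
        2) echo headless ;;
        *) echo unknown ;;
    esac
}

# Assumption: a loaded module exposes the current value at this path.
# Fall back to a message if the file is absent or unreadable.
param=/sys/module/nouveau/parameters/modeset
if [ -r "$param" ]; then
    echo "nouveau modeset: $(describe_modeset "$(cat "$param")")"
else
    echo "nouveau modeset: parameter file not readable"
fi
```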
Comment 54 Efrem McCrimon 2016-08-09 18:17:54 UTC
Created attachment 125646 [details]
attachment-8952-0.html

Hi QA:

I had to create a manual blacklist entry for nouveau. I did this
because I wanted to use the non-free driver. This is also required if you
want to use the Intel HD internal graphics card with an Nvidia chipset as a
secondary card.

The first edit will disable nouveau; verify with "lsmod | grep -i nouveau".

Manual edits required to get this working:

1. Created a blacklist entry, /etc/modprobe.d/blacklist-nouveau.conf, as shown below:

# cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

2. Kernel option used: nomodeset

The steps below apply if you want to use the Nvidia non-free driver:

3. Used the Nvidia xconfig tool, nvidia-xconfig.

4. Edited the file manually to see which driver was being used; it should state: Driver "nvidia"

5. Special note: I have two separate video GPU cards and decided to use individual heads, device1 and device2.

Now after booting, the Nvidia drivers are working because the X server came up. I noticed that 'num-lock' was off. Verification of X is as follows:

1. Verify what is in proc, lsmod, and the dmesg messages:

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.30 Tue Jul 21 18:53:45 PDT 2015
GCC version: gcc version 4.8.3 20140106 (OpenMandriva Association) (Linaro GCC 4.8-2014.01)

This is the driver we wanted, 352.30, installed from 'mcc'; manual edits were required for the blacklist file creation. The installation needs to create something. The Nvidia installer does create a file for you when using their manual installation method, by executing the driver installer <Nvidia-driver.version>.run.

2. Verify the X log in /var/log/Xorg.0.log.

3. Run the Nvidia settings tool, nvidia-settings (an X tool), and the Nvidia System Management Interface tool, nvidia-smi.

nvidia-smi reports a driver version matching /proc/driver/nvidia/version and reports the GPU(s). GPU-0 is the 960; GPU-1 is the 730 card. I want to run the 960 as the primary (a physical connection to a 24in monitor).

# nvidia-smi
Sat Feb 27 04:26:53 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.30     Driver Version: 352.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0      On |                  N/A |
|  0%   28C    P8     7W / 128W |    216MiB /  4091MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 730      Off  | 0000:02:00.0     N/A |                  N/A |
| 30%   28C    P8    N/A /  N/A |     66MiB /  1023MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4619    G   /etc/X11/X                                     200MiB |
|    0      6121    G   /usr/bin/nvidia-settings                         2MiB |
|    1                  Not Supported                                         |


Regards,

Efrem Mc

PS: I have used this in the past. You can also try using "acpi_osi=Linux"
on the kernel command line.
This allows me to boot with the Nvidia card as the primary video source with
the Intel HD graphics card disabled in the BIOS.

I have found that even when it is disabled in the BIOS, the initramfs still
loads the i915 driver module.


Comment 55 Ilia Guterman 2016-08-09 23:07:48 UTC
Created attachment 125656 [details] [review]
0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch

After some messing around I am able to boot.

I noticed it crashes only when initializing certain nvkm_gpuobj objects, those whose 'nvkm_gpuobj->size' is bigger than 'nv50_instobj(memory)->mem->size << NVKM_RAM_MM_SHIFT'.

Catching that specific case made it better.

From your kernel directory, run 'git apply 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch'

and, if you can, post the boot log.
Comment 56 Ivan 2016-08-12 19:23:49 UTC
So, I just installed a fresh Fedora 24 with kernel-4.5.5, then removed nouveau from the system completely:

1. # echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

2. Append 'rd.driver.blacklist=nouveau' to the end of 'GRUB_CMDLINE_LINUX="…"' in /etc/sysconfig/grub

3. # grub2-mkconfig -o /boot/grub2/grub.cfg

4. # dnf remove xorg-x11-drv-nouveau

5. # dracut /boot/initramfs-$(uname -r).img $(uname -r)

After these steps, reboot, upgrade the system to kernel-4.6.5, and install the nVidia proprietary driver. Everything works just fine.
Comment 57 Florian Mickler 2016-08-16 22:21:51 UTC
(In reply to Ilia Guterman from comment #55)
> Created attachment 125656 [details] [review] [review]
> 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch
> 
> After some messing around i am able to boot.
> 
> I noticed it crashes only when initializing certain nvkm_gpuobj, which has
> it 'nvkm_gpuobj->size' bigger then 'nv50_instobj(memory)->mem->size <<
> NVKM_RAM_MM_SHIFT'.
> 
> catching that specific case made it better.
> 
> from your kernel directory do 'git apply
> 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch
> 
> and if you can post the boot log.

Hi Ilia,

I applied your patch and my system now boots up. I saw a login screen image from before the boot for a split second. Apart from that, it seems to run fine (no stress testing done, just Firefox and some GNOME terminals).


What does your patch do? Does it increase the reported size of some of the objects?
Comment 58 Florian Mickler 2016-08-16 22:23:27 UTC
Created attachment 125833 [details]
dmesg with patch 125656 applied. everything else fedora stock...
Comment 59 Ilia Guterman 2016-08-17 04:52:25 UTC
It increases the size of nv50_instobj objects that report a size of 0x3000.

The struct "nv50_instobj" has a size of 0x3000, but its parent structure "nvkm_gpuobj" reports a different size, 0x97000.

All the other instances of nvkm_gpuobj and their child nv50_instobj report sizes similar to each other, so they are fine.

It is a mystery to me why the sizes don't match, because if I understand correctly the sizes are reported by the GPU itself.
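
The mismatch can be restated as plain arithmetic. This sketch assumes `NVKM_RAM_MM_SHIFT` is 12 (so the backing allocation is counted in 4 KiB units) and that the 0x3000 figure is the already shifted byte size, which would be 3 units. The helper name `gpuobj_exceeds_backing` is made up and does not exist in the kernel.

```shell
#!/bin/sh
# Assumption: NVKM_RAM_MM_SHIFT is 12, i.e. mem->size counts 4 KiB units.
NVKM_RAM_MM_SHIFT=12

# Hypothetical helper: succeeds if the gpuobj size (bytes) exceeds the
# backing allocation (given in units of 1 << NVKM_RAM_MM_SHIFT bytes).
gpuobj_exceeds_backing() {
    [ "$(($1))" -gt "$(($2 << NVKM_RAM_MM_SHIFT))" ]
}

# Values from the comment above: gpuobj 0x97000 bytes, instobj 0x3000
# bytes, which would be 3 units under the assumed shift.
if gpuobj_exceeds_backing 0x97000 3; then
    printf 'mismatch: gpuobj is %d bytes, backing is %d bytes\n' \
        "$((0x97000))" "$((3 << NVKM_RAM_MM_SHIFT))"
fi
```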
Comment 60 Florian Mickler 2016-08-17 20:42:25 UTC
If I try to start redeclipse, it segfaults. :(
Comment 61 Yann Golanski 2016-08-22 10:43:19 UTC
(In reply to Ivan from comment #56)
> So, I just install fresh Fedora 24 with kernel-4.5.5, then remove nouveau
> from system completely:
> 
> […]
> 
> After this steps reboot, upgrade system to kernel-4.6.5, and install nVidia
> proprietary driver. Everything works just fine.

Which is a nice workaround if one wishes to put up with the plethora of problems the nVidia drivers give you…
Comment 62 Efrem McCrimon 2016-08-23 18:11:37 UTC
Created attachment 125981 [details]
attachment-14215-0.html

What version of the non-free Nvidia driver are you using? 370.23 or 367.35?

Comment 63 Yann Golanski 2016-08-24 08:35:39 UTC
(In reply to Efrem McCrimon from comment #62)
> What version of the non-free Nvidia driver are you using? 370.23 or 367.35?

I do not use the non-free Nvidia driver because it causes many more problems than it solves: every kernel upgrade might break things until a new driver is released. At least nouveau is more in sync. Well, this bug notwithstanding :)
Comment 64 Efrem McCrimon 2016-08-24 20:01:13 UTC
Created attachment 126018 [details]
attachment-18907-0.html

I have to install the Nvidia driver on development systems because I am
doing some programming with CUDA.  I use the CUDA Toolkit in C/C++.  I am
converting some of the software over to OpenACC.  OpenACC has its own
compilers, but CUDA requires the non-free driver because the tools
communicate with the driver through its APIs.

Regards,

Efrem Mc

Comment 65 Florian Mickler 2016-08-24 20:55:16 UTC
It is considered to be very rude to hijack a bug report for other discussions.
Comment 66 Efrem McCrimon 2016-08-24 21:14:33 UTC
Created attachment 126022 [details]
attachment-28359-0.html

For reference, I do use the nouveau driver on non-CUDA development systems.
I like how the development teams respond to bugs and fixes.  There are some
differences in the behavior of the video drivers between the 4.6.x and 4.7.x
kernels.  I am not sure if it is related to libdrm, Mesa, or libGL.  I have a
GTX 960, which is in the same product family as the GTX 970.  I had problems
with the 4.7.0 kernel and reverted back to the 4.6.5 kernel, which works with
some limitations.

Comment 67 Ilia Mirkin 2016-08-24 21:19:23 UTC
I don't see a way to lock this bug down, unfortunately, but Efrem, please refrain from further comments other than to say "Yes, I have tested the provided patch and it works" (or doesn't work), once such a patch has been made available. [It has not yet.] This is not a discussion forum.

There are a few hackpatches available that "work around" this, but they aren't real solutions. Wait for something from Ben before reporting test results.

Thanks for everyone's patience!
Comment 68 Ilia Mirkin 2016-08-24 21:22:01 UTC
*** Bug 97066 has been marked as a duplicate of this bug. ***
Comment 69 Efrem McCrimon 2016-08-26 02:12:09 UTC
Created attachment 126042 [details]
attachment-20368-0.html

Thanks for the comment. Will do.  I uninstalled the non-free driver and
reverted back.  I will report the nouveau version, firmware, and kernel
being used.  I know the kernel is 4.6.5 64bit.

Comment 70 Yann Golanski 2016-09-08 10:49:45 UTC
@Ben: definitely neither rushing nor pestering you but is there an ETA for this?  

Is there anything I can do to help there?
Comment 71 Yann Golanski 2016-10-13 10:35:30 UTC
Is there a chance that this bug will be fixed before Fedora 25 comes out (https://fedoraproject.org/wiki/Releases/25/Schedule)? It would be really irritating to not be able to upgrade because of it.

Is there anything that we can do to help make this faster?

I would offer a patch but my X11 driver skills are weak. However, I am happy to test a patch if needed.
Comment 72 Karol Herbst 2016-10-13 11:22:07 UTC
(In reply to Yann Golanski from comment #71)
> Is there a chance that this bug will be fixed before Fedora 25 comes out
> (https://fedoraproject.org/wiki/Releases/25/Schedule)? It would be really
> irritating to not be able to upgrade because of it.
> 
> Is there anything that we can do to help make this faster?
> 
> I would offer a patch but my X11 driver skills are weak. However, I am happy
> to test a patch if needed.

well, we have an idea how we can fix the issue, but chances are it might break other cards as well. Anyway, the fix won't be too difficult and a backport should be fairly simple.

By the way, this affects the kernel module only.
Comment 73 Yann Golanski 2016-10-14 07:57:36 UTC
(In reply to Karol Herbst from comment #72)
> By the way, this affects the kernel module only.

This still stops my machine from booting or X11 from running if I disable the Nouveau driver. Or am I being dense and missing something?… ☺
Comment 74 Karol Herbst 2016-10-14 08:29:21 UTC
(In reply to Yann Golanski from comment #73)
> (In reply to Karol Herbst from comment #72)
> > By the way, this affects the kernel module only.
> 
> This still stops my machine from booting or X11 from running if I disable
> the Nouveau driver. Or am I being dense and missing something?… ☺

most likely, yes. You should check dmesg in this case and check what is happening.
Comment 75 Yann Golanski 2016-10-17 11:53:08 UTC
(In reply to Karol Herbst from comment #74)
> (In reply to Yann Golanski from comment #73)
> > (In reply to Karol Herbst from comment #72)
> > > By the way, this affects the kernel module only.
> > 
> > This still stops my machine from booting or X11 from running if I disable
> > the Nouveau driver. Or am I being dense and missing something?… ☺
> 
> most likely, yes. You should check dmesg in this case and check what is
> happening.

Err… If I disable Nouveau, I get no X11 at all. If I leave it enabled, the machine does not boot.

Is there a way to boot with a kernel 4.6X and have X11 work via Nouveau?
Comment 76 Zach Wolfe 2016-10-23 16:04:44 UTC
I was just curious about this issue myself: on Mint 18 with kernel 4.5.4, my GTX 980 just gives me a black screen as well. It worked on Fedora with later kernel versions, however, so I take it it's not strictly a nouveau issue? It's beyond my capabilities to even find the problem.
Comment 77 Ilia Guterman 2016-10-24 06:56:40 UTC
Created attachment 127508 [details] [review]
limit ram to 3 bars
Comment 78 Ilia Guterman 2016-10-24 06:57:10 UTC
Comment on attachment 127508 [details] [review]
limit ram to 3 bars

Zach Wolfe: one method is to limit the RAM usage, but it's a bandage, not a fix.
Comment 79 Karol Herbst 2016-10-24 06:58:46 UTC
yeah, this would be also the workaround _I_ would suggest, not the other ones above.
Comment 80 Yann Golanski 2016-10-25 08:50:06 UTC
(In reply to Zach Wolfe from comment #76)
> Worked on Fedora with later
> kernel versions however so I take it its not strictly a nouveau issue but
> its beyond my capabilities to even find the problem?

Which Fedora and kernel versions did it work on?
Comment 81 Wojciech Arabczyk 2016-10-27 16:47:50 UTC
Created attachment 127563 [details]
boot log kernel 4.8.4-gentoo
Comment 82 Wojciech Arabczyk 2016-10-27 17:20:43 UTC
Created attachment 127564 [details]
dmesg after applying patch to limit bar to 3GB

After applying the patch to limit ram to 3 bars, boots without problems. Blank screen is gone.
Comment 83 DocMAX 2016-11-06 15:55:17 UTC
same issue

http://ptpb.pw/-Fjc
Comment 84 Karol Herbst 2016-11-06 20:23:38 UTC
*** Bug 98611 has been marked as a duplicate of this bug. ***
Comment 85 Yann Golanski 2016-11-08 08:33:27 UTC
(In reply to Wojciech Arabczyk from comment #82)
> Created attachment 127564 [details]
> dmesg after applying patch to limit bar to 3GB
> 
> After applying the patch to limit ram to 3 bars, boots without problems.
> Blank screen is gone.

Could we at least have this as a patch for now, until a better one comes along? Otherwise us poor folks owning a GTX 970 cannot update to Fedora 25… ☹
Comment 86 Karol Herbst 2016-11-08 10:37:31 UTC
(In reply to Yann Golanski from comment #85)
> (In reply to Wojciech Arabczyk from comment #82)
> > Created attachment 127564 [details]
> > dmesg after applying patch to limit bar to 3GB
> > 
> > After applying the patch to limit ram to 3 bars, boots without problems.
> > Blank screen is gone.
> 
> Could we at least have this as a patch for now until a better one comes
> along. Otherwise us poor folks owning GT970 cannot update to Fedora 25… ☹

We can't. A less generic patch would be needed, because this one would affect a lot of different GPUs as well and mess with the amount of VRAM reported.
Comment 87 Yann Golanski 2016-11-08 13:29:54 UTC
(In reply to Karol Herbst from comment #86)
> (In reply to Yann Golanski from comment #85)
> > Could we at least have this as a patch for now until a better one comes
> > along. Otherwise us poor folks owning GT970 cannot update to Fedora 25… ☹
> 
> we can't. there would be a less generic patch needed, because it would
> affect a lot of different GPUs as well and mess with the amount of VRAM
> reported.

My bad. I did not understand that. Thank you for the clarification.
Comment 88 Gabriel Amadej 2016-11-10 12:53:13 UTC
Hi. I've noticed that the liquorix kernel [ https://liquorix.net ] works with my GTX970. It worked with version 4.6, 4.7, and now 4.8. I don't know what it does differently to make it work, but maybe a solution to the problem can be found there.
Comment 89 Karol Herbst 2016-11-10 13:47:16 UTC
(In reply to Gabriel Amadej from comment #88)
> Hi. I've noticed that the liquorix kernel [ https://liquorix.net ] works
> with my GTX970. It worked with version 4.6, 4.7, and now 4.8. I don't know
> what it does differently to make it work, but maybe a solution to the
> problem can be found there.

Does your GTX 970 have 4GB or 2GB? The 2GB version should work.
Otherwise, maybe the firmware files are missing, so that not all GPU features are available. Anyway, check with glxinfo whether nouveau is actually used.
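
A minimal sketch of that glxinfo check, assuming the usual "OpenGL vendor string" and "OpenGL renderer string" lines are present in the output. The helper names and the sample text are hypothetical and not taken from any log in this bug, and treating a vendor string containing "nouveau" as proof is only a heuristic.

```python
import re


def parse_glxinfo(text: str) -> dict:
    """Extract the OpenGL vendor and renderer strings from glxinfo output."""
    fields = {}
    for key in ("vendor", "renderer"):
        match = re.search(rf"^OpenGL {key} string:\s*(.+)$", text, re.MULTILINE)
        fields[key] = match.group(1).strip() if match else None
    return fields


def uses_nouveau(text: str) -> bool:
    """Heuristic: treat the output as nouveau if the vendor string says so."""
    vendor = parse_glxinfo(text)["vendor"] or ""
    return "nouveau" in vendor.lower()


# Hypothetical sample output, for illustration only.
sample = """\
OpenGL vendor string: nouveau
OpenGL renderer string: NV124
"""
print(parse_glxinfo(sample), uses_nouveau(sample))
```

On a real system the text could come from `subprocess.run(["glxinfo"], capture_output=True, text=True).stdout` instead of the sample string.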
Comment 90 DocMAX 2016-11-19 06:13:38 UTC
Is anybody working on this?
Comment 91 Karol Herbst 2016-11-19 09:01:20 UTC
(In reply to DocMAX from comment #90)
> Is anybody working on this?

yes
Comment 92 Ben Skeggs 2016-11-21 00:46:21 UTC
Created attachment 128090 [details]
untested initial attempt at a solution

Can people experiencing this issue please retry with the attached patch, and provide feedback here.
Comment 93 Ben Skeggs 2016-11-21 01:07:44 UTC
Created attachment 128091 [details] [review]
untested initial attempt at a solution (take 2)
Comment 94 Alexandre Courbot 2016-11-21 05:58:05 UTC
Created attachment 128093 [details]
Log with patch 128091

Hi Ben,

Patch did not do the trick for me (log attached). Confirmed that 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch still succeeded, so apparently the method is not perfectly correct. Rather strangely the printk'd value of 0x100ce0 is 0x00000000...
Comment 95 Ben Skeggs 2016-11-21 06:50:57 UTC
(In reply to Alexandre Courbot from comment #94)
> Created attachment 128093 [details]
> Log with patch 128091
> 
> Hi Ben,
> 
> Patch did not do the trick for me (log attached). Confirmed that
> 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch still
> succeeded, so apparently the method is not perfectly correct. Rather
> strangely the printk'd value of 0x100ce0 is 0x00000000...

Yeah, everyone, don't bother trying this.  This register is written by the secure boot ucode, so the problem remains unsolved.
Comment 96 Zach Wolfe 2016-11-24 01:09:36 UTC
Could someone point me to how exactly to apply "0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch"?
Comment 97 Efrem McCrimon 2016-11-24 21:33:51 UTC
Created attachment 128179 [details]
attachment-14321-0.html

Hi all:

Please help me understand the ram issue on the video card, is too many
buffers created?  I did not have this problem on a 4.6 kernel, now the
kernel has been upgraded twice, 4.8.5, and now 4.8.6.  No problems using a
960 GTX w/4GB DDR5 memory, plus I have a secondary 730 GTw/1GB RAM.

Please advise

Comment 98 Ilia Mirkin 2016-11-24 21:40:58 UTC
(In reply to Efrem McCrimon from comment #97)
> Created attachment 128179 [details]
> attachment-14321-0.html
> 
> Hi all:
> 
> Please help me understand the ram issue on the video card, is too many
> buffers created?  I did not have this problem on a 4.6 kernel, now the
> kernel has been upgraded twice, 4.8.5, and now 4.8.6.  No problems using a
> 960 GTX w/4GB DDR5 memory, plus I have a secondary 730 GTw/1GB RAM.

The issue is that the 4GB are laid out "funny" on a GTX 970 (but not other models), and nouveau doesn't properly handle that.

You had this problem on a 4.6 kernel, as it was the first to provide the secboot firmware loading functionality (assuming you had the firmware in place). To get back to the previous behavior, remove the firmware, or boot with nouveau.noaccel=1 nouveau.nofbaccel=1 which should avoid initializing the graphics accel engine. (And hopefully avoid secboot as well, but I'm not 100% sure.)
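
To confirm that both options actually reached the kernel, the command line can be inspected after boot. This is a rough sketch: the function names and the example command line are made up for illustration, and it only checks that the options are present, not that secboot was avoided.

```python
def nouveau_options(cmdline: str) -> dict:
    """Collect nouveau.* parameters from a kernel command line string."""
    options = {}
    for token in cmdline.split():
        if token.startswith("nouveau.") and "=" in token:
            name, value = token.split("=", 1)
            options[name] = value
    return options


def accel_disabled(cmdline: str) -> bool:
    """True if both workaround options from this comment are set to 1."""
    opts = nouveau_options(cmdline)
    return opts.get("nouveau.noaccel") == "1" and opts.get("nouveau.nofbaccel") == "1"


# Hypothetical command line; on a running system read /proc/cmdline instead.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro nouveau.noaccel=1 nouveau.nofbaccel=1"
print(nouveau_options(example))
print(accel_disabled(example))
```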
Comment 99 Ilia Mirkin 2016-11-24 21:42:20 UTC
Also, before I go insane, *please* stop replying to these via email. Use bugzilla directly. Your (Efrem's) email client insists on creating html attachments which are entered into bugzilla. And you also reply-to-all, so I get the thing 2x, once via bugzilla, once directly.
Comment 100 Yann Golanski 2016-11-28 10:07:54 UTC
(In reply to Ilia Mirkin from comment #98)
> To get back to the previous behavior, remove the firmware, or boot
> with nouveau.noaccel=1 nouveau.nofbaccel=1 which should avoid initializing
> the graphics accel engine. (And hopefully avoid secboot as well, but I'm not
> 100% sure.)


Just to be crystal clear: setting the two settings above will boot with X11 (so I can get a gdm login screen and an Xsession…) but just without any 3D acceleration?

Would this work around work on Fedora 25?
Comment 101 Bernie Innocenti 2016-12-04 19:49:51 UTC
I tested 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch with 4.9.0-rc7 and it correctly initializes the console at 3840x2160 @ 30Hz (for some reason, the 60Hz mode is not detected).

I also tried the patch on top of drm-next (which as of today is still based on 4.9.0-rc5), but all I got was a blank screen. This branch has other stability issues, so it's not necessarily a nouveau regression.

With 4.9.0-rc7, I can start Xorg, but xf86-video-nouveau fails to initialize:

 [   336.771] (II) [drm] nouveau interface version: 1.3.1
 [   336.771] (EE) Unknown chipset: NV124

This is expected, since GM20x acceleration landed after xf86-video-nouveau-1.0.13. I'll try rebuilding from head and report back.
Comment 102 Bernie Innocenti 2016-12-04 20:25:27 UTC
Created attachment 128333 [details]
Xorg log on GTX970 4GB with 4.9.0-rc7 and xf86-video-nouveau from HEAD
Comment 103 Bernie Innocenti 2016-12-04 20:26:10 UTC
Created attachment 128334 [details]
dmesg on GTX970 4GB with 4.9.0-rc7 and xf86-video-nouveau from HEAD
Comment 104 Bernie Innocenti 2016-12-04 20:26:34 UTC
xf86-video-nouveau built from HEAD makes more progress, but ultimately falls back to software rendering:

[  2962.382] (EE) NOUVEAU(0): Failed to initialise context object: 2D_NVC0 (0)
[  2962.382] (EE) NOUVEAU(0): Error initialising acceleration.  Falling back to NoAccel

Attaching the full Xorg log and dmesg.
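
For anyone going through the attached Xorg log, the (EE) lines can be pulled out with a few lines of script. This is a sketch that assumes the "[ timestamp] (EE) message" layout seen in the excerpts in this thread; the function name is illustrative.

```python
import re

# Matches lines such as "[  2962.382] (EE) NOUVEAU(0): ..." from an Xorg log.
EE_LINE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+\(EE\)\s+(.*)$")


def xorg_errors(log_text: str) -> list:
    """Return (timestamp, message) pairs for every (EE) line in the log."""
    errors = []
    for line in log_text.splitlines():
        match = EE_LINE.match(line)
        if match:
            errors.append((float(match.group(1)), match.group(2)))
    return errors


# The two error lines quoted in this comment, plus one (II) line from comment 101.
sample = """\
 [   336.771] (II) [drm] nouveau interface version: 1.3.1
[  2962.382] (EE) NOUVEAU(0): Failed to initialise context object: 2D_NVC0 (0)
[  2962.382] (EE) NOUVEAU(0): Error initialising acceleration.  Falling back to NoAccel
"""
for stamp, message in xorg_errors(sample):
    print(stamp, message)
```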
Comment 105 fariouche 2016-12-18 21:57:27 UTC
Hi,

I'm joining the group of gtx970 4GB owners who have issues with nouveau :)

I've applied 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch and now I have at least a console.
However, no x11 with acceleration.

I've checked the kernel messages, and besides the tons of traces ("disp:") I see these lines:
: #### nvkm_gpuobj_heap_acquire gpuobj->addr 4286832640, gpuobj->size 4096
: #### nv50_instobj_size 32768
: nouveau 0000:03:00.0: priv: GPC0: 419df4 00000000 (1840820e)
: nouveau 0000:03:00.0: priv: GPC1: 419df4 c0000000 (1a40820d)
: nouveau 0000:03:00.0: priv: GPC2: 514884 c0000000 (1c40820e)
: nouveau 0000:03:00.0: priv: GPC3: 51d084 c0000000 (1f40820e)
: nouveau 0000:03:00.0: gr: init failed, -28


Seems like secboot failed somewhere. Looking at the code, GR failed most likely when secboot tried to initiate the reset command.

Forgot to say that I'm using kernel 4.9.

If you need me to do some tests, let me know
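
As a side note on the last log line: kernel error returns are negated errno values, and 28 is ENOSPC on Linux. The sketch below decodes such a line. The regular expression is an assumption based only on the one line quoted in this comment, and Python's errno table reflects the platform the script runs on, so it should be run on Linux.

```python
import errno
import re

# Matches nouveau lines of the form "...: <subdev>: init failed, -<errno>".
INIT_FAILED = re.compile(r"nouveau \S+: (\w+): init failed, -(\d+)")


def decode_init_failure(line: str):
    """Return (subdev, errno name) for an 'init failed' line, or None."""
    match = INIT_FAILED.search(line)
    if not match:
        return None
    code = int(match.group(2))
    return match.group(1), errno.errorcode.get(code, f"errno {code}")


line = ": nouveau 0000:03:00.0: gr: init failed, -28"
print(decode_init_failure(line))
```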
Comment 106 Karol Herbst 2017-01-06 20:14:51 UTC
(In reply to fariouche from comment #105)
> Hi,
> 
> I'm joining the group of gtx970 4GB owners who have issues with nouveau :)
> 
> I've applied 0001-nvkm_gpuobj-size-is-smaller-then-nvkm_gpuobj-size-ca.patch
> and now I have at least a console.
> However, no x11 with acceleration.
> 
> I've check the kernel messages, and besides the tons of traces ("disp:") I
> see this line:
> : #### nvkm_gpuobj_heap_acquire gpuobj->addr 4286832640, gpuobj->size 4096
> : #### nv50_instobj_size 32768
> : nouveau 0000:03:00.0: priv: GPC0: 419df4 00000000 (1840820e)
> : nouveau 0000:03:00.0: priv: GPC1: 419df4 c0000000 (1a40820d)
> : nouveau 0000:03:00.0: priv: GPC2: 514884 c0000000 (1c40820e)
> : nouveau 0000:03:00.0: priv: GPC3: 51d084 c0000000 (1f40820e)
> : nouveau 0000:03:00.0: gr: init failed, -28
> 
> 
> Seems like secboot failed somewhere. Looking at the code, GR failed most
> likely when secboot tried to initiate the reset command.
> 
> Forgot to say that I'm using kernel 4.9.
> 
> If you need me to do some tests, let me know

don't use that hack.

use the "limit ram to 3 bars" one instead.
Comment 107 fariouche 2017-01-07 23:33:18 UTC
Unfortunately I already tried that patch alone, and I was unable to have a console, only black screen.

And I do not have a serial for the traces, so I didn't investigate further.
Comment 108 Yann Golanski 2017-01-20 09:12:32 UTC
Any updates on this bug?… I would like to update to F25 but I cannot because of this. ☹
Comment 109 Salvatore P. 2017-01-22 21:42:50 UTC
Not only Fedora 25, but everything with a recent kernel does not work on a GTX 970 because of this bug.
The Fedora 25 installer does not work.
Comment 110 Wojciech Arabczyk 2017-01-23 07:32:20 UTC
The most important bug nowadays... At least methinks.

I sold my 970 because of it.
Comment 111 Yann Golanski 2017-01-24 10:49:35 UTC
What can we do to help get this bug fixed?…
Comment 112 Ben Skeggs 2017-03-02 08:44:31 UTC
If you build the master branch of my development tree[1], this issue should be resolved now.

[1] https://github.com/skeggsb/nouveau - currently builds against drm-next commit 9ca70356a9260403c1bda40d942935e55d00c11c
Comment 113 Lyude Paul 2017-03-07 00:00:14 UTC
Hey, even though this is fixed in the development branch, I was asked by one of the other nouveau guys to post this for people running older kernels, since it's easier to backport. This is a very dirty hack that disables the last 512 MB of VRAM on the GTX 970; however, it does technically fix the issue and let your GPU run. If you can't use Ben Skeggs's patch, give this one a try.
Comment 114 Lyude Paul 2017-03-07 00:01:13 UTC
Created attachment 130094 [details] [review]
Disable last 512 MiB of VRAM on GTX970
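
As a rough illustration of what excluding the last 512 MiB means on a 4 GiB card, the arithmetic is sketched below. This does not reproduce the attached patch; the names are hypothetical and the total is taken from the bug title.

```python
MIB = 1024 * 1024

TOTAL_VRAM = 4096 * MIB      # 4 GiB card, as in the bug title
DISABLED_TAIL = 512 * MIB    # amount the hack leaves unused


def usable_vram(total: int, disabled_tail: int) -> int:
    """VRAM left once the last `disabled_tail` bytes are excluded."""
    if disabled_tail > total:
        raise ValueError("cannot disable more VRAM than the card has")
    return total - disabled_tail


usable = usable_vram(TOTAL_VRAM, DISABLED_TAIL)
print(f"usable: {usable // MIB} MiB, last usable address: {usable - 1:#x}")
```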
Comment 115 Bernie Innocenti 2017-03-07 05:12:49 UTC
I tested the nouveau module built from https://github.com/skeggsb/nouveau on top of the latest drm-next kernel and I can confirm that it fixes the black screen issue on my GTX 970 with 4GB of VRAM.

Ben, would it be possible to extract the relevant patches and backport them onto recent stable kernels such as 4.9 and 4.10?

Haven't tested Lyude's patch yet.
Comment 116 fariouche 2017-03-14 22:04:02 UTC
Hi,

Just to let you know that I've just tested the patch (remove last 512MB).
Indeed, the kernel can boot (4.10.1 with latest fw), but I'm losing the screen while the kernel tries to switch to nouveau.
A hotplug brings back the console (with corrupted top lines)
My setup: 1 DVI screen at 1600x1200 (main), 1 DP 4K screen (black) and 1 HDMI TV (black too)

If I have some time, I will try to remove all the displays and check for errors.

One day it will work :) But having an open source driver is really challenging these days :(
Comment 117 Yann Golanski 2017-04-07 11:55:16 UTC
So I guess this is still not fixed. ☹
Comment 118 Yann Golanski 2017-04-19 10:44:40 UTC
More than a year and still no end in sight… Is the solution to get another card or use the proprietary driver?
Comment 119 Pierre Moreau 2017-04-19 11:27:36 UTC
(In reply to Yann Golanski from comment #118)
> More than a year and still no end in sight… Is the solution to get another
> card or use the propitiatory driver?

Have you tried building Ben’s tree (see comment #112) and it didn’t work for you? Because the issue should be fixed in the development version of Nouveau, and by the looks of it, will land in Linux 4.12.
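
A quick way to tell whether a given kernel is at or past 4.12 is to compare the leading version numbers of its release string. This is a hedged sketch: the function names are illustrative, the release strings are ones mentioned elsewhere in this thread, and a plain version comparison cannot see distribution backports or out-of-tree builds.

```python
import re


def kernel_version(release: str) -> tuple:
    """Parse the leading 'major.minor' of a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def expected_to_have_fix(release: str, fixed_in=(4, 12)) -> bool:
    """True if the release is at or past the version named in this comment."""
    return kernel_version(release) >= fixed_in


for release in ("4.6.5", "4.10.1", "4.12.0-0.rc1.git0.1.vanilla.knurd.1.fc25.x86_64"):
    print(release, expected_to_have_fix(release))
```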
Comment 120 Hans Petter Jansson 2017-05-03 13:16:53 UTC
Created attachment 131184 [details] [review]
nouveau-fdo94990-linux-4.10.patch

I was able to get my GTX 970 working with KMS on Linux 4.10 by cherry-picking a few of Ben's commits from the master branch onto the linux-4.10 branch (see patch).

These are also available in the linux-4.10 branch of https://github.com/hpjansson/nouveau.git for convenience.

With this, I can boot and Plymouth and Xorg work fine, but Wayland still freezes -- however, I think that's caused by a different issue.
Comment 121 Yann Golanski 2017-05-05 10:55:02 UTC
(In reply to Pierre Moreau from comment #119)
> (In reply to Yann Golanski from comment #118)
> > More than a year and still no end in sight… Is the solution to get another
> > card or use the propitiatory driver?
> 
> Have you tried building Ben’s tree (see comment #112) and it didn’t work for
> you? Because the issue should be fixed in the development version of
> Nouveau, and by the looks of it, will land in Linux 4.12.

Sorry for the delay, my BIOS got re-written by Windows (thanks⸮) and it took me some time to fix it. 

I am happy to wait till 4.12 is released on Fedora but would like to know it works: several comments pointed to it being flaky…
Comment 122 Pierre Moreau 2017-05-16 23:40:25 UTC
(In reply to Yann Golanski from comment #121)
> I am happy to wait till 4.12 is released on Fedora but would like to know it
> works: several comments pointed to it being flaky…

Hans Petter had X working (though not Wayland) by cherry-picking those commits and Bernie had it working as well with Ben's tree.
The others tested different patches.
Comment 123 Mikołaj Świątek 2017-05-17 22:05:29 UTC
As an owner of a MSI GTX970 4G, I can confirm that with 4.12-rc1, I can boot perfectly fine with a single monitor connected, and with Mesa 17.1.0 on the userspace side, my user experience with KDE Plasma is thus far indistinguishable from the NVIDIA blob. Awesome work there, kudos.

Plasma's Wayland session does freeze shortly after login, but that's not supposed to be mature and can be caused by any number of things.

Now, if I try to add a second monitor, I get corrupted output after modeset seemingly, but I'm pretty sure that's bug 89664.
Comment 124 Florian Mickler 2017-05-23 18:26:29 UTC
Works like a charm now! (4.12-rc1)
4.12.0-0.rc1.git0.1.vanilla.knurd.1.fc25.x86_64
Comment 125 Florian Mickler 2017-05-23 18:29:53 UTC
(Working with a 2 monitor setup and Fedora 25)

