Using the kernel from the linux-4.0 branch, I attempted to use nouveau with my GTX970. At the moment the nouveau driver started and the system should have switched to KMS, the screen turned black instead. The dmesg output was cut off due to too many messages from nouveau, so I attached the journalctl output of that boot session instead.
Created attachment 114449 [details] VBIOS file VBIOS retrieved using: echo 1 > /sys/bus/pci/devices/<pciid>/rom; cat /sys/bus/pci/devices/<pciid>/rom > vbios.rom; echo 0 > /sys/bus/pci/devices/<pciid>/rom The original BIOS installation files can be found here: http://www.gigabyte.com/products/product-page.aspx?pid=5209#bios My card uses the F13 BIOS.
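For anyone else attaching a VBIOS, the three sysfs steps above can be wrapped in a small function so that the ROM is always disabled again afterwards. This is only a sketch: `dump_vbios` is a made-up helper name, the device directory still has to be filled in by hand (the `<pciid>` placeholder is kept as-is), and on a real system it needs root.

```shell
# Sketch: dump the VBIOS of one PCI device through sysfs.
# dump_vbios is a hypothetical helper, not part of any tool.
#   $1 = sysfs directory of the device, e.g. /sys/bus/pci/devices/<pciid>
#   $2 = output file
dump_vbios() {
    dev_dir=$1
    out=$2
    [ -e "$dev_dir/rom" ] || { echo "no rom file in $dev_dir" >&2; return 1; }
    echo 1 > "$dev_dir/rom" || return 1   # enable reading the ROM
    cat "$dev_dir/rom" > "$out"
    status=$?
    echo 0 > "$dev_dir/rom"               # disable it again, even if cat failed
    return $status
}
```

Usage would look like `dump_vbios /sys/bus/pci/devices/<pciid> vbios.rom`, with `<pciid>` replaced by the address that lspci reports for the card.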
Created attachment 114450 [details] journalctl dump of the boot The journal dump I mentioned in the OP. Capped around line 15K as the original totalled at almost 50K, crossing the 3MB file size limit.
As I mentioned on IRC, there are actually 2x GM204s. And *neither* says that it's running its vbios tables, which is very odd. Try booting with nouveau.config=NvForcePost=1 Perhaps we're misdetecting the posting-ness state of things. The card that's bitching is 0000:01:00.0 but the :2 one is probed by nouveau first, not sure if that's significant. If this doesn't help, I'd recommend retesting with just one of them plugged in.
Please check with kernel 4.1 or later -- some Maxwell init issues were hopefully addressed there.
Will the official kernel release do or do you want me to compile the Nouveau GIT kernel?
(In reply to Omar from comment #5) > Will the official kernel release do or do you want me to compile the Nouveau > GIT kernel? Any official kernel (4.1 or later) will do.
I can safely drop nouveau.config=NvForcePost=1 and KMS works on at least 1 out of 2 cards. I'm also able to run GDM and start a Gnome session on both Xorg and Wayland (albeit the latter lagging quite badly). It still has issues with the second card though (which ironically is the first card when speaking in terms of PCI slots; 01:00.0). It seems as if that card is still not initialized and the monitors just show a funky coloured bar made up of small vertical stripes (see attachment). I did a quick grep for nouveau on the boot journal and attached it. If you want more info I'll be happy to try and provide :)
Created attachment 119115 [details] Journal nouveau grep
(In reply to Omar from comment #7) > I can safely drop nouveau.config=NvForcePost=1 and KMS works on at least 1 > out of 2 cards. > > It still has issues with the second card though (which ironically is the > first card when speaking in terms of PCI slots; 01:00.0). Which kernel did you try this with? Does using nouveau.config=NvForcePost=1 allow both GPUs to initialize properly?
Created attachment 119116 [details] Photo of the colored stripes bar
Linux Omar-PC 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50 CEST 2015 x86_64 GNU/Linux I'll give it a go. I'll update you in a bit :)
They're staying black as opposed to showing a coloured bar but I'm still unable to get anything to display on them.
Created attachment 119118 [details] Journal nouveau grep, NvForcePost enabled
I had the exact same issue with GM107:
```
ehsan@machine:~$ lspci -nn | grep -i vga
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K620] [10de:13bb] (rev a2)
```
Adding `nouveau.config=NvForcePost=1` to the kernel command line fixed it for me too, thanks Omar. This is `Ubuntu 16.04 LTS` with kernel version `4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux` Even the colored striped bars look similar, and I hear a noise I think might have to do with incorrect frequency setting for the monitor.
I thought I'd give this another shot to see if things have changed over the last few kernel releases.
Tested on: Linux Omar-PC 4.8.12-3-ARCH #1 SMP PREEMPT Thu Dec 8 16:10:23 CET 2016 x86_64 GNU/Linux
Nouveau default behaviour:
- Main GPU briefly blinks with boot information on the first screen followed by the second screen, then both power down.
- The monitor attached to the second GPU shows the coloured stripes.
- Turning the monitor on the second GPU off and back on again made it render a bright blue opaque screen.
Nouveau with config=NvForcePost=1 behaviour:
- All monitors power down, nothing is shown at all on either GPU.
I'll attach the output of `journalctl -b --no-pager | grep nouveau` for both runs shortly.
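Since the full journal ran into the attachment size limit earlier in this report, the grep can be combined with a line cap. The following is a rough sketch, not something prescribed by the developers: `nouveau_log_excerpt` is a made-up name, the default of 15000 lines only mirrors the cap used for the first attachment, and unlike the plain `grep nouveau` above it matches case-insensitively.

```shell
# Sketch: keep only nouveau lines from a log on stdin and cap the line count
# so the result stays under an attachment size limit.
# nouveau_log_excerpt is a hypothetical helper name.
#   $1 = maximum number of lines to keep (default 15000, an arbitrary choice)
nouveau_log_excerpt() {
    max_lines=${1:-15000}
    grep -i 'nouveau' | head -n "$max_lines"
}

# Typical use for the current boot on a systemd machine:
#   journalctl -b --no-pager | nouveau_log_excerpt > nouveau.log
```

Note that `head` keeps the start of the log, which is usually where driver initialization happens; errors that only show up late in the boot would be cut off.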
Created attachment 128399 [details] Kernel 4.8.12, nouveau defaults
Created attachment 128400 [details] Kernel 4.8.12, nouveau NvForcePost
Is this perhaps bug 94990? I also own an NVIDIA GTX 970 (Asus) and I can see the Gigabyte G1 Gaming GTX970 also has 4 GB VRAM. The problem here could be that NVIDIA made a controversial decision in making 3.5 GB of VRAM full speed, with a special, subpar configuration for the remaining 0.5 GB that makes it approximately 7x slower than the "main" VRAM. And nouveau doesn't support this. There have been some attempts at a hack in the bug above to have nouveau only use the initial 3.5 GB, but I don't think it was successful yet as of April 2017. It's too bad because it cripples the GTX 970 on most Linux distros, leaving one with the remaining option of adding nomodeset to the Linux boot options to get an initial low-resolution unaccelerated boot, and while running that, blacklisting the nouveau driver and then finally installing the proprietary NVIDIA driver. A reboot after this, and it will (should) work, with the caveat that caution is necessary with proprietary drivers and kernel updates, especially if using so-called "rolling release" distros. I'm still keeping my hopes up for a fix! It would be the single most useful fix of the past two years for me.
(In reply to Jonas Nordlund from comment #18) > Is this perhaps bug 94990? If it is, it should be solved by using drm-next. > It's too bad because it cripples the GTX 970 on most Linux distros, leaving > one with the remaining option to add the nomodeset option to the Linux boot You could also boot with nouveau.config=gr=0,sec=0 which should disable initialization of the graph unit that is causing the trouble. (Might be secboot and not sec.) Note, I haven't tested this, I'm going from memory on how this works.
I suspect these are not the same problem tbh. Anyway, I've tried to run nouveau once again. I've tried the default kernel, the nouveau.config arguments on the default kernel, and the linux-4.12 branch of github/skeggsb/linux.git.
Default kernel (Linux Omar-PC 4.10.13-1-ARCH #1 SMP PREEMPT Thu Apr 27 12:15:09 CEST 2017 x86_64 GNU/Linux):
- Main GPU blinks with boot information on the first monitor. It then loses signal and the second monitor turns on showing the top and bottom of the boot information that was on screen, everything else is distorted and it shows the coloured stripes along the top (it does so in a higher resolution than the monitor on the second GPU).
- Turning the first monitor off and back on makes the second monitor lose signal and the contents displayed are now shown on the first monitor. Turning the second off and on after that flips things around again. It seems the last monitor to be turned on gets to display the (distorted) output.
- The monitor attached to the second GPU shows the coloured stripes. After turning it off and back on again it only shows black (it does have a signal).
Default kernel with module arguments gr+sec/secboot:
- Both monitors on the primary GPU lose signal. Turning them off and on changes nothing.
- Monitor on the secondary GPU is black (does get a signal). After turning it off and back on again it loses signal.
linux-4.12 branch of github/skeggsb/linux.git:
- Same as above with the module arguments, except the monitor on the second GPU does not lose signal after turning it off and on again.
I am assuming this is still the full kernel with the latest nouveau kernel module as stated on the wiki? Otherwise I guess this test was pretty meaningless. If you want certain logs of any/all of the runs, please drop a message with what you need and I'll do another run to get the requested information.
I'm also occasionally on the IRC so you can also reach me there :) I did see some errors regarding "link training failed" (these have always been there for me every time iirc) and "DRM EVO timeout" (I believe these are new to me. They do not ring a bell). If I have some time this weekend I'll give things a shot when unplugging the second GPU to see if this changes any behaviour with the 4.10 kernel or not.
I've just done the runs with the first GPU installed only. It's showing the exact same behaviour as before.
I think I'm suffering from the same bug, though I've only been able to experience it after the VRAM detection problem was fixed in 4.12. I have an MSI GTX 970 4GB feeding two 1080p monitors - one via HDMI and one via DP. With kernel 4.12-rc1, and a single monitor connected, everything works fine, on both monitors individually. However, when I try to boot with both connected at the same time, I get corrupted output similar to what Omar described - a coloured stripe at the top of the screen, and what looks like garbled text below it. Output only appears on one monitor, the second stays in standby mode. This happens irrespective of whether I boot straight into X or single-user mode. None of the nouveau.config options mentioned do anything different than what Omar reported. I have some logs captured with nouveau.debug=debug, and can upload them if needed.
Have a look at bug #100676. Is it the same issue? (Please test the patches that were provided there.)
(In reply to Ilia Mirkin from comment #23) > Have a look at bug #100676. Is it the same issue? (Please test the patches > that were provided there.) I don't *think* it's the exact same issue. The screen photo in that report is similar to what I see, though my text output is completely unintelligible, rather than just slightly corrupted. The reason I thought it was this bug was that the log messages about unknown connectors and failing to create encoders are identical with Omar's in my case. In any event, I've built the kernel module from Ben's tree and it didn't change anything at all, same behaviour.
(In reply to Mikołaj Świątek from comment #24) > (In reply to Ilia Mirkin from comment #23) > > Have a look at bug #100676. Is it the same issue? (Please test the patches > > that were provided there.) > > I don't *think* it's the exact same issue. The screen photo in that report > is similar to what I see, though my text output is completely > unintelligible, rather than just slightly corrupted. The reason I thought it > was this bug was that the log messages about unknown connectors and failing > to create encoders are identical with Omar's in my case. > > In any event, I've built the kernel module from Ben's tree and it didn't > change anything at all, same behaviour. Can I see your kernel log output from that please? Bonus points if you boot with "log_buf_len=8M nouveau.debug=trace".
Created attachment 131502 [details] Kernel module built from master, nouveau.debug=debug nouveau.debug=trace resulted in too much spam for journald to handle...
(In reply to Ben Skeggs from comment #25) > (In reply to Mikołaj Świątek from comment #24) > > (In reply to Ilia Mirkin from comment #23) > > > Have a look at bug #100676. Is it the same issue? (Please test the patches > > > that were provided there.) > > > > I don't *think* it's the exact same issue. The screen photo in that report > > is similar to what I see, though my text output is completely > > unintelligible, rather than just slightly corrupted. The reason I thought it > > was this bug was that the log messages about unknown connectors and failing > > to create encoders are identical with Omar's in my case. > > > > In any event, I've built the kernel module from Ben's tree and it didn't > > change anything at all, same behaviour. > > Can I see your kernel log output from that please? Bonus points if you boot > with "log_buf_len=8M nouveau.debug=trace". Tried to do it that way, but it somehow resulted in so much output that I couldn't even see the boot log with journalctl. I guess the kernel ring buffer fills up and the beginning gets overwritten before journald can read it? Not an expert at debugging kernel modules by any stretch, so let me know if I'm missing something obvious here. In the meantime, uploaded a run with nouveau.debug=debug.
(In reply to Mikołaj Świątek from comment #27) > (In reply to Ben Skeggs from comment #25) > > (In reply to Mikołaj Świątek from comment #24) > > > (In reply to Ilia Mirkin from comment #23) > > > > Have a look at bug #100676. Is it the same issue? (Please test the patches > > > > that were provided there.) > > > > > > I don't *think* it's the exact same issue. The screen photo in that report > > > is similar to what I see, though my text output is completely > > > unintelligible, rather than just slightly corrupted. The reason I thought it > > > was this bug was that the log messages about unknown connectors and failing > > > to create encoders are identical with Omar's in my case. > > > > > > In any event, I've built the kernel module from Ben's tree and it didn't > > > change anything at all, same behaviour. > > > > Can I see your kernel log output from that please? Bonus points if you boot > > with "log_buf_len=8M nouveau.debug=trace". > > Tried to do it that way, but it somehow resulted in so much output that I > couldn't even see the boot log with journalctl. I guess the kernel ring > buffer fills up and the beginning gets overwritten before journald can read > it? Not an expert at debugging kernel modules by any stretch, so let me know > if I'm missing something obvious here. > > In the meantime, uploaded a run with nouveau.debug=debug. I can't tell 100% for sure from that, but, there's *strong* evidence there to suggest that yes, you are indeed seeing the bug Ilia mentioned. You can probably work around it in the meantime by plugging one of your displays into another connector, or by trying the tree suggested in the other bug.
(In reply to Ben Skeggs from comment #28) > (In reply to Mikołaj Świątek from comment #27) > > (In reply to Ben Skeggs from comment #25) > > > (In reply to Mikołaj Świątek from comment #24) > > > > (In reply to Ilia Mirkin from comment #23) > > > > > Have a look at bug #100676. Is it the same issue? (Please test the patches > > > > > that were provided there.) > > > > > > > > I don't *think* it's the exact same issue. The screen photo in that report > > > > is similar to what I see, though my text output is completely > > > > unintelligible, rather than just slightly corrupted. The reason I thought it > > > > was this bug was that the log messages about unknown connectors and failing > > > > to create encoders are identical with Omar's in my case. > > > > > > > > In any event, I've built the kernel module from Ben's tree and it didn't > > > > change anything at all, same behaviour. > > > > > > Can I see your kernel log output from that please? Bonus points if you boot > > > with "log_buf_len=8M nouveau.debug=trace". > > > > Tried to do it that way, but it somehow resulted in so much output that I > > couldn't even see the boot log with journalctl. I guess the kernel ring > > buffer fills up and the beginning gets overwritten before journald can read > > it? Not an expert at debugging kernel modules by any stretch, so let me know > > if I'm missing something obvious here. > > > > In the meantime, uploaded a run with nouveau.debug=debug. > > I can't tell 100% for sure from that, but, there's *strong* evidence there > to suggest that yes, you are indeed seeing the bug Ilia mentioned. You can > probably work around it in the meantime by plugging one of your displays > into another connector, or by trying the tree suggested in the other bug. Well, for bug 100676 you suggest using your master branch, which I'm already doing. 
Still, based on that report, I tried booting with "log_buf_len=8M drm.debug=0x14 nouveau.debug=disp=trace,i2c=trace,bios=trace", which gave legible output, attaching log in the hope that it helps. Can't really use a different connector for reasons, so for the time being I'm stuck using nvidia's driver.
Created attachment 131522 [details] nouveau master, trace Module built from bskeggs/nouveau master, booted with "log_buf_len=8M drm.debug=0x14 nouveau.debug=disp=trace,i2c=trace,bios=trace".
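When collecting a trace like the one above, it can help to confirm that the debug options actually reached the kernel before reading anything into the log. A minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; `cmdline_get` is a made-up helper name, and parameters whose values contain spaces are not handled.

```shell
# Sketch: print the value of one kernel command-line parameter.
# cmdline_get is a hypothetical helper name.
#   $1 = parameter name, e.g. log_buf_len or nouveau.debug
#   $2 = file to read (defaults to /proc/cmdline)
cmdline_get() {
    key=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then keep everything after the first "=".
    # If the parameter appears more than once, the last occurrence is printed.
    tr -s ' ' '\n' < "$file" |
        awk -F= -v k="$key" '
            $1 == k { v = $0; sub(/^[^=]*=/, "", v) }
            END     { if (v != "") print v }'
}

# After booting with the options from this comment:
#   cmdline_get log_buf_len      # value given on the command line, if any
#   cmdline_get nouveau.debug    # subsystem=level list, if any
```

Limiting nouveau.debug to a few subsystems, as done here, keeps the output small enough for the enlarged ring buffer, which is why this run produced a legible log where a global trace did not.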
(In reply to Mikołaj Świątek from comment #30) > Created attachment 131522 [details] > nouveau master, trace > > Module built from bskeggs/nouveau master, booted with "log_buf_len=8M > drm.debug=0x14 nouveau.debug=disp=trace,i2c=trace,bios=trace". According to the log messages here, this output isn't from the code on my master branch. Perhaps another version of the module got loaded instead?
(In reply to Ben Skeggs from comment #31) > (In reply to Mikołaj Świątek from comment #30) > > Created attachment 131522 [details] > > nouveau master, trace > > > > Module built from bskeggs/nouveau master, booted with "log_buf_len=8M > > drm.debug=0x14 nouveau.debug=disp=trace,i2c=trace,bios=trace". > > According to the log messages here, this output isn't from the code on my > master branch. Perhaps another version of the module got loaded instead? Yep, apparently I forgot to regenerate the initramfs. Sorry for the confusion, it seems that you were right and it was actually bug 100676. With the correct module, both displays work fine in single-user mode, but trying to start an X session results in a kernel BUG in ttm_bo_vm_fault, which I guess should be reported elsewhere.
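A stale module in the initramfs is easy to miss, so here is one possible way to check for it before uploading logs. This is a sketch under assumptions: `module_matches_disk` is a made-up helper, and it relies on the module carrying a srcversion, which depends on how the kernel and module were built. It compares the srcversion of the module that is currently loaded (from sysfs) with the one `modinfo` reports for the module file installed for the running kernel.

```shell
# Sketch: check whether the loaded module is the same build as the one on disk.
# module_matches_disk is a hypothetical helper name.
#   $1 = module name, e.g. nouveau
#   $2 = sysfs module directory (defaults to /sys/module)
# Returns 0 on match, 1 on mismatch, 2 if either srcversion cannot be read.
module_matches_disk() {
    mod=$1
    sysfs=${2:-/sys/module}
    loaded=$(cat "$sysfs/$mod/srcversion" 2>/dev/null) || return 2
    ondisk=$(modinfo -F srcversion "$mod" 2>/dev/null) || return 2
    [ -n "$loaded" ] && [ "$loaded" = "$ondisk" ]
}

# Example:
#   module_matches_disk nouveau || echo "loaded nouveau differs from the installed one"
```

A mismatch would suggest that an older copy was loaded early in boot, for example from an initramfs that was not regenerated after installing the new module.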
I won't recommend using/keeping the GM204 (GTX 970). It can't ever run with free software: https://www.theregister.co.uk/2015/04/15/nvidia_gtx_900_linux_driver_roadbloack/ https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-XDC2017 https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-XDC2016-NVIDIA Sell this crappy GP102 card and get away from nvidia. Nvidia died with the 780ti card. It's the last end-user card that can be used normally. Everything else is even a legal problem in some countries, because the manufacturer (nvidia) blocks users from being able to boot the software they want on THEIR hardware - happily, that is illegal in some countries. Hopefully some lawyer will sue the heck out of nvidia so that they have to release the private signing key or close their doors. Blocking the freedom of users in such a way should not be accepted by anyone.
(In reply to caguduzexi from comment #33) > I wont recommend using/keeping the GM204 (GTX 970). It cant ever run with > free software: > https://www.theregister.co.uk/2015/04/15/ > nvidia_gtx_900_linux_driver_roadbloack/ > https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-XDC2017 > https://www.phoronix.com/scan.php?page=news_item&px=Nouveau-XDC2016-NVIDIA > > Sell this crappy GP102 card away and go away from nvidia. Nvidia died with > the 780ti card. Its the last end-user card that can be used normaly. > Everything else is in some countries even a legal problem. Because the > manufacturer (nvidia) blocks the users from beeing able to boot the software > they want on THEIR hardware - happyly illegal in some countries. Hopefully > some layer would sue the heck out of nvidia so that they would have to > release the private signing key or close their doors. > Blocking the freedom of the users on such way should not be accepted by > anyone. User banned
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/178.