Bug 85160 - [NV94] INVALID_STATE error, X fails to start on GeForce 9600 GT with dual monitors, kernels 3.18.0-0.rc0.git8.2.fc22.1 onwards
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Roy
QA Contact: Nouveau Project
Duplicates: 85381
 
Reported: 2014-10-17 23:13 UTC by Adam Williamson
Modified: 2015-10-10 22:32 UTC (History)
CC List: 2 users

Attachments
journalctl from an affected boot with drm.debug=15 (2.15 MB, text/plain)
2014-10-17 23:13 UTC, Adam Williamson
Revert "drm/nv50/kms: Set VBLANK time in modeset script" - git 1dce626 (2.92 KB, patch)
2014-10-26 04:08 UTC, poma
Fix vblank period setting on G94 (2.24 KB, patch)
2014-10-26 15:42 UTC, Roy
debug patch (975 bytes, patch)
2014-10-31 00:09 UTC, Ben Skeggs
dmesg-3.18.0-rc2.git-d34d4d8+a7e3f94-drm-fixes+nouveau-NV50 (45.15 KB, text/plain)
2014-10-31 03:41 UTC, poma
0001-evo-debug dmesg (202.31 KB, text/plain)
2014-10-31 16:37 UTC, Zlatko Calusic
dmesg 3.18.0-0.rc3.git2.1.fc22 & darktama nouveau git b6dc8ef (43.46 KB, text/plain)
2014-11-06 11:35 UTC, poma
dmesg-3.18.0-0.rc4.git0.1.fc22.x86_64-NV50 (75.97 KB, text/plain)
2014-11-10 21:22 UTC, poma
dmesg-3.18.0-rc3.git-03dca70-drm-fixes+NV50 (14.32 KB, text/plain)
2014-11-11 15:21 UTC, poma
dmesg boot (80.75 KB, text/plain)
2015-10-10 22:29 UTC, J
dmesg.connected_display_port.4.2.3-300.fc23.x86_64 (20.59 KB, text/plain)
2015-10-10 22:30 UTC, J
dmesg.disconnect_display_port.4.2.3-300.fc23.x86_64 (17.77 KB, text/plain)
2015-10-10 22:30 UTC, J

Description Adam Williamson 2014-10-17 23:13:05 UTC
Running Fedora 21 with kernels from https://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/ (tracking 3.18 development).

I have kernel-3.18.0-0.rc0.git1.2.fc22.1.x86_64, kernel-3.18.0-0.rc0.git8.2.fc22.1.x86_64 and kernel-3.18.0-0.rc0.git9.2.fc22.1.x86_64 installed. kernel-3.18.0-0.rc0.git1.2.fc22.1.x86_64 boots fine on my system; the other two do not.

On git1 I get boot framebuffer output on both heads, then X starts properly. On git8 and git9, I get boot framebuffer only on one head, then both screens go into power saving mode when X starts, or one keeps cycling power-saving / non-power-saving but never comes up.

The logs have a bunch of stuff, but it seems to kick off with an error:

nouveau E[   PDISP][0000:01:00.0] INVALID_STATE [UNK0B] chid 1 mthd 0x0080 data 0x00000000

I'll attach drm.debug=15 logs. Hardware is:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G94 [GeForce 9600 GT] [10de:0622] (rev a1)

Ben Skeggs has one of the same cards, somewhere in his pile. It may only be reproducible with dual displays; I haven't tried with just one yet.
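
When comparing logs from several affected boots, it can help to pull just the channel id, method and data fields out of PDISP error lines like the one quoted above. The following is a minimal sketch, not part of nouveau or any existing tool: the function name parse_pdisp_error is made up for illustration, and the pattern assumes the exact line layout shown in this report.

```shell
# Hypothetical helper (illustration only): read kernel log lines on stdin and
# print "chid mthd data" for every PDISP INVALID_STATE line.
# Assumes the field layout seen in this report:
#   ... PDISP ... INVALID_STATE ... chid N mthd 0x.... data 0x........
parse_pdisp_error() {
    sed -n 's/.*PDISP.*INVALID_STATE.*chid \([0-9][0-9]*\) mthd \(0x[0-9a-fA-F]*\) data \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p'
}

# Example usage against the current boot's kernel messages:
#   journalctl -k -b | parse_pdisp_error
```

Lines that do not match the pattern are dropped silently, so an empty result only means that no line had this exact shape.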
Comment 1 Adam Williamson 2014-10-17 23:13:47 UTC
Created attachment 108011 [details]
journalctl from an affected boot with drm.debug=15
Comment 2 poma 2014-10-26 00:20:23 UTC
A similar case was reported downstream:
Nouveau display(DVI) broken - kernel 3.18
https://bugzilla.redhat.com/show_bug.cgi?id=1157191
Comment 3 Ilia Mirkin 2014-10-26 00:50:38 UTC
A user on IRC bisected a failure that resulted in PDISP getting very unhappy to:

commit 1dce6264045cd23e9c07574ed0bb31c7dce9354f
Author: Roy Spliet <rspliet@eclipso.eu>
Date:   Fri Sep 12 18:00:13 2014 +0200

    drm/nv50/kms: Set VBLANK time in modeset script
    
    Solves blinking on reclocking memory. The value set is an underestimate, but
    with non-reduced vblanking this should give us plenty of time
    
    Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

You should be able to safely revert this commit; see if that helps.
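
For anyone building from a git checkout of the kernel, the revert could look roughly like the sketch below. The wrapper name revert_commit and the tree path are placeholders; the default hash is the one from the bisect result above.

```shell
# Hypothetical wrapper (illustration only).
# $1: path to a kernel git checkout (placeholder, use your own tree)
# $2: commit to revert; defaults to the bisected commit quoted above
revert_commit() {
    tree=$1
    commit=${2:-1dce6264045cd23e9c07574ed0bb31c7dce9354f}
    # --no-edit keeps git's default "Revert ..." commit message
    git -C "$tree" revert --no-edit "$commit"
}

# Example usage:
#   revert_commit ~/src/linux
```

After the revert, the kernel still has to be rebuilt and booted for the test to mean anything.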
Comment 4 Adam Williamson 2014-10-26 01:37:46 UTC
Thanks for the pointer, I'll try and do a Fedora kernel build with the patch reverted sometime soon.
Comment 5 poma 2014-10-26 04:08:37 UTC
Created attachment 108425 [details] [review]
Revert "drm/nv50/kms: Set VBLANK time in modeset script" - git 1dce626


Ilia and Mr. user, thanks. :)
Comment 6 Zlatko Calusic 2014-10-26 13:29:32 UTC
I have the same problem in 3.18-rc1, and can confirm that the issue is fixed if I revert 1dce62640. Also a GeForce 9600 GT and also a dual-monitor setup.

In my case, my primary display (DVI) would go blank right after switching to the framebuffer, and the monitor would go into power saving. Reverting the mentioned commit fixes the issue completely.

If there's an updated patch, I'm willing to test it before it goes mainstream.
Comment 7 Roy 2014-10-26 15:42:17 UTC
Created attachment 108457 [details] [review]
Fix vblank period setting on G94

Instead of reverting said patch, please test the attached fix.
Comment 8 Roy 2014-10-26 15:45:33 UTC
*** Bug 85381 has been marked as a duplicate of this bug. ***
Comment 9 poma 2014-10-26 19:08:54 UTC
(In reply to Roy from comment #7)
> Created attachment 108457 [details] [review]
> Fix vblank period setting on G94

Only for G94?
What about the rest of Family : NV50 - G98, GT215, GT216, MCP79/MCP7A, etc.

> Instead of reverting said patch, please test the attached fix.

Chipset: G98 (NV98)
Family : NV50

All tests PASSED.

Tested with: 3.18.0-rc1.git-2fd5b07-drm-fixes+
Comment 10 poma 2014-10-27 03:33:47 UTC
I also patched kernel-3.18.0-0.rc1.git4.1.fc22
http://koji.fedoraproject.org/koji/buildinfo?buildID=587854
and tested:

- suspend(S3) core debug
# echo core > /sys/power/pm_test
# echo mem > /sys/power/state
& RESUME

- hibernate(S4) core debug
# echo core > /sys/power/pm_test
# echo disk > /sys/power/state
& THAW

- suspend(S3) none debug (systemctl suspend)
# echo none > /sys/power/pm_test
# echo mem > /sys/power/state
& RESUME

- hibernate(S4) none debug (systemctl hibernate)
# echo none > /sys/power/pm_test
# echo disk > /sys/power/state
& THAW

- soft-off(S5)
# systemctl poweroff/reboot
& BOOT

Display is powered on and stays powered on.
kernel-3.18.0-0.rc1.git4.NV50.fc21.x86_64
All tests PASSED.
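
The suspend and hibernate tests above all follow the same two-write pattern, so they could be wrapped in a small helper. This is only a sketch: the name pm_cycle and the third parameter (an alternative sysfs root, added so the function can be dry-run against a scratch directory) are not from the original test procedure. As far as I know, /sys/power/pm_test only exists on kernels built with PM debugging support.

```shell
# Hypothetical helper wrapping the two writes used in each test above.
# $1: pm_test level, e.g. core or none
# $2: sleep state, mem (S3) or disk (S4)
# $3: sysfs root, defaults to /sys (assumption: overridable for dry runs)
pm_cycle() {
    level=$1
    state=$2
    sysfs=${3:-/sys}
    echo "$level" > "$sysfs/power/pm_test" || return 1
    # On the real /sys this write suspends or hibernates the machine.
    echo "$state" > "$sysfs/power/state"
}

# Example usage (as root):
#   pm_cycle core mem && pm_cycle core disk
```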
Comment 11 Roy 2014-10-27 10:41:25 UTC
(In reply to poma from comment #9)
> (In reply to Roy from comment #7)
> > Created attachment 108457 [details] [review]
> > Fix vblank period setting on G94
> 
> Only for G94?
> What about the rest of Family : NV50 - G98, GT215, GT216, MCP79/MCP7A, etc.

Bug reports I've seen only mentioned G94 as problematic; 3.18-rc1 works fine on NV92, NVA3, NVA5, NVA8 and NVAC, as I observed myself. This patch changes behaviour across all boards ranging from NV50 to NVD9, but should have no visible effects on most chips.

> 
> > Instead of reverting said patch, please test the attached fix.
> 
> Chipset: G98 (NV98)
> Family : NV50
> 
> All tests PASSED.
> 
> Tested with: 3.18.0-rc1.git-2fd5b07-drm-fixes+

Thanks. Adam Williamson: does this patch fix your issues as well?
Comment 12 Adam Williamson 2014-10-27 14:45:30 UTC
Sorry, I didn't have time to test yet. I'll try and do it today.
Comment 13 Michael Riesch 2014-10-27 15:59:38 UTC
The patch works on v3.18-rc1 and v3.18-rc2.

My X.org didn't start then, though. I had to update X.org (1.12 -> 1.16) and xf86-driver-nouveau (1.0.1 -> 1.0.11). In the end I upgraded Debian Wheezy to Jessie. After a few quirks with missing GNOME Shell packages, everything worked fine.

Tested-by: Michael Riesch <michael@riesch.at>
Comment 14 Zlatko Calusic 2014-10-28 07:38:59 UTC
(In reply to Roy from comment #7)
> Created attachment 108457 [details] [review]
> Fix vblank period setting on G94
> 
> Instead of reverting said patch, please test the attached fix.

The patch fixes the issue for me. Now 2 days running, no problems at all.
Comment 15 poma 2014-10-29 00:31:48 UTC
Roy, when do you intend to merge this patch?
Comment 16 Adam Williamson 2014-10-29 10:14:46 UTC
Fix looks good here too, thanks very much. System boots and X starts.
Comment 17 Ben Skeggs 2014-10-31 00:09:30 UTC
Created attachment 108710 [details] [review]
debug patch

Can you guys please revert the fixes you're using, apply this debugging patch, and then send me your kernel logs of the issue occurring?

Thanks,
Ben.
Comment 18 poma 2014-10-31 03:41:18 UTC
Created attachment 108713 [details]
dmesg-3.18.0-rc2.git-d34d4d8+a7e3f94-drm-fixes+nouveau-NV50
Comment 19 poma 2014-10-31 04:33:04 UTC
Besides, with 0001-evo-debug.patch, occasionally (roughly every second boot) the machine does not boot up at all; it gets stuck in the middle of nowhere.
Comment 20 Zlatko Calusic 2014-10-31 16:37:05 UTC
Created attachment 108735 [details]
0001-evo-debug dmesg
Comment 21 Bruno Saraiva 2014-11-02 12:48:30 UTC
Hi

Distro: Gentoo ~amd64
VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
2 physical displays (1x 1680x1050, 1x 1440x900)

on 3.18-rc1, without the vblank fix, logged http://codepad.org/cAjYm8BO

(although the number of successful boots is still a lot higher than without the vblank fix)

still on 3.18-rc1, with the vblank fix, logged http://codepad.org/bSWQeg3N (lots of corruption on the framebuffer: normally the fb is black, but this time I found my screens with a lot of bluish artefacts; I don't know what to call them).
Comment 22 poma 2014-11-06 11:35:08 UTC
Created attachment 109023 [details]
dmesg 3.18.0-0.rc3.git2.1.fc22 & darktama nouveau git b6dc8ef


Ben, Roy, is there a new patch? Will the fix land in the 3.18 mix?
Comment 23 poma 2014-11-10 21:22:06 UTC
Created attachment 109248 [details]
dmesg-3.18.0-0.rc4.git0.1.fc22.x86_64-NV50
Comment 24 Roy 2014-11-10 22:13:33 UTC
(In reply to poma from comment #23)
> Created attachment 109248 [details]
> dmesg-3.18.0-0.rc4.git0.1.fc22.x86_64-NV50

That's not surprising given this fix was not merged in that tree. Please be patient, we'll get the fix in (or a different one if new data pops up) before 3.18 gets released.
Comment 25 poma 2014-11-11 15:21:59 UTC
Created attachment 109287 [details]
dmesg-3.18.0-rc3.git-03dca70-drm-fixes+NV50


            == ALL TESTS PASSED ==
Comment 26 poma 2014-11-12 06:37:24 UTC
Combination also tested, works OK:
http://koji.fedoraproject.org/koji/buildinfo?buildID=592269
&
http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=ae69cfb

$ modinfo nouveau -n
/lib/modules/3.18.0-0.rc4.git0.2.fc22.x86_64/updates/nouveau.ko

Thanks guys.
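
The modinfo check above tells a rebuilt nouveau module apart from the one shipped with the kernel by its install path. A minimal sketch of that check follows; the function name module_origin is made up, and the mapping of updates/ to "rebuilt or out-of-tree" is an assumption based on where such modules are typically installed.

```shell
# Hypothetical helper: classify a module path as printed by `modinfo -n`.
# Prints "updates" for modules under an updates/ directory (assumed to be a
# rebuilt or out-of-tree module), "kernel" for the in-tree location, and
# "other" for anything else.
module_origin() {
    case "$1" in
        */updates/*) echo "updates" ;;
        */kernel/*)  echo "kernel" ;;
        *)           echo "other" ;;
    esac
}

# Example usage:
#   module_origin "$(modinfo -n nouveau)"
```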
Comment 27 Roy 2014-11-18 15:42:56 UTC
The fix for this bug was merged in kernel 3.18 RC5. Thank you all for your feedback. If your problem persists with kernel 3.18 RC5 or newer, please re-open this bug.
Comment 28 J 2015-10-10 22:28:37 UTC
I have a similar problem, but with a different card.  I believe my errors start the same way as the original poster.

modinfo nouveau -n
/lib/modules/4.2.3-300.fc23.x86_64/kernel/drivers/gpu/drm/nouveau/nouveau.ko.xz

lspci -v | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation G92GLM [Quadro FX 2800M] (rev a2) (prog-if 00 [VGA controller])

I'm using a laptop (Dell M6500) and get a bunch of nouveau errors printed when I connect my second display via DisplayPort. So long as I stay running in init 3, the primary laptop LCD continues to function, but once I switch to init 5, X appears to start and then just hangs.

I've attached dmesg output for when I connect the display port, and when I disconnect the display port.

Please let me know if I've posted to the wrong bug or can help in any way.
Comment 29 J 2015-10-10 22:29:21 UTC
Created attachment 118804 [details]
dmesg boot
Comment 30 J 2015-10-10 22:30:04 UTC
Created attachment 118805 [details]
dmesg.connected_display_port.4.2.3-300.fc23.x86_64
Comment 31 J 2015-10-10 22:30:35 UTC
Created attachment 118806 [details]
dmesg.disconnect_display_port.4.2.3-300.fc23.x86_64
Comment 32 Ilia Mirkin 2015-10-10 22:32:18 UTC
(In reply to J from comment #28)
> I have a similar problem, but with a different card.  I believe my errors
> start the same way as the original poster.

Similar problem = new bug.

