Bug 99900 - [NVC1] nouveau: freeze / crash after kernel update to 4.10
Summary: [NVC1] nouveau: freeze / crash after kernel update to 4.10
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2017-02-22 12:03 UTC by Torsten Krah
Modified: 2017-09-21 04:23 UTC

Attachments
dmesg 4.10 - fifo write fault (72.53 KB, text/plain)
2017-02-22 12:03 UTC, Torsten Krah
Xorg.0.log 4.10 - flip queue failed (80.85 KB, text/plain)
2017-02-22 12:04 UTC, Torsten Krah
Xorg.log file (69.45 KB, application/x-trash)
2017-02-28 07:59 UTC, Ralph Gauges
dmesg from freezing session (70.97 KB, text/x-log)
2017-08-26 16:03 UTC, andrewb03
filtered journalctl out showing nouveau errors (6.29 KB, text/plain)
2017-08-26 16:04 UTC, andrewb03
dmesg from ARCH 4.12.3 kernel (64.45 KB, text/plain)
2017-09-19 22:13 UTC, andrewb03

Description Torsten Krah 2017-02-22 12:03:05 UTC
Created attachment 129823 [details]
dmesg 4.10 - fifo write fault

After the 4.10 kernel was released, I installed the mainline build from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/

on my Ubuntu Trusty LTS system.
Xorg now crashes whenever my monitors go to sleep (this always happens during my lunch break). When I return it has crashed, and I need to do a hard reset to get the GPU running again.
This is what is printed to dmesg when this happens (the machine is still reachable via ssh):

[11010.813785] nouveau 0000:01:00.0: fifo: PBDMA0: 04000100 [] ch 2 [001fcc0000 Xorg[2593]] subc 5 mthd 001c data 00000001
[11010.813815] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0010 data 00000000
[11010.813838] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0014 data 0001b020
[11010.813864] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0018 data 003c9310
[11010.813886] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 001c data 00000002
[11010.813905] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0020 data 00000000
[11010.813919] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0200 data 000000cf
[11010.813937] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0204 data 00000000
[11010.813954] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0208 data 00000010
[11010.813973] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 020c data 00000001
[11010.813990] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0210 data 00000000
[11010.814014] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0010 data 00000000
[11010.814060] nouveau 0000:01:00.0: fifo: write fault at 000169b000 engine 00 [PGRAPH] client 0f [GPC0/PROP] reason 02 [PAGE_NOT_PRESENT] on channel 2 [001fcc0000 Xorg[2593]]
[11010.814070] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...


Xorg.log contains this:

[    42.993] (WW) NOUVEAU(0): flip queue failed: Device or resource busy

Trying to restart Xorg is unsuccessful.

[  8810.799] (WW) NOUVEAU(0): flip queue failed: Invalid argument


This does not happen with 4.9.6, which I am running now.
Is there anything I can provide or try to help get this fixed?
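
For anyone comparing logs across reports, the "fifo: write fault" line in the dmesg excerpt above can be split into its fields. What follows is only a minimal Python sketch, not part of nouveau: the regular expression and the parse_fault name are assumptions inferred from the line format quoted in this report, and other kernel versions may print the line differently.

```python
import re

# Pattern inferred from the "fifo: ... fault at" lines quoted in this bug.
# It is an assumption, not a format guaranteed by the nouveau driver.
FAULT_RE = re.compile(
    r"fifo: (?P<access>\w+) fault at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\] "
    r"on channel (?P<channel>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


def parse_fault(line):
    """Return the fields of a nouveau fifo fault line as a dict, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The faulting address is printed in hex without a 0x prefix.
    fields["addr"] = int(fields["addr"], 16)
    fields["channel"] = int(fields["channel"])
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    import sys

    # Hypothetical usage: dmesg | python3 parse_fault.py
    for raw in sys.stdin:
        parsed = parse_fault(raw)
        if parsed:
            print(parsed["proc"], parsed["pid"], parsed["reason"], hex(parsed["addr"]))
```

Run against saved dmesg output, this prints one summary per fault (process, PID, reason, address), which makes it easier to see whether different reporters are hitting the same kind of fault.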
Comment 1 Torsten Krah 2017-02-22 12:04:00 UTC
Created attachment 129824 [details]
Xorg.0.log 4.10 - flip queue failed
Comment 2 Joshua Baergen 2017-02-27 21:14:28 UTC
I've also had nouveau issues after upgrading to 4.10(.1), but in my case syslog is spammed with:

Feb 27 14:02:31 baergj kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [ACQUIRE] ch 2 [007f9bb000 X[2107]] subc 0 mthd 0000 data 00000000

This appears to happen after screen blank at some point. Like Torsten, I don't have a problem with the 4.9 series.
Comment 3 Joshua Baergen 2017-02-27 21:16:38 UTC
Oh, I should have included this:

[    46.789] (--) NOUVEAU(0): Chipset: "NVIDIA NVE7"
Comment 4 Ralph Gauges 2017-02-28 07:59:44 UTC
Created attachment 129974 [details]
Xorg.log file

Xorg.log also shows some error messages towards the end.
Comment 5 Ralph Gauges 2017-02-28 08:05:07 UTC
Sorry, my report got lost when I attached the Xorg.log, so here it is again.

I am also seeing freezes related to the nouveau driver since upgrading to kernel 4.10. Before that I was using 4.9 without problems.

In addition to the Xorg.log, my kern.log contains some error messages that repeat many times before the system freezes. I have added the last few lines below.

I am running the latest Mesa code from git, so the last crash was with Mesa from 20170227, and I am using a GTX 750 Ti card.

01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)

If there is anything else I can provide, please let me know.

Thanks and keep up the good work.


Feb 27 21:12:38 zeus kernel: [34599.410668] nouveau 0000:01:00.0: gr: FECS ucode error 2
Feb 27 21:12:38 zeus kernel: [34599.410668] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410670] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410674] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410677] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410679] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410683] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410687] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410690] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Feb 27 21:12:38 zeus kernel: [34599.410692] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
Feb 27 21:12:38 zeus kernel: [34599.410709] nouveau 0000:01:00.0: gr: FECS MTHD subc 3 class 0000 mthd 1828 data 200208e3
Feb 27 21:12:38 zeus kernel: [34599.410710] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410712] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410715] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410719] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000002 00000006 00000091
Feb 27 21:12:38 zeus kernel: [34599.410720] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410724] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410728] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410740] nouveau 0000:01:00.0: gr: FECS MTHD subc 3 class 0000 mthd 1828 data 200208e3
Feb 27 21:12:38 zeus kernel: [34599.410741] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410743] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410746] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410750] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410751] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410755] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410759] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410769] nouveau 0000:01:00.0: gr: FECS ucode error 2
Feb 27 21:12:38 zeus kernel: [34599.410770] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410772] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410775] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410779] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410780] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410784] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410788] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410798] nouveau 0000:01:00.0: fifo: PBDMA0: 01000000 [] ch 2 [007f957000 Xorg[1458]] subc 3 mthd 0200 data 000000cf
Feb 27 21:12:38 zeus kernel: [34599.410807] nouveau 0000:01:00.0: priv: HUB0: 400500 00010001 (1b408201)
Feb 27 21:12:38 zeus kernel: [34599.410817] nouveau 0000:01:00.0: fifo: PBDMA0: 01000000 [] ch 2 [007f957000 Xorg[1458]] subc 3 mthd 0204 data 00000000
Feb 27 21:12:38 zeus kernel: [34599.411237] nouveau 0000:01:00.0: fifo: write fault at 00019e3000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 7 [007f15c000 VirtualBox[8519]]
Feb 27 21:12:38 zeus kernel: [34599.411239] nouveau 0000:01:00.0: fifo: gr engine fault on channel 7, recovering...
Comment 6 Ralph Gauges 2017-02-28 08:17:08 UTC
Just managed to freeze X again. This time I just started VirtualBox.
The kern.log entries are as follows. Xorg.log shows the same backtraces as the ones I already attached.

Feb 28 09:08:50 zeus kernel: [ 3164.314029] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b00 data 20046234
Feb 28 09:08:50 zeus kernel: [ 3164.314037] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b04 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314046] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b08 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314054] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b0c data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314062] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b10 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314070] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b14 data 20050004
Feb 28 09:08:50 zeus kernel: [ 3164.314078] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b18 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314087] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b1c data 00013070
Feb 28 09:08:50 zeus kernel: [ 3164.314095] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b20 data 000002ef
Feb 28 09:08:50 zeus kernel: [ 3164.314103] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b24 data 00000002
Feb 28 09:08:50 zeus kernel: [ 3164.314111] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b28 data 00000000
Comment 7 Ralph Gauges 2017-02-28 09:01:54 UTC
One last note, hopefully.

I went back to kernel 4.9. The error can still be triggered using the latest version of VirtualBox with a Windows 10 guest and guest additions installed.
During boot up of the VM, probably when the guest additions are loaded, the VM stops.

Again I get the error messages in kern.log and Xorg.log, but with kernel 4.9 it takes a lot longer for the system to freeze. Actually it didn't crash; it just wouldn't accept any input after some minutes. The mouse pointer was still moving, but neither keyboard input nor mouse clicks had any effect. I also couldn't switch to a text console with ALT-F1 any more.

Hope that helps in reproducing the problem.
Comment 8 Torsten Krah 2017-03-03 09:12:16 UTC
xset dpms force standby

is enough to make this happen.
Currently bisecting the kernel via:

git bisect start v4.10 v4.9.13 -- drivers/gpu/drm/nouveau

but it will take some time to compile and test each of the candidate kernels.
Comment 10 Ben Skeggs 2017-03-04 07:37:33 UTC
Oops, ignore previous comment.  Wrong bug!
Comment 11 Torsten Krah 2017-03-06 07:07:12 UTC
Hi Ben,

don't you think that https://bugs.freedesktop.org/show_bug.cgi?id=99922 reads like a duplicate of this one? At least it sounds similar to me.
Comment 12 Antoine Saroufim 2017-03-14 12:09:46 UTC
I'm experiencing the same issues. Nouveau (Gallium 0.4 - NVC1) freezes everything on Wayland, sometimes the kernel too, and on X11 it turns the display into a black screen with a visible cursor. I've noticed this since I upgraded to kernel 4.10. Here are a few other things I've observed:

On (X)Wayland:
- The display freezes when starting fullscreen wine games. Graphics sometimes turn dark and the display freezes. (I can reproduce this 100% of the time if I launch a mission in Starcraft 2)
- Launching a fullscreen wine game while triggering the GNOME Overview mode freezes everything including the kernel (can intentionally reproduce this too)

On X11:
- Whenever a lockscreen mechanism is triggered and the screen goes blank, waking the screen up yields an unresponsive X session with a black screen and a working cursor.
- If X11 freezes, it can be killed and the session can be reopened. The kernel never freezes and you can still switch to other TTYs, unlike on Wayland.
- This happens on both GNOME and Plasma
- The blackscreen X11 issue happens way more often (once every 1-2 hours) than Wayland's freezing issue (2-3 times per 14 hours). 

This happens on Mesa 17.1 (Git) and 17.0 (from openSUSE's repos). It does not happen on the proprietary driver. 

Extra information:

Graphics Card: GT730
Operating System: OpenSUSE Tumbleweed
Kernel: 4.10
Comment 13 Mike 2017-04-10 23:31:02 UTC
I am also hitting this problem after starting to use 4.10.* kernels

F25, 4.10.6.200, X86_64, Quadro K600, nouveau, KDE.

workarounds: boot with nouveau.runpm=0, or turn off display power management using desktop settings.
Comment 14 Viktor Kuzmin 2017-04-11 18:10:29 UTC
This problem is not limited to NVC0.

This bug is the same (I think): https://bugs.freedesktop.org/show_bug.cgi?id=98690

MacBook Pro 11.3 (GK107, GeForce GT 750M), Gentoo Linux, kernel 4.10.8.

[   49.196313] Workqueue: pm pm_runtime_work
[   49.196314] Call Trace:
[   49.196321]  ? dump_stack+0x46/0x59
[   49.196323]  ? __warn+0xb9/0xe0
[   49.196327]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196329]  ? warn_slowpath_fmt+0x4a/0x50
[   49.196349]  ? gen6_read32+0x92/0x1e0 [i915]
[   49.196369]  ? hsw_enable_pc8+0x6b7/0x720 [i915]
[   49.196371]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196384]  ? intel_runtime_suspend+0x142/0x250 [i915]
[   49.196386]  ? pci_pm_runtime_suspend+0x50/0x140
[   49.196387]  ? __rpm_callback+0xb1/0x1f0
[   49.196389]  ? rpm_callback+0x1a/0x70
[   49.196390]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196392]  ? rpm_suspend+0x11d/0x670
[   49.196396]  ? _raw_write_unlock_irq+0xe/0x20
[   49.196400]  ? finish_task_switch+0xa7/0x260
[   49.196403]  ? __update_idle_core+0x1b/0xb0
[   49.196405]  ? pm_runtime_work+0x62/0xa0
[   49.196407]  ? process_one_work+0x133/0x480
[   49.196408]  ? worker_thread+0x42/0x4c0
[   49.196411]  ? kthread+0xef/0x130
[   49.196412]  ? process_one_work+0x480/0x480
[   49.196415]  ? kthread_create_on_node+0x40/0x40
[   49.196416]  ? ret_from_fork+0x23/0x30
Comment 15 andrewb03 2017-08-26 16:02:51 UTC
This issue is still reproducible on 4.12.9.

I cannot go below 4.10 due to fixes on Ryzen, so for me this is a critical bug.

I have attached dmesg and xorg log output.
Comment 16 andrewb03 2017-08-26 16:03:33 UTC
Created attachment 133807 [details]
dmesg from freezing session
Comment 17 andrewb03 2017-08-26 16:04:01 UTC
Created attachment 133808 [details]
filtered journalctl out showing nouveau errors
Comment 18 andrewb03 2017-09-19 21:30:45 UTC
According to the HangDiagnosis page - https://nouveau.freedesktop.org/wiki/HangDiagnosis - this is my crash level:
Display is frozen in X, but mouse cursor moves.

SSH still works, but the keyboard freezes, so I'm forced to either reboot the PC or restart X via ssh.

Any updates on this bug?  I'm seeing it on 4.12.10 in Arch Linux.

This seems to happen right around the crash:

nouveau 0000:0a:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:0a:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:0a:00.0: fifo: channel 6: killed
nouveau 0000:0a:00.0: fifo: engine 7: scheduled for recovery
nouveau 0000:0a:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:0a:00.0: gnome-shell[5893]: channel 6 killed!

At the time I was simply using Discord in gnome-shell via a Chrome tab.
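
To compare excerpts like the one above with the other reports in this bug, it can help to tally nouveau kernel messages by subsystem and error keyword. The following is a minimal Python sketch under stated assumptions: the tally_nouveau helper, the line pattern, and the keyword list are illustrative and drawn only from the log lines quoted in this thread, not from any nouveau documentation.

```python
import re
from collections import Counter

# Matches "nouveau <pci address>: <subsystem>: <message>" as seen in the
# logs quoted in this bug. Lines attributed to a process, such as
# "gnome-shell[5893]: channel 6 killed!", do not match and are skipped.
LINE_RE = re.compile(
    r"nouveau (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"(?P<subsys>[A-Za-z]+): (?P<msg>.*)"
)

# Keywords taken from messages quoted in this thread (not exhaustive).
KEYWORDS = (
    "SCHED_ERROR",
    "ILLEGAL_CLASS",
    "ILLEGAL_MTHD",
    "DATA_ERROR",
    "write fault",
    "FECS ucode error",
)


def tally_nouveau(lines):
    """Count nouveau messages per (subsystem, keyword) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        msg = match.group("msg")
        # The first keyword found wins; anything else is counted as "other".
        key = next((k for k in KEYWORDS if k in msg), "other")
        counts[(match.group("subsys"), key)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: journalctl -k | python3 tally_nouveau.py
    for (subsys, key), n in tally_nouveau(sys.stdin).most_common():
        print(f"{n:6d}  {subsys}: {key}")
```

A tally dominated by SCHED_ERROR lines would look different from one dominated by ILLEGAL_CLASS or write-fault lines, which may help decide whether a new report belongs in this bug or in a separate one.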
Comment 19 andrewb03 2017-09-19 22:13:48 UTC
Created attachment 134350 [details]
dmesg from ARCH 4.12.3 kernel

Linux 4.12.12 (confirmed in Arch) and 4.12.13 (confirmed in both my Gentoo and Arch installs) appear to break nouveau's autodetection of the display and gnome-shell crashes:

[   12.945435] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1
[   13.087167] gnome-shell[901]: segfault at 28 ip 00007ffab9d672f5 sp 00007fff5771ba20 error 4 in libmutter-0.so.0.0.0[7ffab9d1d000+139000]
[   16.264988] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1
[   16.286324] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1

Attached dmesg from failed nouveau init.

Can we get movement on this?
Comment 20 Ilia Mirkin 2017-09-19 23:33:19 UTC
Odd, that issue should have been fixed in 4.12.11. Try 4.13?
Comment 21 andrewb03 2017-09-20 22:08:30 UTC
4.13 throws the following on my Gentoo install (I tested there since 4.13 wasn't available on Arch yet):

nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-4

When this happens the console freezes unless I disable KMS via the command line (SSH works fine).

My GPUs are 660ti in the second PCI slot connected to DP-1 and 1080ti in the first PCI slot connected to DP-4.

Are you saying 4.12.11+ was supposed to resolve the random freezing?

On Gentoo I don't even have X installed, so I think it freezes when nouveau tries to load the FB console.

Would it be worth building nouveau in instead of as a kernel module?
Comment 22 Ilia Mirkin 2017-09-20 22:11:30 UTC
"Random anything" is a wholly undiagnosable issue. 4.12.11 resolved a regression in 4.12 which caused EDID to not be read properly over DP. There have been many reports of this causing a variety of issues.

Sounds like your issue is wholly unrelated to the original issue reported here as well. When in doubt, file a new bug. Marking bugs as dup is trivial. Untangling separate issues from one bug is impossible.
Comment 23 andrewb03 2017-09-21 00:00:08 UTC
I'll open a separate bug for the EDID issue on 4.12.12+. On second glance, the crashing doesn't look related to this issue.
Comment 24 kong 2017-09-21 04:23:16 UTC
On 4.12.12 on fc26, X hangs randomly and repeatedly, and logs the following:

nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19c8 data 00000000
nouveau 0000:01:00.0: gr: ILLEGAL_MTHD ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19d8 data 200308e0
nouveau 0000:01:00.0: gr: ILLEGAL_MTHD ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19dc data 00000800
nouveau 0000:01:00.0: fifo: write fault at 0000000000 engine 00 [GR] client 0c [GPC0/RAST] reason 02 [PTE] on channel 18 [007f294000 Xorg[2256]]
nouveau 0000:01:00.0: fifo: channel 18: killed
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19e4 data 00340000
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19e8 data a01108e3
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19ec data 00000560


I also tried 4.12.13 on fc26.

The Android emulator randomly fails with:

nouveau: kernel rejected pushbuf: Invalid argument
nouveau: ch6: krec 0 pushes 0 bufs 13 relocs 0
nouveau: ch6: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch6: buf 00000001 00000048 00000002 00000002 00000000
nouveau: ch6: buf 00000002 00000007 00000002 00000002 00000000
nouveau: ch6: buf 00000003 00000008 00000002 00000002 00000002
nouveau: ch6: buf 00000004 0000000b 00000002 00000002 00000000
nouveau: ch6: buf 00000005 0000000a 00000002 00000002 00000002
nouveau: ch6: buf 00000006 00000006 00000004 00000000 00000004
nouveau: ch6: buf 00000007 0000004c 00000002 00000000 00000002
nouveau: ch6: buf 00000008 0000004d 00000002 00000000 00000002
nouveau: ch6: buf 00000009 00000057 00000004 00000004 00000000
nouveau: ch6: buf 0000000a 00000095 00000002 00000002 00000000
nouveau: ch6: buf 0000000b 00000065 00000002 00000002 00000000
nouveau: ch6: buf 0000000c 00000043 00000002 00000002 00000000
qemu-system-i386: pushbuf.c:727: nouveau_pushbuf_data: Assertion `kref' failed.
Aborted (core dumped)


Now that I have switched the driver to xorg-x11-drv-nvidia, it no longer crashes.
I don't know whether this is related to this issue, but it must be a nouveau issue, and it produces similar error logs.

