Bug 99900 - [NVC1] nouveau: freeze / crash after kernel update to 4.10
Summary: [NVC1] nouveau: freeze / crash after kernel update to 4.10
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2017-02-22 12:03 UTC by Torsten Krah
Modified: 2019-12-04 09:24 UTC


Attachments
dmesg 4.10 - fifo write fault (72.53 KB, text/plain)
2017-02-22 12:03 UTC, Torsten Krah
Xorg.0.log 4.10 - flip queue failed (80.85 KB, text/plain)
2017-02-22 12:04 UTC, Torsten Krah
Xorg.log file (69.45 KB, application/x-trash)
2017-02-28 07:59 UTC, Ralph Gauges
dmesg from freezing session (70.97 KB, text/x-log)
2017-08-26 16:03 UTC, andrewb03
filtered journalctl out showing nouveau errors (6.29 KB, text/plain)
2017-08-26 16:04 UTC, andrewb03
dmesg from ARCH 4.12.3 kernel (64.45 KB, text/plain)
2017-09-19 22:13 UTC, andrewb03

Description Torsten Krah 2017-02-22 12:03:05 UTC
Created attachment 129823
dmesg 4.10 - fifo write fault

After the 4.10 kernel was released, I installed the build from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10/

on my Ubuntu Trusty LTS system.
Xorg now crashes whenever my monitors go to sleep (this always happens during my lunch break). When I return, it has crashed and I need to do a hard reset to get the GPU running again.
This is what is printed to dmesg when it happens (the machine is still reachable via SSH):

[11010.813785] nouveau 0000:01:00.0: fifo: PBDMA0: 04000100 [] ch 2 [001fcc0000 Xorg[2593]] subc 5 mthd 001c data 00000001
[11010.813815] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0010 data 00000000
[11010.813838] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0014 data 0001b020
[11010.813864] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0018 data 003c9310
[11010.813886] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 001c data 00000002
[11010.813905] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0020 data 00000000
[11010.813919] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0200 data 000000cf
[11010.813937] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0204 data 00000000
[11010.813954] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0208 data 00000010
[11010.813973] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 020c data 00000001
[11010.813990] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 3 mthd 0210 data 00000000
[11010.814014] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [] ch 2 [001fcc0000 Xorg[2593]] subc 0 mthd 0010 data 00000000
[11010.814060] nouveau 0000:01:00.0: fifo: write fault at 000169b000 engine 00 [PGRAPH] client 0f [GPC0/PROP] reason 02 [PAGE_NOT_PRESENT] on channel 2 [001fcc0000 Xorg[2593]]
[11010.814070] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...


Xorg.log contains this:

[    42.993] (WW) NOUVEAU(0): flip queue failed: Device or resource busy

Trying to restart Xorg is unsuccessful.

[  8810.799] (WW) NOUVEAU(0): flip queue failed: Invalid argument


This does not happen with 4.9.6, which I am running now.
Is there anything I can provide or try to help get this fixed?
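
The `write fault` line in the dmesg excerpt above packs several fields (address, engine, client, reason, channel, owning process) into one message. A minimal sketch for pulling them apart when comparing faults across kernels; the regex is modeled only on the line quoted here, and the name `parse_fault` is illustrative rather than part of any nouveau tooling:

```python
import re

# Pattern modeled on the "write fault" line quoted in this report.
# Assumption: other kernel versions may format this message differently.
FAULT_RE = re.compile(
    r"fifo: (?P<access>read|write) fault at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]+)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]+)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line is not a fault."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; the address is printed as bare hex.
    fields["addr"] = int(fields["addr"], 16)
    fields["channel"] = int(fields["channel"])
    fields["pid"] = int(fields["pid"])
    return fields
```

Feeding it the output of `dmesg | grep "fault at"` line by line gives a quick way to see whether the faulting client and reason stay the same between kernel versions.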
Comment 1 Torsten Krah 2017-02-22 12:04:00 UTC
Created attachment 129824
Xorg.0.log 4.10 - flip queue failed
Comment 2 Joshua Baergen 2017-02-27 21:14:28 UTC
I've also had nouveau issues after upgrading to 4.10(.1), but in my case syslog is spammed with:

Feb 27 14:02:31 baergj kernel: nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [ACQUIRE] ch 2 [007f9bb000 X[2107]] subc 0 mthd 0000 data 00000000

This appears to happen after screen blank at some point. Like Torsten, I don't have a problem with the 4.9 series.
Comment 3 Joshua Baergen 2017-02-27 21:16:38 UTC
Oh, I should have included this:

[    46.789] (--) NOUVEAU(0): Chipset: "NVIDIA NVE7"
Comment 4 Ralph Gauges 2017-02-28 07:59:44 UTC
Created attachment 129974
Xorg.log file

Xorg.log also shows some error messages towards the end.
Comment 5 Ralph Gauges 2017-02-28 08:05:07 UTC
Sorry, my report got lost when I attached the Xorg.log. So again.

I am also seeing freezes related to the nouveau driver since upgrading to kernel 4.10. Before that I was using 4.9 without problems.

In addition to the Xorg.log, my kern.log contains some error messages that repeat many times before the system freezes. I have added the last few lines below.

I am running the latest Mesa code from git, so the last crash was with Mesa from 20170227, and I am using a 750 Ti card.

01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)

If there is anything else I can provide, please let me know.

Thanks and keep up the good work.


Feb 27 21:12:38 zeus kernel: [34599.410668] nouveau 0000:01:00.0: gr: FECS ucode error 2
Feb 27 21:12:38 zeus kernel: [34599.410668] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410670] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410674] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410677] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410679] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410683] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410687] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410690] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Feb 27 21:12:38 zeus kernel: [34599.410692] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
Feb 27 21:12:38 zeus kernel: [34599.410709] nouveau 0000:01:00.0: gr: FECS MTHD subc 3 class 0000 mthd 1828 data 200208e3
Feb 27 21:12:38 zeus kernel: [34599.410710] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410712] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410715] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410719] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000002 00000006 00000091
Feb 27 21:12:38 zeus kernel: [34599.410720] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410724] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410728] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410740] nouveau 0000:01:00.0: gr: FECS MTHD subc 3 class 0000 mthd 1828 data 200208e3
Feb 27 21:12:38 zeus kernel: [34599.410741] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410743] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410746] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410750] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410751] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410755] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410759] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410769] nouveau 0000:01:00.0: gr: FECS ucode error 2
Feb 27 21:12:38 zeus kernel: [34599.410770] nouveau 0000:01:00.0: gr: FECS 00000002
Feb 27 21:12:38 zeus kernel: [34599.410772] nouveau 0000:01:00.0: gr: 409000 - done 00000340
Feb 27 21:12:38 zeus kernel: [34599.410775] nouveau 0000:01:00.0: gr: 409000 - stat 80000000 0001a600 00000000 80031828
Feb 27 21:12:38 zeus kernel: [34599.410779] nouveau 0000:01:00.0: gr: 409000 - stat 200208e3 00000003 00000006 00000090
Feb 27 21:12:38 zeus kernel: [34599.410780] nouveau 0000:01:00.0: gr: 502000 - done 00000300
Feb 27 21:12:38 zeus kernel: [34599.410784] nouveau 0000:01:00.0: gr: 502000 - stat 80000000 00010400 00000000 00000000
Feb 27 21:12:38 zeus kernel: [34599.410788] nouveau 0000:01:00.0: gr: 502000 - stat 00000000 00000000 00000002 00000000
Feb 27 21:12:38 zeus kernel: [34599.410798] nouveau 0000:01:00.0: fifo: PBDMA0: 01000000 [] ch 2 [007f957000 Xorg[1458]] subc 3 mthd 0200 data 000000cf
Feb 27 21:12:38 zeus kernel: [34599.410807] nouveau 0000:01:00.0: priv: HUB0: 400500 00010001 (1b408201)
Feb 27 21:12:38 zeus kernel: [34599.410817] nouveau 0000:01:00.0: fifo: PBDMA0: 01000000 [] ch 2 [007f957000 Xorg[1458]] subc 3 mthd 0204 data 00000000
Feb 27 21:12:38 zeus kernel: [34599.411237] nouveau 0000:01:00.0: fifo: write fault at 00019e3000 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 7 [007f15c000 VirtualBox[8519]]
Feb 27 21:12:38 zeus kernel: [34599.411239] nouveau 0000:01:00.0: fifo: gr engine fault on channel 7, recovering...
Comment 6 Ralph Gauges 2017-02-28 08:17:08 UTC
Just managed to freeze X again. This time I just started VirtualBox.
kern.log entries are as follows. Xorg shows the same backtraces as the one I already attached.

Feb 28 09:08:50 zeus kernel: [ 3164.314029] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b00 data 20046234
Feb 28 09:08:50 zeus kernel: [ 3164.314037] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b04 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314046] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b08 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314054] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b0c data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314062] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b10 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314070] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b14 data 20050004
Feb 28 09:08:50 zeus kernel: [ 3164.314078] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b18 data 00000000
Feb 28 09:08:50 zeus kernel: [ 3164.314087] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b1c data 00013070
Feb 28 09:08:50 zeus kernel: [ 3164.314095] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b20 data 000002ef
Feb 28 09:08:50 zeus kernel: [ 3164.314103] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b24 data 00000002
Feb 28 09:08:50 zeus kernel: [ 3164.314111] nouveau 0000:01:00.0: gr: ILLEGAL_CLASS ch 7 [007f15c000 VirtualBox[8257]] subc 0 class 0000 mthd 2b28 data 00000000
Comment 7 Ralph Gauges 2017-02-28 09:01:54 UTC
One last note, hopefully.

I went back to kernel 4.9. The error can still be triggered using the latest version of VirtualBox with a Windows 10 guest and guest additions installed.
During boot up of the VM, probably when the guest additions are loaded, the VM stops.

Again I get the error messages in kern.log and Xorg.log, but with kernel 4.9 it takes a lot longer for the system to freeze. Actually it didn't crash; it just wouldn't accept any input after some minutes. The mouse pointer was still moving, but neither keyboard input nor mouse clicks had any effect. I also couldn't switch to a text console with ALT-F1 any more.

Hope that helps in reproducing the problem.
Comment 8 Torsten Krah 2017-03-03 09:12:16 UTC
xset dpms force standby

is enough to make this happen.
Currently bisecting the kernel via:

git bisect start v4.10 v4.9.13 -- drivers/gpu/drm/nouveau

but it will take some time to compile and test each of them.
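
As a rough guide to how long such a bisect takes: a binary search over n candidate commits needs about log2(n) build-and-test rounds, which is why limiting the bisect to drivers/gpu/drm/nouveau helps. A small sketch of that arithmetic, assuming a linear history; the commit counts below are made-up illustrations, not the real number of commits between the two tags:

```python
def bisect_rounds(candidates):
    """Worst-case build/test rounds for a binary search over `candidates` commits."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (candidates - 1).bit_length()


if __name__ == "__main__":
    for n in (100, 1000, 10000):  # hypothetical commit counts
        print(n, "commits ->", bisect_rounds(n), "rounds")
```

Each round is one kernel build plus one attempt to trigger the freeze, so the total time scales with the round count rather than with the number of commits.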
Comment 10 Ben Skeggs 2017-03-04 07:37:33 UTC
Oops, ignore previous comment.  Wrong bug!
Comment 11 Torsten Krah 2017-03-06 07:07:12 UTC
Hi Ben,

don't you think that https://bugs.freedesktop.org/show_bug.cgi?id=99922 reads like a duplicate of this bug? At least it sounds similar to me.
Comment 12 Antoine Saroufim 2017-03-14 12:09:46 UTC
I'm experiencing the same issues. Nouveau (Gallium 0.4 - NVC1) freezes everything on Wayland, sometimes including the kernel, and on X11 it turns the display into a black screen with a visible cursor. I've noticed this since upgrading to kernel 4.10. Here are a few other things I've observed:

On (X)Wayland:
- Display freezes when starting fullscreen wine games. Graphics sometimes turn dark and the display freezes. (I can reproduce this 100% of the time if I launch a mission in Starcraft 2)
- Launching a fullscreen wine game while triggering the GNOME Overview mode freezes everything including the kernel (can intentionally reproduce this too)

On X11:
- Whenever a lockscreen mechanism is triggered and the screen goes blank, waking the screen up yields an unresponsive X session with a black screen and a working cursor.
- If X11 freezes, it can be killed and the session can be reopened. The kernel never freezes and you can still switch to other TTYs, unlike on Wayland.
- This happens on both GNOME and Plasma
- The blackscreen X11 issue happens way more often (once every 1-2 hours) than Wayland's freezing issue (2-3 times per 14 hours). 

This happens on Mesa 17.1 (Git) and 17.0 (from openSUSE's repos). It does not happen on the proprietary driver. 

Extra information:

Graphics Card: GT730
Operating System: OpenSUSE Tumbleweed
Kernel: 4.10
Comment 13 Mike 2017-04-10 23:31:02 UTC
I am also hitting this problem after starting to use 4.10.* kernels

F25, 4.10.6.200, X86_64, Quadro K600, nouveau, KDE.

workarounds: boot with nouveau.runpm=0, or turn off display power management using desktop settings.
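
To confirm that the `nouveau.runpm=0` workaround actually reached the kernel, the boot command line can be inspected. A minimal sketch; the helper name `kernel_param` is illustrative, and it assumes that a later occurrence of a parameter overrides an earlier one:

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # Only tokens of the form name=value count; a bare flag is ignored.
        if key == name and sep:
            value = val
    return value


# Example usage on a running Linux system:
# with open("/proc/cmdline") as f:
#     print(kernel_param(f.read(), "nouveau.runpm"))
```

If the helper returns `None`, the parameter was not passed on the command line and the driver is using its default.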
Comment 14 Viktor Kuzmin 2017-04-11 18:10:29 UTC
This problem is not limited to NVC0.

This bug is the same (I think): https://bugs.freedesktop.org/show_bug.cgi?id=98690

MacBook Pro 11.3 (GK107, GeForce GT 750M), Gentoo Linux, kernel 4.10.8.

[   49.196313] Workqueue: pm pm_runtime_work
[   49.196314] Call Trace:
[   49.196321]  ? dump_stack+0x46/0x59
[   49.196323]  ? __warn+0xb9/0xe0
[   49.196327]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196329]  ? warn_slowpath_fmt+0x4a/0x50
[   49.196349]  ? gen6_read32+0x92/0x1e0 [i915]
[   49.196369]  ? hsw_enable_pc8+0x6b7/0x720 [i915]
[   49.196371]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196384]  ? intel_runtime_suspend+0x142/0x250 [i915]
[   49.196386]  ? pci_pm_runtime_suspend+0x50/0x140
[   49.196387]  ? __rpm_callback+0xb1/0x1f0
[   49.196389]  ? rpm_callback+0x1a/0x70
[   49.196390]  ? pci_pm_runtime_resume+0xa0/0xa0
[   49.196392]  ? rpm_suspend+0x11d/0x670
[   49.196396]  ? _raw_write_unlock_irq+0xe/0x20
[   49.196400]  ? finish_task_switch+0xa7/0x260
[   49.196403]  ? __update_idle_core+0x1b/0xb0
[   49.196405]  ? pm_runtime_work+0x62/0xa0
[   49.196407]  ? process_one_work+0x133/0x480
[   49.196408]  ? worker_thread+0x42/0x4c0
[   49.196411]  ? kthread+0xef/0x130
[   49.196412]  ? process_one_work+0x480/0x480
[   49.196415]  ? kthread_create_on_node+0x40/0x40
[   49.196416]  ? ret_from_fork+0x23/0x30
Comment 15 andrewb03 2017-08-26 16:02:51 UTC
This issue is still reproducible on 4.12.9.

I cannot go below 4.10 due to fixes on Ryzen, so for me this is a critical bug.

I have attached dmesg and xorg log output.
Comment 16 andrewb03 2017-08-26 16:03:33 UTC
Created attachment 133807
dmesg from freezing session
Comment 17 andrewb03 2017-08-26 16:04:01 UTC
Created attachment 133808
filtered journalctl out showing nouveau errors
Comment 18 andrewb03 2017-09-19 21:30:45 UTC
According to the HangDiagnosis page - https://nouveau.freedesktop.org/wiki/HangDiagnosis - this is my crash level:
Display is frozen in X, but mouse cursor moves.

SSH works as well but keyboard freezes so I'm forced to either reboot the PC via ssh or restart X via ssh.

Any updates on this bug?  I'm seeing it on 4.12.10 in Arch Linux.

This seems to happen right around the crash:

nouveau 0000:0a:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:0a:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:0a:00.0: fifo: channel 6: killed
nouveau 0000:0a:00.0: fifo: engine 7: scheduled for recovery
nouveau 0000:0a:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:0a:00.0: gnome-shell[5893]: channel 6 killed!

At the time I was simply using Discord in gnome-shell via a Chrome tab.
Comment 19 andrewb03 2017-09-19 22:13:48 UTC
Created attachment 134350
dmesg from ARCH 4.12.3 kernel

Linux 4.12.12 (confirmed in Arch) and 4.12.13 (confirmed in both my Gentoo and Arch installs) appear to break nouveau's autodetection of the display and gnome-shell crashes:

[   12.945435] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1
[   13.087167] gnome-shell[901]: segfault at 28 ip 00007ffab9d672f5 sp 00007fff5771ba20 error 4 in libmutter-0.so.0.0.0[7ffab9d1d000+139000]
[   16.264988] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1
[   16.286324] nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-1

Attached dmesg from failed nouveau init.

Can we get movement on this?
Comment 20 Ilia Mirkin 2017-09-19 23:33:19 UTC
Odd, that issue should have been fixed in 4.12.11. Try 4.13?
Comment 21 andrewb03 2017-09-20 22:08:30 UTC
4.13 throws this on my Gentoo install (I tested there because 4.13 wasn't available on Arch yet):

nouveau 0000:0a:00.0: DRM: DDC responded, but no EDID for DP-4

When this happens the console freezes unless I disable KMS via the command line (SSH works fine).

My GPUs are a 660 Ti in the second PCI slot connected to DP-1 and a 1080 Ti in the first PCI slot connected to DP-4.

Are you saying 4.12.11+ was supposed to resolve the random freezing?

On Gentoo I don't even have X installed, so I think it freezes when nouveau tries to load the FB console.

Would it be worth building nouveau into the kernel instead of as a module?
Comment 22 Ilia Mirkin 2017-09-20 22:11:30 UTC
"Random anything" is a wholly undiagnosable issue. 4.12.11 resolved a regression in 4.12 which caused EDID to not be read properly over DP. There have been many reports of this causing a variety of issues.

Sounds like your issue is wholly unrelated to the original issue reported here as well. When in doubt, file a new bug. Marking bugs as dup is trivial. Untangling separate issues from one bug is impossible.
Comment 23 andrewb03 2017-09-21 00:00:08 UTC
I'll open a separate bug for the EDID issue on 4.12.12+. On second glance, the crashing doesn't look related to this issue either.
Comment 24 kong 2017-09-21 04:23:16 UTC
On 4.12.12 on fc26, X hangs randomly and repeatedly, and logs this:

nouveau 0000:01:00.0: gr: DATA_ERROR 00000005 [INVALID_ENUM] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19c8 data 00000000
nouveau 0000:01:00.0: gr: ILLEGAL_MTHD ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19d8 data 200308e0
nouveau 0000:01:00.0: gr: ILLEGAL_MTHD ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19dc data 00000800
nouveau 0000:01:00.0: fifo: write fault at 0000000000 engine 00 [GR] client 0c [GPC0/RAST] reason 02 [PTE] on channel 18 [007f294000 Xorg[2256]]
nouveau 0000:01:00.0: fifo: channel 18: killed
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19e4 data 00340000
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19e8 data a01108e3
nouveau 0000:01:00.0: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 18 [007f294000 Xorg[2256]] subc 0 class a197 mthd 19ec data 00000560


I also tried 4.12.13 on fc26.

The Android emulator randomly fails with:

nouveau: kernel rejected pushbuf: Invalid argument
nouveau: ch6: krec 0 pushes 0 bufs 13 relocs 0
nouveau: ch6: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch6: buf 00000001 00000048 00000002 00000002 00000000
nouveau: ch6: buf 00000002 00000007 00000002 00000002 00000000
nouveau: ch6: buf 00000003 00000008 00000002 00000002 00000002
nouveau: ch6: buf 00000004 0000000b 00000002 00000002 00000000
nouveau: ch6: buf 00000005 0000000a 00000002 00000002 00000002
nouveau: ch6: buf 00000006 00000006 00000004 00000000 00000004
nouveau: ch6: buf 00000007 0000004c 00000002 00000000 00000002
nouveau: ch6: buf 00000008 0000004d 00000002 00000000 00000002
nouveau: ch6: buf 00000009 00000057 00000004 00000004 00000000
nouveau: ch6: buf 0000000a 00000095 00000002 00000002 00000000
nouveau: ch6: buf 0000000b 00000065 00000002 00000002 00000000
nouveau: ch6: buf 0000000c 00000043 00000002 00000002 00000000
qemu-system-i386: pushbuf.c:727: nouveau_pushbuf_data: Assertion `kref' failed.
Aborted (core dumped)


Now that I have switched the driver to xorg-x11-drv-nvidia, it no longer crashes.
I don't know whether this is related to this issue, but it must be a nouveau issue, and it produces similar error logs.
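
Since several different error signatures have been posted in this thread (SCHED_ERROR, ILLEGAL_CLASS, ILLEGAL_MTHD, DATA_ERROR), a quick tally per log can help decide whether two reports are the same problem. A minimal sketch, assuming the `nouveau <pci address>: <unit>: <TAG>` layout seen in the lines above; the regex and the name `summarize` are illustrative only:

```python
import re
from collections import Counter

# Matches e.g. "nouveau 0000:01:00.0: gr: DATA_ERROR ..." and captures
# the reporting unit ("gr", "fifo", ...) and the upper-case error tag.
# Assumption: this layout is inferred from the lines quoted in this thread.
SIG_RE = re.compile(
    r"nouveau [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]: "
    r"(?P<unit>[a-z]+): (?P<tag>[A-Z][A-Z0-9_]+)\b"
)


def summarize(lines):
    """Count (unit, tag) pairs across an iterable of kernel log lines."""
    counts = Counter()
    for line in lines:
        match = SIG_RE.search(line)
        if match:
            counts[(match.group("unit"), match.group("tag"))] += 1
    return counts
```

Lines that carry no upper-case tag, such as `fifo: channel 18: killed`, are skipped rather than counted.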
Comment 25 kenorb 2019-01-05 21:52:04 UTC
Related, possible dup: #100567
Comment 26 kenorb 2019-01-05 21:53:53 UTC
Similar problem, described in: https://bugs.freedesktop.org/show_bug.cgi?id=100567#c18

Call Trace:
 __schedule+0x29e/0x840
 schedule+0x2c/0x80
 schedule_timeout+0x258/0x360
 ? nv50_wndw_atomic_destroy_state+0x1d/0x20 [nouveau]
 dma_fence_default_wait+0x1fc/0x260
 ? dma_fence_release+0xa0/0xa0
 dma_fence_wait_timeout+0x3e/0xf0
 drm_atomic_helper_wait_for_fences+0x3f/0xc0 [drm_kms_helper]
 nv50_disp_atomic_commit_tail+0x78/0x860 [nouveau]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
 process_one_work+0x20f/0x3c0
 worker_thread+0x34/0x400
 kthread+0x120/0x140
 ? pwq_unbound_release_workfn+0xd0/0xd0
 ? kthread_bind+0x40/0x40
 ret_from_fork+0x35/0x40

Full log: https://gist.github.com/kenorb/5b95caa1694dbf7f030ccc808a110856
Comment 27 aaron.hamid 2019-07-23 23:38:57 UTC
I am encountering the same "kernel rejected pushbuf: Device or resource busy" crash/lock cited here and in https://bugs.freedesktop.org/show_bug.cgi?id=100567
I'm posting here as I don't see SCHED_ERROR in my systemd journal, and I can reliably trigger it by running the qemu Android emulator like kong in https://bugs.freedesktop.org/show_bug.cgi?id=99900#c24 (so as far as I can tell, unless this is resolved, Android development is dead for me)

Fedora 30
Linux noir 5.1.18-300.fc30.x86_64 #1 SMP Mon Jul 15 15:42:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

lsmod | grep nouveau
nouveau              2248704  10
mxm_wmi                16384  1 nouveau
i2c_algo_bit           16384  1 nouveau
drm_kms_helper        212992  1 nouveau
ttm                   114688  1 nouveau
drm                   495616  10 drm_kms_helper,ttm,nouveau
wmi                    36864  3 wmi_bmof,mxm_wmi,nouveau
video                  49152  1 nouveau

X11 Package:
xorg-x11-drv-nouveau.x86_64 1:1.0.15-7.fc30

About, Graphics: "NV134"

lspci -vs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd Device 3794
	Flags: bus master, fast devsel, latency 0, IRQ 126
	Memory at a2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Memory at a0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

journalctl

Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: kernel rejected pushbuf: Device or resource busy
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: krec 0 pushes 1 bufs 12 relocs 0
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000000 00000003 00000004 00000004 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000001 00000008 00000002 00000002 00000002
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000002 0000000a 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000003 00000006 00000004 00000000 00000004
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000004 00000007 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000005 00000021 00000002 00000002 00000002
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000006 00000013 00000004 00000004 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000007 0000003b 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000008 000000c1 00000002 00000000 00000002
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 00000009 0000007e 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 0000000a 00000049 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: buf 0000000b 00000057 00000002 00000002 00000000
Jul 23 18:56:50 noir org.gnome.Shell.desktop[2455]: nouveau: ch6: psh 00000000 000003ff10 0000041c54
....
Jul 23 18:56:50 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
...
Jul 23 18:56:50 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
Jul 23 18:56:51 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
Jul 23 18:56:51 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
Jul 23 18:56:52 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
Jul 23 18:56:52 noir kernel: nouveau 0000:01:00.0: Xwayland[2517]: nv50cal_space: -16
...
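
The `-16` in the `nv50cal_space` lines and the "Device or resource busy" text in the userspace message look like the same condition: on Linux, errno 16 is EBUSY. A small sketch that tallies such return codes by name, assuming the printed number is a negated errno (the usual kernel convention); the regex and function name are illustrative:

```python
import errno
import re

# Captures the integer printed after "nv50cal_space:" in the journal lines.
CAL_RE = re.compile(r"nv50cal_space: (-?\d+)")


def decode_return_codes(lines):
    """Count nv50cal_space return codes, keyed by symbolic errno name."""
    counts = {}
    for line in lines:
        match = CAL_RE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        # Assumption: negative values are negated errno numbers.
        if code < 0:
            name = errno.errorcode.get(-code, "UNKNOWN")
        else:
            name = "OK"
        counts[name] = counts.get(name, 0) + 1
    return counts
```

The same lookup applies to the "Invalid argument" variant reported earlier in this bug, which corresponds to EINVAL (22).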
Comment 28 Martin Peres 2019-12-04 09:24:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/325.

