Bug 89912

Summary: Sometimes Nouveau hangs kernel with PGRAPH engine fault
Product: xorg
Reporter: Dāvis <davispuh>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: argymeg, rockowitz
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: Loaded kernel modules (flags: none)

Description Dāvis 2015-04-05 20:16:17 UTC
Created attachment 114883: Loaded kernel modules

With an Nvidia GTX 650 Ti on Arch Linux, using kernel 3.19.2 and xf86-video-nouveau 1.0.11-3, random hangs sometimes occur that lock up the whole kernel. The screen stays frozen; the mouse cursor can still be moved, but nothing else responds, not even Num Lock.

The last messages in the kernel log:

kernel: nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 8 [0x003f61d000 kwin_x11[697]]
kernel: nouveau E[  PGRAPH][0000:01:00.0] GPC2/TPC0/MP trap: MULTIPLE_WARP_ERRORS
kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x867e84f000 [PDE] from GR/GPC2/RAST on channel 0x003f61d000 [kwin_x11[697]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 8, recovering...
kernel: nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 8 [0x003f61d000 kwin_x11[697]]
kernel: nouveau E[  PGRAPH][0000:01:00.0] GPC2/TPC0/MP trap:
kernel: nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 8 [0x003f61d000 kwin_x11[697]]
kernel: nouveau E[  PGRAPH][0000:01:00.0] ROP0 0xbadf1200 0xbadf1200
kernel: nouveau E[  PGRAPH][0000:01:00.0] ROP1 0xbadf1200 0xbadf1200
kernel: nouveau E[  PGRAPH][0000:01:00.0] TRAP UNHANDLED 0xb8df1200
kernel: nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x400108 [ IBUS ]


From other occurrences:

---
kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002af000 [PTE] from GR/GPC1/PROP_0 on channel 0x003fa28000 [Xorg[476]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...
---
kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002aa000 [PTE] from GR/GPC1/PROP_0 on channel 0x003fa28000 [Xorg[459]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...
---
kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002a0000 [PTE] from GR/GPC0/PROP_0 on channel 0x003fa28000 [Xorg[496]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...
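
The PFIFO fault lines quoted above share a regular shape, so they can be pulled apart mechanically when comparing hangs. A minimal Python sketch follows. The regular expression is inferred only from the lines in this report (it is not taken from the nouveau source, and other kernel versions may format these messages differently), and the names FAULT_RE and parse_fault are hypothetical.

```python
import re

# Pattern inferred from the log lines quoted in this report only.
# This is an assumption, not the driver's documented message format.
FAULT_RE = re.compile(
    r"nouveau E\[\s*PFIFO\]\[(?P<pci>[0-9a-f:.]+)\] "
    r"(?P<access>read|write) fault at (?P<addr>0x[0-9a-f]+) "
    r"\[(?P<kind>\w+)\] from (?P<unit>\S+) "
    r"on channel (?P<chan>0x[0-9a-f]+) "
    r"\[(?P<proc>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)


def parse_fault(line):
    """Return a dict of fields from one PFIFO fault line, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["addr"] = int(fields["addr"], 16)  # faulting address as an integer
    fields["pid"] = int(fields["pid"])        # PID of the process owning the channel
    return fields


if __name__ == "__main__":
    sample = (
        "kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at "
        "0x00002af000 [PTE] from GR/GPC1/PROP_0 on channel "
        "0x003fa28000 [Xorg[476]]"
    )
    print(parse_fault(sample))
```

Counting the `unit` or `proc` field across several hangs (for example with collections.Counter) would show whether the faults cluster on one GPC or one process.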

---

kernel: fb: switching to nouveaufb from EFI VGA
kernel: nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0e6060a1
kernel: nouveau  [  DEVICE][0000:01:00.0] Chipset: GK106 (NVE6)
kernel: nouveau  [  DEVICE][0000:01:00.0] Family : NVE0
kernel: nouveau  [   VBIOS][0000:01:00.0] using image from PROM
kernel: nouveau  [   VBIOS][0000:01:00.0] BIT signature found
kernel: nouveau  [   VBIOS][0000:01:00.0] version 80.06.21.00.37
kernel: nouveau  [     PMC][0000:01:00.0] MSI interrupts enabled
kernel: nouveau  [     PFB][0000:01:00.0] RAM type: GDDR5
kernel: nouveau  [     PFB][0000:01:00.0] RAM size: 1024 MiB
kernel: nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
kernel: nouveau  [    VOLT][0000:01:00.0] GPU voltage: 887500uv
kernel: nouveau  [  PTHERM][0000:01:00.0] FAN control: PWM
kernel: nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
kernel: nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
kernel: nouveau  [     CLK][0000:01:00.0] 07: core 324 MHz memory 648 MHz 
kernel: nouveau  [     CLK][0000:01:00.0] 0a: core 549 MHz memory 1620 MHz 
kernel: nouveau  [     CLK][0000:01:00.0] 0f: core 928-954 MHz memory 5400 MHz 
kernel: nouveau  [     CLK][0000:01:00.0] --: core 324 MHz memory 648 MHz 
kernel: nouveau  [     DRM] VRAM: 1024 MiB
kernel: nouveau  [     DRM] GART: 1048576 MiB
kernel: nouveau  [     DRM] TMDS table version 2.0
kernel: nouveau  [     DRM] DCB version 4.0
kernel: nouveau  [     DRM] DCB outp 00: 01000f02 00020030
kernel: nouveau  [     DRM] DCB outp 01: 02000f00 00020030
kernel: nouveau  [     DRM] DCB outp 02: 08011f82 0f420030
kernel: nouveau  [     DRM] DCB outp 03: 02022f62 0f420010
kernel: nouveau  [     DRM] DCB conn 00: 00001030
kernel: nouveau  [     DRM] DCB conn 01: 00002131
kernel: nouveau  [     DRM] DCB conn 02: 00010263
kernel: nouveau  [     DRM] MM: using COPY for buffer copies
kernel: nouveau  [     DRM] allocated 2560x1536 fb: 0x60000, bo ffff88022316a400
kernel: fbcon: nouveaufb (fb0) is primary device
kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
kernel: nouveau 0000:01:00.0: registered panic notifier
kernel: [drm] Initialized nouveau 1.2.1 20120801 for 0000:01:00.0 on minor 0
kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002a2000 [PTE] from GR/GPC1/PROP_0 on channel 0x003fa28000 [Xorg[512]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...
Comment 1 Dāvis 2015-05-17 03:52:31 UTC
Still happening with kernel 4.0.2

kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002aa000 [PTE] from GR/GPC1/PROP_0 on channel 0x003fa28000 [Xorg[485]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 2, recovering...


Any idea how to find the cause of the hang?
Comment 2 Argyris Megalios 2015-06-12 11:10:36 UTC
I'm having the same issue here. Same card (GTX 650 Ti) on Arch, presently on kernel version 4.0.5. The last journal entries after each freeze are consistently:

kernel: nouveau E[   PFIFO][0000:01:00.0] write fault at 0x00002a0000 [PTE] from GR/GPC0/PROP_0 on channel 0x007f96c000 [Xorg[383]]
kernel: nouveau E[   PFIFO][0000:01:00.0] PGR engine fault on channel 2, recovering...

The only difference is the channel number, which changes every time.
Everything except the mouse cursor freezes; sound continues for a few seconds before stopping. However, I can log in from another machine via SSH and restart SDDM just fine, although it takes about half a minute (instead of the usual 2-3 seconds).
I'm using KDE. Although the freezes happen randomly, there seems to be a correlation with Kaffeine playing TV in the background, though not a consistent one.
Comment 3 Argyris Megalios 2015-06-12 11:13:07 UTC
Bug 90453 could be relevant, although the error messages are different.
Comment 4 Thiago 2015-11-05 14:48:34 UTC
Same problem here.
I'm using kernel 4.0.2-1-amd64 on Debian.


01:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 650 Ti] (rev a1)

/var/log/syslog

kernel: [47450.816837] nouveau E[chromium[1738]] fail set_domain
kernel: [47450.816843] nouveau E[chromium[1738]] validating bo list
kernel: [47450.816846] nouveau E[chromium[1738]] validate: -22
kernel: [47450.817278] nouveau E[chromium[1738]] fail set_domain
kernel: [47450.817281] nouveau E[chromium[1738]] validating bo list
kernel: [47450.817283] nouveau E[chromium[1738]] validate: -22
kernel: [47450.817389] nouveau E[chromium[1738]] fail set_domain
kernel: [47450.817390] nouveau E[chromium[1738]] validating bo list
kernel: [47450.817393] nouveau E[chromium[1738]] validate: -22
kernel: [47450.818377] nouveau E[chromium[1738]] fail set_domain
kernel: [47450.818380] nouveau E[chromium[1738]] validating bo list
kernel: [47450.818384] nouveau E[chromium[1738]] validate: -22
kernel: [47450.825570] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000c580000 [PTE] from GR/GPC0/PROP_0 on channel 0x003f7cf000 [chromium[1738]]
kernel: [47450.825576] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 6, recovering...

The system freezes, but the mouse still works and Clementine keeps playing.
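
The "validate: -22" lines in the log above look like a negated Linux errno value, which is the usual kernel convention for error returns; 22 is EINVAL (invalid argument). A small Python sketch for decoding such values follows. It assumes it is run on Linux, where the userspace errno numbering matches the kernel's, and describe_errno is a hypothetical helper name.

```python
import errno
import os


def describe_errno(ret):
    """Map a kernel-style return value such as -22 to (symbol, message)."""
    code = abs(ret)  # kernel code reports errors as negative errno values
    symbol = errno.errorcode.get(code, "UNKNOWN")
    return symbol, os.strerror(code)


if __name__ == "__main__":
    print(describe_errno(-22))
```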
Comment 5 Thiago 2015-11-05 14:50:44 UTC
using kernel 4.2.0-1-amd64 on Debian
Comment 6 Emil Velikov 2015-11-05 23:02:13 UTC
Looking at the "fail set_domain" messages, you're likely hitting bug 92077. Please take a look at the patch and discussion there.
Comment 7 Martin Peres 2019-12-04 08:57:56 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity.

You can subscribe to and participate in the new bug on our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/179.
