Bug 81690 - nouveau GPU locks up under memory pressure
Summary: nouveau GPU locks up under memory pressure
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL: http://download.wakfu.asia/full/unix/
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-23 21:18 UTC by Michal Suchanek
Modified: 2016-04-12 16:58 UTC
CC List: 5 users

See Also:
i915 platform:
i915 features:


Attachments
freeze log (230.79 KB, text/plain)
2015-12-08 10:47 UTC, Daniel

Description Michal Suchanek 2014-07-23 21:18:36 UTC
Under memory pressure the GPU tends to hang.

This is probably related to system memory pressure (not VRAM), although I have no idea about VRAM utilisation.

The crash usually happens when I start an application that uses the GPU and the system starts to swap, the OOM killer kills something, or applications crash because they handle the OOM condition badly.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GeForce GT 620 [10de:0f01] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device [1043:83ff]
	Flags: bus master, fast devsel, latency 0, IRQ 52
	Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Memory at f0000000 (64-bit, prefetchable) [size=128M]
	Memory at f8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at dc80 [size=128]
	Expansion ROM at fde00000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau

Linux 3.15-trunk-amd64 #1 SMP Debian 3.15.5-1~exp1 (2014-07-10) x86_64 GNU/Linux
ii  libgl1-mesa-dri:am 10.2.3-1       amd64

[ 2574.171692] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007edbc000 [unknown]
[ 2664.669780] nouveau E[     DRM] GPU lockup - switching to software fbcon
[ 2679.688012] nouveau E[Xorg[1971]] failed to idle channel 0xcccc0001 [Xorg[1971]]
[  151.697805] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007ed88000 [unknown]
[  168.639601] nouveau E[     DRM] GPU lockup - switching to software fbcon
[  183.760010] nouveau E[Xorg[2027]] failed to idle channel 0xcccc0001 [Xorg[2027]]
[  134.917421] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0000011000 [INVALID_STORAGE_TYPE] from PFIFO/PFIFO on channel 0x007ed88000 [unknown]
[  165.296145] nouveau E[     DRM] GPU lockup - switching to software fbcon
[    7.563122] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x0c1080a1
[    7.569331] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF108 (NVC1)
[    7.575693] nouveau  [  DEVICE][0000:01:00.0] Family : NVC0
[    7.586561] usbcore: registered new interface driver snd-usb-audio
[    7.644889] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[    7.790243] nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
[    7.790245] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[    7.790338] nouveau  [   VBIOS][0000:01:00.0] BIT signature found
[    7.790340] nouveau  [   VBIOS][0000:01:00.0] version 70.08.ae.00.02
[    7.790366] Bluetooth: HCI socket layer initialized
[    7.790367] Bluetooth: L2CAP socket layer initialized
[    7.790376] Bluetooth: SCO socket layer initialized
[    7.797368] nouveau 0000:01:00.0: irq 52 for MSI/MSI-X
[    7.797377] nouveau  [     PMC][0000:01:00.0] MSI interrupts enabled
[    7.797415] nouveau W[     PFB][0000:01:00.0][0x00000000][ffff88022bbb7800] reclocking of this ram type unsupported
[    7.797416] nouveau  [     PFB][0000:01:00.0] RAM type: DDR3
[    7.797417] nouveau  [     PFB][0000:01:00.0] RAM size: 2048 MiB
[    7.797418] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
[    7.801509] nouveau  [    VOLT][0000:01:00.0] GPU voltage: 900000uv
[    9.300033] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
[    9.306998] nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
[    9.313701] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
[    9.320011] nouveau  [     CLK][0000:01:00.0] 03: core 50 MHz memory 324 MHz 
[    9.320113] EXT4-fs (sdd1): mounting ext3 file system using the ext4 subsystem
[    9.334538] nouveau  [     CLK][0000:01:00.0] 07: core 405 MHz memory 324 MHz 
[    9.341863] nouveau  [     CLK][0000:01:00.0] 0f: core 700 MHz memory 700 MHz 
[    9.349339] nouveau  [     CLK][0000:01:00.0] --: core 405 MHz memory 324 MHz 
[    9.359052] [TTM] Zone  kernel: Available graphics memory: 4032366 kiB
[    9.365668] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    9.372279] [TTM] Initializing pool allocator
[    9.376741] [TTM] Initializing DMA pool allocator
[    9.381547] nouveau  [     DRM] VRAM: 2048 MiB
[    9.386087] nouveau  [     DRM] GART: 1048576 MiB
[    9.390892] nouveau  [     DRM] TMDS table version 2.0
[    9.396126] nouveau  [     DRM] DCB version 4.0
[    9.400746] nouveau  [     DRM] DCB outp 00: 01000302 00020030
[    9.406675] nouveau  [     DRM] DCB outp 01: 02000300 00000000
[    9.412608] nouveau  [     DRM] DCB outp 02: 08011392 00020020
[    9.418523] nouveau  [     DRM] DCB outp 03: 04022310 00000000
[    9.421527] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[    9.433144] nouveau  [     DRM] DCB conn 00: 00001030
[    9.439774] nouveau  [     DRM] DCB conn 01: 00002161
[    9.446363] nouveau  [     DRM] DCB conn 02: 00000200
[    9.452457] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.459164] [drm] Driver supports precise vblank timestamp query.
[    9.470482] nouveau  [     DRM] MM: using COPY0 for buffer copies
[    9.500028] usb 4-2: new full-speed USB device number 4 using uhci_hcd
[    9.584981] nouveau  [     DRM] allocated 1600x1600 fb: 0x60000, bo ffff88022e07c800
[    9.592947] fbcon: nouveaufb (fb0) is primary device
[    9.642993] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[    9.684077] Console: switching to colour frame buffer device 150x75
[    9.705241] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    9.705246] nouveau 0000:01:00.0: registered panic notifier
[    9.705257] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
[177491.295050] nouveau E[Wakfu[2020]] fail ttm_validate
[177491.300109] nouveau E[Wakfu[2020]] validate gart_list
[177491.305449] nouveau E[Wakfu[2020]] validate: -12
[177717.658727] usb 8-4: USB disconnect, device number 14
[177803.434648] nouveau E[   PFIFO][0000:01:00.0] write fault at 0x0000218000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x007f89c000 [Wakfu[2020]]
[177803.438624] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 5, recovering...
[177983.108013] nouveau E[Xorg[1899]] failed to idle channel 0xcccc0000 [Xorg[1899]]
[177998.112017] nouveau E[Xorg[1899]] failed to idle channel 0xcccc0000 [Xorg[1899]]
[177998.119751] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000001b000 [PAGE_NOT_PRESENT] from PFIFO/BAR_READ on channel 0x007fb5a000 [unknown]
[178002.857403] nouveau E[     DRM] GPU lockup - switching to software fbcon
[178015.816010] nouveau E[Wakfu[2016]] failed to idle channel 0xcccc0000 [Wakfu[2016]]

An easy way to trigger the issue is to

download the game client from the URL field above,
unpack it,
run the launcher script (wakfu/wakfu),
wait for updates to finish,
and press the PLAY button repeatedly until memory runs out.

The client takes about 1.2GB
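
The "until memory runs out" part of the steps above might also be approximated without the game client. The following is only a minimal sketch, not the reporter's method: `hold_memory` and its parameters are hypothetical names, the sizes are placeholders, and whether this reproduces the lockup is untested. It assumes a GPU-using application is running at the same time.

```python
def hold_memory(total_mib, chunk_mib=64, page_size=4096):
    """Allocate roughly total_mib MiB in chunks and touch every page.

    Writing one byte per page forces the kernel to back the allocation
    with real memory instead of leaving it as untouched address space.
    The returned list must be kept alive for the memory to stay in use.
    """
    chunks = []
    allocated = 0
    while allocated < total_mib:
        size_mib = min(chunk_mib, total_mib - allocated)
        block = bytearray(size_mib * 1024 * 1024)
        for offset in range(0, len(block), page_size):
            block[offset] = 1
        chunks.append(block)
        allocated += size_mib
    return chunks


# Small demonstration only; a real run would use a figure close to the
# amount of free RAM (an assumption, pick it for your own machine).
held = hold_memory(8, chunk_mib=4)
print(len(held), "chunks,", sum(len(c) for c in held) // (1024 * 1024), "MiB held")
```

Raising `total_mib` step by step while watching `dmesg` is probably safer than requesting everything at once, since the OOM killer may terminate the script itself.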
Comment 1 sven 2014-07-29 10:28:08 UTC
I also have memory-related problems when loading big textures.

When I try to use a ~717MB OpenGL 2D texture array, the following is printed to the application's stderr (or stdout, I haven't checked):

> nouveau: kernel rejected pushbuf: Device or resource busy
> nouveau: ch0: krec 0 pushes 1 bufs 10 relocs 0
> nouveau: ch0: buf 00000000 00000003 00000004 00000004 00000000
> nouveau: ch0: buf 00000001 00000010 00000002 00000000 00000002
> nouveau: ch0: buf 00000002 00000008 00000002 00000000 00000002
> nouveau: ch0: buf 00000003 00000013 00000002 00000002 00000002
> nouveau: ch0: buf 00000004 00000019 00000002 00000002 00000000
> nouveau: ch0: buf 00000005 00000011 00000002 00000000 00000002
> nouveau: ch0: buf 00000006 00000012 00000002 00000000 00000002
> nouveau: ch0: buf 00000007 00000015 00000002 00000002 00000000
> nouveau: ch0: buf 00000008 00000017 00000002 00000002 00000000
> nouveau: ch0: buf 00000009 00000014 00000002 00000000 00000002
> nouveau: ch0: psh 00000000 000004e6fc 000004eef4

And the kernel log reads:
> [   12.849796] nouveau  [  DEVICE][0000:01:00.0] Chipset: GF119 (NVD9)
> [   12.849800] nouveau  [  DEVICE][0000:01:00.0] Family : NVD0
> [   13.584920] nouveau  [     PFB][0000:01:00.0] RAM size: 1024 MiB
...
> [  328.496640] nouveau W[   PFIFO][0000:01:00.0] INTR 0x00000001: 0x00000000
> [  328.496652] nouveau E[   PFIFO][0000:01:00.0] INTR 0x08800000
> [  328.496689] nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x002100 [ !ENGINE ]
> [  340.851380] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x003a330000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x003fb4e000 [DummyName[1554]]
> [  340.851386] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 3, recovering...
> [  344.331776] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.438226] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.544699] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.650491] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.756217] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.861902] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  344.969239] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  345.076069] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  345.183105] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  345.290630] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  345.398302] nouveau E[DummyName[1554]] nv50cal_space: -16
> [  360.409590] nouveau E[DummyName[1554]] failed to idle channel 0xcccc0000 [DummyName[1554]]
> [  360.512756] nouveau E[DummyName[1554]] failed to idle channel 0xcccc0000 [DummyName[1554]]
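
The negative numbers in lines such as `nv50cal_space: -16` and `validate: -12` appear to be negated Linux errno values, which would match the userspace message "Device or resource busy" above. A small sketch for decoding them follows; `describe_kernel_status` is a hypothetical helper name, and the message text returned by `os.strerror` depends on the platform.

```python
import errno
import os


def describe_kernel_status(code):
    """Map a negative kernel return value such as -12 to (name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# Codes seen in this report: validate: -12, nv50cal_space: -16, validate: -22
for code in (-12, -16, -22):
    name, message = describe_kernel_status(code)
    print(code, name, message)
```

On Linux these three codes correspond to ENOMEM, EBUSY and EINVAL.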
Comment 2 Pierre Moreau 2014-12-09 17:44:12 UTC
Correcting product and component to link it to Nouveau.
Comment 3 xpue 2015-08-26 15:17:12 UTC
This happens for me too, when any program uses 3D and any other (or the same) program uses a significant amount of memory and causes swapping.

You can find many similar bug reports here by searching for PAGE_NOT_PRESENT.

Only two messages appear in dmesg each time it happens:

nouveau E[   PFIFO][0000:02:00.0] read fault at 0x00065a0000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/TEX on channel 0x003fb49000 [chrome[21010]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...

nouveau E[   PFIFO][0000:02:00.0] read fault at 0x001146e000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/PROP on channel 0x003fb49000 [chrome[20217]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...

nouveau E[   PFIFO][0000:02:00.0] write fault at 0x0000218000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x003fb49000 [Xorg[79]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...

nouveau E[   PFIFO][0000:02:00.0] read fault at 0x001583a000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/UNK07 on channel 0x003fb49000 [Xorg[79]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...

nouveau E[   PFIFO][0000:02:00.0] read fault at 0x0008c14000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/PROP on channel 0x003fb49000 [Xorg[81]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...
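
To compare reports like these, the fault lines can be pulled apart mechanically. This is a rough sketch that assumes the message layout stays exactly as in the lines quoted in this bug; `FAULT_RE`, `parse_faults` and `summarize` are hypothetical names, and other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Pattern derived only from the PFIFO fault lines quoted in this report.
FAULT_RE = re.compile(
    r"nouveau E\[\s*PFIFO\]\[(?P<pci>[0-9a-fA-F:.]+)\] "
    r"(?P<access>read|write) fault at 0x(?P<addr>[0-9a-fA-F]+) "
    r"\[(?P<reason>\w+)\] from (?P<unit>\S+) "
    r"on channel 0x(?P<channel>[0-9a-fA-F]+) \[(?P<owner>.*)\]"
)


def parse_faults(text):
    """Return one dict per PFIFO read/write fault line found in text."""
    faults = []
    for line in text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            fault = match.groupdict()
            fault["addr"] = int(fault["addr"], 16)
            faults.append(fault)
    return faults


def summarize(faults):
    """Count faults per (reason, unit) pair."""
    return Counter((f["reason"], f["unit"]) for f in faults)


SAMPLE = """\
nouveau E[   PFIFO][0000:02:00.0] read fault at 0x00065a0000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/TEX on channel 0x003fb49000 [chrome[21010]]
nouveau E[   PFIFO][0000:02:00.0] PGRAPH engine fault on channel 5, recovering...
nouveau E[   PFIFO][0000:02:00.0] write fault at 0x0000218000 [PAGE_NOT_PRESENT] from PGRAPH/DISPATCH on channel 0x003fb49000 [Xorg[79]]
"""

for key, count in summarize(parse_faults(SAMPLE)).items():
    print(key, count)
```

Feeding the function the output of `dmesg` instead of `SAMPLE` would give a quick overview of which units fault most often on a given machine.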
Comment 4 Michal Suchanek 2015-09-11 09:03:43 UTC
Maybe this is somewhat relevant: the PC has a supposedly working IOMMU, which might be the reason for the page faults. I am not sure how the memory mapping for the GPU works.
Comment 5 Timothy Pearson 2015-09-20 23:54:25 UTC
I'm seeing something similar to this on a dual-socket Opteron system with kernel 4.2.  It is sufficient for all memory to be exhausted on one NUMA node; when Linux attempts to recover by migrating processes to the other node this bug is triggered.
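
To check whether a single NUMA node is close to exhaustion when the fault appears, the per-node counters in sysfs can be read. This is a sketch under the assumption that the kernel exposes `/sys/devices/system/node/node*/meminfo` with lines of the form `Node 0 MemFree: <n> kB`; `parse_node_meminfo` and `read_numa_free` are hypothetical helper names.

```python
import glob
import re

# Matches simple fields such as "Node 0 MemFree:  512 kB".
# Fields with parentheses in their name are deliberately skipped.
NODE_LINE_RE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB")


def parse_node_meminfo(text):
    """Return {node_id: {field: KiB}} for the lines found in text."""
    nodes = {}
    for match in NODE_LINE_RE.finditer(text):
        node, field, kib = match.groups()
        nodes.setdefault(int(node), {})[field] = int(kib)
    return nodes


def read_numa_free():
    """Return {node_id: MemFree in KiB} for every node exposed in sysfs."""
    free = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as handle:
            for node, fields in parse_node_meminfo(handle.read()).items():
                if "MemFree" in fields:
                    free[node] = fields["MemFree"]
    return free


# Prints an empty dict on systems without the sysfs NUMA files.
print(read_numa_free())
```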
Comment 6 Ian Kumlien 2015-09-23 21:39:11 UTC
I don't know if this is really related but... 

I'm seeing this:
[40181.763458] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000a575000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/PROP on channel 0x007f921000 [chrome[6026]]
[40181.763462] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 10, recovering...
[40182.417849] nouveau E[chrome[6026]] nv50cal_space: -16
[40182.701542] nouveau E[chrome[6026]] nv50cal_space: -16
[40182.985634] nouveau E[chrome[6026]] nv50cal_space: -16
[40183.270485] nouveau E[chrome[6026]] nv50cal_space: -16
[40183.554109] nouveau E[chrome[6026]] nv50cal_space: -16
[40183.841470] nouveau E[chrome[6026]] nv50cal_space: -16
[40184.128689] nouveau E[chrome[6026]] nv50cal_space: -16
[40184.415447] nouveau E[chrome[6026]] nv50cal_space: -16
[40184.701808] nouveau E[chrome[6026]] nv50cal_space: -16
[40184.986853] nouveau E[chrome[6026]] nv50cal_space: -16
[40185.272766] nouveau E[chrome[6026]] nv50cal_space: -16
[40185.558682] nouveau E[chrome[6026]] nv50cal_space: -16
[40185.843290] nouveau E[chrome[6026]] nv50cal_space: -16
[40186.128950] nouveau E[chrome[6026]] nv50cal_space: -16
[40186.415131] nouveau E[chrome[6026]] nv50cal_space: -16
[40186.700506] nouveau E[chrome[6026]] nv50cal_space: -16
[40186.986265] nouveau E[chrome[6026]] nv50cal_space: -16
[40187.271499] nouveau E[chrome[6026]] nv50cal_space: -16
[40187.556816] nouveau E[chrome[6026]] nv50cal_space: -16
[40187.842722] nouveau E[chrome[6026]] nv50cal_space: -16
[40188.133485] nouveau E[chrome[6026]] nv50cal_space: -16
[40188.418877] nouveau E[chrome[6026]] nv50cal_space: -16
[40188.705268] nouveau E[chrome[6026]] nv50cal_space: -16
[40188.990129] nouveau E[chrome[6026]] nv50cal_space: -16
[40189.275092] nouveau E[chrome[6026]] nv50cal_space: -16
[40189.560675] nouveau E[chrome[6026]] nv50cal_space: -16
[40189.846162] nouveau E[chrome[6026]] nv50cal_space: -16
[40190.131173] nouveau E[chrome[6026]] nv50cal_space: -16
[40190.414957] nouveau E[chrome[6026]] nv50cal_space: -16
[40190.698904] nouveau E[chrome[6026]] nv50cal_space: -16
[40190.982435] nouveau E[chrome[6026]] nv50cal_space: -16
[40191.266176] nouveau E[chrome[6026]] nv50cal_space: -16
[40191.549908] nouveau E[chrome[6026]] nv50cal_space: -16
[40191.833565] nouveau E[chrome[6026]] nv50cal_space: -16
[40192.117459] nouveau E[chrome[6026]] nv50cal_space: -16
[40192.401711] nouveau E[chrome[6026]] nv50cal_space: -16
[40192.685162] nouveau E[chrome[6026]] nv50cal_space: -16
[40192.969316] nouveau E[chrome[6026]] nv50cal_space: -16
[40193.256315] nouveau E[chrome[6026]] nv50cal_space: -16
[40193.541295] nouveau E[chrome[6026]] nv50cal_space: -16
[40193.828543] nouveau E[chrome[6026]] nv50cal_space: -16
[40194.113320] nouveau E[chrome[6026]] nv50cal_space: -16
[40194.398205] nouveau E[chrome[6026]] nv50cal_space: -16
[40194.685591] nouveau E[chrome[6026]] nv50cal_space: -16
[40194.969929] nouveau E[chrome[6026]] nv50cal_space: -16
[40195.254368] nouveau E[chrome[6026]] nv50cal_space: -16
[40195.537985] nouveau E[chrome[6026]] nv50cal_space: -16
[40195.821983] nouveau E[chrome[6026]] nv50cal_space: -16
[40196.105431] nouveau E[chrome[6026]] nv50cal_space: -16
[40196.389035] nouveau E[chrome[6026]] nv50cal_space: -16
[40196.672610] nouveau E[chrome[6026]] nv50cal_space: -16
[40196.957269] nouveau E[chrome[6026]] nv50cal_space: -16
[40197.263701] nouveau E[chrome[6026]] nv50cal_space: -16
[40197.548514] nouveau E[chrome[6026]] nv50cal_space: -16
[40197.831412] nouveau E[chrome[6026]] nv50cal_space: -16
[40198.114322] nouveau E[chrome[6026]] nv50cal_space: -16
[40198.397061] nouveau E[chrome[6026]] nv50cal_space: -16
[40198.680167] nouveau E[chrome[6026]] nv50cal_space: -16
[40198.964247] nouveau E[chrome[6026]] nv50cal_space: -16
[40199.248475] nouveau E[chrome[6026]] nv50cal_space: -16
[40199.532005] nouveau E[chrome[6026]] nv50cal_space: -16
[40199.818370] nouveau E[chrome[6026]] nv50cal_space: -16
[40214.814500] nouveau E[chrome[6026]] failed to idle channel 0xcccc0000 [chrome[6026]]
[40215.097878] nouveau E[chrome[6026]] failed to idle channel 0xcccc0000 [chrome[6026]]
[178161.256855] nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0009399000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/PROP on channel 0x007f923000 [chrome[29152]]
[178161.256859] nouveau E[   PFIFO][0000:01:00.0] PGRAPH engine fault on channel 10, recovering...

I get big blocky dark squares rendered specifically in the corners and, in this case, a flickering YouTube video.

After killing chrome and starting it again:
[178192.982581] nouveau E[chrome[29152]] failed to idle channel 0xcccc0000 [chrome[29152]]
[178207.974336] nouveau E[chrome[29152]] failed to idle channel 0xcccc0000 [chrome[29152]]
---

I have been seeing this a lot and it usually leads to a complete X deadlock...

This is on a single socket AMD machine: AMD FX(tm)-8350 Eight-Core Processor
With 32GB of ECC memory (no ecc errors detected)

This would mean that it has been reproduced on a non-NUMA system with only a video playing.
Comment 7 Woden Cafe 2015-09-25 14:37:46 UTC
Hi guys,

I have been getting an error that appears to be the same one.

It causes the system to hang completely, with the video output frozen.

The system log says it's related to Chromium. It happened to me twice within half an hour. I wasn't doing anything too intensive; I mostly had Chromium open alongside Eclipse and Thunderbird.

First, some info about my system:

~$ lsb_release -a
LSB Version:	core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 15.04
Release:	15.04
Codename:	vivid

~$ uname -a
Linux kubuntu-cboyd 3.19.3-031903-generic #201503261036 SMP Thu Mar 26 14:37:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

~$ lspci -v
06:00.0 VGA compatible controller: NVIDIA Corporation GF106 [GeForce GT 440] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 2312
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c8000000 (64-bit, prefetchable) [size=128M]
	Memory at c4000000 (64-bit, prefetchable) [size=32M]
	I/O ports at dc00 [size=128]
	Expansion ROM at fbd00000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau

~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz
Stepping:              5
CPU MHz:               1733.000
CPU max MHz:           3068.0000
CPU min MHz:           1600.0000
BogoMIPS:              6118.10
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

~$ cat /proc/meminfo | grep MemTotal
MemTotal:       20552364 kB

Here is the syslog info; note that the issue happens twice:

Sep 25 08:51:56 kubuntu-cboyd kernel: [62867.937071] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:06 kubuntu-cboyd kernel: [62877.996104] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:16 kubuntu-cboyd kernel: [62888.047116] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:26 kubuntu-cboyd kernel: [62898.102679] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:36 kubuntu-cboyd kernel: [62908.157635] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:47 kubuntu-cboyd kernel: [62918.213027] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:52:57 kubuntu-cboyd kernel: [62928.268037] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:53:07 kubuntu-cboyd kernel: [62938.323730] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:53:14 kubuntu-cboyd kernel: [62945.920019] nouveau E[chromium-browse[4468]] multiple instances of buffer 113 on validation list
Sep 25 08:53:14 kubuntu-cboyd kernel: [62945.920024] nouveau E[chromium-browse[4468]] validate_init
Sep 25 08:53:14 kubuntu-cboyd kernel: [62945.920026] nouveau E[chromium-browse[4468]] validate: -22
Sep 25 08:53:14 kubuntu-cboyd kernel: [62945.928974] nouveau E[   PFIFO][0000:06:00.0] write fault at 0x00078c0000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/PROP on channel 0x00bf9b3000 [chromium-browse[4468]]
Sep 25 08:53:14 kubuntu-cboyd kernel: [62945.928977] nouveau E[   PFIFO][0000:06:00.0] PGRAPH engine fault on channel 5, recovering...
Sep 25 08:53:17 kubuntu-cboyd kernel: [62948.378624] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Sep 25 08:53:27 kubuntu-cboyd kernel: [62958.434144] nouveau E[     DRM] DDC responded, but no EDID for VGA-1
Comment 8 Michal Suchanek 2015-09-30 13:47:33 UTC
Hello, 

do you have a test program with few dependencies that triggers this and that you can upload?

Comment #1 says something about uploading a large texture array.
Comment 9 Daniel 2015-12-08 10:47:07 UTC
Created attachment 120413
freeze log

I'm getting similar issues after a while. See my log file.

lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti] (rev a1)

Is it related?
Comment 10 Philippe "RzR" Coval 2015-12-22 12:15:10 UTC
I think I am also affected by this nouveau bug, on:

01:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 645 OEM] (rev a1)


Now that I have installed koops,
I can provide more traces or data if anyone wants to tell me what to do.


[ 1480.528031] ------------[ cut here ]------------
[ 1480.528033] WARNING: CPU: 0 PID: 4 at /home/kernel/COD/linux/include/drm/drm_crtc.h:1565 drm_helper_choose_crtc_dpms+0x93/0xa0 [drm_kms_helper]()
[ 1480.528034] Modules linked in: bnep rfcomm bluetooth uvcvideo joydev videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core v4l2_common intel_rapl videodev x86_pkg_temp_thermal media intel_powerclamp input_leds coretemp kvm_intel kvm irqbypass snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep mei_me dcdbas snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore mei shpchp lpc_ich 8250_fintek mac_hid serio_raw binfmt_misc parport_pc ppdev lp parport drbg ansi_cprng dm_crypt hid_generic usbhid hid uas usb_storage nouveau mxm_wmi wmi i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops aesni_intel aes_x86_64 lrw gf128mul glue_helper e1000e drm ablk_helper cryptd psmouse ahci ptp libahci pps_core video fjes
[ 1480.528054] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W       4.4.0-999-generic #201512202100
[ 1480.528055] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A02 08/15/2013
[ 1480.528060] Workqueue: events nvif_notify_work [nouveau]
[ 1480.528061]  0000000000000000 0000000097f6c821 ffff88040b663d10 ffffffff813c9124
[ 1480.528062]  0000000000000000 ffff88040b663d48 ffffffff8107db92 ffff880035ec0000
[ 1480.528063]  ffff880035ecb000 ffff880035ecb000 0000000000000003 0000000000000000
[ 1480.528064] Call Trace:
[ 1480.528066]  [<ffffffff813c9124>] dump_stack+0x44/0x60
[ 1480.528067]  [<ffffffff8107db92>] warn_slowpath_common+0x82/0xc0
[ 1480.528068]  [<ffffffff8107dcda>] warn_slowpath_null+0x1a/0x20
[ 1480.528070]  [<ffffffffc0263353>] drm_helper_choose_crtc_dpms+0x93/0xa0 [drm_kms_helper]
[ 1480.528072]  [<ffffffffc02633d7>] drm_helper_connector_dpms+0x77/0x100 [drm_kms_helper]
[ 1480.528088]  [<ffffffffc038eb70>] ? nv50_display_crtc_get+0x20/0x20 [nouveau]
[ 1480.528102]  [<ffffffffc038b53b>] nouveau_connector_hotplug+0x3b/0xb0 [nouveau]
[ 1480.528108]  [<ffffffffc02eaa77>] nvif_notify_work+0x27/0xa0 [nouveau]
[ 1480.528109]  [<ffffffff81094f6d>] ? pwq_dec_nr_in_flight+0x4d/0xa0
[ 1480.528111]  [<ffffffff8109687a>] process_one_work+0x1aa/0x440
[ 1480.528112]  [<ffffffff81096b5b>] worker_thread+0x4b/0x4c0
[ 1480.528113]  [<ffffffff81096b10>] ? process_one_work+0x440/0x440
[ 1480.528115]  [<ffffffff8109ccd8>] kthread+0xd8/0xf0
[ 1480.528116]  [<ffffffff8109cc00>] ? kthread_create_on_node+0x1a0/0x1a0
[ 1480.528118]  [<ffffffff817fd38f>] ret_from_fork+0x3f/0x70
[ 1480.528119]  [<ffffffff8109cc00>] ? kthread_create_on_node+0x1a0/0x1a0
[ 1480.528120] ---[ end trace 83b6c8059d499e87 ]---

