Bug 107829 - nouveau crash/freeze [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-05 07:56 UTC by sassmann
Modified: 2019-01-10 16:45 UTC (History)
0 users


Attachments
dmesg.txt (102.72 KB, text/plain)
2018-09-05 07:56 UTC, sassmann

Description sassmann 2018-09-05 07:56:43 UTC
Created attachment 141458
dmesg.txt

After an arbitrary amount of time working in a terminal, the screen hard-freezes.
External monitor is connected via DisplayPort on Lenovo Dock.

dmesg then shows:
[172362.507754] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.507767] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.507775] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.507782] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.507805] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.507815] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.507823] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.507830] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517638] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.517651] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517658] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517665] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.517685] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.517695] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.517702] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.517711] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.534375] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.534387] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.534394] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.534399] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
A reboot is required at this point. This happens regularly, sometimes after a few hours, sometimes after 1-2 days.
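
For anyone comparing traces like the one above, a small script can tally the per-TPC trap lines by error type. This is only a sketch: the regular expression is written against the lines quoted in this report, and the function name `summarize_traps` and the group names are labels made up for the example, not nouveau terminology.

```python
import re
from collections import Counter

# Matches lines such as:
#   gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
# The pattern is an assumption derived from the log lines in this report.
TRAP_RE = re.compile(
    r"gr: GPC(?P<gpc>\d+)/TPC(?P<tpc>\d+)/MP trap: "
    r"global (?P<global_status>[0-9a-f]+) \[(?P<global_flags>[^\]]*)\] "
    r"warp (?P<warp>[0-9a-f]+) \[(?P<warp_error>[^\]]*)\]"
)


def summarize_traps(dmesg_text):
    """Count MP trap lines per (GPC, TPC, warp error name)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = TRAP_RE.search(line)
        if match:
            key = (int(match["gpc"]), int(match["tpc"]), match["warp_error"])
            counts[key] += 1
    return counts
```

Calling `summarize_traps(open("dmesg.txt").read())` on a saved log returns a `Counter` keyed by GPC, TPC, and error name, which makes it easier to see whether one TPC or one error type dominates.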

Hardware: Lenovo P50 running Fedora 28 4.17.19-200.fc28.x86_64
xorg-x11-drv-nouveau-1.0.15-4.fc28.x86_64
mesa-dri-drivers-18.0.5-3.fc28.x86_64

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device [17aa:2230]
        Flags: bus master, fast devsel, latency 0, IRQ 131
        Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        Memory at b0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 4000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau
        Kernel modules: nouveau
Comment 1 sassmann 2018-09-05 12:13:14 UTC
The error on kernel 4.18.5 looked a little different:
[11735.012648] nouveau 0000:01:00.0: gr: TRAP ch 5 [00ff35f000 Xorg[2601]]
[11735.012660] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000e [OOR_ADDR]
[11735.012666] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3000e [OOR_ADDR]
[11735.012672] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 000e [OOR_ADDR]
[11735.012678] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 000e [OOR_ADDR]
[11735.012701] nouveau 0000:01:00.0: gr: TRAP ch 5 [00ff35f000 Xorg[2601]]
[11735.012709] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 9000e [OOR_ADDR]
[11735.012715] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 2000e [OOR_ADDR]
[11735.012720] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 6000e [OOR_ADDR]
[11735.012726] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 4000e [OOR_ADDR]
[11735.013192] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a7000 systemd-logind[1203]]
[11735.013201] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 30009 [ILLEGAL_INSTR_ENCODING]
[11735.013208] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 10009 [ILLEGAL_INSTR_ENCODING]
[11735.013214] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 0009 [ILLEGAL_INSTR_ENCODING]
[11735.013219] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 10009 [ILLEGAL_INSTR_ENCODING]
Comment 2 sassmann 2018-12-22 11:35:33 UTC
Tested again with 4.20-rc7. System ran for 1-2 days and then froze again.
The error looked different though.
[162840.653595] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[162840.653610] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[162840.653621] nouveau 0000:01:00.0: fifo: channel 4: killed
[162840.653631] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[162840.654013] nouveau 0000:01:00.0: systemd-logind[1383]: channel 4 killed!
Comment 3 kenorb 2019-01-06 00:08:40 UTC
Related post: https://askubuntu.com/q/1046945/78223
Comment 4 Karol Herbst 2019-01-09 00:47:13 UTC
Problem is, the log isn't able to tell us which application is causing that.

Do you think you could try to SSH into the machine while the screen is frozen and check with top/htop whether anything is consuming a lot of CPU? And see if killing that application unfreezes the screen?

If the only application consuming significantly more CPU is Xorg, maybe killing applications inside the Xorg session until it unfreezes could help us track down which application is causing that freeze.
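
If top/htop is awkward to use over a slow SSH session, the same check can be approximated with a short script. This is a rough sketch, not a replacement for top: it relies on `ps -eo pid,pcpu,comm`, where `pcpu` is CPU time divided by the time the process has been running, so it is a coarser signal than top's live view. The helper names (`parse_ps`, `top_cpu`, `snapshot`) are made up for the example.

```python
import os
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def top_cpu(ps_output, n=5):
    """Return the n rows with the highest CPU percentage, highest first."""
    return sorted(parse_ps(ps_output), key=lambda row: row[1], reverse=True)[:n]


def snapshot(n=5):
    """Run ps once and return the current top-n CPU consumers."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
        env=dict(os.environ, LC_ALL="C"),  # keep the decimal point as '.'
    )
    return top_cpu(result.stdout, n)
```

Running `snapshot()` a few times while the screen is frozen and comparing the results should show whether a single client stands out.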
Comment 5 sassmann 2019-01-09 12:52:49 UTC
Unfortunately there's no process hogging the CPU. Would it help to add some debug kernel command-line options?
Comment 6 sassmann 2019-01-09 17:52:27 UTC
So I went ahead and tried 5.0-rc1, playing video with chromium to put some load on the GPU. After a few hours it froze again.

[ 6603.232849] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a9000 systemd-logind[1385]]
[ 6603.232865] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 3e0001 [STACK_ERROR]
[ 6603.246125] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a9000 systemd-logind[1385]]
[ 6603.246137] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3d0001 [STACK_ERROR]

I then logged in via SSH and found chromium running at 100% CPU. After killing chromium the log went a bit further, but the screen did not recover and stayed frozen.

[ 6758.631306] nouveau 0000:01:00.0: Xorg[6494]: failed to idle channel 8 [Xorg[6494]]
[ 6773.631329] nouveau 0000:01:00.0: Xorg[6494]: failed to idle channel 8 [Xorg[6494]]
[ 6773.632334] nouveau 0000:01:00.0: fifo: fault 00 [READ] at 00000000000c2000 engine 07 [HOST0] client 06 [HUB/HOST] reason 42 [] on channel 8 [00fec69000 Xorg[6494]]
[ 6773.632347] nouveau 0000:01:00.0: fifo: channel 8: killed
[ 6773.632352] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[ 6773.632366] nouveau 0000:01:00.0: fifo: engine 5: scheduled for recovery
[ 6773.632378] nouveau 0000:01:00.0: Xorg[6494]: channel 8 killed!
Comment 7 sassmann 2019-01-10 16:45:35 UTC
Happened again, this time while starting firefox. No tabs opened yet.
nouveau 0000:01:00.0: disp: 0x000064a8[0]: INIT_GENERIC_CONDITON: unknown 0x07
nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: channel 4: killed
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: systemd-logind[1385]: channel 4 killed!
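
Since the failure signature changes between kernels, it may help to pull out just the scheduler errors and the killed channels from each log and compare them side by side. The following is a minimal sketch; both patterns are assumptions based on the lines quoted in this report, and `fifo_events` is a hypothetical helper name.

```python
import re

# e.g. "fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]"
SCHED_RE = re.compile(r"fifo: SCHED_ERROR (?P<code>[0-9a-f]+) \[(?P<name>[^\]]*)\]")

# e.g. "nouveau 0000:01:00.0: systemd-logind[1385]: channel 4 killed!"
KILLED_RE = re.compile(
    r"nouveau \S+: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: channel (?P<chan>\d+) killed!"
)


def fifo_events(log_text):
    """Return scheduler errors and killed channels in log order."""
    events = []
    for line in log_text.splitlines():
        match = SCHED_RE.search(line)
        if match:
            events.append(("sched_error", match["name"]))
            continue
        match = KILLED_RE.search(line)
        if match:
            events.append(
                ("channel_killed", match["proc"], int(match["pid"]), int(match["chan"]))
            )
    return events
```

The output lists which process owned each killed channel, which is the closest hint these logs give about the client involved.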

