Summary: nouveau crash/freeze [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
Product: xorg
Reporter: sassmann
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: hgcoin, john, karolherbst
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=108080
The error on kernel 4.18.5 looked a little different:
[11735.012648] nouveau 0000:01:00.0: gr: TRAP ch 5 [00ff35f000 Xorg[2601]]
[11735.012660] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000e [OOR_ADDR]
[11735.012666] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3000e [OOR_ADDR]
[11735.012672] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 000e [OOR_ADDR]
[11735.012678] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 000e [OOR_ADDR]
[11735.012701] nouveau 0000:01:00.0: gr: TRAP ch 5 [00ff35f000 Xorg[2601]]
[11735.012709] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 9000e [OOR_ADDR]
[11735.012715] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 2000e [OOR_ADDR]
[11735.012720] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 6000e [OOR_ADDR]
[11735.012726] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 4000e [OOR_ADDR]
[11735.013192] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a7000 systemd-logind[1203]]
[11735.013201] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 30009 [ILLEGAL_INSTR_ENCODING]
[11735.013208] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 10009 [ILLEGAL_INSTR_ENCODING]
[11735.013214] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 0009 [ILLEGAL_INSTR_ENCODING]
[11735.013219] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 10009 [ILLEGAL_INSTR_ENCODING]

Tested again with 4.20-rc7. The system ran for 1-2 days and then froze again, though the error looked different:
[162840.653595] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[162840.653610] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[162840.653621] nouveau 0000:01:00.0: fifo: channel 4: killed
[162840.653631] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[162840.654013] nouveau 0000:01:00.0: systemd-logind[1383]: channel 4 killed!

Related post: https://askubuntu.com/q/1046945/78223

The problem is that the log can't tell us which application is causing this. Could you try to SSH into the machine while the screen is hard-frozen and check with top/htop whether anything is consuming a lot of CPU, and whether killing that application unfreezes the screen? If the only application consuming significantly more CPU is Xorg, killing applications inside the Xorg session one at a time until the screen unfreezes could help us track down which one is causing the freeze.

Unfortunately there's no process hogging the CPU. Would it help to add some debug options to the kernel command line?

So I went ahead and tried 5.0-rc1, playing video in chromium to put some load on the GPU. After a few hours it froze again:
[ 6603.232849] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a9000 systemd-logind[1385]]
[ 6603.232865] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 3e0001 [STACK_ERROR]
[ 6603.246125] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff8a9000 systemd-logind[1385]]
[ 6603.246137] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3d0001 [STACK_ERROR]

I then logged in via SSH and found chromium running at 100% CPU. After I killed chromium the log went a bit further, but the screen did not recover and stayed frozen:
[ 6758.631306] nouveau 0000:01:00.0: Xorg[6494]: failed to idle channel 8 [Xorg[6494]]
[ 6773.631329] nouveau 0000:01:00.0: Xorg[6494]: failed to idle channel 8 [Xorg[6494]]
[ 6773.632334] nouveau 0000:01:00.0: fifo: fault 00 [READ] at 00000000000c2000 engine 07 [HOST0] client 06 [HUB/HOST] reason 42 [] on channel 8 [00fec69000 Xorg[6494]]
[ 6773.632347] nouveau 0000:01:00.0: fifo: channel 8: killed
[ 6773.632352] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[ 6773.632366] nouveau 0000:01:00.0: fifo: engine 5: scheduled for recovery
[ 6773.632378] nouveau 0000:01:00.0: Xorg[6494]: channel 8 killed!

It happened again, this time while starting firefox, with no tabs opened yet:
nouveau 0000:01:00.0: disp: 0x000064a8[0]: INIT_GENERIC_CONDITON: unknown 0x07
nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: channel 4: killed
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:01:00.0: systemd-logind[1385]: channel 4 killed!

Are you still on Fedora 28? I could create a mesa package for Fedora 29 for you to try out, which should fix these issues, but it would need to be tested for a while longer to be sure. I know what the issue is; it's just quite challenging to fix. Anyway, here is the Copr repository for fc29. I just triggered a new build with an updated version: https://copr.fedorainfracloud.org/coprs/karolherbst/mesa/ The version should be 18.2.8-9001.fc29.

I'm on F29 by now. Looking at https://copr.fedorainfracloud.org/coprs/karolherbst/mesa/, the build failed. Let me know when a new build is available. Thanks!

Any news on this? I'm still having this issue.

Created attachment 144827 [details]
Syslog with nouveau events leading to hard lock
Attached is a /var/log/syslog excerpt showing the many events leading to the hard lock, followed by a trimmed reboot trace showing the nouveau configuration.
The kernel is the Ubuntu generic kernel: Linux ceo1homenx 5.2.0-8-generic #9-Ubuntu SMP Mon Jul 8 13:07:27 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Context: a development system left on overnight crashed after roughly 10 hours of idleness. The system is not a server and was not running any VMs. The only notable running applications were an email client and a web browser; otherwise it was a basic KDE session. The compositor was off.
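The bracketed owner in the `gr: TRAP ch N [...]` and `channel N killed!` lines is the process nouveau associates with the GPU channel (Xorg or systemd-logind everywhere in this report), which fits the earlier remark that the log alone does not say which application caused the freeze. When going through a long syslog excerpt like the one attached, it can still help to count these events per owner and channel. This is a minimal Python sketch, assuming the lines follow the formats quoted in this report; the regular expressions and the `summarize` helper are hypothetical and not part of any nouveau tooling.

```python
import re
from collections import Counter

# Hypothetical patterns, fitted only to the log lines quoted in this report.
# Matches e.g. "gr: TRAP ch 6 [00ff396000 Xorg[4075]]"
TRAP_RE = re.compile(r"gr: TRAP ch (\d+) \[[0-9a-f]+ ([^\[\]]+)\[(\d+)\]\]")
# Matches e.g. "systemd-logind[1385]: channel 4 killed!"
KILLED_RE = re.compile(r"([^\s\[\]]+)\[(\d+)\]: channel (\d+) killed!")


def summarize(lines):
    """Count gr TRAP and channel-kill events per (process, pid, channel)."""
    traps = Counter()
    kills = Counter()
    for line in lines:
        m = TRAP_RE.search(line)
        if m:
            ch, proc, pid = m.groups()
            traps[(proc, int(pid), int(ch))] += 1
            continue
        m = KILLED_RE.search(line)
        if m:
            proc, pid, ch = m.groups()
            kills[(proc, int(pid), int(ch))] += 1
    return traps, kills
```

Any iterable of lines works as input, so an opened syslog or dmesg file can be passed directly. The plain `fifo: channel 8: killed` lines carry no owner and are deliberately not counted.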
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/454.
Created attachment 141458 [details]
dmesg.txt

After an arbitrary amount of time working in the terminal, the screen hard-freezes. An external monitor is connected via DisplayPort on a Lenovo dock. dmesg then shows:
[172362.507754] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.507767] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.507775] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.507782] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.507805] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.507815] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.507823] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.507830] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517638] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.517651] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517658] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]
[172362.517665] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.517685] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.517695] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.517702] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000000 [] warp 3c0009 [ILLEGAL_INSTR_ENCODING]
[172362.517711] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000000 [] warp 3e0009 [ILLEGAL_INSTR_ENCODING]
[172362.534375] nouveau 0000:01:00.0: gr: TRAP ch 6 [00ff396000 Xorg[4075]]
[172362.534387] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.534394] nouveau 0000:01:00.0: gr: GPC0/TPC2/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
[172362.534399] nouveau 0000:01:00.0: gr: GPC0/TPC3/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d0009 [ILLEGAL_INSTR_ENCODING]

A reboot is required at this point. This happens regularly, sometimes after a few hours, sometimes after 1-2 days.

Hardware: Lenovo P50 running Fedora 28 with kernel 4.17.19-200.fc28.x86_64, xorg-x11-drv-nouveau-1.0.15-4.fc28.x86_64 and mesa-dri-drivers-18.0.5-3.fc28.x86_64.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device [17aa:2230]
	Flags: bus master, fast devsel, latency 0, IRQ 131
	Memory at b2000000 (32-bit, non-prefetchable) [size=16M]
	Memory at a0000000 (64-bit, prefetchable) [size=256M]
	Memory at b0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 4000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau
	Kernel modules: nouveau
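Every MP trap line in this report is consistent with the printed error name depending only on the low 16 bits of the `warp` field: `3f0009`, `3e0009` and `0009` all print as `ILLEGAL_INSTR_ENCODING`, `000e` and `3000e` as `OOR_ADDR`, and `3e0001` as `STACK_ERROR`, while `global 00000004` always prints as `MULTIPLE_WARP_ERRORS`. The following minimal Python sketch decodes and tallies these lines under that assumption. Its lookup tables hold only the codes observed here, not nouveau's full tables, the upper bits of `warp` are left uninterpreted, and `decode_mp_trap` and `tally` are hypothetical helper names.

```python
import re
from collections import Counter

# Lookup tables limited to the codes this report's logs print by name.
# nouveau knows more codes; those are deliberately not reproduced here.
WARP_ERRORS = {0x01: "STACK_ERROR", 0x09: "ILLEGAL_INSTR_ENCODING", 0x0E: "OOR_ADDR"}
GLOBAL_ERRORS = {0x04: "MULTIPLE_WARP_ERRORS"}

# Matches e.g. "gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f0009"
MP_TRAP_RE = re.compile(
    r"gr: (GPC\d+)/(TPC\d+)/MP trap: global ([0-9a-f]+) \[[^\]]*\] warp ([0-9a-f]+)"
)


def decode_mp_trap(line):
    """Decode one MP trap line, or return None if the line is not one."""
    m = MP_TRAP_RE.search(line)
    if m is None:
        return None
    gpc, tpc, glob_hex, warp_hex = m.groups()
    glob, warp = int(glob_hex, 16), int(warp_hex, 16)
    code = warp & 0xFFFF  # assumption: the error name depends only on the low 16 bits
    return {
        "unit": f"{gpc}/{tpc}",
        "global_flags": [name for bit, name in GLOBAL_ERRORS.items() if glob & bit],
        "warp_error": WARP_ERRORS.get(code, f"unknown 0x{code:04x}"),
        "upper_bits": warp >> 16,  # left uninterpreted
    }


def tally(lines):
    """Count (unit, warp_error) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        decoded = decode_mp_trap(line)
        if decoded is not None:
            counts[(decoded["unit"], decoded["warp_error"])] += 1
    return counts
```

Applied to the dmesg excerpt in the description, every MP line decodes to `ILLEGAL_INSTR_ENCODING`, five times each on GPC0/TPC1, TPC2 and TPC3; the `TRAP ch` header lines are skipped because they do not match the pattern.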