Bug 94694

Summary: [NVC1] Attempt to use kde/plasma renders the system immediately unresponsive
Product: Mesa
Reporter: Alexander Dubov <oakad>
Component: Drivers/DRI/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Nouveau Project <nouveau>
Severity: major    
Priority: medium    
Version: 11.1   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: Xorg.0.log for the failed session

Description Alexander Dubov 2016-03-25 13:03:40 UTC
Created attachment 122538
Xorg.0.log for the failed session

The crux of the problem: 

Mar 25 23:40:25 mercador kernel: nouveau 0000:01:00.0: plasmashell[6208]: failed to idle channel 17 [plasmashell[6208]]
Mar 25 23:40:27 mercador kernel: nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogf100.c:66/gf100_fifo_gpfifo_engine_fini()!
Mar 25 23:40:27 mercador kernel: nouveau 0000:01:00.0: fifo: channel 17 [plasmashell[6208]] kick timeout
Mar 25 23:40:27 mercador kernel: nouveau: plasmashell[6208]:00000000:0000906f: detach gr failed, -16
Mar 25 23:40:27 mercador kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
Mar 25 23:40:29 mercador kernel: nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogf100.c:66/gf100_fifo_gpfifo_engine_fini()!
Mar 25 23:40:29 mercador kernel: nouveau 0000:01:00.0: fifo: channel 17 [plasmashell[6208]] kick timeout
Mar 25 23:40:29 mercador kernel: nouveau: plasmashell[6208]:00000000:0000906f: detach sw failed, -16
Mar 25 23:40:44 mercador kernel: nouveau 0000:01:00.0: plasmashell[6208]: failed to idle channel 17 [plasmashell[6208]]

This happens immediately upon any interaction with the KDE 5 panel.

At that point the system is effectively inoperable, although it may still be possible to reboot it over the network. CPU usage becomes high, apparently consumed by an in-kernel thread.

The kernel version is 4.4.6
KDE/Plasma version is 5.19.0, Qt is 5.5.1
X11/nouveau driver version is 1.0.12

Xorg.0.log is attached (with some additional errors signalling that the event queue is filling up).
Comment 1 Ilia Mirkin 2016-03-25 13:12:04 UTC
Anything before the "failed to idle" message in dmesg? That generally indicates "GPU hung", but often there will be information earlier which helps understand why the GPU is dying.
Comment 2 Alexander Dubov 2016-03-25 13:29:52 UTC
Some additional messages do appear:

Mar 25 23:40:44 mercador kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
Mar 25 23:40:46 mercador kernel: nouveau 0000:01:00.0: fifo: runlist update timeout
Mar 25 23:40:46 mercador kernel: nouveau 0000:01:00.0: fifo: INTR 00000001: 0000000b

During a different attempt:

Mar 25 18:19:03 mercador kernel: [ 3381.724766] nouveau 0000:01:00.0: fifo: PBDMA0: 00040000 [] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 2490 data 000000d0
Mar 25 18:19:03 mercador kernel: [ 3381.724779] nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [ILLEGAL_MTHD] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 000c data 80ff0e04
Mar 25 18:19:03 mercador kernel: [ 3381.724789] nouveau 0000:01:00.0: fifo: PBDMA0: 02000000 [] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 0020 data 00008006
Mar 25 18:19:03 mercador kernel: [ 3381.724801] nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [ILLEGAL_MTHD] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 0030 data 800103e4
Mar 25 18:19:03 mercador kernel: [ 3381.724812] nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [ILLEGAL_MTHD] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 0034 data 20010680
Mar 25 18:19:03 mercador kernel: [ 3381.724822] nouveau 0000:01:00.0: fifo: PBDMA0: 00200000 [ILLEGAL_MTHD] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 0038 data 00001111
.... more similar messages to follow ....
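
For triage, the PBDMA faults in this log can be tallied per flag and method. The following is only a minimal Python sketch, not part of any nouveau tooling: the regular expression is an assumption inferred from the message layout in this report, the name `summarize_pbdma` is hypothetical, and each kernel message is assumed to occupy a single line (wrapped lines have to be rejoined first).

```python
import re
from collections import Counter

# Assumed layout of a nouveau PBDMA fault message, inferred from this report:
#   fifo: PBDMA0: 00200000 [ILLEGAL_MTHD] ch 17 [003f4cc000 plasmashell[24734]] subc 0 mthd 000c data 80ff0e04
PBDMA_RE = re.compile(
    r"fifo: PBDMA(?P<unit>\d+): (?P<status>[0-9a-f]{8}) \[(?P<flags>[^\]]*)\] "
    r"ch (?P<chan>\d+) \[(?P<inst>[0-9a-f]+) (?P<proc>.+?)\] "
    r"subc (?P<subc>\d+) mthd (?P<mthd>[0-9a-f]{4}) data (?P<data>[0-9a-f]{8})"
)


def summarize_pbdma(lines):
    """Count PBDMA fault lines, keyed by (flags, method).

    An empty flag list ("[]") is reported as "-". Lines that do not
    match the assumed layout are ignored.
    """
    counts = Counter()
    for line in lines:
        m = PBDMA_RE.search(line)
        if m:
            counts[(m.group("flags") or "-", m.group("mthd"))] += 1
    return counts


# The six messages quoted in this comment, one message per line.
PREFIX = "Mar 25 18:19:03 mercador kernel: nouveau 0000:01:00.0: fifo: PBDMA0: "
CHAN = "ch 17 [003f4cc000 plasmashell[24734]] subc 0 "
SAMPLE = [
    PREFIX + "00040000 [] " + CHAN + "mthd 2490 data 000000d0",
    PREFIX + "00200000 [ILLEGAL_MTHD] " + CHAN + "mthd 000c data 80ff0e04",
    PREFIX + "02000000 [] " + CHAN + "mthd 0020 data 00008006",
    PREFIX + "00200000 [ILLEGAL_MTHD] " + CHAN + "mthd 0030 data 800103e4",
    PREFIX + "00200000 [ILLEGAL_MTHD] " + CHAN + "mthd 0034 data 20010680",
    PREFIX + "00200000 [ILLEGAL_MTHD] " + CHAN + "mthd 0038 data 00001111",
]

counts = summarize_pbdma(SAMPLE)
for (flags, mthd), n in sorted(counts.items()):
    print(f"{flags:14s} mthd {mthd}: {n}")
```

Grouping by method makes it easier to see whether the same few methods are rejected repeatedly or whether the whole pushbuffer is being misinterpreted.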

Other messages may also have appeared during other runs; because of the nasty nature of the fault, it is not always possible to capture them.

Also, it appears that only Plasma is affected: GNOME, Firefox (including WebGL stuff), and most KDE apps running standalone appear to just work.
Comment 3 Alexander Dubov 2016-03-25 17:48:42 UTC
It appears that the issue is caused by the use of libGLESv2. After re-linking the Qt library against libGL, everything appears to work correctly.
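
To check which of the two libraries a given Qt build is linked against, the output of `ldd` can be scanned for the library names. This is a rough sketch under stated assumptions: the helper `gl_backends` is hypothetical, the sample `ldd` text is illustrative rather than taken from the affected machine, and only direct `libGL.so` / `libGLESv2.so` entries are recognised.

```python
import re

# Matches the library name at the start of an `ldd` output line, e.g.
#   "\tlibGLESv2.so.2 => /usr/lib64/libGLESv2.so.2 (0x...)"
# libGLX, libGLdispatch and similar names are deliberately not matched.
_GL_LIB_RE = re.compile(r"^\s*(libGLESv2|libGL)\.so(?:\.\d+)*\b", re.MULTILINE)


def gl_backends(ldd_output):
    """Return the set of GL libraries ("libGL", "libGLESv2") named in ldd output."""
    return set(_GL_LIB_RE.findall(ldd_output))


# Illustrative ldd output only (not captured from the reporter's system).
SAMPLE_GLES = "\n".join([
    "\tlibGLESv2.so.2 => /usr/lib64/libGLESv2.so.2 (0x0)",
    "\tlibGLdispatch.so.0 => /usr/lib64/libGLdispatch.so.0 (0x0)",
    "\tlibc.so.6 => /lib64/libc.so.6 (0x0)",
])
SAMPLE_DESKTOP = "\n".join([
    "\tlibGL.so.1 => /usr/lib64/libGL.so.1 (0x0)",
    "\tlibc.so.6 => /lib64/libc.so.6 (0x0)",
])

print(gl_backends(SAMPLE_GLES))
print(gl_backends(SAMPLE_DESKTOP))
```

In practice the input would come from running `ldd` on the Qt GUI library, whose path is distribution-specific.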

Still, it is pretty unfortunate that a problematic user-space library can crash the kernel like this.
Comment 4 GitLab Migration User 2019-09-18 20:42:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1098.
