I have an NVidia 670 GTX with a DVI & HDMI displays connected respectively. Using the proprietary NVidia driver, I've had no spinlocks while disconnecting or connecting (eg. powering-off, and powering-on) the HDMI display while X is running. Switched over to xf86-video-nouveau-1.0.12 and kernel-4.8.7/4.8.9 nouveau driver (and removing NVidia and resetting OpenGL driver links), and I'm experiencing an X (xorg-server-1.18.4) 100% CPU usage/spinlock on one core whenever I power-on or power-off the HDMI device. I've switched on all verbose logging within the nouveau kernel and DRM kernel driver. I've compiled xorg-server with debug flags, and used gdb and strace remotely on the X (xorg-server) and watched the logs when the 100% CPU usage would start. The only oddity with the logs; upon HDMI connect/disconnect and 100% CPU usage, what looks like a count down timer starts to spawn through strace. (From looks, looks like the DPMS timer for monitor time-out.) Looking at GDB backtrace doesn't show much, but I'll attach anyways because the programmer who wrote & understand the code may likely see what's occurring. From looks, looks like the DPMS timer for monitor time-out. However; I've disabled DPMS within nouveau kernel command line and xorg.conf, and the 100% CPU (spinlock) still occurs and DPMS is obviously disabled within the logs. One thing I did not think of, I should compile the kernel (eg. nouveau kernel driver) using debugging. (Think I read an earlier post of using MMIO tracing.)
Created attachment 128220 [details] strace-withNoRace.log
Created attachment 128221 [details] strace-withRace.log
Created attachment 128222 [details] gdb-4.log
Also should mention I've I have been using: /usr/bin/startx -- -logverbose 20 -verbose 20 Forgot to attach my Xorg.conf and Xorg.0.log, although I think nothing is mentioned within the Xorg.0.log file.
Created attachment 128241 [details] xorg.conf
Created attachment 128242 [details] Xorg.0.log
Created attachment 128255 [details] Xorg.0-HDMI-1-Samsung.log This is a log file, with loading nouveau module/driver after the system is booted and the Samsung HDMI monitor already powered on and connected to a Yamaha Receiver, and then further connected to the HDMI connector of the NVidia card. The Dell monitor is connected via DVI-I-1 connector of the NVidia card.
Switching off the HDMI device seems to present the following, using the kobject Linux kernel tracing/hacking option. The same is presented again via syslog when switching on the HDMI device. Nov 28 18:35:30 localhost4 kernel: kobject: 'card0' (ffff8807f9bf8010): kobject_uevent_env Nov 28 18:35:30 localhost4 kernel: kobject: 'card0' (ffff8807f9bf8010): fill_kobj_path: path = '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:10.0/0000:04:00.0/drm/card0'
Created attachment 128260 [details] dmesg-hdmi-connected-cpurun.log On connect, this is what dmesg outputs using nouveau.debug=debug with Linux kernel kobjects.
So far I've narrowed this bug down to the Nouveau's "ZaphodHeads" option within xorg.conf. Without ZaphodHeads using an xorg, it looks like the dual monitor setup uses Xinerama by default (regardless of the mention of Xinerama being disabled by default), and both displays work, even after powering cycling the Yamaha receiver or HDMI display. No CPU run, etc. Although I'm not a fan of extending the displays, I prefer separate displays! But when using ZaphodHeads option within xorg.conf, the CPU spinlock is once again present when power cycling the HDMI display, wither by the monitor power button or home theater HDMI capable receiver. I've tried using the trace option for nouveau driver/module, but I see nothing abnormal aside from a bad EDID value, which Xorg should have no problems handling gracefully. I've also tried the kernel mmio tracing, but seems nouveau is hard locking the computer during tracing. I'm assuming since I haven't seen anything obvious within the nouveau debug tracing, the problem maybe elsewhere. Using strace on the X process, seems to show some anomalies within the output. Using "startx -- -logverbose 5 -verbose 5" or with higher values seems to show no further debugging output, aside from the initial start-up of X. (Albeit, might see a VT switch or something else every so often.) And the process stuck within the 100% CPU spinlock is X itself. So I'll guess X is in a loop checking or running through something, and I can't get any further debugging than this. Most I can do with gdb is debugging segfaults. I would think this bug is something that should be sending hints into the verbose logging method. Very likely the problem is within X at this point. Suggestions for further debug methods for X would be helpful!
(In reply to Roger from comment #10) > [...] > So I'll guess X is in a loop checking or running through something, and I > can't get any further debugging than this. Most I can do with gdb is > debugging segfaults. I would think this bug is something that should be > sending hints into the verbose logging method. > > Very likely the problem is within X at this point. > > Suggestions for further debug methods for X would be helpful! If that's the case, I reckon attaching gdb to the running process while it's taking 100% CPU will most likely give you the portion of code where it's stuck, a backtrace would tell what brought it in there, and a few "step" or "next" would allow you to follow a few iterations. So best, imho, is to use gdb to investigate when the issue is occurring, means connecting to the system remotely via ssh though.
Doh. I forgot to use next/step instead of repeated backtrace/bt. So if my attached gdb log file(s) do not contain the race condition, I'll later attach some gdb log files using next/step commands.
Created attachment 128380 [details] gdb-5-step.log Using next, step, step, step; the following occurred in brevity. (See gdb-5-step.log for the entire gdb log.) eip:Error while running hook_stop: Value can't be converted to integer within: xorg-server-1.18.4/os/WaitFor.c:229 xorg-server-1.18.4/xorg-server-1.18.4/dix/dixutils.c:417 xorg-server-1.18.4/config/udev.c:336 xorg-server-1.18.4/dix/dixutils.c:421
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/508.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.