Created attachment 80540
kernel log

Kernel logs report "Watchdog detected hard LOCKUP on cpu 1". It has occurred once per day for the last three days. Each time it happens seemingly at random: the monitor is turned off and I'm connected remotely via SSH, but the system is idle.

This is on ArchLinux with xf86-video-nouveau 1.0.7. The machine is ~2.5 years old and has been very solid until now. My video card is:

01:00.0 VGA compatible controller: NVIDIA Corporation GT216 [GeForce GT 220] (rev a2)

I have attached the kernel log. I also don't know why the 'PTHERM' events would be occurring, since the connected monitor was off and I was only connected remotely via SSH.
nouveau_fan_update:

	spin_lock_irqsave(&fan->lock, flags);

	/* schedule next fan update, if not at target speed already */
	if (list_empty(&fan->alarm.head) && target != duty) {
		u16 bump_period = fan->bios.bump_period;
		u16 slow_down_period = fan->bios.slow_down_period;
		...
		ptimer->alarm(ptimer, delay * 1000 * 1000, &fan->alarm);

If delay is somehow 0, the ->alarm will cause nouveau_fan_update to get called immediately. Can you add a printk to that function that shows the values? (This may end up totally flooding your dmesg too... but I think the values may be the same across prints.)

e.g.

diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/fan.c b/drivers/gpu/drm/nouveau/core/subdev/therm/fan.c
index c728380..9453afd 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/fan.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/fan.c
@@ -88,7 +88,7 @@ nouveau_fan_update(struct nouveau_fan *fan, bool immediate, int target)
 			delay = min(bump_period, slow_down_period) ;
 		else
 			delay = bump_period;
-
+		nv_info(therm, "Scheduling fan update in %d (slow: %d, bump: %d)\n", delay, slow_down_period, bump_period);
 		ptimer->alarm(ptimer, delay * 1000 * 1000, &fan->alarm);
 	}
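To illustrate why a zero delay would be a problem, here is a rough, self-contained toy model. It is not nouveau code: toy_timer, toy_alarm and toy_count_updates are hypothetical names made up for this sketch, and it assumes the timer counts in nanoseconds, as the multiplication by 1000 * 1000 suggests. Each simulated fan update re-arms the alarm with the same delay, the way the quoted code does while the fan is not at its target speed.

```c
#include <stdint.h>

/* Hypothetical toy model, NOT nouveau code: an alarm fires as soon as
 * simulated time reaches its deadline. */
struct toy_timer {
	uint64_t now_ns;      /* current simulated time */
	uint64_t deadline_ns; /* when the pending alarm fires */
	int armed;            /* non-zero while an alarm is pending */
};

/* Arm the alarm delay_ns nanoseconds from the current simulated time. */
static void toy_alarm(struct toy_timer *t, uint64_t delay_ns)
{
	t->deadline_ns = t->now_ns + delay_ns;
	t->armed = 1;
}

/* Count how many simulated fan updates run before simulated time reaches
 * window_ns. Every update re-arms the alarm with the same delay. The loop
 * gives up after max_calls so that the zero-delay case terminates. */
unsigned toy_count_updates(uint32_t delay_ms, uint64_t window_ns,
			   unsigned max_calls)
{
	struct toy_timer t = { 0, 0, 0 };
	uint64_t delay_ns = (uint64_t)delay_ms * 1000 * 1000;
	unsigned calls = 0;

	toy_alarm(&t, delay_ns);
	while (t.armed && t.deadline_ns < window_ns && calls < max_calls) {
		t.now_ns = t.deadline_ns; /* jump to the moment the alarm fires */
		t.armed = 0;
		calls++;                  /* the simulated fan update runs */
		toy_alarm(&t, delay_ns);  /* and schedules the next one */
	}
	return calls;
}
```

In this model a 100 ms delay gives a handful of updates per simulated second, while a delay of 0 never lets simulated time advance, so the update count is limited only by the max_calls cap. On real hardware there is no such cap, which would be consistent with the CPU never leaving the update path, though the model says nothing about whether that is what actually happened here.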
(In reply to comment #1)
> If delay is somehow 0, the ->alarm will cause nouveau_fan_update to get
> called immediately. Can you add a printk to that function that shows the
> values? (This may end up totally flooding your dmesg too... but I think the
> values may be the same across prints.)

Thanks for your help. I will try to add the printk. The lock has made the machine inaccessible remotely, so it'll be a couple of days until I can post the results.
I've had the debug statement active for the last few days, but the error hasn't recurred. I discovered that my video card's fan was only working intermittently, which a cleaning seems to have fixed. Since the inactive fan likely caused the error I was seeing, I am closing this bug report as invalid.