Created attachment 95961 [details]
Kernel log for session with manual radeon module loading with modesetting enabled
Flagged as major because this prevents the kernel from booting if the user doesn't know how to disable the functionality with kernel boot parameters.
Originally reported at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741619. The Debian v3.13 kernel enables dpm by default; partway through the boot process (while cryptsetup is opening encrypted disks), the kernel hangs (confirmed by no response to ping).
The kernel boots fine with radeon.dpm=0.
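When comparing boots with and without the workaround, it can help to confirm which radeon.dpm value a given boot was actually given. A minimal sketch in Python, assuming the value is passed on the kernel command line as radeon.dpm=N (as in this report) and that a later occurrence overrides an earlier one; the helper name radeon_dpm_setting is hypothetical and not part of any existing tool:

```python
from typing import Optional


def radeon_dpm_setting(cmdline: str) -> Optional[int]:
    """Return the radeon.dpm value found in a kernel command line, or None.

    Assumption: if the parameter appears more than once, the last
    occurrence is the one that takes effect, so keep overwriting.
    """
    value = None
    for token in cmdline.split():
        # Split "key=value"; tokens without "=" (e.g. "ro", "quiet") are skipped.
        key, sep, raw = token.partition("=")
        if sep and key == "radeon.dpm":
            value = int(raw)
    return value


# Example input modelled on the Debian 3.13 boot described above
# (the root device is a made-up placeholder).
example = "BOOT_IMAGE=/vmlinuz-3.13-1-amd64 root=/dev/mapper/root ro radeon.dpm=0"
print(radeon_dpm_setting(example))  # prints 0
```

On the affected machine the input would be the contents of /proc/cmdline. A result of None only means the parameter was not given on the command line; it could still be set elsewhere, e.g. via modprobe options.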
Following the instructions at (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741485#10), I booted with radeon.modeset=0, made sure X wasn't running and the radeon module was unloaded, then loaded it with 'modprobe radeon modeset=1'. After a short delay everything hung.
I have attached kern.log (REISUB FTW this time, it seems?). There is also an X log, but X simply gave up after finding no modesetting support, before the point where I reloaded radeon.
kern.log is verbose because I always boot with full debugging information, but in this case it's useless: nothing related to the radeon load appears to have been logged (and I'm assuming the magic SysRq sync worked).
As for helping to debug the issue, I have C and C++ experience but no real kernel debugging experience, and certainly none with the graphics stack.
uname -a: Linux omega1 3.13-1-amd64 #1 SMP Debian 3.13.5-1 (2014-03-04) x86_64 GNU/Linux
X server: 1.15.0
Does booting an older kernel (e.g., 3.11 or 3.12) with radeon.dpm=1 work ok? Please attach your dmesg output with kms enabled.
I tried twice to get flushed dmesg output for v3.13 and both attempts failed. I confirmed the hang also happens with v3.12 and v3.11. It looks like REISUB does nothing; no dmesg files were created for the relevant boots...
Shall I set up netconsole and see what I can get?
Just for my education: it looks like dmesg output is part of kern.log (https://stackoverflow.com/a/11413417/1188444). Is this correct?
Yes. dmesg prints out the kernel log for the current boot. Can you attach the log with radeon.dpm=0? The log in comment 0 is with modeset=0.
Created attachment 95969 [details]
dmesg from normal v3.13 boot with radeon.dpm=0
Normal boot dmesg attached
I am having a similar problem with my 4870. If I disconnect my second monitor, I can boot with radeon.dpm=1. If I reconnect the monitor, it locks up shortly after boot.
A bisect points to this commit: https://github.com/torvalds/linux/commit/ab70b1dde73ff4525c3cd51090c233482c50f21
That makes sense, since this commit enabled dpm by default on Radeon x7xx series cards.
Created attachment 96138 [details]
Greg: Log where I managed to get the system to boot to a terminal session before the crash happened.
Log from a kernel built from the bisected copy of Linus's GitHub tree on a Linux machine. I added a few DRM_INFO() outputs in an attempt to find where the lockup is occurring. From what I can tell, the crash is preventing the logs from being written out.
Created attachment 96139 [details]
Kernel panic displayed after waiting out the lockup (very rare)
If I wait long enough, the system sometimes kernel-panics and displays a few messages that don't seem to make it into the logs. I have attached a photo of them taken with my phone.
Created attachment 96140 [details]
Disabled all but one core and hyper-threading. Helped with log loss
Log with more info. Disabled hyper-threading and all but one core.
Created attachment 96141 [details]
Image of kernel panic with single core only
Created attachment 96170 [details] [review]
disable some dpm features
Does the attached kernel patch help? If so, can you narrow down what part of it helps?
Patch did not resolve the problem.
Created attachment 96182 [details]
Core Dump at insane logging level.
I enabled every debug line I could find. Got this.
Mar 21 10:00:54 endora systemd: Received SIGCHLD from PID 584 (sd_cicero).
Mar 21 10:00:54 endora systemd: Child 584 (sd_cicero) died (code=exited, status=1/FAILURE)
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18624, wptr 18640
Mar 21 10:00:54 endora kernel: [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 1 : v 13 p(521,-40)@ 28.771340 -> 28.771949 [e 0 us, 0 rep]
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], IH: D2 vblank
Mar 21 10:00:54 endora kernel: sd_festival: segfault at 2d0 ip 00007fcc15b8d1a1 sp 00007ffff79b0a10 error 4 in libpthread-2.19.so[7fcc15b80000+18000]
Mar 21 10:00:54 endora kernel: potentially unexpected fatal signal 11.
Mar 21 10:00:54 endora kernel: CPU: 7 PID: 590 Comm: sd_festival Tainted: G I 3.14.0-rc7-ARCH-00059-g08edb33-dirty #1
Mar 21 10:00:54 endora kernel: Hardware name: /DX58SO, BIOS SOX5810J.86A.5600.2013.0729.2250 07/29/2013
Mar 21 10:00:54 endora kernel: task: ffff8801a537cf00 ti: ffff8800c1f2e000 task.ti: ffff8800c1f2e000
Mar 21 10:00:54 endora kernel: RIP: 0033:[<00007fcc15b8d1a1>] [<00007fcc15b8d1a1>] 0x7fcc15b8d1a1
Mar 21 10:00:54 endora kernel: RSP: 002b:00007ffff79b0a10 EFLAGS: 00010206
Mar 21 10:00:54 endora kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fcc15b7a678
Mar 21 10:00:54 endora kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 21 10:00:54 endora kernel: RBP: 00007ffff79b0bb8 R08: 00007fcc15b7a678 R09: 00007fcc15b7a670
Mar 21 10:00:54 endora kernel: R10: 00007ffff79b07e0 R11: 00007fcc15b8d1a0 R12: 0000000000405d6c
Mar 21 10:00:54 endora kernel: R13: 00007ffff79b0bb0 R14: 0000000000000000 R15: 0000000000000000
Mar 21 10:00:54 endora kernel: FS: 00007fcc16afe700(0000) GS:ffff8801aece0000(0000) knlGS:0000000000000000
Mar 21 10:00:54 endora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 21 10:00:54 endora kernel: CR2: 00000000000002d0 CR3: 00000000c1cf8000 CR4: 00000000000007e0
Mar 21 10:00:54 endora kernel:
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18640, wptr 18656
Mar 21 10:00:54 endora kernel: [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 13 p(1866,-47)@ 28.774347 -> 28.775032 [e 0 us, 0 rep]
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], IH: D1 vblank
Mar 21 10:00:54 endora systemd-coredump: Process 590 (sd_festival) dumped core.
Uninstalled the segfaulting package (Speech-Dispatcher, part of KDE accessibility). The segfault is resolved, but the system still locks up with radeon.dpm=1.
I noticed v3.13.7-1 has come to Debian testing today with the following in the changelog:
- drm/radeon: fix runpm disabling on non-PX harder
(may fix #741619, #742507)
I can confirm it doesn't help: the hang still occurs when I remove 'radeon.dpm=0' from the boot parameters.
This is still a problem with kernel 3.14.4-1.
(In reply to comment #15)
> This is still a problem with kernel 3.14.4-1.
It's fixed in:
Doesn't look like that has hit 3.14 yet.
Thanks, but that's a workaround, not a fix. Is DPM abandoned for these cards?
Created attachment 99834 [details] [review]
fix for 73911
This patch sets a key register to the correct value and fixes this bug.
Comment on attachment 99834 [details] [review]
fix for 73911
jyliu: You've put the patch on the wrong ticket; this one is for DPM on r600g cards.