Bug 76286

Summary: Kernel v3.13 hang during boot now that dpm is enabled for radeon driver - Radeon HD4870
Product: DRI
Reporter: OmegaPhil
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: major
Priority: medium
CC: landjgregory, OmegaPhil
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
Kernel log for session with manual radeon module loading with modesetting enabled | none
dmesg from normal v3.13 boot with radeon.dpm=0 | none
Greg: Log where i managed to get system to boot to terminal session before crash happened. | none
Kernel Panic Displayed after waiting out the lockup very rare | none
Disabled all but one core and hyper-threading. Helped with log loss | none
Image of kernel panic with single core only | none
disable some dpm features | none
Core Dump at insane logging level. | none
fix for 73911 | none

Description OmegaPhil 2014-03-17 18:32:46 UTC
Created attachment 95961 [details]
Kernel log for session with manual radeon module loading with modesetting enabled

Flagged as major as this prevents kernel boot if the user doesn't know how to disable the functionality with kernel boot parameters.

Originally reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741619 . Debian kernel v3.13 enables dpm by default, and partway through the boot process (when cryptsetup is opening encrypted disks) the kernel hangs (confirmed by no response to ping).

The kernel boots fine with radeon.dpm=0.
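
To double-check which setting a given boot actually used, something like the following could be run against the contents of /proc/cmdline. This is only a minimal Python sketch: the function name and the sample command line are made up for illustration, and it assumes that a later radeon.dpm occurrence overrides an earlier one.

```python
def radeon_dpm_setting(cmdline):
    """Return the radeon.dpm value from a kernel command line string, or None if unset."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


# Hypothetical command line, not taken from the attached logs.
sample = "BOOT_IMAGE=/vmlinuz-3.13-1-amd64 root=/dev/mapper/root ro radeon.dpm=0"
print(radeon_dpm_setting(sample))
```

A result of None means the parameter was not passed at all, so the driver default (dpm enabled on this Debian kernel, as described above) would apply.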

Following instructions (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=741485#10) I booted with radeon.modeset=0, made sure X wasn't running and radeon was unloaded, then modprobe'd radeon modeset=1 - after a small delay everything hung.

I have attached kern.log (REISUB FTW this time it seems?), there is an X log but that simply gave up after finding no modesetting support, before the point where I reloaded radeon.

kern.log is verbose as I always boot with full debugging information, but in this case it's useless as nothing associated with the radeon load appears to be logged (and I'm assuming the magic sync worked).

In terms of fighting the issue, I have C and C++ experience but no real kernel debugging experience, and certainly nothing to do with the graphics stack.


===============================

uname -a: Linux omega1 3.13-1-amd64 #1 SMP Debian 3.13.5-1 (2014-03-04) x86_64 GNU/Linux
Debian Testing
X server: 1.15.0
Radeon: 7.3.0
Comment 1 Alex Deucher 2014-03-17 20:00:56 UTC
Does booting an older kernel (e.g., 3.11 or 3.12) with radeon.dpm=1 work ok?  Please attach your dmesg output with kms enabled.
Comment 2 OmegaPhil 2014-03-17 21:10:15 UTC
I tried twice to get at flushed dmesg output for v3.13, both failed - I confirmed the hang happens with v3.12 and v3.11 - looks like REISUB does nothing, no dmesg files were created for the relevant boots...

Shall I set up netconsole and see what I can get?
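
For reference, a rough sketch of what the boot parameter might look like, assuming the format given in the kernel's netconsole documentation, netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The helper name, the interface name and the target address are placeholders, not values from my setup.

```python
def netconsole_param(tgt_ip, tgt_port=6666, dev="eth0",
                     src_ip="", src_port="", tgt_mac=""):
    """Build a netconsole= kernel parameter string.

    Empty fields are left blank so the kernel falls back to its defaults.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


# 192.168.1.10 is a placeholder for the machine that would receive the log.
print(netconsole_param("192.168.1.10"))
```

The receiving machine would then need something listening for UDP on the chosen target port to capture whatever the kernel manages to send before the hang.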

Just for my education, it looks like dmesg output is part of kern.log (https://stackoverflow.com/a/11413417/1188444) - is this correct?
Comment 3 Alex Deucher 2014-03-17 21:13:02 UTC
Yes. dmesg prints out the kernel log for the current boot.  Can you attach the log with radeon.dpm=0?  The log in comment 0 is with modeset=0.
Comment 4 OmegaPhil 2014-03-17 21:18:51 UTC
Created attachment 95969 [details]
dmesg from normal v3.13 boot with radeon.dpm=0

Normal boot dmesg attached
Comment 5 Gregory Land 2014-03-21 06:06:09 UTC
I am having a similar problem with my 4870.  If I disconnect my second monitor I can boot with radeon.dpm=1.  If I reconnect the monitor it will lock up shortly after boot.

A bisect brings up this commit https://github.com/torvalds/linux/commit/ab70b1dde73ff4525c3cd51090c233482c50f21
Which makes sense since this commit enabled dpm by default on radeon x7xx series cards.
Comment 6 Gregory Land 2014-03-21 06:21:34 UTC
Created attachment 96138 [details]
Greg: Log where i managed to get system to boot to terminal session before crash happened.

  Log from a kernel built from a bisected copy of Linus's GitHub tree on a Linux machine.  I added a few DRM_INFO() outputs in an attempt to find where the lockup is occurring.  From what I can tell, the crash is preventing the logs from being written out.
Comment 7 Gregory Land 2014-03-21 06:41:38 UTC
Created attachment 96139 [details]
Kernel Panic Displayed after waiting out the lockup very rare

  If I wait around long enough the system might kernel panic and display a few messages that don't seem to get added to the logs.  I have a photo of them taken from my phone to attach.
Comment 8 Gregory Land 2014-03-21 07:13:39 UTC
Created attachment 96140 [details]
Disabled all but one core and hyper-threading.  Helped with log loss

  Log with more info.  Disabled hyper-threading and all but one core.
Comment 9 Gregory Land 2014-03-21 07:37:09 UTC
Created attachment 96141 [details]
Image of kernel panic with single core only

Image of kernel panic with single core only
Comment 10 Alex Deucher 2014-03-21 14:45:17 UTC
Created attachment 96170 [details] [review]
disable some dpm features

Does the attached kernel patch help?  If so, can you narrow down what part of it helps?
Comment 11 Gregory Land 2014-03-21 17:48:21 UTC
Patch did not resolve the problem.
Comment 12 Gregory Land 2014-03-21 18:11:30 UTC
Created attachment 96182 [details]
Core Dump at insane logging level.

I enabled every debug line I could find. Got this.


Mar 21 10:00:54 endora systemd[1]: Received SIGCHLD from PID 584 (sd_cicero).
Mar 21 10:00:54 endora systemd[1]: Child 584 (sd_cicero) died (code=exited, status=1/FAILURE)
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18624, wptr 18640
Mar 21 10:00:54 endora kernel: [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 1 : v 13 p(521,-40)@ 28.771340 -> 28.771949 [e 0 us, 0 rep]
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], IH: D2 vblank
Mar 21 10:00:54 endora kernel: sd_festival[590]: segfault at 2d0 ip 00007fcc15b8d1a1 sp 00007ffff79b0a10 error 4 in libpthread-2.19.so[7fcc15b80000+18000]
Mar 21 10:00:54 endora kernel: potentially unexpected fatal signal 11.
Mar 21 10:00:54 endora kernel: CPU: 7 PID: 590 Comm: sd_festival Tainted: G          I  3.14.0-rc7-ARCH-00059-g08edb33-dirty #1
Mar 21 10:00:54 endora kernel: Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5600.2013.0729.2250 07/29/2013
Mar 21 10:00:54 endora kernel: task: ffff8801a537cf00 ti: ffff8800c1f2e000 task.ti: ffff8800c1f2e000
Mar 21 10:00:54 endora kernel: RIP: 0033:[<00007fcc15b8d1a1>]  [<00007fcc15b8d1a1>] 0x7fcc15b8d1a1
Mar 21 10:00:54 endora kernel: RSP: 002b:00007ffff79b0a10  EFLAGS: 00010206
Mar 21 10:00:54 endora kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fcc15b7a678
Mar 21 10:00:54 endora kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 21 10:00:54 endora kernel: RBP: 00007ffff79b0bb8 R08: 00007fcc15b7a678 R09: 00007fcc15b7a670
Mar 21 10:00:54 endora kernel: R10: 00007ffff79b07e0 R11: 00007fcc15b8d1a0 R12: 0000000000405d6c
Mar 21 10:00:54 endora kernel: R13: 00007ffff79b0bb0 R14: 0000000000000000 R15: 0000000000000000
Mar 21 10:00:54 endora kernel: FS:  00007fcc16afe700(0000) GS:ffff8801aece0000(0000) knlGS:0000000000000000
Mar 21 10:00:54 endora kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 21 10:00:54 endora kernel: CR2: 00000000000002d0 CR3: 00000000c1cf8000 CR4: 00000000000007e0
Mar 21 10:00:54 endora kernel: 
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18640, wptr 18656
Mar 21 10:00:54 endora kernel: [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 13 p(1866,-47)@ 28.774347 -> 28.775032 [e 0 us, 0 rep]
Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], IH: D1 vblank
Mar 21 10:00:54 endora systemd-coredump[591]: Process 590 (sd_festival) dumped core.
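
To make the interrupt lines easier to compare across a long log, the rptr/wptr values can be pulled out with a small script. This is only a sketch: the function name is made up, and the regular expression assumes the message format seen in the lines above.

```python
import re

# Matches the debug message printed at the start of r600_irq_process.
IRQ_RE = re.compile(r"r600_irq_process start: rptr (\d+), wptr (\d+)")


def ring_pointers(lines):
    """Return a list of (rptr, wptr) tuples found in the given log lines."""
    pointers = []
    for line in lines:
        match = IRQ_RE.search(line)
        if match:
            pointers.append((int(match.group(1)), int(match.group(2))))
    return pointers


# The two r600_irq_process lines from the log above.
sample = [
    "Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18624, wptr 18640",
    "Mar 21 10:00:54 endora kernel: [drm:r600_irq_process], r600_irq_process start: rptr 18640, wptr 18656",
]
print(ring_pointers(sample))
```

On the two lines above this gives (18624, 18640) and (18640, 18656), so both pointers were still advancing by 16 around the time of the segfault, which suggests interrupts were still being serviced at that point.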
Comment 13 Gregory Land 2014-03-21 18:32:12 UTC
Uninstalled the segfaulting package, Speech-Dispatcher (part of KDE accessibility). The segfault is resolved, but the system still locks up with radeon.dpm=1.
Comment 14 OmegaPhil 2014-03-29 21:22:38 UTC
I noticed v3.13.7-1 has come to Debian testing today with the following in the changelog:

   - drm/radeon: fix runpm disabling on non-PX harder
      (may fix #741619, #742507)

I can confirm it doesn't help when I remove 'radeon.dpm=0' on boot.
Comment 15 OmegaPhil 2014-05-22 13:48:21 UTC
This is still a problem with kernel 3.14.4-1.
Comment 16 Alex Deucher 2014-05-22 14:02:23 UTC
(In reply to comment #15)
> This is still a problem with kernel 3.14.4-1.

It's fixed in:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=76e6dcece841faebbee78895780e8209ff40d922
Doesn't look like that has hit 3.14 yet.
Comment 17 OmegaPhil 2014-05-22 14:58:51 UTC
Thanks, but that's a workaround, not a fix - is DPM abandoned for these cards?
Comment 18 jyliu 2014-05-26 07:36:36 UTC
Created attachment 99834 [details] [review]
fix for 73911

This patch will make a key register's value correct and fix this bug.
Comment 19 OmegaPhil 2014-05-27 07:06:14 UTC
Comment on attachment 99834 [details] [review]
fix for 73911

jyliu: You've put the patch on the wrong ticket, this is for DPM on r600g cards.
Comment 20 Martin Peres 2019-11-19 08:46:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/467.