Bug 86244

Summary: radeon kernel panic when booted with HDMI
Product: DRI
Reporter: Takashi Iwai <tiwai>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All

Description Takashi Iwai 2014-11-13 09:52:09 UTC
We've got a few bug reports about kernel panics in the radeon kernel code path.  All seem related to HDMI.  Typically, the kernel panic or Oops is triggered when the system is booted with HDMI connected.  One reporter mentioned that it also happens when switching to HDMI-only mode after hotplugging HDMI.

The affected kernels are at least 3.16 and 3.17.  One report mentions that it happens with 3.15, too.

The Oops looks like:
 BUG: unable to handle kernel paging request at 000000040101077c
 IP: [<ffffffffa00631dd>] drm_helper_connector_dpms+0x4d/0x230 [drm_kms_helper]
 PGD bb0db067 PUD 0 
 Oops: 0000 [#1] SMP 
 CPU: 0 PID: 817 Comm: Xorg Not tainted 3.16.6-2-default #1
 Hardware name: MEDION S561X/S561X, BIOS A16C1IM7 Ver1.0G  09/10/2009
 task: ffff8800bab88290 ti: ffff8800babac000 task.ti: ffff8800babac000
 RIP: 0010:[<ffffffffa00631dd>]  [<ffffffffa00631dd>] drm_helper_connector_dpms+0x4d/0x230 [drm_kms_helper]
 RSP: 0018:ffff8800babafd50  EFLAGS: 00010286
 RAX: ffffffffa0063190 RBX: ffff880037201e00 RCX: 000000040101046c
 RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff8800ba310400
 RBP: 6144796669746f6e R08: 0000000000000008 R09: ffff88012fd68d40
 R10: ffffffffa00332e0 R11: ffff8800babafe08 R12: 0000000000000003
 R13: 0000000000000003 R14: 0000000000000000 R15: 00000000fffffff2
 FS:  00007ff1601658c0(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000040101077c CR3: 00000000b9b4d000 CR4: 00000000000407f0
 Stack:
  ffff8800372af800 ffff8800babafdb0 ffff8800ba310428 ffff8800b9e26750
  0000000000000003 ffffffffa0023fc6 ffff8800babafe08 00000000000000ab
  fffffffffffffff2 ffff8800372af800 ffff880138b05800 ffffffffa002400e
 Call Trace:
 [<ffffffffa0023fc6>] drm_mode_obj_set_property_ioctl+0x396/0x3b0 [drm]
 [<ffffffffa002400e>] drm_mode_connector_property_set_ioctl+0x2e/0x40 [drm]
 [<ffffffffa0013897>] drm_ioctl+0x1c7/0x5b0 [drm]
 [<ffffffffa01cd046>] radeon_drm_ioctl+0x46/0x80 [radeon]
 [<ffffffff811c3c9f>] do_vfs_ioctl+0x2cf/0x4b0
 [<ffffffff811c3f01>] SyS_ioctl+0x81/0xa0
 [<ffffffff815d0c2d>] system_call_fastpath+0x1a/0x1f
 [<00007ff15e493397>] 0x7ff15e493396
 Code: 6b 40 44 8b b7 04 02 00 00 45 39 ee 0f 84 0c 01 00 00 48 85 db 44 89 af 04 02 00 00 0f 84 bc 01 00 00 48 8b 0b 41 bc 03 00 00 00 <48> 8b 91 10 03 00 00 48 81 c1 10 03 00 00
 [  132.533672] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
 drm_kms_helper: panic occurred, switching back to text console
 Rebooting in 90 seconds..

The spot is drm_helper_choose_encoder_dpms(), and the connector_list seems broken.
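
The register dump can be cross-checked against the faulting instruction. The bytes marked in the Code: line, "<48> 8b 91 10 03 00 00", decode to "mov 0x310(%rcx),%rdx", and the preceding "48 8b 0b" decodes to "mov (%rbx),%rcx". The following is only a sketch of that arithmetic; the function names are made up for illustration, and the kernel-pointer test assumes x86-64 with 4-level paging. It shows that CR2 equals RCX + 0x310 and that RCX, although loaded through a plausible kernel pointer in RBX, is not a kernel address itself, which fits the guess that the list is corrupted.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the Oops above. */
#define OOPS_RBX 0xffff880037201e00ULL
#define OOPS_RCX 0x000000040101046cULL
#define OOPS_CR2 0x000000040101077cULL

/* Displacement of the faulting instruction "48 8b 91 10 03 00 00",
 * i.e. mov 0x310(%rcx),%rdx. */
#define FAULT_DISP 0x310ULL

/* Effective address of a base + displacement memory operand. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* Assumption: x86-64 with 4-level paging, where kernel addresses sit in
 * the upper canonical half, so bits 63..47 are all ones. */
static int looks_like_kernel_pointer(uint64_t addr)
{
    return (addr >> 47) == 0x1ffffULL;
}

/* Illustrative helper: checks the three observations made in the text. */
static void check_oops_registers(void)
{
    /* The faulting address is exactly RCX + 0x310. */
    assert(effective_address(OOPS_RCX, FAULT_DISP) == OOPS_CR2);
    /* RBX, the object RCX was loaded from, looks like a kernel pointer. */
    assert(looks_like_kernel_pointer(OOPS_RBX));
    /* RCX does not, so the value read from that object is garbage. */
    assert(!looks_like_kernel_pointer(OOPS_RCX));
}
```

Calling check_oops_registers() from a small test program passes all three assertions for the values in this Oops.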

Another bug report shows a slightly different trace, and the spot is in drm_crtc_helper_set_config():

	list_for_each_entry(encoder, &dev->mode_config.encoder_list, head) {
		save_encoders[count++] = *encoder;
	}

so encoder_list seems broken.
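
To illustrate why such a loop is sensitive to list corruption, here is a user-space sketch of the same walk. It is not the kernel code: struct list_head, container_of() and the reduced struct encoder are minimal stand-ins written for this example, and the max bound is an addition that the quoted loop does not show. Every step follows pos->next and then copies a whole structure from that address, so a single overwritten next pointer makes the copy read from an arbitrary location.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's struct list_head and container_of(). */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced encoder; the real struct drm_encoder
 * has many more fields. */
struct encoder {
    int id;
    struct list_head head;
};

static void list_init(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

static void list_add_tail(struct list_head *entry, struct list_head *list)
{
    entry->prev = list->prev;
    entry->next = list;
    list->prev->next = entry;
    list->prev = entry;
}

/* Copy every encoder on the list into save[], like the quoted loop.
 * Unlike the quoted loop, this sketch stops after max entries. */
static size_t save_encoders(struct list_head *list, struct encoder *save,
                            size_t max)
{
    size_t count = 0;
    struct list_head *pos;

    assert(list != NULL && save != NULL);

    /* Each iteration trusts pos->next; a corrupted pointer here is
     * dereferenced by the structure copy below. */
    for (pos = list->next; pos != list && count < max; pos = pos->next)
        save[count++] = *container_of(pos, struct encoder, head);

    return count;
}
```

With an intact list of two encoders, save_encoders() returns 2 and save[] holds the copies in list order.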


The original bug reports:

HDMI output using radeon driver freezes when used as only output
  http://bugzilla.opensuse.org/show_bug.cgi?id=904932

13.2 doesn't boot: "BUG: unable to handle kernel NULL pointer deference..." / "unable to handle kernel paging request"
  http://bugzilla.opensuse.org/show_bug.cgi?id=901550

Kernel 3.15 + radeon + hdmi = system freeze
  http://bugzilla.opensuse.org/show_bug.cgi?id=884390


FWIW, I tested the very same system (openSUSE 13.2) on a couple of machines with radeon, but couldn't reproduce the problem.  I noticed that all three reports above are with rv710, while my systems are newer ones (TURKS, ARUBA), so this issue might be specific to the chip model.
Comment 2 Takashi Iwai 2014-11-13 14:26:39 UTC
I'll ask reporters to test with the latest 3.18-rc.
The stack trace looks different, but let's hope that's because of a different trigger path due to memory corruption.

Is there any way to work around this without that patch?  Would passing radeon.audio=0 do?
Comment 3 Alex Deucher 2014-11-13 14:27:54 UTC
Yes, radeon.audio=0 will avoid that path as well.
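
For reference, a sketch of how that workaround is typically applied. The radeon.audio=0 form goes on the kernel command line; the modprobe.d form below is the equivalent for the module when it is loaded by modprobe. The file name is only an example, and if radeon is loaded from the initrd, the initrd likely needs to be regenerated for the option to take effect.

```
# Kernel command line (append to the existing boot options):
radeon.audio=0

# Or, as a module option in e.g. /etc/modprobe.d/radeon-audio.conf
# (hypothetical file name):
options radeon audio=0
```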
Comment 4 Takashi Iwai 2014-12-02 07:25:12 UTC
The reporter confirmed that 3.18-rc4 works fine.  Thanks!
