Bug 97369 - AMDGPU/Iceland hangs kernel 4.8-rc2
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2016-08-16 18:03 UTC by Armin K
Modified: 2016-08-16 18:37 UTC

Description Armin K 2016-08-16 18:03:12 UTC
Since updating to the 4.8 series, the kernel hangs when Xorg tries to start up. I've tracked the issue to the amdgpu driver. Blacklisting the driver makes the hang go away.

systemd-journald has recorded the following:

Aug 16 19:50:19 krejzi kernel: amdgpu 0000:01:00.0: Refused to change power state, currently in D3
Aug 16 19:50:19 krejzi kernel: amdgpu 0000:01:00.0: Refused to change power state, currently in D3
Aug 16 19:50:19 krejzi kernel: amdgpu 0000:01:00.0: Refused to change power state, currently in D3
Aug 16 19:50:21 krejzi kernel: amdgpu 0000:01:00.0: Wait for MC idle timedout !
Aug 16 19:50:22 krejzi kernel: amdgpu 0000:01:00.0: Wait for MC idle timedout !
Aug 16 19:50:22 krejzi kernel: [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
Aug 16 19:50:24 krejzi kernel: BUG: unable to handle kernel paging request at ffffc91c00763fec
Aug 16 19:50:24 krejzi kernel: IP: [<ffffffffa0084aad>] iceland_smu_populate_single_firmware_entry+0x4d/0x100 [amdgpu]
Aug 16 19:50:24 krejzi kernel: PGD 23700e067 PUD 0 
Aug 16 19:50:24 krejzi kernel: Oops: 0002 [#1] PREEMPT SMP
Aug 16 19:50:24 krejzi kernel: Modules linked in: iwlmvm amdkfd amdgpu ttm iwlwifi intel_vbtn
Aug 16 19:50:24 krejzi kernel: CPU: 3 PID: 129 Comm: kworker/3:2 Not tainted 4.8.0-rc2-krejzi #1
Aug 16 19:50:24 krejzi kernel: Hardware name: HP HP ProBook 470 G3/8102, BIOS N78 Ver. 01.11 05/09/2016
Aug 16 19:50:24 krejzi kernel: Workqueue: pm pm_runtime_work
Aug 16 19:50:24 krejzi kernel: task: ffff8802365b9b00 task.stack: ffff88023627c000
Aug 16 19:50:24 krejzi kernel: RIP: 0010:[<ffffffffa0084aad>]  [<ffffffffa0084aad>] iceland_smu_populate_single_firmware_entry+0x4d/0x100 [amdgpu]
Aug 16 19:50:24 krejzi kernel: RSP: 0018:ffff88023627fc40  EFLAGS: 00010202
Aug 16 19:50:24 krejzi kernel: RAX: 000000008002e000 RBX: ffff880231500000 RCX: 000000000000007e
Aug 16 19:50:24 krejzi kernel: RDX: ffffc91c00763fec RSI: 0000000000000003 RDI: 0000000000002180
Aug 16 19:50:24 krejzi kernel: RBP: ffffc90000764000 R08: 0000000000000002 R09: ffff88023627fc34
Aug 16 19:50:24 krejzi kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff880231717500
Aug 16 19:50:24 krejzi kernel: R13: 0000000000008000 R14: 0000000000000000 R15: 0000000000000246
Aug 16 19:50:24 krejzi kernel: FS:  0000000000000000(0000) GS:ffff8802404c0000(0000) knlGS:0000000000000000
Aug 16 19:50:24 krejzi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 19:50:24 krejzi kernel: CR2: ffffc91c00763fec CR3: 0000000001e07000 CR4: 00000000003406e0
Aug 16 19:50:24 krejzi kernel: Stack:
Aug 16 19:50:24 krejzi kernel:  0000000000000000 ffff8802315005bc ffff880231500000 ffffffffa0085156
Aug 16 19:50:24 krejzi kernel:  ffff0040ffffffff ffff880231500000 ffff8802315037b8 0000000000000048
Aug 16 19:50:24 krejzi kernel:  0000000000000002 ffff88023627fdd0 ffff880231500000 ffffffffa008578c
Aug 16 19:50:24 krejzi kernel: Call Trace:
Aug 16 19:50:24 krejzi kernel:  [<ffffffffa0085156>] ? iceland_smu_start+0x336/0x5d0 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  [<ffffffffa008578c>] ? iceland_dpm_resume+0x1c/0x40 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  [<ffffffffa004e5a8>] ? amdgpu_resume+0x58/0xa0 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  [<ffffffffa00513e3>] ? amdgpu_resume_kms+0xa3/0x370 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  [<ffffffffa004e12c>] ? amdgpu_pmops_runtime_resume+0x6c/0xa0 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81362023>] ? pci_pm_runtime_resume+0x73/0xa0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81509760>] ? vga_switcheroo_set_dynamic_switch+0x80/0x80
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81518688>] ? __rpm_callback+0x28/0x60
Aug 16 19:50:24 krejzi kernel:  [<ffffffff815186da>] ? rpm_callback+0x1a/0x70
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81509760>] ? vga_switcheroo_set_dynamic_switch+0x80/0x80
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81519733>] ? rpm_resume+0x3e3/0x5f0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff81061427>] ? __switch_to+0x37/0x580
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c4cbe>] ? _raw_spin_unlock_irq+0xe/0x20
Aug 16 19:50:24 krejzi kernel:  [<ffffffff8151a06e>] ? pm_runtime_work+0x4e/0xa0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810d51d2>] ? process_one_work+0x1c2/0x400
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810d5452>] ? worker_thread+0x42/0x4c0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c071b>] ? __schedule+0x2db/0x6a0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810d5410>] ? process_one_work+0x400/0x400
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810da758>] ? kthread+0xc8/0xe0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c533f>] ? ret_from_fork+0x1f/0x40
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810da690>] ? kthread_worker_fn+0x170/0x170
Aug 16 19:50:24 krejzi kernel: Code: 05 48 8b 8c 0b e0 65 00 00 48 85 c9 0f 84 b7 00 00 00 48 8b 49 08 48 05 2f 03 00 00 48 c1 e0 05 48 8b 44 03 08 8b 79 14 8b 49 10 <66> 89 32 c7 42 0c 00 00 00 00 66 89 4a 02 48 89 c1 48 c1 e9 20 
Aug 16 19:50:24 krejzi kernel: RIP  [<ffffffffa0084aad>] iceland_smu_populate_single_firmware_entry+0x4d/0x100 [amdgpu]
Aug 16 19:50:24 krejzi kernel:  RSP <ffff88023627fc40>
Aug 16 19:50:24 krejzi kernel: CR2: ffffc91c00763fec
Aug 16 19:50:24 krejzi kernel: ---[ end trace 176c593915795723 ]---
Aug 16 19:50:24 krejzi kernel: general protection fault: 0000 [#2] PREEMPT SMP
Aug 16 19:50:24 krejzi kernel: Modules linked in: iwlmvm amdkfd amdgpu ttm iwlwifi intel_vbtn
Aug 16 19:50:24 krejzi kernel: CPU: 3 PID: 129 Comm: kworker/3:2 Tainted: G      D         4.8.0-rc2-krejzi #1
Aug 16 19:50:24 krejzi kernel: Hardware name: HP HP ProBook 470 G3/8102, BIOS N78 Ver. 01.11 05/09/2016
Aug 16 19:50:24 krejzi kernel: task: ffff8802365b9b00 task.stack: ffff88023627c000
Aug 16 19:50:24 krejzi kernel: RIP: 0010:[<ffffffff810fe430>]  [<ffffffff810fe430>] queued_spin_lock_slowpath+0x150/0x190
Aug 16 19:50:24 krejzi kernel: RSP: 0018:ffff88023627fe70  EFLAGS: 00010082
Aug 16 19:50:24 krejzi kernel: RAX: ccccccccccce3fbc RBX: ffff88023627ff18 RCX: ffff8802404d72c0
Aug 16 19:50:24 krejzi kernel: RDX: 0000000000002d47 RSI: ffffffff81c4c914 RDI: 0000000000100000
Aug 16 19:50:24 krejzi kernel: RBP: 0000000000000000 R08: 00000000000000ff R09: 0000000000000000
Aug 16 19:50:24 krejzi kernel: R10: 00000000b98c6018 R11: ffff880237003200 R12: ffff8802404d72c0
Aug 16 19:50:24 krejzi kernel: R13: 0000000000000046 R14: ffffc91c00763fec R15: 0000000000000009
Aug 16 19:50:24 krejzi kernel: FS:  0000000000000000(0000) GS:ffff8802404c0000(0000) knlGS:0000000000000000
Aug 16 19:50:24 krejzi kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 19:50:24 krejzi kernel: CR2: 0000000000000028 CR3: 00000002240c9000 CR4: 00000000003406e0
Aug 16 19:50:24 krejzi kernel: Stack:
Aug 16 19:50:24 krejzi kernel:  ffff88023627ff18 0000000000000282 0000000000000000 ffffffff818c4e61
Aug 16 19:50:24 krejzi kernel:  ffff88023627ff18 ffff88023627ff10 ffffffff810f7103 ffff8802365ba210
Aug 16 19:50:24 krejzi kernel:  ffff8802365b9b00 0000000000000000 ffffffff810bcf9c ffff8802365b9b00
Aug 16 19:50:24 krejzi kernel: Call Trace:
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c4e61>] ? _raw_spin_lock_irqsave+0x31/0x40
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810f7103>] ? complete+0x13/0x40
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810bcf9c>] ? mm_release+0x9c/0x120
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810c28ea>] ? do_exit+0x70a/0xad0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c6a47>] ? rewind_stack_do_exit+0x17/0x20
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810da690>] ? kthread_worker_fn+0x170/0x170
Aug 16 19:50:24 krejzi kernel: Code: b8 01 00 00 00 66 89 03 5b 5d 41 5c c3 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 04 48 63 d2 48 05 c0 72 01 00 48 03 04 d5 60 03 ea 81 <48> 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 
Aug 16 19:50:24 krejzi kernel: RIP  [<ffffffff810fe430>] queued_spin_lock_slowpath+0x150/0x190
Aug 16 19:50:24 krejzi kernel:  RSP <ffff88023627fe70>
Aug 16 19:50:24 krejzi kernel: ---[ end trace 176c593915795724 ]---
Aug 16 19:50:24 krejzi kernel: Fixing recursive fault but reboot is needed!
Aug 16 19:50:24 krejzi kernel: BUG: scheduling while atomic: kworker/3:2/129/0x00000003
Aug 16 19:50:24 krejzi kernel: Modules linked in: iwlmvm amdkfd amdgpu ttm iwlwifi intel_vbtn
Aug 16 19:50:24 krejzi kernel: Preemption disabled at:[<          (null)>]           (null)
Aug 16 19:50:24 krejzi kernel: 
Aug 16 19:50:24 krejzi kernel: CPU: 3 PID: 129 Comm: kworker/3:2 Tainted: G      D         4.8.0-rc2-krejzi #1
Aug 16 19:50:24 krejzi kernel: Hardware name: HP HP ProBook 470 G3/8102, BIOS N78 Ver. 01.11 05/09/2016
Aug 16 19:50:24 krejzi kernel:  0000000000000086 00000000b523baa8 ffffffff8132745c ffff8802404d67c0
Aug 16 19:50:24 krejzi kernel:  00000000000167c0 ffffffff810de8a7 ffffffff818c09bf ffff8802365b9b00
Aug 16 19:50:24 krejzi kernel:  00000000b523baa8 ffff880236280000 000000000000000b 0000000000000000
Aug 16 19:50:24 krejzi kernel: Call Trace:
Aug 16 19:50:24 krejzi kernel:  [<ffffffff8132745c>] ? dump_stack+0x46/0x5a
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810de8a7>] ? __schedule_bug+0x57/0xb0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c09bf>] ? __schedule+0x57f/0x6a0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c0b16>] ? schedule+0x36/0x90
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810c2a96>] ? do_exit+0x8b6/0xad0
Aug 16 19:50:24 krejzi kernel:  [<ffffffff818c6a47>] ? rewind_stack_do_exit+0x17/0x20
Aug 16 19:50:24 krejzi kernel:  [<ffffffff810da690>] ? kthread_worker_fn+0x170/0x170
Aug 16 19:50:24 krejzi kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
Aug 16 19:50:24 krejzi kernel: IP: [<ffffffff810dab57>] kthread_data+0x7/0x10
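
For readers triaging the first trace: the `Oops: 0002` field is the x86 page-fault error code. Below is a minimal Python sketch of decoding it. The function and table names (`decode_oops_code`, `oops_code_from_line`, `PF_BITS`) are hypothetical helpers written for this illustration and are not part of any kernel tooling.

```python
import re

# Bit meanings of the x86 page-fault error code printed after "Oops:".
# Each entry: (mask, meaning if the bit is set, meaning if it is clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a list of human-readable facts encoded in the error code."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


def oops_code_from_line(line):
    """Extract the 4-digit hex error code from an 'Oops: XXXX' log line."""
    match = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    line = "Aug 16 19:50:24 krejzi kernel: Oops: 0002 [#1] PREEMPT SMP"
    print(decode_oops_code(oops_code_from_line(line)))
```

For the code in this report, the result describes a kernel-mode write to a page that is not present. That fits the register dump, where CR2 (the faulting address) and RDX both hold ffffc91c00763fec.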



Relevant dmesg output from amdgpu load time (also obtained from the journal):

Aug 16 19:49:49 krejzi kernel: [drm] amdgpu kernel modesetting enabled.
Aug 16 19:49:49 krejzi kernel: vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
Aug 16 19:49:49 krejzi kernel: ATPX version 1, functions 0x00000003
Aug 16 19:49:49 krejzi kernel: ATPX Hybrid Graphics
Aug 16 19:49:49 krejzi kernel: iwlwifi 0000:03:00.0: loaded firmware version 22.361476.0 op_mode iwlmvm
Aug 16 19:49:49 krejzi kernel: CRAT table not found
Aug 16 19:49:49 krejzi kernel: Finished initializing topology ret=0
Aug 16 19:49:49 krejzi kernel: kfd kfd: Initialized module
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
Aug 16 19:49:49 krejzi kernel: [drm] initializing kernel modesetting (TOPAZ 0x1002:0x6900 0x103C:0x811C 0x83).
Aug 16 19:49:49 krejzi kernel: [drm] register mmio base: 0xE2000000
Aug 16 19:49:49 krejzi kernel: [drm] register mmio size: 262144
Aug 16 19:49:49 krejzi kernel: [drm] doorbell mmio base: 0xE0000000
Aug 16 19:49:49 krejzi kernel: [drm] doorbell mmio size: 2097152
Aug 16 19:49:49 krejzi kernel: [drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
Aug 16 19:49:49 krejzi kernel: [drm] probing mlw for device 8086:9d10 = 1724843
Aug 16 19:49:49 krejzi kernel: vga_switcheroo: enabled
Aug 16 19:49:49 krejzi kernel: ATOM BIOS: HP/Quanta
Aug 16 19:49:49 krejzi kernel: [drm] GPU not posted. posting now...
Aug 16 19:49:49 krejzi kernel: [drm] Changing default dispclk from 0Mhz to 600Mhz
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: GTT: 2048M 0x0000000080000000 - 0x00000000FFFFFFFF
Aug 16 19:49:49 krejzi kernel: [drm] Detected VRAM RAM=2048M, BAR=256M
Aug 16 19:49:49 krejzi kernel: [drm] RAM width 64bits DDR3
Aug 16 19:49:49 krejzi kernel: [TTM] Zone  kernel: Available graphics memory: 4027936 kiB
Aug 16 19:49:49 krejzi kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
Aug 16 19:49:49 krejzi kernel: [TTM] Initializing pool allocator
Aug 16 19:49:49 krejzi kernel: [TTM] Initializing DMA pool allocator
Aug 16 19:49:49 krejzi kernel: [drm] amdgpu: 2048M of VRAM memory ready
Aug 16 19:49:49 krejzi kernel: [drm] amdgpu: 2048M of GTT memory ready.
Aug 16 19:49:49 krejzi kernel: [drm] GART: num cpu pages 524288, num gpu pages 524288
Aug 16 19:49:49 krejzi kernel: [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
Aug 16 19:49:49 krejzi kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Aug 16 19:49:49 krejzi kernel: [drm] Driver supports precise vblank timestamp query.
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: amdgpu: using MSI.
Aug 16 19:49:49 krejzi kernel: [drm] amdgpu: irq initialized.
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000010, cpu addr 0xffff880231523010
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000020, cpu addr 0xffff880231523020
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000030, cpu addr 0xffff880231523030
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000040, cpu addr 0xffff880231523040
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000050, cpu addr 0xffff880231523050
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000080000060, cpu addr 0xffff880231523060
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000080000070, cpu addr 0xffff880231523070
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000080000080, cpu addr 0xffff880231523080
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000080000090, cpu addr 0xffff880231523090
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000800000a0, cpu addr 0xffff8802315230a0
Aug 16 19:49:49 krejzi kernel: amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000000800000b0, cpu addr 0xffff8802315230b0
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 0 succeeded in 15 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 1 succeeded in 19 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 2 succeeded in 15 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 3 succeeded in 5 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 4 succeeded in 2 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 5 succeeded in 2 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 6 succeeded in 2 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 7 succeeded in 3 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 8 succeeded in 2 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 9 succeeded in 6 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ring test on 10 succeeded in 4 usecs
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 0 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 1 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 2 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 3 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 4 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 5 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 6 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 7 succeeded
Aug 16 19:49:49 krejzi kernel: [drm] ib test on ring 8 succeeded
Aug 16 19:49:49 krejzi kernel: [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
Aug 16 19:49:49 krejzi kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (1000).
Aug 16 19:49:49 krejzi kernel: [drm:sdma_v2_4_ring_test_ib [amdgpu]] *ERROR* amdgpu: fence wait failed (1000).
Aug 16 19:49:49 krejzi kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 10 (1000).
Aug 16 19:49:49 krejzi kernel: [drm] Initialized amdgpu 3.3.0 20150101 for 0000:01:00.0 on minor 1
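
The load-time log above already shows IB test failures on rings 9 and 10 before the later resume crash. Below is a small Python sketch for pulling that summary out of a saved journal. The name `summarize_ib_tests` and the regular expressions are assumptions based only on the message formats visible in this report.

```python
import re

# Patterns matched against the message text seen in this report's log.
OK_RE = re.compile(r"ib test on ring (\d+) succeeded")
FAIL_RE = re.compile(r"failed testing IB on ring (\d+) \((-?\d+)\)")


def summarize_ib_tests(lines):
    """Return (passed_rings, failed_rings) from amdgpu IB test log lines.

    passed_rings is a list of ring numbers; failed_rings maps a ring
    number to the value printed in parentheses after the failure message.
    """
    passed, failed = [], {}
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            failed[int(match.group(1))] = int(match.group(2))
            continue
        match = OK_RE.search(line)
        if match:
            passed.append(int(match.group(1)))
    return passed, failed


if __name__ == "__main__":
    sample = [
        "kernel: [drm] ib test on ring 8 succeeded",
        "kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: "
        "failed testing IB on ring 9 (1000).",
        "kernel: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: "
        "failed testing IB on ring 10 (1000).",
    ]
    print(summarize_ib_tests(sample))
```

Run against the full log in this report, this would list rings 0 through 8 as passed and rings 9 and 10 as failed. Both failures are reported from `sdma_v2_4_ring_test_ib`.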


The GPU is a Radeon R7 M340 (Topaz/Iceland) in a hybrid graphics configuration.

Comment 2 Armin K 2016-08-16 18:37:41 UTC
Applying the patch linked from Comment 1 fixes the issue.

