Bug 110883 - [Regression linux 5.2-rc4][bisected] hang on boot
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Duplicates: 110906
 
Reported: 2019-06-10 13:02 UTC by Sibren Vasse
Modified: 2019-06-17 11:44 UTC

Attachments
bisect log (2.92 KB, text/plain)
2019-06-10 13:02 UTC, Sibren Vasse
dmesg (149.92 KB, text/plain)
2019-06-10 13:03 UTC, Sibren Vasse
xorg log (4.65 KB, text/plain)
2019-06-10 13:03 UTC, Sibren Vasse
patch (1.02 KB, patch)
2019-06-10 18:28 UTC, Sibren Vasse

Description Sibren Vasse 2019-06-10 13:02:35 UTC
Created attachment 144491 [details]
bisect log

Laptop Hardware:
Model: HP EliteBook 8570w
GPU: Chelsea XT GL [FirePro M4000]

Relevant kernel command line:
amdgpu.si_support=1 amdgpu.dc=1 radeon.si_support=0

After upgrading from 5.2-rc3 to 5.2-rc4, the laptop hangs during boot: Xorg does not start and I'm unable to switch to a TTY.

---
Jun 10 12:36:32 hostname kernel: amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
Jun 10 12:36:32 hostname kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
Jun 10 12:36:32 hostname kernel: [drm] amdgpu: finishing device.
Jun 10 12:36:32 hostname kernel: [drm] amdgpu atom LVDS backlight unloaded
Jun 10 12:36:32 hostname kernel: ------------[ cut here ]------------
Jun 10 12:36:32 hostname kernel: Memory manager not clean during takedown.
Jun 10 12:36:32 hostname kernel: WARNING: CPU: 1 PID: 226 at drivers/gpu/drm/drm_mm.c:939 drm_mm_takedown+0x1f/0x30 [drm]
Jun 10 12:36:32 hostname kernel: Modules linked in: amdgpu(+) amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
Jun 10 12:36:32 hostname kernel: CPU: 1 PID: 226 Comm: modprobe Not tainted 5.2.0-rc4-mainline #6
Jun 10 12:36:32 hostname kernel: Hardware name: Hewlett-Packard HP EliteBook 8570w/176B, BIOS 68IAV Ver. F.70 07/30/2018
Jun 10 12:36:32 hostname kernel: RIP: 0010:drm_mm_takedown+0x1f/0x30 [drm]
Jun 10 12:36:32 hostname kernel: Code: c4 ce d1 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 48 ad 1d c0 e8 9b c7 ce d1 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00
Jun 10 12:36:32 hostname kernel: RSP: 0018:ffffb846c1cc7990 EFLAGS: 00010286
Jun 10 12:36:32 hostname kernel: RAX: 0000000000000000 RBX: ffff92bb29eb4060 RCX: 0000000000000000
Jun 10 12:36:32 hostname kernel: RDX: 0000000000000000 RSI: 0000000000000082 RDI: 00000000ffffffff
Jun 10 12:36:32 hostname kernel: RBP: ffff92bb29eb41f0 R08: 00000000000002ae R09: 0000000000000001
Jun 10 12:36:32 hostname kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff92bb2b104f00
Jun 10 12:36:32 hostname kernel: R13: ffff92bb2b104fe8 R14: 0000000000000170 R15: ffff92bb29ce1020
Jun 10 12:36:32 hostname kernel: FS:  00007f6a1ec00740(0000) GS:ffff92bb2da40000(0000) knlGS:0000000000000000
Jun 10 12:36:32 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 10 12:36:32 hostname kernel: CR2: 00007ffe870f1fc0 CR3: 0000000429ec6003 CR4: 00000000001606e0
Jun 10 12:36:32 hostname kernel: Call Trace:
Jun 10 12:36:32 hostname kernel:  amdgpu_vram_mgr_fini+0x2d/0xa0 [amdgpu]
Jun 10 12:36:32 hostname kernel:  ttm_bo_clean_mm+0xa9/0xb0 [ttm]
Jun 10 12:36:32 hostname kernel:  amdgpu_ttm_fini+0x71/0xd0 [amdgpu]
Jun 10 12:36:32 hostname kernel:  amdgpu_bo_fini+0xe/0x30 [amdgpu]
Jun 10 12:36:32 hostname kernel:  gmc_v6_0_sw_fini+0x26/0x50 [amdgpu]
Jun 10 12:36:32 hostname kernel:  amdgpu_device_fini+0x257/0x43d [amdgpu]
Jun 10 12:36:32 hostname kernel:  amdgpu_driver_unload_kms+0xb2/0x150 [amdgpu]
Jun 10 12:36:32 hostname kernel:  amdgpu_driver_load_kms.cold.2+0x5a/0xc4 [amdgpu]
Jun 10 12:36:32 hostname kernel:  drm_dev_register+0x10d/0x150 [drm]
Jun 10 12:36:32 hostname kernel:  amdgpu_pci_probe+0xcb/0x130 [amdgpu]
Jun 10 12:36:32 hostname kernel:  ? _raw_spin_unlock_irqrestore+0x20/0x40
Jun 10 12:36:32 hostname kernel:  local_pci_probe+0x42/0x80
Jun 10 12:36:32 hostname kernel:  ? pci_match_device+0xc5/0x100
Jun 10 12:36:32 hostname kernel:  pci_device_probe+0x112/0x190
Jun 10 12:36:32 hostname kernel:  really_probe+0xef/0x390
Jun 10 12:36:32 hostname kernel:  driver_probe_device+0xb4/0x100
Jun 10 12:36:32 hostname kernel:  device_driver_attach+0x4f/0x60
Jun 10 12:36:32 hostname kernel:  __driver_attach+0x86/0x140
Jun 10 12:36:32 hostname kernel:  ? device_driver_attach+0x60/0x60
Jun 10 12:36:32 hostname kernel:  ? device_driver_attach+0x60/0x60
Jun 10 12:36:32 hostname kernel:  bus_for_each_dev+0x77/0xc0
Jun 10 12:36:32 hostname kernel:  bus_add_driver+0x14a/0x1e0
Jun 10 12:36:32 hostname kernel:  ? 0xffffffffc0694000
Jun 10 12:36:32 hostname kernel:  driver_register+0x6b/0xb0
Jun 10 12:36:32 hostname kernel:  ? 0xffffffffc0694000
Jun 10 12:36:32 hostname kernel:  do_one_initcall+0x46/0x224
Jun 10 12:36:32 hostname kernel:  ? kmem_cache_alloc_trace+0x33/0x1c0
Jun 10 12:36:32 hostname kernel:  ? do_init_module+0x22/0x220
Jun 10 12:36:32 hostname kernel:  do_init_module+0x5a/0x220
Jun 10 12:36:32 hostname kernel:  load_module+0x2049/0x2300
Jun 10 12:36:32 hostname kernel:  ? vfs_read+0x116/0x140
Jun 10 12:36:32 hostname kernel:  ? __se_sys_finit_module+0x97/0xf0
Jun 10 12:36:32 hostname kernel:  __se_sys_finit_module+0x97/0xf0
Jun 10 12:36:32 hostname kernel:  do_syscall_64+0x5b/0x1a0
Jun 10 12:36:32 hostname kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 10 12:36:32 hostname kernel: RIP: 0033:0x7f6a1ed2097d
Jun 10 12:36:32 hostname kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 94 0c 00 f7 d8 64 89 01 48
Jun 10 12:36:32 hostname kernel: RSP: 002b:00007ffc58e63b78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Jun 10 12:36:32 hostname kernel: RAX: ffffffffffffffda RBX: 000056130b231da0 RCX: 00007f6a1ed2097d
Jun 10 12:36:32 hostname kernel: RDX: 0000000000000000 RSI: 000056130b234250 RDI: 000000000000000e
Jun 10 12:36:32 hostname kernel: RBP: 000056130b234250 R08: 0000000000000000 R09: 0000000000000000
Jun 10 12:36:32 hostname kernel: R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
Jun 10 12:36:32 hostname kernel: R13: 000056130b231c60 R14: 0000000000060000 R15: 000056130b231da0
Jun 10 12:36:32 hostname kernel: ---[ end trace eaa2ccb642b1fac6 ]---
Jun 10 12:36:32 hostname kernel: [TTM] Finalizing pool allocator
Jun 10 12:36:32 hostname kernel: [TTM] Finalizing DMA pool allocator
Jun 10 12:36:32 hostname kernel: [TTM] Zone  kernel: Used memory at exit: 50 KiB
Jun 10 12:36:32 hostname kernel: [TTM] Zone   dma32: Used memory at exit: 0 KiB
Jun 10 12:36:32 hostname kernel: [drm] amdgpu: ttm finalized
Jun 10 12:36:32 hostname kernel: amdgpu: probe of 0000:01:00.0 failed with error -22
----
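
Read bottom-up, the call trace in the log above goes from module load (load_module, amdgpu_pci_probe) through the unload path (amdgpu_driver_unload_kms) to amdgpu_vram_mgr_fini, where drm_mm_takedown emits the warning. As a rough aid for reading such logs, the sketch below pulls the reliable frames out of journal-style lines. It is only an illustration: the function name call_trace, the regular expression, and the "built-in" label for frames without a module tag are assumptions of this sketch, not part of any kernel tooling. Frames prefixed with "? " are skipped because the unwinder does not vouch for them.

```python
import re

# One stack frame, e.g. " amdgpu_vram_mgr_fini+0x2d/0xa0 [amdgpu]".
# The negative lookahead rejects frames that start with "? ".
FRAME_RE = re.compile(
    r"kernel:\s+(?!\? )(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$"
)

def call_trace(lines):
    """Return (function, module) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match:
            # "built-in" is a label chosen here for frames with no [module] tag.
            frames.append((match.group(1), match.group(2) or "built-in"))
        elif "? " not in line:
            break  # first line that is not a frame ends the trace
    return frames
```

Feeding the lines of the log above to call_trace and printing only the frames whose module is amdgpu, ttm, or drm gives a compact view of the teardown path.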

I've bisected the issue between tags v5.2-rc3 and v5.2-rc4.
1929059893022a3bbed43934c7313e66aad7346b is the first bad commit. The issue is not present on v5.2-rc4 with this commit reverted. Bisect log attached.
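
For readers unfamiliar with bisecting, the idea can be sketched as a binary search over the commits between a known-good and a known-bad point. This is a simplified model under stated assumptions: it treats history as a linear list, whereas git bisect works on the real commit graph, and the names first_bad and is_bad as well as the commit labels in the usage note are hypothetical. In practice is_bad means building and booting the kernel at that commit and checking whether it hangs.

```python
def first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    commits[0] must be known good and commits[-1] known bad.
    is_bad(commit) reports whether a given commit shows the regression.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression was introduced at or before mid
        else:
            lo = mid  # regression was introduced after mid
    return commits[hi]
```

With a hypothetical history such as ["v5.2-rc3", "a", "b", "c", "d", "v5.2-rc4"], each test halves the remaining range, so the number of kernels to build and boot grows only logarithmically with the number of commits between the two tags.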
Comment 1 Sibren Vasse 2019-06-10 13:03:06 UTC
Created attachment 144492 [details]
dmesg
Comment 2 Sibren Vasse 2019-06-10 13:03:43 UTC
Created attachment 144493 [details]
xorg log
Comment 3 Sibren Vasse 2019-06-10 18:28:38 UTC
Created attachment 144495 [details] [review]
patch

I've created this patch, which fixes the issue for me.
Can someone take a look at it and consider including it?
Comment 4 Michel Dänzer 2019-06-11 08:47:45 UTC
Please add a reference to this bug report in the commit log, and send the patch to the amd-gfx mailing list for review.
Comment 5 Sibren Vasse 2019-06-11 10:46:14 UTC
A similar patch was already submitted to amd-gfx (https://lists.freedesktop.org/archives/amd-gfx/2019-June/034946.html).
However, applying that patch to v5.2-rc4 does not solve the issue.
Comment 6 Michel Dänzer 2019-06-12 16:23:08 UTC
Does https://patchwork.freedesktop.org/patch/309712/ work?
Comment 7 Sibren Vasse 2019-06-12 16:53:18 UTC
> Does https://patchwork.freedesktop.org/patch/309712/ work?

Yes, it does.
Comment 8 Michel Dänzer 2019-06-13 07:35:15 UTC
*** Bug 110906 has been marked as a duplicate of this bug. ***
Comment 9 Paul Menzel 2019-06-13 11:54:54 UTC
I also confirm that this fixes the problem on the Dell OptiPlex 5040.
Comment 10 Sibren Vasse 2019-06-17 11:44:00 UTC
The fix is included in 5.2-rc5, and the issue is no longer present.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f3a5231c8f14acd42845e9e60f506b4e948f0e68

