Created attachment 137596 [details] dmesg

device:
00:02.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] [1002:683d]
00:0a.0 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series] [1002:aab0]

uname -a:
Linux localhost.localdomain 4.15.4-300.fc27.x86_64 #1 SMP Mon Feb 19 23:31:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linu

kernel params include radeon.si_support=0 amdgpu.si_support=1

The host also contains a Tonga, passed through to another VM.

Trace (full dmesg also attached):
[ 1.486955] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
[ 1.486971] IP: drm_pcie_get_speed_cap_mask+0x35/0xe0 [drm]
[ 1.486972] PGD 0 P4D 0
[ 1.486975] Oops: 0000 [#1] SMP PTI
[ 1.486977] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) virtio_console virtio_net virtio_blk crc32c_intel chash i2c_algo_bit drm_kms_helper ttm serio_raw drm ata_generic qemu_fw_cfg virtio_pci pata_acpi virtio_rng virtio_ring virtio
[ 1.486987] CPU: 0 PID: 324 Comm: systemd-udevd Not tainted 4.15.4-300.fc27.x86_64 #1
[ 1.486989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 1.486996] RIP: 0010:drm_pcie_get_speed_cap_mask+0x35/0xe0 [drm]
[ 1.486998] RSP: 0018:ffffa663c089b908 EFLAGS: 00010286
[ 1.486999] RAX: ffff8f38ba5a6800 RBX: ffff8f38b3890000 RCX: 0000000000000000
[ 1.487003] RDX: 0000000000000000 RSI: ffffa663c089b998 RDI: ffff8f38b385c000
[ 1.487004] RBP: 0000000000000000 R08: ffffc715c4ce2600 R09: 0000000000040000
[ 1.487006] R10: 0000000000140000 R11: 0000000000000000 R12: 0000000000000003
[ 1.487007] R13: ffff8f38b2e0a9c8 R14: ffff8f38b2e00000 R15: 0000000000000000
[ 1.487009] FS: 00007f1bcd4a91c0(0000) GS:ffff8f38bfc00000(0000) knlGS:0000000000000000
[ 1.487011] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.487012] CR2: 000000000000003c CR3: 0000000135546004 CR4: 00000000001606f0
[ 1.487016] Call Trace:
[ 1.487065] si_dpm_sw_init+0x330/0x15d0 [amdgpu]
[ 1.487070] ? request_threaded_irq+0xad/0x160
[ 1.487074] ? printk+0x52/0x6e
[ 1.487101] amdgpu_device_init+0xcb4/0x15e0 [amdgpu]
[ 1.487105] ? kmalloc_order+0x14/0x40
[ 1.487130] amdgpu_driver_load_kms+0x86/0x2d0 [amdgpu]
[ 1.487155] drm_dev_register+0x132/0x1c0 [drm]
[ 1.487180] amdgpu_pci_probe+0x10a/0x140 [amdgpu]
[ 1.487184] local_pci_probe+0x42/0xa0
[ 1.487190] ? pci_assign_irq+0x27/0x130
[ 1.487192] pci_device_probe+0x141/0x1b0
[ 1.487196] driver_probe_device+0x315/0x480
[ 1.487198] __driver_attach+0xa0/0xe0
[ 1.487201] ? driver_probe_device+0x480/0x480
[ 1.487203] bus_for_each_dev+0x6b/0xb0
[ 1.487205] bus_add_driver+0x1c2/0x260
[ 1.487207] ? 0xffffffffc07b6000
[ 1.487209] driver_register+0x57/0xc0
[ 1.487211] ? 0xffffffffc07b6000
[ 1.487214] do_one_initcall+0x4e/0x190
[ 1.487218] ? _cond_resched+0x15/0x40
[ 1.487220] ? kmem_cache_alloc_trace+0xac/0x1b0
[ 1.487223] ? do_init_module+0x22/0x201
[ 1.487226] do_init_module+0x5b/0x201
[ 1.487228] load_module+0x26b1/0x2b60
[ 1.487231] ? SYSC_init_module+0x160/0x190
[ 1.487233] ? _cond_resched+0x15/0x40
[ 1.487235] SYSC_init_module+0x160/0x190
[ 1.487238] do_syscall_64+0x75/0x180
[ 1.487240] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1.487243] RIP: 0033:0x7f1bccda71da
[ 1.487244] RSP: 002b:00007ffec1e4f598 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 1.487246] RAX: ffffffffffffffda RBX: 0000555c28533860 RCX: 00007f1bccda71da
[ 1.487248] RDX: 0000555c28531780 RSI: 00000000005748f3 RDI: 0000555c28de0b10
[ 1.487250] RBP: 0000555c28531780 R08: 0000000000000005 R09: 00007ffec1e4dd23
[ 1.487251] R10: 0000000000000005 R11: 0000000000000246 R12: 0000555c28de0b10
[ 1.487253] R13: 0000555c285317b0 R14: 0000000000020000 R15: 0000000000000000
[ 1.487255] Code: 10 c7 06 00 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 87 c0 01 00 00 48 85 c0 74 18 48 8b 40 10 48 8b 68 38 <0f> b7 45 3c 66 3d 06 11 74 06 66 3d 66 11 75 1c b8 ea ff ff ff
[ 1.487281] RIP: drm_pcie_get_speed_cap_mask+0x35/0xe0 [drm] RSP: ffffa663c089b908
[ 1.487283] CR2: 000000000000003c
[ 1.487296] ---[ end trace 81fa2514df506ee9 ]---
Created attachment 137609 [details] [review] possible fix This patch should fix it.
Created attachment 137644 [details] [review] possible fix Fix includes.
Sorry for the delay. I can confirm this fixes the NULL issue. pp_dpm_mclk / pp_dpm_sclk look empty to me; I'm not sure whether that is just because they are not hooked up yet for SI. Since I don't need DPM and AMDGPU now boots with the default config to a usable state, I'd consider this fixed. (Leaving open because I don't know if the patch has landed yet; feel free to close when you push it.)
SI still uses the legacy dpm code rather than powerplay so it doesn't expose all the same options as newer chips. SI also has an older smu implementation so it has a more limited feature set compared to CI and VI.
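For anyone who wants to check what their card actually exposes, here is a minimal sketch that prints the DPM-related sysfs files discussed above. The helper name show_dpm_files and the default path /sys/class/drm/card0/device are assumptions for illustration (the card index may differ on your system), and power_dpm_state / power_dpm_force_performance_level are listed only as files that may be present; the script simply reports whatever it finds.

```shell
#!/bin/sh
# Sketch only: report which DPM-related sysfs files exist and whether they have content.
# show_dpm_files is a hypothetical helper name; the default device path is an assumption.
show_dpm_files() {
    dev=${1:-/sys/class/drm/card0/device}
    for f in pp_dpm_sclk pp_dpm_mclk power_dpm_state power_dpm_force_performance_level; do
        if [ ! -e "$dev/$f" ]; then
            # The driver did not create this file for this device.
            echo "$f: not present"
            continue
        fi
        # Read the content rather than testing the file size, because sysfs
        # files do not report a meaningful size.
        content=$(cat "$dev/$f" 2>/dev/null)
        if [ -z "$content" ]; then
            echo "$f: (empty)"
        else
            printf '%s: %s\n' "$f" "$content"
        fi
    done
}
```

Calling show_dpm_files with no argument inspects card0; pass another device directory if the GPU is not the first DRM device.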
Sorry for the noise: I also have a NULL dereference on a Cape Verde, but not at bootup. My bug is https://bugs.freedesktop.org/show_bug.cgi?id=102553. Can somebody check that bug? It blocks me from switching from radeon to amdgpu. Thank you very much.
I can confirm I have the same issue with my GPU passed through to a VM. I'm not sure how to test the possible fix. I have never applied a patch, but I found a few pointers online. I did the following, but my card still has the same issue. If possible, please review and let me know how to apply the proposed fix to my system.

git clone git://anongit.freedesktop.org/drm/drm-amd
cd drm-amd
nano 0001-drm-amdgpu-used-cached-pcie-gen-info-for-SI-v2.patch
(copy content of possible fix into the file generated above)
git am --signoff < 0001-drm-amdgpu-used-cached-pcie-gen-info-for-SI-v2.patch
make defconfig
make
make install
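As a rough illustration of the patch-and-build steps being asked about, the sketch below checks that a saved patch applies before committing it with git am. The helper name apply_patch is made up for this example and the paths are placeholders. The sketch assumes the attachment was downloaded as a file rather than pasted into an editor, since copy/paste can damage whitespace, and it makes no claim about whether a given tree already contains the fix.

```shell
#!/bin/sh
# Sketch only: apply a downloaded patch to a kernel git tree and report the result.
# apply_patch is a hypothetical helper name, not part of git or the kernel tree.
apply_patch() {
    tree=$1    # path to the kernel checkout
    patch=$2   # absolute path to the saved patch file

    # Dry run first: this fails if the patch does not apply, for example
    # because the tree already contains the change or the file was damaged.
    if ! git -C "$tree" apply --check "$patch"; then
        echo "patch does not apply cleanly to $tree" >&2
        return 1
    fi

    # Apply as a commit; expects mail-style headers as produced by git format-patch.
    git -C "$tree" am "$patch" || return 1

    # Show the new top commit so it is obvious the patch is in the tree.
    git -C "$tree" log --oneline -1
}

# After a successful apply, one common build and install sequence from inside
# the tree is (install steps as root, then reboot into the new kernel):
#   cp /boot/config-"$(uname -r)" .config   # start from the running kernel's config
#   make olddefconfig
#   make -j"$(nproc)"
#   make modules_install
#   make install
```

Two things in the steps quoted above may be worth checking, though this is a guess from the command list alone: make defconfig builds a generic configuration that may not match the distribution kernel's driver selection, and make install without make modules_install does not install the rebuilt modules, so the patched amdgpu module may never have been loaded.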