Summary: hard crash of amdgpu in 4.20-rc
Product: DRI
Reporter: Dan Horák <dan>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
Severity: normal
Priority: medium
CC: bcrocker
Version: unspecified
Hardware: PowerPC
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=108585
Created attachment 142474: full dmesg output

I'm seeing hard crashes (taking down the whole system) in the amdgpu driver in 4.20-rc kernels (starting around rc1). This is on a POWER9 Talos system with a Radeon Pro WX4100.

After running "modprobe amdgpu" on a system booted with "modprobe.blacklist=amdgpu", I got the following and the system stopped responding:

lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu kernel modesetting enabled.
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: enabling device (0540 -> 0542)
lis 15 12:40:56 talos.danny.cz kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67E3 0x1002:0x0B0D 0x00).
lis 15 12:40:56 talos.danny.cz kernel: [drm] register mmio base: 0x00000000
lis 15 12:40:56 talos.danny.cz kernel: [drm] register mmio size: 262144
lis 15 12:40:56 talos.danny.cz kernel: [drm] PCI I/O BAR is not found.
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 0 <vi_common>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 1 <gmc_v8_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 2 <tonga_ih>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 3 <gfx_v8_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 4 <sdma_v3_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 5 <powerplay>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 6 <dm>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 7 <uvd_v6_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] add ip block number 8 <vce_v3_0>
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD is enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD ENC is enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: [drm] VCE enabled in VM mode
lis 15 12:40:56 talos.danny.cz kernel: ATOM BIOS: 113-D0150600-103
lis 15 12:40:56 talos.danny.cz kernel: [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
lis 15 12:40:56 talos.danny.cz kernel: amdgpu: No suitable DMA available
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2: releasing [mem 0x6000010000000-0x60000101fffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0: releasing [mem 0x6000000000000-0x600000fffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: BAR 15: releasing [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: BAR 15: assigned [mem 0x6000000000000-0x600017fffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0: assigned [mem 0x6000000000000-0x60000ffffffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2: assigned [mem 0x6000100000000-0x60001001fffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: PCI bridge to [bus 01]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff]
lis 15 12:40:56 talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
lis 15 12:40:56 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
lis 15 12:40:56 talos.danny.cz kernel: [drm] Detected VRAM RAM=4096M, BAR=4096M
lis 15 12:40:56 talos.danny.cz kernel: [drm] RAM width 128bits GDDR5
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Zone kernel: Available graphics memory: 33386016 kiB
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB
lis 15 12:40:56 talos.danny.cz kernel: [TTM] Initializing pool allocator
lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu: 4096M of VRAM memory ready
lis 15 12:40:56 talos.danny.cz kernel: [drm] amdgpu: 4096M of GTT memory ready.
lis 15 12:40:56 talos.danny.cz kernel: [drm] GART: num cpu pages 4096, num gpu pages 65536
lis 15 12:40:56 talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F4008D0000).
lis 15 12:40:56 talos.danny.cz kernel: [drm] Chained IB support enabled!
lis 15 12:40:56 talos.danny.cz kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
lis 15 12:40:56 talos.danny.cz kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
lis 15 12:40:56 talos.danny.cz kernel: amdgpu: [powerplay] dpm has been enabled
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: values for Engine clock
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 214000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 517000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 845000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1049000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1099000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1136000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1175000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1201000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: Validation clocks:
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: engine_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: memory_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: level : 8
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: reducing engine clock level from 8 to 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: values for Memory clock
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 300000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: 1500000
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: Validation clocks:
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: engine_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: memory_max_clock: 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: level : 8
lis 15 12:40:56 talos.danny.cz kernel: [drm] DM_PPLIB: reducing memory clock level from 2 to 0
lis 15 12:40:56 talos.danny.cz kernel: [drm] Display Core initialized with v3.1.68!
lis 15 12:40:56 talos.danny.cz kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
lis 15 12:40:56 talos.danny.cz kernel: [drm] Driver supports precise vblank timestamp query.
lis 15 12:40:56 talos.danny.cz kernel: [drm] UVD and UVD ENC initialized successfully.
lis 15 12:40:58 talos.danny.cz kernel: [drm] VCE initialized successfully.
lis 15 12:40:58 talos.danny.cz kernel: [drm] Cannot find any crtc or sizes
lis 15 12:40:58 talos.danny.cz kernel: Unable to handle kernel paging request for data at address 0xc000001369cefffc
lis 15 12:40:58 talos.danny.cz kernel: Faulting instruction address: 0xc008000011b8be54
lis 15 12:40:58 talos.danny.cz kernel: Oops: Kernel access of bad area, sig: 11 [#1]
lis 15 12:40:58 talos.danny.cz kernel: LE SMP NR_CPUS=1024 NUMA PowerNV
lis 15 12:40:58 talos.danny.cz kernel: Modules linked in: amdgpu(+) mfd_core chash gpu_sched i2c_algo_bit ttm drm_kms_helper drm drm_panel_orientation_quirks fb_sys_fops syscopyarea sysfillrect sysimgblt xt_CHECKSUM ipt_MASQUERADE tun kvm_hv kvm devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc dm_crypt snd_hda_codec_realtek snd_hda_codec_generic at24 snd_hda_codec_hdmi snd_hda_intel regmap_i2c snd_hda_codec ipmi_powernv ipmi_devintf i2c_opal snd_hda_core i2c_core snd_hwdep snd_seq vmx_crypto snd_seq_device snd_pcm ses enclosure ipmi_msghandler snd_timer scsi_transport_sas snd ofpart powernv_flash mtd rtc_opal opal_prd crct10dif_vpmsum soundcore raid1 aacraid tg3 crc32c_vpmsum
lis 15 12:40:58 talos.danny.cz kernel: CPU: 0 PID: 338 Comm: kworker/0:2 Not tainted 4.20.0-rc2+ #1
lis 15 12:40:58 talos.danny.cz kernel: Workqueue: events work_for_cpu_fn
lis 15 12:40:58 talos.danny.cz kernel: NIP: c008000011b8be54 LR: c008000011b7885c CTR: c008000011b8bd68
lis 15 12:40:58 talos.danny.cz kernel: REGS: c0000007f84533c0 TRAP: 0300 Not tainted (4.20.0-rc2+)
lis 15 12:40:58 talos.danny.cz kernel: MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 84002482 XER: 20040000
lis 15 12:40:58 talos.danny.cz kernel: CFAR: c008000011b8c6fc DAR: c000001369cefffc DSISR: 42000000 IRQMASK: 0
GPR00: c008000011b7885c c0000007f8453648 c008000011d69e00 c0000007f74bf67c
GPR04: 000000000001d524 00000000000249f0 c0000007f8453758 0000000020130307
GPR08: c000001369cefff4 c000000769cf0000 0000000000000001 0000000002100800
GPR12: c008000011b8bd68 c0000000018b0000 c000000000151e88 c0000007fe1f8340
GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000007f87d30c0
GPR20: c0000007f87d30c8 c0000007f87d30b8 c0000007f87d30d8 c0000007f87d30e0
GPR24: c0000007f87d30d0 c0000007f87dc528 0000000000000000 0000000000000001
GPR28: c000000769cf0000 c0000007f8453710 c0000007f74b2340 c000200721935c00
lis 15 12:40:58 talos.danny.cz kernel: NIP [c008000011b8be54] smu7_set_power_state_tasks+0xec/0xab0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: LR [c008000011b7885c] phm_set_power_state+0x64/0xc0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: Call Trace:
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453648] [c008000011b4ee7c] amdgpu_cgs_write_ind_register+0x84/0x170 [amdgpu] (unreliable)
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84536e8] [c008000011b7885c] phm_set_power_state+0x64/0xc0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453728] [c008000011ba0d48] psm_adjust_power_state_dynamic+0x130/0x270 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453788] [c008000011b764f0] hwmgr_handle_task+0x58/0x178 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84537c8] [c008000011bae29c] pp_late_init+0xa4/0x1f0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453868] [c008000011a318d8] amdgpu_device_ip_late_init+0x90/0x1b0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f84538f8] [c008000011a34cb8] amdgpu_device_init+0x1590/0x18e0 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453a08] [c008000011a3823c] amdgpu_driver_load_kms+0xb4/0x330 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453a88] [c008000010ccae30] drm_dev_register+0x1b8/0x280 [drm]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453b28] [c008000011a306bc] amdgpu_pci_probe+0x114/0x200 [amdgpu]
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453bb8] [c00000000070024c] local_pci_probe+0x6c/0x140
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453c48] [c000000000143b88] work_for_cpu_fn+0x38/0x60
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453c78] [c000000000148c40] process_one_work+0x250/0x500
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453d18] [c000000000149160] worker_thread+0x270/0x5b0
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453db8] [c00000000015202c] kthread+0x1ac/0x1c0
lis 15 12:40:58 talos.danny.cz kernel: [c0000007f8453e28] [c00000000000bdd0] ret_from_kernel_thread+0x5c/0x6c
lis 15 12:40:58 talos.danny.cz kernel: Instruction dump:
lis 15 12:40:58 talos.danny.cz kernel: 7d485378 7f872000 419e0464 39480001 38c6000c 794a0020 4200ffe4 1d08000c
lis 15 12:40:58 talos.danny.cz kernel: 81490d3c 614a0001 7d094214 91490d3c <90880008> 81490064 2faa0000 419e0880
lis 15 12:40:58 talos.danny.cz kernel: ---[ end trace d5e132cd328da1c7 ]---
lis 15 12:40:58 talos.danny.cz kernel:
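
For anyone triaging this, the oops above can be cross-checked by hand: the word marked `<...>` in the instruction dump is the faulting instruction, and its effective address should equal the DAR value. The sketch below is only an illustration, not part of the original report. It decodes that word as a Power ISA D-form instruction and recomputes the address from the GPR values printed in the oops. The helper name `decode_d_form` and the reading of r9 as an array base with 12-byte entries are assumptions made for this example.

```python
# Values copied by hand from the oops above.
FAULTING_WORD = 0x90880008   # word marked <...> in "Instruction dump"
DAR = 0xC000001369CEFFFC     # "DAR:" field, the faulting data address
GPR = {                      # only the registers this check needs
    4: 0x000000000001D524,
    8: 0xC000001369CEFFF4,
    9: 0xC000000769CF0000,
}
STW_OPCODE = 36              # Power ISA primary opcode for stw (store word)
MASK64 = (1 << 64) - 1


def decode_d_form(word):
    """Split a 32-bit D-form instruction into (opcode, RS, RA, displacement).

    Hypothetical helper for this example. It does not handle the RA=0
    special case, which does not occur in the word decoded here.
    """
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:        # sign-extend the 16-bit displacement
        disp -= 0x10000
    return opcode, rs, ra, disp


opcode, rs, ra, disp = decode_d_form(FAULTING_WORD)
effective_address = (GPR[ra] + disp) & MASK64

# Assumption: r9 holds a table base and r8 = r9 + 12 * index. If so, the
# distance between the two registers reveals the index that was used.
index, remainder = divmod(GPR[8] - GPR[9], 12)

print(f"opcode={opcode} (stw: {opcode == STW_OPCODE}) rs=r{rs} ra=r{ra} disp={disp}")
print(f"effective address {effective_address:#x}, matches DAR: {effective_address == DAR}")
print(f"implied index {index:#x}, remainder {remainder}")
```

If the array interpretation holds, an implied index of 0xffffffff would be consistent with a 32-bit value of -1 being used as a table index in smu7_set_power_state_tasks, but that is a reading of the register dump rather than something the report itself states.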