Created attachment 142356: dmesg-20181102

After updating to the latest polaris11 firmware (from linux-firmware-20181008-88.gitc6b6265d.fc28.noarch), I'm experiencing driver/card crashes. This is on a Power9 system running the 4.19.0 kernel. There were no such crashes with the previous firmware and all 4.19-pre kernels (or even earlier ones).

...
lis 02 11:54:00 talos.danny.cz kernel: EEH: Frozen PHB#0-PE#0 detected
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
lis 02 11:54:00 talos.danny.cz kernel: CPU: 35 PID: 3250 Comm: InputThread Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:00 talos.danny.cz kernel: Call Trace:
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf810] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf850] [c000000000040738] eeh_dev_check_failure+0x4a8/0x5d0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf8f0] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf930] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf990] [c00800000df22a68] dce_v11_0_lock_cursor+0x50/0xf0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf9d0] [c00800000df236a0] dce_v11_0_crtc_cursor_move+0x38/0x80 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfa10] [c00800000d11f248] drm_mode_cursor_common+0x1e0/0x2c0 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfae0] [c00800000d11f6c8] drm_mode_cursor_ioctl+0x50/0x70 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfb30] [c00800000d0f7f14] drm_ioctl_kernel+0xdc/0x170 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfb90] [c00800000d0f8414] drm_ioctl+0x20c/0x430 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfcd0] [c00800000de60078] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfd20] [c00000000040f0f4] do_vfs_ioctl+0xd4/0x8d0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfdc0] [c00000000040f9b4] ksys_ioctl+0xc4/0x110
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfe10] [c00000000040fa28] sys_ioctl+0x28/0x80
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfe30] [c00000000000b9e4] system_call+0x5c/0x70
lis 02 11:54:00 talos.danny.cz kernel: EEH: Detected PCI bus error on PHB#0-PE#0
lis 02 11:54:00 talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
lis 02 11:54:00 talos.danny.cz kernel: EEH: Notify device drivers to shutdown
lis 02 11:54:00 talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE#0 (PCI 0000:01:00.1): driver not EEH aware
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE#0 (PCI 0000:01:00.0): driver not EEH aware
lis 02 11:54:00 talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
lis 02 11:54:00 talos.danny.cz kernel: EEH: Collect temporary log
lis 02 11:54:00 talos.danny.cz kernel: EEH: of node=0000:01:00.1
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI device/vendor: aae01002
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI cmd/status register: 00100546
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 00: 0012a010 00008fa1 00002930 00400883
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 10: 10810000 00000000 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 20: 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 00: 32820001 00000000 00000000 00462030
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: of node=0000:01:00.0
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI device/vendor: 67e31002
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI cmd/status register: 00100546
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 00: 0012a010 00008fa1 00002930 00400883
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 10: 10810000 00000000 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 20: 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 00: 20020001 00000000 00000000 00462030
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000
lis 02 11:54:00 talos.danny.cz kernel: PHB4 PHB#0 Diag-data (Version: 1)
lis 02 11:54:00 talos.danny.cz kernel: brdgCtl: 00000002
lis 02 11:54:00 talos.danny.cz kernel: RootSts: 00060020 00402000 a0810008 00100107 00000800
lis 02 11:54:00 talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000
lis 02 11:54:00 talos.danny.cz kernel: Lem: 0000000100000080 0000000000000000 0000000000000080
lis 02 11:54:00 talos.danny.cz kernel: PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000
lis 02 11:54:00 talos.danny.cz kernel: RxeTceErr: 2000000000000000 2000000000000000 c000000000000000 0000000000000000
lis 02 11:54:00 talos.danny.cz kernel: PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000
lis 02 11:54:00 talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 8800001c00000000 0000000000000200
lis 02 11:54:00 talos.danny.cz kernel: PE[000] A/B: 8300b03800000000 8000000000000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: Reset with hotplug activity
lis 02 11:54:00 talos.danny.cz kernel: iommu: Removing device 0000:01:00.1 from group 0
lis 02 11:54:00 talos.danny.cz kernel: pci 0000:01:00.1: Dropping the link to 0000:01:00.0
lis 02 11:54:00 talos.danny.cz kernel: [drm] amdgpu: finishing device.
lis 02 11:54:04 talos.danny.cz kernel: EEH: 2100000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:04 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:04 talos.danny.cz kernel: CPU: 8 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:04 talos.danny.cz kernel: Call Trace:
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516ed0] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516f10] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516fb0] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516ff0] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517050] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517070] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517120] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85171b0] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de75020] atom_op_calltable+0x128/0x1e0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517320] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517410] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517450] [c00800000de96fc0] amdgpu_atombios_encoder_setup_dig_transmitter+0x1d8/0xc10 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517540] [c00800000de97e58] amdgpu_atombios_encoder_dpms+0x1a0/0x5a0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85175d0] [c00800000df255a4] dce_v11_0_encoder_disable+0x2c/0x160 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517640] [c00800000d4803e8] drm_encoder_disable+0x60/0xc0 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517670] [c00800000d4804c8] __drm_helper_disable_unused_functions+0x80/0x160 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:05 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:05 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D860 (len 824, WS 0, PS 0) @ 0xD9E0
lis 02 11:54:05 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D71A (len 326, WS 0, PS 0) @ 0xD80A
lis 02 11:54:07 talos.danny.cz kernel: EEH: 4200000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:07 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:07 talos.danny.cz kernel: CPU: 9 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:07 talos.danny.cz kernel: Call Trace:
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c154] amdgpu_atombios_crtc_blank+0x4c/0x70 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517560] [c00800000df23ff8] dce_v11_0_crtc_dpms+0x170/0x1b0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:08 talos.danny.cz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=134395, emitted seq=134397
lis 02 11:54:08 talos.danny.cz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=54428, emitted seq=54430
lis 02 11:54:08 talos.danny.cz kernel: [drm] GPU recovery disabled.
lis 02 11:54:08 talos.danny.cz kernel: [drm] GPU recovery disabled.
lis 02 11:54:10 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:10 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C1E0 (len 116, WS 0, PS 0) @ 0xC22D
lis 02 11:54:10 talos.danny.cz kernel: EEH: 6300000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:10 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:10 talos.danny.cz kernel: CPU: 11 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:10 talos.danny.cz kernel: Call Trace:
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c0e0] amdgpu_atombios_crtc_enable+0x48/0x70 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517560] [c00800000df24014] dce_v11_0_crtc_dpms+0x18c/0x1b0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:13 talos.danny.cz kernel: EEH: 8400000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:13 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:13 talos.danny.cz kernel: CPU: 11 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:13 talos.danny.cz kernel: Call Trace:
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c0e0] amdgpu_atombios_crtc_enable+0x48/0x70 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517560] [c00800000df24014] dce_v11_0_crtc_dpms+0x18c/0x1b0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:15 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:15 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C254 (len 62, WS 0, PS 0) @ 0xC270
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: segfault (11) at 28 nip 122708cfc lr 122708db0 code 1 in alsactl[1226f0000+20000]
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: code: 4bfee1f5 e8410018 00000000 01000000 00000280 3c4c0003 3842f220 7c0802a6
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: code: fbc1fff0 f8010010 f821ffc1 7c7e1b78 <81240000> 2f890000 409d0048 fba10028
lis 02 12:01:43 talos.danny.cz kernel: opal-power: Poweroff requested
For the record, this is with a Radeon Pro WX 4100.
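The EEH dump identifies both PCI functions by the first dword of their config space, printed as `PCI device/vendor`. In that dword the device ID is the upper 16 bits and the vendor ID is the lower 16 bits. Likewise, `PCI cmd/status register` holds the status register in the upper half and the command register in the lower half. Below is a minimal Python sketch that splits those values. The helper name `split_dword` is hypothetical, and the input values are copied from the log.

```python
from typing import Tuple


def split_dword(value: int) -> Tuple[int, int]:
    """Split a 32-bit config-space dword into (upper 16 bits, lower 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


# "PCI device/vendor" values from the EEH dump, per PCI function.
for bdf, pci_id in (("0000:01:00.0", 0x67E31002), ("0000:01:00.1", 0xAAE01002)):
    device, vendor = split_dword(pci_id)
    print(f"{bdf}: vendor=0x{vendor:04x} device=0x{device:04x}")

# "PCI cmd/status register" (identical for both functions in this log).
status, command = split_dword(0x00100546)
print(f"status=0x{status:04x} command=0x{command:04x}")
```

Both functions report vendor 0x1002 (AMD/ATI). Function 0 is device 0x67e3, which matches the WX 4100 GPU. Function 1 is device 0xaae0, presumably the card's HDMI/DP audio function.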
Does this patch help? https://patchwork.freedesktop.org/patch/259364/ Can you bisect which firmware commit (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git) caused the regression, and then narrow down which individual firmware file update causes the issue? I'd start with the smc firmware, then try the rlc, followed by the CP (mec, me, pfp, ce).
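The two-step procedure suggested above can be sketched in Python: bisect the linux-firmware commits, then update one firmware file at a time. This is a hypothetical helper, not an existing tool. `is_bad(commit)` and `crashes_with_only(path)` stand in for a manual test run, and the `amdgpu/polaris11_<block>.bin` naming is an assumption about the linux-firmware layout. In practice, `git bisect` performs the first step.

```python
from typing import Callable, Optional, Sequence

# Order suggested in the thread: SMC first, then RLC, then the CP blocks.
BLOCK_ORDER = ("smc", "rlc", "mec", "me", "pfp", "ce")


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, the way git bisect does.

    Assumes commits run oldest to newest, commits[0] is known good,
    and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at or before mid
        else:
            good = mid  # regression is after mid
    return commits[bad]


def firmware_file(block: str, chip: str = "polaris11") -> str:
    # Assumed file naming inside the linux-firmware tree.
    return f"amdgpu/{chip}_{block}.bin"


def first_bad_block(crashes_with_only: Callable[[str], bool]) -> Optional[str]:
    """Update one firmware file at a time, keeping all others at the
    known-good version, and return the first block whose new file alone
    reproduces the crash (None if no single file does)."""
    for block in BLOCK_ORDER:
        if crashes_with_only(firmware_file(block)):
            return block
    return None
```

This crash has no clear reproducer and has sometimes taken ~10 days to recur. A "good" verdict at any step therefore only means that no crash was seen during the test window, so each step needs a long soak time.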
OK, will try both. It has crashed twice since last Thursday, when I updated the firmware, so it might take me some time to get better info. There isn't a clear reproducer.
Testing with 4.20-pre kernels is not possible due to bug 108754 :-(
I got a new crash today, after ~10 days without issues. Again, it happened while I was scrolling a page in Firefox.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/588.