Bug 108644 - driver/card crashes with latest polaris11 firmware
Summary: driver/card crashes with latest polaris11 firmware
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: PowerPC Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2018-11-03 15:32 UTC by Dan Horák
Modified: 2019-11-19 09:01 UTC

Attachments
dmesg-20181102 (222.34 KB, text/plain)
2018-11-03 15:32 UTC, Dan Horák

Description Dan Horák 2018-11-03 15:32:43 UTC
Created attachment 142356
dmesg-20181102

After updating to the latest polaris11 firmware (from linux-firmware-20181008-88.gitc6b6265d.fc28.noarch), I'm experiencing driver/card crashes. This is on a Power9 system running the 4.19.0 kernel. There were no such crashes with the previous firmware and all 4.19-pre kernels (or even earlier ones).

...
lis 02 11:54:00 talos.danny.cz kernel: EEH: Frozen PHB#0-PE#0 detected
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
lis 02 11:54:00 talos.danny.cz kernel: CPU: 35 PID: 3250 Comm: InputThread Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:00 talos.danny.cz kernel: Call Trace:
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf810] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf850] [c000000000040738] eeh_dev_check_failure+0x4a8/0x5d0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf8f0] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf930] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf990] [c00800000df22a68] dce_v11_0_lock_cursor+0x50/0xf0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bf9d0] [c00800000df236a0] dce_v11_0_crtc_cursor_move+0x38/0x80 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfa10] [c00800000d11f248] drm_mode_cursor_common+0x1e0/0x2c0 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfae0] [c00800000d11f6c8] drm_mode_cursor_ioctl+0x50/0x70 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfb30] [c00800000d0f7f14] drm_ioctl_kernel+0xdc/0x170 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfb90] [c00800000d0f8414] drm_ioctl+0x20c/0x430 [drm]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfcd0] [c00800000de60078] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfd20] [c00000000040f0f4] do_vfs_ioctl+0xd4/0x8d0
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfdc0] [c00000000040f9b4] ksys_ioctl+0xc4/0x110
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfe10] [c00000000040fa28] sys_ioctl+0x28/0x80
lis 02 11:54:00 talos.danny.cz kernel: [c0002006af9bfe30] [c00000000000b9e4] system_call+0x5c/0x70
lis 02 11:54:00 talos.danny.cz kernel: EEH: Detected PCI bus error on PHB#0-PE#0
lis 02 11:54:00 talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
lis 02 11:54:00 talos.danny.cz kernel: EEH: Notify device drivers to shutdown
lis 02 11:54:00 talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE#0 (PCI 0000:01:00.1): driver not EEH aware
lis 02 11:54:00 talos.danny.cz kernel: EEH: PE#0 (PCI 0000:01:00.0): driver not EEH aware
lis 02 11:54:00 talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
lis 02 11:54:00 talos.danny.cz kernel: EEH: Collect temporary log
lis 02 11:54:00 talos.danny.cz kernel: EEH: of node=0000:01:00.1
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI device/vendor: aae01002
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI cmd/status register: 00100546
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 00: 0012a010 00008fa1 00002930 00400883 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 10: 10810000 00000000 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 20: 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 00: 32820001 00000000 00000000 00462030 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: of node=0000:01:00.0
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI device/vendor: 67e31002
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI cmd/status register: 00100546
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 00: 0012a010 00008fa1 00002930 00400883 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 10: 10810000 00000000 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E 20: 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 00: 20020001 00000000 00000000 00462030 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000 
lis 02 11:54:00 talos.danny.cz kernel: PHB4 PHB#0 Diag-data (Version: 1)
lis 02 11:54:00 talos.danny.cz kernel: brdgCtl:    00000002
lis 02 11:54:00 talos.danny.cz kernel: RootSts:    00060020 00402000 a0810008 00100107 00000800
lis 02 11:54:00 talos.danny.cz kernel: PhbSts:     0000001c00000000 0000001c00000000
lis 02 11:54:00 talos.danny.cz kernel: Lem:        0000000100000080 0000000000000000 0000000000000080
lis 02 11:54:00 talos.danny.cz kernel: PhbErr:     0000028000000000 0000020000000000 2148000098000240 a008400000000000
lis 02 11:54:00 talos.danny.cz kernel: RxeTceErr:  2000000000000000 2000000000000000 c000000000000000 0000000000000000
lis 02 11:54:00 talos.danny.cz kernel: PblErr:     0000000000020000 0000000000020000 0000000000000000 0000000000000000
lis 02 11:54:00 talos.danny.cz kernel: RegbErr:    0000004000000000 0000004000000000 8800001c00000000 0000000000000200
lis 02 11:54:00 talos.danny.cz kernel: PE[000] A/B: 8300b03800000000 8000000000000000
lis 02 11:54:00 talos.danny.cz kernel: EEH: Reset with hotplug activity
lis 02 11:54:00 talos.danny.cz kernel: iommu: Removing device 0000:01:00.1 from group 0
lis 02 11:54:00 talos.danny.cz kernel: pci 0000:01:00.1: Dropping the link to 0000:01:00.0
lis 02 11:54:00 talos.danny.cz kernel: [drm] amdgpu: finishing device.
lis 02 11:54:04 talos.danny.cz kernel: EEH: 2100000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:04 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:04 talos.danny.cz kernel: CPU: 8 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:04 talos.danny.cz kernel: Call Trace:
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516ed0] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516f10] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516fb0] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8516ff0] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517050] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517070] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517120] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85171b0] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de75020] atom_op_calltable+0x128/0x1e0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517320] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517410] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517450] [c00800000de96fc0] amdgpu_atombios_encoder_setup_dig_transmitter+0x1d8/0xc10 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517540] [c00800000de97e58] amdgpu_atombios_encoder_dpms+0x1a0/0x5a0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85175d0] [c00800000df255a4] dce_v11_0_encoder_disable+0x2c/0x160 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517640] [c00800000d4803e8] drm_encoder_disable+0x60/0xc0 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517670] [c00800000d4804c8] __drm_helper_disable_unused_functions+0x80/0x160 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:04 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:05 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:05 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D860 (len 824, WS 0, PS 0) @ 0xD9E0
lis 02 11:54:05 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D71A (len 326, WS 0, PS 0) @ 0xD80A
lis 02 11:54:07 talos.danny.cz kernel: EEH: 4200000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:07 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:07 talos.danny.cz kernel: CPU: 9 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:07 talos.danny.cz kernel: Call Trace:
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c154] amdgpu_atombios_crtc_blank+0x4c/0x70 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517560] [c00800000df23ff8] dce_v11_0_crtc_dpms+0x170/0x1b0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:07 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:08 talos.danny.cz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=134395, emitted seq=134397
lis 02 11:54:08 talos.danny.cz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=54428, emitted seq=54430
lis 02 11:54:08 talos.danny.cz kernel: [drm] GPU recovery disabled.
lis 02 11:54:08 talos.danny.cz kernel: [drm] GPU recovery disabled.
lis 02 11:54:10 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:10 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C1E0 (len 116, WS 0, PS 0) @ 0xC22D
lis 02 11:54:10 talos.danny.cz kernel: EEH: 6300000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:10 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:10 talos.danny.cz kernel: CPU: 11 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:10 talos.danny.cz kernel: Call Trace:
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c0e0] amdgpu_atombios_crtc_enable+0x48/0x70 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517560] [c00800000df24014] dce_v11_0_crtc_dpms+0x18c/0x1b0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:10 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:13 talos.danny.cz kernel: EEH: 8400000 reads ignored for recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0
lis 02 11:54:13 talos.danny.cz kernel: EEH: Might be infinite loop in amdgpu driver
lis 02 11:54:13 talos.danny.cz kernel: CPU: 11 PID: 335 Comm: eehd Not tainted 4.19.0-1.fc30.op.1.ppc64le #1
lis 02 11:54:13 talos.danny.cz kernel: Call Trace:
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517120] [c000000000be3f9c] dump_stack+0xb0/0xf4 (unreliable)
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517160] [c000000000040640] eeh_dev_check_failure+0x3b0/0x5d0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517200] [c0000000000408ec] eeh_check_failure+0x8c/0xd0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517240] [c00800000de61998] amdgpu_mm_rreg+0x240/0x2a0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85172a0] [c00800000de68904] cail_reg_read+0x2c/0x50 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85172c0] [c00800000de7123c] atom_get_src_int+0x104/0xa00 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517370] [c00800000de72b10] atom_op_test+0xd8/0x1d0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517400] [c00800000de74d7c] amdgpu_atom_execute_table_locked+0x204/0x380 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85174f0] [c00800000de758f8] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517530] [c00800000de6c0e0] amdgpu_atombios_crtc_enable+0x48/0x70 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517560] [c00800000df24014] dce_v11_0_crtc_dpms+0x18c/0x1b0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85175a0] [c00800000df28e50] dce_v11_0_crtc_disable+0x38/0x2e0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517670] [c00800000d48050c] __drm_helper_disable_unused_functions+0xc4/0x160 [drm_kms_helper]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85176b0] [c00800000d481ad0] drm_crtc_helper_set_config+0x978/0xb70 [drm_kms_helper]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85177c0] [c00800000de7f958] amdgpu_display_crtc_set_config+0x70/0x1c0 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517800] [c00800000d0ff274] __drm_mode_set_config_internal+0xac/0x1a0 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517850] [c00800000d0ff450] drm_crtc_force_disable+0x88/0xa0 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85178a0] [c00800000d0ff4e4] drm_crtc_force_disable_all+0x7c/0x100 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85178e0] [c00800000e0553f4] amdgpu_device_fini+0xa0/0x628 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517990] [c00800000de67b04] amdgpu_driver_unload_kms+0x6c/0x100 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f85179c0] [c00800000d0fa978] drm_dev_unregister+0x80/0x170 [drm]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a00] [c00800000de6055c] amdgpu_pci_remove+0x34/0x80 [amdgpu]
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a30] [c0000000006d91dc] pci_device_remove+0x6c/0x120
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517a70] [c000000000790410] device_release_driver_internal+0x290/0x370
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517ac0] [c0000000006cc718] pci_stop_bus_device+0xb8/0x110
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517b00] [c0000000006cc918] pci_stop_and_remove_bus_device+0x28/0x40
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517b30] [c000000000066ac0] pci_hp_remove_devices+0x90/0x130
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517bc0] [c000000000045f40] eeh_reset_device+0xa0/0x1f4
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517c50] [c0000000000455c8] eeh_handle_normal_event+0x2b8/0x650
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517d10] [c000000000046710] eeh_event_handler+0x1c0/0x1e0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517dc0] [c00000000014900c] kthread+0x1ac/0x1c0
lis 02 11:54:13 talos.danny.cz kernel: [c0000007f8517e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
lis 02 11:54:15 talos.danny.cz kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
lis 02 11:54:15 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C254 (len 62, WS 0, PS 0) @ 0xC270
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: segfault (11) at 28 nip 122708cfc lr 122708db0 code 1 in alsactl[1226f0000+20000]
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: code: 4bfee1f5 e8410018 00000000 01000000 00000280 3c4c0003 3842f220 7c0802a6 
lis 02 11:57:13 talos.danny.cz kernel: alsactl[2517]: code: fbc1fff0 f8010010 f821ffc1 7c7e1b78 <81240000> 2f890000 409d0048 fba10028 
lis 02 12:01:43 talos.danny.cz kernel: opal-power: Poweroff requested
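
For triage, the EEH "reads ignored" counters and the "atombios stuck" errors in the log above can be pulled out with a short script. This is only a minimal sketch: the function name `summarize` and the `SAMPLE` list are illustrative and not part of any kernel or distribution tooling, and the patterns are written against the message wording seen in this particular log, so other kernel versions may format these lines differently.

```python
import re

# Patterns assume the exact message wording seen in the log above.
EEH_IGNORED = re.compile(
    r"EEH: (\d+) reads ignored for recovering device .*driver=(\S+) pci addr=(\S+)"
)
ATOM_STUCK = re.compile(
    r"atombios stuck executing ([0-9A-F]+) \(len (\d+), WS \d+, PS \d+\) @ 0x([0-9A-F]+)"
)


def summarize(lines):
    """Return (ignored_reads, stuck_tables) found in an iterable of log lines.

    ignored_reads: list of (read_count, driver, pci_address)
    stuck_tables:  list of (table_start_hex, table_length, stuck_offset_hex)
    """
    ignored = []
    stuck = []
    for line in lines:
        match = EEH_IGNORED.search(line)
        if match:
            ignored.append((int(match.group(1)), match.group(2), match.group(3)))
            continue
        match = ATOM_STUCK.search(line)
        if match:
            stuck.append((match.group(1), int(match.group(2)), match.group(3)))
    return ignored, stuck


# Two lines copied from the log above, used as a small self-check.
SAMPLE = [
    "lis 02 11:54:04 talos.danny.cz kernel: EEH: 2100000 reads ignored for "
    "recovering device at location=unknown driver=amdgpu pci addr=0000:01:00.0",
    "lis 02 11:54:05 talos.danny.cz kernel: [drm:amdgpu_atom_execute_table_locked "
    "[amdgpu]] *ERROR* atombios stuck executing D860 (len 824, WS 0, PS 0) @ 0xD9E0",
]

if __name__ == "__main__":
    ignored_reads, stuck_tables = summarize(SAMPLE)
    print(ignored_reads)
    print(stuck_tables)
```

The same function could be fed the attached dmesg file line by line, for example with `summarize(open("dmesg-20181102"))`, assuming the attachment has been saved under that name.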
Comment 1 Dan Horák 2018-11-05 16:31:01 UTC
For the record, this is with a Radeon Pro WX 4100.
Comment 2 Alex Deucher 2018-11-05 16:45:43 UTC
Does this patch help?
https://patchwork.freedesktop.org/patch/259364/

Can you bisect which firmware commit (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git) caused the regression and then narrow down which firmware update causes the issue?  I'd start with the smc firmware and then try the rlc, followed by the CP (mec, me, pfp, ce).
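
The testing order suggested above (smc, then rlc, then the CP blocks mec, me, pfp, ce) can be written down as a small helper that sorts candidate firmware files into the order in which to swap them back one at a time. This is a hedged sketch, not a tool from the amdgpu or linux-firmware projects: `block_of` and `swap_order` are hypothetical names, and the file names in the example are assumed from the usual `polaris11_<block>.bin` naming, so they should be checked against the actual linux-firmware checkout.

```python
# Order taken from the comment above: smc first, then rlc, then the CP blocks.
PRIORITY = ["smc", "rlc", "mec", "me", "pfp", "ce"]


def block_of(filename):
    """Map a firmware file name to its block name, or None if not listed.

    Assumes names shaped like "<asic>_<block>[digits][_suffix].bin",
    e.g. "polaris11_mec2.bin" -> "mec". This naming is an assumption.
    """
    stem = filename.rsplit(".", 1)[0]
    for token in stem.split("_")[1:]:
        base = token.rstrip("0123456789")
        if base in PRIORITY:
            return base
    return None


def swap_order(filenames):
    """Return the files that belong to a listed block, in suggested test order."""
    ranked = []
    for name in filenames:
        block = block_of(name)
        if block is not None:
            ranked.append((PRIORITY.index(block), name))
    return [name for _, name in sorted(ranked)]


if __name__ == "__main__":
    # Illustrative file list; verify against the real amdgpu firmware directory.
    candidates = [
        "polaris11_ce.bin",
        "polaris11_rlc.bin",
        "polaris11_sdma.bin",
        "polaris11_smc.bin",
        "polaris11_mec2.bin",
        "polaris11_me.bin",
        "polaris11_pfp.bin",
    ]
    for name in swap_order(candidates):
        print(name)
```

Files for blocks outside the suggested list (such as sdma in the example) are left out rather than guessed at, since the comment gives no order for them.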
Comment 3 Dan Horák 2018-11-05 17:26:07 UTC
OK, will try both.

It has crashed twice since last Thursday, when I updated the firmware, so it might take me some time to get better information. There isn't a clear reproducer.
Comment 4 Dan Horák 2018-11-15 12:35:40 UTC
Testing with 4.20-pre kernels is not possible due to bug 108754 :-(
Comment 5 Dan Horák 2018-11-15 12:37:36 UTC
I got a new crash today, after ~10 days without an issue. Again, it happened while I was scrolling a page in Firefox.
Comment 6 Martin Peres 2019-11-19 09:01:50 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/588

