Env: kernel 5.0.13, AMD RX 580 GPU (8 GB)

We run about 32 game applications on the GPU concurrently and, at the same time, run a media encoder on VCE through VA-API. The kernel crashes after running for 3 to 7 days. We have hit this crash 5 times. kdump is enabled, so if you need other kernel dump information, we can upload it.

Log:
[172936.893428] binder_dkms: binder_deferred_func, binder_index = 12
[172937.052608] pci_generic_config_write32: 138 callbacks suppressed
[172937.052615] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 offset 0x4 may corrupt adjacent RW1C bits
[172937.052633] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.054110] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.062690] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.069361] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x78 may corrupt adjacent RW1C bits
[172937.071029] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x80 may corrupt adjacent RW1C bits
[172937.071034] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x8c may corrupt adjacent RW1C bits
[172937.071038] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x98 may corrupt adjacent RW1C bits
[172937.071042] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0xa0 may corrupt adjacent RW1C bits
[172937.071083] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.071091] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.071094] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.071110] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x4 may corrupt adjacent RW1C bits
[172937.079477] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.087723] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.095955] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.104270] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.112418] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.120490] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.128525] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.136557] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.144446] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.152254] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.160039] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.167779] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.175490] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.176747] binder_dkms: binder_defer_work 12
[172937.183035] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.183075] binder_dkms: binder_deferred_func, binder_index = 12
[172937.355641] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.362028] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.368253] pcieport 0004:48:00.0: can't derive routing for PCI INT A
[172937.368425] megaraid_sas 0004:49:00.0: PCI INT A: no GSI
[172937.368585] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.370828] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.375070] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.375167] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.375190] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.381702] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.381732] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.387959] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.387990] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.394171] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.394186] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.400498] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.400523] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.406755] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.406775] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.412977] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.412998] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.419250] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.419273] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.422361] [drm] schedsdma0 is not ready, skipping
[172937.422363] [drm] schedsdma1 is not ready, skipping
[172937.422448] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
[172937.425437] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.425450] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.431814] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.431837] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.438038] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.438054] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.450635] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.454268] pcieport 0002:e8:00.0: can't derive routing for PCI INT B
[172937.454272] ixgbe 0002:e9:00.1: PCI INT B: no GSI
[172937.456923] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.456967] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.463122] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.463148] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.469428] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.469455] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.475705] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.475722] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.481928] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.481946] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.488092] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.488108] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.494407] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.494440] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.500663] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.500678] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.506909] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.506930] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.511392] [drm] schedsdma0 is not ready, skipping
[172937.511394] [drm] schedsdma1 is not ready, skipping
[172937.511481] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
[172937.512346] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000008
[172937.512348] Mem abort info:
[172937.512350]   ESR = 0x96000004
[172937.512352]   Exception class = DABT (current EL), IL = 32 bits
[172937.512353]   SET = 0, FnV = 0
[172937.512354]   EA = 0, S1PTW = 0
[172937.512355] Data abort info:
[172937.512356]   ISV = 0, ISS = 0x00000004
[172937.512357]   CM = 0, WnR = 0
[172937.512359] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000fb340bc6
[172937.512361] [0000000000000008] pgd=000000139dfe9003, pud=00000015d583a003, pmd=0000000000000000
[172937.512367] Internal error: Oops: 96000004 [#1] SMP
[172937.512370] Modules linked in: nfnetlink_log veth ipt_REJECT nf_reject_ipv4 xt_comment xt_mark xt_nat xt_tcpudp ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_conntrack br_netfilter bridge stp llc iptable_filter xt_addrtype iptable_nat nf_nat_ipv4 nf_nat bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay nls_iso8859_1 joydev input_leds snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer ipmi_ssif snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binder_dkms(OE) ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear hibmc_drm hid_generic usbhid hid ses enclosure marvell aes_ce_blk aes_ce_cipher amdgpu chash i2c_algo_bit gpu_sched ttm crct10dif_ce drm_kms_helper ghash_ce syscopyarea sha2_ce
[172937.512432]  sysfillrect sysimgblt fb_sys_fops sha256_arm64 sha1_ce ixgbe drm hisi_sas_v2_hw hisi_sas_main megaraid_sas xfrm_algo libsas mdio ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[172937.512448] Process RenderThread (pid: 1569015, stack limit = 0x00000000349701c4)
[172937.512451] CPU: 23 PID: 1569015 Comm: RenderThread Kdump: loaded Tainted: G           OE     5.0.13-1905061257-generic #appstream
[172937.512453] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
[172937.512454] pstate: 80400005 (Nzcv daif +PAN -UAO)
[172937.512531] pc : amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512603] lr : amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512604] sp : ffff0000c0e4b8c0
[172937.512605] x29: ffff0000c0e4b8c0 x28: ffff801fd3010000
[172937.512608] x27: 0000000000000001 x26: ffff80161d777000
[172937.512610] x25: 0000000000100d1f x24: 0000000000100d00
[172937.512612] x23: 0000000000000000 x22: ffff809533c54f00
[172937.512614] x21: 0000000000000037 x20: ffff0000116cc000
[172937.512616] x19: 000000000000000a x18: 0000000000000000
[172937.512618] x17: 0000000000000000 x16: 0000000000000000
[172937.512620] x15: 0000000000000000 x14: 00000003000000b0
[172937.512623] x13: 0000000600000240 x12: 0000000000000000
[172937.512624] x11: 000000060000018d x10: 0000000000000040
[172937.512627] x9 : 0000000000000000 x8 : ffff0000c0e4b860
[172937.512628] x7 : 0000000000000020 x6 : 000000000000001f
[172937.512631] x5 : 0000000000100d1f x4 : 0000000000100d00
[172937.512633] x3 : 0000000000000000 x2 : 000000000000000b
[172937.512636] x1 : 0000000000000000 x0 : ffff80161d776000
[172937.512638] Call trace:
[172937.512711]  amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512784]  amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512857]  amdgpu_cs_ioctl+0xcbc/0x14a8 [amdgpu]
[172937.512882]  drm_ioctl_kernel+0x90/0x100 [drm]
[172937.512904]  drm_ioctl+0x1ec/0x418 [drm]
[172937.512977]  amdgpu_drm_ioctl+0x58/0x90 [amdgpu]
[172937.513055] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513136]  amdgpu_kms_compat_ioctl+0x40/0x68 [amdgpu]
[172937.513140] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513220] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513235] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513311] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513323] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513412] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513421] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513497] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513513] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513587] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513603] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513677] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513687] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513761] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.522750] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.522822] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.525171]  __arm64_compat_sys_ioctl+0x144/0x410
[172937.525177]  el0_svc_common+0x78/0x120
[172937.525179]  el0_svc_compat_handler+0x30/0x40
[172937.525182]  el0_svc_compat+0x8/0x34
[172937.525187] Code: f9406b41 b966c793 f941e800 71002e7f (f9400421)
[172937.525195] SMP: stopping secondary CPUs
[172937.526945] Starting crashdump kernel...
[172937.526952] Bye!
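As a rough cross-check of the oops above, the sketch below decodes two values taken from the log: the ESR (0x96000004) and the faulting instruction word shown in parentheses on the Code: line (f9400421). The helper names `decode_esr` and `decode_ldr_x_uimm` are made up for this illustration and are not part of any kernel tool; the bit layouts used are the AArch64 ESR field positions and the 64-bit LDR (immediate, unsigned offset) encoding.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into the fields the oops prints."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "il": (esr >> 25) & 1,         # 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 1,         # 0 = read, 1 = write (data aborts)
        "dfsc": iss & 0x3F,            # data fault status code
    }


def decode_ldr_x_uimm(insn: int):
    """Decode 'LDR Xt, [Xn, #imm]' (64-bit, unsigned offset).

    Returns (rt, rn, byte_offset), or None for any other encoding.
    Register number 31 as the base would mean SP; not handled specially here.
    """
    if (insn >> 22) != 0b1111100101:   # fixed opcode bits [31:22]
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8           # imm12 is scaled by the 8-byte access size


esr = decode_esr(0x96000004)
print("EC=0x%x IL=%d ISS=0x%x WnR=%d DFSC=0x%x"
      % (esr["ec"], esr["il"], esr["iss"], esr["wnr"], esr["dfsc"]))

rt, rn, off = decode_ldr_x_uimm(0xF9400421)
print("ldr x%d, [x%d, #%d]" % (rt, rn, off))
```

EC 0x25 is a data abort taken at the current exception level and WnR = 0 marks it as a read, which agrees with the "DABT (current EL)" and "WnR = 0" lines in the log. The instruction decodes to a load from x1 + 8, and the register dump shows x1 = 0, which matches the faulting address 0000000000000008. This suggests a NULL pointer being dereferenced at offset 8 in amdgpu_vm_bo_update_mapping; which structure that pointer belongs to cannot be determined from the log alone.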
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/826.