Bug 110887 - 5.0 kernel crash , drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: ARM Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Reported: 2019-06-11 07:07 UTC by wormwang
Modified: 2019-08-22 23:14 UTC

Description wormwang 2019-06-11 07:07:43 UTC
Env: kernel 5.0.13, AMD RX 580 GPU (8 GB)


We run about 32 game applications on the GPU concurrently and, at the 
same time, run a media encoder on VCE through VAAPI.

The kernel crashes after running for 3 to 7 days. We have hit this crash 
5 times. kdump is enabled, so if you need other kernel dump information, 
we can upload it.


Log:

[172936.893428] binder_dkms: binder_deferred_func, binder_index = 12
[172937.052608] pci_generic_config_write32: 138 callbacks suppressed
[172937.052615] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 
offset 0x4 may corrupt adjacent RW1C bits
[172937.052633] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[172937.054110] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.062690] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.069361] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x78 may corrupt adjacent RW1C bits
[172937.071029] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x80 may corrupt adjacent RW1C bits
[172937.071034] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x8c may corrupt adjacent RW1C bits
[172937.071038] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x98 may corrupt adjacent RW1C bits
[172937.071042] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0xa0 may corrupt adjacent RW1C bits
[172937.071083] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[172937.071091] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.071094] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[172937.071110] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x4 may corrupt adjacent RW1C bits
[172937.079477] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.087723] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.095955] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.104270] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.112418] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.120490] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.128525] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.136557] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.144446] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.152254] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.160039] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.167779] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.175490] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.176747] binder_dkms: binder_defer_work 12
[172937.183035] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.183075] binder_dkms: binder_deferred_func, binder_index = 12
[172937.355641] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.362028] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.368253] pcieport 0004:48:00.0: can't derive routing for PCI INT A
[172937.368425] megaraid_sas 0004:49:00.0: PCI INT A: no GSI
[172937.368585] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.370828] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.375070] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.375167] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.375190] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.381702] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.381732] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.387959] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.387990] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.394171] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.394186] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.400498] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.400523] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.406755] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.406775] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.412977] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.412998] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.419250] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.419273] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.422361] [drm] schedsdma0 is not ready, skipping
[172937.422363] [drm] schedsdma1 is not ready, skipping
[172937.422448] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't 
update BO_VA (-2)
[172937.425437] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.425450] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.431814] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.431837] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.438038] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.438054] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.450635] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.454268] pcieport 0002:e8:00.0: can't derive routing for PCI INT B
[172937.454272] ixgbe 0002:e9:00.1: PCI INT B: no GSI
[172937.456923] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.456967] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.463122] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.463148] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.469428] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.469455] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.475705] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.475722] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.481928] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.481946] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.488092] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.488108] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.494407] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.494440] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.500663] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.500678] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.506909] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.506930] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.511392] [drm] schedsdma0 is not ready, skipping
[172937.511394] [drm] schedsdma1 is not ready, skipping
[172937.511481] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't 
update BO_VA (-2)
[172937.512346] Unable to handle kernel access to user memory outside 
uaccess routines at virtual address 0000000000000008
[172937.512348] Mem abort info:
[172937.512350]   ESR = 0x96000004
[172937.512352]   Exception class = DABT (current EL), IL = 32 bits
[172937.512353]   SET = 0, FnV = 0
[172937.512354]   EA = 0, S1PTW = 0
[172937.512355] Data abort info:
[172937.512356]   ISV = 0, ISS = 0x00000004
[172937.512357]   CM = 0, WnR = 0
[172937.512359] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000fb340bc6
[172937.512361] [0000000000000008] pgd=000000139dfe9003, 
pud=00000015d583a003, pmd=0000000000000000
[172937.512367] Internal error: Oops: 96000004 [#1] SMP
[172937.512370] Modules linked in: nfnetlink_log veth ipt_REJECT 
nf_reject_ipv4 xt_comment xt_mark xt_nat xt_tcpudp ipt_MASQUERADE 
nf_conntrack_netlink nfnetlink xfrm_user xt_conntrack br_netfilter 
bridge stp llc iptable_filter xt_addrtype iptable_nat nf_nat_ipv4 nf_nat 
bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 overlay nls_iso8859_1 joydev input_leds snd_hda_intel 
snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer ipmi_ssif snd 
ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq ib_iser 
rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi binder_dkms(OE) ip_tables x_tables autofs4 btrfs 
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath 
linear hibmc_drm hid_generic usbhid hid ses enclosure marvell aes_ce_blk 
aes_ce_cipher amdgpu chash i2c_algo_bit gpu_sched ttm crct10dif_ce 
drm_kms_helper ghash_ce syscopyarea sha2_ce
[172937.512432]  sysfillrect sysimgblt fb_sys_fops sha256_arm64 sha1_ce 
ixgbe drm hisi_sas_v2_hw hisi_sas_main megaraid_sas xfrm_algo libsas 
mdio ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio 
hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[172937.512448] Process RenderThread (pid: 1569015, stack limit = 
0x00000000349701c4)
[172937.512451] CPU: 23 PID: 1569015 Comm: RenderThread Kdump: loaded 
Tainted: G           OE     5.0.13-1905061257-generic #appstream
[172937.512453] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 
10/24/2018
[172937.512454] pstate: 80400005 (Nzcv daif +PAN -UAO)
[172937.512531] pc : amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512603] lr : amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512604] sp : ffff0000c0e4b8c0
[172937.512605] x29: ffff0000c0e4b8c0 x28: ffff801fd3010000
[172937.512608] x27: 0000000000000001 x26: ffff80161d777000
[172937.512610] x25: 0000000000100d1f x24: 0000000000100d00
[172937.512612] x23: 0000000000000000 x22: ffff809533c54f00
[172937.512614] x21: 0000000000000037 x20: ffff0000116cc000
[172937.512616] x19: 000000000000000a x18: 0000000000000000
[172937.512618] x17: 0000000000000000 x16: 0000000000000000
[172937.512620] x15: 0000000000000000 x14: 00000003000000b0
[172937.512623] x13: 0000000600000240 x12: 0000000000000000
[172937.512624] x11: 000000060000018d x10: 0000000000000040
[172937.512627] x9 : 0000000000000000 x8 : ffff0000c0e4b860
[172937.512628] x7 : 0000000000000020 x6 : 000000000000001f
[172937.512631] x5 : 0000000000100d1f x4 : 0000000000100d00
[172937.512633] x3 : 0000000000000000 x2 : 000000000000000b
[172937.512636] x1 : 0000000000000000 x0 : ffff80161d776000
[172937.512638] Call trace:
[172937.512711]  amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512784]  amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512857]  amdgpu_cs_ioctl+0xcbc/0x14a8 [amdgpu]
[172937.512882]  drm_ioctl_kernel+0x90/0x100 [drm]
[172937.512904]  drm_ioctl+0x1ec/0x418 [drm]
[172937.512977]  amdgpu_drm_ioctl+0x58/0x90 [amdgpu]
[172937.513055] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513136]  amdgpu_kms_compat_ioctl+0x40/0x68 [amdgpu]
[172937.513140] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513220] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513235] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513311] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513323] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513412] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513421] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513497] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513513] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513587] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513603] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513677] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.513687] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513761] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.522750] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.522822] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling 
IBs (-22)
[172937.525171]  __arm64_compat_sys_ioctl+0x144/0x410
[172937.525177]  el0_svc_common+0x78/0x120
[172937.525179]  el0_svc_compat_handler+0x30/0x40
[172937.525182]  el0_svc_compat+0x8/0x34
[172937.525187] Code: f9406b41 b966c793 f941e800 71002e7f (f9400421)
[172937.525195] SMP: stopping secondary CPUs
[172937.526945] Starting crashdump kernel...
[172937.526952] Bye!
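
For anyone triaging the log above, here is a minimal Python sketch that decodes the ESR value from the "Mem abort info" lines and the negative error codes from the amdgpu messages. It is only a reading aid, not part of the original report. The helper names `decode_esr` and `errno_name` are hypothetical, the lookup tables cover only the few ARMv8-A encodings relevant here, and the errno lookup assumes standard Linux errno numbering.

```python
import errno

# Subset of ESR_ELx.EC (exception class) encodings; only data aborts are listed.
EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Subset of DFSC (data fault status code) encodings; only translation faults.
DFSC_NAMES = {
    0b000100: "Translation fault, level 0",
    0b000101: "Translation fault, level 1",
    0b000110: "Translation fault, level 2",
    0b000111: "Translation fault, level 3",
}


def decode_esr(esr):
    """Split a data-abort ESR value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF            # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class: bits [31:26]
        "il": (esr >> 25) & 1,       # 1 means a 32-bit instruction
        "iss": iss,
        "isv": (iss >> 24) & 1,      # syndrome valid flag
        "wnr": (iss >> 6) & 1,       # 0 = read access, 1 = write access
        "dfsc": iss & 0x3F,          # fault status code: bits [5:0]
    }


def errno_name(code):
    """Map a kernel-style negative return value (e.g. -22) to its errno name."""
    return errno.errorcode.get(abs(code), "unknown")


if __name__ == "__main__":
    fields = decode_esr(0x96000004)  # ESR value taken from the log
    print(EC_NAMES.get(fields["ec"], "other"))
    print(DFSC_NAMES.get(fields["dfsc"], "other"), "WnR =", fields["wnr"])
    print(errno_name(-22), errno_name(-2))  # codes seen in the amdgpu errors
```

With the values from this report, the decoded fields match what the kernel printed: a data abort at the current exception level, caused by a read (WnR = 0), with ISS = 0x4. The error codes -22 and -2 in the amdgpu messages correspond to EINVAL and ENOENT.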

