Bug 110887 - 5.0 kernel crash, [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
Summary: 5.0 kernel crash, [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update ...
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: ARM Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-11 07:07 UTC by wormwang
Modified: 2019-11-19 09:31 UTC

See Also:
i915 platform:
i915 features:


Description wormwang 2019-06-11 07:07:43 UTC
Env: kernel 5.0.13, AMD RX 580 GPU (8 GB)


We run about 32 game applications on the GPU concurrently, and at the 
same time run a media encoder application on VCE through VA-API.

The kernel crashes after running for 3 to 7 days. We have hit this crash 
5 times. kdump is enabled, so if you need other kernel dump information, 
we can upload it.


Log:

[172936.893428] binder_dkms: binder_deferred_func, binder_index = 12
[172937.052608] pci_generic_config_write32: 138 callbacks suppressed
[172937.052615] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 offset 0x4 may corrupt adjacent RW1C bits
[172937.052633] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.054110] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.062690] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.069361] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x78 may corrupt adjacent RW1C bits
[172937.071029] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x80 may corrupt adjacent RW1C bits
[172937.071034] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x8c may corrupt adjacent RW1C bits
[172937.071038] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x98 may corrupt adjacent RW1C bits
[172937.071042] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0xa0 may corrupt adjacent RW1C bits
[172937.071083] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.071091] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.071094] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x44 may corrupt adjacent RW1C bits
[172937.071110] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 offset 0x4 may corrupt adjacent RW1C bits
[172937.079477] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.087723] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.095955] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.104270] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.112418] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.120490] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.128525] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.136557] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.144446] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.152254] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.160039] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.167779] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.175490] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.176747] binder_dkms: binder_defer_work 12
[172937.183035] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.183075] binder_dkms: binder_deferred_func, binder_index = 12
[172937.355641] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.362028] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.368253] pcieport 0004:48:00.0: can't derive routing for PCI INT A
[172937.368425] megaraid_sas 0004:49:00.0: PCI INT A: no GSI
[172937.368585] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.370828] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.375070] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.375167] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.375190] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.381702] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.381732] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.387959] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.387990] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.394171] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.394186] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.400498] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.400523] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.406755] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.406775] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.412977] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.412998] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.419250] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.419273] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.422361] [drm] schedsdma0 is not ready, skipping
[172937.422363] [drm] schedsdma1 is not ready, skipping
[172937.422448] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
[172937.425437] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.425450] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.431814] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.431837] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.438038] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.438054] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.450635] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.454268] pcieport 0002:e8:00.0: can't derive routing for PCI INT B
[172937.454272] ixgbe 0002:e9:00.1: PCI INT B: no GSI
[172937.456923] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.456967] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.463122] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.463148] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.469428] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.469455] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.475705] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.475722] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.481928] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.481946] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.488092] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.488108] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.494407] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.494440] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.500663] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.500678] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.506909] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.506930] amdgpu 0005:01:00.0: couldn't schedule ib on ring <gfx>
[172937.511392] [drm] schedsdma0 is not ready, skipping
[172937.511394] [drm] schedsdma1 is not ready, skipping
[172937.511481] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
[172937.512346] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000008
[172937.512348] Mem abort info:
[172937.512350]   ESR = 0x96000004
[172937.512352]   Exception class = DABT (current EL), IL = 32 bits
[172937.512353]   SET = 0, FnV = 0
[172937.512354]   EA = 0, S1PTW = 0
[172937.512355] Data abort info:
[172937.512356]   ISV = 0, ISS = 0x00000004
[172937.512357]   CM = 0, WnR = 0
[172937.512359] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000fb340bc6
[172937.512361] [0000000000000008] pgd=000000139dfe9003, pud=00000015d583a003, pmd=0000000000000000
[172937.512367] Internal error: Oops: 96000004 [#1] SMP
[172937.512370] Modules linked in: nfnetlink_log veth ipt_REJECT nf_reject_ipv4 xt_comment xt_mark xt_nat xt_tcpudp ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_conntrack br_netfilter bridge stp llc iptable_filter xt_addrtype iptable_nat nf_nat_ipv4 nf_nat bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay nls_iso8859_1 joydev input_leds snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer ipmi_ssif snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binder_dkms(OE) ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear hibmc_drm hid_generic usbhid hid ses enclosure marvell aes_ce_blk aes_ce_cipher amdgpu chash i2c_algo_bit gpu_sched ttm crct10dif_ce drm_kms_helper ghash_ce syscopyarea sha2_ce
[172937.512432]  sysfillrect sysimgblt fb_sys_fops sha256_arm64 sha1_ce ixgbe drm hisi_sas_v2_hw hisi_sas_main megaraid_sas xfrm_algo libsas mdio ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[172937.512448] Process RenderThread (pid: 1569015, stack limit = 0x00000000349701c4)
[172937.512451] CPU: 23 PID: 1569015 Comm: RenderThread Kdump: loaded Tainted: G           OE     5.0.13-1905061257-generic #appstream
[172937.512453] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
[172937.512454] pstate: 80400005 (Nzcv daif +PAN -UAO)
[172937.512531] pc : amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512603] lr : amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512604] sp : ffff0000c0e4b8c0
[172937.512605] x29: ffff0000c0e4b8c0 x28: ffff801fd3010000
[172937.512608] x27: 0000000000000001 x26: ffff80161d777000
[172937.512610] x25: 0000000000100d1f x24: 0000000000100d00
[172937.512612] x23: 0000000000000000 x22: ffff809533c54f00
[172937.512614] x21: 0000000000000037 x20: ffff0000116cc000
[172937.512616] x19: 000000000000000a x18: 0000000000000000
[172937.512618] x17: 0000000000000000 x16: 0000000000000000
[172937.512620] x15: 0000000000000000 x14: 00000003000000b0
[172937.512623] x13: 0000000600000240 x12: 0000000000000000
[172937.512624] x11: 000000060000018d x10: 0000000000000040
[172937.512627] x9 : 0000000000000000 x8 : ffff0000c0e4b860
[172937.512628] x7 : 0000000000000020 x6 : 000000000000001f
[172937.512631] x5 : 0000000000100d1f x4 : 0000000000100d00
[172937.512633] x3 : 0000000000000000 x2 : 000000000000000b
[172937.512636] x1 : 0000000000000000 x0 : ffff80161d776000
[172937.512638] Call trace:
[172937.512711]  amdgpu_vm_bo_update_mapping+0x120/0x3a0 [amdgpu]
[172937.512784]  amdgpu_vm_bo_update+0x2a4/0x6b8 [amdgpu]
[172937.512857]  amdgpu_cs_ioctl+0xcbc/0x14a8 [amdgpu]
[172937.512882]  drm_ioctl_kernel+0x90/0x100 [drm]
[172937.512904]  drm_ioctl+0x1ec/0x418 [drm]
[172937.512977]  amdgpu_drm_ioctl+0x58/0x90 [amdgpu]
[172937.513055] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513136]  amdgpu_kms_compat_ioctl+0x40/0x68 [amdgpu]
[172937.513140] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513220] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513235] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513311] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513323] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513412] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513421] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513497] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513513] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513587] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513603] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513677] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.513687] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.513761] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.522750] amdgpu 000d:31:00.0: couldn't schedule ib on ring <gfx>
[172937.522822] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
[172937.525171]  __arm64_compat_sys_ioctl+0x144/0x410
[172937.525177]  el0_svc_common+0x78/0x120
[172937.525179]  el0_svc_compat_handler+0x30/0x40
[172937.525182]  el0_svc_compat+0x8/0x34
[172937.525187] Code: f9406b41 b966c793 f941e800 71002e7f (f9400421)
[172937.525195] SMP: stopping secondary CPUs
[172937.526945] Starting crashdump kernel...
[172937.526952] Bye!
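
Triage note on the numbers in the log above: a minimal sketch in Python, not part of the original report, that decodes the ESR value, the faulting instruction word (the word in parentheses on the "Code:" line) and the negative return codes. The helper names are made up for illustration. The bit layouts are assumed from the Armv8-A ESR_ELx register description and the A64 "LDR (immediate, unsigned offset)" encoding, and the error names from Linux errno numbering.

```python
import errno

def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the fields printed by the Oops."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class; 0x25 = data abort, current EL
        "il": (esr >> 25) & 0x1,    # 1 = 32-bit instruction
        "iss": esr & 0x1FFFFFF,     # instruction-specific syndrome
        "wnr": (esr >> 6) & 0x1,    # 0 = read access, 1 = write access
        "dfsc": esr & 0x3F,         # data fault status code; 0x04 = translation fault, level 0
    }

def decode_ldr_x_unsigned(word: int) -> tuple:
    """Decode a 64-bit 'LDR Xt, [Xn, #imm]' word into (rt, rn, byte_offset)."""
    if word & 0xFFC00000 != 0xF9400000:
        raise ValueError("not a 64-bit LDR (immediate, unsigned offset)")
    rt = word & 0x1F              # destination register number
    rn = (word >> 5) & 0x1F       # base register number
    imm12 = (word >> 10) & 0xFFF  # offset, scaled by the 8-byte access size
    return rt, rn, imm12 * 8

def errno_name(ret: int) -> str:
    """Map a negative kernel return code such as -22 to its errno name."""
    return errno.errorcode.get(-ret, "unknown")

if __name__ == "__main__":
    print(decode_esr(0x96000004))
    print(decode_ldr_x_unsigned(0xF9400421))
    print(errno_name(-22), errno_name(-2))
```

Read this way, ESR 0x96000004 is a data abort on a read, f9400421 is "ldr x1, [x1, #8]", and the register dump shows x1 = 0, so the load targets address 0x8, which matches the reported fault address. That would be consistent with a NULL pointer being dereferenced at structure offset 8 in amdgpu_vm_bo_update_mapping, although the log alone does not say which pointer. The codes -22 and -2 correspond to EINVAL and ENOENT.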
Comment 1 Martin Peres 2019-11-19 09:31:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/826.

