Summary: | [kernel 5.4-rc4][amdgpu][CIK]: FW bug: No PASID in KFD interrupt | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | erhard_f | ||||||||||||
Component: | DRM/amdkfd | Assignee: | Default DRI bug account <dri-devel> | ||||||||||||
Status: | RESOLVED MOVED | QA Contact: | |||||||||||||
Severity: | normal | ||||||||||||||
Priority: | medium | ||||||||||||||
Version: | XOrg git | ||||||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||||||
OS: | Linux (All) | ||||||||||||||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=111021 | ||||||||||||||
Whiteboard: | |||||||||||||||
i915 platform: | i915 features: | ||||||||||||||
Attachments: |
|
Created attachment 145613 [details]
kernel.config (5.4-rc1)
Forgot for a moment about the GitLab Tracker... Moved over there: https://gitlab.freedesktop.org/mesa/mesa/issues/1881 After re-reading https://bugs.freedesktop.org/enter_bug.cgi I think the appropriate place for the bug is here after all as 'amdkfd' is explicitely mentioned in the stacktrace. Re-opening with current data. Created attachment 145818 [details]
dmesg (kernel 5.4-rc4)
Created attachment 145819 [details]
kernel.config (5.4-rc4)
Created attachment 145820 [details]
rocminfo output (ROC 2.9)
Interesting is that clinfo (2.2.18.04.06) produces this hit whereas rocminfo (ROC 2.9) gives proper output.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/7. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 145612 [details] dmesg (kernel 5.4-rc1) Card is a Sapphire Radeon R9 290 Tri-X running on a Supermicro H8SGL (Opteron 6380) with Gentoo Linux. OpenCL driver is ROCm 2.8.0. clinfo segfaults, also the kernel gets a hit: [...] Okt 02 12:47:51 yea kernel: clinfo[1138]: segfault at 1000 ip 00007f78d4f52971 sp 00007ffd81ab7170 error 6 in libhsa-runtime64.so.1.1.9[7f78d4f34000+c7000] Okt 02 12:47:51 yea kernel: Code: ff ff ff 48 8b 85 58 ff ff ff 48 8b 80 b8 03 00 00 48 8b 95 78 ff ff ff 48 c1 e2 03 48 01 c2 48 8b 85 68 ff ff ff 48 8b 40 18 <48> 89 02 c6 45 b0 01 bb 00 00 00 00 0f b6 45 b0 83 f0 01 84 c0 74 Okt 02 12:47:59 yea kernel: Evicting PASID 32770 queues Okt 02 12:47:59 yea kernel: ------------[ cut here ]------------ Okt 02 12:47:59 yea kernel: FW bug: No PASID in KFD interrupt Okt 02 12:47:59 yea kernel: WARNING: CPU: 5 PID: 0 at drivers/gpu/drm/amd/amdgpu/../amdkfd/cik_event_interrupt.c:70 cik_event_interrupt_isr+0x223/0x230 [amdgpu] Okt 02 12:47:59 yea kernel: Modules linked in: fuse dm_crypt nhpoly1305_sse2 nhpoly1305 chacha_x86_64 chacha_generic adiantum poly1305_generic algif_skcipher amd64_edac_mod crct10dif_pclmul crc32_generic crc32_pclmul dm_mod joydev input_leds ghash_generic gf128mul gcm hid_generic usbhid hid xts ext4 crc16 mbcache ctr jbd2 ath5k led_class amdgpu cbc mac80211 ath ohci_pci ecb evdev cfg80211 gpu_sched ehci_pci ohci_hcd snd_oxygen i2c_algo_bit ehci_hcd fam15h_power snd_oxygen_lib aesni_intel ttm snd_mpu401_uart sr_mod glue_helper rfkill snd_rawmidi usbcore crypto_simd k10temp libarc4 cdrom cryptd drm_kms_helper snd_hda_codec_hdmi hwmon snd_seq_device i2c_piix4 usb_common cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_intel_nhlt fb snd_hda_codec font snd_hwdep fbdev snd_hda_core drm e1000e snd_pcm snd_timer snd drm_panel_orientation_quirks backlight soundcore button acpi_cpufreq processor lzo zstd sg zram zsmalloc Okt 02 12:47:59 yea kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.4.0-rc1 #1 Okt 02 12:47:59 yea kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5b 03/18/2016 Okt 02 12:47:59 yea kernel: RIP: 0010:cik_event_interrupt_isr+0x223/0x230 [amdgpu] Okt 02 12:47:59 yea kernel: Code: ff 0f b6 05 53 15 49 00 84 c0 74 07 31 c0 e9 b0 fe ff ff 48 c7 c7 c0 b2 88 c1 88 44 24 08 c6 05 36 15 49 00 01 e8 81 0f a5 f8 <0f> 0b 0f b6 44 24 08 e9 8d fe ff ff 90 48 b8 00 00 00 00 00 fc ff Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e7888c08 EFLAGS: 00010086 Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffff8883cc044b48 RCX: ffffffffba10693f Okt 02 12:47:59 yea kernel: RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8883e5704f80 Okt 02 12:47:59 yea kernel: RBP: ffff8883e7888c40 R08: fffffbfff76d3d31 R09: fffffbfff76d3d31 Okt 02 12:47:59 yea kernel: R10: fffffbfff76d3d30 R11: ffffffffbb69e983 R12: 0000000000000008 Okt 02 12:47:59 yea kernel: R13: 00000000000000b5 R14: 0000000000000023 R15: 0000000000000000 Okt 02 12:47:59 yea kernel: FS: 0000000000000000(0000) GS:ffff8883e7880000(0000) knlGS:0000000000000000 Okt 02 12:47:59 yea kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Okt 02 12:47:59 yea kernel: CR2: 00007fea9066f000 CR3: 00000007f52c2000 CR4: 00000000000406e0 Okt 02 12:47:59 yea kernel: Call Trace: Okt 02 12:47:59 yea kernel: <IRQ> Okt 02 12:47:59 yea kernel: kgd2kfd_interrupt+0x151/0x1a0 [amdgpu] Okt 02 12:47:59 yea kernel: ? kgd2kfd_resume+0xa0/0xa0 [amdgpu] Okt 02 12:47:59 yea kernel: ? check_flags.part.41+0x82/0x210 Okt 02 12:47:59 yea kernel: ? amdgpu_fence_process+0x95/0x1b0 [amdgpu] Okt 02 12:47:59 yea kernel: ? amdgpu_irq_dispatch+0x184/0x390 [amdgpu] Okt 02 12:47:59 yea kernel: ? gfx_v7_0_eop_irq+0xba/0x100 [amdgpu] Okt 02 12:47:59 yea kernel: amdgpu_irq_dispatch+0x1c6/0x390 [amdgpu] Okt 02 12:47:59 yea kernel: ? amdgpu_irq_add_id+0x160/0x160 [amdgpu] Okt 02 12:47:59 yea kernel: ? lock_downgrade+0x390/0x390 Okt 02 12:47:59 yea kernel: amdgpu_ih_process+0xf4/0x1d0 [amdgpu] Okt 02 12:47:59 yea kernel: ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu] Okt 02 12:47:59 yea kernel: amdgpu_irq_handler+0x20/0x60 [amdgpu] Okt 02 12:47:59 yea kernel: ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu] Okt 02 12:47:59 yea kernel: __handle_irq_event_percpu+0x72/0x390 Okt 02 12:47:59 yea kernel: handle_irq_event_percpu+0x6a/0xe0 Okt 02 12:47:59 yea kernel: ? __handle_irq_event_percpu+0x390/0x390 Okt 02 12:47:59 yea kernel: ? rwlock_bug.part.2+0x50/0x50 Okt 02 12:47:59 yea kernel: ? do_raw_spin_unlock+0x9d/0x130 Okt 02 12:47:59 yea kernel: handle_irq_event+0x4f/0x7e Okt 02 12:47:59 yea kernel: handle_edge_irq+0x100/0x2d0 Okt 02 12:47:59 yea kernel: do_IRQ+0x72/0x160 Okt 02 12:47:59 yea kernel: common_interrupt+0xf/0xf Okt 02 12:47:59 yea kernel: </IRQ> Okt 02 12:47:59 yea kernel: RIP: 0010:cpuidle_enter_state+0xcd/0x640 Okt 02 12:47:59 yea kernel: Code: 00 31 ff e8 a5 86 80 ff 80 7c 24 10 00 74 12 9c 58 f6 c4 02 0f 85 42 05 00 00 31 ff e8 cc 5e 89 ff e8 f7 be 8f ff fb 45 85 e4 <0f> 88 fb 03 00 00 4d 63 ec 4f 8d 74 6d 00 49 c1 e6 05 4a 8d 7c 33 Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e571fd98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdd Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffffffffc0316680 RCX: ffffffffba1067e0 Okt 02 12:47:59 yea kernel: RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8883e5704fb4 Okt 02 12:47:59 yea kernel: RBP: ffff888812779028 R08: 0000000000000002 R09: 0000000000000000 Okt 02 12:47:59 yea kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 Okt 02 12:47:59 yea kernel: R13: 0000000000000002 R14: ffffffffc0316740 R15: ffffffffc0316780 Okt 02 12:47:59 yea kernel: ? lockdep_hardirqs_on+0x190/0x280 Okt 02 12:47:59 yea kernel: ? cpuidle_enter_state+0xc9/0x640 Okt 02 12:47:59 yea kernel: cpuidle_enter+0x37/0x60 Okt 02 12:47:59 yea kernel: do_idle+0x2e7/0x380 Okt 02 12:47:59 yea kernel: ? arch_cpu_idle_exit+0x40/0x40 Okt 02 12:47:59 yea kernel: ? schedule_idle+0x41/0x50 Okt 02 12:47:59 yea kernel: cpu_startup_entry+0x14/0x20 Okt 02 12:47:59 yea kernel: start_secondary+0x1fd/0x240 Okt 02 12:47:59 yea kernel: ? set_cpu_sibling_map+0xbc0/0xbc0 Okt 02 12:47:59 yea kernel: secondary_startup_64+0xa4/0xb0 Okt 02 12:47:59 yea kernel: irq event stamp: 450550 Okt 02 12:47:59 yea kernel: hardirqs last enabled at (450547): [<ffffffffba8c30b9>] cpuidle_enter_state+0xc9/0x640 Okt 02 12:47:59 yea kernel: hardirqs last disabled at (450548): [<ffffffffba00276a>] trace_hardirqs_off_thunk+0x1a/0x20 Okt 02 12:47:59 yea kernel: softirqs last enabled at (450550): [<ffffffffba07b210>] irq_enter+0x70/0x80 Okt 02 12:47:59 yea kernel: softirqs last disabled at (450549): [<ffffffffba07b1f5>] irq_enter+0x55/0x80 Okt 02 12:47:59 yea kernel: ---[ end trace 5951fa91933dcafd ]---