Bug 111881 - [kernel 5.4-rc1][amdgpu][CIK]: FW bug: No PASID in KFD interrupt
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/amdkfd
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: not set not set
Assignee: Default DRI bug account
Reported: 2019-10-02 11:03 UTC by erhard_f
Modified: 2019-10-03 22:02 UTC

Attachments
dmesg (kernel 5.4-rc1) (75.58 KB, text/plain)
2019-10-02 11:03 UTC, erhard_f
kernel.config (5.4-rc1) (100.56 KB, text/plain)
2019-10-02 11:05 UTC, erhard_f

Description erhard_f 2019-10-02 11:03:26 UTC
Created attachment 145612
dmesg (kernel 5.4-rc1)

The card is a Sapphire Radeon R9 290 Tri-X in a Supermicro H8SGL (Opteron 6380) running Gentoo Linux. The OpenCL driver is ROCm 2.8.0.

clinfo segfaults, and a few seconds later the kernel logs a "FW bug: No PASID in KFD interrupt" warning:

[...]
Okt 02 12:47:51 yea kernel: clinfo[1138]: segfault at 1000 ip 00007f78d4f52971 sp 00007ffd81ab7170 error 6 in libhsa-runtime64.so.1.1.9[7f78d4f34000+c7000]
Okt 02 12:47:51 yea kernel: Code: ff ff ff 48 8b 85 58 ff ff ff 48 8b 80 b8 03 00 00 48 8b 95 78 ff ff ff 48 c1 e2 03 48 01 c2 48 8b 85 68 ff ff ff 48 8b 40 18 <48> 89 02 c6 45 b0 01 bb 00 00 00 00 0f b6 45 b0 83 f0 01 84 c0 74
Okt 02 12:47:59 yea kernel: Evicting PASID 32770 queues
Okt 02 12:47:59 yea kernel: ------------[ cut here ]------------
Okt 02 12:47:59 yea kernel: FW bug: No PASID in KFD interrupt
Okt 02 12:47:59 yea kernel: WARNING: CPU: 5 PID: 0 at drivers/gpu/drm/amd/amdgpu/../amdkfd/cik_event_interrupt.c:70 cik_event_interrupt_isr+0x223/0x230 [amdgpu]
Okt 02 12:47:59 yea kernel: Modules linked in: fuse dm_crypt nhpoly1305_sse2 nhpoly1305 chacha_x86_64 chacha_generic adiantum poly1305_generic algif_skcipher amd64_edac_mod crct10dif_pclmul crc32_generic crc32_pclmul dm_mod joydev input_leds ghash_generic gf128mul gcm hid_generic usbhid hid xts ext4 crc16 mbcache ctr jbd2 ath5k led_class amdgpu cbc mac80211 ath ohci_pci ecb evdev cfg80211 gpu_sched ehci_pci ohci_hcd snd_oxygen i2c_algo_bit ehci_hcd fam15h_power snd_oxygen_lib aesni_intel ttm snd_mpu401_uart sr_mod glue_helper rfkill snd_rawmidi usbcore crypto_simd k10temp libarc4 cdrom cryptd drm_kms_helper snd_hda_codec_hdmi hwmon snd_seq_device i2c_piix4 usb_common cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_intel_nhlt fb snd_hda_codec font snd_hwdep fbdev snd_hda_core drm e1000e snd_pcm snd_timer snd drm_panel_orientation_quirks backlight soundcore button acpi_cpufreq processor lzo zstd sg zram zsmalloc
Okt 02 12:47:59 yea kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.4.0-rc1 #1
Okt 02 12:47:59 yea kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5b       03/18/2016
Okt 02 12:47:59 yea kernel: RIP: 0010:cik_event_interrupt_isr+0x223/0x230 [amdgpu]
Okt 02 12:47:59 yea kernel: Code: ff 0f b6 05 53 15 49 00 84 c0 74 07 31 c0 e9 b0 fe ff ff 48 c7 c7 c0 b2 88 c1 88 44 24 08 c6 05 36 15 49 00 01 e8 81 0f a5 f8 <0f> 0b 0f b6 44 24 08 e9 8d fe ff ff 90 48 b8 00 00 00 00 00 fc ff
Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e7888c08 EFLAGS: 00010086
Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffff8883cc044b48 RCX: ffffffffba10693f
Okt 02 12:47:59 yea kernel: RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8883e5704f80
Okt 02 12:47:59 yea kernel: RBP: ffff8883e7888c40 R08: fffffbfff76d3d31 R09: fffffbfff76d3d31
Okt 02 12:47:59 yea kernel: R10: fffffbfff76d3d30 R11: ffffffffbb69e983 R12: 0000000000000008
Okt 02 12:47:59 yea kernel: R13: 00000000000000b5 R14: 0000000000000023 R15: 0000000000000000
Okt 02 12:47:59 yea kernel: FS:  0000000000000000(0000) GS:ffff8883e7880000(0000) knlGS:0000000000000000
Okt 02 12:47:59 yea kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Okt 02 12:47:59 yea kernel: CR2: 00007fea9066f000 CR3: 00000007f52c2000 CR4: 00000000000406e0
Okt 02 12:47:59 yea kernel: Call Trace:
Okt 02 12:47:59 yea kernel:  <IRQ>
Okt 02 12:47:59 yea kernel:  kgd2kfd_interrupt+0x151/0x1a0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? kgd2kfd_resume+0xa0/0xa0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? check_flags.part.41+0x82/0x210
Okt 02 12:47:59 yea kernel:  ? amdgpu_fence_process+0x95/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_dispatch+0x184/0x390 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? gfx_v7_0_eop_irq+0xba/0x100 [amdgpu]
Okt 02 12:47:59 yea kernel:  amdgpu_irq_dispatch+0x1c6/0x390 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_add_id+0x160/0x160 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? lock_downgrade+0x390/0x390
Okt 02 12:47:59 yea kernel:  amdgpu_ih_process+0xf4/0x1d0 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  amdgpu_irq_handler+0x20/0x60 [amdgpu]
Okt 02 12:47:59 yea kernel:  ? amdgpu_irq_disable_all+0x1b0/0x1b0 [amdgpu]
Okt 02 12:47:59 yea kernel:  __handle_irq_event_percpu+0x72/0x390
Okt 02 12:47:59 yea kernel:  handle_irq_event_percpu+0x6a/0xe0
Okt 02 12:47:59 yea kernel:  ? __handle_irq_event_percpu+0x390/0x390
Okt 02 12:47:59 yea kernel:  ? rwlock_bug.part.2+0x50/0x50
Okt 02 12:47:59 yea kernel:  ? do_raw_spin_unlock+0x9d/0x130
Okt 02 12:47:59 yea kernel:  handle_irq_event+0x4f/0x7e
Okt 02 12:47:59 yea kernel:  handle_edge_irq+0x100/0x2d0
Okt 02 12:47:59 yea kernel:  do_IRQ+0x72/0x160
Okt 02 12:47:59 yea kernel:  common_interrupt+0xf/0xf
Okt 02 12:47:59 yea kernel:  </IRQ>
Okt 02 12:47:59 yea kernel: RIP: 0010:cpuidle_enter_state+0xcd/0x640
Okt 02 12:47:59 yea kernel: Code: 00 31 ff e8 a5 86 80 ff 80 7c 24 10 00 74 12 9c 58 f6 c4 02 0f 85 42 05 00 00 31 ff e8 cc 5e 89 ff e8 f7 be 8f ff fb 45 85 e4 <0f> 88 fb 03 00 00 4d 63 ec 4f 8d 74 6d 00 49 c1 e6 05 4a 8d 7c 33
Okt 02 12:47:59 yea kernel: RSP: 0018:ffff8883e571fd98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdd
Okt 02 12:47:59 yea kernel: RAX: 0000000000000000 RBX: ffffffffc0316680 RCX: ffffffffba1067e0
Okt 02 12:47:59 yea kernel: RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8883e5704fb4
Okt 02 12:47:59 yea kernel: RBP: ffff888812779028 R08: 0000000000000002 R09: 0000000000000000
Okt 02 12:47:59 yea kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
Okt 02 12:47:59 yea kernel: R13: 0000000000000002 R14: ffffffffc0316740 R15: ffffffffc0316780
Okt 02 12:47:59 yea kernel:  ? lockdep_hardirqs_on+0x190/0x280
Okt 02 12:47:59 yea kernel:  ? cpuidle_enter_state+0xc9/0x640
Okt 02 12:47:59 yea kernel:  cpuidle_enter+0x37/0x60
Okt 02 12:47:59 yea kernel:  do_idle+0x2e7/0x380
Okt 02 12:47:59 yea kernel:  ? arch_cpu_idle_exit+0x40/0x40
Okt 02 12:47:59 yea kernel:  ? schedule_idle+0x41/0x50
Okt 02 12:47:59 yea kernel:  cpu_startup_entry+0x14/0x20
Okt 02 12:47:59 yea kernel:  start_secondary+0x1fd/0x240
Okt 02 12:47:59 yea kernel:  ? set_cpu_sibling_map+0xbc0/0xbc0
Okt 02 12:47:59 yea kernel:  secondary_startup_64+0xa4/0xb0
Okt 02 12:47:59 yea kernel: irq event stamp: 450550
Okt 02 12:47:59 yea kernel: hardirqs last  enabled at (450547): [<ffffffffba8c30b9>] cpuidle_enter_state+0xc9/0x640
Okt 02 12:47:59 yea kernel: hardirqs last disabled at (450548): [<ffffffffba00276a>] trace_hardirqs_off_thunk+0x1a/0x20
Okt 02 12:47:59 yea kernel: softirqs last  enabled at (450550): [<ffffffffba07b210>] irq_enter+0x70/0x80
Okt 02 12:47:59 yea kernel: softirqs last disabled at (450549): [<ffffffffba07b1f5>] irq_enter+0x55/0x80
Okt 02 12:47:59 yea kernel: ---[ end trace 5951fa91933dcafd ]---
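
For triage, the clinfo segfault line in the log above can be decoded mechanically. The following is a minimal Python sketch; the helper name `decode_segfault` and the regular expression are illustrative assumptions, not part of any existing tool. It subtracts the mapping base from the instruction pointer to get the offset inside libhsa-runtime64.so.1.1.9, and splits the x86 page-fault error code into its low three bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Segfault line copied from the log above (timestamp prefix omitted).
LINE = (
    "clinfo[1138]: segfault at 1000 ip 00007f78d4f52971 sp 00007ffd81ab7170 "
    "error 6 in libhsa-runtime64.so.1.1.9[7f78d4f34000+c7000]"
)

# Illustrative pattern for the kernel's segfault message; all numbers are hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction relative to the start of the mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # bit 0: clear means the page was not mapped
        "write": bool(err & 2),         # bit 1: set means a write access
        "user_mode": bool(err & 4),     # bit 2: set means the fault came from user space
    }


info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]))  # libhsa-runtime64.so.1.1.9 0x1e971
```

For this report the result is a user-mode write to the unmapped address 0x1000 at offset 0x1e971 into the mapping. If the library has debug symbols, that offset could be passed to a symbolizer such as `addr2line -e`, assuming the mapping shown in the log starts at file offset 0; that is not verified here.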
Comment 1 erhard_f 2019-10-02 11:05:01 UTC
Created attachment 145613
kernel.config (5.4-rc1)
Comment 2 erhard_f 2019-10-03 22:02:22 UTC
I forgot for a moment about the GitLab tracker.

The report has been moved there: https://gitlab.freedesktop.org/mesa/mesa/issues/1881