Bug 111455

Summary: DMAR: [INTR-REMAP] Blocked an interrupt request due to source-id verification failure
Product: DRI
Reporter: Nikolay Kichukov <nikolay>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: not set
Priority: not set
CC: nikolay
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Nikolay Kichukov 2019-08-21 12:59:45 UTC
Hello team,
The error below appears in the kernel log and causes the graphics driver to freeze:

[Tue Aug 20 12:04:38 2019] DMAR: DRHD: handling fault status reg 2
[Tue Aug 20 12:04:38 2019] DMAR: [INTR-REMAP] Request device [00:00.0] fault index 26 [fault reason 38] Blocked an interrupt request due to source-id verification failure
[Tue Aug 20 12:04:38 2019] [drm] Fence fallback timer expired on ring gfx
[Tue Aug 20 12:04:38 2019] [drm] Fence fallback timer expired on ring gfx
[Tue Aug 20 12:04:39 2019] [drm] Fence fallback timer expired on ring gfx
[Tue Aug 20 12:04:39 2019] [drm] Fence fallback timer expired on ring gfx
[Tue Aug 20 12:04:40 2019] [drm] Fence fallback timer expired on ring gfx
[Tue Aug 20 12:04:40 2019] [drm] Fence fallback timer expired on ring sdma0
[Tue Aug 20 12:04:41 2019] [drm] Fence fallback timer expired on ring sdma0
...
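
For triage, the fields of the INTR-REMAP line can be pulled out mechanically. The following is a minimal Python sketch, assuming the message layout is exactly as in the excerpt above; the function name parse_intr_remap_fault is hypothetical, and the fault index is kept as a string because the report does not state its radix. Note that the requesting device in the fault is [00:00.0].

```python
import re

# Matches the "[INTR-REMAP]" fault line layout seen in the excerpt above.
# Assumption: other kernel versions may format this line differently.
FAULT_RE = re.compile(
    r"DMAR: \[INTR-REMAP\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\] "
    r"fault index (?P<index>[0-9a-f]+) "
    r"\[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def parse_intr_remap_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "source_id": f"{m['bus']}:{m['dev']}.{m['fn']}",
        "fault_index": m["index"],        # string: radix is not stated in the report
        "fault_reason": int(m["reason"]),
        "message": m["text"].strip(),
    }


line = (
    "[Tue Aug 20 12:04:38 2019] DMAR: [INTR-REMAP] Request device [00:00.0] "
    "fault index 26 [fault reason 38] Blocked an interrupt request due to "
    "source-id verification failure"
)
fault = parse_intr_remap_fault(line)
print(fault)
```

The same function could be run over a full dmesg dump to count how often the fault occurs and whether the requesting device is always the same.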

Hardware is: Dell Precision Tower 5810 with Advanced Micro Devices, Inc. [AMD/ATI] Oland GL [FirePro W2100] video card.

Kernel: 5.2.8 x86_64 (GNU/Gentoo Linux)
Kernel Command line: BOOT_IMAGE=/kernel-genkernel-x86_64-5.2.8-gentoo root=/dev/mapper/root ro crypt_root=UUID=e11887f5-4104-4a9e-9c53-7e1d904a0b28 root_trim=no elevator=bfq scsi_mod.use_blk_mq=1 libata.allow_tpm=1 domdadm dolvm intel_iommu=on

The IOMMU is enabled (intel_iommu=on) because the system mainly acts as a KVM/libvirt host.
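
To confirm which IOMMU-related parameters a given boot actually received, the command line can be split into name/value pairs. This is a rough Python sketch using the command line quoted above; parse_cmdline is a hypothetical helper, and it assumes no quoted values containing spaces. On a running system the same string can be read from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values such as "UUID=..." stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Command line copied from the report above.
CMDLINE = (
    "BOOT_IMAGE=/kernel-genkernel-x86_64-5.2.8-gentoo root=/dev/mapper/root ro "
    "crypt_root=UUID=e11887f5-4104-4a9e-9c53-7e1d904a0b28 root_trim=no "
    "elevator=bfq scsi_mod.use_blk_mq=1 libata.allow_tpm=1 domdadm dolvm "
    "intel_iommu=on"
)

params = parse_cmdline(CMDLINE)
print("intel_iommu =", params.get("intel_iommu"))
```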

amdgpu driver information:
# dmesg | grep amd
[   14.614307] [drm] amdgpu kernel modesetting enabled.
[   14.615206] amdgpu 0000:03:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[   14.615208] amdgpu 0000:03:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf7e00000 -> 0xf7e3ffff
[   14.615209] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[   14.617260] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[   14.622071] amdgpu 0000:03:00.0: No more image in the PCI ROM
[   14.624585] amdgpu 0000:03:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[   14.624586] amdgpu 0000:03:00.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[   14.626166] [drm] amdgpu: 2048M of VRAM memory ready
[   14.626169] [drm] amdgpu: 3072M of GTT memory ready.
[   14.626832] amdgpu 0000:03:00.0: PCIE GART of 1024M enabled (table at 0x000000F400900000).
[   14.643922] [drm] amdgpu: dpm initialized
[   14.862039] fbcon: amdgpudrmfb (fb0) is primary device
[   14.990216] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
[   15.260648] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:03:00.0 on minor 0

and module dependencies:
# lsmod | grep amdgpu
amdgpu               3772416  7
gpu_sched              36864  1 amdgpu
ttm                   114688  1 amdgpu
drm_kms_helper        212992  1 amdgpu
drm                   462848  7 gpu_sched,drm_kms_helper,amdgpu,ttm
i2c_algo_bit           16384  2 igb,amdgpu

Happy to collect output from a kernel booted with 'drm.debug=0x1e log_buf_len=4M' if that would help.
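
For reference, drm.debug is a bitmask, so 0x1e selects several message categories at once. The sketch below decodes it in Python. The bit-to-name table is an assumption based on the drm module parameter description as commonly documented; it lists only the low eight bits and may differ between kernel versions, so it should be checked against the running kernel.

```python
# Assumed drm.debug category bits (low eight bits only); verify against the
# kernel in use, since categories have been added over time.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


categories = decode_drm_debug(0x1E)
print(categories)  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumed table, 0x1e enables driver, KMS, PRIME and atomic messages but not the core category (0x01).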

Thanks,
-Nikolay

Comment 1 Nikolay Kichukov 2019-08-26 08:08:52 UTC
Not much seems to have been captured by running the kernel with:
'drm.debug=0x1e log_buf_len=4M'

...snip...
Aug 25 19:03:44 localhost kernel: [283823.907593][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000c841325d,
Aug 25 19:03:44 localhost kernel: [283823.922605][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_PENDING, work: 0000000019340169,
Aug 25 19:03:44 localhost kernel: [283823.922640][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_PENDING, work: 00000000b4d9949b,
Aug 25 19:03:44 localhost kernel: [283823.924283][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 0000000019340169,
Aug 25 19:03:44 localhost kernel: [283823.924316][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000b4d9949b,
Aug 25 19:03:44 localhost kernel: [283823.939256][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_PENDING, work: 00000000a8e63b00,
Aug 25 19:03:44 localhost kernel: [283823.939292][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_PENDING, work: 00000000b1a1f198,
Aug 25 19:03:44 localhost kernel: [283823.940944][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000a8e63b00,
Aug 25 19:03:44 localhost kernel: [283823.940975][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000b1a1f198,
Aug 25 19:03:44 localhost kernel: [283823.955946][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_PENDING, work: 00000000614cee44,
Aug 25 19:03:44 localhost kernel: [283823.955982][ T8596] [drm:amdgpu_display_crtc_page_flip_target [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_PENDING, work: 00000000df74fb67,
Aug 25 19:03:44 localhost kernel: [283823.957643][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:0[00000000cc62eb17], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000614cee44,
Aug 25 19:03:44 localhost kernel: [283823.957673][T15386] [drm:amdgpu_display_flip_work_func [amdgpu]] crtc:1[00000000c905a8f5], pflip_stat:AMDGPU_FLIP_SUBMITTED, work: 00000000df74fb67,
Aug 25 19:03:45 localhost kernel: [283824.053148][    C0] DMAR: DRHD: handling fault status reg 2
Aug 25 19:03:45 localhost kernel: [283824.053153][    C0] DMAR: [INTR-REMAP] Request device [00:00.0] fault index 26 [fault reason 38] Blocked an interrupt request due to source-id verification failure
...end...
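
The timestamps in the excerpt above give a rough picture of the timing. The following Python sketch is only arithmetic on values copied from the log: the three crtc:0 AMDGPU_FLIP_PENDING timestamps, the last flip message, and the DMAR fault status line. The variable names are hypothetical, and Decimal is used so the microsecond digits are not distorted by floating point.

```python
from decimal import Decimal

# Timestamps (seconds) of the crtc:0 AMDGPU_FLIP_PENDING lines in the excerpt.
pending_crtc0 = [
    Decimal("283823.922605"),
    Decimal("283823.939256"),
    Decimal("283823.955946"),
]
# Last page-flip message before the fault, and the DMAR fault status line.
last_flip = Decimal("283823.957673")
dmar_fault = Decimal("283824.053148")

# Interval between consecutive flip requests on crtc:0.
intervals = [b - a for a, b in zip(pending_crtc0, pending_crtc0[1:])]
# Time between the last flip message and the fault.
gap = dmar_fault - last_flip

for delta in intervals:
    print(f"flip interval: {delta * 1000} ms")
print(f"last flip to DMAR fault: {gap * 1000} ms")
```

The flip requests are roughly 16.7 ms apart, and the fault is logged about 95 ms after the last flip message in this excerpt. Nothing more than that can be read from these lines alone.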

Comment 2 Martin Peres 2019-11-19 09:39:01 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/889.
