Bug 107783 - [4.19rc1][Regression][SKL][IOMMU] i915 0000:00:02.0: Device initialization failed (-12)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Lakshmi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-02 15:34 UTC by Ernest Hurtado
Modified: 2018-09-27 07:54 UTC

See Also:
i915 platform: KBL, SKL
i915 features: display/Other


Attachments
dmesg -debug=0xf (57.29 KB, text/plain)
2018-09-02 15:34 UTC, Ernest Hurtado
config (160.12 KB, text/plain)
2018-09-02 15:35 UTC, Ernest Hurtado
dmesg intel_iommu=on (52.12 KB, text/plain)
2018-09-02 18:50 UTC, Ernest Hurtado

Description Ernest Hurtado 2018-09-02 15:34:52 UTC
Created attachment 141414
dmesg -debug=0xf

Booting Linux 4.19-rc1 with CONFIG_INTEL_IOMMU_DEFAULT_ON=y results in no display output. This did not happen with Linux 4.18 through 4.18.5.

kernel: DMAR: No ATSR found
kernel: DMAR: dmar0: Using Queued invalidation
kernel: DMAR: dmar1: Using Queued invalidation
kernel: DMAR: Setting RMRR:
kernel: WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:4103 __alloc_pages_nodemask+0xed6/0x1030
kernel: Modules linked in:
kernel: CPU: 2 PID: 1 Comm: swapper/0 Tainted: G                T 4.19.0-rc1-00219-g360bd62dc494 #1
kernel: Hardware name: LENOVO 20JJS0HD00/20JJS0HD00, BIOS R0HET51W (1.31 ) 07/04/2018
kernel: RIP: 0010:__alloc_pages_nodemask+0xed6/0x1030
kernel: Code: 70 db ff ff 48 85 c0 0f 85 f8 fd ff ff 81 e3 00 00 40 00 89 5c 24 4c e9 fe f5 ff ff f7 44 24 30 00 02 00 00 0f 85 70 f2 ff ff <0f> 0b e9 69 f2 ff ff 44 8b 64 24 50 c7 04 24 10 00 00 00 e9 c7 fc
kernel: RSP: 0000:ffffb2d280cbbb50 EFLAGS: 00010046
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000488020 RSI: 0000000000488020 RDI: ffffb2d280cbbc30
kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000f86
kernel: R10: 0000000000000000 R11: ffff911616a4d300 R12: 0000000000488020
kernel: R13: 000000000000000b R14: 0000000000000000 R15: ffff911616a4d600
kernel: FS:  0000000000000000(0000) GS:ffff911619500000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffb2d280dec000 CR3: 0000000192008001 CR4: 00000000003606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  ? pci_mmcfg_read+0x88/0xf0
kernel:  intel_pasid_alloc_table+0x177/0x1c0
kernel:  dmar_insert_one_dev_info+0x2da/0x4e0
kernel:  set_domain_for_dev+0x6f/0x100
kernel:  iommu_prepare_identity_map+0x4d/0x9c
kernel:  intel_iommu_init+0x16f0/0x1de8
kernel:  ? printk+0x59/0x75
kernel:  pci_iommu_init+0x57/0xcc
kernel:  ? e820__memblock_setup+0x160/0x160
kernel:  do_one_initcall+0x46/0x1fb
kernel:  kernel_init_freeable+0x33b/0x43a
kernel:  ? rest_init+0xc5/0xc5
kernel:  kernel_init+0xa/0x102
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace 58cad12ab768e8a5 ]---
kernel: DMAR: Mapping reserved region failed
kernel: DMAR: Setting identity map for device 0000:00:14.0 [0x8f49f000 - 0x8f4befff]
kernel: DMAR: Prepare 0-16MiB unity mapping for LPC
kernel: DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9b000000 [fault reason 01] Present bit in root entry is clear
kernel: DMAR: Intel(R) Virtualization Technology for Directed I/O
kernel: iommu: Adding device 0000:00:00.0 to group 0
kernel: iommu: Adding device 0000:00:02.0 to group 1
kernel: iommu: Adding device 0000:00:08.0 to group 2
kernel: iommu: Adding device 0000:00:13.0 to group 3
kernel: iommu: Adding device 0000:00:14.0 to group 4
kernel: iommu: Adding device 0000:00:14.2 to group 4
kernel: iommu: Adding device 0000:00:16.0 to group 5
kernel: iommu: Adding device 0000:00:1c.0 to group 6
kernel: iommu: Adding device 0000:00:1c.2 to group 7
kernel: iommu: Adding device 0000:00:1c.4 to group 8
kernel: iommu: Adding device 0000:00:1d.0 to group 9
kernel: iommu: Adding device 0000:00:1f.0 to group 10
kernel: iommu: Adding device 0000:00:1f.2 to group 10
kernel: iommu: Adding device 0000:00:1f.3 to group 10
kernel: iommu: Adding device 0000:00:1f.4 to group 10
kernel: iommu: Adding device 0000:02:00.0 to group 11
kernel: iommu: Adding device 0000:04:00.0 to group 12
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9a800000 [fault reason 01] Present bit in root entry is clear
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9a840000 [fault reason 01] Present bit in root entry is clear
kernel: iommu: Adding device 0000:05:00.0 to group 13
kernel: DMAR: DRHD: handling fault status reg 3

Adding "intel_iommu=igfx_off" to the kernel command line restores display output, although the errors below still appear; they also did not occur with Linux 4.18 through 4.18.5:

kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9a980000 [fault reason 01] Present bit in root entry is clear
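
For anyone comparing logs across kernels, the repeated DMAR fault lines above can be tallied mechanically. The following is a minimal sketch, not part of the original report: the function name `summarize_dmar_faults` and the regular expression are illustrative assumptions, written only against the line format seen in the log excerpts in this bug.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this report, e.g.
#   DMAR: [DMA Write] Request device [00:02.0] fault addr 9b000000 [fault reason 01] ...
# The pattern is an assumption based only on those excerpts.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\]"
)


def summarize_dmar_faults(log_text):
    """Count DMAR faults per (device, operation, reason) and collect fault addresses."""
    counts = Counter()
    addrs = {}
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if not m:
            continue  # not a DMAR fault line
        key = (m["dev"], m["op"], int(m["reason"]))
        counts[key] += 1
        addrs.setdefault(key, []).append(int(m["addr"], 16))
    return counts, addrs


# Sample input copied from the log excerpt above.
SAMPLE = """\
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9b000000 [fault reason 01] Present bit in root entry is clear
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9a800000 [fault reason 01] Present bit in root entry is clear
kernel: DMAR: [DMA Write] Request device [00:02.0] fault addr 9a840000 [fault reason 01] Present bit in root entry is clear
"""

if __name__ == "__main__":
    counts, addrs = summarize_dmar_faults(SAMPLE)
    for key, n in counts.items():
        print(key, n, [hex(a) for a in addrs[key]])
```

Feeding it the full output of `dmesg` from a good and a bad kernel should make it easier to see whether the faults are limited to device 00:02.0, as they are in the excerpts here.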
Comment 1 Ernest Hurtado 2018-09-02 15:35:41 UTC
Created attachment 141415
config
Comment 2 Ernest Hurtado 2018-09-02 15:45:41 UTC
I also tested this with Linux built from git, up to commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=360bd62dc4943a0754e6cb5637e01b5b43143cfc

There were earlier IOMMU issues that caused occasional GPU hangs (https://bugs.freedesktop.org/show_bug.cgi?id=89360); those were fixed in Linux 4.18. This one is much more severe, though.
Comment 3 Ernest Hurtado 2018-09-02 17:36:00 UTC
With "iommu=pt" on the kernel command line the display works, but similar errors appear:

kernel: DMAR: No ATSR found
kernel: DMAR: dmar0: Using Queued invalidation
kernel: DMAR: dmar1: Using Queued invalidation
kernel: DMAR: Hardware identity mapping for device 0000:00:00.0
kernel: WARNING: CPU: 0 PID: 1 at mm/page_alloc.c:4066 __alloc_pages_nodemask+0xe30/0xf80
kernel: Modules linked in:
kernel: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0 #1
kernel: Hardware name: LENOVO 20JJS0HD00/20JJS0HD00, BIOS R0HET51W (1.31 ) 07/04/2018
kernel: RIP: 0010:__alloc_pages_nodemask+0xe30/0xf80
kernel: Code: d6 dd ff ff 48 85 c0 0f 85 05 fe ff ff 81 e3 00 00 40 00 89 5c 24 4c e9 5a f6 ff ff f7 44 24 30 00 02 00 00 0f 85 ed f2 ff ff <0f> 0b e9 e6 f2 ff ff 44 8b 64 24 50 41 be 10 00 00 00 e9 df fc ff
kernel: RSP: 0000:ffffb27500cbbbb0 EFLAGS: 00010046
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000b
kernel: RDX: 0000000000488020 RSI: 0000000000488020 RDI: ffff93b3617fc000
kernel: RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000787
kernel: R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000100000
kernel: R13: 0000000000000000 R14: 000000000000000b R15: 0000000000000100
kernel: FS:  0000000000000000(0000) GS:ffff93b359400000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000001e680a001 CR4: 00000000003606f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  ? sched_clock+0x5/0x10
kernel:  ? pci_mmcfg_read+0x89/0xe0
kernel:  ? pci_read_config_word+0x62/0xa0
kernel:  intel_pasid_alloc_table+0x118/0x180
kernel:  dmar_insert_one_dev_info+0x2a2/0x4a0
kernel:  ? klist_iter_exit+0x17/0x30
kernel:  domain_add_dev_info+0x50/0x90
kernel:  dev_prepare_static_identity_mapping+0x30/0x72
kernel:  intel_iommu_init+0xc7c/0x118a
kernel:  ? printk+0x58/0x6f
kernel:  ? preempt_count_add+0x68/0xa0
kernel:  ? free_reserved_area.cold.30+0x18/0x1d
kernel:  ? do_early_param+0x8e/0x8e
kernel:  ? e820__memblock_setup+0x9d/0x9d
kernel:  pci_iommu_init+0x16/0x3f
kernel:  do_one_initcall+0x46/0x1f5
kernel:  kernel_init_freeable+0x222/0x2b4
kernel:  ? rest_init+0xc5/0xc5
kernel:  kernel_init+0xa/0x10d
kernel:  ret_from_fork+0x35/0x40
kernel: ---[ end trace 706bc7dc523f6c1a ]---
kernel: DMAR: Failed to setup IOMMU pass-through
kernel: DMAR: Initialization failed
kernel: =============================================================================
kernel: BUG iommu_domain (Tainted: G        W        ): Objects remaining in iommu_domain on __kmem_cache_shutdown()
kernel: -----------------------------------------------------------------------------
kernel: Disabling lock debugging due to kernel taint
kernel: INFO: Slab 0x0000000073358ae1 objects=11 used=1 fp=0x00000000c5af0509 flags=0x2ffff0000008100
kernel: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         4.19.0 #1
kernel: Hardware name: LENOVO 20JJS0HD00/20JJS0HD00, BIOS R0HET51W (1.31 ) 07/04/2018
kernel: Call Trace:
kernel:  dump_stack+0x5c/0x80
kernel:  slab_err+0xb0/0xd4
kernel:  ? ksm_migrate_page+0x50/0x60
kernel:  ? on_each_cpu_cond+0xb9/0xf0
kernel:  ? __kmalloc+0x1e0/0x220
kernel:  __kmem_cache_shutdown.cold.43+0x1b/0x1a3
kernel:  shutdown_cache+0x11/0x140
kernel:  kmem_cache_destroy+0x1e6/0x210
kernel:  intel_iommu_init+0xff3/0x118a
kernel:  ? printk+0x58/0x6f
kernel:  ? preempt_count_add+0x68/0xa0
kernel:  ? free_reserved_area.cold.30+0x18/0x1d
kernel:  ? do_early_param+0x8e/0x8e
kernel:  ? e820__memblock_setup+0x9d/0x9d
kernel:  pci_iommu_init+0x16/0x3f
kernel:  do_one_initcall+0x46/0x1f5
kernel:  kernel_init_freeable+0x222/0x2b4
kernel:  ? rest_init+0xc5/0xc5
kernel:  kernel_init+0xa/0x10d
kernel:  ret_from_fork+0x35/0x40
kernel: INFO: Object 0x0000000020568c9c @offset=5632
kernel: kmem_cache_destroy iommu_domain: Slab cache still has objects
kernel: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B   W         4.19.0 #1
kernel: Hardware name: LENOVO 20JJS0HD00/20JJS0HD00, BIOS R0HET51W (1.31 ) 07/04/2018
kernel: Call Trace:
kernel:  dump_stack+0x5c/0x80
kernel:  kmem_cache_destroy+0x204/0x210
kernel:  intel_iommu_init+0xff3/0x118a
kernel:  ? printk+0x58/0x6f
kernel:  ? preempt_count_add+0x68/0xa0
kernel:  ? free_reserved_area.cold.30+0x18/0x1d
kernel:  ? do_early_param+0x8e/0x8e
kernel:  ? e820__memblock_setup+0x9d/0x9d
kernel:  pci_iommu_init+0x16/0x3f
kernel:  do_one_initcall+0x46/0x1f5
kernel:  kernel_init_freeable+0x222/0x2b4
kernel:  ? rest_init+0xc5/0xc5
kernel:  kernel_init+0xa/0x10d
kernel:  ret_from_fork+0x35/0x40
kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
kernel: software IO TLB: mapped [mem 0x75f02000-0x79f02000] (64MB)
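
Since the results so far depend on which of "intel_iommu=igfx_off", "iommu=pt" or "intel_iommu=on" was passed, it can help to record the options actually in effect alongside each dmesg. The sketch below is an illustration only: the function names are hypothetical, and it simply reads the standard `/proc/cmdline` file and splits comma-separated values.

```python
def iommu_options(cmdline):
    """Return the values given for intel_iommu= and iommu= in a kernel command line."""
    found = {"intel_iommu": [], "iommu": []}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name in found:
            # Values may be comma-separated, e.g. intel_iommu=on,igfx_off
            found[name].extend(value.split(","))
    return found


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    print(iommu_options(read_cmdline()))
```

An empty list for both keys means neither parameter was passed, in which case the behaviour follows the build-time default such as CONFIG_INTEL_IOMMU_DEFAULT_ON.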
Comment 4 Ernest Hurtado 2018-09-02 18:50:58 UTC
Created attachment 141417
dmesg intel_iommu=on

Test with a different kernel (booted with intel_iommu=on):

DMAR: No ATSR found
DMAR: dmar0: Using Queued invalidation
DMAR: dmar1: Using Queued invalidation
DMAR: Setting RMRR:
WARNING: CPU: 3 PID: 1 at mm/page_alloc.c:4066 __alloc_pages_nodemask+0xe30/0xf80
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.19.0-1-MANJARO #1
Hardware name: LENOVO 20JJS0HD00/20JJS0HD00, BIOS R0HET51W (1.31 ) 07/04/2018
RIP: 0010:__alloc_pages_nodemask+0xe30/0xf80
Code: d6 dd ff ff 48 85 c0 0f 85 05 fe ff ff 81 e3 00 00 40 00 89 5c 24 4c e9 5a f6 ff ff f7 44 24 30 00 02 00 00 0f 85 ed f2 ff ff <0f> 0b e9 e6 f2 ff ff 44 8b 64 24 50 41 be 10 00 00 00 e9 df fc ff
RSP: 0000:ffffa53200cbbb88 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000b
RDX: 0000000000488020 RSI: 0000000000488020 RDI: ffff9347a17fc000
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000787
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000100000
R13: 0000000000000000 R14: 000000000000000b R15: 0000000000000100
FS:  0000000000000000(0000) GS:ffff934799580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000010ea0a001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? pci_mmcfg_read+0x89/0xe0
 ? pci_read_config_word+0x62/0xa0
 intel_pasid_alloc_table+0x118/0x180
 dmar_insert_one_dev_info+0x2a2/0x4a0
 set_domain_for_dev+0x6f/0x100
 iommu_prepare_identity_map+0x4d/0xa0
 intel_iommu_init+0xd81/0x118a
 ? printk+0x58/0x6f
 ? preempt_count_add+0x68/0xa0
 ? free_reserved_area.cold.30+0x18/0x1d
 ? do_early_param+0x8e/0x8e
 ? e820__memblock_setup+0x9d/0x9d
 pci_iommu_init+0x16/0x3f
 do_one_initcall+0x46/0x1f5
 kernel_init_freeable+0x222/0x2b4
 ? rest_init+0xc5/0xc5
 kernel_init+0xa/0x10d
 ret_from_fork+0x35/0x40
---[ end trace 512a682af8e9143d ]---
DMAR: Mapping reserved region failed
DMAR: Setting identity map for device 0000:00:14.0 [0x8f49f000 - 0x8f4befff]
DMAR: Prepare 0-16MiB unity mapping for LPC
DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
DMAR: Intel(R) Virtualization Technology for Directed I/O
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write] Request device [00:02.0] fault addr 0 [fault reason 01] Present bit in root entry is clear
iommu: Adding device 0000:00:00.0 to group 0
iommu: Adding device 0000:00:02.0 to group 1
iommu: Adding device 0000:00:08.0 to group 2
iommu: Adding device 0000:00:13.0 to group 3
iommu: Adding device 0000:00:14.0 to group 4
iommu: Adding device 0000:00:14.2 to group 4
iommu: Adding device 0000:00:16.0 to group 5
iommu: Adding device 0000:00:1c.0 to group 6
iommu: Adding device 0000:00:1c.2 to group 7
iommu: Adding device 0000:00:1c.4 to group 8
iommu: Adding device 0000:00:1d.0 to group 9
iommu: Adding device 0000:00:1f.0 to group 10
iommu: Adding device 0000:00:1f.2 to group 10
iommu: Adding device 0000:00:1f.3 to group 10
iommu: Adding device 0000:00:1f.4 to group 10
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:02.0] fault addr 9a880000 [fault reason 01] Present bit in root entry is clear
iommu: Adding device 0000:02:00.0 to group 11
iommu: Adding device 0000:04:00.0 to group 12
iommu: Adding device 0000:05:00.0 to group 13
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:02.0] fault addr 9a8c0000 [fault reason 01] Present bit in root entry is clear
DMAR: DRHD: handling fault status reg 3
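
The thread does not say why the allocation in intel_pasid_alloc_table warns. One possible reading, offered only as an assumption: the register dumps show 0xb (11) and, in the later traces, 0x100000, which would be consistent with an order-11 request for a table of 0x100000 eight-byte entries (8 MiB). With 4 KiB pages and a page-allocator limit of MAX_ORDER = 11 (believed to be the x86-64 default for kernels of this era), the largest permitted order is 10, so such a request would be rejected with a warning. The sketch below only redoes that arithmetic; the entry count, entry size, page size and MAX_ORDER are all assumptions, not values confirmed in this report.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
MAX_ORDER = 11    # assumption: allocator accepts orders 0 .. MAX_ORDER - 1


def get_order(size):
    """Smallest order such that (PAGE_SIZE << order) >= size."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def exceeds_allocator_limit(size):
    """True if a contiguous request of `size` bytes needs an order >= MAX_ORDER."""
    return get_order(size) >= MAX_ORDER


# Assumption: 0x100000 table entries of 8 bytes each, i.e. 8 MiB.
ASSUMED_TABLE_SIZE = 0x100000 * 8

if __name__ == "__main__":
    print("order:", get_order(ASSUMED_TABLE_SIZE))
    print("too large:", exceeds_allocator_limit(ASSUMED_TABLE_SIZE))
    # For comparison, a 1 MiB request stays well under the limit.
    print("1 MiB order:", get_order(1 << 20))
```

If this reading is right, the failure would not depend on how much memory is free: a request of that order is refused outright, which would match the warning appearing on every boot in the logs above.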
Comment 5 Lakshmi 2018-09-04 05:29:35 UTC
Ernest, Can you try this patch?
https://lkml.org/lkml/2018/9/1/37
Comment 6 Ernest Hurtado 2018-09-04 11:26:45 UTC
(In reply to Lakshmi from comment #5)
> Ernest, Can you try this patch?
> https://lkml.org/lkml/2018/9/1/37

I can confirm that this patch fixes the issue. Thank you.
Comment 7 Lakshmi 2018-09-05 05:23:41 UTC
This patch has not been merged into mainline yet; it is in the process of being merged. Once it is merged into mainline, this bug can be closed.
Comment 8 Michael Groh 2018-09-19 12:52:14 UTC
Hello everyone,

I have a ThinkPad T470s with a KBL i7-7500U. Without the patch from comment #5

> Ernest, Can you try this patch?
> https://lkml.org/lkml/2018/9/1/37

I had "broken" display output: mostly black, with a few colored pixels in a rectangular pattern.

After applying the patch to 4.19.0-rc4, the display output works again.

I would argue that this patch should be included in 4.19, since this bug is a regression from previous releases.

Thank you for your work. If you need more testing or information, I am willing to help.

Michael
Comment 9 Ernest Hurtado 2018-09-26 12:04:08 UTC
The patch was merged into Linus' (Greg's) tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be9e6598aeb0db70a7927d6b3bb4d3d6fb1c3e18
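
To check whether a local kernel checkout already contains the merged commit, something like the following could be used. It is a sketch under assumptions: the helper names are hypothetical, the commit id is the one from the link in this comment, and it relies on `git merge-base --is-ancestor`, which exits with status 0 when the commit is reachable from the given ref.

```python
import subprocess

# Commit id taken from the link in this comment.
FIX_COMMIT = "be9e6598aeb0db70a7927d6b3bb4d3d6fb1c3e18"


def ancestor_check_cmd(commit, ref="HEAD"):
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def tree_contains(repo_dir, commit=FIX_COMMIT, ref="HEAD"):
    """True if `commit` is reachable from `ref` in the checkout at `repo_dir`.

    Any non-zero exit status (not an ancestor, unknown commit, not a git
    repository) is treated as "not contained".
    """
    result = subprocess.run(
        ancestor_check_cmd(commit, ref),
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "." is a placeholder for the path to a Linux kernel git checkout.
    print(tree_contains("."))
```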
Comment 10 Lakshmi 2018-09-27 07:54:04 UTC
Closing this bug as resolved/fixed.