Bug 91127

Summary: Graphics problems with intel_iommu=on on Linux >= 3.7
Product: DRI
Component: DRM/Intel
Status: CLOSED NOTOURBUG
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: ILK
i915 features:
Reporter: Ting-Wei Lan <lantw44>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs
Attachments (description, flags):
Serial console log of Linux 4.1 (flags: none)
dmesg log of Linux 4.2-rc2 (flags: none)
dmesg log of Linux 4.2-rc2 - kernel panic (flags: none)

Description Ting-Wei Lan 2015-06-27 16:38:51 UTC
Created attachment 116752 [details]
Serial console log of Linux 4.1

Intel-IOMMU.txt in the Linux kernel documentation says I should file a bug if intel_iommu=igfx_off fixes anything.


When using Linux 3.7 or later, characters on the screen appear corrupted once the graphics driver is loaded. The symptom is the same as in bug 90037, so I am linking the screenshot from that bug instead of attaching it again:

https://bugs.freedesktop.org/attachment.cgi?id=115079

After the display server (Xorg or Wayland) is started, it shows more errors and crashes the system, so I cannot access the system to run dmesg or other commands. I will attach a serial console log instead.

git bisect says the bad commit is
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=398b7a1

If I use intel_iommu=igfx_off, there is no graphics problem.


I found some behavior differences between Linux 3.6 and Linux 3.7, which I described here:
https://bugs.freedesktop.org/show_bug.cgi?id=90037#c8


(CPU and GPU)
Intel Core i5 CPU 650 @ 3.20GHz
Intel Ironlake Desktop

(Motherboard)
ASUSTeK Computer INC. P7H55D-M EVO
Comment 1 Ting-Wei Lan 2015-06-28 09:48:29 UTC
After doing git bisect again, I found the bad commit is
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=edef7e6

I hope it is more useful than the big merge commit.
Comment 2 Chris Wilson 2015-06-28 11:51:34 UTC
Can you please paste the output of lspci -s 0:0:2 -nvv?
Comment 3 Chris Wilson 2015-06-28 11:52:26 UTC
If you look in drivers/char/agp/intel-gtt.c, you will find the function needs_ilk_vtd_wa(). We need to work out why that is failing to match your machine.
Comment 4 Ting-Wei Lan 2015-06-28 11:53:05 UTC
# lspci -s 0:0:2 -nvv
00:02.0 0300: 8086:0042 (rev 12) (prog-if 00 [VGA controller])
	Subsystem: 1043:8383
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Region 0: Memory at f7400000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at bc00 [size=8]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00000  Data: 40b3
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915
Comment 5 Ting-Wei Lan 2015-06-28 13:02:03 UTC
My gpu_devid is 0x0042, but
PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and
PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.
Comment 6 Chris Wilson 2015-06-28 13:11:56 UTC
(In reply to Ting-Wei Lan from comment #5)
> My gpu_devid is 0x0042, but
> PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and
> PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 3bb678e..7c68576 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -581,7 +581,7 @@ static inline int needs_ilk_vtd_wa(void)
        /* Query intel_iommu to see if we need the workaround. Presumably that
         * was loaded first.
         */
-       if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB ||
+       if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_D_IG ||
             gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG) &&
             intel_iommu_gfx_mapped)
                return 1;
Comment 7 Ting-Wei Lan 2015-06-28 13:54:10 UTC
(In reply to Chris Wilson from comment #6)
> (In reply to Ting-Wei Lan from comment #5)
> > My gpu_devid is 0x0042, but
> > PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and
> > PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.
> 
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 3bb678e..7c68576 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -581,7 +581,7 @@ static inline int needs_ilk_vtd_wa(void)
>         /* Query intel_iommu to see if we need the workaround. Presumably
> that
>          * was loaded first.
>          */
> -       if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB ||
> +       if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_D_IG ||
>              gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG) &&
>              intel_iommu_gfx_mapped)
>                 return 1;

This patch fixes the problem on my machine.
Comment 8 Ting-Wei Lan 2015-06-28 14:42:36 UTC
My system hangs and I have to press the reset button to reboot it now.

Serial console log:

[ 2711.310268] Kernel panic - not syncing: DMAR hardware is malfunctioning
[ 2711.310268]
[ 2711.318436] CPU: 0 PID: 2670 Comm: Xorg Not tainted 4.1.0+ #1
[ 2711.324232] Hardware name: System manufacturer System Product Name/P7H55D-M EVO, BIOS 1604    07/22/2010
[ 2711.333783]  0000000000000000 000000009fe49665 ffff8804090cbad8 ffffffff8179238b
[ 2711.341321]  0000000000000000 ffffffff81aa6d28 ffff8804090cbb58 ffffffff81791152
[ 2711.348858]  0000000200000008 ffff8804090cbb68 ffff8804090cbb08 000000009fe49665
[ 2711.356401] Call Trace:
[ 2711.358875]  [<ffffffff8179238b>] dump_stack+0x45/0x57
[ 2711.364059]  [<ffffffff81791152>] panic+0xd0/0x203
[ 2711.368898]  [<ffffffff814d24aa>] __iommu_flush_iotlb+0x21a/0x230
[ 2711.375046]  [<ffffffff814d29b9>] iommu_flush_iotlb_psi+0x99/0x110
[ 2711.381282]  [<ffffffff814d6d15>] intel_unmap+0x1c5/0x240
[ 2711.386729]  [<ffffffff814d6da2>] intel_unmap_sg+0x12/0x20
[ 2711.392306]  [<ffffffffa0178253>] i915_gem_gtt_finish_object+0xb3/0xe0 [i915]
[ 2711.399524]  [<ffffffffa017fc67>] i915_vma_unbind+0x227/0x250 [i915]
[ 2711.405953]  [<ffffffffa0180ace>] i915_gem_free_object+0x8e/0x330 [i915]
[ 2711.412728]  [<ffffffffa00963f7>] drm_gem_object_free+0x27/0x40 [drm]
[ 2711.419233]  [<ffffffffa0096988>] drm_gem_object_handle_unreference_unlocked+0x118/0x130 [drm]
[ 2711.427922]  [<ffffffffa0096a45>] drm_gem_handle_delete+0xa5/0x100 [drm]
[ 2711.434691]  [<ffffffffa00970d0>] drm_gem_close_ioctl+0x20/0x30 [drm]
[ 2711.441192]  [<ffffffffa00979af>] drm_ioctl+0x12f/0x620 [drm]
[ 2711.446985]  [<ffffffff81205e8f>] ? __slab_free+0xbf/0x260
[ 2711.452526]  [<ffffffffa00970b0>] ? drm_gem_handle_create+0x50/0x50 [drm]
[ 2711.459369]  [<ffffffff8123a4e6>] do_vfs_ioctl+0x2c6/0x4d0
[ 2711.464899]  [<ffffffff8123a771>] SyS_ioctl+0x81/0xa0
[ 2711.469994]  [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140
[ 2711.476232]  [<ffffffff81798aee>] system_call_fastpath+0x12/0x71
[ 2711.482345] Kernel Offset: disabled
[ 2711.485867] drm_kms_helper: panic occurred, switching back to text console
[ 2711.492832] ------------[ cut here ]------------
[ 2711.497495] WARNING: CPU: 0 PID: 2670 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x61/0x70()
[ 2711.507133] Modules linked in: bnep bluetooth rfkill ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter tun ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw hwmon_vid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller iTCO_wdt snd_hda_codec iTCO_vendor_support fuse crct10dif_pclmul snd_hda_core crc32_pclmul crc32c_intel snd_hwdep ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer snd shpchp pcspkr i2c_i801 mei_me mei soundcore asus_atk0110 lpc_ich mfd_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ata_generic pata_acpi i915 i2c_algo_bit video drm_kms_helper 8021q garp stp llc drm mrp firewire_ohci r8169 uas serio_raw usb_storage mii firewire_core pata_marvell crc_itu_t
[ 2711.609743] CPU: 0 PID: 2670 Comm: Xorg Not tainted 4.1.0+ #1
[ 2711.615538] Hardware name: System manufacturer System Product Name/P7H55D-M EVO, BIOS 1604    07/22/2010
[ 2711.625089]  0000000000000000 000000009fe49665 ffff88041fc03d68 ffffffff8179238b
[ 2711.632626]  0000000000000000 0000000000000000 ffff88041fc03da8 ffffffff8109f4da
[ 2711.640163]  ffff88041fc17838 0000000000000001 ffff88041fc577c0 0000000000000000
[ 2711.647699] Call Trace:
[ 2711.650172]  <IRQ>  [<ffffffff8179238b>] dump_stack+0x45/0x57
[ 2711.655996]  [<ffffffff8109f4da>] warn_slowpath_common+0x8a/0xc0
[ 2711.662048]  [<ffffffff8109f60a>] warn_slowpath_null+0x1a/0x20
[ 2711.667928]  [<ffffffff8104da81>] native_smp_send_reschedule+0x61/0x70
[ 2711.674507]  [<ffffffff810dc62d>] trigger_load_balance+0x13d/0x240
[ 2711.680739]  [<ffffffff810c9f47>] scheduler_tick+0x97/0xe0
[ 2711.686275]  [<ffffffff811079c1>] update_process_times+0x51/0x60
[ 2711.692333]  [<ffffffff81117fa5>] tick_sched_handle.isra.18+0x25/0x60
[ 2711.698829]  [<ffffffff81118024>] tick_sched_timer+0x44/0x80
[ 2711.704538]  [<ffffffff81108613>] __run_hrtimer+0x73/0x1d0
[ 2711.710069]  [<ffffffff81117fe0>] ? tick_sched_handle.isra.18+0x60/0x60
[ 2711.719486]  [<ffffffff81108b13>] hrtimer_interrupt+0x103/0x220
[ 2711.728189]  [<ffffffff810503ec>] local_apic_timer_interrupt+0x3c/0x70
[ 2711.737492]  [<ffffffff8179b8f1>] smp_apic_timer_interrupt+0x41/0x60
[ 2711.746586]  [<ffffffff817999be>] apic_timer_interrupt+0x6e/0x80
[ 2711.755293]  <EOI>  [<ffffffff810dbbfe>] ? pick_next_task_fair+0x5ce/0x980
[ 2711.764918]  [<ffffffff81794897>] ? __schedule+0x6e7/0x970
[ 2711.773139]  [<ffffffff8179427b>] ? __schedule+0xcb/0x970
[ 2711.781235]  [<ffffffff81794b57>] schedule+0x37/0x90
[ 2711.788883]  [<ffffffff81794e9e>] schedule_preempt_disabled+0xe/0x10
[ 2711.797936]  [<ffffffff810e6775>] mutex_optimistic_spin+0x1a5/0x1e0
[ 2711.806901]  [<ffffffff8179699a>] __mutex_lock_slowpath+0x3a/0x120
[ 2711.815786]  [<ffffffff81796aa3>] mutex_lock+0x23/0x40
[ 2711.823659]  [<ffffffffa01bca7e>] intel_begin_crtc_commit+0x7e/0x1c0 [i915]
[ 2711.833373]  [<ffffffffa011e8d2>] drm_plane_helper_commit+0x132/0x300 [drm_kms_helper]
[ 2711.844045]  [<ffffffffa011ebec>] drm_plane_helper_disable+0x5c/0xb0 [drm_kms_helper]
[ 2711.854626]  [<ffffffffa00a359e>] drm_plane_force_disable+0x2e/0x90 [drm]
[ 2711.864159]  [<ffffffffa0127b41>] restore_fbdev_mode+0x51/0xf0 [drm_kms_helper]
[ 2711.874225]  [<ffffffffa0127d95>] drm_fb_helper_force_kernel_mode+0x85/0xc0 [drm_kms_helper]
[ 2711.885444]  [<ffffffffa0128c49>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[ 2711.895589]  [<ffffffff810bf04f>] notifier_call_chain+0x4f/0x80
[ 2711.904268]  [<ffffffff810bf11a>] atomic_notifier_call_chain+0x1a/0x20
[ 2711.913566]  [<ffffffff8179117f>] panic+0xfd/0x203
[ 2711.921122]  [<ffffffff814d24aa>] __iommu_flush_iotlb+0x21a/0x230
[ 2711.929980]  [<ffffffff814d29b9>] iommu_flush_iotlb_psi+0x99/0x110
[ 2711.938925]  [<ffffffff814d6d15>] intel_unmap+0x1c5/0x240
[ 2711.947084]  [<ffffffff814d6da2>] intel_unmap_sg+0x12/0x20
[ 2711.955340]  [<ffffffffa0178253>] i915_gem_gtt_finish_object+0xb3/0xe0 [i915]
[ 2711.965259]  [<ffffffffa017fc67>] i915_vma_unbind+0x227/0x250 [i915]
[ 2711.974391]  [<ffffffffa0180ace>] i915_gem_free_object+0x8e/0x330 [i915]
[ 2711.983834]  [<ffffffffa00963f7>] drm_gem_object_free+0x27/0x40 [drm]
[ 2711.992999]  [<ffffffffa0096988>] drm_gem_object_handle_unreference_unlocked+0x118/0x130 [drm]
[ 2712.004326]  [<ffffffffa0096a45>] drm_gem_handle_delete+0xa5/0x100 [drm]
[ 2712.013658]  [<ffffffffa00970d0>] drm_gem_close_ioctl+0x20/0x30 [drm]
[ 2712.022634]  [<ffffffffa00979af>] drm_ioctl+0x12f/0x620 [drm]
[ 2712.030802]  [<ffffffff81205e8f>] ? __slab_free+0xbf/0x260
[ 2712.038632]  [<ffffffffa00970b0>] ? drm_gem_handle_create+0x50/0x50 [drm]
[ 2712.047675]  [<ffffffff8123a4e6>] do_vfs_ioctl+0x2c6/0x4d0
[ 2712.055324]  [<ffffffff8123a771>] SyS_ioctl+0x81/0xa0
[ 2712.062446]  [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140
[ 2712.070685]  [<ffffffff81798aee>] system_call_fastpath+0x12/0x71
[ 2712.078714] ---[ end trace 429f9f85c882e9c4 ]---
Comment 9 Chris Wilson 2015-06-28 14:53:55 UTC
(In reply to Ting-Wei Lan from comment #8)
> My system hangs and I have to press the reset button to reboot it now.
> 
> Serial console log:
> 
> [ 2711.310268] Kernel panic - not syncing: DMAR hardware is malfunctioning
> [ 2711.310268]

That panic is caused by the IOMMU not responding to the TLB flush. You're better off asking the IOMMU experts what the likely cause is and whether it can be addressed.
Comment 10 Ting-Wei Lan 2015-07-02 18:33:12 UTC
I sent a message to the Linux IOMMU list.
http://lists.linuxfoundation.org/pipermail/iommu/2015-June/013538.html
Comment 11 Ting-Wei Lan 2015-07-16 06:02:37 UTC
After upgrading to Linux 4.2-rc2, I see a lot of these messages in dmesg:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr <some_addr>
DMAR:[fault reason 05] PTE Write access is not set

There is no crash after running it for more than one hour. I will run the system longer to make sure that it really doesn't crash.

A dmesg log of Linux 4.2-rc2 is attached.
Comment 12 Ting-Wei Lan 2015-07-16 06:03:27 UTC
Created attachment 117161 [details]
dmesg log of Linux 4.2-rc2
Comment 13 Ting-Wei Lan 2015-07-16 13:19:49 UTC
Created attachment 117170 [details]
dmesg log of Linux 4.2-rc2 - kernel panic

It crashed after more than two hours of use. A similar kernel panic backtrace is shown.
Comment 14 Jani Nikula 2016-06-17 15:42:06 UTC
This is sad, but it will not get fixed (if it can be fixed) by keeping this bug open.
