Created attachment 116752: Serial console log of Linux 4.1

Intel-IOMMU.txt in the Linux documentation says I should file a bug if intel_iommu=igfx_off fixes anything.

When using Linux 3.7 or later, characters on the screen become corrupted after the graphics driver is loaded. The screenshot is the same as in bug 90037, so I am not attaching it again:
https://bugs.freedesktop.org/attachment.cgi?id=115079

After the display server (Xorg or Wayland) starts, more errors appear and the system crashes, so I cannot run dmesg or other commands. I am attaching a serial console log instead.

git bisect says the bad commit is
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=398b7a1

If I use intel_iommu=igfx_off, there is no graphics problem.

I found some behavior differences between Linux 3.6 and Linux 3.7:
https://bugs.freedesktop.org/show_bug.cgi?id=90037#c8

(CPU and GPU)
Intel Core i5 CPU 650 @ 3.20GHz
Intel Ironlake Desktop

(Motherboard)
ASUSTeK Computer INC. P7H55D-M EVO
After running git bisect again, I found that the bad commit is
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=edef7e6

I hope this is more useful than the big merge commit.
Can you please paste the output of lspci -s 0:0:2 -nvv?
If you look in drivers/char/agp/intel-gtt.c, you will find the function needs_ilk_vtd_wa(). We need to work out why that is failing to match your machine.
# lspci -s 0:0:2 -nvv
00:02.0 0300: 8086:0042 (rev 12) (prog-if 00 [VGA controller])
    Subsystem: 1043:8383
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 39
    Region 0: Memory at f7400000 (64-bit, non-prefetchable) [size=4M]
    Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 4: I/O ports at bc00 [size=8]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee00000 Data: 40b3
    Capabilities: [d0] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a4] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: i915
    Kernel modules: i915
My gpu_devid is 0x0042, but PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.
(In reply to Ting-Wei Lan from comment #5)
> My gpu_devid is 0x0042, but
> PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and
> PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 3bb678e..7c68576 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -581,7 +581,7 @@ static inline int needs_ilk_vtd_wa(void)
 	/* Query intel_iommu to see if we need the workaround. Presumably that
 	 * was loaded first.
 	 */
-	if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB ||
+	if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_D_IG ||
 	     gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG) &&
 	    intel_iommu_gfx_mapped)
 		return 1;
(In reply to Chris Wilson from comment #6)
> (In reply to Ting-Wei Lan from comment #5)
> > My gpu_devid is 0x0042, but
> > PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB is 0x0044 and
> > PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG is 0x0046.
>
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 3bb678e..7c68576 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -581,7 +581,7 @@ static inline int needs_ilk_vtd_wa(void)
>  	/* Query intel_iommu to see if we need the workaround. Presumably that
>  	 * was loaded first.
>  	 */
> -	if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_HB ||
> +	if ((gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_D_IG ||
>  	     gpu_devid == PCI_DEVICE_ID_INTEL_IRONLAKE_M_IG) &&
>  	    intel_iommu_gfx_mapped)
>  		return 1;

This patch fixes the problem on my machine.
My system hangs and I have to press the reset button to reboot it now.

Serial console log:

[ 2711.310268] Kernel panic - not syncing: DMAR hardware is malfunctioning
[ 2711.310268]
[ 2711.318436] CPU: 0 PID: 2670 Comm: Xorg Not tainted 4.1.0+ #1
[ 2711.324232] Hardware name: System manufacturer System Product Name/P7H55D-M EVO, BIOS 1604 07/22/2010
[ 2711.333783] 0000000000000000 000000009fe49665 ffff8804090cbad8 ffffffff8179238b
[ 2711.341321] 0000000000000000 ffffffff81aa6d28 ffff8804090cbb58 ffffffff81791152
[ 2711.348858] 0000000200000008 ffff8804090cbb68 ffff8804090cbb08 000000009fe49665
[ 2711.356401] Call Trace:
[ 2711.358875] [<ffffffff8179238b>] dump_stack+0x45/0x57
[ 2711.364059] [<ffffffff81791152>] panic+0xd0/0x203
[ 2711.368898] [<ffffffff814d24aa>] __iommu_flush_iotlb+0x21a/0x230
[ 2711.375046] [<ffffffff814d29b9>] iommu_flush_iotlb_psi+0x99/0x110
[ 2711.381282] [<ffffffff814d6d15>] intel_unmap+0x1c5/0x240
[ 2711.386729] [<ffffffff814d6da2>] intel_unmap_sg+0x12/0x20
[ 2711.392306] [<ffffffffa0178253>] i915_gem_gtt_finish_object+0xb3/0xe0 [i915]
[ 2711.399524] [<ffffffffa017fc67>] i915_vma_unbind+0x227/0x250 [i915]
[ 2711.405953] [<ffffffffa0180ace>] i915_gem_free_object+0x8e/0x330 [i915]
[ 2711.412728] [<ffffffffa00963f7>] drm_gem_object_free+0x27/0x40 [drm]
[ 2711.419233] [<ffffffffa0096988>] drm_gem_object_handle_unreference_unlocked+0x118/0x130 [drm]
[ 2711.427922] [<ffffffffa0096a45>] drm_gem_handle_delete+0xa5/0x100 [drm]
[ 2711.434691] [<ffffffffa00970d0>] drm_gem_close_ioctl+0x20/0x30 [drm]
[ 2711.441192] [<ffffffffa00979af>] drm_ioctl+0x12f/0x620 [drm]
[ 2711.446985] [<ffffffff81205e8f>] ? __slab_free+0xbf/0x260
[ 2711.452526] [<ffffffffa00970b0>] ? drm_gem_handle_create+0x50/0x50 [drm]
[ 2711.459369] [<ffffffff8123a4e6>] do_vfs_ioctl+0x2c6/0x4d0
[ 2711.464899] [<ffffffff8123a771>] SyS_ioctl+0x81/0xa0
[ 2711.469994] [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140
[ 2711.476232] [<ffffffff81798aee>] system_call_fastpath+0x12/0x71
[ 2711.482345] Kernel Offset: disabled
[ 2711.485867] drm_kms_helper: panic occurred, switching back to text console
[ 2711.492832] ------------[ cut here ]------------
[ 2711.497495] WARNING: CPU: 0 PID: 2670 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x61/0x70()
[ 2711.507133] Modules linked in: bnep bluetooth rfkill ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter tun ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw hwmon_vid snd_hda_codec_hdmi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller iTCO_wdt snd_hda_codec iTCO_vendor_support fuse crct10dif_pclmul snd_hda_core crc32_pclmul crc32c_intel snd_hwdep ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer snd shpchp pcspkr i2c_i801 mei_me mei soundcore asus_atk0110 lpc_ich mfd_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ata_generic pata_acpi i915 i2c_algo_bit video drm_kms_helper 8021q garp stp llc drm mrp firewire_ohci r8169 uas serio_raw usb_storage mii firewire_core pata_marvell crc_itu_t
[ 2711.609743] CPU: 0 PID: 2670 Comm: Xorg Not tainted 4.1.0+ #1
[ 2711.615538] Hardware name: System manufacturer System Product Name/P7H55D-M EVO, BIOS 1604 07/22/2010
[ 2711.625089] 0000000000000000 000000009fe49665 ffff88041fc03d68 ffffffff8179238b
[ 2711.632626] 0000000000000000 0000000000000000 ffff88041fc03da8 ffffffff8109f4da
[ 2711.640163] ffff88041fc17838 0000000000000001 ffff88041fc577c0 0000000000000000
[ 2711.647699] Call Trace:
[ 2711.650172] <IRQ> [<ffffffff8179238b>] dump_stack+0x45/0x57
[ 2711.655996] [<ffffffff8109f4da>] warn_slowpath_common+0x8a/0xc0
[ 2711.662048] [<ffffffff8109f60a>] warn_slowpath_null+0x1a/0x20
[ 2711.667928] [<ffffffff8104da81>] native_smp_send_reschedule+0x61/0x70
[ 2711.674507] [<ffffffff810dc62d>] trigger_load_balance+0x13d/0x240
[ 2711.680739] [<ffffffff810c9f47>] scheduler_tick+0x97/0xe0
[ 2711.686275] [<ffffffff811079c1>] update_process_times+0x51/0x60
[ 2711.692333] [<ffffffff81117fa5>] tick_sched_handle.isra.18+0x25/0x60
[ 2711.698829] [<ffffffff81118024>] tick_sched_timer+0x44/0x80
[ 2711.704538] [<ffffffff81108613>] __run_hrtimer+0x73/0x1d0
[ 2711.710069] [<ffffffff81117fe0>] ? tick_sched_handle.isra.18+0x60/0x60
[ 2711.719486] [<ffffffff81108b13>] hrtimer_interrupt+0x103/0x220
[ 2711.728189] [<ffffffff810503ec>] local_apic_timer_interrupt+0x3c/0x70
[ 2711.737492] [<ffffffff8179b8f1>] smp_apic_timer_interrupt+0x41/0x60
[ 2711.746586] [<ffffffff817999be>] apic_timer_interrupt+0x6e/0x80
[ 2711.755293] <EOI> [<ffffffff810dbbfe>] ? pick_next_task_fair+0x5ce/0x980
[ 2711.764918] [<ffffffff81794897>] ? __schedule+0x6e7/0x970
[ 2711.773139] [<ffffffff8179427b>] ? __schedule+0xcb/0x970
[ 2711.781235] [<ffffffff81794b57>] schedule+0x37/0x90
[ 2711.788883] [<ffffffff81794e9e>] schedule_preempt_disabled+0xe/0x10
[ 2711.797936] [<ffffffff810e6775>] mutex_optimistic_spin+0x1a5/0x1e0
[ 2711.806901] [<ffffffff8179699a>] __mutex_lock_slowpath+0x3a/0x120
[ 2711.815786] [<ffffffff81796aa3>] mutex_lock+0x23/0x40
[ 2711.823659] [<ffffffffa01bca7e>] intel_begin_crtc_commit+0x7e/0x1c0 [i915]
[ 2711.833373] [<ffffffffa011e8d2>] drm_plane_helper_commit+0x132/0x300 [drm_kms_helper]
[ 2711.844045] [<ffffffffa011ebec>] drm_plane_helper_disable+0x5c/0xb0 [drm_kms_helper]
[ 2711.854626] [<ffffffffa00a359e>] drm_plane_force_disable+0x2e/0x90 [drm]
[ 2711.864159] [<ffffffffa0127b41>] restore_fbdev_mode+0x51/0xf0 [drm_kms_helper]
[ 2711.874225] [<ffffffffa0127d95>] drm_fb_helper_force_kernel_mode+0x85/0xc0 [drm_kms_helper]
[ 2711.885444] [<ffffffffa0128c49>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[ 2711.895589] [<ffffffff810bf04f>] notifier_call_chain+0x4f/0x80
[ 2711.904268] [<ffffffff810bf11a>] atomic_notifier_call_chain+0x1a/0x20
[ 2711.913566] [<ffffffff8179117f>] panic+0xfd/0x203
[ 2711.921122] [<ffffffff814d24aa>] __iommu_flush_iotlb+0x21a/0x230
[ 2711.929980] [<ffffffff814d29b9>] iommu_flush_iotlb_psi+0x99/0x110
[ 2711.938925] [<ffffffff814d6d15>] intel_unmap+0x1c5/0x240
[ 2711.947084] [<ffffffff814d6da2>] intel_unmap_sg+0x12/0x20
[ 2711.955340] [<ffffffffa0178253>] i915_gem_gtt_finish_object+0xb3/0xe0 [i915]
[ 2711.965259] [<ffffffffa017fc67>] i915_vma_unbind+0x227/0x250 [i915]
[ 2711.974391] [<ffffffffa0180ace>] i915_gem_free_object+0x8e/0x330 [i915]
[ 2711.983834] [<ffffffffa00963f7>] drm_gem_object_free+0x27/0x40 [drm]
[ 2711.992999] [<ffffffffa0096988>] drm_gem_object_handle_unreference_unlocked+0x118/0x130 [drm]
[ 2712.004326] [<ffffffffa0096a45>] drm_gem_handle_delete+0xa5/0x100 [drm]
[ 2712.013658] [<ffffffffa00970d0>] drm_gem_close_ioctl+0x20/0x30 [drm]
[ 2712.022634] [<ffffffffa00979af>] drm_ioctl+0x12f/0x620 [drm]
[ 2712.030802] [<ffffffff81205e8f>] ? __slab_free+0xbf/0x260
[ 2712.038632] [<ffffffffa00970b0>] ? drm_gem_handle_create+0x50/0x50 [drm]
[ 2712.047675] [<ffffffff8123a4e6>] do_vfs_ioctl+0x2c6/0x4d0
[ 2712.055324] [<ffffffff8123a771>] SyS_ioctl+0x81/0xa0
[ 2712.062446] [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140
[ 2712.070685] [<ffffffff81798aee>] system_call_fastpath+0x12/0x71
[ 2712.078714] ---[ end trace 429f9f85c882e9c4 ]---
(In reply to Ting-Wei Lan from comment #8)
> My system hangs and I have to press the reset button to reboot it now.
>
> Serial console log:
>
> [ 2711.310268] Kernel panic - not syncing: DMAR hardware is malfunctioning
> [ 2711.310268]

That panic is caused by the iommu not responding to the TLB flush. You're better off asking the iommu experts what the likely cause is and whether it is addressable.
I sent a message to Linux IOMMU list. http://lists.linuxfoundation.org/pipermail/iommu/2015-June/013538.html
After upgrading to Linux 4.2-rc2, I see a lot of these messages in dmesg:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr <some_addr>
DMAR:[fault reason 05] PTE Write access is not set

There is no crash after running it for more than one hour. I will run the system longer to make sure that it really doesn't crash. A dmesg log of Linux 4.2-rc2 is attached.
Created attachment 117161: dmesg log of Linux 4.2-rc2
Created attachment 117170: dmesg log of Linux 4.2-rc2 - kernel panic

It crashed after more than two hours of use. The log shows a similar kernel panic backtrace.
This is sad, but it will not get fixed (if it can be fixed) by keeping this bug open.