Bug 111451 - igfx iommu causes hangs: DMAR: DRHD: handling fault status reg 3
Status: RESOLVED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-21 09:31 UTC by Paul Menzel
Modified: 2019-09-13 09:50 UTC

See Also:
i915 platform: BDW
i915 features: GPU hang


Attachments
Linux kernel 5.2.9 messages (159.37 KB, text/plain)
2019-08-21 09:31 UTC, Paul Menzel

Description Paul Menzel 2019-08-21 09:31:35 UTC
Created attachment 145112
Linux kernel 5.2.9 messages

Debian enabled CONFIG_INTEL_IOMMU_DEFAULT_ON=y in Linux 5.2.9 [1], and now there are hangs on the Dell Latitude E7250, often after using the laptop's brightness function keys.

```
[   41.975448] DMAR: DRHD: handling fault status reg 3
[   41.975456] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   41.987006] DMAR: DRHD: handling fault status reg 3
[   41.987013] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   42.157270] DMAR: DRHD: handling fault status reg 3
[   42.157275] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   42.886618] DMAR: DRHD: handling fault status reg 3
[   47.549330] dmar_fault: 11 callbacks suppressed
[   47.549333] DMAR: DRHD: handling fault status reg 3
[   47.549343] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   48.699784] DMAR: DRHD: handling fault status reg 3
[   48.699799] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   48.880717] DMAR: DRHD: handling fault status reg 3
[   48.880733] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   50.033196] DMAR: DRHD: handling fault status reg 3
[   57.985477] i915 0000:00:02.0: GPU HANG: ecode 8:1:0xfffffffe, in gnome-shell [938], hang on rcs0
[   57.985479] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   57.985479] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   57.985479] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   57.985480] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   57.985480] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   57.986489] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[…]
```
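
For anyone comparing logs, here is a minimal sketch for tallying the faults in output like the excerpt above. It assumes the message format shown in this log; the regular expression and the function name `count_dmar_faults` are my own illustration and not part of any kernel tooling.

```python
import re
from collections import Counter

# Pattern for DMAR fault lines as they appear in the excerpt above.
# The format is an assumption based on this log only; other kernel
# versions may print the fault differently.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\]"
)


def count_dmar_faults(log_text):
    """Count faults per (device, operation, address, reason) tuple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            key = (
                match["dev"],
                match["op"],
                match["addr"],
                int(match["reason"]),
            )
            counts[key] += 1
    return counts
```

Passing it the text of `dmesg` (or the attached log) gives one counter entry per distinct device and fault address. Lines dropped by the kernel's rate limiting ("callbacks suppressed") are not counted.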

Booting with `intel_iommu=igfx_off` fixes this [2].
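
To confirm that the running kernel was actually booted with that option, something like the following could be used. This is a sketch that assumes `/proc/cmdline` is readable and that `intel_iommu=` accepts a comma-separated option list; the function names `has_igfx_off` and `running_kernel_has_igfx_off` are hypothetical.

```python
def has_igfx_off(cmdline):
    """Return True if an intel_iommu= parameter in cmdline lists igfx_off."""
    for param in cmdline.split():
        name, sep, value = param.partition("=")
        # Options may be combined, e.g. intel_iommu=on,igfx_off
        if name == "intel_iommu" and sep and "igfx_off" in value.split(","):
            return True
    return False


def running_kernel_has_igfx_off(path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return has_igfx_off(f.read())
```

This only inspects the command line; it does not verify that the IOMMU is in fact bypassed for the graphics device.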

This seems to be a common problem [3][4][5][6][7].

[1]: https://bugs.debian.org/934309
[2]: https://bugs.kali.org/view.php?id=5644
[3]: https://bbs.archlinux.org/viewtopic.php?id=230362
[4]: https://bugzilla.kernel.org/show_bug.cgi?id=202723
[5]: https://bugs.freedesktop.org/show_bug.cgi?id=109219
[6]: https://bugs.freedesktop.org/show_bug.cgi?id=89360
[7]: https://bugs.freedesktop.org/show_bug.cgi?id=103076
Comment 1 Paul Menzel 2019-08-21 10:16:43 UTC
I also reported this in the Debian Bug Tracking System [1].

[1]: https://bugs.debian.org/935270
Comment 2 Lakshmi 2019-08-22 11:14:39 UTC
> [   57.985480] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> [   57.986489] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
> […]
Can you please attach the error log? How often does the hang occur?
Comment 3 Paul Menzel 2019-08-22 14:23:51 UTC
(In reply to Lakshmi from comment #2)
> > [   57.985480] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> > [   57.986489] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
> > […]
> Can you please attach the error log? How often does the hang occur?

I do not have it, as the system froze before I could save it. I won’t reproduce the issue, as the crashes caused data loss on my system already.
Comment 4 Chris Wilson 2019-09-13 09:49:03 UTC
Quirked out of existence; should land in the v5.4 iommu merge and hopefully percolate back.
Comment 5 Paul Menzel 2019-09-13 09:50:52 UTC
Could you please point to some more information, or explain here why the i915 IOMMU is a problem on Broadwell devices?

