Hi,

My dmesg is now filled with:

DMAR: DRHD: handling fault status reg 3
DMAR: DMAR:[DMA Read] Request device [00:02.0] fault addr fbff0000
DMAR:[fault reason 06] PTE Read access is not set

It seems to happen only before X starts and during shutdown.

I've bisected it down to:

975f7ff42edfbad53d65ad63a4f3e7ada8c7538f is the first bad commit
commit 975f7ff42edfbad53d65ad63a4f3e7ada8c7538f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat May 14 07:26:34 2016 +0100

    drm/i915: Lazily migrate the objects after hibernation

    Now that we mark the object domains for having been restored from the hibernation image, we not need to flush everything during resume and can instead rely on the normal domain tracking to flush only when required.

    The only caveat here are objects that are pinned for use by the hardware, whose contents must be coherent for when the device resumes reading from then (shortly afterwards with the driver assuming the objects are in the correct domain).

    References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Imre Deak <imre.deak@intel.com>
    Cc: David Weinehall <david.weinehall@intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Tested-by: David Weinehall <david.weinehall@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-3-git-send-email-chris@chris-wilson.co.uk

:040000 040000 336a603f6bd03d205632a4e131f771638c8b65b0 91bebf7c1f376c76ef92e6e1a7ddc14d674378ee M drivers
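The DMAR fault line in this report packs three pieces of information: the requester (PCI bus:device.function 00:02.0, the integrated GPU on this machine), the DMA address that faulted, and a numeric fault reason. A minimal C sketch that pulls them apart, assuming 4 KiB pages; the helper names are hypothetical and this is not how the kernel's DMAR code is structured. Only reason code 06, the one seen here, is mapped to text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PCI requester ID as printed in "Request device [00:02.0]". */
struct pci_bdf {
    unsigned int bus, dev, fn;
};

/* Parse a "bus:device.function" string (hex fields). Returns 1 on success. */
static int parse_bdf(const char *s, struct pci_bdf *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%x:%x.%x", &out->bus, &out->dev, &out->fn) == 3;
}

/* Index of the 4 KiB page containing the faulting DMA address. */
static uint64_t fault_page(uint64_t addr)
{
    return addr >> 12;
}

/* Only the reason code from this report is mapped; anything else is "unknown". */
static const char *fault_reason_str(unsigned int code)
{
    return code == 6 ? "PTE Read access is not set" : "unknown";
}
```

For the log above this yields bus 0, device 2, function 0, and page index 0xfbff0 with offset 0, so the fault lands exactly on a page boundary.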
I'm starting to think I made a mistake bisecting, as reverting that commit doesn't seem to fix things for me.
Created attachment 124596: dmesg
Sorry, I made a mistake in the last part of the bisect. The first broken commit is:

commit f7770bfd9fd2ef13a5b70de1ffbc16019a929b48
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat May 14 07:26:35 2016 +0100

    drm/i915: Skip clearing the GGTT on full-ppgtt systems

    Under full-ppgtt, access to the global GTT is carefully regulated through hardware functions (i.e. userspace cannot read and write to arbitrary locations in the GGTT via the GPU). With this restriction in place, we can forgo clearing stale entries from the GGTT as they will not be accessed.

    For aliasing-ppgtt, we could almost do the same except that we do allow userspace access to the global-GTT via execbuf in order to workraound some quirks of certain instructions. (This execbuf path is filtered out with EINVAL on full-ppgtt.)

    The most dramatic effect this will have will be during resume, as with full-ppgtt the GGTT is only used sparingly.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=94722
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Weinehall <david.weinehall@intel.com>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Tested-by: David Weinehall <david.weinehall@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-4-git-send-email-chris@chris-wilson.co.uk
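To make the trade-off in this commit concrete, here is a minimal sketch of the policy it describes, modelling the GGTT as a plain array of PTE values where "clearing" means pointing unused entries at a scratch page. The enum, the function names and the scratch value are hypothetical illustrations, not i915 identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PPGTT modes, named after the commit's terminology. */
enum ppgtt_mode { PPGTT_ALIASING, PPGTT_FULL };

/*
 * Under full-ppgtt userspace cannot reach arbitrary GGTT locations, so
 * stale entries are assumed unreachable and clearing is skipped.
 * Aliasing-ppgtt still lets execbuf touch the GGTT, so it must clear.
 */
static bool ggtt_must_clear_stale(enum ppgtt_mode mode)
{
    return mode != PPGTT_FULL;
}

/* Point PTEs in [start, end) at the scratch page if the policy requires it.
 * Returns the number of entries written. */
static unsigned long ggtt_clear_unused(unsigned long *ptes, unsigned long start,
                                       unsigned long end, unsigned long scratch,
                                       enum ppgtt_mode mode)
{
    unsigned long i;

    assert(start <= end);
    if (!ggtt_must_clear_stale(mode))
        return 0; /* the resume-time saving: nothing is written */
    for (i = start; i < end; i++)
        ptes[i] = scratch;
    return end - start;
}
```

Under full-ppgtt the function writes nothing, which is the saving on resume; it also means every entry outside the bound ranges keeps whatever it held before.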
That looks more like a tell-tale sign of forgetting to rebind an object (or we are now catching an out-of-bounds access). Hardware? Skylake?
Yes, it's Skylake:

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06)
Also, reverting that commit makes the errors go away.
Hmm, there are also lots of caveats to mixing DMAR and Skylake igfx that we need to investigate to see which require driver workarounds.
So I don't forget the public link: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf
In particular, I think this matches SKL036 ("Processor Graphics IOMMU Unit May Report Spurious Faults") since we are not marking the PTE as present at all. If we don't see similar failures across platforms (e.g. broadwell with execlists/full-ppgtt), then it is safe to assume this is a Skylake problem.
Is there anything else you'd like me to do?
Created attachment 124675: More Skylake w/a

This will be the patch, give or take which models it covers.
What kernel is that patch supposed to apply against?
Created attachment 124839: Chris' patch rebased against drm-intel-nightly head
Mike, you can apply this patch on top of the latest drm-intel-nightly kernel.
Do you have the URL to clone for that? I currently build Linus's tree, Dave's drm-next and Alex's drm-next-4.8-wip branch.
https://cgit.freedesktop.org/drm-intel/

You can then run:

    git clone git://anongit.freedesktop.org/drm-intel

and then, for instance, check out drm-intel-nightly.
That does indeed fix the issue. Thanks!
Patch sent for review: https://patchwork.freedesktop.org/patch/95067/
commit 48f112fed3b07858f1b3a78548d23320fb96747b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 24 14:07:14 2016 +0100

    drm/i915: Fill unused GGTT with scratch pages for VT-d

    One of the numerous VT-d workarounds we require is that the display hardware reads past the end of the buffer triggering VT-d faults. This is acknowledged in the code as being safe "since we fill the unused portions of the GGTT with the scratch page". Alas, that is no longer always true and so we trigger DMAR read faults.

    Skylake also requires another workaround to avoid mixing VT-d and unpopulated PTE, and so there we also need to ensure we fill unused entries with the scratch page.

    Reported-by: Mike Lothian <mike@fireburn.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96584
    Fixes: f7770bfd9fd2 ("drm/i915: Skip clearing the GGTT on full-ppgtt systems")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Weinehall <david.weinehall@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/1466773634-8106-1-git-send-email-chris@chris-wilson.co.uk
    Reviewed-by: David Weinehall <david.weinehall@intel.com>
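A toy model of why the skipped fill matters: the display overfetches a page past the end of a bound framebuffer, and only a populated entry keeps that read harmless. Everything here is hypothetical (the names, the PTE_VALID bit, the addresses, and the fill condition, which is inferred from the commit message rather than taken from the patch). The model also collapses the GGTT and IOMMU translation steps into a single "is this entry populated" check, so it sketches the idea behind the fix, not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GGTT_PAGE_SHIFT 12
#define PTE_VALID 0x1ULL /* hypothetical "entry populated" bit */

/* A toy GGTT: one 64-bit PTE per 4 KiB page. */
struct toy_ggtt {
    uint64_t *pte;
    unsigned long num_pages;
    uint64_t scratch_addr; /* DMA address of the scratch page */
};

/* Assumed fixed policy: fill unless full-ppgtt is used with VT-d off. */
static bool needs_scratch_fill(bool full_ppgtt, bool vtd_active)
{
    return !full_ppgtt || vtd_active;
}

/* Map `count` consecutive pages starting at GGTT page `first`. */
static void toy_bind(struct toy_ggtt *g, unsigned long first,
                     unsigned long count, uint64_t dma_base)
{
    unsigned long i;

    assert(first + count <= g->num_pages);
    for (i = 0; i < count; i++)
        g->pte[first + i] = (dma_base + ((uint64_t)i << GGTT_PAGE_SHIFT)) | PTE_VALID;
}

/* Point unused entries at the scratch page, or leave them untouched. */
static void toy_clear_range(struct toy_ggtt *g, unsigned long first,
                            unsigned long count, bool fill)
{
    unsigned long i;

    if (!fill)
        return; /* pre-fix full-ppgtt behaviour: entries keep old contents */
    for (i = first; i < first + count && i < g->num_pages; i++)
        g->pte[i] = g->scratch_addr | PTE_VALID;
}

/* True if a display read at GGTT offset `addr` hits a populated entry. */
static bool toy_display_read_ok(const struct toy_ggtt *g, uint64_t addr)
{
    unsigned long idx = (unsigned long)(addr >> GGTT_PAGE_SHIFT);

    return idx < g->num_pages && (g->pte[idx] & PTE_VALID);
}
```

With a four-page framebuffer bound at pages 0 to 3 and the rest left unpopulated, a read at page 4 fails until the unused range is filled; with VT-d off on a full-ppgtt system the fill is still skipped, preserving the resume-time saving from f7770bfd9fd2.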