Bug 108326 - [BXT] BUG / system hang when reading i915 debugfs entries with VT-d/IOMMU enabled
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other, OS: All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-11 10:55 UTC by Eero Tamminen
Modified: 2019-02-12 13:38 UTC
CC List: 2 users

See Also:
i915 platform: BXT
i915 features: GPU hang


Description Eero Tamminen 2018-10-11 10:55:40 UTC
Setup:
- BXT / APL HW (e.g. J3455, J4205, A3960)
- git version of drm-tip kernel
- VT-d enabled in BIOS (normally it's enabled by default)
- IOMMU not disabled on kernel command line (no "intel_iommu=igfx_off" option)
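To check the command-line condition from a program, you can look for `igfx_off` in the `intel_iommu=` option of `/proc/cmdline`. The sketch below uses a hypothetical helper, `cmdline_has_igfx_off`, which is not part of any kernel or library API. It assumes whitespace-separated parameters with no quoting, and it assumes that `intel_iommu=` takes a comma-separated value list such as `intel_iommu=on,igfx_off`. It says nothing about the BIOS VT-d setting. The DMAR/IOMMU lines in dmesg are still the way to confirm that the IOMMU is actually active.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a kernel command line string
 * (e.g. the contents of /proc/cmdline) has an intel_iommu= option whose
 * comma-separated values include "igfx_off". Quoting is not handled.
 */
static bool cmdline_has_igfx_off(const char *cmdline)
{
    static const char prefix[] = "intel_iommu=";
    static const char opt[] = "igfx_off";
    const size_t plen = sizeof(prefix) - 1;
    const size_t olen = sizeof(opt) - 1;
    const char *p = cmdline;

    while (*p) {
        /* Skip whitespace between parameters. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        size_t len = (size_t)(p - start);

        if (len > plen && strncmp(start, prefix, plen) == 0) {
            /* Walk the comma-separated values after "intel_iommu=". */
            const char *v = start + plen;
            while (v < p) {
                const char *comma = memchr(v, ',', (size_t)(p - v));
                const char *end = comma ? comma : p;
                if ((size_t)(end - v) == olen && strncmp(v, opt, olen) == 0)
                    return true;
                v = comma ? comma + 1 : p;
            }
        }
    }
    return false;
}
```

For example, `cmdline_has_igfx_off("quiet intel_iommu=on,igfx_off")` is true. A machine matching this report's setup would give false for its own command line.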

Use-case:
- cd /sys/kernel/debug/dri/0/
- head *
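The `head *` step can also be done from a small C program, which makes it easier to see which entry a read fails on. This is a sketch with a hypothetical helper name, `read_heads`. It reads at most the first 4 KiB of each non-hidden entry, roughly like `head`, and prints per-file errors in a similar format instead of stopping. Caution: on an affected BXT/APL kernel with VT-d active, running it against `/sys/kernel/debug/dri/0` should be expected to trigger the same hang, so only use it on a test machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read up to the first 4 KiB of every non-hidden entry in `dir`, roughly
 * what `head *` does, reporting per-file errors instead of stopping.
 * Returns the number of entries read successfully, or -1 if the
 * directory itself cannot be opened.
 */
static int read_heads(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;

    int ok = 0;
    struct dirent *ent;
    char path[4096];
    char buf[4096];

    while ((ent = readdir(d)) != NULL) {
        /* Skip ".", ".." and hidden files, as the shell's `*` does. */
        if (ent->d_name[0] == '.')
            continue;
        if (snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name) >= (int)sizeof(path))
            continue; /* path too long, skip it */

        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "cannot open '%s' for reading: %s\n", path, strerror(errno));
            continue;
        }
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0)
            fprintf(stderr, "error reading '%s': %s\n", path, strerror(errno));
        else
            ok++;
        close(fd);
    }
    closedir(d);
    return ok;
}
```

Usage would be `read_heads("/sys/kernel/debug/dri/0")` as root. Subdirectories are opened as well, and their read fails with "Is a directory", as it does with `head`.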

Expected outcome:
- entries are shown, as happens when VT-d / IOMMU is disabled

Actual outcome:
- System hangs after the following console output (and sometimes a backtrace):
[   49.565898] BUG: scheduling while atomic: migration/0/11/0x00000002
[   49.572983] Preemption disabled at:

Notes:
* I haven't seen this on any other (GEN7-GEN9) HW, only on BXT/APL
* Most distros seem to mount debugfs as user-readable, so this is a local DoS (security) issue
* I don't think this is a regression, as it has been there for at least a year (when I filed an internal ticket about it, which has now been closed with a request to file it on FDO instead)
Comment 1 Chris Wilson 2018-10-11 11:04:15 UTC
https://patchwork.freedesktop.org/series/34969/
Comment 2 Eero Tamminen 2018-10-11 11:31:22 UTC
(In reply to Chris Wilson from comment #1)
> https://patchwork.freedesktop.org/series/34969/

Against which drm-tip commit should I apply this?

Current tip gives this on our build server:
------------------------------------
patching file drivers/gpu/drm/i915/i915_drv.h
Hunk #1 FAILED at 3990.
Hunk #2 FAILED at 4002.
2 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.h.rej
patching file drivers/gpu/drm/i915/i915_gem_gtt.c
Hunk #1 FAILED at 3373.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_gem_gtt.c.rej
patching file drivers/gpu/drm/i915/i915_gpu_error.c
Hunk #1 succeeded at 648 (offset 15 lines).
Hunk #2 succeeded at 1870 with fuzz 1 (offset 48 lines).
Hunk #3 succeeded at 1926 (offset 48 lines).
------------------------------------
Comment 3 Chris Wilson 2018-11-19 17:21:59 UTC
commit fb6f0b64e455b207a636346588e65bf9598d30eb (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 2 16:12:12 2018 +0000

    drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
    
    Since capturing the error state requires fiddling around with the GGTT
    to read arbitrary buffers and is itself run under stop_machine(), it
    deadlocks the machine (effectively a hard hang) when run in conjunction
    with Broxton's VTd workaround to serialize GGTT access.
    
    v2: Store the ERR_PTR in first_error so that the error can be reported
    to the user via sysfs.
    v3: Mention the quirk in dmesg (using info as per usual)
    
    Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jon Bloomfield <jon.bloomfield@intel.com>
    Cc: John Harrison <john.C.Harrison@intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181102161232.17742-5-chris@chris-wilson.co.uk
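Why this hangs: according to the commit message, error capture already runs under stop_machine(). It deadlocks when combined with Broxton's VT-d workaround, which serializes GGTT access (as I understand it, the i915 code of that era also uses stop_machine() for that). The "scheduling while atomic: migration/0" BUG in the description is consistent with this. The fix skips capture when the workaround is in use. Per v2, it also stores an ERR_PTR in first_error, so readers get an error back instead of nothing.

Below is a user-space C model of that guard, not the actual i915 code. Apart from `first_error` and the ERR_PTR convention, all names are hypothetical. ERR_PTR()/IS_ERR()/PTR_ERR() are re-implemented because the kernel headers are not usable from user space. Using -ENODEV as the stored error is an assumption, chosen because it matches the "No such device" result for i915_error_state in comment 5.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space re-implementation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
 * convention: small negative errno values live in the top MAX_ERRNO
 * addresses, so a single pointer can hold either an object or an error code.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for the driver's error-capture bookkeeping. */
struct gpu_state { int dummy; };

struct gpu_error {
    /* NULL (nothing captured yet), a captured state, or an ERR_PTR(). */
    struct gpu_state *first_error;
};

/*
 * Hypothetical init-time hook: when the BXT VT-d GGTT workaround is active,
 * mark error capture as permanently unavailable. (Per v3, the real patch
 * also mentions the quirk in dmesg; that is omitted here.)
 */
static void disable_error_capture(struct gpu_error *e, int err)
{
    e->first_error = ERR_PTR(-err);
}

/*
 * Capture path: any non-NULL first_error (an earlier capture or the
 * ERR_PTR sentinel) makes this return early, so the stop_machine()-based
 * capture is never started while the workaround is active.
 */
static bool capture_error_state(struct gpu_error *e, struct gpu_state *s)
{
    if (e->first_error)
        return false;
    e->first_error = s;
    return true;
}

/* Reader path: returns 0 and the stored state, or a negative errno. */
static int read_error_state(const struct gpu_error *e, struct gpu_state **out)
{
    if (IS_ERR(e->first_error))
        return (int)PTR_ERR(e->first_error);
    *out = e->first_error;
    return 0;
}
```

The design point is that one pointer field carries three states: nothing captured, a captured state, or "capture disabled, return this errno". Neither the capture path nor the reader needs a separate flag.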
Comment 4 Francesco Balestrieri 2018-12-28 09:09:28 UTC
Eero, is the issue resolved for you?
Comment 5 Eero Tamminen 2018-12-31 11:59:06 UTC
(In reply to Francesco Balestrieri from comment #4)
> Eero, is the issue resolved for you?

Tested on J4205, no GPU hang -> verified:
-----------------------------------------
# dmesg | grep -i iommu
[    0.505602] DMAR-IR: IOAPIC id 1 under DRHD base  0xfed65000 IOMMU 1
[    0.846760] iommu: Adding device 0000:00:00.0 to group 0
...
# cd /sys/kernel/debug/dri/0/
# head i915_* > /dev/null
head: error reading '/sys/kernel/debug/dri/0/i915_cache_sharing': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_drrs_ctl': Permission denied
head: error reading '/sys/kernel/debug/dri/0/i915_edp_psr_debug': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_emon_status': No such device
head: cannot open '/sys/kernel/debug/dri/0/i915_error_state' for reading: No such device
head: error reading '/sys/kernel/debug/dri/0/i915_fbc_false_color': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_fbc_status': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_fifo_underrun_reset': Invalid argument
head: error reading '/sys/kernel/debug/dri/0/i915_forcewake_user': Invalid argument
head: cannot open '/sys/kernel/debug/dri/0/i915_gpu_info' for reading: No such device
head: error reading '/sys/kernel/debug/dri/0/i915_guc_info': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_guc_log_level': No such device
head: cannot open '/sys/kernel/debug/dri/0/i915_guc_log_relay' for reading: No such device
head: error reading '/sys/kernel/debug/dri/0/i915_guc_stage_pool': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_ips_status': No such device
head: error reading '/sys/kernel/debug/dri/0/i915_ring_freq_table': No such device
-----------------------------------------

Btw. I don't understand why the i915 driver provides debugfs entries that don't work (I remember e.g. "i915_ring_freq_table" working earlier).

