Bug 112385

Summary: [CI][BAT]igt@i915_selftest@live_gem_contexts - incomplete - Unexpected fault, Addr: 0x00000000_3ffc9000, Address space: PPGTT
Product: DRI Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: not set    
Priority: not set CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: BSW/CHT i915 features: GEM/Other

Description Lakshmi 2019-11-25 10:19:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5304/fi-bsw-n3050/igt@i915_selftest@live_gem_contexts.html
6> [393.670167] [IGT] i915_selftest: executing
<6> [393.683224] [IGT] i915_selftest: starting subtest live_gem_contexts
<5> [393.839767] Setting dangerous option live_selftests - tainting kernel
<7> [393.905682] i915 0000:00:02.0: [drm:intel_uc_init_early [i915]] enable_guc=0 (guc:no submission:no huc:no)
<7> [393.905822] [drm:intel_detect_pch [i915]] No PCH found.
<7> [393.905956] [drm:intel_power_domains_init [i915]] Allowed DC state mask 00
<7> [393.906379] [drm:intel_gt_check_and_clear_faults [i915]] Unexpected fault
	Addr: 0x00000000_7ffc9000
	Address space: PPGTT
	Engine ID: 0
	Source ID: 25
	Type: 0
<7> [393.907114] [drm:i915_ggtt_probe_hw [i915]] GGTT size = 2048M
<7> [393.907247] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 128M
<7> [393.907378] [drm:i915_ggtt_probe_hw [i915]] DSM size = 32M
<6> [393.907700] i915 0000:00:02.0: vgaarb: deactivate vga console
<7> [393.908192] [drm:init_stolen [i915]] GEN6_STOLEN_RESERVED = 7ed00007
<7> [393.908325] [drm:init_stolen [i915]] Memory reserved for graphics device: 32768K, usable: 31744K
<7> [393.910120] [drm:intel_gvt_init [i915]] GVT-g is disabled by kernel params
<7> [393.910256] [drm:intel_opregion_setup [i915]] graphic opregion physical addr: 0x7b46f000
<7> [393.911130] [drm:intel_opregion_setup [i915]] ACPI OpRegion version 2.0.0
<7> [393.911265] [drm:intel_opregion_setup [i915]] Public ACPI methods supported
<7> [393.911396] [drm:intel_opregion_setup [i915]] ASLE supported
<7> [393.911819] [drm:intel_opregion_setup [i915]] Found valid VBT in ACPI OpRegion (Mailbox #4)
<7> [393.912073] [drm:intel_bios_init [i915]] Skipping VBT init due to disabled display.
<7> [393.912361] [drm:intel_dsm_detect [i915]] no _DSM method for intel device
<7> [393.912926] [drm:intel_power_domains_init_hw [i915]] rawclk rate: 200000 kHz
<7> [393.916555] [drm:intel_power_domains_init_hw [i915]] Initial PHY_CONTROL=0x050007ff
<7> [393.916729] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [393.916862] [drm:intel_power_well_enable [i915]] enabling display
<7> [393.917021] [drm:intel_power_well_enable [i915]] enabling dpio-common-bc
<7> [393.917213] [drm:chv_dpio_cmn_power_well_enable [i915]] Enabled DPIO PHY0 (PHY_CONTROL=0x050007ff)
<7> [393.917367] [drm:intel_power_well_enable [i915]] enabling dpio-common-d
<7> [393.919777] [drm:chv_dpio_cmn_power_well_enable [i915]] Enabled DPIO PHY1 (PHY_CONTROL=0x050007ff)
<7> [393.924023] [drm:intel_fbc_init [i915]] Sanitized enable_fbc value: 0
<7> [393.924162] [drm:intel_modeset_init [i915]] 3 display pipes available.
<7> [393.924305] [drm:intel_modeset_init [i915]] CZ clock rate: 320000 kHz
<7> [393.924812] [drm:intel_dump_cdclk_state [i915]] Current CDCLK 200000 kHz, VCO 1600000 kHz, ref 0 kHz, bypass 0 kHz, voltage level 15
<7> [393.925097] [drm:intel_update_max_cdclk [i915]] Max CD clock rate: 320000 kHz
<7> [393.925231] [drm:intel_modeset_init [i915]] Max dotclock rate: 304000 kHz
<7> [393.934482] [drm:vlv_wm_get_hw_state [i915]] Punit not acking DDR DVFS request, assuming DDR DVFS is disabled
<7> [393.934627] [drm:intel_modeset_setup_hw_state [i915]] Initial watermarks: SR plane=127, SR cursor=0 level=1 cxsr=0
<7> [393.935080] [drm:i915_init_ggtt [i915]] clearing unused GTT space: [1000, 80000000]
<7> [393.957325] [drm:intel_rps_init [i915]] GPLL reference freq: 40000 kHz
<7> [393.962830] [drm:intel_rps_init [i915]] DDR speed: 1600 MHz
<7> [393.962963] [drm:intel_rps_init [i915]] max GPU freq: 600 MHz (60)
<7> [393.963091] [drm:intel_rps_init [i915]] RPe GPU freq: 320 MHz (32)
<7> [393.963221] [drm:intel_rps_init [i915]] RP1(Guar) GPU freq: 320 MHz (32)
<7> [393.963349] [drm:intel_rps_init [i915]] min GPU freq: 200 MHz (20)
<7> [393.964092] [drm:__intel_engine_init_ctx_wa [i915]] Initialized 9 context workarounds on rcs'0
<7> [393.965168] [drm:i915_gem_init_contexts [i915]] logical context support initialized
<7> [393.968456] [drm:intel_rps_enable [i915]] GPLL enabled? yes
<7> [393.968588] [drm:intel_rps_enable [i915]] GPU status: 0x00001410
<7> [393.973748] [drm:intel_engines_driver_register [i915]] renamed rcs'0 to rcs0
<7> [393.973878] [drm:intel_engines_driver_register [i915]] renamed bcs'0 to bcs0
<7> [393.974000] [drm:intel_engines_driver_register [i915]] renamed vcs'0 to vcs0
<7> [393.974121] [drm:intel_engines_driver_register [i915]] renamed vecs'0 to vecs0
<6> [393.977107] [drm] Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0
<7> [393.979447] [drm:intel_power_well_disable [i915]] disabling dpio-common-d
<7> [393.979782] [drm:chv_dpio_cmn_power_well_disable [i915]] Disabled DPIO PHY1 (PHY_CONTROL=0x050007fd)
<7> [393.979937] [drm:intel_power_well_disable [i915]] disabling dpio-common-bc
<7> [393.980094] [drm:chv_dpio_cmn_power_well_disable [i915]] Disabled DPIO PHY0 (PHY_CONTROL=0x050007fc)
<7> [393.980244] [drm:intel_power_well_disable [i915]] disabling display
<7> [393.981034] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [393.981081] i915 device info: pciid=0x22b1 rev=0x21 platform=CHERRYVIEW (subplatform=0x0) gen=8
<7> [393.981087] i915 device info: is_mobile: no
<7> [393.981091] i915 device info: is_lp: yes
<7> [393.981095] i915 device info: require_force_probe: no
<7> [393.981100] i915 device info: is_dgfx: no
<7> [393.981104] i915 device info: has_64bit_reloc: yes
<7> [393.981108] i915 device info: gpu_reset_clobbers_display: no
<7> [393.981113] i915 device info: has_reset_engine: yes
<7> [393.981117] i915 device info: has_fpga_dbg: no
<7> [393.981121] i915 device info: has_global_mocs: no
<7> [393.981125] i915 device info: has_gt_uc: no
<7> [393.981130] i915 device info: has_l3_dpf: no
<7> [393.981134] i915 device info: has_llc: no
<7> [393.981138] i915 device info: has_logical_ring_contexts: yes
<7> [393.981142] i915 device info: has_logical_ring_elsq: no
<7> [393.981147] i915 device info: has_logical_ring_preemption: no
<7> [393.981151] i915 device info: has_pooled_eu: no
<7> [393.981155] i915 device info: has_rc6: yes
<7> [393.981160] i915 device info: has_rc6p: no
<7> [393.981164] i915 device info: has_rps: yes
<7> [393.981168] i915 device info: has_runtime_pm: yes
<7> [393.981172] i915 device info: has_snoop: yes
<7> [393.981177] i915 device info: has_coherent_ggtt: no
<7> [393.986217] i915 device info: unfenced_needs_alignment: no
<7> [393.986222] i915 device info: hws_needs_physical: no
<7> [393.986227] i915 device info: cursor_needs_physical: no
<7> [393.986231] i915 device info: has_csr: no
<7> [393.986236] i915 device info: has_ddi: no
<7> [393.986240] i915 device info: has_dp_mst: no
<7> [393.986244] i915 device info: has_dsb: no
<7> [393.986248] i915 device info: has_dsc: no
<7> [393.986253] i915 device info: has_fbc: no
<7> [393.986257] i915 device info: has_gmch: yes
<7> [393.986261] i915 device info: has_hdcp: no
<7> [393.986265] i915 device info: has_hotplug: yes
<7> [393.986270] i915 device info: has_ipc: no
<7> [393.986274] i915 device info: has_modular_fia: no
<7> [393.986887] i915 device info: has_overlay: no
<7> [393.986894] i915 device info: has_psr: no
<7> [393.986898] i915 device info: overlay_needs_physical: no
<7> [393.986903] i915 device info: supports_tv: no
<7> [393.986908] i915 device info: slice total: 1, mask=0001
<7> [393.986913] i915 device info: subslice total: 2
<7> [393.986918] i915 device info: slice0: 2 subslices, mask=00000003
<7> [393.986923] i915 device info: EU total: 12
<7> [393.986927] i915 device info: EU per subslice: 6
<7> [393.986931] i915 device info: has slice power gating: no
<7> [393.986936] i915 device info: has subslice power gating: yes
<7> [393.986940] i915 device info: has EU power gating: yes
<7> [393.986945] i915 device info: CS timestamp frequency: 12500 kHz
<6> [393.986949] [drm] DRM_I915_DEBUG enabled
<6> [393.986952] [drm] DRM_I915_DEBUG_GEM enabled
<6> [393.986956] [drm] DRM_I915_DEBUG_RUNTIME_PM enabled
<6> [393.986965] i915: Performing live selftests with st_random_seed=0x946c6d97 st_timeout=500
<6> [393.986969] i915: Running gem_contexts
<6> [393.987003] i915: Running i915_gem_context_live_selftests/live_nop_switch
<7> [394.144984] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [394.903277] Populated 1024 contexts on rcs0 in 759294125ns
<7> [394.978090] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [394.978400] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [395.511674] [drm:intel_power_well_disable [i915]] disabling always-on
<6> [395.511755] Switch latencies on rcs0: 1 = 1023350ns, 228 = 108058ns
<7> [395.512855] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [396.099240] Populated 1024 contexts on bcs0 in 587472437ns
<7> [396.174371] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [396.174733] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [396.714298] [drm:intel_power_well_disable [i915]] disabling always-on
<6> [396.714409] Switch latencies on bcs0: 1 = 495987ns, 240 = 88514ns
<7> [396.715194] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [397.323438] Populated 1024 contexts on vcs0 in 609016988ns
<7> [397.399647] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [397.399951] [drm:intel_power_well_enable [i915]] enabling always-on
<7> [397.918956] [drm:intel_power_well_disable [i915]] disabling always-on
<6> [397.919036] Switch latencies on vcs0: 1 = 978675ns, 238 = 88783ns
<7> [397.920226] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [398.552979] Populated 1024 contexts on vecs0 in 633930325ns
<7> [398.629159] [drm:intel_power_well_disable [i915]] disabling always-on
<7> [398.629804] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [399.167359] Switch latencies on vecs0: 1 = 925175ns, 330 = 48578ns
<7> [399.167701] [drm:intel_power_well_disable [i915]] disabling always-on
<6> [399.167814] i915: Running i915_gem_context_live_selftests/live_parallel_switch
<7> [399.170038] [drm:intel_power_well_enable [i915]] enabling always-on
<6> [399.682820] rcs0: 1108 switches (sync)
<6> [399.683629] bcs0: 1641 switches (sync)
<6> [399.685423] vcs0: 1584 switches (sync)
<6> [399.693380] vecs0: 876 switches (sync)
<1> [399.758464] BUG: unable to handle page fault for address: 0000000000004e50
<1> [399.758475] #PF: supervisor read access in kernel mode
<1> [399.758480] #PF: error_code(0x0000) - not-present page
<6> [399.758484] PGD 0 P4D 0 
<4> [399.758491] Oops: 0000 [#1] PREEMPT SMP PTI
<4> [399.758498] CPU: 0 PID: 4281 Comm: igt/parallel:rc Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7411+ #1
<4> [399.758502] Hardware name:  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
<4> [399.758680] RIP: 0010:__i915_schedule+0x94d/0xba0 [i915]
Comment 1 CI Bug Log 2019-11-25 10:22:11 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* BSW: igt@i915_selftest@live_gem_contexts - incomplete - Unexpected fault, Addr: 0x00000000_3ffc9000, Address space: PPGTT
  (No new failures associated)
Comment 2 Chris Wilson 2019-11-25 10:26:04 UTC
The ppGTT fault is worrying, and despite  my initial concern, does not lead to the incomplete.

commit 331bf90591573dfe6c8e892239713ef9702f1396
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 25 09:43:18 2019 +0000

    drm/i915/gt: Mark the execlists->active as the primary volatile access
    
    Since we want to do a lockless read of the current active request, and
    that request is written to by process_csb also without serialisation, we
    need to instruct gcc to take care in reading the pointer itself.
    
    Otherwise, we have observed execlists_active() to report 0x40.
    
    [ 2400.760381] igt/para-4098    1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
    [ 2400.760826] igt/para-4098    1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
    [ 2400.761271] igt/para-4098    1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
    [ 2400.761726] igt/para-4097    0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940
    
    which is impossible!
    
    The answer is that as we keep the existing execlists->active pointing
    into the array as we copy over that array, the unserialised read may see
    a partial pointer value.
    
    Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.