Bug 95634

Summary: [BAT BYT] "HW access outside of RPM atomic section" on driver unload
Product: DRI
Reporter: Tvrtko Ursulin <tvrtko.ursulin>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: highest
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BYT
i915 features: GEM/Other

Description Tvrtko Ursulin 2016-05-24 10:36:28 UTC
/archive/results/CI_IGT_test/RO_Patchwork_981/ro-byt-n2820/html/ro-byt-n2820@RO_Patchwork_981@1/igt@drv_module_reload_basic.html

[  191.826185] ------------[ cut here ]------------
[  191.826342] WARNING: CPU: 1 PID: 5728 at drivers/gpu/drm/i915/intel_drv.h:1579 gen6_ggtt_insert_entries+0x244/0x270 [i915]
[  191.826354] HW access outside of RPM atomic section
[  191.826363] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ghash_clmulni_intel i2c_algo_bit snd_hwdep drm_kms_helper snd_hda_core syscopyarea lpc_ich sysfillrect sysimgblt fb_sys_fops snd_pcm drm i2c_hid i2c_designware_platform i2c_designware_core r8169 mii sdhci_acpi sdhci mmc_core [last unloaded: snd_hda_intel]
[  191.826529] CPU: 1 PID: 5728 Comm: rmmod Tainted: G     U          4.6.0-gfxbench-RO_Patchwork_981+ #1
[  191.826542] Hardware name: \xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff \xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff/DN2820FYK, BIOS FYBYT10H.86A.0053.2016.0205.1707 02/05/2016
[  191.826553]  0000000000000000 ffff8800b57f39c0 ffffffff81409665 ffff8800b57f3a10
[  191.826579]  0000000000000000 ffff8800b57f3a00 ffffffff81079da6 0000062b00000292
[  191.826602]  ffff880036550000 0000000000000000 0000000000000001 0000000000000000
[  191.826626] Call Trace:
[  191.826647]  [<ffffffff81409665>] dump_stack+0x67/0x92
[  191.826665]  [<ffffffff81079da6>] __warn+0xc6/0xe0
[  191.826680]  [<ffffffff81079e0a>] warn_slowpath_fmt+0x4a/0x50
[  191.826809]  [<ffffffffa0296a94>] gen6_ggtt_insert_entries+0x244/0x270 [i915]
[  191.826936]  [<ffffffffa029a5b0>] aliasing_gtt_bind_vma+0xc0/0xd0 [i915]
[  191.827060]  [<ffffffffa029be1d>] i915_vma_bind+0xed/0x260 [i915]
[  191.827206]  [<ffffffffa02a35e5>] i915_gem_object_do_pin+0x835/0xad0 [i915]
[  191.827345]  [<ffffffffa02a38a8>] i915_gem_object_pin+0x28/0x30 [i915]
[  191.827489]  [<ffffffffa02a6174>] i915_gem_render_state_prepare+0x94/0x350 [i915]
[  191.827629]  [<ffffffffa02a644b>] i915_gem_render_state_init+0x1b/0xc0 [i915]
[  191.827772]  [<ffffffffa02ba13b>] intel_rcs_ctx_init+0x2b/0x160 [i915]
[  191.827902]  [<ffffffffa028e747>] i915_switch_context+0x447/0xdf0 [i915]
[  191.828043]  [<ffffffffa02a054c>] i915_gpu_idle+0x2c/0xa0 [i915]
[  191.828183]  [<ffffffffa02a3afa>] i915_gem_suspend+0x2a/0x100 [i915]
[  191.828332]  [<ffffffffa0329a7d>] i915_driver_unload+0x1d/0x1b0 [i915]
[  191.828412]  [<ffffffffa010d744>] drm_dev_unregister+0x24/0xa0 [drm]
[  191.828476]  [<ffffffffa010dd1e>] drm_put_dev+0x1e/0x50 [drm]
[  191.828588]  [<ffffffffa0263300>] i915_pci_remove+0x10/0x20 [i915]
[  191.828604]  [<ffffffff814525e4>] pci_device_remove+0x34/0xb0
[  191.828622]  [<ffffffff81513f2c>] __device_release_driver+0x9c/0x150
[  191.828639]  [<ffffffff81514af6>] driver_detach+0xb6/0xc0
[  191.828656]  [<ffffffff81513943>] bus_remove_driver+0x53/0xd0
[  191.828672]  [<ffffffff815155b7>] driver_unregister+0x27/0x50
[  191.828703]  [<ffffffff81451665>] pci_unregister_driver+0x25/0x70
[  191.828780]  [<ffffffffa010f3e2>] drm_pci_exit+0x72/0x90 [drm]
[  191.828926]  [<ffffffffa032a256>] i915_exit+0x20/0x1c8 [i915]
[  191.828953]  [<ffffffff8110b209>] SyS_delete_module+0x199/0x1f0
[  191.828980]  [<ffffffff817a1e69>] entry_SYSCALL_64_fastpath+0x1c/0xac
[  191.829097] ---[ end trace 8b7715bd709caa3b ]---

Comment 1 Chris Wilson 2016-05-24 10:52:32 UTC
Oh, fun.

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f78d4ca..981d279 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3428,6 +3428,9 @@ int i915_gpu_idle(struct drm_device *dev)
 
        /* Flush everything onto the inactive list. */
        for_each_engine(engine, dev_priv) {
+               if (engine->last_context == NULL)
+                       continue;
+
                if (!i915.enable_execlists) {
                        struct drm_i915_gem_request *req;

Comment 2 Chris Wilson 2016-05-24 12:00:23 UTC
So this would catch the straightforward load/unload, but it would not catch the case where any work done in between left the default context uninitialised.

We can either flag it as initialised and never load the golden render state for the default context (which is valid, as it should never be executed itself and the render state should be applied to all other contexts on first use), or we go the full monty and follow my planned route of making suspend lockless and not emitting a batch here at all for !execlists: https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=f8fff5e32923a56ea5ce73a4ae68968bc60b70c8

First two steps are pretty simple though...
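
For readers following along, here is a minimal toy model of the two ideas discussed in this comment. It is not i915 code: every type and function name below (toy_context, toy_engine, toy_switch_context, toy_gpu_idle, toy_unload_binds) is hypothetical, and the "bind" counter merely stands in for the GGTT write that triggers the warning in the trace. It assumes, as the call trace suggests, that idling switches to the default context and that a first switch to an uninitialised context loads the render state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the i915 driver. */
struct toy_context {
    bool initialised;                 /* render state already applied? */
};

struct toy_engine {
    struct toy_context *last_context; /* NULL until something has run */
    int binds_during_idle;            /* stand-in for the offending HW access */
};

/* First switch to an uninitialised context loads the render state,
 * which in this model costs one "bind". */
static void toy_switch_context(struct toy_engine *engine,
                               struct toy_context *to)
{
    if (!to->initialised) {
        engine->binds_during_idle++;
        to->initialised = true;
    }
    engine->last_context = to;
}

/* Idle with the check from comment 1: skip an engine that never ran. */
static void toy_gpu_idle(struct toy_engine *engine,
                         struct toy_context *kernel_ctx)
{
    if (engine->last_context == NULL)
        return;
    toy_switch_context(engine, kernel_ctx);
}

/* Returns how many binds happen while idling at unload.
 * user_work: a user context ran between load and unload.
 * kernel_ctx_preinitialised: default context flagged initialised up front. */
static int toy_unload_binds(bool user_work, bool kernel_ctx_preinitialised)
{
    struct toy_context kernel_ctx = { .initialised = kernel_ctx_preinitialised };
    struct toy_context user_ctx = { .initialised = false };
    struct toy_engine engine = { .last_context = NULL, .binds_during_idle = 0 };

    if (user_work) {
        toy_switch_context(&engine, &user_ctx);
        engine.binds_during_idle = 0; /* only count what happens at idle */
    }
    toy_gpu_idle(&engine, &kernel_ctx);
    return engine.binds_during_idle;
}
```

In this model, toy_unload_binds(false, false) covers the plain load/unload that the NULL check handles, toy_unload_binds(true, false) is the remaining hole described above, and toy_unload_binds(true, true) corresponds to flagging the default context as initialised.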

Comment 3 Mika Kuoppala 2016-09-01 08:16:28 UTC
commit c7c3c07d16dd51faddeb6ae665d360be030b31b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 24 14:55:54 2016 +0100

    drm/i915: Treat kernel context as initialised
