Created attachment 139316 [details]
I guess it must be something with the detection of monitors.
Created attachment 139317 [details]
Created attachment 139318 [details]
It appears that 5s after queuing the initial requests, we haven't even submitted them to HW. Quite distressing!
I see you are running with tip. Could you please enable CONFIG_DRM_I915_TRACE_GEM=y and apply something like:
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index df234dc23274..6207bc35a53d 100644
@@ -1840,6 +1840,8 @@ void i915_capture_error_state(struct drm_i915_private *i915,
i915_error_capture_msg(i915, error, engine_mask, error_msg);
Also, drm.debug=0xf (everything!) may help determine where the delay comes from.
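For reference, drm.debug is a bitmask of debug categories. The sketch below decodes a mask into category names. The bit-to-name table is an assumption based on the DRM_UT_* categories in the kernel's drm_print.h and may differ between kernel versions; decode_drm_debug is a hypothetical helper, not part of any tool.

```python
# Assumed mapping of drm.debug bits to categories (DRM_UT_* in drm_print.h).
# The exact set depends on the kernel version, so treat it as illustrative.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vbl",
    0x40: "state",
    0x80: "lease",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xF))   # ['core', 'driver', 'kms', 'prime']
print(decode_drm_debug(0x1E))  # ['driver', 'kms', 'prime', 'atomic']
```

Under that assumed table, 0xf enables the core, driver, kms and prime categories, while 0x1e drops core and adds atomic.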
Created attachment 139332 [details]
Kernel panic, so I had to use netconsole.
Created attachment 139335 [details]
Sorry, I added drm.debug.
It should have dumped the trace to netconsole as well. Could you check the console settings?
So it appears that we wake with a pending CS interrupt before we do anything, or that the CS interrupt arrives too early. The common suspect in this case is the IOMMU; could you try intel_iommu=igfx_off?
Created attachment 139336 [details]
When I turned on CONFIG_DRM_I915_TRACE_GEM, it took a different path, so it's not calling GEM_TRACE_DUMP().
I guess it's now calling GEM_BUG_ON(), and not i915_capture_error_state().
GEM_BUG_ON() includes a GEM_TRACE_DUMP; I expect it to show up here :)
Another quick test is:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f3cce022b2d..ff179c967e2a 100644
@@ -1738,6 +1738,8 @@ static void enable_execlists(struct intel_engine_cs *engine)
/* Following the reset, we need to reload the CSB read/write pointers */
engine->execlists.csb_head = -1;
Created attachment 139340 [details]
With line added. I guess ftrace works now.
In the intel_iommu=igfx_off test, we got:
[ 0.000000] Linux version 4.17.0-rc3+ (firstname.lastname@example.org) (gcc version 7.3.0 (GCC)) #5 SMP PREEMPT Fri May 4 08:44:22 CEST 2018
[ 0.000000] Command line: \\k.efi rw drm.debug=0xf intel_iommu=igfx_off initrd=\i.img
[ 0.258015] DMAR: No ATSR found
[ 0.258097] DMAR: dmar0: Using Queued invalidation
[ 0.258107] DMAR: dmar1: Using Queued invalidation
[ 0.258195] DMAR: Setting RMRR:
[ 0.258282] DMAR: Setting identity map for device 0000:00:02.0 [0x5f800000 - 0x7fffffff]
[ 207.804629] [drm] VT-d active for gfx access
Odd. So we didn't succeed in disabling iommu. Do you mind compiling out iommu entirely to be sure we don't have a problem here with iommu+HWSP?
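A quick way to run the same check on a captured log is sketched below. It only looks for the two strings quoted above (the intel_iommu=igfx_off token on the "Command line:" line and the "VT-d active for gfx access" message); the function name and the sample text are illustrative assumptions, not an existing tool.

```python
import re

# Excerpt of the log lines quoted in this comment, used as sample input.
SAMPLE = r"""
[ 0.000000] Command line: \\k.efi rw drm.debug=0xf intel_iommu=igfx_off initrd=\i.img
[ 0.258195] DMAR: Setting RMRR:
[ 207.804629] [drm] VT-d active for gfx access
"""


def igfx_off_ignored(dmesg_text):
    """True if igfx_off was requested but i915 still reports VT-d as active."""
    requested = False
    active = False
    for line in dmesg_text.splitlines():
        # Drop the leading "[ seconds.micros]" timestamp, if present.
        msg = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)
        if msg.startswith("Command line:") and "intel_iommu=igfx_off" in msg.split():
            requested = True
        if "VT-d active for gfx access" in msg:
            active = True
    return requested and active


print(igfx_off_ignored(SAMPLE))
```

For the excerpt above this reports that the option was requested but VT-d was still active for graphics.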
Created attachment 139341 [details]
netconsole take 4
Now it's a bit different.
Can you please try re-enabling iommu and https://patchwork.freedesktop.org/patch/221513/ ?
(In reply to Chris Wilson from comment #14)
> Can you please try re-enabling iommu and
> https://patchwork.freedesktop.org/patch/221513/ ?
No, it doesn't help:
GPU HANG: ecode 9:0:0xfffffffe, reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset
Do you need debug enabled and console output?
If you have any other ideas or extra debug flags let me know.
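For comparing hang reports across runs, the "GPU HANG" line can be split into fields with a small parser. This is a sketch only: the regular expression is fitted to the single line quoted above, and reading the first ecode field as the GPU generation is an assumption rather than something this thread confirms.

```python
import re

# Pattern fitted to the one "GPU HANG" line quoted in this comment.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<first>\d+):(?P<second>[0-9a-fA-F]+):"
    r"(?P<ecode>0x[0-9a-fA-F]+), "
    r"reason: hang on (?P<engines>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Split a GPU HANG message into fields, or return None if it does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return {
        "first_field": int(m.group("first")),  # assumed to be the GPU generation
        "second_field": m.group("second"),     # meaning not confirmed; kept raw
        "ecode": int(m.group("ecode"), 16),
        "engines": m.group("engines").split(", "),
        "action": m.group("action"),
    }


line = ("GPU HANG: ecode 9:0:0xfffffffe, reason: hang on "
        "rcs0, bcs0, vcs0, vecs0, action: reset")
print(parse_gpu_hang(line))
```

In the line above, all four engines (rcs0, bcs0, vcs0, vecs0) are listed as hung, which fits a problem that is not specific to one engine.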
Created attachment 139587 [details]
tried on latest drm-tip
Yes, please send dmesg with drm.debug=0x1e log_buf_len=4M.
Created attachment 139590 [details]
used drm.debug=0x1e log_buf_len=4M
Chris, any updates on this issue?
Domen, are you still experiencing this issue?
Having purchased a glk (Celeron N4100) for myself... This looks to be an isolated incident (well, this and the other reported glk-iommu issues!), as annoyingly it worksforme. I was hoping to be able to reproduce it, sorry.
Sorry, we don't use this board anymore, so I don't know if the issue still persists.
Apologies for the disappointing end.