Created attachment 139316 [details] /sys/class/drm/card0/error
I guess it must be something to do with the detection of monitors.
Created attachment 139317 [details] dmesg output
Created attachment 139318 [details] xrandr output
It appears that 5s after queuing the initial requests, we haven't even submitted them to HW. Quite distressing! I see you are running with tip, could you please enable CONFIG_DRM_I915_TRACE_GEM=y and apply something like:

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index df234dc23274..6207bc35a53d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1840,6 +1840,8 @@ void i915_capture_error_state(struct drm_i915_private *i915,
 		return;
 	}
 
+	GEM_TRACE_DUMP();
+
 	i915_error_capture_msg(i915, error, engine_mask, error_msg);
 	DRM_INFO("%s\n", error->error_msg);

Also, drm.debug=0xf (everything!) may help to determine where the delay comes from.
Created attachment 139332 [details] netconsole
Kernel panic, so I had to use netconsole.
Created attachment 139335 [details] netconsole1
Sorry, I added drm.debug.
It should have dumped the trace to netconsole as well. Could you check the console settings? So it appears that we wake with a pending CS interrupt before we do anything, or that the CS interrupt is too early. A common suspect in this case is the IOMMU; could you try intel_iommu=igfx_off?
Created attachment 139336 [details] netconsole
When I turned on CONFIG_DRM_I915_TRACE_GEM, it went down a different path, so it's not calling GEM_TRACE_DUMP(). I guess it's now calling GEM_BUG_ON(), and not i915_capture_error_state().
GEM_BUG_ON() includes a GEM_TRACE_DUMP; I expect it to show up here :)
Another quick test is:

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f3cce022b2d..ff179c967e2a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1738,6 +1738,8 @@ static void enable_execlists(struct intel_engine_cs *engine)
 		   engine->status_page.ggtt_offset);
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
 
+	clear_gtiir(engine);
+
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	engine->execlists.csb_head = -1;
 }
Created attachment 139340 [details] netconsole
With the line added. I guess ftrace works now.
In the intel_iommu=igfx_off test, we got

[ 0.000000] Linux version 4.17.0-rc3+ (root@amd1.blue.org) (gcc version 7.3.0 (GCC)) #5 SMP PREEMPT Fri May 4 08:44:22 CEST 2018
[ 0.000000] Command line: \\k.efi rw drm.debug=0xf intel_iommu=igfx_off initrd=\i.img
...
[ 0.258015] DMAR: No ATSR found
[ 0.258097] DMAR: dmar0: Using Queued invalidation
[ 0.258107] DMAR: dmar1: Using Queued invalidation
[ 0.258195] DMAR: Setting RMRR:
[ 0.258282] DMAR: Setting identity map for device 0000:00:02.0 [0x5f800000 - 0x7fffffff]
...
[ 207.804629] [drm] VT-d active for gfx access

Odd. So we didn't succeed in disabling iommu. Do you mind compiling out iommu entirely to be sure we don't have a problem here with iommu+HWSP?
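The check described above (igfx_off was on the command line, yet the driver still reported VT-d as active for graphics) can be automated against a saved dmesg or netconsole capture. The following is only a rough sketch: the function name iommu_status and the returned keys are hypothetical, and the two strings it searches for are taken solely from the log excerpt quoted in this comment, so they may differ on other kernel versions.

```python
import re


def iommu_status(dmesg_text):
    """Report whether igfx_off was requested and whether i915 still saw VT-d.

    Hypothetical helper: the matched strings come from the log excerpt
    in this thread and are not guaranteed for other kernels.
    """
    requested = False
    gfx_vtd_active = False
    for line in dmesg_text.splitlines():
        # The kernel echoes its boot parameters on the "Command line:" line.
        if "Command line:" in line and re.search(r"\bintel_iommu=\S*igfx_off", line):
            requested = True
        # Message quoted above from the failing boot.
        if "VT-d active for gfx access" in line:
            gfx_vtd_active = True
    return {"igfx_off_requested": requested, "gfx_vtd_active": gfx_vtd_active}


# Lines copied from the excerpt above (raw strings keep the backslashes).
sample_lines = [
    r"[ 0.000000] Command line: \\k.efi rw drm.debug=0xf intel_iommu=igfx_off initrd=\i.img",
    r"[ 0.258282] DMAR: Setting identity map for device 0000:00:02.0 [0x5f800000 - 0x7fffffff]",
    r"[ 207.804629] [drm] VT-d active for gfx access",
]

status = iommu_status("\n".join(sample_lines))
if status["igfx_off_requested"] and status["gfx_vtd_active"]:
    print("igfx_off was requested but VT-d is still active for gfx")
```

When both flags come back true, as they do for the excerpt above, the boot parameter did not take effect for the graphics device, which is the situation that prompted the request to compile the IOMMU out entirely.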
Created attachment 139341 [details] netconsole take 4
Now it's a bit different.
Can you please try re-enabling iommu and https://patchwork.freedesktop.org/patch/221513/ ?
(In reply to Chris Wilson from comment #14)
> Can you please try re-enabling iommu and
> https://patchwork.freedesktop.org/patch/221513/ ?

No, it doesn't help:

GPU HANG: ecode 9:0:0xfffffffe, reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset

Do you need debug enabled and console output? If you have any other ideas or extra debug flags, let me know.
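For comparing hang reports across several test boots, the GPU HANG line can be split into its fields. This is a minimal sketch, assuming the message always has the shape of the single line quoted in this comment; the regular expression and the parse_gpu_hang name are illustrative and not part of any i915 tooling.

```python
import re

# Pattern inferred from the one message quoted in this thread (an assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"reason: hang on (?P<engines>.+?), "
    r"action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the ecode, engine list and action from a GPU HANG line, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    return {
        "ecode": match["ecode"],
        "engines": [name.strip() for name in match["engines"].split(",")],
        "action": match["action"],
    }


report = parse_gpu_hang(
    "GPU HANG: ecode 9:0:0xfffffffe, reason: hang on rcs0, bcs0, vcs0, vecs0, action: reset"
)
print(report["engines"])
```

For the line above, the parsed engine list contains all four engines named in the message (rcs0, bcs0, vcs0, vecs0), which makes it easy to see at a glance whether a later boot hangs on the same set.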
Created attachment 139587 [details] dmesg output
Tried on the latest drm-tip.
Yes, please send dmesg with drm.debug=0x1e log_buf_len=4M.
Created attachment 139590 [details] dmesg used drm.debug=0x1e log_buf_len=4M
Chris, any updates on this issue?
Domen, are you still experiencing this issue?
Having purchased a glk (Celeron N4100) for myself... This looks to be an isolated incident (well, this and the other reported glk-iommu issues!), as annoyingly it worksforme. I was hoping to be able to reproduce it, sorry.
Sorry, we don't use this board anymore, so I don't know if the issue still persists.
Apologies for the disappointing end.