Bug 94846

Summary: [BAT BSW] GPU hang in igt/gem_sync igt/gem_store_dw_loop
Product: DRI
Reporter: Mika Kuoppala <mika.kuoppala>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: chris, gary.c.wang, intel-gfx-bugs, ville.syrjala
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BSW/CHT
i915 features: GEM/Other
Attachments (description, flags):
dmesg during piglit run, none
dmidecode, none
dmesg before run, none
lspci, none
kernel config, none
error state from that bsw but from _different_ run, none

Description Mika Kuoppala 2016-04-06 15:04:20 UTC
Created attachment 122766 [details]
dmesg during piglit run

In CI/BAT, igt/gem_sync or igt/gem_store_dw_loop hangs the GPU, and the subsequent recovery also fails, leading to multiple GPU hangs in a row.

Relevant parts of the first occurrence:
[  400.772446] gem_storedw_loop: starting subtest basic-blt
[  401.625303] [drm:cherryview_enable_rps] GT fifo had a previous error 1080000
[  401.634253] [drm:cherryview_enable_rps] GPLL enabled? yes
[  401.634262] [drm:cherryview_enable_rps] GPU status: 0x00002010
[  401.634266] [drm:cherryview_enable_rps] current GPU freq: 320 MHz (32)
[  401.634270] [drm:cherryview_enable_rps] setting GPU freq to 320 MHz (32)
[  403.196027] r8169 0000:03:00.0 eth0: link up
[  407.804316] [drm:intel_get_hpd_pins] hotplug event received, stat 0x00400000, dig 0x00400000, pins 0x00000080
[  407.804330] [drm:intel_hpd_irq_handler] digital hpd port D - long
[  407.804336] [drm:intel_hpd_irq_storm_detect] Received HPD interrupt on PIN 7 - cnt: 0
[  407.805358] [drm:intel_dp_hpd_pulse] got hpd irq on port D - long
[  407.805520] [drm:i915_hotplug_work_func] running encoder hotplug functions
[  407.805530] [drm:i915_hotplug_work_func] Connector HDMI-A-2 (pin 7) received hotplug event.
[  407.805537] [drm:intel_hdmi_detect] [CONNECTOR:47:HDMI-A-2]
[  407.893070] [drm:intel_hdmi_detect] Live status not up!
[  407.893114] [drm:intel_hpd_irq_event] [CONNECTOR:47:HDMI-A-2] status updated from connected to disconnected
[  407.893130] [drm:i915_hotplug_work_func] Connector DP-2 (pin 7) received hotplug event.
[  407.893137] [drm:intel_dp_detect] [CONNECTOR:49:DP-2]
[  407.904450] [drm:intel_get_hpd_pins] hotplug event received, stat 0x00400000, dig 0x00400000, pins 0x00000080
[  407.904461] [drm:intel_hpd_irq_handler] digital hpd port D - long
[  407.904466] [drm:intel_hpd_irq_storm_detect] Received HPD interrupt on PIN 7 - cnt: 1
[  407.904589] [drm:intel_dp_hpd_pulse] got hpd irq on port D - long
[  407.907929] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.910972] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.916397] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.952027] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.954059] [drm:i915_hotplug_work_func] running encoder hotplug functions
[  407.954068] [drm:i915_hotplug_work_func] Connector HDMI-A-2 (pin 7) received hotplug event.
[  407.954074] [drm:intel_hdmi_detect] [CONNECTOR:47:HDMI-A-2]
[  408.022247] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  408.022294] [drm:intel_hpd_irq_event] [CONNECTOR:47:HDMI-A-2] status updated from disconnected to connected
[  408.022314] [drm:i915_hotplug_work_func] Connector DP-2 (pin 7) received hotplug event.
[  408.022332] [drm:intel_dp_detect] [CONNECTOR:49:DP-2]
[  408.076962] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.131936] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.188905] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.245874] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  413.620584] [drm] stuck on blitter ring
[  413.638340] [drm] GPU HANG: ecode 8:1:0xfffffffe, in gem_storedw_loo [6150], reason: Engine(s) hung, action: reset
[  413.643244] [drm:i915_reset_and_wakeup] resetting chip
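
To count repeated hangs in a longer dmesg capture, something like the following could pull out the "GPU HANG" lines. This is only a sketch: the regular expression is modeled on the single hang line quoted above, other kernel versions may print the message differently, and the names `HANG_RE` and `find_gpu_hangs` are made up for illustration.

```python
import re

# Pattern modeled on the one "GPU HANG" line quoted in this report.
# Assumption: other kernels may format this message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>[^,]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], reason: (?P<reason>.+?), "
    r"action: (?P<action>\w+)"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in the given dmesg text."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match is None:
            continue
        fields = match.groupdict()
        fields["ts"] = float(fields["ts"])   # seconds since boot
        fields["pid"] = int(fields["pid"])   # pid of the hanging process
        hangs.append(fields)
    return hangs


# Three lines copied from the log excerpt above.
SAMPLE = """\
[  413.620584] [drm] stuck on blitter ring
[  413.638340] [drm] GPU HANG: ecode 8:1:0xfffffffe, in gem_storedw_loo [6150], reason: Engine(s) hung, action: reset
[  413.643244] [drm:i915_reset_and_wakeup] resetting chip
"""

if __name__ == "__main__":
    for hang in find_gpu_hangs(SAMPLE):
        print(hang["ts"], hang["comm"], hang["pid"], hang["reason"], hang["action"])
```

Running `find_gpu_hangs` over the full "dmesg during piglit run" attachment would show how many hangs occurred in a row and which test process triggered each one.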
Comment 1 Mika Kuoppala 2016-04-06 15:04:43 UTC
Created attachment 122767 [details]
dmidecode
Comment 2 Mika Kuoppala 2016-04-06 15:05:00 UTC
Created attachment 122768 [details]
dmesg before run
Comment 3 Mika Kuoppala 2016-04-06 15:05:13 UTC
Created attachment 122769 [details]
lspci
Comment 4 Mika Kuoppala 2016-04-06 15:05:28 UTC
Created attachment 122770 [details]
kernel config
Comment 5 Mika Kuoppala 2016-04-06 15:12:42 UTC
Created attachment 122771 [details]
error state from that bsw but from _different_ run
Comment 6 Chris Wilson 2016-04-06 15:55:44 UTC
(In reply to Mika Kuoppala from comment #5)
> Created attachment 122771 [details]
> error state from that bsw but from _different_ run

Looks to be a simulated/stop-rings hang.
Comment 7 Chris Wilson 2016-06-28 12:23:40 UTC
Ville reworked the bsw irq handling to avoid lost interrupts, which were the root cause of this bug.
