Bug 94846 - [BAT BSW] GPU hang in igt/gem_sync igt/gem_store_dw_loop
Summary: [BAT BSW] GPU hang in igt/gem_sync igt/gem_store_dw_loop
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other, OS: All
Importance: highest priority, normal severity
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2016-04-06 15:04 UTC by Mika Kuoppala
Modified: 2017-10-06 11:17 UTC
CC List: 4 users

See Also:
i915 platform: BSW/CHT
i915 features: GEM/Other


Attachments
dmesg during piglit run (2.71 MB, text/plain), 2016-04-06 15:04 UTC, Mika Kuoppala
dmidecode (13.74 KB, text/plain), 2016-04-06 15:04 UTC, Mika Kuoppala
dmesg before run (98.27 KB, text/plain), 2016-04-06 15:05 UTC, Mika Kuoppala
lspci (429 bytes, text/plain), 2016-04-06 15:05 UTC, Mika Kuoppala
kernel config (99.56 KB, text/plain), 2016-04-06 15:05 UTC, Mika Kuoppala
error state from that bsw but from _different_ run (375.19 KB, text/plain), 2016-04-06 15:12 UTC, Mika Kuoppala

Description Mika Kuoppala 2016-04-06 15:04:20 UTC
Created attachment 122766
dmesg during piglit run

In CI/BAT, igt/gem_sync or igt/gem_storedw_loop causes a GPU hang, and the subsequent recovery also fails, leading to multiple GPU hangs in a row.

Relevant parts of the first occurrence:
[  400.772446] gem_storedw_loop: starting subtest basic-blt
[  401.625303] [drm:cherryview_enable_rps] GT fifo had a previous error 1080000
[  401.634253] [drm:cherryview_enable_rps] GPLL enabled? yes
[  401.634262] [drm:cherryview_enable_rps] GPU status: 0x00002010
[  401.634266] [drm:cherryview_enable_rps] current GPU freq: 320 MHz (32)
[  401.634270] [drm:cherryview_enable_rps] setting GPU freq to 320 MHz (32)
[  403.196027] r8169 0000:03:00.0 eth0: link up
[  407.804316] [drm:intel_get_hpd_pins] hotplug event received, stat 0x00400000, dig 0x00400000, pins 0x00000080
[  407.804330] [drm:intel_hpd_irq_handler] digital hpd port D - long
[  407.804336] [drm:intel_hpd_irq_storm_detect] Received HPD interrupt on PIN 7 - cnt: 0
[  407.805358] [drm:intel_dp_hpd_pulse] got hpd irq on port D - long
[  407.805520] [drm:i915_hotplug_work_func] running encoder hotplug functions
[  407.805530] [drm:i915_hotplug_work_func] Connector HDMI-A-2 (pin 7) received hotplug event.
[  407.805537] [drm:intel_hdmi_detect] [CONNECTOR:47:HDMI-A-2]
[  407.893070] [drm:intel_hdmi_detect] Live status not up!
[  407.893114] [drm:intel_hpd_irq_event] [CONNECTOR:47:HDMI-A-2] status updated from connected to disconnected
[  407.893130] [drm:i915_hotplug_work_func] Connector DP-2 (pin 7) received hotplug event.
[  407.893137] [drm:intel_dp_detect] [CONNECTOR:49:DP-2]
[  407.904450] [drm:intel_get_hpd_pins] hotplug event received, stat 0x00400000, dig 0x00400000, pins 0x00000080
[  407.904461] [drm:intel_hpd_irq_handler] digital hpd port D - long
[  407.904466] [drm:intel_hpd_irq_storm_detect] Received HPD interrupt on PIN 7 - cnt: 1
[  407.904589] [drm:intel_dp_hpd_pulse] got hpd irq on port D - long
[  407.907929] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.910972] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.916397] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.952027] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  407.954059] [drm:i915_hotplug_work_func] running encoder hotplug functions
[  407.954068] [drm:i915_hotplug_work_func] Connector HDMI-A-2 (pin 7) received hotplug event.
[  407.954074] [drm:intel_hdmi_detect] [CONNECTOR:47:HDMI-A-2]
[  408.022247] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  408.022294] [drm:intel_hpd_irq_event] [CONNECTOR:47:HDMI-A-2] status updated from disconnected to connected
[  408.022314] [drm:i915_hotplug_work_func] Connector DP-2 (pin 7) received hotplug event.
[  408.022332] [drm:intel_dp_detect] [CONNECTOR:49:DP-2]
[  408.076962] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.131936] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.188905] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  408.245874] [drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x71450064
[  413.620584] [drm] stuck on blitter ring
[  413.638340] [drm] GPU HANG: ecode 8:1:0xfffffffe, in gem_storedw_loo [6150], reason: Engine(s) hung, action: reset
[  413.643244] [drm:i915_reset_and_wakeup] resetting chip
Comment 1 Mika Kuoppala 2016-04-06 15:04:43 UTC
Created attachment 122767
dmidecode
Comment 2 Mika Kuoppala 2016-04-06 15:05:00 UTC
Created attachment 122768
dmesg before run
Comment 3 Mika Kuoppala 2016-04-06 15:05:13 UTC
Created attachment 122769
lspci
Comment 4 Mika Kuoppala 2016-04-06 15:05:28 UTC
Created attachment 122770
kernel config
Comment 5 Mika Kuoppala 2016-04-06 15:12:42 UTC
Created attachment 122771
error state from that bsw but from _different_ run
Comment 6 Chris Wilson 2016-04-06 15:55:44 UTC
(In reply to Mika Kuoppala from comment #5)
> Created attachment 122771
> error state from that bsw but from _different_ run

This looks like a simulated (stop-rings) hang.
Comment 7 Chris Wilson 2016-06-28 12:23:40 UTC
Ville reworked the BSW IRQ handling to avoid lost interrupts, which were the root cause of this bug.

