Bug 97054 - [APL] GPU hang after resume while executing - gem_softpin@noreloc-s4
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Jairo Miramontes
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-22 22:23 UTC by Jairo Miramontes
Modified: 2016-10-03 16:57 UTC

See Also:
i915 platform: BXT
i915 features: GPU hang, power/suspend-resume


Attachments
gpu crash log from /sys/class/drm/card0/error (164.77 KB, text/plain)
2016-07-22 22:23 UTC, Jairo Miramontes

Description Jairo Miramontes 2016-07-22 22:23:59 UTC
Created attachment 125264
gpu crash log from /sys/class/drm/card0/error

This error seems to be related to bug 96526: https://bugs.freedesktop.org/show_bug.cgi?id=96526

The configuration and steps to reproduce are listed below.

=== Software information ===
Kernel version: 4.7.0-rc7 with patch-revert-guc-loading-submission+
Linux distribution: Ubuntu 16.04 LTS
Architecture: 64-bit
xf86-video-intel version: 2.99.917
Xorg-Xserver version: 1.18.99.1
DRM version: 2.4.68
Cairo version: 1.15.2
Kernel driver in use: i915
Bios revision: 144.10
KSC revision: 1.15

=== Hardware information ===
Platform: BXT-P
Motherboard model: Broxton P
Motherboard type: NOTEBOOK Hand Held
Motherboard manufacturer: Intel Corp.
CPU family: Other
CPU information: 06/5c
GPU Card: Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])
Memory ram: 16 GB
Maximum memory ram allowed: 16 GB
Hard drive capacity: 80.0 GB

Steps:
------
Execute commands:
cd <...>/intel-gpu-tools/tests
# ./gem_softpin --run-subtest noreloc-s4

Actual results:
-------------
The test fails with a GPU hang, and the following error is shown in dmesg:

[   74.120717] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gem_softpin [1548], reason: Hang on render ring, action: reset
[   74.120719] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   74.120720] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   74.120720] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   74.120721] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   74.120722] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   74.120751] [drm:i915_reset_and_wakeup] resetting chip
[   74.120756] drm/i915: Resetting chip after gpu hang
[   74.120796] [drm:gen8_init_common_ring] Execlists enabled for render ring
[   74.120815] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[   74.120831] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[   74.120846] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[   74.120861] [drm:intel_guc_setup] GuC fw status: path i915/bxt_guc_ver8_7.bin, fetch NONE, load NONE
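
For triage, the fields of the "GPU HANG" line can be pulled out of dmesg with a small script. This is only a sketch: the regular expression is modeled on the single line quoted in this report, and the names HANG_RE and parse_gpu_hang are made up for the example. Other kernel versions may word the message differently.

```python
import re

# Pattern modeled on the one "GPU HANG" line quoted in this report.
# Assumption: other kernels may format the message differently.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[^,]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), "
    r"action: (?P<action>\S+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["time"] = float(info["time"])  # seconds since boot
    info["pid"] = int(info["pid"])      # PID of the process named in the line
    return info


line = (
    "[   74.120717] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gem_softpin [1548], "
    "reason: Hang on render ring, action: reset"
)
hang = parse_gpu_hang(line)
print(hang)
```

The same function can be run over every line of a saved dmesg log; lines that are not hang reports return None.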


Expected results:
-----------------
The test is marked as PASS.

Kernel flags:
-----------------
drm.debug=0xe
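
The drm.debug value is a bitmask of debug categories. A rough sketch of how 0xe breaks down is below. The bit-to-name table is an assumption based on the DRM_UT_* categories as I understand them for kernels of this era, so check it against the DRM headers of the kernel in use. DRM_DEBUG_BITS and decode_drm_debug are names invented for the example.

```python
# Assumed mapping of drm.debug bits to categories (DRM_UT_* style).
# Verify against the kernel headers before relying on it.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vbl",
}


def decode_drm_debug(mask):
    """List the category names whose bit is set in mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


enabled = decode_drm_debug(0xE)
print(enabled)  # ['driver', 'kms', 'prime']
```

Under that assumed table, 0xe enables driver, KMS and PRIME messages but not core messages.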
Comment 1 yann 2016-08-03 17:09:26 UTC
From this error dump, in the render ring:
  HEAD:  0x00200d38
    head = 0x00000d38, wraps = 1
  TAIL:  0x00000d38
  ACTHD: 0x00000000 00200d38
    at ring: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
 ...
 seqno: 0x000000b0
 last_seqno: 0x000000b1

and in the trace

render ring --- 1 requests
  seqno 0x000000b1, emitted 4294963764, tail 0x00000d18
render ring --- 1 waiters
 seqno 0x000000b1 for gem_softpin [1548]
ring buffer (render ring) at 0x00000000_00912000; HEAD points to: 0x00000000_00912d38
0x00912000:      0x383b6b0a: UNKNOWN
0x00912004:      0x789b6dba: 3D UNKNOWN: 3d_965 opcode = 0x789b
0x00912008:      0x482b6b2a: 2D UNKNOWN
0x0091200c:      0x6b2e4d2a: 3D UNKNOWN: 3d_965 opcode = 0x6b2e

Could it be a synchronization issue? In the render ring, seqno (0xb0) is behind last_seqno (0xb1), and the one pending request (0xb1) still has a waiter.
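
To cross-check the numbers quoted from the dump, here is a small sketch that redoes the decoding. The mask and shift are assumptions chosen so that HEAD 0x00200d38 splits into the "head" and "wraps" values the dump itself prints; they are not copied from the driver. All names are made up for the example.

```python
# Assumed layout of the ring HEAD register: low bits hold the dword-aligned
# offset into the ring, the bits from 21 upward hold the wrap count.
HEAD_ADDR_MASK = 0x001FFFFC
HEAD_WRAP_SHIFT = 21


def decode_ring_head(head_reg):
    """Split a raw HEAD value into (offset within ring, wrap count)."""
    return head_reg & HEAD_ADDR_MASK, head_reg >> HEAD_WRAP_SHIFT


# Values copied from the error dump quoted above.
head_offset, wraps = decode_ring_head(0x00200D38)
tail_offset = 0x00000D38
ring_base = 0x00912000
seqno, last_seqno = 0xB0, 0xB1

head_addr = ring_base + head_offset          # where HEAD points in GPU address space
pending = last_seqno - seqno                 # requests submitted but not completed
head_equals_tail = head_offset == tail_offset

print(f"head offset {head_offset:#x}, wraps {wraps}, HEAD address {head_addr:#x}")
print(f"pending requests: {pending}, head == tail: {head_equals_tail}")
```

With these inputs the decoded offset and wrap count agree with the dump, the HEAD address comes out as ring base plus 0xd38, and exactly one request is outstanding, which matches the "1 requests" and "1 waiters" lines.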

Jairo, can you retest with the latest kernel and attach the dmesg and a new dump if it occurs again?
Comment 2 yann 2016-08-30 13:22:06 UTC
Jairo, ping (see comments above: please try to reproduce and attach new logs if it occurs again)
Comment 3 yann 2016-10-03 16:57:22 UTC
(In reply to yann from comment #2)
> Jairo, ping (see comments above: please try to reproduce and attach new
> logs if it occurs again)

Closing this bug as fixed (see also closed bug 96526). If it occurs again with the latest kernel and Mesa, please reopen and attach both the kernel log and the GPU crash dump.

