104747 – GPU hang occurs when encoding; the platform is Haswell

Summary: GPU hang occurs when encoding; the platform is Haswell

Status:	CLOSED DUPLICATE of bug 104748

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium blocker
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2018-01-23 01:46 UTC by zhoubo
Modified:	2018-02-13 16:24 UTC
CC List:	1 user

i915 platform:	HSW
i915 features:	GPU hang

Attachments
Package containing /sys/kernel/debug/dri/0/i915_error_state and the dmesg log (48.86 KB, application/vnd.rar), 2018-01-23 01:46 UTC, zhoubo

Description zhoubo 2018-01-23 01:46:53 UTC

Created attachment 136913
Package containing /sys/kernel/debug/dri/0/i915_error_state and the dmesg log

[   25.271395] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   38.785285] [drm] GPU HANG: ecode 7:0:0x8edcfff1, in TSK_VEncode4 [1674], reason: Hang on render ring, action: reset
[   38.817146] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   38.848304] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   38.879736] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   38.912668] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   38.944964] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   44.655797] EXT4-fs (ram0): re-mounted. Opts: (null)
[   47.697675] drm/i915: Resetting chip after gpu hang

Comment 1 Elizabeth 2018-01-23 18:36:03 UTC

Hello Zhoubo. If reproducible, could you try a more recent kernel https://www.kernel.org? Thanks.

Comment 2 Elizabeth 2018-01-23 18:37:45 UTC


*** This bug has been marked as a duplicate of bug 104748 ***

Comment 3 zhoubo 2018-01-24 02:11:19 UTC

(In reply to Elizabeth from comment #1)
> Hello Zhoubo. If reproducible, could you try a more recent kernel
> https://www.kernel.org? Thanks.

I think the cause of the GPU hang may be the encoder's rate control mode.
First I chose VBR mode, and a GPU hang occurred within 10 minutes at most.
Then I chose CBR mode, and the GPU hang did not occur again.
I found some differences in the i965 driver, but I cannot confirm which one is the bug.

  START: 0x00312000
  HEAD:  0x02006e30
  TAIL:  0x00008848
  CTL:   0x0001f001
  HWS:   0x00311000
  ACTHD: 0x00000000 6a934b44
  IPEIR: 0x00000000
  IPEHR: 0x71000007
  INSTDONE: 0xffdcffff
  BBADDR: 0x00000000 6a934b45
  BB_STATE: 0x00000120
  INSTPS: 0x80000208
  INSTPM: 0x00006080
  FADDR: 0x00000000 6a934d00

According to the GPU hang info, "IPEHR: 0x71000007" is the header of the command the GPU was executing when it hung. I think a parameter might be wrong in the function
 "gen75_mfc_batchbuffer_emit_object_command" or "gen75_vme_fill_vme_batchbuffer", because of the line "*command_ptr++ = (CMD_MEDIA_OBJECT | (9 - 2))".

So I think the reason may be that some parameter differs between CBR and VBR, which leads to a different "CMD_MEDIA_OBJECT" command, and finally the GPU hang occurs.
Is this right?
If so, could you help me find where the bug is?
