Bug 110307 - [ICL][ffmpeg-qsv/-vaapi/msdk] GPU hang + fail with low-power (HW) HEVC encoding (in first frame)
Status: CLOSED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Tvrtko Ursulin
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-02 13:42 UTC by Dmitry
Modified: 2019-05-06 08:33 UTC
CC List: 4 users

See Also:
i915 platform: ICL
i915 features: GEM/Other


Attachments
log error from /sys/class/drm/card0/error (17.67 KB, text/plain)
2019-04-02 13:42 UTC, Dmitry
attachment-26526-0.html (2.24 KB, text/html)
2019-05-02 13:43 UTC, Dmitry

Description Dmitry 2019-04-02 13:42:17 UTC
Created attachment 143845
log error from /sys/class/drm/card0/error

https://github.com/intel/media-driver/issues/580
Comment 1 Dmitry 2019-04-02 13:49:06 UTC
Environment:
============
OS: Ubuntu 18.04.1 LTS
KERNEL: 5.1.0-rc3+
HUC/GUC disabled
ASLR IS ENABLED
HT IS ENABLED

kernel git://anongit.freedesktop.org/drm-tip at 8cac0cc264d2a6af0b33370b542b12d516e022c5 2019-03-28_13-42-05 drm-tip: 2019y-03m-28d-13h-41m-08s UTC integration manifest
libdrm git://anongit.freedesktop.org/git/mesa/drm at ae836decb41a69d00bfadab78a7cb69f88de4c94 2019-03-25_21-34-13 intel: sync i915_pciids.h with kernel
libva git://github.com/intel/libva at c98b06d2b8c00dc4df628488b672711b3f0eb118 2019-03-21_06-57-29 [common] Add A2RGB10 fourcc definition
gmmlib git://github.com/intel/gmmlib at 8294f6851ca0829deca7f97c851c8ed0439c94eb 2019-03-28_04-31-26 Adding Comet lake PCH and DeviceId's
media-driver git://github.com/intel/media-driver at 8773ef5 2019-03-25_09-49-48 [VP] Correction Colorspace to seperate RGB/YUV
media-sdk git://github.com/Intel-Media-SDK/MediaSDK at b5d24b2c2baac7dab44ddf0279f576d4c249d08a 2019-03-29_11-37-11 Small refactoring for vaapi allocator in the library
ffmpeg https://github.com/FFmpeg/FFmpeg.git at 391f884675f319b95f5a72a410178516e11c557d 2019-03-28_13-52-51 lavc/qsvenc_h264: remove the privite option trellis

Reproduction commands:
ffmpeg -loglevel verbose -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i 1280x720p_29.97_10mb_h264_cabac.264 -c:v hevc_vaapi -low_power 1 -qscale:v 20 -y test.h265
or
sample_encode h265 -cqp -qsv-ff -i 1920x1080.yuv -w 1920 -h 1080 -o test.h265
Comment 2 Eero Tamminen 2019-04-03 10:05:27 UTC
Dmitry, you mentioned in the media-driver issue tracker that you didn't get a hang with some earlier kernel version. Which version was that, and is this a regression?
Comment 3 Dmitry 2019-04-03 10:31:28 UTC
Not exactly. If I use the previous kernel version, 5.0.0-internal, I get the same issue; the only difference is that I don't get the GPU hang message.
Comment 4 Francesco Balestrieri 2019-04-15 04:29:06 UTC
Tvrtko, anything in the logs or in the command that points to a possible reason?
Comment 5 Tvrtko Ursulin 2019-04-15 07:14:22 UTC
According to my interpretation of the error state, execution hangs on command dword 0x73880080 (HEVC_VP9_RDOQ_STATE), submitted from a batch buffer.

  IPEIR: 0x00000000
  IPEHR: 0x34240010
  INSTDONE: 0xbbffffff
  batch: [0x00000000_018fb000, 0x00000000_018ff000]
  BBADDR: 0x00000000_018fbf45

...
0x018fbf44:      0x73880080: 3D UNKNOWN: 3d_965 opcode = 0x7388
...

The only problem is that IPEIR claims execution is not in a batch buffer (bit 3 is not set), and IPEHR points to something different as well.

INSTDONE says the VCS and VIN units are running. I don't know what the latter is. Is that consistent with the hanging command?
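The cross-checks in comment 5 can be written down as a small script. This is a minimal Python sketch, not a real error-state decoder: it parses only the five register lines quoted above, treats IPEIR bit 3 as the "executing from a batch buffer" flag (as comment 5 reads it), and assumes the low two bits of BBADDR are not address bits. The helper names `parse_state` and `cross_check` are hypothetical.

```python
import re

# Register lines copied verbatim from the error-state excerpt in comment 5.
ERROR_STATE = """\
  IPEIR: 0x00000000
  IPEHR: 0x34240010
  INSTDONE: 0xbbffffff
  batch: [0x00000000_018fb000, 0x00000000_018ff000]
  BBADDR: 0x00000000_018fbf45
"""

IPEIR_BATCH_BIT = 1 << 3  # bit 3 = "executing from a batch buffer", as read in comment 5


def parse_hex(text):
    """Parse values like 0x00000000_018fbf45; the underscore only separates 32-bit halves."""
    return int(text.replace("_", ""), 16)


def parse_state(text):
    """Turn 'NAME: value' lines into a dict; 'batch' becomes a (start, end) tuple."""
    regs = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+):\s*(.+)", line)
        if not m:
            continue
        name, value = m.group(1), m.group(2).strip()
        if name == "batch":
            start, end = re.findall(r"0x[0-9a-fA-F_]+", value)
            regs[name] = (parse_hex(start), parse_hex(end))
        else:
            regs[name] = parse_hex(value)
    return regs


def cross_check(regs, hung_dw):
    """Compare the registers against the command dword the decoder flagged."""
    start, end = regs["batch"]
    # Assumption: the low two BBADDR bits are not address bits, so mask them off
    # to get a dword-aligned address (0x...bf45 -> 0x...bf44, where the dump shows the command).
    bbaddr = regs["BBADDR"] & ~0x3
    return {
        "ipeir_says_in_batch": bool(regs["IPEIR"] & IPEIR_BATCH_BIT),
        "bbaddr_inside_batch": start <= bbaddr < end,
        "bbaddr_dword": bbaddr,
        "ipehr_matches_hung_dw": regs["IPEHR"] == hung_dw,
        "decoder_opcode": hung_dw >> 16,  # the "opcode = 0x7388" field in the dump
    }


if __name__ == "__main__":
    report = cross_check(parse_state(ERROR_STATE), hung_dw=0x73880080)
    for key, value in report.items():
        shown = value if isinstance(value, bool) else hex(value)
        print(f"{key}: {shown}")
```

With the quoted values, IPEIR bit 3 is clear, the masked BBADDR (0x18fbf44) lies inside the batch range, IPEHR does not match the flagged dword, and the upper 16 bits give 0x7388. That is the same contradiction comment 5 points out, just made explicit.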
Comment 6 Dmitry Ermilov 2019-04-15 21:50:18 UTC
Thanks!

I don't know exactly how to interpret this. I copied your reply to https://github.com/intel/media-driver/issues/580
Comment 7 Arek Hiler 2019-04-25 10:51:17 UTC
Hey,

So if I understand this correctly, the low-power encoding you are doing requires HuC, and you were using drm-tip on ICL. Did you have the firmware set up correctly and loaded? Which version?

You should probably try this out with:
https://patchwork.freedesktop.org/series/58760/
and the firmware from the guc_huc_updates branch of git://anongit.freedesktop.org/drm/drm-firmware.

Although it is quite surprising that HEVC_VP9_RDOQ_STATE would hang without HuC.
Comment 8 Chris Wilson 2019-04-25 12:03:26 UTC
I would like to point out that we have nothing in igt that proves the huc even exists let alone is functional. ;)
Comment 9 Dmitry 2019-05-02 13:43:48 UTC
Created attachment 144129
attachment-26526-0.html

I will be OOO from ww40.1 to ww42.1 on vacation.

Best regards,
Dmitry Menshov
Comment 10 Jani Saarinen 2019-05-06 07:24:12 UTC
Dmitry, please try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
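The two kernel parameters above are a bitmask and a size. A minimal Python sketch of how they break down, assuming the DRM debug category bits from include/drm/drm_print.h in 5.x-era kernels (check your own tree; the list has changed over time) and a simplified version of the kernel's K/M/G size suffixes. `decode_drm_debug` and `parse_size` are hypothetical helpers, not kernel APIs.

```python
# DRM debug categories as defined in include/drm/drm_print.h in 5.x-era kernels.
# Assumption: verify against your kernel tree; newer kernels changed this list.
DRM_DEBUG_CATEGORIES = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """List the category names enabled by a drm.debug bitmask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


def parse_size(text):
    """Simplified take on kernel size suffixes: K/M/G are binary multiples."""
    shifts = {"K": 10, "M": 20, "G": 30}
    suffix = text[-1].upper()
    if suffix in shifts:
        return int(text[:-1], 0) << shifts[suffix]
    return int(text, 0)


cmdline = "drm.debug=0x1e log_buf_len=4M"
params = dict(arg.split("=", 1) for arg in cmdline.split())
print(decode_drm_debug(int(params["drm.debug"], 0)))
print(parse_size(params["log_buf_len"]))
```

With the values requested here, 0x1e enables the DRIVER, KMS, PRIME and ATOMIC categories (core is left off), and the log buffer is 4194304 bytes (4 MiB), large enough to keep the full boot dmesg.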
Comment 11 Dmitry Ermilov 2019-05-06 07:51:21 UTC
It was confirmed that the HEVC low-power encoder needs HuC support (including in CQP mode), so HEVC-LP not working on drm-tip is expected. Of course, MSDK/UMD should still return an error at initialization in this case.

I assume this ticket can be closed.
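Comment 11 says HEVC low-power encoding needs HuC, while the environment in comment 1 has HuC/GuC disabled. A hypothetical pre-flight check in Python, assuming the 5.x-era i915 `enable_guc` module parameter semantics (-1 = auto, 0 = disabled, 1 = GuC submission, 2 = HuC load, combinable as a bitmask) exposed at /sys/module/i915/parameters/enable_guc. The parameter only shows what was requested; it does not prove the HuC firmware actually loaded. This is an illustration, not how media-driver or MSDK implement their checks.

```python
from pathlib import Path

# Assumed location and meaning of the i915 module parameter (5.x-era kernels).
ENABLE_GUC = Path("/sys/module/i915/parameters/enable_guc")
HUC_LOAD_BIT = 0x2  # assumption: 1 = GuC submission, 2 = HuC load


def huc_requested(value):
    """True/False if enable_guc explicitly requests/omits HuC; None for auto (-1)."""
    if value < 0:
        return None  # auto: outcome depends on the platform, cannot tell from the value
    return bool(value & HUC_LOAD_BIT)


def low_power_hevc_preflight(path=ENABLE_GUC):
    """Hypothetical check to run before asking for low-power HEVC encoding."""
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        return f"unknown: cannot read {path}"
    requested = huc_requested(value)
    if requested is None:
        return "unknown: enable_guc=-1 (auto), HuC use is platform dependent"
    if not requested:
        return f"skip low_power: enable_guc={value} does not request HuC load"
    return f"ok to try low_power: enable_guc={value} requests HuC load (firmware must still load)"


print(low_power_hevc_preflight())
```

With the "HUC/GUC disabled" setup from comment 1 (enable_guc=0), this check would steer away from `-low_power 1` instead of letting the encoder submit HEVC-LP work that cannot succeed.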
Comment 12 Jani Saarinen 2019-05-06 08:33:08 UTC
OK, closing based on the feedback received. Please reopen if you think this should not be closed.

