Bug 110617

Summary: [KBL] Video corrupted during playback with audio in KBL NUC using Clear Linux OS (Root Caused to GuC/HuC Firmware Authentication Issue)
Product: DRI
Reporter: yat.seng.lam
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: eero.t.tamminen, intel-gfx-bugs, jon.ewins
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard: Triaged
i915 platform: KBL
i915 features:
Attachments:
kernel log (flags: none)
attachment-20701-0.html (flags: none)

Description yat.seng.lam 2019-05-06 07:51:14 UTC
Created attachment 144170 [details]
kernel log

◾Bug detailed description
Video is corrupted during playback with audio on a KBL NUC using the PK LTS2018 kernel and Clear Linux OS. Please refer to the attached picture.


◾Environment/Hardware
Hardware: Intel Desktop Board NUC7i7BNB
BIOS version: BNKBL357.86A.0046.2017.0503.1744
Processor: Intel® Core™ i7-7567U CPU @ 3.50GHz
PKT version: PKT LTS 2018 (4.19.28-28)


◾Reproduce Steps
1. Boot the NUC into the Clear Linux console using a bootable image from http://koji-lts.png.intel.com/L1/releases/10/clear/
2. swupd bundle-add desktop-autostart
3. swupd bundle-add vim wget sudo
4. swupd bundle-add kernel-iot-lts2018
5. clr-boot-manager set-kernel <choose the kernel-iot-lts2018>
6. reboot

In a terminal on the Clear Linux desktop of the NUC:
7. wget http://andromeda01.png.intel.com/qe-collateral/onelinux_workload/small.ogv
8. gst-launch-1.0 playbin uri=file:///small.ogv
Note: the file:// URI must contain the full absolute path to the downloaded file.

◾Current result
The video is corrupted during playback. However, the audio can be heard as expected.

◾Expected result
Video plays back without corruption, together with the audio.


============================

Initial Triage Information

============================

1. With the same test setup on an APL NUC, no video corruption occurs.

2. Initial analysis together with the Intel Production Kernel Team points to a GuC/HuC firmware authentication failure on KBL:

The video issue appears to be caused by the GuC/HuC firmware loading failure.

 

[    4.009274] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
[    4.014351] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[    4.065392] [drm:intel_huc_auth] ERROR HuC: Firmware not verified 0x6000
[    4.072307] [drm:intel_huc_auth] ERROR HuC: Authentication failed -110
[    4.079045] i915 0000:00:02.0: GuC initialization failed -110
[    4.084834] [drm:i915_gem_init_hw_late] ERROR Late init: enabling uc failed (-110)
[    4.092612] [drm:i915_gem_context_first_open] ERROR Late initialization failed: -110
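
For triage, the failing lines in a kernel log like the one above can be pulled out with a short script. This is a minimal sketch in Python and not part of the original report: the function name find_i915_failures and the filtering heuristics (keep drm/i915 lines that contain "ERROR" or "failed") are assumptions made for illustration. The only errno mapping it uses is 110, which is ETIMEDOUT on Linux.

```python
import re

# Linux errno value seen in the log; 110 is ETIMEDOUT on Linux.
ERRNO_NAMES = {110: "ETIMEDOUT"}

# "[    4.072307] message" -> (timestamp, message)
LOG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Matches "failed -110", "failed (-110)" and "failed: -110"
ERR_CODE = re.compile(r"failed:?\s*\(?-(\d+)")


def find_i915_failures(dmesg_text):
    """Return (timestamp, message, errno_name) for drm/i915 failure lines.

    errno_name is None when the line carries no negative error code.
    """
    failures = []
    for raw in dmesg_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        # Heuristic (assumption): only look at drm/i915 messages.
        if "drm" not in message and "i915" not in message:
            continue
        if "ERROR" not in message and "failed" not in message:
            continue
        code = ERR_CODE.search(message)
        name = ERRNO_NAMES.get(int(code.group(1)), "unknown") if code else None
        failures.append((timestamp, message, name))
    return failures
```

Running it over the seven lines above would skip the two "Loaded firmware" lines and report the remaining five, with the -110 codes labelled as timeouts. That suggests the driver gave up waiting for HuC authentication rather than the firmware file being missing.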


Further comment from PKT team:
==============================
This issue appears to be an i915 firmware loading issue. If we disable a couple of i915-related command-line parameters, the video plays correctly.

When the issue is reproduced, there are many GuC/HuC firmware loading errors, as Paul mentioned in a previous mail.

If we use "swupd bundle-add kernel-iot-lts2018" and "clr-boot-manager set-kernel 4.19.32-44.iot-lts2018" to switch the boot kernel to a version similar to 358 in the above link, there are many GuC/HuC loading errors.

We noticed that the command lines of these two kernels differ. The 4.19.32-44.iot-lts2018 kernel has two i915 parameters (i915.nuclear_pageflip=1 i915.enable_guc=0x02). If we remove these two parameters, there are no firmware loading errors with the 4.19.32-44.iot-lts2018 kernel, and video playback works.

The official Clear Linux native kernel (5.0.7) does not have these two command-line parameters either.
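
To compare kernels quickly, the i915 parameters can be read from the kernel command line (for example the contents of /proc/cmdline). The sketch below is illustrative only: the helper names i915_params and describe_enable_guc are hypothetical, and the bit meanings (bit 0 = GuC submission, bit 1 = HuC load, -1 = auto, 0 = disabled) are an assumption based on the i915 module parameter description for kernels of this era.

```python
# Assumed bit layout of i915.enable_guc (not taken from this report):
GUC_SUBMISSION = 0x1
HUC_LOAD = 0x2


def i915_params(cmdline):
    """Return a dict of i915.* parameters found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            key, value = token[len("i915."):].split("=", 1)
            params[key] = value
    return params


def describe_enable_guc(value):
    """Decode an enable_guc string such as "0x02" into a list of features."""
    number = int(value, 0)  # base 0 accepts both "2" and "0x02"
    if number == -1:
        return ["auto"]
    if number == 0:
        return ["disabled"]
    features = []
    if number & GUC_SUBMISSION:
        features.append("GuC submission")
    if number & HUC_LOAD:
        features.append("HuC load")
    return features
```

Under those assumptions, a command line containing i915.enable_guc=0x02 decodes to HuC load only, and a command line without the parameter yields no enable_guc entry at all.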


Further Summary of Discussion so far:
=====================================

1. The same kernel-iot-lts2018 is used for APL and KBL, and APL needs GuC/HuC firmware enabled for the VDENC encoder and content protection use cases. Hence, disabling GuC/HuC firmware loading on the kernel command line is not a viable solution.

2. We need a common solution for GuC/HuC firmware loading that works for all the platforms supported by kernel-iot-lts2018.

3. We need to find out how the GuC/HuC firmware binaries are generated and signed (whether this is open source or closed source, and whether the signing has any issue) and how GuC/HuC firmware authentication works.
Comment 1 Lakshmi 2019-05-07 06:34:31 UTC
Have you tried to verify the issue with the latest kernel (5.1) on KBL?
Comment 2 Lakshmi 2019-05-31 08:45:20 UTC
@Jon, any comments?
Comment 3 Jon Ewins 2019-06-14 19:03:02 UTC
GuC and HuC are closed source signed binaries.  They are loaded, with the GuC first authenticated based on its embedded key and the GuC then supporting authentication of the HuC.  Of the kernel params you mentioned, i915.enable_guc=0x02 configures the system for use of the HuC.  The nuclear flip param should not be relevant.
One observation is that the kernel in use here appears to have late uc loading (i915_gem_init_hw_late) that was required for Android boot operation, but never part of the upstream kernel. It is not clear if that is related.  
* Please confirm that HuC authentication is successful on APL, not just that no video corruption is seen. That might be the case if media operation is different on APL.

* Is the guc/huc failure consistent on every run?

Note that we have confirmed locally that the current drm-tip kernel with v32.0.3 GuC fw shows no issue on KBL for general boot and IGT tests.

To aid further debugging, please enable the following logging.
First ensure the drm i915 debug level is set:
drm.debug=0xe
For GuC logs, set:
i915.guc_log_level=3
This requires that debugfs is configured. GuC logs will be located after the test run at:
/sys/kernel/debug/dri/0/i915_guc_log_dump
(note: debugfs must be enabled)
If an issue happens in GuC load such that the driver load fails, the driver will copy the log to
/sys/kernel/debug/dri/0/i915_guc_load_err_log_dump
before the driver is unrolled.
Please upload the contents of these files to this Bugzilla entry, along with the kernel dmesg log.
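
The collection step above could be scripted along these lines. This is a rough sketch, not an official tool: collect_logs and save_dmesg are hypothetical helper names, and the script assumes it runs as root with debugfs mounted at /sys/kernel/debug and the i915 device at dri/0, as in the paths quoted above.

```python
import subprocess
from pathlib import Path

# debugfs paths quoted in the comment above
DEBUGFS_LOGS = [
    Path("/sys/kernel/debug/dri/0/i915_guc_log_dump"),
    Path("/sys/kernel/debug/dri/0/i915_guc_load_err_log_dump"),
]


def collect_logs(out_dir, sources=DEBUGFS_LOGS):
    """Copy every log file that exists into out_dir; return the copies."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        src = Path(src)
        if src.is_file():
            dst = out / src.name
            # Read the content directly instead of relying on file metadata.
            dst.write_bytes(src.read_bytes())
            copied.append(dst)
    return copied


def save_dmesg(out_dir):
    """Save the current kernel log next to the GuC logs (needs dmesg access)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    target = out / "dmesg.txt"
    target.write_text(result.stdout)
    return target
```

A missing i915_guc_load_err_log_dump is simply skipped, so the same call can be used whether or not the driver load failed.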
Comment 4 Chris Wilson 2019-07-12 14:33:03 UTC
For reference,

commit f774f09649192f326fa030564afd3f8f5d82c1e4 (drm-intel/for-linux-next, drm-intel/drm-intel-next-queued, drm-intel-next-queued)
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Fri Jul 12 11:14:45 2019 +0000

    drm/i915/guc: Turn on GuC/HuC auto mode
    
    Using "enable_guc" modparam auto mode (-1) will let driver
    decide on which platforms and in which configuration we want
    to use GuC/HuC firmwares.
    
    Today driver will enable HuC firmware authentication by GuC
    only on Gen11+ platforms as HuC firmware is required to unlock
    advanced video codecs in media driver.
    
    Legacy platforms with GuC/HuC are not affected by this change
    as for them driver still defaults to disabled(0) in auto mode.
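
The policy described in this commit message can be paraphrased as a small decision function. The Python below is only an illustration of the wording above, not the driver's actual C code; the function name resolve_enable_guc and the constant names are hypothetical, and the HuC value 0x2 follows comment 3, which says enable_guc=0x02 configures the system for use of the HuC.

```python
ENABLE_GUC_AUTO = -1      # let the driver decide
ENABLE_GUC_DISABLED = 0   # no GuC/HuC
ENABLE_GUC_LOAD_HUC = 0x2 # HuC firmware authenticated by GuC


def resolve_enable_guc(param, gen):
    """Return the effective enable_guc value for a platform generation.

    An explicit value is kept as-is; auto mode enables HuC only on
    Gen11+ and stays disabled on legacy platforms.
    """
    if param != ENABLE_GUC_AUTO:
        return param
    if gen >= 11:
        return ENABLE_GUC_LOAD_HUC
    return ENABLE_GUC_DISABLED
```

Under this reading, a Gen9 platform such as KBL resolves to disabled in auto mode, while an explicit i915.enable_guc=0x02 on the command line still requests HuC on any generation.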
Comment 5 Lakshmi 2019-07-29 12:13:33 UTC
(In reply to Jon Ewins from comment #3)
Reporter, can you please provide all necessary details as requested above.
Comment 6 Jon Ewins 2019-07-29 12:13:52 UTC
Created attachment 144906 [details]
attachment-20701-0.html

I am currently away and will respond when I return
Comment 7 Lakshmi 2019-10-02 18:23:01 UTC
No feedback for more than 2 months, so resolving this bug as WORKSFORME. If the issue still persists, please respond to comment 3 and add all the requested information.
