Bug 96805 - [BAT] *ERROR* Failed to fetch GuC firmware
Summary: [BAT] *ERROR* Failed to fetch GuC firmware
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high major
Assignee: anusha
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-04 15:43 UTC by Tvrtko Ursulin
Modified: 2017-03-13 13:46 UTC

See Also:
i915 platform: BXT, KBL, SKL
i915 features: power/Other


Description Tvrtko Ursulin 2016-07-04 15:43:30 UTC
Unless GuC loading and/or submission is set to mandatory, failure to load the firmware should not be logged as an error.

[  139.135737] i915 0000:00:02.0: Direct firmware load for i915/skl_guc_ver6_1.bin failed with error -2
[  139.135837] [drm:intel_guc_init [i915]] *ERROR* Failed to fetch GuC firmware from i915/skl_guc_ver6_1.bin (error -2)
Comment 1 Dave Gordon 2016-07-05 12:01:48 UTC
Not a bug.

Unless GuC loading is specifically *disabled*, the failure to load a file is a significant event and must be logged even when debug is off. Otherwise there will be no indication in the log that you're not running in the expected mode, or why.

You'll probably remember that Daniel argued that there shouldn't even BE a fallback, because it will result in confusion ("it unexpectedly continues to work"). I say there *should* be, but it needs to be flagged vigorously in the log so it's *really obvious* that your system is missing an important component.

Remember that GuC submission is POR for now and the future; execlist mode is already considered legacy on Gen9.

If you don't have the firmware, or don't want to use it, you must set enable_guc_loading to 0. Any time the loader is asked to find a file and can't, that's an error, even if we have a fallback method for submission.
Comment 2 Chris Wilson 2016-07-05 12:10:51 UTC
The failure to load the firmware is already logged.

  *ERROR* Failed to fetch GuC firmware from i915/skl_guc_ver6_1.bin (error -2)

is not an appropriate user-facing error message. Compare this to the information we give when the DMC firmware fails to load.
Comment 3 Tvrtko Ursulin 2016-07-05 14:25:56 UTC
If the user has explicitly selected the "try but fall back if not possible" mode, then I still argue that nothing should be logged as an error; the message should use the informational log level (or notice) instead.

Some customers are very sensitive to any mention of errors while the system continues to operate normally, and this just wastes money by creating a lot of work for technical support.
Comment 4 Dave Gordon 2016-07-05 15:07:05 UTC
Maybe we should distinguish "User selected auto-fallback" (with INFO message?) from "User parameter left at default => System selected *GuC* mode but with auto-fallback in case that fails" which warrants a WARNING or ERROR.

Note that this is an UNSAFE parameter, so setting anything except "auto" will taint the kernel. We don't encourage setting of any such parameters, and if you do set them, you're assumed to know what you're doing, including whether an error message is important or not.

Note also that we wouldn't even *try* to load the firmware if this wasn't a platform for which the *POR* method of submission is to use the GuC. Therefore, failure to find and load the firmware means that the system has been misconfigured and is operating outside the POR parameter space.

That's quite different from the hypothetical case where we might be pre-enabling a "future feature" in advance of it being POR (in which case it wouldn't be an error for it not to be available). Auto-fallback to execlist mode is just a way of keeping the system working rather than giving a blank screen, and does not mean we want users or developers /accidentally/ running in that mode without noticing.
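
The distinction drawn in this comment (and the fallback rule the thread is converging on) can be illustrated with a minimal, self-contained C sketch. It is not the i915 code: the enum names, their numeric values, and the function `guc_fetch_failure_level` are all hypothetical, and the log levels only stand in for the kernel's real logging macros. It assumes a fetch failure is an ERROR only when no fallback is permitted, a WARNING when the parameter was left at its default, and INFO when the user explicitly opted into fallback.

```c
#include <stdio.h>

/* Hypothetical log levels standing in for the kernel's logging macros. */
enum log_level { LOG_INFO, LOG_WARNING, LOG_ERROR };

/*
 * Hypothetical policy values for a GuC loading parameter.
 * Names and numbers are illustrative assumptions, not the driver's.
 */
enum guc_policy {
    GUC_AUTO = -1,        /* parameter left at default */
    GUC_DISABLED = 0,     /* firmware is never requested */
    GUC_TRY_FALLBACK = 1, /* user explicitly chose "try, else fall back" */
    GUC_MANDATORY = 2     /* no fallback permitted */
};

/* Pick the severity for a "failed to fetch firmware" message. */
enum log_level guc_fetch_failure_level(enum guc_policy policy)
{
    switch (policy) {
    case GUC_MANDATORY:
        return LOG_ERROR;   /* no permitted fallback path */
    case GUC_AUTO:
        return LOG_WARNING; /* system chose GuC mode, fell back silently otherwise */
    case GUC_TRY_FALLBACK:
        return LOG_INFO;    /* user asked for fallback, so this is expected */
    case GUC_DISABLED:
    default:
        return LOG_INFO;    /* normally unreachable: no load is attempted */
    }
}

/* Human-readable name for a level, for demonstration output only. */
const char *log_level_name(enum log_level level)
{
    switch (level) {
    case LOG_ERROR:
        return "ERROR";
    case LOG_WARNING:
        return "WARNING";
    default:
        return "INFO";
    }
}

/* Print the message at the severity chosen for the given policy. */
void report_fetch_failure(enum guc_policy policy, const char *path, int err)
{
    printf("[%s] Failed to fetch GuC firmware from %s (error %d)\n",
           log_level_name(guc_fetch_failure_level(policy)), path, err);
}
```

For example, calling `report_fetch_failure(GUC_TRY_FALLBACK, "i915/skl_guc_ver6_1.bin", -2)` would tag the message from the description as INFO rather than ERROR under these assumptions.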
Comment 5 Paulo Zanoni 2016-08-09 14:23:39 UTC
The important discussion here should be why we were getting this message in our BAT tests. Which machines were affected? Are we still getting this? We should make sure the firmware files are present on our BAT machines. Once this is fixed, we should probably downgrade the bug priority and then resume the discussion on the appropriate error message and debug level.

I just checked problems.html and I don't see this message, although the number of machines there right now is surprisingly small.
Comment 6 Dave Gordon 2016-09-05 17:37:21 UTC
The pending patchset 
[PATCH v5 0/4] Reclassify messages from GuC loader/submission
should result in messages being ERRORs only when there isn't a (permitted) fallback path. Other situations will still be logged, but at a lower level.

Meanwhile, it looks like there is one SKL machine (ro-skl3-i5-6260u) that doesn't (or didn't) have GuC firmware (v6.1) installed.

It would be good if ALL the CI machines had all past, present, and forthcoming versions installed (even before they're published on 01.org), so that a kernel patchset could select the specific version wanted for a specific test run.
Comment 7 Jari Tahvanainen 2017-03-13 11:27:06 UTC
Presumably https://patchwork.freedesktop.org/series/10918/ is the series referred to in the previous comment.
Has that landed, and does it resolve this issue?
Comment 8 Tvrtko Ursulin 2017-03-13 13:46:35 UTC
Yep, patch was merged last summer.

