Bug 111918 - Kernel tainted even when given option is unsupported, with an incorrect taint flag
Summary: Kernel tainted even when given option is unsupported, with an incorrect taint...
Status: VERIFIED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: All Linux (All)
: low minor
Assignee: Daniele Ceraolo Spurio
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-07 14:02 UTC by Eero Tamminen
Modified: 2019-10-09 13:22 UTC (History)
5 users (show)

See Also:
i915 platform: ALL
i915 features:


Attachments

Description Eero Tamminen 2019-10-07 14:02:21 UTC
Use-case:
- BDW HW
- echo "options i915 enable_guc=2" > /etc/modprobe.d/i915.conf
- reboot
- check dmesg

Expected outcome:
- No tainting for unused options

Actual outcome:
[    2.207764] Setting dangerous option enable_guc - tainting kernel
[    2.211921] i915 0000:00:02.0: Incompatible option enable_guc=2 - GuC is not supported!
[    2.211924] i915 0000:00:02.0: Incompatible option enable_guc=2 - HuC is not supported!
Comment 1 Daniele Ceraolo Spurio 2019-10-07 21:39:50 UTC
The enable_guc parameter is defined as unsafe, so the kernel will be tainted whenever the parameter is set, no matter if the support is there or not. Just to clarify, are you requesting to avoid a taint if the parameter is set on a platform that does not support GuC? If that is the case, I don't think that is something that will match the general handling of the i915 parameters, because for other similar parameters for other features we do always taint no matter the HW support (e.g. i915.enable_psr).
Comment 2 Eero Tamminen 2019-10-08 09:43:44 UTC
(In reply to Daniele Ceraolo Spurio from comment #1)
> The enable_guc parameter is defined as unsafe, so the kernel will be tainted
> whenever the parameter is set, no matter if the support is there or not.
>
> Just to clarify, are you requesting to avoid a taint if the parameter is set
> on a platform that does not support GuC? If that is the case, I don't think
> that is something that will match the general handling of the i915
> parameters, because for other similar parameters for other features we do
> always taint no matter the HW support (e.g. i915.enable_psr).

Tainting is supposed to indicate kernel being in specific (unsupported, unsafe etc) state.  If the unsafe operation isn't actually done or unsafe state enabled, still tainting the kernel seems broken.

(I.e. this is generic i915 bug, not GuC/HuC specific one.)


Btw. Looking at the taint flag used for HuC loading, that seems odd too:
  $ cat /proc/sys/kernel/tainted
  64

According to kernel documentation:
  https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

That's "taint requested by userspace application", which I think is supposed to mean user space echoing taint to "/proc/sys/kernel/tainted" file, like described here:
  https://lwn.net/Articles/184879/

None of the documented flags quite matches GuC case though, which is currently "unsafe feature / firmware", but wouldn't e.g. one of these flags be closer: "kernel issued warning" (512), "staging driver was loaded" (1024), "workaround for bug in platform firmware applied" (2048)?
Comment 3 Eero Tamminen 2019-10-08 15:08:32 UTC
> because for other similar parameters for other features we do always taint no matter the HW support

-> platform = all
Comment 4 Daniele Ceraolo Spurio 2019-10-08 21:07:57 UTC
Cc: i915 maintainers

The tainting flag for the unsafe parameter is implicitly selected by the params handler (kernel/params.c), so we have no control over it from i915:

static void param_check_unsafe(const struct kernel_param *kp)
{
	if (kp->flags & KERNEL_PARAM_FL_UNSAFE) {
		pr_notice("Setting dangerous option %s - tainting kernel\n",
			  kp->name);
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
	}
}

What we could do to avoid tainting if the feature is not supported by the HW is define all the parameters as safe and then explicitly taint if the feature is supported in HW. I'm not sure how this fits with the general usage of safe/unsafe parameters in the kernel though, so I'd like an ack/nak/suggestion from the maintainers first.
Comment 5 Joonas Lahtinen 2019-10-09 05:56:44 UTC
The solution for the user is simply not to set unsafe kernel options. When updating the kernel, the cmdline is kept intact, so when updating the kernel the currently inactive unsafe option might become active. Taint is supposed to highlight that.
Comment 6 Eero Tamminen 2019-10-09 10:06:04 UTC
(In reply to Joonas Lahtinen from comment #5)
> The solution for the user is simply not to set unsafe kernel options.
> When updating the kernel, the cmdline is kept intact, so when updating
> the kernel the currently inactive unsafe option might become active.
> Taint is supposed to highlight that.

AFAIK taint is mostly supposed to tell upstream kernel maintainers (i.e. other people than i915 devs & users) whether kernel being run in reported upstream bugs was trustworthy (is the bug worth debugging etc).  Setting taint flag when that's not the case, defeats this.


(In reply to Daniele Ceraolo Spurio from comment #4)
> 		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

What about this? TAINT_USER flag is AFAIK intended for the specific use-case of user-space writing to /proc/sys/kernel/tainted.
Comment 7 Joonas Lahtinen 2019-10-09 13:08:26 UTC
If an unsafe module parameter is set (like you have enable_guc), the bugs should not be reported to upstream Bugzilla. That's the right signal to give to the user, regardless if the kernel was succesfully able to load all the firmwares or not for that specific boot. The kernel triggers an unsupported codepath when you enable the option.

That modparam checking code is not from i915, we're just using what core kernel provides at kernel/params.c. So the discussion would have to be at core kernel level. And I think it might be hard sell to skip tainting the kernel when the unsafe modparams don't have an effect. Not least because it's hard to define what is to have an effect, as in your case two NOTICE level messages get printed to the dmesg. That would already be picked up by many tools set up to monitor the dmesg.

That'd be a lot of work just to support a special case when the user happens to have unsafe module parameters at cmdline which don't happen to activate on that specific boot. Whereas the user should not have any unsafe module parameters on the cmdline to begin with, and the problem can be removed by removing the cmdline option.

And finally, I think the rationale behind adding user taint for user supplied cmdline parameter makes much sense. The documentation could be fixed to reflect that cmdline is a possible origin of the user taint? I'm sure a such patch would get accepted to core kernel.
Comment 8 Eero Tamminen 2019-10-09 13:22:01 UTC
Sounds reasonable, thanks for the explanation!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.