Media workload performance may suffer from the cost of managing GPGPU EU slices that the workload does not need (media workloads can use no more than a certain number of EUs at a time because of the wavefront limitation). As a rule, the lower the stream resolution, the fewer EU slices are needed. In experiments, performance was 5-20% below the optimal operating point. This underperformance is one of the key reasons why SKL is slower than BDW in certain use cases, assuming all system settings other than the EU configuration are aligned.
Here is the feature request for a UMD-level API that lets the user control slice selection, which could improve performance: https://github.com/01org/intel-vaapi-driver/issues/152
Here are links to the RFC i915 drm-tip patches to enable static slice shutdown for BDW and SKL:
* https://patchwork.kernel.org/patch/9670509/ [RFC,2/2] drm/i915/bdw: permit make_rpcs execution on BDW to enable slice shutdown
* https://patchwork.kernel.org/patch/9670507/ [RFC,1/2] drm/i915/skl: add slice shutdown debugfs interface
These patches can be used to demonstrate the influence of the EU slice count on media workloads. See https://github.com/01org/intel-vaapi-driver/issues/152 for reproduction details.
Hmm, since the slice/EU masks are stored in the context, is userspace unable to adjust them via LRI? If it can, job done. If not, then we need a context param to allow unprivileged adjustment of the user's own contexts.
Is it a good recommendation for all non-GPGPU workloads to set subslice_mask = 0?
Userspace could adjust it via LRI if i915 whitelisted GEN8_R_PWR_CLK_STATE (configuring this from userspace via LRI is the preferred way on newer platforms, and I was suggesting to Dmitry that the same could be done from BDW onwards).
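To make the LRI approach concrete, here is a minimal sketch of how a batch buffer could encode a write to GEN8_R_PWR_CLK_STATE that requests a given slice count. The register offset, bit positions, and MI_LOAD_REGISTER_IMM encoding below are my recollection of the i915 definitions and should be treated as assumptions to verify against i915_reg.h; the helper names rpcs_for_slice_count and emit_rpcs_lri are hypothetical. The write only takes effect if the kernel whitelists the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, recalled from i915 headers; verify before relying on them. */
#define GEN8_R_PWR_CLK_STATE    0x20C8u
#define GEN8_RPCS_ENABLE        (1u << 31)
#define GEN8_RPCS_S_CNT_ENABLE  (1u << 18)
#define GEN8_RPCS_S_CNT_SHIFT   15
#define MI_LOAD_REGISTER_IMM(n) ((0x22u << 23) | (2u * (n) - 1u))

/* Hypothetical helper: RPCS value that requests `slices` enabled slices. */
static uint32_t rpcs_for_slice_count(unsigned int slices)
{
    assert(slices >= 1 && slices <= 7); /* slice count field is 3 bits wide */
    return GEN8_RPCS_ENABLE | GEN8_RPCS_S_CNT_ENABLE |
           ((uint32_t)slices << GEN8_RPCS_S_CNT_SHIFT);
}

/* Hypothetical helper: write a one-register LRI packet into `cs`.
 * Returns the number of dwords emitted. */
static size_t emit_rpcs_lri(uint32_t *cs, unsigned int slices)
{
    cs[0] = MI_LOAD_REGISTER_IMM(1);      /* command header, one reg/value pair */
    cs[1] = GEN8_R_PWR_CLK_STATE;         /* register offset */
    cs[2] = rpcs_for_slice_count(slices); /* value to load */
    return 3;
}
```

A userspace driver would append these three dwords to its batch before the media commands; subslice and EU min/max fields are left untouched in this sketch.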
Just letting you know that letting userspace directly change this configuration through LRI might corrupt the data coming from the OA unit.
I think Chris' patchset to let userspace control this through a context ioctl is the best solution. That way we can do something sensible in the kernel when the OA unit is enabled.
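As an illustration of what "something sensible" could look like, here is a sketch of one possible kernel-side policy: honor the slice mask requested for a context unless the OA unit is active, in which case fall back to the full hardware mask so that OA data stays consistent. This policy and the function name effective_slice_mask are assumptions for illustration only; they are not taken from the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: pick the slice mask actually programmed for a context.
 * `requested` is what userspace asked for, `available` is what the hardware
 * has, `oa_active` says whether the OA unit is currently enabled. */
static uint8_t effective_slice_mask(uint8_t requested, uint8_t available,
                                    bool oa_active)
{
    uint8_t mask = requested & available; /* drop slices the HW lacks */

    /* With OA enabled, or with an empty request, use every slice. */
    if (oa_active || mask == 0)
        return available;

    return mask;
}
```

Because the kernel owns the decision, it can override the userspace request while OA is running, which is not possible when userspace programs the register directly through LRI.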
(In reply to Lionel Landwerlin from comment #4)
> Just letting you know that letting userspace directly change this
> configuration through LRI might corrupt the data coming from the OA unit.
> I think Chris' patchset to let userspace control this through a context
> ioctl is the best solution. That way we can do something sensible in the
> kernel when the OA unit is enabled.
Good afternoon, is the problem still present, or did the patchset fix it?
With inline GEN8_R_PWR_CLK_STATE programming, the per-slice registers, including MOCS, get reset, which can lead to functional and performance regressions. With i915 KMD-level programming, on the other hand, we can reprogram the state we lost. Thus we will probably go with KMD-level slice programming. Here are the related patches:
* https://patchwork.freedesktop.org/series/29715/ - patch series to enable slice programming for Gen8+ (most recent respin)
* https://patchwork.freedesktop.org/series/29564/ - patch series to fix conflict with OA NOA programming
And the slice programming IGT test: https://patchwork.freedesktop.org/patch/174839/
First of all, sorry about the spam.
This is a mass update for our bugs.
Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!
If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), could you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Dmitry, Chris, is this still a valid issue?
Yes, it is still valid.
Is the media stack still interested in having this for performance reasons? Are there patches available against some userspace component?