Bug 105145 - vaExportSurfaceHandle interaction with surface interlaced flag prevents switching on vaapi deinterlacing dynamically
Summary: vaExportSurfaceHandle interaction with surface interlaced flag prevents switc...
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: All All
: medium enhancement
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-17 12:47 UTC by 67b0226d
Modified: 2019-09-25 18:02 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description 67b0226d 2018-02-17 12:47:13 UTC
When decoding interlaced video with VAAPI and least when using the radeonsi driver, it is impossible to postproc-deinterlace a frame from a surface that was used with vaExportSurfaceHandle() before.

This effectively means that it is impossible to dynamically switch on deinterlacing mid-stream, i.e. after an *interlaced* frame has been shown unprocessed with vaExportSurfaceHandle.

Try for example with mpv from git:
mpv -vo gpu --hwdec=vaapi /path/to/interlaced/video.mkv
(add --gpu-context=wayland when running under Wayland, else vaapi will not work)
give it a few seconds and then switch on deinterlacing with the 'd' key.
You will see the error mesage "vaRenderPicture() failed (invalid VASurfaceID)" and not get any deinterlacing. Having deinterlacing enabled from the beginning (e.g. with "--deinterlace=yes") works, since no unprocesssed surface is exported then.

The chain of events seems to be:
- Create surfaces for decoding. They are marked interlaced by default on radeonsi. (see https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/radeonsi/si_get.c?id=9d21dbeb88bc99ca0e153c11265e19536ad36b61#n655 - PREFERS_INTERLACED is always returned as true)
- Export the decoded surfaces with vaExportSurfaceHandle() and present them to the user. This has the side-effect of making the surfaces non-interlaced (see https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/surface.c?id=9d21dbeb88bc99ca0e153c11265e19536ad36b61#n966)
- Now we want to turn on postproc.
- Create surfaces for postproc. They are, again, marked interlaced by default
- Trying to deinterlace gives VA_STATUS_ERROR_INVALID_SURFACE_ID since it thinks that the source surface is not interlaced, and that the destination surface is - both of which is wrong (see https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/postproc.c?id=9d21dbeb88bc99ca0e153c11265e19536ad36b61#n131)

Reporting this against the radeonsi driver since I couldn't test with anything else, but the issue may really be in how vlVaExportSurfaceHandle works on all drivers.
On a related note, since the radeonsi driver by default generates interlaced surfaces when using vaCreateSurfaces, vlVaExportSurfaceHandle making them progressive seems to be the only reason it correctly bob-deinterlaces at the moment (MADI seems to be fine because it uses another code path). At least after all output surfaces have been exported once, before that it just blits. I also consider this a bug that could now be solved at the same time. If not, I can open a new ticket for that.
Comment 1 Christian König 2018-02-17 13:48:02 UTC
Yeah, that is a design issue with vlVaExportSurfaceHandle.

E.g. vlVaExportSurfaceHandle can only export surfaces when they are in the progressive memory layout because VA-API doesn't supports interlaced layouts. To not run into issues with that limitation we reallocate the backing store for the surface and copy from the interlaced presentation to the progressive layout on the first vlVaExportSurfaceHandle.

Now post processing is only possible if you got the interlaced format, cause otherwise extracting the odd/even fields costs to much time.

What we could try to do is to reverse what is done in vlVaExportSurfaceHandle before post processing, e.g. copy from progressive to interlaced. But that usually means we lag for quite a number of frames (because of the extra copy) until all surfaces are in interlaced format again.

Not sure what to do here except for fixing VA-API.
Comment 2 67b0226d 2018-02-18 08:31:31 UTC
This "interlaced" data layout - what does it mean exactly? Does it just mean fields in separate surfaces or is there anything else going on additionally? I couldn't find any docs on this in mesa.

Since we can do weaving on the Kodi side anyway (for VDPAU), would it be enough to add flags, say VA_EXPORT_SURFACE_TOP_FIELD and VA_EXPORT_SURFACE_BOTTOM_FIELD, to vaExportSurfaceHandle? Would that then be roughly equivalent to what we get with VDPAURegisterVideoSurfaceNV?
Comment 3 67b0226d 2018-02-18 16:41:10 UTC
OK, scratch that. Our VAAPI weave implementation uses legacy GL features, and we're not very keen to accommodate for fields in separate textures in our general shaders.

Here's what we'll do instead (in case someone comes here looking for a workaround):
- For 2K video or less: Always do VAAPI postprocessing. If the content turns out to be progressive, we do an unnecessary copy (as the driver has to weave). But it enables us to switch deinterlacing on and off on a per-frame basis - needed not only for toggling deinterlacing on user wish, but also for mixed progressive/interlaced video. If you're very sure there are no interlaced frames anywhere in the video, postprocessing could also be turned off (we're usually not).
- For more than 2K video: Start with no VAAPI postprocessing for best performance. If interlaced frames are encountered, activate VAAPI postprocessing and reinitialize decoding with fresh surfaces. This means we'll drop frames and lag around for a bit, but should be OK-ish since >2K interlaced content is very rare.

Also, that VAAPI has to weave the first few textures before they've been exported once is not very nice. Is there some kind of hint we could give the driver that we want to have progressive surfaces?
Comment 4 Christian König 2018-02-19 07:49:34 UTC
(In reply to k.philipp from comment #2)
> Since we can do weaving on the Kodi side anyway (for VDPAU), would it be
> enough to add flags, say VA_EXPORT_SURFACE_TOP_FIELD and
> VA_EXPORT_SURFACE_BOTTOM_FIELD, to vaExportSurfaceHandle? Would that then be
> roughly equivalent to what we get with VDPAURegisterVideoSurfaceNV?

Yes, exactly that's what's needed.

(In reply to k.philipp from comment #3)
> Here's what we'll do instead (in case someone comes here looking for a
> workaround):
> - For 2K video or less: Always do VAAPI postprocessing. If the content turns
> out to be progressive, we do an unnecessary copy (as the driver has to
> weave). But it enables us to switch deinterlacing on and off on a per-frame
> basis - needed not only for toggling deinterlacing on user wish, but also
> for mixed progressive/interlaced video. If you're very sure there are no
> interlaced frames anywhere in the video, postprocessing could also be turned
> off (we're usually not).
> - For more than 2K video: Start with no VAAPI postprocessing for best
> performance. If interlaced frames are encountered, activate VAAPI
> postprocessing and reinitialize decoding with fresh surfaces. This means
> we'll drop frames and lag around for a bit, but should be OK-ish since >2K
> interlaced content is very rare.

Sounds like a good plan to me as well.

> Also, that VAAPI has to weave the first few textures before they've been
> exported once is not very nice. Is there some kind of hint we could give the
> driver that we want to have progressive surfaces?

Not that I know of, but feel free to suggest something. General problem with VA-API seems to be that suggestions made by AMD seems to be mostly ignored.

Also don't call it interlaced/progressive (that was just me trying to make sense of what the hardware is doing).

Instead call it something in the line of FIELD and FRAME, that is the more common terminology at least in the different MPEG standards.
Comment 5 67b0226d 2018-02-20 11:55:18 UTC
(In reply to Christian König from comment #4)
> (In reply to k.philipp from comment #3)
> > Also, that VAAPI has to weave the first few textures before they've been
> > exported once is not very nice. Is there some kind of hint we could give the
> > driver that we want to have progressive surfaces?
> 
> Not that I know of, but feel free to suggest something.
There are some surface hints, but I'm not sure if they fit this use case. You could argue that VA_SURFACE_ATTRIB_USAGE_HINT_DISPLAY ("Surface used for display") kind of applies here, since to display you have to export and to export you have to have it as frame (both for vaDeriveImage and vaExportSurfaceHandle). But yeah, there's also the third way of using vaPutSurface or vaGetSurfaceBufferWl (in theory, not sure how well mesa supports it) which would not inherently be limited to working with frames. "used for display" does not seem very well-defined in either case, so we could get away with it.
Cleaner solution could be to think of something new, of course. Something like VA_SURFACE_ATTRIB_USAGE_HINT_EXPORT_AS_FRAME? It really depends on what most accurately describes what info mesa needs.
If we agree on a name, I can try to bring it up with the libva people.

> General problem with
> VA-API seems to be that suggestions made by AMD seems to be mostly ignored.
That sounds very unfortunate. To be fair, I think that the TOP/BOTTOM field flags would have been added if this was raised during the vaExportSurfaceHandle discussion (https://github.com/intel/libva/pull/125) - maybe it was raised somewhere else, then please forgive my ignorance :-) It was my understanding that the addition of vaExportSurfaceHandle was done at least in part to be able to support VA-API in a usable fashion on AMD hardware at all.

> 
> Also don't call it interlaced/progressive (that was just me trying to make
> sense of what the hardware is doing).
> 
> Instead call it something in the line of FIELD and FRAME, that is the more
> common terminology at least in the different MPEG standards.
OK!
Comment 6 Christian König 2018-02-20 12:56:22 UTC
(In reply to k.philipp from comment #5)
> There are some surface hints, but I'm not sure if they fit this use case.

Setting the DISPLAY hint can be used with KMS to display the result without copying it, so that is not what we are searching for.

But checking the documentation once more VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ actually sounds exactly like what we need.

E.g. we switch over to allocate all surfaces in the progressive/frame format and only use the interlaced/field format when VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ is given.

Now you just need to find somebody volunteering to implement that :)

I can assist finding the proper places in the code, but I don't have time to do it myself.
Comment 7 67b0226d 2018-02-20 13:51:15 UTC
(In reply to Christian König from comment #6)
> But checking the documentation once more
> VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ actually sounds exactly like what we
> need.
> 
> E.g. we switch over to allocate all surfaces in the progressive/frame format
> and only use the interlaced/field format when
> VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ is given.
I did think about that, but it has the problem of breaking deinterlacing on all clients that do *not* set VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ (which I assume to be most) since there is - correct me if I'm wrong - at present no code to un-weave a decoded frame to fields and go back to interlaced/field mode, causing the initial issue in the first place.

> Now you just need to find somebody volunteering to implement that :)
Using the hint like in your proposal sounds doable to me because it's just shuffling some code around, but if we have to add de-weaving (is that even a word?) I'm probably out.
Comment 8 Christian König 2018-02-20 14:23:06 UTC
(In reply to k.philipp from comment #7)
> I did think about that, but it has the problem of breaking deinterlacing on
> all clients that do *not* set VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ (which I
> assume to be most) since there is - correct me if I'm wrong - at present no
> code to un-weave a decoded frame to fields and go back to interlaced/field
> mode, causing the initial issue in the first place.

Yeah, we can't do this without at least adding this frame to field conversion.

How about VA_SURFACE_ATTRIB_USAGE_HINT_EXPORT then? The VA-API hints seem to describe the use case instead of the effect, so that would match.

> > Now you just need to find somebody volunteering to implement that :)
> Using the hint like in your proposal sounds doable to me because it's just
> shuffling some code around, but if we have to add de-weaving (is that even a
> word?) I'm probably out.

Actually not so much of a problem, IIRC we have code for this in the OMX state tracker you could just copy&paste.

Basically you allocate a new video buffer and then copy from (0,0)-(width, heigth-1) to the top field and (0,1)-(width,height) to the bottom field of the new buffer using the blitter function.

Since the blitter uses nearest pixel interpolation that should slit the frame into two fields. I can search for the code if you're interested.
Comment 9 67b0226d 2018-03-07 07:43:50 UTC
(In reply to Christian König from comment #8)
> (In reply to k.philipp from comment #7)
> > I did think about that, but it has the problem of breaking deinterlacing on
> > all clients that do *not* set VA_SURFACE_ATTRIB_USAGE_HINT_VPP_READ (which I
> > assume to be most) since there is - correct me if I'm wrong - at present no
> > code to un-weave a decoded frame to fields and go back to interlaced/field
> > mode, causing the initial issue in the first place.
> 
> Yeah, we can't do this without at least adding this frame to field
> conversion.
> 
> How about VA_SURFACE_ATTRIB_USAGE_HINT_EXPORT then? The VA-API hints seem to
> describe the use case instead of the effect, so that would match.
I filed a PR with libva (https://github.com/intel/libva/pull/196), we'll see how it goes.
Comment 10 tempel.julian 2018-04-14 10:14:58 UTC
The PR got merged.
Comment 11 67b0226d 2018-06-11 12:58:15 UTC
Hi again,

> The PR got merged.

I do have a patch ready (it's just a few lines anyway), but vaapi has not seen a release yet so it is unclear which API version it should depend on. Also we have encountered even more corner cases that prompts me to now follow up on a comment to the PR, https://github.com/intel/libva/pull/196#issuecomment-371769757 - see below.

In this bug report, I described the approach (or workaround, rather) we took in Kodi to get useful vaapi-accelerated playback working on most videos. In the meantime, we have encountered more edge cases. Namely:
1. The described approach will not work for 1080p HEVC and VP9 videos, since the AMD decoder does not support the interlaced (field) format for HEVC and VP9 (see si_get_video_param) and will re-allocate the decode surfaces to progressive format when decoding the first picture. Then they cannot be used as post-processing input any more. This could be fixed by setting the export usage hint on the post-processing output surface, since then both input and output to post-processing are progressive.
2. We encountered a DVB H.264 PAFF mixed progressive/interlaced video with resolution 1920x1088, which failed the size check for always inserting post-processing by a few pixels. This can be fixed by making the check more lenient.

Now as you see these issues are not unfixable (or un-workaroundable), but you can probably see that in the long run these kind of edge cases will continue to crop up and cause problems. Our alternative workaround would be to always re-initialize the whole decode pipeline when encountering a change from interlaced to progressive frames or vice-versa, losing a bunch of already decoded frames in the pipeline in the process.

As this is also far from optimal and Christian König seemed positively inclined to switch to progressive by default, I want to investigate this route after all. The only missing piece for this to work seems to be de-weaving the frames when switching to the interlaced format, and I can have a look at that (can't promise a time frame though).
However, before investing time into that I want to ask if it would be also possible to go one step even further: Is the interlaced format necessary in the first place? With DVB (or other, but this seems the most common) PAFF streams, you will have to reallocate surfaces and weave/de-weave very often, possibly after few frames. Wouldn't it be more efficient then to ditch the interlaced format and have post-processing accept progressive-format frames for deinterlacing, like intel-vaapi-driver seems to do?
Comment 12 Christian König 2018-06-11 13:09:00 UTC
(In reply to k.philipp from comment #11)
> Is the interlaced format necessary in the first place?

Unfortunately yes it is.

The pixel shader used for de-interlacing needs to have access to the top/bottom field separately and that only works efficiently when they aren't mangled together. We could try to double the stride to work around that, but that isn't easily doable with the interface we have and we would stress the memory interface quite a bit more.

Additional to that most hardware de-interlacer work only with that layout, .e.g it is actually the more natural one for MPEG2. And that's also the reason why for example NVidia hardware only works with that as well.

That HEVC and VP9 only support progressive layout in the output format is also only logical because those formats don't support interlaced content.
Comment 13 67b0226d 2018-06-13 11:15:57 UTC
(In reply to Christian König from comment #12)
> Unfortunately yes it is.
That is indeed very unfortunate. It complicates stuff a lot :-(
I'd still like to see a solution that allows to support PAFF without copying around already-decoded frames/fields all the time. Do you have any ideas? Could we have the decoder put progressive content into progressive surfaces and interlaced content into interlaced surfaces (under the assumption that we will most likely want to deinterlace)?

> That HEVC and VP9 only support progressive layout in the output format is
> also only logical because those formats don't support interlaced content.
At least HEVC does support interlaced content in theory, but as far as I understood it the (hw) decoder does not need special support for that. How would that work then, would we get top/bottom field as two separate quasi-frames?
Comment 14 tempel.julian 2019-02-03 12:41:32 UTC
Unfortunately, VAAPI deinterlacing inside mpv --vo=gpu still doesn't work because this is still missing. :(
Comment 15 GitLab Migration User 2019-09-25 18:02:44 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1304.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.