Bug 98005 - VCE dual instance encoding inconsistent since st/va: enable dual instances encode by sync surface
Summary: VCE dual instance encoding inconsistent since st/va: enable dual instances en...
Status: ASSIGNED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-01 15:06 UTC by Andy Furniss
Modified: 2016-12-08 11:54 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
hacky fix for ntsc framerates (2.20 KB, patch)
2016-11-13 00:04 UTC, Andy Furniss
Details | Splinter Review
ntsc rate fix, seems solid but not pretty. (2.74 KB, patch)
2016-11-20 14:03 UTC, Andy Furniss
Details | Splinter Review
fix ntsc rates avoiding pointless division in last version (2.68 KB, patch)
2016-12-06 15:23 UTC, Andy Furniss
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Andy Furniss 2016-10-01 15:06:54 UTC
Bit late with this one, but I didn't notice initially.
It seems there is some timing/sync issue on R9285 with dual instance enabled after

c59628d11b134fc016388a170880f7646e100d6f
st/va: enable dual instances encode by sync surface

Testing with large rawvideo/higher bitrates lucked me out of noticing initially as visually these tend to be OK, though making say 20 and md5summing them will show inconsistencies. I can change the number of "bads" in some tests by flipping my cpus between cpufreq on_demand and perf.

At lower sizes/bitrates/transcoding it's possible to get corruption, either in the form of some runs giving abnormally low bitrate with vbr, or sometimes with cbr there is a chance of an out of order frame around an IDR frame.

Testing this with gstreamer.
Comment 1 Andy Furniss 2016-10-03 22:16:11 UTC
Continued on mesa-devel -

https://lists.freedesktop.org/archives/mesa-dev/2016-October/130385.html
Comment 2 Boyuan Zhang 2016-10-17 20:20:30 UTC
Hi Andy,

The corruption/out of order issue has been reproduced locally. Root cause and fix can be found at https://lists.freedesktop.org/archives/mesa-dev/2016-October/132268.html

Please give a try.

Regards,
Boyuan
Comment 3 Andy Furniss 2016-10-17 23:59:55 UTC
(In reply to Boyuan Zhang from comment #2)
> Hi Andy,
> 
> The corruption/out of order issue has been reproduced locally. Root cause
> and fix can be found at
> https://lists.freedesktop.org/archives/mesa-dev/2016-October/132268.html
> 
> Please give a try.
> 
> Regards,
> Boyuan

Thanks, that does fix the out of order and corruption.

As you explained in the thread above, dual instance bitrate control is a separate issue, but this patch does change behavior a bit.

cbr is unaffected by patch and could be seen as OK (unless I find a better test to break it). md5sums may vary but the bitrate/file size is only different by a tiny amount and visually things seem OK.

vbr is different, before patch it may be OK, but there was a chance of some files being less that half size/rate than they should.

With the patch it seems that they are all now less than half rate. So at least it's consistent (md5sums do still vary).

Will test more tomorrow as it's late here.
Comment 4 Boyuan Zhang 2016-10-18 17:58:44 UTC
(In reply to Andy Furniss from comment #3)
> (In reply to Boyuan Zhang from comment #2)
> > Hi Andy,
> > 
> > The corruption/out of order issue has been reproduced locally. Root cause
> > and fix can be found at
> > https://lists.freedesktop.org/archives/mesa-dev/2016-October/132268.html
> > 
> > Please give a try.
> > 
> > Regards,
> > Boyuan
> 
> Thanks, that does fix the out of order and corruption.
> 
> As you explained in the thread above, dual instance bitrate control is a
> separate issue, but this patch does change behavior a bit.
> 
> cbr is unaffected by patch and could be seen as OK (unless I find a better
> test to break it). md5sums may vary but the bitrate/file size is only
> different by a tiny amount and visually things seem OK.
> 
> vbr is different, before patch it may be OK, but there was a chance of some
> files being less that half size/rate than they should.
> 
> With the patch it seems that they are all now less than half rate. So at
> least it's consistent (md5sums do still vary).
> 
> Will test more tomorrow as it's late here.

Hi Andy,

Regarding to the bitrate/file size issue, I didn't see it with my local tests. 

For example, "vaapih264enc rate-control=cbr bitrate=15000" and "vaapih264enc rate-control=vbr bitrate=21428" will give me almost same output file size, which is expected results since 21428*70%=15000.

Maybe the command/clip we tested are different. Can you share the command & clip where you saw vbr got half of the expected value please?

Regards,
Boyuan
Comment 5 Andy Furniss 2016-10-18 20:25:47 UTC
I normally test with raw, but this sample also shows the issue with transcode,

I tried harder and it is possible since the patch to get a normal size vbr - just a lot harder than before.

1080p50.mkv - https://drive.google.com/file/d/0BxP5-S1t9VEEczlMd2ZzUElldms/view?usp=sharing

Here's a paste testing with your settings.

andy [vce-vbr-test]$ gst-launch-1.0 -f filesrc location=1080p50.mkv ! matroskademux ! vaapidecode ! vaapih264enc rate-control=vbr bitrate=21428 ! video/x-h264,profile=baseline ! filesink location=vbr.264
Setting pipeline to PAUSED ...
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: User requested driver 'radeonsi'
libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_0_39
libva info: va_openDriver() returns 0
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)NULL;
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:05.440943972
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...



andy [vce-vbr-test]$ gst-launch-1.0 -f filesrc location=1080p50.mkv ! matroskademux ! vaapidecode ! vaapih264enc rate-control=cbr bitrate=15000 ! video/x-h264,profile=baseline ! filesink location=cbr.264
Setting pipeline to PAUSED ...
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: User requested driver 'radeonsi'
libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_0_39
libva info: va_openDriver() returns 0
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)NULL;
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:05.162493127
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...


andy [vce-vbr-test]$ ls -lh
total 85M
-rw-rw-r-- 1 andy andy 35M Sep 28 16:05 1080p50.mkv
-rw-rw-r-- 1 andy andy 35M Oct 18 21:17 cbr.264
-rw-rw-r-- 1 andy andy 15M Oct 18 21:16 vbr.264
Comment 6 Boyuan Zhang 2016-11-10 22:57:28 UTC
Hi Andy,

I just submitted 2 reviews for fixing the bit-rate issue. The 2 patches can be found at https://lists.freedesktop.org/archives/mesa-dev/2016-November/134944.html

0001-st/va: force to submit two consecutive single jobs
0002-st/va: fix gop size for rate control

Please give a try and let me know how it works.

Regards,
Boyuan
Comment 7 Andy Furniss 2016-11-10 23:35:44 UTC
(In reply to Boyuan Zhang from comment #6)
> Hi Andy,
> 
> I just submitted 2 reviews for fixing the bit-rate issue. The 2 patches can
> be found at
> https://lists.freedesktop.org/archives/mesa-dev/2016-November/134944.html
> 
> 0001-st/va: force to submit two consecutive single jobs
> 0002-st/va: fix gop size for rate control
> 
> Please give a try and let me know how it works.
> 
> Regards,
> Boyuan

Hi, I can't get those to apply to current mesa head.

Checking patch src/gallium/state_trackers/va/picture.c...
Hunk #1 succeeded at 413 (offset 1 line).
Hunk #2 succeeded at 568 (offset 1 line).
Checking patch src/gallium/state_trackers/va/surface.c...
error: while searching for:

   if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
      int frame_diff;
      if (context->desc.h264enc.frame_num_cnt > surf->frame_num_cnt)
         frame_diff = context->desc.h264enc.frame_num_cnt - surf->frame_num_cnt;
      else
         frame_diff = 0xFFFFFFFF - surf->frame_num_cnt + 1 + context->desc.h264enc.frame_num_cnt;
      if (frame_diff < 2)
         context->decoder->flush(context->decoder);
      context->decoder->get_feedback(context->decoder, surf->feedback, &(surf->coded_buf->coded_size));
   }
   pipe_mutex_unlock(drv->mutex);

error: patch failed: src/gallium/state_trackers/va/surface.c:119
error: src/gallium/state_trackers/va/surface.c: patch does not apply
Checking patch src/gallium/state_trackers/va/va_private.h...
Hunk #2 succeeded at 275 (offset 1 line).

and

Checking patch src/gallium/state_trackers/va/picture.c...
Hunk #1 succeeded at 351 (offset 1 line).
error: while searching for:
   context->desc.h264enc.not_referenced = false;
   context->desc.h264enc.is_idr = (h264->pic_fields.bits.idr_pic_flag == 1);
   context->desc.h264enc.pic_order_cnt = h264->CurrPic.TopFieldOrderCnt / 2;
   if (context->desc.h264enc.is_idr)
      context->desc.h264enc.i_remain = 1;
   else
      context->desc.h264enc.i_remain = 0;

   context->desc.h264enc.p_remain = context->desc.h264enc.gop_size - context->desc.h264enc.gop_cnt - context->desc.h264enc.i_remain;


error: patch failed: src/gallium/state_trackers/va/picture.c:390
error: src/gallium/state_trackers/va/picture.c: patch does not apply
Checking patch src/gallium/state_trackers/va/va_private.h...
Comment 8 Boyuan Zhang 2016-11-11 16:31:25 UTC
(In reply to Andy Furniss from comment #7)
> (In reply to Boyuan Zhang from comment #6)
> > Hi Andy,
> > 
> > I just submitted 2 reviews for fixing the bit-rate issue. The 2 patches can
> > be found at
> > https://lists.freedesktop.org/archives/mesa-dev/2016-November/134944.html
> > 
> > 0001-st/va: force to submit two consecutive single jobs
> > 0002-st/va: fix gop size for rate control
> > 
> > Please give a try and let me know how it works.
> > 
> > Regards,
> > Boyuan
> 
> Hi, I can't get those to apply to current mesa head.
> 
> Checking patch src/gallium/state_trackers/va/picture.c...
> Hunk #1 succeeded at 413 (offset 1 line).
> Hunk #2 succeeded at 568 (offset 1 line).
> Checking patch src/gallium/state_trackers/va/surface.c...
> error: while searching for:
> 
>    if (context->decoder->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
>       int frame_diff;
>       if (context->desc.h264enc.frame_num_cnt > surf->frame_num_cnt)
>          frame_diff = context->desc.h264enc.frame_num_cnt -
> surf->frame_num_cnt;
>       else
>          frame_diff = 0xFFFFFFFF - surf->frame_num_cnt + 1 +
> context->desc.h264enc.frame_num_cnt;
>       if (frame_diff < 2)
>          context->decoder->flush(context->decoder);
>       context->decoder->get_feedback(context->decoder, surf->feedback,
> &(surf->coded_buf->coded_size));
>    }
>    pipe_mutex_unlock(drv->mutex);
> 
> error: patch failed: src/gallium/state_trackers/va/surface.c:119
> error: src/gallium/state_trackers/va/surface.c: patch does not apply
> Checking patch src/gallium/state_trackers/va/va_private.h...
> Hunk #2 succeeded at 275 (offset 1 line).
> 
> and
> 
> Checking patch src/gallium/state_trackers/va/picture.c...
> Hunk #1 succeeded at 351 (offset 1 line).
> error: while searching for:
>    context->desc.h264enc.not_referenced = false;
>    context->desc.h264enc.is_idr = (h264->pic_fields.bits.idr_pic_flag == 1);
>    context->desc.h264enc.pic_order_cnt = h264->CurrPic.TopFieldOrderCnt / 2;
>    if (context->desc.h264enc.is_idr)
>       context->desc.h264enc.i_remain = 1;
>    else
>       context->desc.h264enc.i_remain = 0;
> 
>    context->desc.h264enc.p_remain = context->desc.h264enc.gop_size -
> context->desc.h264enc.gop_cnt - context->desc.h264enc.i_remain;
> 
> 
> error: patch failed: src/gallium/state_trackers/va/picture.c:390
> error: src/gallium/state_trackers/va/picture.c: patch does not apply
> Checking patch src/gallium/state_trackers/va/va_private.h...

Hi Andy,

Sorry about that. Just rebased my codes. Please find the new patches here:
https://lists.freedesktop.org/archives/mesa-dev/2016-November/135076.html
https://lists.freedesktop.org/archives/mesa-dev/2016-November/135077.html

Regards,
Boyuan
Comment 9 Andy Furniss 2016-11-11 21:40:16 UTC
The transcoding test above now makes similar size to cbr.

There is a new issue though. The GOP patch (2) is problematic.

It doesn't show on the transcoding test, but after reading the patch comment about gop affecting rate control I tested with gstreamer default (30) and max =

keyframe-period=300

With vbr plus high bitrates this makes a large difference in file size.

This still happens if I disable dual instance in radeon_vce.c.

CBR seems to be unaffected, though dual instance encoding seems to come out a bit lower than single instance.

The vbr test =

gst-launch-1.0 -f filesrc location=/mnt/ramdisk/trees-1440p50.nv12 blocksize=5529600 ! video/x-raw,format=NV12,width=2560,height=1440,framerate=50/1 ! queue ! vaapih264enc rate-control=vbr bitrate=50000 keyframe-period=300 ! video/x-h264, profile=baseline ! filesink location=/mnt/ramdisk/out.264

Result = 9.4M file (source is 500 frames)

above with keyframe-period=30 = 42M which is close to expected result.
Comment 10 Andy Furniss 2016-11-11 21:51:20 UTC
Testing more/with a longer stream - cbr is affected by this as well, though the difference is smaller.
Comment 11 Andy Furniss 2016-11-13 00:03:24 UTC
It turns out with more time + different tests that these patches don't fix the original issue.

I can still get too low bitrates depending on timing/luck eg. forcing powerplay/cpufreq to high with the right test stream can get different results from auto. This is testing with the same GOP and (so far) disabling dual instance does seem to always produce the correct bitrate.

I also found another bug = ntsc framerates are not handled. I live in PAL land, so didn't notice this sooner, though we still get /1001 framerates on blu-ray which is how I belatedly noticed.

I made a hacky fix - maybe working out where (if anywhere) the players pass framerate den would be better, but I'll attach it anyway just to see what you think (it only applies to vanilla mesa). Seems to work OK with avconv and gstreamer.
Comment 12 Andy Furniss 2016-11-13 00:04:26 UTC
Created attachment 127943 [details] [review]
hacky fix for ntsc framerates
Comment 13 Boyuan Zhang 2016-11-18 21:17:05 UTC
(In reply to Andy Furniss from comment #11)
> It turns out with more time + different tests that these patches don't fix
> the original issue.
> 
> I can still get too low bitrates depending on timing/luck eg. forcing
> powerplay/cpufreq to high with the right test stream can get different
> results from auto. This is testing with the same GOP and (so far) disabling
> dual instance does seem to always produce the correct bitrate.
> 
> I also found another bug = ntsc framerates are not handled. I live in PAL
> land, so didn't notice this sooner, though we still get /1001 framerates on
> blu-ray which is how I belatedly noticed.
> 
> I made a hacky fix - maybe working out where (if anywhere) the players pass
> framerate den would be better, but I'll attach it anyway just to see what
> you think (it only applies to vanilla mesa). Seems to work OK with avconv
> and gstreamer.

Hi Andy,

Please take a look at the new patch set I just submitted:
1. https://lists.freedesktop.org/archives/mesa-dev/2016-November/135815.html
2. https://lists.freedesktop.org/archives/mesa-dev/2016-November/135813.html

With these 2 patches, the vbr dual instance bit-rate issue should be solved. Please give a try. If you still see bit-rate failure, please send me the command which can reproduce the issue.

In the meantime, I will look at the ntsc framerate.

Thanks
Comment 14 Andy Furniss 2016-11-19 00:37:20 UTC
Hi,

Initial testing seems good for vbr bitrates, but I am afraid the patches regress cbr so the out of order frames near I frames issue is back.

It's still possible to get cbr and vbr bitrates to vary with different gop sizes, but it's only really noticeable at really high bitrates. At normal rates it's still there but the difference is small.
Comment 15 Andy Furniss 2016-11-20 14:01:19 UTC
(In reply to Boyuan Zhang from comment #13)

> In the meantime, I will look at the ntsc framerate.

On this I looked what gstreamer and avconv were sending and both do send the denominator so a v2 (horrible looking) patch attached which tests OK for a variety of inputs.

I guess there's a more elegant way, but it works. Maybe over paranoid trying to avoid division by 0 in this case, but I notice that the gop patch can get one if the user asks for -g 0 with avconv.
Comment 16 Andy Furniss 2016-11-20 14:03:21 UTC
Created attachment 128084 [details] [review]
ntsc rate fix, seems solid but not pretty.
Comment 17 Andy Furniss 2016-11-20 23:25:57 UTC
(In reply to Andy Furniss from comment #14)
> Hi,
> 
> Initial testing seems good for vbr bitrates

More testing with longer transcoding type tests shows it is still possible to get low bitrates with the patches.

Input is derived from blu-ray = 13 minutes = 2.6 gig so a bit hard to upload.

Diference seen between gpu on perf and auto (uvd decode is faster with full clocks)

gst-launch-1.0 -f filesrc location=in-24fps.mp4 ! qtdemux ! queue ! vaapidecode ! queue ! vaapih264enc rate-control=vbr bitrate=10000 ! video/x-h264,profile=baseline ! filesink location=out24.264
Comment 18 Boyuan Zhang 2016-11-21 19:54:28 UTC
(In reply to Andy Furniss from comment #14)
> but I am afraid the patches regress cbr so the out of order frames near I frames issue is back.

The "out of order" issue was caused by sending last p and next i frame together for dual instance encoding, which has already been solved and shouldn't affect by this patch. I suspect the i-frame has less QP (quality) than p-frame in cbr case (to meet constant bit-rate) where bit-rate is not high enough, as a result you might see low picture quality for each i-frame. Please confirm with high bitrate or using analyser to check the order.

If you still see the out-of-order issue, please share the clip and command.

Thanks
Comment 19 Boyuan Zhang 2016-11-21 22:01:35 UTC
(In reply to Boyuan Zhang from comment #18)
> (In reply to Andy Furniss from comment #14)
> > but I am afraid the patches regress cbr so the out of order frames near I frames issue is back.
> 
> The "out of order" issue was caused by sending last p and next i frame
> together for dual instance encoding, which has already been solved and
> shouldn't affect by this patch. I suspect the i-frame has less QP (quality)
> than p-frame in cbr case (to meet constant bit-rate) where bit-rate is not
> high enough, as a result you might see low picture quality for each i-frame.
> Please confirm with high bitrate or using analyser to check the order.
> 
> If you still see the out-of-order issue, please share the clip and command.
> 
> Thanks

Nevermind, the cbr issue was reproduced and fixed. Please try the latest patch I just send.
Comment 20 Boyuan Zhang 2016-11-21 22:51:27 UTC
(In reply to Boyuan Zhang from comment #19)
> (In reply to Boyuan Zhang from comment #18)
> > (In reply to Andy Furniss from comment #14)
> > > but I am afraid the patches regress cbr so the out of order frames near I frames issue is back.
> > 
> > The "out of order" issue was caused by sending last p and next i frame
> > together for dual instance encoding, which has already been solved and
> > shouldn't affect by this patch. I suspect the i-frame has less QP (quality)
> > than p-frame in cbr case (to meet constant bit-rate) where bit-rate is not
> > high enough, as a result you might see low picture quality for each i-frame.
> > Please confirm with high bitrate or using analyser to check the order.
> > 
> > If you still see the out-of-order issue, please share the clip and command.
> > 
> > Thanks
> 
> Nevermind, the cbr issue was reproduced and fixed. Please try the latest
> patch I just send.

Links can be found below:
https://lists.freedesktop.org/archives/mesa-dev/2016-November/136065.html
https://lists.freedesktop.org/archives/mesa-dev/2016-November/136066.html
Comment 21 Andy Furniss 2016-11-22 00:29:33 UTC
Still the same issue with latest patches. I can sometimes get vbr to be affected as well.

I made a smaller test file, it's re-coded with software to 24 fps as the real master is /1001 and so would be wrong without an additional patch anyway.

I am just doing repeated decode/encode from/to ram and changing cpu/gpu settings to get unlucky timing - I guess on a different box you could luck out of seeing it, just initial testing with raw video input from ram seemed to work for me.

This transcode seems to be mostly wrong with my cpu on perf and gpu on auto.

for x in $(seq 1 20); do gst-launch-1.0 -f filesrc location=/mnt/ramdisk/in-24fps.mp4 ! qtdemux ! vaapidecode ! queue ! vaapih264enc rate-control=cbr bitrate=5000 ! video/x-h264,profile=baseline ! filesink location=/mnt/ramdisk/cbr-$x.264; done

in-24fps.mp4 - https://drive.google.com/file/d/0BxP5-S1t9VEEaWNkZXdmb01GYmM/view?usp=sharing
Comment 22 Andy Furniss 2016-11-22 00:51:11 UTC
FWIW the reference decoder on a bad result looks like below when it hits the first reversal it bails. Seems to always be the same place - this with the default gop = 30.

00000(IDR)        0     0    20                             4:2:0     119
00001( P )        2     1    20                             4:2:0      40
00002( P )        4     2    21                             4:2:0      41
00003( P )        6     3    21                             4:2:0      41
00004( P )        8     4    22                             4:2:0      38
00005( P )       10     5    22                             4:2:0      46
00006( P )       12     6    22                             4:2:0      46
00007( P )       14     7    22                             4:2:0      46
00008( P )       16     8    22                             4:2:0      47
00009( P )       18     9    22                             4:2:0      47
00010( P )       20    10    21                             4:2:0      49
00011( P )       22    11    21                             4:2:0      49
00012( P )       24    12    21                             4:2:0      49
00013( P )       26    13    21                             4:2:0      49
00014( P )       28    14    21                             4:2:0      49
00015( P )       30    15    21                             4:2:0      49
00016( P )       32    16    21                             4:2:0      50
00017( P )       34    17    21                             4:2:0      49
00018( P )       36    18    21                             4:2:0      49
00019( P )       38    19    21                             4:2:0      49
00020( P )       40    20    21                             4:2:0      49
00021( P )       42    21    21                             4:2:0      36
00022( P )       44    22    21                             4:2:0      49
00023( P )       46    23    21                             4:2:0      49
00024( P )       48    24    21                             4:2:0      48
00025( P )       50    25    21                             4:2:0      49
00026( P )       52    26    21                             4:2:0      50
00027( P )       54    27    21                             4:2:0      50
00028( P )       56    28    21                             4:2:0      52
An unintentional loss of pictures occurs! Exit
Comment 23 Andy Furniss 2016-11-22 01:03:01 UTC
To clarify "Seems to always be the same place" I don't mean in the video - that's variable. I mean with gop = 30 after 00028 it will bail.
Comment 24 Boyuan Zhang 2016-11-24 19:53:05 UTC
(In reply to Andy Furniss from comment #23)
> To clarify "Seems to always be the same place" I don't mean in the video -
> that's variable. I mean with gop = 30 after 00028 it will bail.

Hi Andy,

Thanks for catching that issue. I debugged from my side and found the root cause. I made a fix and just submitted the updated 2 patches to mesa mailing list.

Both "out of order" and "vbr incorrect bitrate" issues are supposed to be fixed. Please give a try.

Thanks,
Boyuan
Comment 25 Boyuan Zhang 2016-11-24 22:04:04 UTC
(In reply to Boyuan Zhang from comment #24)
> (In reply to Andy Furniss from comment #23)
> > To clarify "Seems to always be the same place" I don't mean in the video -
> > that's variable. I mean with gop = 30 after 00028 it will bail.
> 
> Hi Andy,
> 
> Thanks for catching that issue. I debugged from my side and found the root
> cause. I made a fix and just submitted the updated 2 patches to mesa mailing
> list.
> 
> Both "out of order" and "vbr incorrect bitrate" issues are supposed to be
> fixed. Please give a try.
> 
> Thanks,
> Boyuan

Links are here:
https://lists.freedesktop.org/archives/mesa-dev/2016-November/136460.html
https://lists.freedesktop.org/archives/mesa-dev/2016-November/136458.html
Comment 26 Andy Furniss 2016-11-25 00:53:12 UTC
So far the patches seem good :-)

cbr and vbr are behaving as expected - including md5sums always being the same.
Comment 27 Boyuan Zhang 2016-11-25 21:15:08 UTC
(In reply to Andy Furniss from comment #26)
> So far the patches seem good :-)
> 
> cbr and vbr are behaving as expected - including md5sums always being the
> same.

Sounds cool! =)

Sorry for taking a long time to fully solve these issues. The two/three reported issues are caused by 2 root causes, however these 2 root causes can actually affect each other in certain way. Thanks for catching all these issues.

I will let you run more tests a bit. If everything seems fine, I will push the patches to upstream.

Regards,
Boyuan
Comment 28 Andy Furniss 2016-11-26 19:45:35 UTC
Still good, so I think these are OK to go in.
Comment 29 Andy Furniss 2016-11-26 21:59:07 UTC
Maybe not yet - Been testing cbr and vbr and these seem OK still.

There is an issue with cqp that I'll have to test further - it could be related to the source size, I am not sure yet.

With 2160p gstreamer is Ok with its default QP of 26 or 28, but any higher it bails with -

0:00:00.252952155  6716 0x7f8898002d90 ERROR            vaapiencode gstvaapiencode.c:210:gst_vaapiencode_default_alloc_buffer: invalid GstVaapiCodedBuffer size (0 bytes)
0:00:00.253007252  6716 0x7f8898002d90 ERROR            vaapiencode gstvaapiencode.c:316:gst_vaapiencode_push_frame: failed to allocate encoded buffer in system memory

It may be that there are other ways to hit this before the patches - I'll try more tomorrow.
Comment 30 Andy Furniss 2016-11-26 23:23:45 UTC
Seems that error is a bit random and not always there.

It's possible with the patches to get corrupted output with cqp, fix seems easy =
disable the statement -

if (context->desc.h264enc.rate_ctrl.rate_ctrl_method !=
PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE

Unerlated observation while testing related to perf = I guess it depends on stream, but on a raw 2160p60 input from ram, with or without patches, I am more than twice as fast with init-qp <= 28.

Timing luck loosing me dual instance? Or something more fundamental meaning the encoder works harder al lower rates?

I notice that unlike omx when using cqp,  some rate control settings are still filled in.

omx is the same speed with higher qp - and produces bigger files for the same qp.
Comment 31 Andy Furniss 2016-11-27 00:53:00 UTC
(In reply to Andy Furniss from comment #30)

> Unerlated observation while testing related to perf = I guess it depends on
> stream, but on a raw 2160p60 input from ram, with or without patches, I am
> more than twice as fast with init-qp <= 28.
> 
> Timing luck loosing me dual instance? Or something more fundamental meaning
> the encoder works harder al lower rates?
> 
> I notice that unlike omx when using cqp,  some rate control settings are
> still filled in.

Hacking them the same made no difference.

This turned out to be a gstreamer wierd because I didn't have a ! queue ! before the encoder.

> omx is the same speed with higher qp - and produces bigger files for the
> same qp.

Still makes slightly bigger files (cabac is on for both)
Comment 32 Boyuan Zhang 2016-11-28 22:34:15 UTC
Hi Andy,

To summarise your test results, basically you saw the following 3 issues with CQP, is it correct?

- Issue #1. For large clip, sometime you can "invalid GstVaapiCodedBuffer size (0 bytes)" error.
- Issue #2. Random corruption observed when using CQP. Corruption issue is gone after disable the statement 
"if (context->desc.h264enc.rate_ctrl.rate_ctrl_method !=PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE"
- Issue #3. Encoding speed is twice faster when setting QP<=28

So far I'm lucky enough that my test haven't trigger any of the issue yet. I will spend more time to try from my side. Again, if you could share the commands you were using to trigger the above 3 issue, that would be very appreciated.

In the meantime, I would like to push the 2 patches to upstream first. Because this two patches don't change the behaviour of CQP, but only fixing VBR/CBR. Therefore, I believe CQP fix should be apart from these 2 patches.

Regards,
Boyuan
Comment 33 Andy Furniss 2016-11-28 23:47:59 UTC
(In reply to Boyuan Zhang from comment #32)
> Hi Andy,
> 
> To summarise your test results, basically you saw the following 3 issues
> with CQP, is it correct?
> 
> - Issue #1. For large clip, sometime you can "invalid GstVaapiCodedBuffer
> size (0 bytes)" error.

Turns out I don't need a large clip.

Just running the transcode test above but with vaapih264enc rate-control=cqp init-qp=30 may give this error.

> - Issue #2. Random corruption observed when using CQP. Corruption issue is
> gone after disable the statement

Issue 1 and 2 are gone with that disabled.

> "if (context->desc.h264enc.rate_ctrl.rate_ctrl_method
> !=PIPE_H264_ENC_RATE_CONTROL_METHOD_DISABLE"
> - Issue #3. Encoding speed is twice faster when setting QP<=28

This one was my fault because I forgot to put a ! queue ! in my command and is nothing to do with the patches/not a real issue.

> So far I'm lucky enough that my test haven't trigger any of the issue yet. I
> will spend more time to try from my side. Again, if you could share the
> commands you were using to trigger the above 3 issue, that would be very
> appreciated.
> 
> In the meantime, I would like to push the 2 patches to upstream first.
> Because this two patches don't change the behaviour of CQP, but only fixing
> VBR/CBR. Therefore, I believe CQP fix should be apart from these 2 patches.

CQP works perfectly with vanilla mesa, it is broken by patch 1.
Comment 34 Boyuan Zhang 2016-11-29 20:02:55 UTC
Hi Andy,

I re-checked the logic, cqp behaviour is affected patch#1. By re-testing cqp, I confirmed that force to submit two consecutive single job logic should be applied to cqp as well, otherwise we have a chance to miss a flush or end up with sending last p and next i frame together which may protentially cause corruption. Nice catch!

Solution is simple, as you already tried, remove the "if statement" to apply new logic to cqp as well.

New patches can be found at:

https://lists.freedesktop.org/archives/mesa-dev/2016-November/136860.html
https://lists.freedesktop.org/archives/mesa-dev/2016-November/136855.html

I will push them to upstream once you confirmed the fix. :-)

Regards,
Boyuan
Comment 35 Andy Furniss 2016-11-30 19:35:13 UTC
I need to test more, but it will have to be tomorrow.

While patch one seems OK now, the issue I mentioned about patch 2 being under rate with gop 300 vs gop 30 still persists at high bitrates.

I initially thought it was only a bit low, and it is with the right test, but it seems that it's possible to be a lot low sometimes. This is with really high rates, like 100mbit. I need to see if there are combinations of gop/framerate/bitrate that fail at normal rates.
Comment 36 Andy Furniss 2016-11-30 20:53:44 UTC
It seems this is a dual instance issue, as it doesn't happen with it disabled.

It doesn't seem to happen at normal bitrates, but at high bits per picture, say 25 fps 100mbit, there is a big difference between gop 30 and 300.

I also notice that the encode time raises for the higher gop.
Comment 37 Andy Furniss 2016-12-06 00:25:17 UTC
I see the patches are in now - Which I think is OK as they do AFAICT fix normal use cases.

Maybe in future the algorithm for increasing gop could take into account bits per picture as reducing gop size when this is very high could be a workaround.

There is still the question of why single instance avoids this, the commit message reads like the rate control system in general needs this and doesn't say it's just a dual instance requirement.

Though gstreamer is buggy above 100M ffmpeg isn't and 240M is a legal rate for level 5.2 baseline, and now is hard to reach with dual instance without going for abnormally low gop.
Comment 38 Andy Furniss 2016-12-06 15:23:04 UTC
Created attachment 128354 [details] [review]
fix ntsc rates avoiding pointless division in last version
Comment 39 Andy Furniss 2016-12-08 11:54:11 UTC
While "normal" testing seems OK I've found a way to rarely get gstreamer to fail to allocate a buffer.

ERROR vaapi gstvaapiencoder.c:341:gst_vaapi_encoder_put_frame: failed to allocate coded buffer
ERROR vaapiencode gstvaapiencode.c:558:gst_vaapiencode_handle_frame: failed to encode frame 559 (status -2)

This was a mean test - transcoding a 1080p file repeatedly with a couple of scripts running that after random 1 to 20sec sleeps turn cpu/gpu from perf to auto.

As long as nothing else is happening the results are good, but if I say run an opengl demo like Unigine Valley as well there's a chance that I get the above fail on say 1 out of 80 runs where a run is transcoding a 120 sec 24 fps sample.

I can reproduce this with cbr, vbr or cqp and single instance - so maybe it should get its own bug ....

Though using extreme tests to reproduce I initially noticed this without running gl as such = only glamor scrolling xterm very fast as I had a load of printfs in the code.


bug/show.html.tmpl processed on Jan 18, 2017 at 04:12:37.
(provided by the Example extension).