Bug 75106 - File corruption with h264encode
Summary: File corruption with h264encode
Status: RESOLVED WORKSFORME
Alias: None
Product: libva
Classification: Unclassified
Component: intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium critical
Assignee: haihao
QA Contact: Sean V Kelley
Reported: 2014-02-17 17:13 UTC by Bryan Christ
Modified: 2014-03-28 06:32 UTC

Description Bryan Christ 2014-02-17 17:13:11 UTC

    
Comment 1 Bryan Christ 2014-02-17 17:14:24 UTC
*How to produce a video corruption with h264encode*
*Background*
When encoding two files, there are four significant events:
  A. Begin encoding first file
  B. Finish encoding first file
  C. Begin encoding second file
  D. Finish encoding second file

There are three possible orderings for these four events:


ABCD (no overlap - the first file is finished before the second begins) **No corruption observed
    A----------B         C----------D


ACDB (complete overlap - the second file begins after the first begins and finishes before the first finishes) **No corruption observed
   A----------------B
         C----D

ACBD (partial overlap - the second file begins before the first file finishes and finishes after it)
   A----------B
          C----------D


The last case, ACBD (partial overlap), always results in video corruption.
The video output of the second file is correct up to event B. Beyond that
point, the video is corrupt.


*Procedure To Reproduce*
You will need to start with a YUV-formatted input file. If you don't have
one, you can create one with ffmpeg from any source video.

1.  Determine how long it takes to encode the file
  # time /root/libva-1.2.1/test/encode/h264encode --srcyuv in.yuv -framecount 0 -f 25 -o out1.mp4 > /dev/null 2> /dev/null

  real    0m14.984s
  user    0m6.690s
  sys    0m10.115s

In this example the file takes about 15 seconds to encode, so I start the
second encoding about 7 seconds after the first to make sure they overlap
in the ACBD order described above.

2. Open two separate terminals.  Run these commands, one in each terminal,
with the 2nd command executed about 7 seconds after the first (this delay
is to ensure the right overlap)

  # date ; time /root/libva-1.2.1/test/encode/h264encode --srcyuv in.yuv -framecount 0 -f 25 -o out1.mp4 >/dev/null 2>/dev/null ; date
  # date ; time /root/libva-1.2.1/test/encode/h264encode --srcyuv in.yuv -framecount 0 -f 25 -o out2.mp4 >/dev/null 2>/dev/null ; date
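
The two-terminal procedure above can also be driven from a single script. The following is a minimal sketch and not part of the original report: run_overlapped is a hypothetical helper name, and the delay and command strings are whatever was measured in step 1.

```shell
#!/bin/bash
# run_overlapped DELAY CMD1 CMD2  (hypothetical helper, not part of libva)
# Starts CMD1 in the background, waits DELAY seconds, starts CMD2,
# then waits for both and reports their exit codes.
run_overlapped() {
    local delay=$1 first=$2 second=$3
    local pid1 pid2 rc1 rc2

    bash -c "$first" &
    pid1=$!
    sleep "$delay"
    bash -c "$second" &
    pid2=$!

    wait "$pid1"; rc1=$?
    wait "$pid2"; rc2=$?
    echo "first exit=$rc1 second exit=$rc2"
}

# Example with the commands from this report (commented out so that
# sourcing the sketch has no side effects):
# enc=/root/libva-1.2.1/test/encode/h264encode
# run_overlapped 7 \
#     "$enc --srcyuv in.yuv -framecount 0 -f 25 -o out1.mp4 >/dev/null 2>&1" \
#     "$enc --srcyuv in.yuv -framecount 0 -f 25 -o out2.mp4 >/dev/null 2>&1"
```

A fixed delay only approximates the ACBD ordering; the date output still has to be checked to confirm that the second encoding really started before the first one finished.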

Here is sample output (Note: h264encode output is suppressed):


*Terminal 1:*
  # date ; time /root/libva-1.2.1/test/encode/h264encode --srcyuv in.yuv -framecount 0 -f 25 -o out1.mp4 >/dev/null 2>/dev/null ; date
  Wed Feb 12 16:44:08 CST 2014

  real    0m15.084s
  user    0m5.698s
  sys    0m9.601s
  Wed Feb 12 16:44:23 CST 2014


*Terminal 2:*
  # date ; time /root/libva-1.2.1/test/encode/h264encode --srcyuv in.yuv -framecount 0 -f 25 -o out2.mp4 >/dev/null 2> /dev/null ; date
  Wed Feb 12 16:44:15 CST 2014

  real    0m9.097s
  user    0m3.431s
  sys    0m5.660s
  Wed Feb 12 16:44:24 CST 2014

Note that the second encoding finishes only one second after the first
despite being started 7 seconds later. Testing with different offsets
confirms the second encoding always finishes immediately after the first,
regardless of start time.
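
To confirm which of the three orderings from the Background section a given run fell into, the four timestamps can be compared mechanically. This is a small sketch with a hypothetical function name (classify_overlap); it assumes all times are given in seconds on a common clock and that the first encoding starts no later than the second.

```shell
# classify_overlap A B C D  (hypothetical helper)
# A/B = start/finish of the first encoding, C/D = start/finish of the second.
# Assumes A <= C.
classify_overlap() {
    local a=$1 b=$2 c=$3 d=$4
    if (( c >= b )); then
        echo ABCD      # no overlap
    elif (( d <= b )); then
        echo ACDB      # complete overlap
    else
        echo ACBD      # partial overlap
    fi
}

# Seconds past 16:44:00 taken from the sample output above:
# first 08 -> 23, second 15 -> 24
classify_overlap 8 23 15 24   # prints ACBD
```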



*Verify Corruption Exists*



1.      Use the file compare utility cmp to compare files:

# cmp -b out1.mp4 out2.mp4

2.      If the files are identical, there will be no output. What I see is
that the files begin to differ at a position roughly proportional to my
start offset. For example, starting 7.5 seconds after the first video,
using 15 second videos, the corruption appears in the middle of the file.

If you were to play the file, you would observe that it plays correctly up
to the point of corruption. At that point, the video freezes or flickers
between 2 frames.
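
The position of the first difference can be turned into a percentage of the file size, which makes it easier to compare against the start offset. The sketch below is an illustration only: first_diff_percent is a hypothetical helper, and it relies on the fact that cmp -l prints the (1-based) byte number of each differing byte in its first column.

```shell
# first_diff_percent FILE1 FILE2  (hypothetical helper)
# Prints "identical", or the position of the first differing byte as a
# whole-number percentage of FILE1's size.
first_diff_percent() {
    local f1=$1 f2=$2 offset size
    if cmp -s "$f1" "$f2"; then
        echo identical
        return 0
    fi
    # First column of the first line of `cmp -l` is the byte number.
    offset=$(cmp -l "$f1" "$f2" 2>/dev/null | head -n 1 | awk '{print $1}')
    if [ -z "$offset" ]; then
        # No differing byte in the common part: one file is a prefix of the other.
        echo "length differs only"
        return 0
    fi
    size=$(wc -c < "$f1")
    echo $(( offset * 100 / size ))
}

# Usage with the files from this report:
# first_diff_percent out1.mp4 out2.mp4
```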
Comment 2 Bryan Christ 2014-02-17 17:15:05 UTC
*System from vainfo*

libva info: VA-API version 0.34.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_34
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.34 (libva 1.2.2.pre1)
vainfo: Driver version: Intel i965 driver - 1.2.1
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264Baseline           : VAEntrypointVLD
      VAProfileH264Baseline           : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
Comment 3 ZhaoShengyan 2014-02-25 09:12:43 UTC
Test Env
=======================================
Arch:           x86_64
Platform: 		IVB
Kernel_version:         3.9.5-301.fc19.x86_64
Libdrm:         (master)libdrm-2.4.52-4-gc5de5abbd90333fe1359283fb3a5e457b0f389f3
Mesa:           (master)73c78c514f8db0605c0deb85382003d0f66b5525
Xserver:                (master)xorg-server-1.15.0-631-g0f10cfd4b903d4db293ec47c8a9a0d8b33965803
Xf86_video_intel:               (master)2.99.910-68-gff49944928c7399527b11bb0da7699711591c21a
Libva:          (staging)8be6d274d931d8041934efa63caaee75a5984755
Libva_intel_driver:             (staging)bd630edd844b88ea543a027654db296ff7da16cd
Ffmpeg:         (master)fff526230148b3a67c04c328eecb16efac654e68
Mplayer:                (hwaccel-vaapi)1923fa10ed77bbf8408f2ce312d85a97dab1f0f3
Gstreamer10:            (1.0)4e880d4d1e151ea64f83c28b5c3e1bbc06c57903
Gst_plugins_base10:             (1.0)2dd3f028c1e6dea799d7496639f53220818b20b1
Gst_plugins_good10:             (1.0)643d425f51f81b56deec16c01162637546708ee5
Gst_plugins_bad10:              (1.0)0587ab41b4f9979e9cfc11011ed5c970569ee3d3
Gst_plugins_ugly10:             (1.0)c7c911b8320576429e4a4234a1e29ec7436e6814
Gst_plugins_vaapi10:            (master)e52d394b9e1e7124a141cc26675068e6fc2446a9

Test Steps:
=========================================
1. Write a script named encode1.sh:

#!/bin/bash
source /root/media_tools/gst10.env
rm -rf /root/.cache/gstreamer-1.0/
rm -rf '/root/.gstreamer-0.10/*'
gst-launch-1.0 -v filesrc location=/home/shengyan/720p5994_parkrun_ter.yuvx5 '!' videoparse format=i420 width=1280 height=720 '!' vaapiencode_h264 rate-control=cqp bitrate=0 keyframe-period=30 max-bframes=2 init-qp=28 num-slices=0 cabac=false dct8x8=false '!' filesink location=/tmp/720p5994_parkrun_ter.yuvx5_1.264

Write a corresponding script named encode2.sh whose only difference is "filesink location=/tmp/720p5994_parkrun_ter.yuvx5_2.264".

2. Test single encoding time cost with following command:
time . encode1.sh
Log output:
real    0m8.958s
user    0m2.366s
sys     0m2.506s

3. Test the "ACBD" case by starting encode2.sh after encode1.sh has been running for 4 seconds:
In one terminal: time . encode1.sh
After 4 seconds, in another terminal: time . encode2.sh
Log output1:
real    0m10.422s
user    0m2.717s
sys     0m2.772s
Log output2:
real    0m9.784s
user    0m2.669s
sys     0m2.539s

4. Replay the output file and check whether the above mentioned issue can be reproduced.
mplayer -vo vaapi -va vaapi ./720p5994_parkrun_ter.yuvx5_1.264
mplayer -vo vaapi -va vaapi ./720p5994_parkrun_ter.yuvx5_2.264

Both streams render normally, and the file sizes are identical:
296288795 720p5994_parkrun_ter.yuvx5_1.264
296288795 720p5994_parkrun_ter.yuvx5_2.264

Reproduced rate:
=========================================
100%
Comment 4 ZhaoShengyan 2014-02-25 09:22:22 UTC
Test Env
=======================================
Arch:           x86_64
Platform: 		IVB
Kernel_version:         3.9.5-301.fc19.x86_64
Libdrm:         (master)libdrm-2.4.52-4-gc5de5abbd90333fe1359283fb3a5e457b0f389f3
Mesa:           (master)73c78c514f8db0605c0deb85382003d0f66b5525
Xserver:                (master)xorg-server-1.15.0-631-g0f10cfd4b903d4db293ec47c8a9a0d8b33965803
Xf86_video_intel:               (master)2.99.910-68-gff49944928c7399527b11bb0da7699711591c21a
Libva:          (staging)8be6d274d931d8041934efa63caaee75a5984755
Libva_intel_driver:             (staging)bd630edd844b88ea543a027654db296ff7da16cd
Ffmpeg:         (master)fff526230148b3a67c04c328eecb16efac654e68
Mplayer:                (hwaccel-vaapi)1923fa10ed77bbf8408f2ce312d85a97dab1f0f3
Gstreamer10:            (1.0)4e880d4d1e151ea64f83c28b5c3e1bbc06c57903
Gst_plugins_base10:             (1.0)2dd3f028c1e6dea799d7496639f53220818b20b1
Gst_plugins_good10:             (1.0)643d425f51f81b56deec16c01162637546708ee5
Gst_plugins_bad10:              (1.0)0587ab41b4f9979e9cfc11011ed5c970569ee3d3
Gst_plugins_ugly10:             (1.0)c7c911b8320576429e4a4234a1e29ec7436e6814
Gst_plugins_vaapi10:            (master)e52d394b9e1e7124a141cc26675068e6fc2446a9

Test Steps:
=========================================
1. Encoding by h264encode with command:
/opt/X11R7/bin/h264encode -w 1280 -h 720 --srcyuv /home/shengyan/720p5994_parkrun_ter.yuvx5 -framecount 0 -f 25 -o /tmp/out1.mp4

2. Play back the output file with:
mplayer -vo vaapi -va vaapi  /tmp/out1.mp4

Only a few macroblocks render normally; the rest of the picture is corrupted with mosaic artifacts.

Reproduced rate:
=========================================
100%
Comment 5 Bryan Christ 2014-02-25 22:12:47 UTC
@ZhaoShengyan,

If I am reading your reports correctly, it appears that you were able to reproduce the problem?
Comment 6 haihao 2014-03-24 05:12:51 UTC
You have to specify the width and height with the -w and -h options. Otherwise h264encode uses the default settings in the code and might produce the wrong result.
Comment 7 haihao 2014-03-24 05:17:26 UTC
And use --fourcc <format> to specify the input file format.
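
Putting comments 6 and 7 together, a complete invocation might look like the sketch below. This is an illustration, not a command taken from the thread: encode_cmd is a hypothetical helper that only prints the command line, the binary path is the one used in comment 4, the 1280x720 size is the one from comment 4, and NV12 is only a placeholder fourcc value that should be replaced with whatever matches the input file.

```shell
# Path taken from comment 4; adjust to wherever h264encode is installed.
H264ENCODE=${H264ENCODE:-/opt/X11R7/bin/h264encode}

# encode_cmd WIDTH HEIGHT FOURCC SRC OUT  (hypothetical helper)
# Prints the full command line instead of running it, so it can be
# reviewed before use.
encode_cmd() {
    local width=$1 height=$2 fourcc=$3 src=$4 out=$5
    echo "$H264ENCODE -w $width -h $height --fourcc $fourcc --srcyuv $src -framecount 0 -f 25 -o $out"
}

# NV12 is an assumed placeholder; use the format of your own input file.
encode_cmd 1280 720 NV12 in.yuv out1.mp4
```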
Comment 8 Bryan Christ 2014-03-24 14:34:35 UTC
@haihao, were your comments 6 & 7 intended for me or ZhaoShengyan ?
Comment 9 haihao 2014-03-25 00:47:53 UTC
(In reply to comment #8)
> @haihao, were your comments 6 & 7 intended for me or ZhaoShengyan ?

Comment 6 was for you and comment 7 was for you and ZhaoShengyan.
Comment 10 Creighton Thomas 2014-03-25 14:30:56 UTC
Specifying -w -h and --fourcc can change the actual output and the speed of encoding, but the corruption issue persists.  When a video is encoded twice at times that don't overlap, the outputs are identical to one another.  When the times do overlap (case ACBD in comment 1), the second file is corrupted when the first file finishes encoding.

Note that to reliably reproduce the corruption, it is important that these should be the only two video encodings on the machine for the duration of the test - any other concurrent encodings can alter the results.
Comment 11 haihao 2014-03-26 06:18:16 UTC
(In reply to comment #10)
> Specifying -w -h and --fourcc can change the actual output and the speed of
> encoding, but the corruption issue persists.  When a video is encoded twice
> at times that don't overlap, the outputs are identical to one another.  When
> the times do overlap (case ACBD in comment 1), the second file is corrupted
> when the first file finishes encoding.

I can't reproduce the issue. 

terminal 0:

Wed Mar 26 14:03:03 CST 2014

real    0m10.938s
user    0m2.708s
sys     0m1.730s
Wed Mar 26 14:03:14 CST 2014

terminal 1:

Wed Mar 26 14:03:09 CST 2014

real    0m10.666s
user    0m2.659s
sys     0m1.770s
Wed Mar 26 14:03:20 CST 2014

The second encoding started 6s later and finished 6s after the first encoding. The first output is identical to the second output.

> 
> Note that to reliably reproduce the corruption, it is important that these
> should be the only two video encodings on the machine for the duration of
> the test - any other concurrent encodings can alter the results.

Yes, there were only two video encodings on the machine, and I ran the test under an X server.
Comment 12 Creighton Thomas 2014-03-26 21:53:14 UTC
Thanks for mentioning X.  We had not tried our test with an X server running.  It turns out the corruption does not happen in that case - only without X.
Comment 13 haihao 2014-03-27 03:03:22 UTC
Two clients can not concurrently access the dri device without a server. If you don't want to use X server, you must provide another server for access authentication.
Comment 14 Bryan Christ 2014-03-27 14:15:48 UTC
haihao,

Can you explain why starting an instance of h264encode and immediately suspending it causes this problem to go away.  In other words:

1.  Start h264encode on some file
2.  Send it the SIGSTOP signal
3.  Start 2 additional h264encode instances running scenario ACBD
4.  Observe that ACBD does not fail
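
Steps 1 and 2 above can be sketched in shell as follows. start_suspended is a hypothetical helper name; the only mechanism it relies on is the standard kill -STOP / kill -CONT pair, and the h264encode arguments in the usage comment are the ones used earlier in this report.

```shell
# start_suspended CMD [ARGS...]  (hypothetical helper)
# Launches CMD in the background, immediately suspends it with SIGSTOP,
# and prints its PID so it can be resumed or killed later.
start_suspended() {
    # Output is discarded so the helper can be used inside $(...)
    # without waiting on the background process.
    "$@" >/dev/null 2>&1 &
    local pid=$!
    kill -STOP "$pid"
    echo "$pid"
}

# Usage (commented out so that sourcing the sketch has no side effects):
# pid=$(start_suspended /root/libva-1.2.1/test/encode/h264encode \
#           --srcyuv in.yuv -framecount 0 -f 25 -o out0.mp4)
# ... run the ACBD scenario here ...
# kill -CONT "$pid"    # resume, or: kill -KILL "$pid"
```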

Also, if we must use a display server, will Wayland suffice?
Comment 15 Bryan Christ 2014-03-27 16:31:35 UTC
@all, apparently running Wayland is sufficient.  i think this issue can be considered closed.  it would seem that the documentation should be updated to indicate that the h264 stack isn't truly "headless".

@haihao, thank you for all your help!
Comment 16 haihao 2014-03-28 06:22:24 UTC
(In reply to comment #13)
> Two clients can not concurrently access the dri device without a server. If
> you don't want to use X server, you must provide another server for access
> authentication.

This is for non-root user. Two root clients can concurrently access the dri device without a server if your drm driver includes the following commit:

commit 1020dc6990168a5081ffad620c440e220f05b460
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 29 08:55:57 2013 +0000

    drm: Do not drop root privileges for a fancier younger process
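
The rules from comments 13 and 16 can be summarized as a small decision helper. This is only a sketch of what those comments say, with a hypothetical function name (drm_access_hint); in particular, treating a set DISPLAY or WAYLAND_DISPLAY variable as evidence that a server is running is an assumption of this sketch, not something the driver checks.

```shell
# drm_access_hint UID HAVE_SERVER  (hypothetical helper)
# HAVE_SERVER is "yes" or "no".
drm_access_hint() {
    local uid=$1 have_server=$2
    if [ "$have_server" = yes ]; then
        echo "server available: clients can authenticate through it"
    elif [ "$uid" -eq 0 ]; then
        echo "root, no server: needs a DRM driver with commit 1020dc6990168a5081ffad620c440e220f05b460"
    else
        echo "non-root, no server: provide a server for access authentication"
    fi
}

# Assumption: a set DISPLAY or WAYLAND_DISPLAY is used as a rough proxy
# for "a server is running".
if [ -n "${DISPLAY:-}" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
    server=yes
else
    server=no
fi
drm_access_hint "$(id -u)" "$server"
```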
Comment 17 haihao 2014-03-28 06:26:58 UTC
(In reply to comment #14)
> haihao,
> 
> Can you explain why starting an instance of h264encode and immediately
> suspending it causes this problem to go away.  In other words:
> 
> 1.  Start h264encode on some file
> 2.  Send it the SIGSTOP signal
> 3.  Start 2 additional h264encode instances running scenario ACBD
> 4.  Observe that ACBD does not fail
> 

I can still reproduce the issue if I used an old drm.

> Also, if we must use a display server, will Wayland suffice?

No, you don't need a display server. What you need is a server for authentication only.
Comment 18 haihao 2014-03-28 06:29:20 UTC
(In reply to comment #15)
> @all, apparently running Wayland is sufficient.  i think this issue can be
> considered closed.  it would seem that the documentation should be updated
> to indicate that the h264 stack isn't truly "headless".

It is headless for drm. You don't need a true display server.

> 
> @haihao, thank you for all your help!
Comment 19 haihao 2014-03-28 06:32:01 UTC
After upgrading the DRM module, it works for me.

