Bug 89969

Summary: nouveau: add support for chunk decoding in order to support vaapi (st/va)
Product: Mesa Reporter: Julien Isorce <julien.isorce>
Component: Drivers/DRI/nouveauAssignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium CC: ckoenig.leichtzumerken, frederic.romagne, r9ku1q, vjaquez
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: build: enable va with nouveau driver
WIP - nouveau: add support for chunk decoding
WIP - nouveau: add support for chunk decoding
WIP - nouveau: add support for chunk decoding
2 valgrind outputs to compare failure case with working reference case
WIP - nouveau: add support for chunk decoding
[PATCH 0/8] nouveau: add support for vaapi
[PATCH 1/8] nouveau: extract memcpy loop from nouveau_vp3_bsp
[PATCH 2/8] nouveau: remove nouveau_vp3_bsp to use begin/next/end
[PATCH 3/8] nouveau: split nvc0_decoder_bsp in begin/next/end
[PATCH 4/8] nouveau: preserve content buffer when calling nvc0_decoder_bsp_next
[PATCH 5/8] nouveau: remove nvc0_decoder_bsp and use begin/next/end instead
[PATCH 6/8] nvc0: implement pipe_video_codec::begin_frame/end_frame
[PATCH 7/8] nouveau: fix chunk decoding by updating number of slices
[PATCH 8/8] build: enable st/va with nouveau driver
Contains files generated with git send-email

Description Julien Isorce 2015-04-09 15:14:13 UTC
Created attachment 114987 [details]
build: enable va with nouveau driver

When running vainfo:

libva info: VA-API version 0.37.1
libva info: va_getDriverName() returns 0
libva info: User requested driver 'gallium'
libva info: Trying to open /usr/local/lib/dri/gallium_drv_video.so
libva info: Found init function __vaDriverInit_0_37
libva error: /usr/local/lib/dri/gallium_drv_video.so init failed
libva info: va_openDriver() returns 2
vaInitialize failed with error code 2 (resource allocation failed),exit

"vl_screen_create" calls "dd_create_screen" which returns NULL because the block

#if defined(GALLIUM_NOUVEAU)
   if (strcmp(driver_name, "nouveau") == 0)
      return pipe_nouveau_create_screen(fd);
   else
#endif

is not compiled for va target. Indeed GALLIUM_NOUVEAU is not defined at this point. See following patch.
Comment 1 Ilia Mirkin 2015-04-09 15:35:57 UTC
Does it actually work though? (i.e. decode videos, display them, etc)
Comment 2 Emil Velikov 2015-04-09 15:48:45 UTC
It didn't last time I've checked. st/va requires chunked decoding which nouveau does not support atm.
Comment 3 Julien Isorce 2015-04-09 16:02:09 UTC
basic libva unit tests passes but it fails to decode 2 videos mpeg12 (nouveau_vp3_fill_picparm_mpeg12_bsp) and h264 (nouveau_vp3_fill_picparm_h264_bsp).

nouveau_vp3_fill_picparm_mpeg12_bsp (map=0x7ffff7e3c000 <error: Cannot access memory at address 0x7ffff7e3c000>, desc=0x0, dec=0x7fffe004a000)
    at nouveau_vp3_video_bsp.c:121
121	   pic_bsp->picture_structure = desc->picture_structure;
(gdb) bt
#0  nouveau_vp3_fill_picparm_mpeg12_bsp (map=0x7ffff7e3c000 <error: Cannot access memory at address 0x7ffff7e3c000>, desc=0x0, dec=0x7fffe004a000)
    at nouveau_vp3_video_bsp.c:121
#1  nouveau_vp3_bsp (dec=dec@entry=0x7fffe004a000, desc=desc@entry=..., target=target@entry=0x7fffe001fe00, comm_seq=comm_seq@entry=2, 
    num_buffers=num_buffers@entry=1, data=data@entry=0x7fffe7ffdcd0, num_bytes=num_bytes@entry=0x7fffe7ffdcc0) at nouveau_vp3_video_bsp.c:257
#2  0x00007ffff072118e in nvc0_decoder_bsp (dec=dec@entry=0x7fffe004a000, desc=desc@entry=..., target=target@entry=0x7fffe001fe00, comm_seq=comm_seq@entry=2, 
    num_buffers=1, data=0x7fffe7ffdcd0, num_bytes=0x7fffe7ffdcc0, vp_caps=vp_caps@entry=0x7fffe7ffdbd8, is_ref=is_ref@entry=0x7fffe7ffdbdc, 
    refs=refs@entry=0x7fffe7ffdbe0) at nvc0/nvc0_video_bsp.c:117
#3  0x00007ffff0721650 in nvc0_decoder_decode_bitstream (decoder=0x7fffe004a000, video_target=0x7fffe001fe00, picture=0x0, num_buffers=<optimised out>, 
    data=<optimised out>, num_bytes=<optimised out>) at nvc0/nvc0_video.c:48

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe7fff700 (LWP 854)]
nouveau_vp3_bsp (dec=dec@entry=0x7fffe0068040, desc=desc@entry=..., target=target@entry=0x7fffe0063110, comm_seq=comm_seq@entry=2, 
    num_buffers=num_buffers@entry=2, data=data@entry=0x7fffe7ffe150, num_bytes=num_bytes@entry=0x7fffe7ffe140) at nouveau_vp3_video_bsp.c:270
270	      caps = nouveau_vp3_fill_picparm_h264_bsp(dec, desc.h264, bsp);
(gdb) bt
#0  nouveau_vp3_bsp (dec=dec@entry=0x7fffe0068040, desc=desc@entry=..., target=target@entry=0x7fffe0063110, comm_seq=comm_seq@entry=2, 
    num_buffers=num_buffers@entry=2, data=data@entry=0x7fffe7ffe150, num_bytes=num_bytes@entry=0x7fffe7ffe140) at nouveau_vp3_video_bsp.c:270
#1  0x00007ffff02a918e in nvc0_decoder_bsp (dec=dec@entry=0x7fffe0068040, desc=desc@entry=..., target=target@entry=0x7fffe0063110, comm_seq=comm_seq@entry=2, 
    num_buffers=2, data=0x7fffe7ffe150, num_bytes=0x7fffe7ffe140, vp_caps=vp_caps@entry=0x7fffe7ffe058, is_ref=is_ref@entry=0x7fffe7ffe05c, 
    refs=refs@entry=0x7fffe7ffe060) at nvc0/nvc0_video_bsp.c:117
#2  0x00007ffff02a9650 in nvc0_decoder_decode_bitstream (decoder=0x7fffe0068040, video_target=0x7fffe0063110, picture=0x0, num_buffers=<optimised out>, 
    data=<optimised out>, num_bytes=<optimised out>) at nvc0/nvc0_video.c:48

Worth to mention that gallium/vdpau succeeds to decode these 2 videos. Also libva succeeds to decode video using libva's vdpau-driver on top of gallium/vdpau.
Comment 4 Emil Velikov 2015-04-09 16:10:55 UTC
Seems (almost) identical to the ones I've seen. Iirc the debug build does assert out a bit earlier.

If interested you can take a look at adding the missing functionality (to drivers/nouveau) but as-is I'm not sure what nouveau can work with st/va :-\
Comment 5 Julien Isorce 2015-04-09 17:02:14 UTC
I am interested but I would need some infos to start :)

Also I do not understand why a functionality is missing in drivers/nouveau since st/vdpau works (which means nouveau_vp3_fill_picparm_mpeg12_bsp and nouveau_vp3_fill_picparm_h264_bsp work).

So I was suspecting something wrong in st/va/picture.c
Using the above bt, I noticed that for st/vpdau http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/vdpau/decode.c#n557 the third param is well defined.
Whereas for st/va http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/picture.c#n520 the third param is NULL which leads to the crash.

Few minutes ago I tried naively to replace this NULL param with "&context->desc.base" but then I can see a wrong image in my application when rendering decoded images.
Comment 6 Ilia Mirkin 2015-04-09 17:13:33 UTC
Unlike public API's, gallium API's are fluid and not extremely well-defined. They're primarily defined by the existing users, and secondarily by documentation when those users disagree. Video API's are especially poorly defined, as there are only 2 real driver implementations (radeon + nouveau), and until recently, only st/xvmc and st/vdpau (and everyone except me hates st/xvmc).

I believe that st/va had to make some changes to how functions were called and what parameters they were allowed to have, but the updates were only made to the radeon drivers. Unfortunately I don't know what those updates were. Emil mentioned chunked decoding, but I don't really know what that is. Perhaps you can look at the st/va enablement patches for radeon and see what all they had to change.
Comment 7 Christian König 2015-04-09 18:42:56 UTC
Chunked decoding means that between begin picture and end picture the state tracker calls the decode function multiple times for the same picture.

E.g. you don't get the bitstream all at once, but rather in small chunks.

Nouveau can't deal with that cause Maarten didn't wanted that interface, radeon on the other hand never used anything else.

You just need to fix nouveau to implement bitstream buffer resizing in the same way radeon does it (or move that stuff into the state tracker like I always suggested).
Comment 8 Julien Isorce 2015-07-17 23:08:26 UTC
Created attachment 117214 [details] [review]
WIP - nouveau: add support for chunk decoding


A: resize buffer in nvc0_decoder_bsp:
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_video_bsp.c#n73

B: memcpy of the data in nouveau_vp3_bsp:
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c#n294

For radeon driver it both happens in ruvd_decode_bitstream.

To compare radeon to nouveau:

struct ruvd_decoder.bs_buffers is an array of struct rvid_buffer that hold a ref on struct r600_resource, aka pipe_resource.

struct nouveau_vp3_decoder.bsp_bo is an array of struct nouveau_bo (which comes from libdrm/nouveau library)

So if I am right, in the function nvc0_decoder_decode_bitstream it should resize the nouveau_bo buffers and store current data, i.e. A and B. 

So I made a patch that first splits nvc0_decoder_bsp to extract A and B into a new function nvc0_decoder_bsp_next_chunk. The remaining part of nvc0_decoder_bsp go to nvc0_decoder_bsp_end_frame.

I just realize the patch is missing the preserving part while resizing buffers (equivalent of the memcpy in rvid_resize_buffer http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/radeon/radeon_video.c#n100) that could explain why it is still not working.

But I wanted to know if I am on the right track or not ?
Comment 9 Julien Isorce 2015-07-17 23:12:02 UTC
Created attachment 117215 [details] [review]
WIP - nouveau: add support for chunk decoding

Actual wip patch.
Comment 10 Ilia Mirkin 2015-07-17 23:16:50 UTC
Wouldn't it be simpler to just accumulate user buffers and then do all the work in end_frame instead? And like Christian said, it'd be easy enough to get st/va to do this.
Comment 11 Christian König 2015-07-18 15:40:22 UTC
(In reply to Ilia Mirkin from comment #10)
> Wouldn't it be simpler to just accumulate user buffers and then do all the
> work in end_frame instead? And like Christian said, it'd be easy enough to
> get st/va to do this.

Accumulating all the user buffers in the state tracker is what I initially suggested as well and I wouldn't mind if it goes into this direction in the long term.

You would need to add a new type for the bitstream buffer (we currently already abuse vertext buffers in the OMX encoder for this).

For the short term your patch looks like it's going into the right direction.
Comment 12 Julien Isorce 2015-07-24 20:11:46 UTC
Created attachment 117358 [details] [review]
WIP - nouveau: add support for chunk decoding

I updated patch since I miss understood some parts last time.
Preserving data while resizing is commented out for testing since the initial buffer is big enough for the stream I tried.
The patch keeps st/vdpau working but st/va still does not work.
Indeed it shows a green frame. I do not see any shape but I see some pixels that let me think it does something.

How can I be sure it is not a problem from st/va ? I assume it works with radeon (only radeon for now ? according to src/gallium/targets/va/Makefile.am I was expecting intel too)
For nouveau, maybe it requires to do more work on st/va side.
I can try to go for the other solution to compare.

Which internal or external unit tests can I use ? Also if you have an idea how I can debug this ?

Maybe I can compare what contains the nouveau_bo buffer between one begin/end, between vdpau and va ? Not sure if they should contain the exact same thing at the time _end is called.

Thx
Comment 13 Ilia Mirkin 2015-07-24 20:17:14 UTC
You can compare what's going on in the two cases by using valgrind-mmt (http://nouveau.freedesktop.org/wiki/Valgrind-mmt/) in combination with the demmt tool in envytools. However it will be quite tough to tell what's going on since the interesting stuff passed around in buffers rather than sending commands in the fifo. You should still see the buffers being written though.
Comment 14 Julien Isorce 2015-07-27 17:30:59 UTC
Created attachment 117399 [details]
2 valgrind outputs to compare failure case with working reference case

Failure:
LIBVA_DRIVER_NAME=gallium valgrind --tool=mmt --mmt-trace-nouveau-ioctls --log-file=va_mesa_bin.log gst-launch-1.0 filesrc location=/home/SERILOCAL/j.isorce/Downloads/centaur_2.mpg num-buffers=6 ! mpegpsdemux ! mpegvideoparse ! vaapideco^C ! tee ! vdieoconvert ! xvimagesink

Success:
LIBVA_DRIVER_NAME=vdpau valgrind --tool=mmt --mmt-trace-nouveau-ioctls --log-file=va_over_vdpau_mesa_bin.log gst-launch-1.0 filesrc location=/home/SERILOCAL/j.isorce/Downloads/centaur_2.mpg num-buffers=6 ! mpegpsdemux ! mpegvideoparse ! vaapidecode ! tee ! videoconvert ! xvimagesink

Where should I look at in these 2 logs to compare what contain the nouveau_bo buffer between the first st/va being/end frame ? (using demmt -l)
Comment 15 Julien Isorce 2015-08-25 16:05:14 UTC
Created attachment 117910 [details] [review]
WIP - nouveau: add support for chunk decoding

It now works on mpeg12 using vlc+vaapi+mesa/st/va/nouveau. The problem from the last patch was that the "desc" was written to the nouveau_bo before being actually updated. 

vdpau backend still works, and vaapi backend using vdpau works too.

But for now the video is not showing correctly using gstreamer-vaapi so I need to investigate this.
Comment 16 Julien Isorce 2015-08-26 17:23:31 UTC
Comment on attachment 117910 [details] [review]
WIP - nouveau: add support for chunk decoding

Actually it works with GStreamer too using vaapisink. So I am going to submit patches to mailing list. I splited it in 8 patches to make it more clear.
Comment 17 Julien Isorce 2015-08-26 17:23:56 UTC
Comment on attachment 114987 [details]
build: enable va with nouveau driver

Will be submitted to mailing list
Comment 18 Julien Isorce 2015-08-26 17:46:16 UTC
Created attachment 117928 [details] [review]
[PATCH 0/8] nouveau: add support for vaapi
Comment 19 Julien Isorce 2015-08-26 17:47:46 UTC
Created attachment 117929 [details] [review]
[PATCH 1/8] nouveau: extract memcpy loop from nouveau_vp3_bsp
Comment 20 Julien Isorce 2015-08-26 17:48:52 UTC
Created attachment 117930 [details] [review]
[PATCH 2/8] nouveau: remove nouveau_vp3_bsp to use begin/next/end
Comment 21 Ilia Mirkin 2015-08-26 17:49:15 UTC
Please post patches to list so that they may be reviewed.
Comment 22 Julien Isorce 2015-08-26 17:49:27 UTC
Created attachment 117931 [details] [review]
[PATCH 3/8] nouveau: split nvc0_decoder_bsp in begin/next/end
Comment 23 Julien Isorce 2015-08-26 17:50:05 UTC
Created attachment 117932 [details] [review]
[PATCH 4/8] nouveau: preserve content buffer when calling nvc0_decoder_bsp_next
Comment 24 Julien Isorce 2015-08-26 17:50:35 UTC
Created attachment 117934 [details] [review]
[PATCH 5/8] nouveau: remove nvc0_decoder_bsp and use begin/next/end instead
Comment 25 Julien Isorce 2015-08-26 17:51:00 UTC
Created attachment 117936 [details] [review]
[PATCH 6/8] nvc0: implement pipe_video_codec::begin_frame/end_frame
Comment 26 Julien Isorce 2015-08-26 17:51:40 UTC
Created attachment 117937 [details] [review]
[PATCH 7/8] nouveau: fix chunk decoding by updating number of slices
Comment 27 Julien Isorce 2015-08-26 17:52:33 UTC
Created attachment 117938 [details] [review]
[PATCH 8/8] build: enable st/va with nouveau driver
Comment 28 Julien Isorce 2015-08-26 17:56:50 UTC
(In reply to Ilia Mirkin from comment #21)
> Please post patches to list so that they may be reviewed.

I cannot send them using git send-email. I am behind a proxy so I think that's why I cannot.
That's why I attached them here. And home computer is broken for now.
Comment 29 Julien Isorce 2015-08-26 17:58:45 UTC
Created attachment 117939 [details]
Contains files generated with git send-email
Comment 30 Julien Isorce 2015-09-16 12:32:49 UTC
Patches have been sent to mailing list for review.

vc1, mpeg12, mpeg4 are working but not h264 (visible but lot of blockiness)

Need someone to try existing st/va with a radeon card and using:
LIBVA_DRIVER_NAME=gallium mpv --hwdec=vaapi anyH264Video.mp4
Comment 31 Julien Isorce 2016-02-05 17:50:56 UTC
Fixed in current master.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.