Bug 110699 - Shader-based MJPEG decoding
Summary: Shader-based MJPEG decoding
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Other
Version: git
Hardware: Other All
Importance: medium enhancement
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-17 04:46 UTC by Andrew Randrianasulu
Modified: 2019-05-18 05:20 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Andrew Randrianasulu 2019-05-17 04:46:45 UTC
Hello.

While this bug may not see an implementation for literally years due to the shortage of manpower, it will at least be around for searches.

Yesterday I made an informal request on the #nouveau channel, asking whether "
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/vl/vl_idct.c could be reused for shader-based MJPEG decoding?"

Ilia Mirkin answered: 
01:53 imirkin: AndrewR: i suppose so? mpeg is basically a bunch of 8x8 JPEG's ... kinda 
01:54 imirkin: why do you care about mjpeg out of curiousity? 

07:31 AndrewR: imirkin, sorry, was sleeping. Recently Cinelerra-GG (NLE) gained support for vaapi/vdpau decoding and vaapi encoding ..so, having few streams played at the same time (tracks, monitors) not as uncommon as it was with just players. 
07:32 AndrewR: imirkin, https://lists.cinelerra-gg.org/pipermail/cin/2019-May/thread.html (not very big list archive) 
07:34 AndrewR: imirkin, as far as I understand mesa and ffmpeg can't be mixed freely (mit vs gpl?), but then having something simple for (regression) testing will not hurt? 
08:02 AndrewR: imirkin, https://github.com/CESNET/GPUJPEG (CUDA, but in theory it can be implemented at least on same hw with different programming interface ...). Well, even just IDCT stage.... 
08:31 AndrewR: https://github.com/negge/jpeg_gpu/commits/master - I think I tested this on my openGL 3.3 card and it worked .... 
08:51 AndrewR: imirkin, just retested this jpeg_gpu program - it decodes 2048x1536 jpeg photo at 11 fps for cpufreq 1.4 Ghz, and at 27 fps if I let cpu freq rise up to 3.4-3.8 Ghz 
---------------
src: https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-05-16
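The IDCT stage discussed above (the part vl_idct runs in shaders, and the part jpeg_gpu offloads to OpenGL) can be illustrated with a plain CPU reference. This is a minimal sketch of the standard 8x8 inverse DCT as defined in the JPEG specification, not code from Mesa or jpeg_gpu; the function name is my own:

```python
import math

def idct_8x8(coeffs):
    """Reference 2-D inverse DCT for one 8x8 JPEG block (ISO/IEC 10918-1).

    coeffs: 8x8 list of dequantized DCT coefficients.
    Returns an 8x8 list of spatial-domain samples (before the +128 level
    shift). A shader implementation would evaluate the same sum per pixel,
    typically split into two 1-D passes.
    """
    def c(k):
        # Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            s = 0.0
            for v in range(8):
                for u in range(8):
                    s += (c(u) * c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16.0)
                          * math.cos((2 * y + 1) * v * math.pi / 16.0))
            out[y][x] = s / 4.0
    return out
```

A fast implementation would use a factored algorithm (e.g. AAN) instead of this O(n^4) direct sum, but the direct form shows exactly what work would move onto the GPU.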

So, while I can't implement this feature request myself, I hope it will at least generate some discussion. Even if shader-based MJPEG decoding turns out not to be very fast, it should be simpler than MPEG-2 or H.264 (!), and it could serve as a basis for regression-testing the VA state tracker. (There is no vaapi/vdpau component in this Bugzilla.)

Please note that https://github.com/CESNET/GPUJPEG probably relies on NV-specific hardware features, so at the algorithm level it may not port cleanly to AMD or other hardware via OpenCL. Still, they quote performance figures:

------------quote-------
OVERVIEW:
-It uses NVIDIA CUDA platform.
-Not optimized yet (it is only the first test implementation).
-Encoder and decoder use Huffman coder for entropy encoding/decoding.
-Encoder produces by default baseline JPEG codestream which consists of proper codestream
 headers and one scan for each color component without subsampling and it uses
 restart flags that allows fast parallel encoding. The quality of encoded 
 images can be specified by value 0-100.
-Optionally encoder can produce interleaved stream (all components in one scan) or/and
 subsampled stream.
-Decoder can decompress only JPEG codestreams that can be generated by encoder. If scan 
 contains restart flags, decoder can use parallelism for fast decoding.
-Encoding/Decoding of JPEG codestream is divided into following phases:
   Encoding:                       Decoding
   1) Input data loading           1) Input data loading
   2) Preprocessing                2) Parsing codestream 
   3) Forward DCT                  3) Huffman decoder
   4) Huffman encoder              4) Inverse DCT
   5) Formatting codestream        5) Postprocessing
 and they are implemented on CPU or/and GPU as follows:
   -CPU: 
      -Input data loading
      -Parsing codestream
      -Huffman encoder/decoder (when restart flags are disabled)
      -Output data formatting
   -GPU: 
      -Preprocessing/Postprocessing (color component parsing, 
       color transformation RGB <-> YCbCr)
      -Forward/Inverse DCT (discrete cosine transform)
      -Huffman encoder/decoder (when restart flags are enabled)  


PERFORMANCE:
  Following tables summarizes encoding/decoding performance using NVIDIA 
GTX 580 for non-interleaved and non-subsampled stream with different quality 
settings

[...]
Decoding:
         |           4k (4096x2160)         |         HD (1920x1080)
 --------+----------------------------------+---------------------------------
 quality | duration |     psnr |       size | duration |     psnr |       size
 --------+----------+----------+------------+----------+----------+------------
      10 | 10.28 ms | 29.33 dB |  539.30 kB |  3.13 ms | 27.41 dB |  145.90 kB
      20 | 11.31 ms | 32.70 dB |  697.20 kB |  3.59 ms | 30.32 dB |  198.30 kB
      30 | 12.36 ms | 34.63 dB |  850.60 kB |  3.97 ms | 31.92 dB |  243.60 kB
      40 | 12.90 ms | 35.97 dB |  958.90 kB |  4.28 ms | 32.99 dB |  282.20 kB
      50 | 13.45 ms | 36.94 dB | 1073.30 kB |  4.56 ms | 33.82 dB |  319.10 kB
      60 | 14.71 ms | 37.96 dB | 1217.10 kB |  4.81 ms | 34.65 dB |  360.00 kB
      70 | 15.03 ms | 39.22 dB | 1399.20 kB |  5.24 ms | 35.71 dB |  422.10 kB
      80 | 16.64 ms | 40.67 dB | 1710.00 kB |  5.89 ms | 37.15 dB |  526.70 kB
      90 | 19.99 ms | 42.83 dB | 2441.40 kB |  7.48 ms | 39.84 dB |  768.40 kB
     100 | 46.45 ms | 47.09 dB | 7798.70 kB | 16.42 ms | 47.21 dB | 2499.60 kB

----end of quotation-------
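The "restart flags" parallelism described in the quote works because restart markers (RST0..RST7, bytes 0xFF 0xD0 through 0xFF 0xD7) reset the entropy coder, cutting the scan into segments that can be Huffman-decoded independently. A hedged sketch of locating those split points follows; the function name is my own, and it deliberately ignores everything except RSTn markers and 0xFF 0x00 byte stuffing:

```python
def split_at_restart_markers(scan_data: bytes):
    """Split JPEG entropy-coded scan data at restart markers (RST0-RST7).

    Each returned segment starts at a known coder state, so the segments
    can be Huffman-decoded in parallel, as GPUJPEG's README describes.
    Simplified sketch: 0xFF 0x00 (a stuffed literal 0xFF) is skipped, and
    other marker pairs are skipped without special handling.
    """
    segments = []
    start = 0
    i = 0
    while i < len(scan_data) - 1:
        if scan_data[i] == 0xFF:
            nxt = scan_data[i + 1]
            if 0xD0 <= nxt <= 0xD7:      # restart marker: cut here
                segments.append(scan_data[start:i])
                i += 2                   # the marker itself is dropped
                start = i
                continue
            i += 2                       # stuffed byte or other marker pair
            continue
        i += 1
    segments.append(scan_data[start:])   # trailing segment
    return segments
```

A GPU decoder would hand each segment to its own thread or workgroup; without restart markers the Huffman stream has no resynchronization points and must be decoded serially on the CPU, which matches the CPU/GPU split listed in the quoted README.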
Comment 1 Andrew Randrianasulu 2019-05-18 05:20:50 UTC
There were also OpenCL patches for libjpeg-turbo, but they remain unintegrated.

https://sourceforge.net/p/libjpeg-turbo/patches/40/
The patches are so large because they bundle the OpenCL headers!

