Hello.

While this bug may not see any implementation for literally years due to a shortage of manpower, it will at least be around for searches.

Yesterday I made an informal request on the #nouveau channel, asking whether https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/vl/vl_idct.c could be reused for shader-based mjpeg decoding. Ilia Mirkin answered:

01:53 imirkin: AndrewR: i suppose so? mpeg is basically a bunch of 8x8 JPEG's ... kinda
01:54 imirkin: why do you care about mjpeg out of curiousity?
07:31 AndrewR: imirkin, sorry, was sleeping. Recently Cinelerra-GG (NLE) gained support for vaapi/vdpau decoding and vaapi encoding ..so, having few streams played at the same time (tracks, monitors) not as uncommon as it was with just players.
07:32 AndrewR: imirkin, https://lists.cinelerra-gg.org/pipermail/cin/2019-May/thread.html (not very big list archive)
07:34 AndrewR: imirkin, as far as I understand mesa and ffmpeg can't be mixed freely (mit vs gpl?), but then having something simple for (regression) testing will not hurt?
08:02 AndrewR: imirkin, https://github.com/CESNET/GPUJPEG (CUDA, but in theory it can be implemented at least on same hw with different programming interface ...). Well, even just IDCT stage....
08:31 AndrewR: https://github.com/negge/jpeg_gpu/commits/master - I think I tested this on my openGL 3.3 card and it worked ....
08:51 AndrewR: imirkin, just retested this jpeg_gpu program - it decodes 2048x1536 jpeg photo at 11 fps for cpufreq 1.4 Ghz, and at 27 fps if I let cpu freq rise up to 3.4-3.8 Ghz
---------------
src: https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2019-05-16

So, while I can't code my own feature request, I hope it will generate at least some discussion. Even if shader-based mjpeg decoding is not very fast, it should be simpler than mpeg2 or h264 (!), and it can serve as a base for regression testing of the va state tracker.
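To make the "even just the IDCT stage" idea more concrete, here is a minimal CPU reference sketch of the 8x8 inverse DCT that JPEG uses. It is not taken from vl_idct.c or any other Mesa code. The function names (idct_1d, idct_8x8) are made up for illustration, and it uses plain floating point with no attempt at speed. Something along these lines could serve as a golden reference when regression testing a shader implementation.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _scale(k):
    # Normalisation factor C(k) from the JPEG IDCT definition.
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


# BASIS[x][u] = C(u)/2 * cos((2x + 1) * u * pi / 16)
BASIS = [
    [0.5 * _scale(u) * math.cos((2 * x + 1) * u * math.pi / 16.0)
     for u in range(N)]
    for x in range(N)
]


def idct_1d(coeffs):
    """Inverse DCT of 8 coefficients (one row or one column)."""
    return [sum(BASIS[x][u] * coeffs[u] for u in range(N)) for x in range(N)]


def idct_8x8(block):
    """Separable 2D inverse DCT.

    block[v][u] holds the coefficient for vertical frequency v and
    horizontal frequency u; the result is indexed as out[y][x].
    """
    # Pass 1: transform every row (horizontal direction).
    rows = [idct_1d(row) for row in block]
    # Pass 2: transform every column of the intermediate result.
    cols = [idct_1d([rows[v][x] for v in range(N)]) for x in range(N)]
    # cols is indexed [x][y]; transpose back to [y][x].
    return [[cols[x][y] for x in range(N)] for y in range(N)]


if __name__ == "__main__":
    # A block with only a DC coefficient decodes to a flat block.
    dc_only = [[0.0] * N for _ in range(N)]
    dc_only[0][0] = 1024.0
    print(idct_8x8(dc_only)[0][:4])  # four samples, each roughly 128
```

A real baseline JPEG decoder would dequantize the coefficients before this step and add the level shift of 128 afterwards for 8-bit samples. Both steps are left out here to keep the sketch short.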
(there is no vaapi/vdpau component in this bugzilla)

Please note that https://github.com/CESNET/GPUJPEG probably relies on NVIDIA-specific hardware, so it may not be very portable (at the algorithm level) to AMD or other hardware behind OpenCL. Still, they quote performance figures:

------------quote-------
OVERVIEW:
-It uses NVIDIA CUDA platform.
-Not optimized yet (it is only the first test implementation).
-Encoder and decoder use Huffman coder for entropy encoding/decoding.
-Encoder produces by default baseline JPEG codestream which consists of proper codestream headers and one scan for each color component without subsampling and it uses restart flags that allows fast parallel encoding. The quality of encoded images can be specified by value 0-100.
-Optionally encoder can produce interleaved stream (all components in one scan) or/and subsampled stream.
-Decoder can decompress only JPEG codestreams that can be generated by encoder. If scan contains restart flags, decoder can use parallelism for fast decoding.
-Encoding/Decoding of JPEG codestream is divided into following phases:
   Encoding:                    Decoding
   1) Input data loading        1) Input data loading
   2) Preprocessing             2) Parsing codestream
   3) Forward DCT               3) Huffman decoder
   4) Huffman encoder           4) Inverse DCT
   5) Formatting codestream     5) Postprocessing
 and they are implemented on CPU or/and GPU as follows:
   -CPU:
      -Input data loading
      -Parsing codestream
      -Huffman encoder/decoder (when restart flags are disabled)
      -Output data formatting
   -GPU:
      -Preprocessing/Postprocessing (color component parsing, color transformation RGB <-> YCbCr)
      -Forward/Inverse DCT (discrete cosine transform)
      -Huffman encoder/decoder (when restart flags are enabled)

PERFORMANCE:
Following tables summarizes encoding/decoding performance using NVIDIA GTX 580 for non-interleaved and non-subsampled stream with different quality settings [...]
Decoding:

        |          4k (4096x2160)          |         HD (1920x1080)
--------+----------------------------------+---------------------------------
quality | duration |     psnr |       size | duration |     psnr |       size
--------+----------+----------+------------+----------+----------+-----------
10      | 10.28 ms | 29.33 dB |  539.30 kB |  3.13 ms | 27.41 dB |  145.90 kB
20      | 11.31 ms | 32.70 dB |  697.20 kB |  3.59 ms | 30.32 dB |  198.30 kB
30      | 12.36 ms | 34.63 dB |  850.60 kB |  3.97 ms | 31.92 dB |  243.60 kB
40      | 12.90 ms | 35.97 dB |  958.90 kB |  4.28 ms | 32.99 dB |  282.20 kB
50      | 13.45 ms | 36.94 dB | 1073.30 kB |  4.56 ms | 33.82 dB |  319.10 kB
60      | 14.71 ms | 37.96 dB | 1217.10 kB |  4.81 ms | 34.65 dB |  360.00 kB
70      | 15.03 ms | 39.22 dB | 1399.20 kB |  5.24 ms | 35.71 dB |  422.10 kB
80      | 16.64 ms | 40.67 dB | 1710.00 kB |  5.89 ms | 37.15 dB |  526.70 kB
90      | 19.99 ms | 42.83 dB | 2441.40 kB |  7.48 ms | 39.84 dB |  768.40 kB
100     | 46.45 ms | 47.09 dB | 7798.70 kB | 16.42 ms | 47.21 dB | 2499.60 kB
----end of quotation-------
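The quoted overview says that restart flags are what let the Huffman stage run in parallel. As a rough sketch of why: in JPEG the restart markers RST0..RST7 are the byte pairs 0xFF 0xD0 to 0xFF 0xD7, and a literal 0xFF inside the entropy-coded data is always stuffed as 0xFF 0x00, so the scan data can be cut into segments by a simple byte search. The function name below is hypothetical, and the input is assumed to be the entropy-coded bytes of a single well-formed scan with no other markers in it. This is not how GPUJPEG itself is implemented.

```python
def split_at_restart_markers(scan):
    """Split entropy-coded scan bytes at RSTn markers (0xFFD0..0xFFD7).

    Returns a list of byte strings, one per restart interval, with the
    markers themselves removed. Stuffed bytes (0xFF 0x00) are left
    untouched inside the segments.
    """
    segments = []
    start = 0
    i = 0
    while i + 1 < len(scan):
        if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
            # Found a restart marker: close the current segment.
            segments.append(scan[start:i])
            i += 2
            start = i
        else:
            i += 1
    # Whatever follows the last marker is the final segment.
    segments.append(scan[start:])
    return segments


if __name__ == "__main__":
    data = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0,
                  0x56, 0x78, 0xFF, 0xD1, 0x9A])
    for segment in split_at_restart_markers(data):
        print(segment.hex())
```

Because the DC predictors are reset at every restart interval, each segment can in principle be Huffman-decoded independently, for example one segment per GPU thread group. A stream without restart markers comes back as a single segment, which matches the quoted note that the Huffman stage stays on the CPU in that case.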
There were also OpenCL patches for libjpeg-turbo, but they remain unintegrated: https://sourceforge.net/p/libjpeg-turbo/patches/40/

The patches are so large because they include the CL headers!
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/937.