Bug 98478 - glCopyTexSubImage2D is much slower than glDrawArrays
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 12.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2016-10-28 18:09 UTC by Dongseong Hwang
Modified: 2017-06-23 22:43 UTC

Description Dongseong Hwang 2016-10-28 18:09:32 UTC
ChromeOS uses glCopyTexSubImage2D to copy an immutable RGBA8 texture to a regular RGB texture. It's very slow.
https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc?sq=package:chromium&rcl=1477653611&l=552

Copying with glDrawArrays instead is significantly faster. ApplyFramebufferAttachmentCMAAINTELResourceManager::CopyTexture() in the following patch shows how to use glDrawArrays instead of glCopyTexSubImage2D:
https://codereview.chromium.org/2460973002/diff/1/gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc

I measured WebGL FPS in the Chrome browser on various platforms. Intel Mesa is always slower when using glCopyTexSubImage2D, whereas Qualcomm Adreno is faster when using glCopyTexSubImage2D.
 test site: http://webglsamples.org/aquarium/aquarium.html
                                               glDrawArrays  glCopyTexSubImage2D
 4k fish on Chromebook Pixel 2015 (Broadwell):    22 FPS           32.6 FPS
 4k fish on Ubuntu and Haswell:                   23 FPS           30.9 FPS
 500 fish on Android OnePlus One (Adreno 330):    25 FPS           22.5 FPS

It looks like an Intel Mesa bug. In theory there is no reason for glCopyTexSubImage2D to be slower. glCopyTexSubImage2D is used very frequently, so I think this issue is quite severe.
Comment 1 Lionel Landwerlin 2016-10-28 18:24:00 UTC
The numbers you've put in seem to show that glCopyTexSubImage2D is faster.
Did you invert the columns' labels?
Comment 2 Dongseong Hwang 2016-10-28 18:57:35 UTC
Oh, yes, I made a mistake. I inverted the column labels. The following is correct:

                                               glCopyTexSubImage2D glDrawArrays
 4k fish on Chromebook Pixel 2015 (Broadwell):      22 FPS           32.6 FPS
 4k fish on Ubuntu and Haswell:                     23 FPS           30.9 FPS
 500 fish on Android OnePlus One (Adreno 330):      25 FPS           22.5 FPS
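
To put the corrected table in relative terms, here is a minimal Python sketch that converts the FPS figures into a percentage difference for each platform. The numbers are copied from the table above. The names `RESULTS` and `draw_vs_copy_percent` are illustrative only and do not come from Chromium or Mesa.

```python
# FPS figures copied from the corrected table above:
# (platform, FPS with glCopyTexSubImage2D, FPS with glDrawArrays)
RESULTS = [
    ("Chromebook Pixel 2015 (Broadwell), 4k fish", 22.0, 32.6),
    ("Ubuntu on Haswell, 4k fish", 23.0, 30.9),
    ("OnePlus One (Adreno 330), 500 fish", 25.0, 22.5),
]


def draw_vs_copy_percent(copy_fps, draw_fps):
    """Return how much faster (+) or slower (-) the glDrawArrays path is, in percent."""
    return (draw_fps / copy_fps - 1.0) * 100.0


for platform, copy_fps, draw_fps in RESULTS:
    print(f"{platform}: {draw_vs_copy_percent(copy_fps, draw_fps):+.1f}%")
```

On these figures the glDrawArrays path comes out roughly 48% and 34% faster on the two Intel systems and 10% slower on the Adreno 330.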
Comment 3 Kenneth Graunke 2016-10-28 19:19:26 UTC
The next step would be to determine what method i965 is using for CopyTexSubImage:

1. BLORP (should be fast)
2. BLT (should be slow)
3. CPU maps (should be slow)

Putting a breakpoint in intelCopyTexSubImage and stepping through it should make it pretty clear which is happening, and if we're falling off the fast path, why.
Comment 4 Dongseong Hwang 2016-10-29 10:15:52 UTC
Here's intel_gpu_top data

When using glDrawArrays

render busy:  48%: █████████▋                             render space: 67/16384


       task  percent busy
         CS:  49%: █████████▉              vert fetch: 650289482 (650202512/sec)
        GAM:  43%: ████████▋               prim fetch: 216764414 (216735424/sec)
         VS:  11%: ██▎                  VS invocations: 636105016 (636047896/sec)
        SVG:  10%: ██                   GS invocations: 0 (0/sec)
         VF:  10%: ██                        GS prims: 0 (0/sec)
       GAFS:   9%: █▉                   CL invocations: 216759538 (216730978/sec)
         CL:   9%: █▉                        CL prims: 97832823 (97804263/sec)
         SF:   5%: █                    PS invocations: 18278204160 (-430295856/sec)
         DS:   4%: ▉                    PS depth pass: 4741623475 (77865535/sec)
        SOL:   3%: ▋                    
         GS:   3%: ▋                    
        SDE:   2%: ▌                    
        TDG:   2%: ▌                    
         TE:   1%: ▎                    
         HS:   1%: ▎                    
       GAFM:   1%: ▎                    
         RS:   1%: ▎                    
        VFE:   0%:                      
        TSG:   0%:                      
       URBM:   0%:                      


When using glCopyTexSubImage2D

render busy:  53%: ██████████▋                            render space: 38/16384


       task  percent busy
         CS:  55%: ███████████             vert fetch: 421770029 (15081065/sec)
        GAM:  49%: █████████▉              prim fetch: 140591035 (5026989/sec)
         VS:  14%: ██▉                  VS invocations: 411982523 (14758688/sec)
        SVG:  14%: ██▉                  GS invocations: 0 (0/sec)
       GAFS:  12%: ██▌                       GS prims: 0 (0/sec)
         VF:  12%: ██▌                  CL invocations: 140587183 (5027260/sec)
         CL:  11%: ██▎                       CL prims: 63411022 (2244392/sec)
         SF:   7%: █▌                   PS invocations: 9766779712 (286387776/sec)
         DS:   6%: █▎                   PS depth pass: 2582256091 (72883919/sec)
        SOL:   5%: █                    
         GS:   5%: █                    
        SDE:   3%: ▋                    
        TDG:   3%: ▋                    
         HS:   2%: ▌                    
         TE:   2%: ▌                    
       GAFM:   1%: ▎                    
        TSG:   0%:                      
         RS:   0%:                      
        VFE:   0%:                      
       URBM:   0%:
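
For anyone who wants to compare the two captures side by side, a rough Python sketch follows. It assumes only the row layout visible above (a unit name, a colon, then a busy percentage) and reads just the left-hand column. The helper `parse_busy` and the sample strings are illustrative names, not part of intel_gpu_top, and the sample text is the first three rows of each capture with the bars and counters trimmed.

```python
import re

# Matches the left-hand column of a row such as "         CS:  49%: ...".
# Lines like "render busy:  48%:" are skipped because the name is not all caps.
UNIT_BUSY = re.compile(r"^\s*([A-Z]+):\s+(\d+)%:")


def parse_busy(text):
    """Map each hardware unit name to its busy percentage."""
    busy = {}
    for line in text.splitlines():
        match = UNIT_BUSY.match(line)
        if match:
            busy[match.group(1)] = int(match.group(2))
    return busy


# First three rows of each capture above, bars and counters trimmed.
DRAW_ARRAYS = """
         CS:  49%:
        GAM:  43%:
         VS:  11%:
"""
COPY_TEX = """
         CS:  55%:
        GAM:  49%:
         VS:  14%:
"""

draw = parse_busy(DRAW_ARRAYS)
copy_tex = parse_busy(COPY_TEX)
for unit in draw:
    delta = copy_tex[unit] - draw[unit]
    print(f"{unit}: {draw[unit]}% -> {copy_tex[unit]}% ({delta:+d} points)")
```

In these rows every unit is a few points busier in the glCopyTexSubImage2D capture, even though that run has the lower FPS. This alone does not show which copy path the driver took.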
Comment 5 Ben Widawsky 2016-11-03 04:24:08 UTC
This is cool. Can you also please get the data that Ken requested?
Comment 6 Ian Romanick 2016-11-04 14:56:53 UTC
(In reply to Kenneth Graunke from comment #3)
> The next step would be to determine what method i965 is using for
> CopyTexSubImage:
> 
> 1. BLORP (should be fast)
> 2. BLT (should be slow)
> 3. CPU maps (should be slow)
> 
> Putting a breakpoint in intelCopyTexSubImage and stepping through it should
> make it pretty clear which is happening, and if we're falling off the fast
> path, why.

Dongseong, have you had a chance to follow up on this?  This information is necessary for us to make progress on this issue.
Comment 7 Dongseong Hwang 2016-11-04 15:30:31 UTC
Hi, I'm on vacation for 3 more weeks. When I get back to the office, I'll do it as my first task. Sorry for the delay.

If someone wants to try to reproduce it, here are the instructions.
1. Build Chromium on Linux (Ubuntu or Debian is easiest)
https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md

2. Apply this patch
+++ b/gpu/command_buffer/service/feature_info.cc
@@ -954,8 +954,7 @@ void FeatureInfo::InitializeFeatures() {
   if (extensions.Contains("GL_INTEL_framebuffer_CMAA")) {
     feature_flags_.chromium_screen_space_antialiasing = true;
     AddExtensionString("GL_CHROMIUM_screen_space_antialiasing");
-  } else if (!workarounds_.disable_framebuffer_cmaa &&
-             (gl_version_info_->IsAtLeastGLES(3, 1) ||
+  } else if ( (gl_version_info_->IsAtLeastGLES(3, 1) ||
               (gl_version_info_->IsAtLeastGL(3, 0) &&
                extensions.Contains("GL_ARB_shading_language_420pack") &&
                extensions.Contains("GL_ARB_texture_gather") &&

3. Run any WebGL site
> ./out/Release/chrome http://webglsamples.org/aquarium/aquarium.html

4. Set breakpoints at the following two locations
https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc?q=gles2_cmd_ap&sq=package:chromium&l=250
https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_copy_texture_chromium.cc?sq=package:chromium&rcl=1478244283&l=311
Comment 8 Annie 2017-01-11 05:38:54 UTC
Ian, are we still waiting on the info needed here, or can we close this bug?
Comment 9 Kenneth Graunke 2017-01-11 08:25:39 UTC
Compiling Chromium from source with patches is rather painful...if possible, we would like to avoid that.

Dongseong, if you still want us to look at this, can you provide an apitrace which uses glCopyTexSubImage2D?  Install apitrace, then run "apitrace trace chromium ...".  It should create a "chromium.trace" file.

Then I can answer the question I had in comment 3...
Comment 10 Annie 2017-02-13 06:02:40 UTC
Dongseong, can we close this bug?
Comment 11 Dongseong Hwang 2017-02-21 17:39:56 UTC
No, sorry for the delay. I'll provide a stack trace soon.
Comment 12 Annie 2017-06-23 22:43:31 UTC
INVALID is not a great representation but the best category I can see from the list. We haven't had a reporter update in over four months so per the mesa bug guidelines, we are closing. If this is still a bug, feel free to reopen with the proper documentation.

