Summary: | glCopyTexSubImage2D is much slower than glDrawArrays | ||
---|---|---|---|
Product: | Mesa | Reporter: | Dongseong Hwang <dongseong.hwang> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | major | ||
Priority: | high | CC: | ben, eero.t.tamminen, lionel.g.landwerlin |
Version: | 12.0 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Dongseong Hwang
2016-10-28 18:09:32 UTC
The numbers you've put in seem to show that glCopyTexSubImage2D is faster. Did you invert the columns' labels?

Oh, yes, I made a mistake. I inverted the column labels. The following is correct:

Test | glCopyTexSubImage2D | glDrawArrays |
---|---|---|
4k fish on Chromebook Pixel 2015 (Broadwell) | 22 FPS | 32.6 FPS |
4k fish on Ubuntu and Haswell | 23 FPS | 30.9 FPS |
500 fish on Android OnePlus One (Adreno 330) | 25 FPS | 22.5 FPS |

The next step would be to determine what method i965 is using for CopyTexSubImage:

1. BLORP (should be fast)
2. BLT (should be slow)
3. CPU maps (should be slow)

Putting a breakpoint in intelCopyTexSubImage and stepping through it should make it pretty clear which is happening, and if we're falling off the fast path, why.

Here's the intel_gpu_top data for both cases.

Task (percent busy) | glDrawArrays | glCopyTexSubImage2D |
---|---|---|
render busy | 48% | 53% |
render space | 67/16384 | 38/16384 |
CS | 49% | 55% |
GAM | 43% | 49% |
VS | 11% | 14% |
SVG | 10% | 14% |
VF | 10% | 12% |
GAFS | 9% | 12% |
CL | 9% | 11% |
SF | 5% | 7% |
DS | 4% | 6% |
SOL | 3% | 5% |
GS | 3% | 5% |
SDE | 2% | 3% |
TDG | 2% | 3% |
TE | 1% | 2% |
HS | 1% | 2% |
GAFM | 1% | 1% |
RS | 1% | 0% |
VFE | 0% | 0% |
TSG | 0% | 0% |
URBM | 0% | 0% |

Counter | glDrawArrays | glCopyTexSubImage2D |
---|---|---|
vert fetch | 650289482 (650202512/sec) | 421770029 (15081065/sec) |
prim fetch | 216764414 (216735424/sec) | 140591035 (5026989/sec) |
VS invocations | 636105016 (636047896/sec) | 411982523 (14758688/sec) |
GS invocations | 0 (0/sec) | 0 (0/sec) |
GS prims | 0 (0/sec) | 0 (0/sec) |
CL invocations | 216759538 (216730978/sec) | 140587183 (5027260/sec) |
CL prims | 97832823 (97804263/sec) | 63411022 (2244392/sec) |
PS invocations | 18278204160 (-430295856/sec) | 9766779712 (286387776/sec) |
PS depth pass | 4741623475 (77865535/sec) | 2582256091 (72883919/sec) |

This is cool. Can you also please get the data that Ken requested?

(In reply to Kenneth Graunke from comment #3)
> The next step would be to determine what method i965 is using for
> CopyTexSubImage:
>
> 1. BLORP (should be fast)
> 2. BLT (should be slow)
> 3. CPU maps (should be slow)
>
> Putting a breakpoint in intelCopyTexSubImage and stepping through it should
> make it pretty clear which is happening, and if we're falling off the fast
> path, why.

Dongseong, have you had a chance to follow up on this? This information is necessary for us to make progress on this issue.

Hi, I'm on vacation for 3 more weeks. When I come back to the office, I'll do it as my first task. Sorry for the delay.

If someone wants to try to reproduce it, here are the instructions.

1. Build Chromium on Linux (Ubuntu or Debian is easiest): https://chromium.googlesource.com/chromium/src/+/master/docs/linux_build_instructions.md

2. Apply this patch:

    +++ b/gpu/command_buffer/service/feature_info.cc
    @@ -954,8 +954,7 @@ void FeatureInfo::InitializeFeatures() {
       if (extensions.Contains("GL_INTEL_framebuffer_CMAA")) {
         feature_flags_.chromium_screen_space_antialiasing = true;
         AddExtensionString("GL_CHROMIUM_screen_space_antialiasing");
    -  } else if (!workarounds_.disable_framebuffer_cmaa &&
    -             (gl_version_info_->IsAtLeastGLES(3, 1) ||
    +  } else if ( (gl_version_info_->IsAtLeastGLES(3, 1) ||
                   (gl_version_info_->IsAtLeastGL(3, 0) &&
                    extensions.Contains("GL_ARB_shading_language_420pack") &&
                    extensions.Contains("GL_ARB_texture_gather") &&

3. Run any WebGL site:
   > ./out/Release/chrome http://webglsamples.org/aquarium/aquarium.html

4. Set breakpoints at the following two locations:
   https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc?q=gles2_cmd_ap&sq=package:chromium&l=250
   https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_copy_texture_chromium.cc?sq=package:chromium&rcl=1478244283&l=311

Ian, are we still waiting on the info needed here, or can we close this bug?

Compiling Chromium from source with patches is rather painful... if possible, we would like to avoid that. Dongseong, if you still want us to look at this, can you provide an apitrace which uses glCopyTexSubImage2D? Install apitrace, then run "apitrace trace chromium ...". It should create a "chromium.trace" file. Then I can answer the question I had in comment 3...

Dongseong, can we close this bug?

No, sorry for the delay. I'll provide a stack trace soon.

INVALID is not a great representation, but it is the best category I can see in the list. We haven't had a reporter update in over four months, so per the Mesa bug guidelines we are closing. If this is still a bug, feel free to reopen with the proper documentation.
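The FPS figures in the corrected table are easier to compare as per-frame times. The following is only a rough sketch, not part of the original report: it converts the reported FPS values to milliseconds per frame and prints how much longer (or shorter) a frame takes on the glCopyTexSubImage2D path. The names `REPORTED_FPS`, `frame_time_ms`, and `extra_ms_per_frame` are hypothetical, and the result is a whole-frame difference, so it should not be read as the isolated cost of the copy itself.

```python
# FPS figures copied from the corrected table in this report.
# Tuple order: (glCopyTexSubImage2D FPS, glDrawArrays FPS).
# All names below are illustrative, not from Mesa or Chromium.
REPORTED_FPS = {
    "Chromebook Pixel 2015 (Broadwell), 4k fish": (22.0, 32.6),
    "Ubuntu on Haswell, 4k fish": (23.0, 30.9),
    "OnePlus One (Adreno 330), 500 fish": (25.0, 22.5),
}


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def extra_ms_per_frame(copy_fps, draw_fps):
    """Whole-frame time difference: copy path minus draw path.

    Positive means the glCopyTexSubImage2D run was slower per frame.
    """
    return frame_time_ms(copy_fps) - frame_time_ms(draw_fps)


if __name__ == "__main__":
    for name, (copy_fps, draw_fps) in REPORTED_FPS.items():
        delta = extra_ms_per_frame(copy_fps, draw_fps)
        print(f"{name}: {delta:+.1f} ms per frame with glCopyTexSubImage2D")
```

A positive value means the glCopyTexSubImage2D run spent more time per frame than the glDrawArrays run; on the Adreno 330 device the sign flips, which matches the FPS ordering reported above.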