Summary: | compilation of WebGL demo iamnop.com/particles shader slowed 5x, run-time perf dropped to 1/10th | ||
---|---|---|---|
Product: | Mesa | Reporter: | nataraj.deshpande |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | eero.t.tamminen |
Version: | unspecified | Keywords: | bisected, regression |
Hardware: | All | ||
OS: | Linux (All) | ||
See Also: |
https://bugs.freedesktop.org/show_bug.cgi?id=91857 https://bugs.freedesktop.org/show_bug.cgi?id=70613 |
||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
ui logs collected from chromebook with --disable-gpu-watchdog
Other spilling shader Spilling slowdown shader |
Description
nataraj.deshpande
2016-07-21 23:27:45 UTC
Created attachment 126101 [details]
Other spilling shader
Created attachment 126102 [details]
Spilling slowdown shader
Tested this on SKL GT2 with today's Mesa. All the other shapes selectable in the WebGL demo load instantly and run well over 60FPS. However, selecting the default "galaxy" shape takes ~20s and runs at ~2FPS. Problem is couple of shaders that get compiled, Mesa spills with them and as result they're huge. I've attached cleaned up versions of the shaders (indented then & replaced hex hash names with something shorter). OGL_PS_0016.glsl is the main culprit, Mesa produces 800KB shader assembly file with ~8000 instructions for it. According to 'perf', compile time goes to: ------------------------------- # Cost Object Symbol 27.14% i965_dri.so ra_allocate 19.94% i965_dri.so fs_visitor::virtual_grf_interferes 13.90% i965_dri.so ra_add_node_adjacency 10.21% i965_dri.so fs_visitor::assign_regs 4.11% i965_dri.so decrement_q.isra.2 3.83% i965_dri.so instruction_scheduler::add_dep 2.97% i965_dri.so ra_add_node_interference 1.71% libc-2.23.so _int_malloc ------------------------------- While OGL_PS_0031.glsl also spills and is in instruction count nearly half of the previous shader, it compiles and runs >10x faster. Its 'perf' profile looks completely different: ------------------------------- # Cost Object Symbol 17.66% i965_dri.so (anon)::ir_copy_propagation_visitor::visit 17.12% i965_dri.so (anon)::ir_copy_propagation_elements_visitor::kill 12.46% i965_dri.so (anon)::ir_copy_propagation_visitor::kill 12.14% i965_dri.so (anon)::ir_copy_propagation_elements_visitor::handle_rvalue 4.47% i965_dri.so ra_allocate 3.12% i965_dri.so ir_expression::accept 2.66% i965_dri.so fs_visitor::virtual_grf_interferes 1.61% i965_dri.so ra_add_node_adjacency 1.39% i965_dri.so fs_visitor::assign_regs ------------------------------- The shaders are very similar despite the huge compilation time difference. In the much faster compiled shader, the large amount of expanded functions is called conditionally based on sampler result (in the slow one that call is unconditional) and it has few lines less before that. (In reply to Eero Tamminen from comment #3) > Tested this on SKL GT2 with today's Mesa. > > All the other shapes selectable in the WebGL demo load instantly and run > well over 60FPS. However, selecting the default "galaxy" shape takes ~20s > and runs at ~2FPS. With Mesa 11.2, compilation takes only few seconds and "galaxy" runs at full speed. When comparing at the generated assembly, Mesa 11.2 spills a bit: SIMD8 shader: 4432 instructions. 0 loops. 35394 cycles. 81:179 spills:fills. Promoted 18 constants. Compacted 70912 to 48880 bytes (31%) But the latest Mesa spills a lot more: SIMD8 shader: 7985 instructions. 0 loops. 57210 cycles. 359:993 spills:fills. Promoted 19 constants. Compacted 127760 to 86032 bytes (33%) I'll try whether I can locate the regression in this with eZBench: https://cgit.freedesktop.org/ezbench/tree/README Note: When looking at the SIMD8 shader code generated by the Windows driver, it doesn't spill at all. Eero, could you capture the shaders from this demo and put them into shader-db? (In reply to Matt Turner from comment #6) > Eero, could you capture the shaders from this demo and put them into > shader-db? Ken pointed out to me that we may not want a shader that takes 20 seconds to compile in shader-db. :) Also, I guess I can reproduce the problem with what you have already attached. Thanks for providing those! Had some troubles at first with ezBench, but finally it bisected this. As expected, the same commit caused 5x increase in compilation speed and even larger drop in performance. This was the commit: ----------------------------- commit 5886cd79a0ed10b887840f6afb980a5a6a47e26e Author: Kenneth Graunke <kenneth@whitecape.org> AuthorDate: Wed Apr 6 22:34:39 2016 -0700 Commit: Kenneth Graunke <kenneth@whitecape.org> CommitDate: Mon Apr 11 18:43:55 2016 -0700 nir: Do basic constant reassociation. Many shaders contain expression trees of the form: const_1 * (value * const_2) Reorganizing these to (const_1 * const_2) * value will allow constant folding to combine the constants. Sometimes, these constants are 2 and 0.5, so we can remove a multiply altogether. Other times, it can create more immediate constants, which can actually hurt. Finding a good balance here is tricky. While much more could be done, this simple patch seems to have a lot of positive benefit while having a low downside. ----------------------------- According to Pekka, he's seen performance drops with above commit also with other test(s). For the almost identical, but slightly smaller shader attached here, there was no perf regression. (Matt's later "Drop -fno-strict-aliasing" commit improved compilation speed slightly.) I checked our current benchmarks set with commit before and after this change, but that didn't show anything significant, so the problem seems to be fairly isolated. I'll ask Pekka for more info where else he saw it have a negative effect. (In reply to Eero Tamminen from comment #9) > I checked our current benchmarks set with commit before and after this > change, but that didn't show anything significant, so the problem seems to > be fairly isolated. I'll ask Pekka for more info where else he saw it have > a negative effect. GpuTest Volplosion performance decreased ~3.5% and FurMark decreased ~2% (tested on HSW). I didn't notice any other changes in performance. (In reply to Pekka Jylhä-Ollila from comment #10) > GpuTest Volplosion performance decreased ~3.5% and FurMark decreased ~2% > (tested on HSW). I didn't notice any other changes in performance. I did testing on SKL and those two tests didn't decrease there (everything was within normal variance), so I guess for those two tests the regression is GEN specific. Tested latest Mesa. On my SKL GT2 the slow shader still takes ~20s to compile and produces following Mesa statistics: SIMD8 shader: 7985 instructions. 0 loops. 57210 cycles. 359:993 spills:fills. Promoted 19 constants. Compacted 127760 to 86032 bytes (33%) (In reply to Eero Tamminen from comment #12) > Tested latest Mesa. On my SKL GT2 the slow shader still takes ~20s to > compile and produces following Mesa statistics: > SIMD8 shader: 7985 instructions. 0 loops. 57210 cycles. 359:993 > spills:fills. Promoted 19 constants. Compacted 127760 to 86032 bytes (33%) Latest Mesa (i.e. 5 months later) uses 17-18ms to compile on that machine, and gets following statistics: SIMD8 shader: 8127 instructions. 0 loops. 57436 cycles. 369:1019 spills:fills. Promoted 18 constants. Compacted 130032 to 87472 bytes (33%) I.e. there's been a minor improvement in compile time, but regression from v11.2 is still valid and spilling & instruction counts have gotten slightly worse. "perf" output looks same as earlier. Testing with current Mesa (06bfb2d28f) on HSW desktop system, page loads in ~10 secs and runs 60fps meeting the expected result. I'm proposing to resolve this as FIXED unless still a problem on chromebooks? (In reply to Tapani Pälli from comment #14) > Testing with current Mesa (06bfb2d28f) on HSW desktop system, This bug was originally filed against BSW/SKL. > page loads in > ~10 secs and runs 60fps meeting the expected result. I'm proposing to > resolve this as FIXED unless still a problem on chromebooks? Tested Mesa tip (git-379b24a40d) on SKL GT4e. Issue is still there, page freezes for ~10 seconds and "galaxy" shape renders at 1/60th speed of the other shapes. (I verified that Chromium is using right Mesa version from "chrome://gpu".) (In reply to Eero Tamminen from comment #15) > (In reply to Tapani Pälli from comment #14) > > Testing with current Mesa (06bfb2d28f) on HSW desktop system, > > This bug was originally filed against BSW/SKL. > > > > page loads in > > ~10 secs and runs 60fps meeting the expected result. I'm proposing to > > resolve this as FIXED unless still a problem on chromebooks? > > Tested Mesa tip (git-379b24a40d) on SKL GT4e. Issue is still there, page > freezes for ~10 seconds and "galaxy" shape renders at 1/60th speed of the > other shapes. > > (I verified that Chromium is using right Mesa version from "chrome://gpu".) Yep, I tried now on KBL and it also reproduces this. Did not notice this is gen specific issue. (In reply to Tapani Pälli from comment #16) > (In reply to Eero Tamminen from comment #15) > > (In reply to Tapani Pälli from comment #14) > > > Testing with current Mesa (06bfb2d28f) on HSW desktop system, > > > > This bug was originally filed against BSW/SKL. > > > > > > > page loads in > > > ~10 secs and runs 60fps meeting the expected result. I'm proposing to > > > resolve this as FIXED unless still a problem on chromebooks? > > > > Tested Mesa tip (git-379b24a40d) on SKL GT4e. Issue is still there, page > > freezes for ~10 seconds and "galaxy" shape renders at 1/60th speed of the > > other shapes. > > > > (I verified that Chromium is using right Mesa version from "chrome://gpu".) > > Yep, I tried now on KBL and it also reproduces this. Did not notice this is > gen specific issue. Which actually means that this is not just 'generic' spilling issue, this is something specific to how new gens operate. In the shader the return clause of Fun_E() is what causes trouble. It does probably cause many variables to be 'live' but even just removing couple of those dot() calls is enough for this to run 60fps on KBL machine. Seems like this shader should be in shader-db, if the compilation regression can be fixed so it only takes 20ms. I don't have any plans to look into this. Issue is still there, Curro's register liveness improvements last month didn't help this one significantly. Browsing through old bugs ... this demo is still super-slow on KBL (1fps), on HSW I get ~40-50fps. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1529. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.