Created attachment 143948 [details]
output from Mesa 18.2

A large branching shader suffers a performance loss of over 15% when comparing Mesa Git (revision a182adfd83ad00e326153b00a725a014e0359bf0) against Mesa 18.2.8 (on Ubuntu 18.04).

To replicate:

1. Build painter-glyph-test-GL-debug from the project https://github.com/intel/fastuidraw
2. Run (adjusting the width and height options to match one's monitor) with

   vblank_mode=0 LD_LIBRARY_PATH=. ./painter-glyph-test-GL-release fullscreen true width 1920 height 1200 use_file true text demo_data/txt/wall_of_text_caps_no_numbers.txt

On my Iris Pro Graphics 580 (Skylake GT4e), I see (with fluctuations):

Mesa 18.2.8: 5.6 ms/frame [178 FPS]
a182adfd83ad00e326153b00a725a014e0359bf0: 6.5 ms/frame [153 FPS]

The shader being executed is a large uber-shader. In both Mesa versions tested above, the uber-shader is compiled only as SIMD8, with no spilling.

Attached are the outputs for the offending fragment shader when running with MESA_GLSL_CACHE_DISABLE=true INTEL_DEBUG=fs.
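For reference, the reported numbers can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic; the helper names `fps` and `slowdown_pct` are made up for illustration. It shows that the "over 15%" figure corresponds to the increase in frame time (about 16%), while the drop measured in FPS is somewhat smaller.

```python
def fps(frame_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


def slowdown_pct(old_ms, new_ms):
    """Relative increase in frame time, in percent."""
    return (new_ms - old_ms) / old_ms * 100.0


# Frame times quoted in the report above.
old_ms, new_ms = 5.6, 6.5

print(f"FPS: {fps(old_ms):.1f} -> {fps(new_ms):.1f}")
print(f"frame time increase: {slowdown_pct(old_ms, new_ms):.1f}%")
print(f"FPS decrease: {(1.0 - fps(new_ms) / fps(old_ms)) * 100.0:.1f}%")
```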
Created attachment 143949 [details] output from Mesa Git
Hi Kevin,

I've compiled the test and run it, but I'm not sure how to compare FPS. How did you measure it? Did you use a special tool, a flag in the test, or something else?
Press the "L" key (at least on US keyboards) to bring up a display with FPS and other things. At startup, a list of what all the key presses do is printed to stdout. If you are sufficiently masochistic, you can run the program with the single command line argument "--help" to see all command line options.

Just to make sure all is good: did the demo, as-is, draw a wall of text to the screen?

-Kevin
Looking at the attached shader assembly...

Mesa 18.2:
SIMD8 shader: 2413 instructions. 11 loops. 131452 cycles. 0:0 spills:fills. Promoted 15 constants. Compacted 38608 to 27856 bytes (28%)

Mesa git:
SIMD8 shader: 2388 instructions. 11 loops. 120307 cycles. 0:0 spills:fills. Promoted 14 constants. Compacted 38208 to 27392 bytes (28%)

=> Both versions reach only SIMD8, and the new version uses fewer instructions.

Loops in the git version are shorter, except for the last two, which are marginally longer:

Mesa 18.2:
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -216 { align1 1Q };
while(8) JIP: -296 { align1 1Q };
while(8) JIP: -4496 { align1 1Q };
while(8) JIP: -1136 { align1 1Q };
while(8) JIP: -1136 { align1 1Q };

Mesa git:
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -200 { align1 1Q };
while(8) JIP: -288 { align1 1Q };
while(8) JIP: -4424 { align1 1Q };
while(8) JIP: -1144 { align1 1Q };
while(8) JIP: -1144 { align1 1Q };

At maximum, the old code seems to have 61 live regs, the new one 62.

Both Mesa and my own (crappy) ISA analyzer think that the new version (which has more lrp & mad reg bank conflicts) should use fewer cycles, but in branching code that can't really be predicted, as it depends so much on which branches get selected.
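A comparison like the one above can be scripted. The following is a minimal sketch, assuming the summary lines always have the shape of the two quoted in this comment; the regular expression and the `parse_stats` helper are illustrative and not part of Mesa.

```python
import re

# Matches the summary line format quoted above (an assumption based on
# these two samples only, not on Mesa's source).
STATS_RE = re.compile(
    r"SIMD(?P<width>\d+) shader: (?P<instructions>\d+) instructions\. "
    r"(?P<loops>\d+) loops\. (?P<cycles>\d+) cycles\. "
    r"(?P<spills>\d+):(?P<fills>\d+) spills:fills\."
)


def parse_stats(line):
    """Extract the integer fields from one shader summary line."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError(f"not a shader summary line: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


old = parse_stats(
    "SIMD8 shader: 2413 instructions. 11 loops. 131452 cycles. "
    "0:0 spills:fills. Promoted 15 constants. "
    "Compacted 38608 to 27856 bytes (28%)"
)
new = parse_stats(
    "SIMD8 shader: 2388 instructions. 11 loops. 120307 cycles. "
    "0:0 spills:fills. Promoted 14 constants. "
    "Compacted 38208 to 27392 bytes (28%)"
)

# Report the change in the two fields that differ between the versions.
for key in ("instructions", "cycles"):
    delta = new[key] - old[key]
    print(f"{key}: {old[key]} -> {new[key]} ({delta / old[key] * 100:+.1f}%)")
```

Note that the cycle count is a static estimate; as stated above, it cannot predict the runtime of heavily branching code.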
For this test, the branches that get hit are the same on every run. Did you get the demo to run to verify the performance drop?
If you add "painter_use_uber_item_shader false" to the command line, that should make the shader much less uber-ish for analysis (though I confess I have not compared the benchmark numbers for this case yet).
Hi,

Apparently I added a show_framerate option which prints to stdout the average frame time across all frames. To use it, add "show_framerate true" to the command line. If one pulls (i.e. git commit 203b84c336c0c013cae670766182c5ea81cd0711 or newer), there is a "warm-up counter" that excludes the first N frames from the average.

-Kevin
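To illustrate why a warm-up counter matters, here is a small sketch of averaging frame times while skipping the first few frames. It is not fastuidraw's implementation; the function name, the parameter, and the sample values are all hypothetical.

```python
def average_frame_ms(frame_times_ms, warmup_frames=0):
    """Average frame time, ignoring the first `warmup_frames` samples."""
    measured = frame_times_ms[warmup_frames:]
    if not measured:
        raise ValueError("no frames left after warm-up")
    return sum(measured) / len(measured)


# Made-up samples: the first two frames are slow (e.g. shader compilation).
samples = [40.0, 22.0, 6.0, 6.0, 6.0, 6.0]

print(average_frame_ms(samples))                   # slow start-up frames included
print(average_frame_ms(samples, warmup_frames=2))  # steady-state frames only
```

With short runs, a few slow start-up frames can dominate the average, which is what the warm-up counter avoids.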
Hi guys,

Kevin, thanks for the tip - it works.

I've bisected Mesa between mesa-18.2.8 (785e09e3b3) and the latest master (04e672257c) on Skylake with Intel® HD Graphics 520. The bisect brought me to this commit:

commit a920979d4f30a48a23f8ff375ce05fa8a947dd96
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Fri Nov 16 10:46:27 2018 -0600

    intel/fs: Use split sends for surface writes on gen9+

    Surface reads don't need them because they just have the one address payload. With surface writes, on the other hand, we can put the address and the data in the different halves and avoid building the payload all together.

    The decrease in register pressure and added freedom in register allocation resulting from this change reduces spilling enough to improve the performance of one customer benchmark by about 2x.

    Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Bad commits had 60 FPS, good commits had 70 FPS on my machine.
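A bisect like this can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The sketch below only covers the classification step; the 65 FPS threshold is an assumed midpoint between the 60 and 70 FPS figures above, and how the FPS value is obtained from the benchmark is left out.

```python
# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(fps, threshold=65.0):
    """Map a measured FPS value to a `git bisect run` exit code.

    `fps` is None when the build or the benchmark failed, in which case
    the commit cannot be tested and should be skipped.
    """
    if fps is None:
        return SKIP
    return GOOD if fps >= threshold else BAD


print(bisect_exit_code(70.0))  # a commit without the regression
print(bisect_exit_code(60.0))  # a commit with the regression
print(bisect_exit_code(None))  # a commit that could not be measured
```

A wrapper script would build Mesa, run the benchmark, and pass the result of `bisect_exit_code` to `sys.exit`.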
Thank you for the work of finding the offending commit! I confess, though, that this leaves even more mysteries: the commit message states that the change is only for surface write messages, and the shaders in the benchmark should have surface writes only at the very end, when writing to the render target (dual-src).

Hopefully, someone from the Intel Mesa team will pick this up and investigate.
Hi guys,

It looks like this is a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=110344

Jason has described the full scope of the work there. I'm adding that ticket to the 'See Also' section.
*** This bug has been marked as a duplicate of bug 110344 ***
*** This bug has been marked as a duplicate of bug 109507 ***
*** This bug has been marked as a duplicate of bug 109517 ***