Summary: | Performance: extra&costly SSBO validation even when SSBO aren't used | ||
---|---|---|---|
Product: | Mesa | Reporter: | gregory.hainaut |
Component: | Drivers/DRI/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED FIXED | QA Contact: | Nouveau Project <nouveau> |
Severity: | normal | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Description
gregory.hainaut
2016-06-03 07:53:01 UTC
As a side note, I potentially have a similar behavior with shader images (st_bind_*_images). I need to double-check my engine, as I used them sometimes.

Hi Gregory,

Thanks for profiling Nouveau with perf, that's very nice. :-)

Well, if your application doesn't use SSBO's, nvc0_validate_buffers() should not be called, yeah. But this might happen when we switch between different contexts. Anyway, improving the validation path is on our todolist. :)

Well, according to your backtrace, nvc0_set_shader_buffers() is called and will dirty NVC0_NEW_3D_BUFFERS, which will then call nvc0_validate_buffers() at draw time.

I wonder why it's called if you are sure that your application doesn't use any SSBO's... Can you extract some shaders from your application to make sure no SSBO's are used? You can use NV50_PROG_DEBUG=1 for example (this will dump the TGSI code).

Hi Samuel,

> Thanks for profiling Nouveau with perf, that's very nice. :-)

Well, it is nice that I can do profiling :)

> Well, if your application doesn't use SSBO's, nvc0_validate_buffers()
> should not be called yeah. But this might happen when we switch between
> different contexts. Anyway, improving the validation path is on our todolist. :)

Yes, I'm sure. I don't know how to use SSBO.

> I wonder why it's called if you are sure that your application doesn't use
> any SSBO's...

In src/mesa/state_tracker/st_atom_storagebuf.c, the st_bind_*_ssbos structs contain the ST_NEW_*_PROGRAM flags. So every time you call glUseProgram (or the 4.1 pipeline equivalent), the flags will be asserted and a validation will be triggered. It is the same for images, with the st_bind_*_images structs in st_atom_image.c. It is nice for the performance.

> Can you extract some shaders from your application to make sure no SSBO's
> are used? You can use NV50_PROG_DEBUG=1 for example (this will dump the TGSI code).
All my shaders can be found in GLSL format (a bit of a mess of ifdefs, but no SSBO ;))
https://github.com/PCSX2/pcsx2/tree/master/plugins/GSdx/res/glsl

Here is an example (I'm not sure if it is the TGSI format).

FRAG
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL IN[1], GENERIC[3], PERSPECTIVE
DCL OUT[0], COLOR
DCL OUT[1], COLOR[1]
DCL SAMP[0]
DCL SAMP[1]
DCL SVIEW[0], 2D, FLOAT
DCL SVIEW[1], 2D, FLOAT
DCL CONST[1][0]
DCL CONST[2][0..1]
DCL CONST[3][0..1]
DCL CONST[4][0]
DCL CONST[5][0..1]
DCL CONST[6][0..7]
DCL CONST[7][0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 { 0.0000, 255.0000, 0.0500, 0.0078}
IMM[1] FLT32 { 0.0039, 0.0000, 0.0000, 0.0000}
0: MOV TEMP[0].xy, IN[1].xyyy
1: TEX TEMP[0].w, TEMP[0], SAMP[0], 2D
2: MOV TEMP[1].y, IMM[0].xxxx
3: MOV TEMP[1].x, TEMP[0].wwww
4: TRUNC TEMP[0], IN[0]
5: MOV TEMP[1].xy, TEMP[1].xyyy
6: TEX TEMP[1], TEMP[1], SAMP[1], 2D
7: MAD TEMP[1], TEMP[1], IMM[0].yyyy, IMM[0].zzzz
8: TRUNC TEMP[1], TEMP[1]
9: MUL TEMP[0], TEMP[0], TEMP[1]
10: MUL TEMP[0], TEMP[0], IMM[0].wwww
11: TRUNC TEMP[0], TEMP[0]
12: MIN TEMP[0], TEMP[0], IMM[0].yyyy
13: MUL TEMP[1], TEMP[0], IMM[1].xxxx
14: MUL TEMP[0].x, TEMP[0].wwww, IMM[0].wwww
15: MOV OUT[0], TEMP[1]
16: MOV OUT[1], TEMP[0].xxxx
17: END

Right ... other things deal with this by using the cso_cache (or the backend driver handles it). We probably should for this as well. Add a per-buffer dirty bit and only set it if it's actually changed. Or add it to the cso_context logic.

Thanks for the report. We will fix it.

Thank you.

I did a quick benchmark of my testcase:

raw GIT => Mean by frame: 32.083336ms (31.168831fps)
GIT + hack to remove the new program flags from SSBO and images => Mean by frame: 21.586538ms (46.325169fps)

Note: the testcase uses lots of shader binds, so I guess it is kind of a worst case for the perf.

I've pushed out some changes to nvc0 to reduce the overhead of updating ssbo/images. There are additional patches I've sent out to validate ssbo/images more often in the st (right now we miss some cases).
Let me know if the profile looks any better now.

I don't know about the reporter's case, but I have run some benchmarks and tests with f018456901ee291181ecce74c30b19c9f6731f06 (latest revision before those four patches) and fd6bbc2ee205ed02f66a8d8ef5b2adf4005d588c (the latest revision, with the four patches) on my GTX 770 + FX-8320 @ 4.1GHz, focusing on CPU-bound cases.

The results are all for the better - on most games I tested I see a 4-10% performance boost. I am only going to list a pair of highlights:

· Age of Wonders III, my own severely CPU-limited testcase: 21 fps -> 26 fps, a jump by a whopping 23.8% (still CPU-bound, though).
· Payday 2, well, this game has no [reproducible] way to benchmark it, but the gameplay used to be a nightmare filled with severe rubber-banding, running at just some 18-22 fps in many situations, all while painfully CPU-bound. Now, most of the rubber-banding is either gone or is a lot less noticeable. The framerate in these aforementioned situations went up to 25-60, dipping below 30 very rarely, while mostly maintaining over a 2x performance boost. Basically, these four patches made the game *playable* on nouveau. (The game is still very painfully CPU-bound, though.)

So, at least here, I can see clear performance benefits.
Will leave to be marked as RESOLVED by the reporter; don't want to hijack his issue.

Hello,

It is much better. I disabled my CPU turbo to reduce perf variation, hence the smaller values. I'm now around 33-34 fps with latest git. For reference, if I disable validation completely with a hack, I'm around 35-36 fps. It isn't completely free, but it feels good enough.

Maybe one can create a benchmark test that ping-pongs between 2 different programs (it could be the same one compiled twice).

Issue can be closed.

Hi Ilia,

You told me on IRC that you validate all SSBOs when one is updated. I suspect a similar pattern for UBOs, i.e. all UBOs are validated when one is updated.

Potentially validation is even done for all shader stages.
Anyway, I moved my UBO declarations around a bit to reduce the number of active UBOs for a draw call, and I managed to win a couple of fps (67 fps => 70 fps).

So it might be worth investigating the single SSBO/UBO bucket validation further.

(In reply to gregory.hainaut from comment #10)
> Hi Ilia,
>
> You told me on IRC that you validate all SSBOs when one is updated. I
> suspect a similar pattern for UBOs, i.e. all UBOs are validated when one is
> updated.

Nope. UBOs (and textures) have their individual validation "buckets".

>
> Potentially validation is even done for all shader stages. Anyway, I moved
> my UBO declarations around a bit to reduce the number of active UBOs for a
> draw call, and I managed to win a couple of fps (67 fps => 70 fps).
>
> So it might be worth investigating the single SSBO/UBO bucket validation
> further.

There are different stages of validation. It's all extremely confusing. st/mesa validates everything, because it has to - which UBO is bound where is based on program uniform settings:

binding = &st->ctx->UniformBufferBindings[shader->UniformBlocks[i]->Binding];

So if either of those is updated, we have to revalidate. However, there's a CSO cache backing UBOs, which will avoid propagating the set to the backend if nothing has changed. I don't think we can do much better than this without some much larger rejiggers.

Perhaps there are still some things we can do to speed up common scenarios like "there are no ubos" or "there are no ssbos" or "there are no images". But it doesn't seem immediately apparent to me.

Actually, what I saw is that all UBOs are validated when programs are switched. But I guess it is normal. I need to dig further.

Thanks for the fixes.
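For readers following along, the "per-buffer dirty bit" suggestion and the CSO-cache behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mesa or nouveau code: struct buffer_binding, struct binding_cache, cache_set_buffers() and cache_validate() are hypothetical names invented for this sketch, and MAX_SSBOS is an arbitrary slot count. The idea is that a program switch may re-submit the same bindings as often as it likes, but a slot is only marked dirty (and the backend only reached) when its binding actually differs from what was last recorded.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SSBOS 4 /* arbitrary slot count for this sketch */

/* Hypothetical stand-in for one shader buffer binding. */
struct buffer_binding {
    const void *resource; /* NULL means nothing is bound */
    unsigned offset;
    unsigned size;
};

/* Hypothetical cache of the bindings the backend last saw. */
struct binding_cache {
    struct buffer_binding slots[MAX_SSBOS];
    unsigned dirty_mask;    /* bit i set: slot i needs revalidation */
    unsigned backend_calls; /* how often a set reached the "backend" */
};

static bool binding_equal(const struct buffer_binding *a,
                          const struct buffer_binding *b)
{
    return a->resource == b->resource &&
           a->offset == b->offset &&
           a->size == b->size;
}

/* Called on every program switch; filters out redundant sets. */
static void cache_set_buffers(struct binding_cache *c,
                              const struct buffer_binding *bindings,
                              unsigned count)
{
    unsigned changed = 0;

    assert(count <= MAX_SSBOS);
    for (unsigned i = 0; i < count; i++) {
        if (!binding_equal(&c->slots[i], &bindings[i])) {
            c->slots[i] = bindings[i];
            changed |= 1u << i;
        }
    }
    if (changed) {
        c->dirty_mask |= changed;
        c->backend_calls++; /* only now would the driver be told */
    }
}

/* Draw-time validation: handles only dirty slots, then clears them.
 * Returns the number of slots that had to be processed. */
static unsigned cache_validate(struct binding_cache *c)
{
    unsigned processed = 0;

    for (unsigned i = 0; i < MAX_SSBOS; i++) {
        if (c->dirty_mask & (1u << i))
            processed++; /* a real driver would emit state here */
    }
    c->dirty_mask = 0;
    return processed;
}
```

With a zero-initialized cache, re-submitting an all-empty binding array on every program switch never sets a dirty bit in this sketch, so the draw-time validation has nothing to do. Binding one real buffer dirties exactly that slot, and submitting the same binding again is filtered out.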
(In reply to Gediminas Jakutis from comment #8)
> I don't know about the reporter's case, but I have run some benchmarks and
> tests with f018456901ee291181ecce74c30b19c9f6731f06 (latest revision before
> those four patches) and fd6bbc2ee205ed02f66a8d8ef5b2adf4005d588c (the latest
> revision, with the four patches) on my GTX 770 + FX-8320 @ 4.1GHz, focusing
> on CPU-bound cases.
>
> The results are all for the better - on most games I tested I see a 4-10%
> performance boost. I am only going to list a pair of highlights:
>
> · Age of Wonders III, my own severely CPU-limited testcase: 21 fps -> 26
> fps, a jump by a whopping 23.8% (still CPU-bound, though).
> · Payday 2, well, this game has no [reproducible] way to benchmark it, but
> the gameplay used to be a nightmare filled with severe rubber-banding, running
> at just some 18-22 fps in many situations, all while painfully CPU-bound. Now,
> most of the rubber-banding is either gone or is a lot less noticeable. The
> framerate in these aforementioned situations went up to 25-60, dipping below
> 30 very rarely, while mostly maintaining over a 2x performance boost.
> Basically, these four patches made the game *playable* on nouveau. (The game
> is still very painfully CPU-bound, though.)
>
> So, at least here, I can see clear performance benefits.
> Will leave to be marked as RESOLVED by the reporter; don't want to hijack
> his issue.

I saw the same thing with PAYDAY 2, but I couldn't restore the low perf, so I guess they just reworked their engine while they added the SMAA and SSAO thing, so I doubt those patches had anything to do with that :/