Summary: [GEN8+] up to 10% perf drop on several 3D benchmarks
Product: Mesa
Component: Drivers/DRI/i965
Version: git
Hardware: Other
OS: All
Status: VERIFIED FIXED
Severity: normal
Priority: medium
Keywords: bisected, regression
Reporter: Eero Tamminen <eero.t.tamminen>
Assignee: Jason Ekstrand <jason>
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
CC: agomez, mark.a.janes
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=107706, https://bugs.freedesktop.org/show_bug.cgi?id=107743
Bug Blocks: 109535
Description (Eero Tamminen, 2018-08-07 11:42:34 UTC)
Bisected to the series ending in:

-------------------------------------
commit de9e5cf35a1fd0c66288e49c30dc100ad3e1d3aa
Author: Jason Ekstrand <jason@jlekstrand.net>

    anv/pipeline: Add populate_tcs/tes_key helpers

    They don't really do anything interesting, but it's more consistent
    this way.

    Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
-------------------------------------

More than one commit in the series drops the tess benchmark score.

---

Mark Janes (comment #2):

More precise bisection:

commit 4434591bf56a6b0c193ef209ea9d7b9e3c95a522
Author:     Jason Ekstrand <jason@jlekstrand.net>
AuthorDate: Tue Jul 31 06:16:34 2018 -0700
Commit:     Jason Ekstrand <jason@jlekstrand.net>
CommitDate: Wed Aug 1 18:02:28 2018 -0700
Parent:     b0bb547f782 intel/nir: Split IO arrays into elements
Containing: master perf
Follows:    18.1-branchpoint (2138)
Precedes:   mesa-18.2.0-rc1 (15)

    intel/nir: Call nir_lower_io_to_scalar_early

    Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
    instructions in affected programs: 2390284 -> 2296942 (-3.91%)
    helped: 16469
    HURT: 505

    total loops in shared programs: 4954 -> 4951 (-0.06%)
    loops in affected programs: 3 -> 0
    helped: 3
    HURT: 0

    Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

---

(In reply to Mark Janes from comment #2)
> more precise bisection:
> ...
> intel/nir: Call nir_lower_io_to_scalar_early

Thanks! Does this also explain the performance drops in the other tests? The GfxBench Tessellation test is tessellation shader bound, SynMark GSCloth is a transform feedback / streamout / geometry shader test, GpuTest Julia FP64 is a fragment shader ALU test, GfxBench ALU2 is an ALU & texture access scheduling test, and Heaven is bound by overall fragment shader performance.

PS. Any idea whether something is going to be done about regression bug 107223, which seems to cause E2E RBC to get disabled in several tests?

---

Jason has posted a revert for the regressing commit:
https://patchwork.freedesktop.org/patch/243409/

"Commit 4434591bf56a6b0 caused substantially more URB messages in geometry
and tessellation shaders. Before we can really enable this sort of
optimization, we either need some way of combining them back together into
vectors or we need to do cross-stage vector element elimination without
splitting everything into scalars."

This explains the GfxBench Tessellation, SynMark GSCloth and potentially also the Unigine Heaven+tessellation regressions. However, GfxBench ALU2, GpuTest Julia FP64 and SynMark PSPom don't use those features, so I wonder whether their (small) regressions come from some other commit... Mark?

---

The i965 Mesa performance CI was reduced in scope a while back, and has not included many of these workloads. I've been working to add them to the standard execution. It takes time, but I'll be in a good position to evaluate all the workloads in the next day or two.

---

I reverted the patch:

commit 10f44da775a69561c77438507298363ff4eeb65d
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Aug 8 12:00:55 2018 -0700

    Revert "intel/nir: Call nir_lower_io_to_scalar_early"

    Commit 4434591bf56a6b0 caused substantially more URB messages in
    geometry and tessellation shaders. Before we can really enable this
    sort of optimization, we either need some way of combining them back
    together into vectors or we need to do cross-stage vector element
    elimination without splitting everything into scalars.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
    Fixes: 4434591bf56a6 "intel/nir: Call nir_lower_io_to_scalar_early"
    Acked-by: Kenneth Graunke <kenneth@whitecape.org>
    Tested-by: Mark Janes <mark.a.janes@intel.com>
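To make the revert rationale concrete, here is a minimal, hypothetical C sketch of the message-count effect it describes; this is not Mesa code, and the struct and function names are invented for illustration. A vec4 tessellation/geometry output that is kept as a vector can be stored with a single URB write message, while the scalarized form needs one message per component unless a later pass recombines them.

```c
/* Hypothetical illustration only, not Mesa code: models the message-count
 * effect described in the revert.  A vec4 output written as a vector costs
 * one URB write message; the same output split into scalars costs one
 * message per written component. */
#include <stdio.h>

struct output_slot {
   int components;   /* components written in this vec4 output slot */
   int scalarized;   /* nonzero if an early pass split it into scalars */
};

static int urb_write_messages(const struct output_slot *slots, int count)
{
   int msgs = 0;
   for (int i = 0; i < count; i++)
      msgs += slots[i].scalarized ? slots[i].components : 1;
   return msgs;
}

int main(void)
{
   /* Four per-vertex vec4 outputs, e.g. position plus three varyings. */
   struct output_slot as_vectors[4] = { {4, 0}, {4, 0}, {4, 0}, {4, 0} };
   struct output_slot as_scalars[4] = { {4, 1}, {4, 1}, {4, 1}, {4, 1} };

   printf("vector URB writes: %d\n", urb_write_messages(as_vectors, 4)); /* 4  */
   printf("scalar URB writes: %d\n", urb_write_messages(as_scalars, 4)); /* 16 */
   return 0;
}
```

This extra URB traffic is the overhead that the re-vectorization work discussed later in this bug aims to remove.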
---

Verified.

---

Mark Janes (comment #8):

This regression appears to be back, somewhere between 6027d354d1d and 45b5f5fa25b.

---

Timothy Arceri (comment #9):

(In reply to Mark Janes from comment #8)
> this regression appears to be back, somewhere between 6027d354d1d and
> 45b5f5fa25b

It was re-enabled [1].

I'm currently working on a pass to recombine things back into vectors, which should solve this, as it seems we might have similar issues with the AMD NIR backends.

[1] https://cgit.freedesktop.org/mesa/mesa/commit/?id=8d8222461f9d7f497d657c2c0eff70820986429b

---

Ugh... I didn't intend to turn io_to_scalar_early back on in that patch. Yeah, we have some serious issues with making everything scalar for geometry and other stages. We should probably disable it again unless doing so hurts something badly.

---

Eero Tamminen (comment #11):

(In reply to Timothy Arceri from comment #9)
> (In reply to Mark Janes from comment #8)
> > this regression appears to be back, somewhere between 6027d354d1d and
> > 45b5f5fa25b
>
> It was re-enabled [1].
>
> I'm currently working on a pass to recombine things back into vectors that
> should solve this as it seem we might have similar issues with the AMD NIR
> backends.
>
> [1] https://cgit.freedesktop.org/mesa/mesa/commit/?id=8d8222461f9d7f497d657c2c0eff70820986429b

I filed new bug 107706 for that 2 months ago; there's no need to re-open this.

---

*** Bug 107706 has been marked as a duplicate of this bug. ***

---

Timothy Arceri:

(In reply to Eero Tamminen from comment #11)
> I filed new bug 107706 for that 2 months ago, there's no need to re-open
> this.

It's the exact same bug, and since this one has the relevant information I've marked that one as a duplicate.

I've written a new pass which fixes the regression in GfxBench Tessellation; I'm just finishing some testing. Hopefully I will get some patches on the list tomorrow. [1]

[1] https://gitlab.freedesktop.org/tarceri/mesa/commits/vectorize_io_v2

---

Eero Tamminen (comment #14):

Because exactly the same tests were affected both times this perf regression was introduced to Mesa, I think it's also the cause of the small regressions in the listed fragment shader bound tests:
- GpuTest v0.7 Julia FP64
- GfxBench ALU2
- Unigine Heaven
- SynMark PSPom

Not just of the larger regressions in the 2 tessellation & geometry shader tests.

Of these, the PSPom regression was fixed between these commits:
18cc65edf8: 2018-10-15 17:56:12 i965: Drop assert about number of uniforms in ARB handling
322a919a41: 2018-10-16 11:47:55 anv: Implement VK_EXT_pci_bus_info

(My guess is that the "nir: Copy propagation between blocks" commit fixed it.)

Timothy's latest patch series:
https://patchwork.freedesktop.org/series/51642/

It fixes just the larger tessellation & geometry shader regressions, not the smaller fragment shader ones (expected, as the latest version disables the optimization pass for the fragment stage).

---

Timothy Arceri (comment #15):

(In reply to Eero Tamminen from comment #14)
> Because exactly same tests were affected both times this perf regression was
> introduced to Mesa, I think it's also the cause for the small regressions in
> the listed fragment shader bound tests:
> - GpuTest v0.7 Julia FP64
> - GfxBench ALU2
> - Unigine Heaven
> - SynMark PSPom
>
> Not just for the larger regressions in the 2 Tessellation & Geometry shader
> tests.
> Of these, PSPom regression was fixed between these commits:
> 18cc65edf8: 2018-10-15 17:56:12 i965: Drop assert about number of uniforms
> in ARB handling
> 322a919a41: 2018-10-16 11:47:55 anv: Implement VK_EXT_pci_bus_info
>
> (My guess is that "nir: Copy propagation between blocks" commit fixed it.)
>
> Timothy's latest patch series:
> https://patchwork.freedesktop.org/series/51642/
>
> Fixes just the larger tessellation & geometry shader regressions, not the
> smaller fragment shader ones (expected as latest version disables the
> optimization pass for fragment stage).

Splitting and/or vectorising FS inputs seems to cause the scheduler to vary wildly. I'm seeing up to +/-50% swings in cycle counts in both directions. I'm not sure what to do about this.

---

(In reply to Timothy Arceri from comment #15)
> Splitting and/or vectorising FS inputs seems to case the scheduler to vary
> wildly. I'm seeing up to +/-50% swings in cycle counts in both directions.

I don't trust scheduler cycle counts much for shaders doing sampling. Texture fetches have the most impact on perf/cycles, but when I last looked at this:

* Cycle counts are obviously wrong for SIMD32 (when comparing reported cycle changes to the actual perf change), so I don't think latency compensation from threading is taken into account properly.

* I'm not sure how well the cache impact of texture fetches is taken into account either. (On average the first fetch for a given sampler takes more cycles than successive ones, and grouping a few fetches to the same sampler together, while distancing such groups from each other, uses fewer cycles as that minimizes cache thrashing. Windows driver produced shader assembly looks on average noticeably better in this respect than i965 produced assembly.)

> I'm not sure what to do about this.

IMHO the i965 instruction scheduler cycle accounting and texture fetch scheduling would need to be improved. Not sure who can do that, though. I think that for this bug, your patch is enough.

---

(In reply to Eero Tamminen from comment #14)
> Timothy's latest patch series:
> https://patchwork.freedesktop.org/series/51642/
>
> Fixes just the larger tessellation & geometry shader regressions, not the
> smaller fragment shader ones (expected as latest version disables the
> optimization pass for fragment stage).

Timothy, is there something preventing merging of this? I didn't see any regressions from your patch, just improvements in the regressed cases.

---

*** Bug 107743 has been marked as a duplicate of this bug. ***

---

Up to ~20% perf drops in Sascha Willems' Vulkan Terrain / tessellation tests, listed here:
https://share.lunarg.com/vulkan/report/2957#performance

The loop & replay variants (not loadtime) have regressed since Mesa 18.1.1 and seem likely to be caused by this too.

---

Kenneth Graunke (comment #20):

There are too many things being tracked in this bug for me to know whether it's fixed or not.

Vulkan regressions are likely fixed by 39aee57523a02552e7eae7df5da488e535aeb1eb (anv: Put MOCS in the correct location). But that wouldn't affect any GL programs, so any GL regressions may still exist.

I think Jason meant this to block the 19.0 release, instead of depending on it.

---

(In reply to Kenneth Graunke from comment #20)
> There are too many things being tracked in this bug for me to know whether
> it's fixed or not.
This bug tracks just the regression coming from this commit:

-------------------------------------------------
commit 8d8222461f9d7f497d657c2c0eff70820986429b
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jul 23 22:20:41 2018 -0700

    intel/nir: Enable nir_opt_find_array_copies
-------------------------------------------------

which re-introduces the same issue that was fixed by Jason's revert of the "intel/nir: Call nir_lower_io_to_scalar_early" commit.

Timothy's patch series fixes the issue (but is explicitly limited to tess/geom shaders), with the following explanation:

"Commit 8d8222461f9d7f49 caused substantially more URB messages in geometry and tessellation shaders (due to enabling nir_lower_io_to_scalar_early). This combines io again to avoid this regression while still allowing link time optimisation of components."

> Vulkan regressions are likely fixed by
> 39aee57523a02552e7eae7df5da488e535aeb1eb (anv: Put MOCS in the correct
> location).

If your commit didn't fix them, IMHO this bug is the most likely explanation for the tessellation regression.

---

I've got an MR to fix this: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/409

---

Jason Ekstrand (comment #24):

This is fixed by the following commit, now in master:

commit 6d5d89d25a0a4299dbfcbfeca71b6c7e65ef3d45 (HEAD -> master, origin/master, origin/HEAD)
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Mar 7 15:01:37 2019 -0600

    intel/nir: Vectorize all IO

    The IO scalarization pass that we run to help with linking ends up
    turning some shader I/O such as that for tessellation and geometry
    shaders into many scalar URB operations rather than one vector one.
    To alleviate this, we now vectorize the I/O once again.  This fixes a
    10% performance regression in the GfxBench tessellation test that was
    caused by scalarizing.

    Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

    Cycles was just a lot of churn due to moves being in different places.
    Most of the pure churn in instructions was +/- one or two instructions
    in fragment shaders.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
    Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
    Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
    Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

---

(In reply to Eero Tamminen from comment #0)
> On SKL GT2, the regressions are following:
> - 10% GfxBench Tessellation (onscreen & offscreen)
> - 5% SynMark2 GSCloth
> - 2-3% GpuTest Julia FP64
> - 1-2% Unigine Heaven (high quality with tessellation, no AA)
> - 1% GfxBench ALU2 (onscreen & offscreen), SynMark PSPom

(In reply to Jason Ekstrand from comment #24)
> This is fixed by the following commit now in master:
> ...
> intel/nir: Vectorize all IO

All the other tests improved by the amount they had regressed, except for these two, which were not affected by the fix:
- GpuTest Julia FP64
- SynMark PSPom

I assume the PSPom drop was already fixed in October, with the "nir: Copy propagation between blocks" commit (but it regressed last month; I'll file a separate bug for that). The FP64 case isn't really relevant.

VERIFIED.

---

The fix for this (or a commit near it) also improved FurMark by ~1%.
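As a footnote on what "vectorize the I/O once again" means in practice, below is a minimal, hypothetical sketch of the underlying idea; it is plain C rather than the actual NIR pass, and every type and function name in it is invented for illustration. Consecutive scalar component writes that target the same output slot are coalesced back into a single masked vector write, bringing the URB message count back down.

```c
/* Hypothetical sketch only, not the actual NIR pass: greedily merges runs of
 * scalar component writes to the same output slot back into one masked
 * vector write, which is the idea behind the "Vectorize all IO" fix. */
#include <stdio.h>

struct scalar_write {
   int slot;        /* which vec4 output slot is written */
   int component;   /* 0..3 within that slot */
   float value;
};

struct vector_write {
   int slot;
   unsigned mask;      /* bit i set if component i is written */
   float value[4];
};

/* Coalesce consecutive scalar writes that target the same slot. */
static int vectorize(const struct scalar_write *in, int n,
                     struct vector_write *out)
{
   int nout = 0;
   for (int i = 0; i < n; i++) {
      if (nout == 0 || out[nout - 1].slot != in[i].slot) {
         out[nout] = (struct vector_write){ .slot = in[i].slot, .mask = 0 };
         nout++;
      }
      out[nout - 1].mask |= 1u << in[i].component;
      out[nout - 1].value[in[i].component] = in[i].value;
   }
   return nout;
}

int main(void)
{
   /* Eight scalar writes (two vec4 outputs written component by component)
    * collapse into two masked vector writes. */
   struct scalar_write in[8] = {
      {0, 0, 1.0f}, {0, 1, 0.0f}, {0, 2, 0.0f}, {0, 3, 1.0f},
      {1, 0, 0.5f}, {1, 1, 0.5f}, {1, 2, 0.5f}, {1, 3, 1.0f},
   };
   struct vector_write out[8];

   int n = vectorize(in, 8, out);
   printf("%d scalar writes -> %d vector writes\n", 8, n); /* 8 -> 2 */
   return 0;
}
```

Under these assumptions, the eight scalar stores become two vector stores, mirroring the reduction in URB write messages that the fix restores.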