All mesa-17.3.0-rcX release candidates refuse to compile compute shaders that allocate more than 32kB of shared memory, failing with:
error: Too much shared memory used (65536/32768)
This occurs on a Radeon RX 480 dGPU and on an FX-8800P APU, both with 64kB LDS (radeonsi driver). On that hardware, all previous versions, including mesa-17.2.5, compile and run shaders with up to 64kB of shared memory.
This is working as intended; the limit has been 32k for a while now. What changed is that we added error checking in https://cgit.freedesktop.org/mesa/mesa/commit/?id=a2c8812f919c59933605c5942d6613e14ec8b3d1
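For readers following along, the kind of check described above can be sketched as follows. This is not the Mesa code from the linked commit; the function and constant names are hypothetical, and only the 32kB limit and the error message format are taken from this report.

```python
# Illustrative only: not Mesa source code. All names here are hypothetical.
MAX_COMPUTE_SHARED_MEMORY_SIZE = 32 * 1024  # limit cited in this thread, in bytes


def check_shared_memory(used_bytes, limit_bytes=MAX_COMPUTE_SHARED_MEMORY_SIZE):
    """Return an error string if the shader exceeds the limit, else None."""
    if used_bytes > limit_bytes:
        return f"error: Too much shared memory used ({used_bytes}/{limit_bytes})"
    return None


# A shader declaring 16384 32-bit floats of shared storage uses 64kB.
shared_floats = 16384
used = shared_floats * 4

print(check_shared_memory(used))             # rejected under a 32kB limit
print(check_shared_memory(used, 64 * 1024))  # accepted under a 64kB limit
```

Without such a check, a shader over the advertised limit would simply have been passed on to the driver, which matches the behaviour reported for mesa-17.2.5.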
I checked back with AMD and they say that there is hardware support for 64kB shared memory. This actually gets exposed in OpenCL and ROCm/HC. The 2016 AMD GCN3 manual wrongly states it is only 32kB, but the AMD Vega ISA reference guide has the correct 64kB LDS size. So it seems the restriction is due to the mesa software stack, not due to hardware. Could you please raise the bar back to 64kB? I wouldn't ask if it didn't hit me with a 2x performance penalty...
Bas correctly explained what's going on. We could consider raising the limit for compute shaders, though. May I ask what your use case is?
Many thanks for your consideration - That would be great! Our application is real time image processing and display. The main computing effort is delay-and-sum beamforming. Source and destination data sets are much larger than LDS so we need to do block processing. This causes additional LDS <-> GPURAM traffic essentially inversely proportional to block (LDS) size. A 32k vs 64k limit (and thus twice the RAM traffic) therefore causes quite a performance hit. Anyway: Mesa's OpenGL performance on AMD Radeons is quite amazing. Great job!
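The inverse relationship between block size and RAM traffic can be illustrated with a toy model. It is only a sketch, not the reporter's beamforming kernel: it assumes the whole input is re-read from GPU RAM once per output block and that one output block fills the LDS. The data set sizes and the function name are made up for illustration.

```python
# Toy model, not the actual kernel: every name and size here is an assumption.
def global_traffic_bytes(output_bytes, input_bytes, block_bytes):
    """Bytes read from GPU RAM if the full input is re-read once per output block."""
    num_blocks = -(-output_bytes // block_bytes)  # ceiling division
    return num_blocks * input_bytes


output_bytes = 16 * 1024 * 1024  # hypothetical destination data set
input_bytes = 64 * 1024 * 1024   # hypothetical source data set

t32 = global_traffic_bytes(output_bytes, input_bytes, 32 * 1024)
t64 = global_traffic_bytes(output_bytes, input_bytes, 64 * 1024)

print(t32 / t64)  # 2.0: halving the block size doubles the traffic
```

Under these assumptions the traffic scales as 1/block size, so a 32kB limit costs twice the reads of a 64kB limit, in line with the 2x penalty mentioned earlier in the thread.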
Created attachment 136272: Enable 64kB LDS on AMD Radeon GPUs. Applies to mesa-17.3.0 and its release candidates.