Bug 103697

Summary: [regression] shared memory size 64k -> 32k?
Product: Mesa
Reporter: O Heid <oliver.heid>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED NOTABUG
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: 17.3   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Enable 64kB LDS on AMD Radeon GPUs

Description O Heid 2017-11-12 11:30:24 UTC
All mesa-17.3.0-rcX release candidates balk at compiling compute shaders which allocate more than 32kB shared memory:
    error: Too much shared memory used (65536/32768)
This occurs on Radeon RX 480 dGPU and on FX-8800P APU, both with 64kB LDS (radeonsi driver).
On that hardware, all previous versions, including mesa-17.2.5, compile and run shaders with up to 64kB of shared memory.
Comment 1 Bas Nieuwenhuizen 2017-11-12 22:39:39 UTC
This is working as intended; the limit has been 32k for a while now.

What changed was that we added error checking in

https://cgit.freedesktop.org/mesa/mesa/commit/?id=a2c8812f919c59933605c5942d6613e14ec8b3d1
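
As a rough illustration of the kind of check described in this comment, the Python sketch below rejects a shader whose shared memory usage exceeds a fixed limit and formats the message the way the reported error appears. It is not Mesa's actual code: the function name `check_shared_memory` and the constant `MAX_SHARED_BYTES` are hypothetical, and only the message format and the 32k limit are taken from this thread.

```python
# Hypothetical sketch only; this is NOT Mesa's implementation.
# The 32k limit and the message format are taken from this bug report.
MAX_SHARED_BYTES = 32 * 1024


def check_shared_memory(used_bytes, limit_bytes=MAX_SHARED_BYTES):
    """Return None if the usage fits the limit, otherwise an error string."""
    if used_bytes > limit_bytes:
        return "error: Too much shared memory used (%d/%d)" % (used_bytes, limit_bytes)
    return None


# A shader declaring 16384 four-byte floats as shared needs 64 kB.
print(check_shared_memory(16384 * 4))
# prints: error: Too much shared memory used (65536/32768)
```

Before such a check exists, an over-limit shader is simply not rejected at this stage, which would match the change in behaviour described above.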
Comment 2 O Heid 2017-11-13 15:33:59 UTC
I checked back with AMD and they say that there is hardware support for 64kB shared memory. This actually gets exposed in OpenCL and ROCm/HC.
The 2016 AMD GCN3 manual wrongly states it is only 32kB, but the AMD Vega ISA reference guide has the correct 64kB LDS size.
So it seems the restriction is due to the mesa software stack, not due to hardware. Could you please raise the bar back to 64kB? I wouldn't ask if it didn't hit me with a 2x performance penalty...
Comment 3 Nicolai Hähnle 2017-11-28 14:54:04 UTC
Bas explained what's going on correctly. We could consider raising the limit for compute shaders though. May I ask what your use case is?
Comment 4 O Heid 2017-11-28 16:41:38 UTC
Many thanks for your consideration - That would be great!
Our application is real time image processing and display. The main computing effort is delay-and-sum beamforming.
Source and destination data sets are much larger than LDS so we need to do block processing. This causes additional LDS <-> GPURAM traffic essentially inversely proportional to block (LDS) size. A 32k vs 64k limit (and thus twice the RAM traffic) therefore causes quite a performance hit.
Anyway: Mesa's OpenGL performance on AMD Radeons is quite amazing. Great job!
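
The block-processing argument in Comment 4 can be put into numbers with a small Python sketch. It assumes the simplified model stated there, namely that the extra LDS <-> GPU RAM traffic is inversely proportional to the usable LDS block size. The function name `relative_traffic` is hypothetical, and real traffic also depends on details of the beamforming kernel that are not given in this thread.

```python
# Simplified model (an assumption): traffic is proportional to 1 / block size.
def relative_traffic(lds_bytes, reference_lds_bytes=64 * 1024):
    """Traffic relative to a reference LDS size under the 1/size model."""
    return reference_lds_bytes / lds_bytes


for kb in (64, 32):
    print(kb, relative_traffic(kb * 1024))
# prints:
# 64 1.0
# 32 2.0
```

Under this model, halving the usable LDS from 64k to 32k doubles the traffic, which is the factor of two the reporter refers to.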
Comment 5 O Heid 2017-12-19 08:45:58 UTC
Created attachment 136272
Enable 64kB LDS on AMD Radeon GPUs

applies to mesa-17.3.0 and its release candidates
