Created attachment 123804: compute shader and GLSL/TGSI/LLVM IR.

After fixing the two GLSL bugs, I hit a case where one of the compute shaders in ElementalDemo takes a very long time to compile and consumes a lot of memory doing so.

#0 llvm::SUnit::addPred (this=this@entry=0x7fa8da02fae0, D=..., Required=Required@entry=true) at /home/airlied/devel/llvm/lib/CodeGen/ScheduleDAG.cpp:67
#1 0x00007fa92c946d2d in llvm::ScheduleDAGInstrs::addPhysRegDataDeps (this=this@entry=0x7fa9248d3840, SU=SU@entry=0x7fa8d9833c40, OperIdx=OperIdx@entry=8) at /home/airlied/devel/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp:318
#2 0x00007fa92c947198 in llvm::ScheduleDAGInstrs::addPhysRegDeps (this=this@entry=0x7fa9248d3840, SU=SU@entry=0x7fa8d9833c40, OperIdx=OperIdx@entry=8) at /home/airlied/devel/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp:370
#3 0x00007fa92c94fd1e in llvm::ScheduleDAGInstrs::buildSchedGraph (this=this@entry=0x7fa9248d3840, AA=0x7fa8e2e55b10, RPTracker=RPTracker@entry=0x0, PDiffs=PDiffs@entry=0x0, LIS=LIS@entry=0x0, TrackLaneMasks=TrackLaneMasks@entry=false) at /home/airlied/devel/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp:959
#4 0x00007fa92c8ff02b in (anonymous namespace)::SchedulePostRATDList::schedule (this=this@entry=0x7fa9248d3840) at /home/airlied/devel/llvm/lib/CodeGen/PostRASchedulerList.cpp:391
#5 0x00007fa92c8ffbbe in (anonymous namespace)::PostRAScheduler::runOnMachineFunction (this=0x7fa8e284a500, Fn=...) at /home/airlied/devel/llvm/lib/CodeGen/PostRASchedulerList.cpp:360
#6 0x00007fa92c880721 in llvm::MachineFunctionPass::runOnFunction (this=0x7fa8e284a500, F=...) at /home/airlied/devel/llvm/lib/CodeGen/MachineFunctionPass.cpp:60
Last I looked at the Elemental demo, we had multiple issues:

 - The shared array (1024 elements) gets wrongly promoted to a private array. There is a fix for that at https://lists.freedesktop.org/archives/mesa-dev/2016-April/113832.html
 - In radeonsi we compile arrays to vectors with insert/extract element. This pretty much results in the array being SSA-versioned, which results in a very large program.
 - A 1024-element vector does not fit in 256 VGPRs, so LLVM tries to load and spill around every operation, and therefore every versioned array element takes scratch space.
 - As a result I needed 7 MiB of scratch space per wave, or 6.7 GiB in total. This overflows the 32-bit buffer size, and we only allocate a smaller buffer.
 - This resulted in hangs (or maybe very long shader execution times, not really sure...).

So I'm not sure how long "a long long time" is here. Last I tried (which admittedly was some weeks ago) I could certainly get past the compilation stage. If you did get past that and did not get hangs, I'm not sure why.

Fixing the first problem also circumvents problems 2 and 3, although it would be nice to get those fixed as well.
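The scratch-size arithmetic above can be checked with a short sketch. The 7 MiB per wave, the 6.7 GiB total, the 1024-element array and the 256 VGPR limit come from the comment; the wave count of 980 is an assumption picked only so that the product lands near the reported total, and the masking step is just one illustration of how a 32-bit size field could end up describing a smaller buffer. It is not a claim about what the driver actually does.

```python
MIB = 1 << 20
GIB = 1 << 30
UINT32_MAX = (1 << 32) - 1  # largest value a 32-bit size field can hold

ARRAY_ELEMENTS = 1024  # shared array size from the comment
VGPR_LIMIT = 256       # VGPRs available, from the comment


def total_scratch_bytes(bytes_per_wave: int, wave_count: int) -> int:
    """Total scratch needed if every wave gets its own scratch area."""
    return bytes_per_wave * wave_count


def fits_in_32bit_size(size_bytes: int) -> bool:
    """True if the size can be stored in an unsigned 32-bit field."""
    return size_bytes <= UINT32_MAX


def truncated_to_32bit(size_bytes: int) -> int:
    """What remains if the size is stored in 32 bits without a range check."""
    return size_bytes & UINT32_MAX


SCRATCH_PER_WAVE = 7 * MIB  # from the comment
ASSUMED_WAVES = 980         # hypothetical: chosen to approximate 6.7 GiB

total = total_scratch_bytes(SCRATCH_PER_WAVE, ASSUMED_WAVES)

print(f"array fits in VGPRs: {ARRAY_ELEMENTS <= VGPR_LIMIT}")
print(f"total scratch: {total / GIB:.2f} GiB")
print(f"fits in 32-bit size: {fits_in_32bit_size(total)}")
print(f"size after 32-bit truncation: {truncated_to_32bit(total) // MIB} MiB")
```

Under these assumptions the total is well above the 4 GiB that a 32-bit size can describe, which is consistent with the too-small allocation described above.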
https://patchwork.freedesktop.org/patch/82260/

Looks like Bas's patch here actually fixes this. Sorry for the noise.
With your patch and my two on LLVM master and Mesa master, I'm running ElementalDemo in 4.3 mode now.
(In reply to Dave Airlie from comment #3)
> with your patch and my two on llvm master and mesa master I'm running
> elementaldemo in 4.3 mode now.

So can we close the bug?
Closing as the scratch buffer issue was fixed.