AMD JUNIPER (DRM 2.43.0 / 4.6.0)
Mesa 12.1.0-devel (git-3581812)
clpeak - https://github.com/krrishnarraj/clpeak.git
As soon as it starts its float8 test (earlier ones run fine), the machine locks up and does not recover. Perhaps it attempts to execute some fp64 instructions that are missing on Juniper?
(In reply to Grazvydas Ignotas from comment #0)
> AMD JUNIPER (DRM 2.43.0 / 4.6.0)
> Mesa 12.1.0-devel (git-3581812)
> llvm-3.8 1:3.8-2ubuntu3
> clpeak - https://github.com/krrishnarraj/clpeak.git
> As soon as it starts its float8 test (earlier ones run fine), the machine
> locks up and does not recover. Perhaps it attempts to execute some fp64
> instructions that are missing on Juniper?
Any attempt to use doubles should fail to build the kernel (even with llvm 3.8).
Running with CLOVER_DEBUG=llvm,asm CLOVER_OUTPUT=out_file should give you an idea about what the compiled program looks like, though I'd recommend using llvm 3.9.
Created attachment 124221
OK, so it's the memory bandwidth test that causes the GPU hang; --compute-dp fails with "No double precision support! Skipped", as expected.
llvm 3.9 doesn't seem to be released yet, so I've built trunk, but the hang is still there. I was able to capture the logs before the system died; they are attached.
BTW, CLOVER_OUTPUT doesn't seem to be handled; did you mean CLOVER_DEBUG_FILE?
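For anyone reproducing this, here is a minimal Python sketch of running a program with the Clover debug variables set. It assumes CLOVER_DEBUG_FILE is the variable that is actually honored, as suggested above. The helper name run_with_clover_debug and its parameters are made up for illustration and are not part of Mesa or clpeak.

```python
import os
import subprocess


def run_with_clover_debug(cmd, dump_path, flags="llvm,asm", timeout=None):
    """Run cmd with the Clover debug variables set in its environment.

    Assumption: CLOVER_DEBUG selects what to dump and CLOVER_DEBUG_FILE
    says where the dump goes. The helper itself is hypothetical.
    """
    env = dict(os.environ)
    env["CLOVER_DEBUG"] = flags
    env["CLOVER_DEBUG_FILE"] = dump_path
    # timeout only guards against a stuck process; it cannot help if the
    # whole machine locks up.
    return subprocess.run(
        cmd, env=env, timeout=timeout, capture_output=True, text=True
    )
```

Something like run_with_clover_debug(["clpeak"], "clover.out", timeout=600) would be the intended use, assuming clpeak is on PATH. Since the machine may lock up, it is worth syncing the dump to disk or writing it to a network share.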
Created attachment 124375
global_bandwidth_v16_local_offset asm dump
One problem is that, starting from R700, ADD_INT is a VecALU-only instruction (it should not be in the Trans slot), but fixing that was not enough to fix the hang on my Turks.
Using llvm 4.0.1 and the latest git commit from libclc (17648cd846390e294feafef21c32c7106eac1e24):
I am getting an endless CPU loop with clpeak, which can be interrupted with Ctrl+C.
Other samples, such as Matrix Multiply, work fine.
CLOVER_DEBUG=llvm,asm,clc CLOVER_OUTPUT=clover.out clpeak >dump 2>dump.err
Created attachment 130914
AMD PALM (DRM 2.49.0 / 4.10.0-qtec-standard, LLVM 4.0.1) + MESA 17.0.3
Got this today. No hang.
Device: AMD TURKS (DRM 2.49.0 / 4.11.11-300.fc26.x86_64, LLVM 6.0.0)
  Driver version  : 17.3.0-devel (Linux x64)
  Compute units   : 6
  Clock frequency : 650 MHz

  Global memory bandwidth (GBPS)
    float   : 40.47
    float2  : 41.01
    float4  : 38.05
    float8  : 25.09
    float16 : 13.33

  Single-precision compute (GFLOPS)
    float   : 124.18
    float2  : 243.14
    float4  : 249.80
    float8  : 285.99
    float16 : 350.36

  No double precision support! Skipped

  Integer compute (GIOPS)
    int   : 62.25
    int2  : 122.03
    int4  : 123.01
    int8  : 122.29
    int16 : 122.11

  Transfer bandwidth (GBPS)
    enqueueWriteBuffer         : 18.15
    enqueueReadBuffer          : 3.06
    enqueueMapBuffer(for read) : 6.53
    memcpy from mapped ptr     : 5.65
    enqueueUnmap(after write)  : 2108.68
    memcpy to mapped ptr       : 7.49

  Kernel launch latency : 67.10 us
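To compare runs like the one above across Mesa/LLVM versions, the numeric lines can be pulled into a dictionary. This is only a rough sketch that assumes the "name : value" layout shown in this report. parse_clpeak and the "standalone" key are names invented here, not anything clpeak provides.

```python
import re

# "name : number [unit]" lines, e.g. "float8 : 25.09" or "Clock frequency : 650 MHz".
VALUE_RE = re.compile(
    r"^(?P<name>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)(?:\s+(?P<unit>[A-Za-z]+))?$"
)


def parse_clpeak(text):
    """Group numeric clpeak lines by the section header that precedes them.

    Lines that carry their own unit, or that appear before any section
    header, are stored under the made-up key "standalone".
    """
    results = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = VALUE_RE.match(line)
        if match:
            own_unit = match.group("unit") is not None
            key = "standalone" if own_unit or section is None else section
            results.setdefault(key, {})[match.group("name")] = float(
                match.group("value")
            )
        elif ":" not in line:
            # A line without a colon is treated as a section header.
            section = line
    return results
```

With the report above, the float and float16 entries under "Global memory bandwidth (GBPS)" make the drop for wide vector types easy to spot between runs.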
I've changed hardware and can no longer test, so I'll just trust Jan and close this.
Turns out I spoke too soon. The GPU still hangs, but Linux is better at recovering.
There are still GPU hang ("ring 0 stalled for more than") messages in dmesg.
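A quick way to check whether a run triggered such a hang even when the system recovers is to scan a saved dmesg dump for the stall message. The sketch below matches only the "ring N stalled for more than" fragment quoted above. The function name find_ring_stalls is invented for illustration, and the rest of the kernel message format is not assumed.

```python
import re

# Matches the fragment quoted in this report; N is the ring number.
STALL_RE = re.compile(r"ring (\d+) stalled for more than")


def find_ring_stalls(log_text):
    """Return (ring_number, line) pairs for every stall message in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = STALL_RE.search(line)
        if match:
            hits.append((int(match.group(1)), line.strip()))
    return hits
```

For example, save the log with "dmesg > dmesg.txt" after a clpeak run and call find_ring_stalls(open("dmesg.txt").read()). An empty list suggests no stall was logged during that boot.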