Created attachment 133229 [details]
The program is attached. I debugged it as follows:
- gdb: break 380, then print answer
- investigation of answer and answer
Although at both points in the program they point to exactly the same array, the contents differ. The only thing that happens in between is the call to SHA1Init(&ctx), which should not interfere with output or k_pad/digest.
Created attachment 133230 [details]
clover dump files
FYI: I ran the program through oclgrind, which reports no problems.
I ran the program with pocl; there is no problem there.
I ran the program with amdgpu-pro; no problem there either.
Confirmed on -RC3 (the initial report was on RC2).
I got it running on both pocl and clover. I will attach a working version with a built_hints.txt file included, which also includes debug information to debug memory contents.
What I concluded:
- atom_add does not work correctly
- this workaround works, although it should not:
i = ctx->l1;
i += len << 3;
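For comparison, here is a minimal host-side C11 sketch of the two update styles. It is not the attached kernel: the struct and function names (sha1_ctx_sketch, add_bits_atomic, add_bits_plain) are hypothetical, and only the l1 field and the len << 3 expression come from the report. It assumes that atomic_fetch_add in C11 is a fair stand-in for OpenCL's atom_add, which is specified to store old + val and return the old value.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the SHA-1 context; only l1 appears in the report. */
typedef struct {
    atomic_uint l1; /* low word of the message length, in bits */
} sha1_ctx_sketch;

/* Atomic update: adds len*8 to the counter and returns the OLD value,
 * which is the documented contract of OpenCL's atom_add. */
unsigned add_bits_atomic(sha1_ctx_sketch *ctx, unsigned len)
{
    return atomic_fetch_add(&ctx->l1, len << 3);
}

/* Plain update mirroring the workaround: read the counter, add locally.
 * Returns the NEW value and does not write it back to the context. */
unsigned add_bits_plain(sha1_ctx_sketch *ctx, unsigned len)
{
    unsigned i = atomic_load(&ctx->l1);
    i += len << 3;
    return i;
}
```

The two functions differ in what they return and in whether the stored counter changes, so code that treats the result of atom_add as the new value will see different numbers than code using the plain read-and-add. Whether that is what happens in the attached program is not established here.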
Created attachment 133403 [details]
working program source
The source contains the following modifications:
- use of local memory
- insertion of a debug function
- removal of atom_add
Seems like I am a very bad programmer...
After investigating the OpenCL reference manual, I found out that my code was unstable, and I need to fix it before I can post something useful.