Bug 101952 - OpenCL enqueueReadBuffer returns trash
Summary: OpenCL enqueueReadBuffer returns trash
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: 17.1
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 99553
  Show dependency treegraph
 
Reported: 2017-07-27 13:50 UTC by Janpieter Sollie
Modified: 2017-08-03 18:04 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
command log with GDB showing the results of a library switch from amdgpu-pro to mesa (18.39 KB, text/plain)
2017-07-27 13:50 UTC, Janpieter Sollie
Details
the dump file as requested by mr Vesely. (225.62 KB, application/octet-stream)
2017-07-29 04:34 UTC, Janpieter Sollie
Details
original program sources (8.98 KB, application/octet-stream)
2017-07-29 04:37 UTC, Janpieter Sollie
Details
dump optimized for the os_memcmp file (not functional) (450.05 KB, application/x-compressed-tar)
2017-07-29 05:36 UTC, Janpieter Sollie
Details
dump optimized for os_memcmp analysis (functional) (227.15 KB, application/x-compressed-tar)
2017-07-29 05:37 UTC, Janpieter Sollie
Details

Description Janpieter Sollie 2017-07-27 13:50:04 UTC
Created attachment 133073 [details]
command log with GDB showing the results of a library switch from amdgpu-pro to mesa

Hardware: 
cpu: 2x Opteron 6276
RAM: 128GB 
Video card: on-board matrox G200 and AMD Nano (Fiji)

Software:
Linux: 4.10.17/DRM 3.9.0 (gentoo)
LLVM: 4.0.0
gcc: 7.1.0
mesa: 17.1.5

problem:
when I render a program using the AMD pro libraries, the clEnqueueReadBuffer command returns the correct output.  When I use the mesa libs, the output is trash.  A write to the buffer is performed in both cases. before, the whole buffer is 0, and after:
AMD: { 0, 0, 0, 255 }
MESA: {4177618488, 4177945519, 16468310, 12159302}

for both cases, the standard amdgpu kernel module was used.

incorrect clReadBuffer results make the system unusable.
Comment 1 Jan Vesely 2017-07-27 15:32:08 UTC
I doubt that clEnqueueReadBuffer is at fault here, more likely the results produced by the kernel are wrong. can you post the GPU kernel that computes the result?
Comment 2 Janpieter Sollie 2017-07-27 16:08:21 UTC
Hi Mr. Vesely,
hereby a minimal extract:

__kernel void startTest(__constant char* arg, __constant uchar* arg2, size_t arg3, __constant char* phrase, size_t _len, __global int* output) {
    char result;
    if(output[3] == 255) return;
    result = external_function_call(phrase, arg2, arg3, _len, arg);
    if(result == 2) {
        output[3] = 255;
        output[0] = get_global_id(0);
        output[1] = get_global_id(1);
        output[2] = get_global_id(2);
    }
    else if(result == -1) output[3] = 127;
    return;
}

the reason why I doubt the correctness of readbuffer is there is no 255 int at place 3, while something was read from somewhere, so I guess it's looking at the wrong place ...
and yes, the global id (0.0.0) will give a result of 2, at least it does with amdpro and pocl
Comment 3 Jan Vesely 2017-07-27 16:52:56 UTC
it might be, but I'd have expected that to show up more frequently, though it might be related to your hw configuration.

does a simpler kernel work?

kernel void outTest(global int* out)
{
    out[get_global_id(0)] = get_global_id(0);
}
Comment 4 Janpieter Sollie 2017-07-27 17:56:53 UTC
yes, a simpler kernel works perfectly.  Though I think it only makes the search more difficult:
the combination of these functions does not work (no 255 is visible):
in external_function_call:
     while (left > 0) {
                count++;
                extern2(2048, count, digest);
                plen = left > SHA1_MAC_LEN ? SHA1_MAC_LEN : left;
//              result = get_global_id(0) | get_global_id(1) | get_global_id(2);
                result += os_memcmp(digest, pos, plen);
                pos += plen;
                left -= plen;
        }
        return result;
in startTest:
    if(result == 2) {
        output[3] = 255;
        output[0] = get_global_id(0);
        output[1] = get_global_id(1);
        output[2] = get_global_id(2);
    }
----------------------------------------
while 
              result = get_global_id(0) | get_global_id(1) | get_global_id(2);
//            result += os_memcmp(digest, pos, plen);

and
    if(result == 0) {

does work! (255 int included) :s

I suspect an overflow happens somewhere, but is it possible oclgrind does not detect it?
Comment 5 Jan Vesely 2017-07-27 18:53:18 UTC
(In reply to Janpieter Sollie from comment #4)
> yes, a simpler kernel works perfectly.  Though I think it only makes the
> search more difficult:

yes, it might be miscompilation on clover/llvm part, or the program relies on undefined behaviour that is just implemented differently in mesa/clover.

> the combination of these functions does not work (no 255 is visible):
> in external_function_call:
>      while (left > 0) {
>                 count++;
>                 extern2(2048, count, digest);
>                 plen = left > SHA1_MAC_LEN ? SHA1_MAC_LEN : left;
> //              result = get_global_id(0) | get_global_id(1) |
> get_global_id(2);
>                 result += os_memcmp(digest, pos, plen);

Is the 'result' variable initialized at the beginning of this function?

>                 pos += plen;
>                 left -= plen;
>         }
>         return result;
> in startTest:
>     if(result == 2) {
>         output[3] = 255;
>         output[0] = get_global_id(0);
>         output[1] = get_global_id(1);
>         output[2] = get_global_id(2);
>     }
> ----------------------------------------
> while 
>               result = get_global_id(0) | get_global_id(1) |
> get_global_id(2);
> //            result += os_memcmp(digest, pos, plen);
> 
> and
>     if(result == 0) {
> 
> does work! (255 int included) :s
> 
> I suspect an overflow happens somewhere, but is it possible oclgrind does
> not detect it?
Comment 6 Janpieter Sollie 2017-07-28 04:05:18 UTC
in external_funtion (where the while loop is executed):
	unsigned char count = 0, result = 0;
	uchar left = 32, plen;
	private unsigned char digest[80];
I will try to upgrade to llvm 4.0.1 and see if this changes the behaviour
Comment 7 Janpieter Sollie 2017-07-28 07:59:54 UTC
upgraded llvm/clang 4.0.0 -> 4.0.1,
re-installed mesa, clc and libdrm, but no change.
Even GET_DEVICE_NAME reports itself as 4.0.0 :(
Comment 8 Janpieter Sollie 2017-07-28 16:14:43 UTC
no result, after the move to 4.0.1, the output is still useless
Comment 9 Jan Vesely 2017-07-28 16:36:20 UTC
(In reply to Janpieter Sollie from comment #8)
> no result, after the move to 4.0.1, the output is still useless

you can try running with CLOVER_DEBUG=clc,llvm,native CLOVER_DEBUG_FILE=dump
and upload the results (files dump.*.ll dump.*.asm) Note that this is more useful if you have a minimal reproducer rather than a full program. It will also include your sources (or their compiled version) if you're worried about posting them publicly.

you might also consider trying llvm-5 or llvm-6 to see if it fixes the issue.
Comment 10 Janpieter Sollie 2017-07-29 04:34:42 UTC
Created attachment 133112 [details]
the dump file as requested by mr Vesely.

I will explain what this program does:
I tried to create a program which proves that wpa/wpa2 can easily be brute-forced if you have a WiFi password of only 8 chars (and you have a GPU miner), and that the minimal requirement should be 12 ... I know my targets can easily be put to the wrong side, but please take my word for it: if I really had bad intentions, I would not have posted it ;)
Comment 11 Janpieter Sollie 2017-07-29 04:37:31 UTC
Created attachment 133114 [details]
original program sources
Comment 12 Janpieter Sollie 2017-07-29 05:06:43 UTC
the bug seems to be located within this function:
char os_memcmp(private const void* dest, constant const void* src, const uchar amount) {
    char j;
    const int* destination = (const int*) dest;
    constant int* source = (constant int*) src;
   for(j = (amount >> 2) - 1; j >= 0; j--) if(destination[j] != source[j]) return 0;
    return amount;
}

if you comment out the for loop, the program behaves as expected, if you don't, it returns trash
Comment 13 Janpieter Sollie 2017-07-29 05:36:50 UTC
Created attachment 133117 [details]
dump optimized for the os_memcmp file (not functional)
Comment 14 Janpieter Sollie 2017-07-29 05:37:33 UTC
Created attachment 133119 [details]
dump optimized for os_memcmp analysis (functional)
Comment 15 Janpieter Sollie 2017-07-31 13:49:07 UTC
Today I installed LLVM/CLANG 5.0.0 (git) and Mesa 17.2-rc2
this seems to work: the bug is resolved.
do you need any additional info from me to figure out whether it is LLVM or Mesa's fault?
Comment 16 Jan Vesely 2017-07-31 13:57:29 UTC
(In reply to Janpieter Sollie from comment #15)
> Today I installed LLVM/CLANG 5.0.0 (git) and Mesa 17.2-rc2
> this seems to work: the bug is resolved.
> do you need any additional info from me to figure out whether it is LLVM or
> Mesa's fault?

no. There were only ~4 patches to clover between 17.1 and 17.2, so I'd say it's LLVM. AFAIK, there also wasn't any plan for LLVM 4.0.2 so upgrading to LLVM 5 is the proper fix.

thanks for reporting and testing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.