Summary: | OpenCL enqueueReadBuffer returns trash | ||
---|---|---|---|
Product: | Mesa | Reporter: | Janpieter Sollie <janpieter.sollie> |
Component: | Mesa core | Assignee: | mesa-dev |
Status: | RESOLVED FIXED | QA Contact: | mesa-dev |
Severity: | major | ||
Priority: | medium | CC: | janpieter.sollie |
Version: | 17.1 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=102030 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 99553 | ||
Attachments: |
command log with GDB showing the results of a library switch from amdgpu-pro to mesa
the dump file as requested by mr Vesely. original program sources dump optimized for the os_memcmp file (not functional) dump optimized for os_memcmp analysis (functional) |
I doubt that clEnqueueReadBuffer is at fault here, more likely the results produced by the kernel are wrong. can you post the GPU kernel that computes the result? Hi Mr. Vesely, hereby a minimal extract: __kernel void startTest(__constant char* arg, __constant uchar* arg2, size_t arg3, __constant char* phrase, size_t _len, __global int* output) { char result; if(output[3] == 255) return; result = external_function_call(phrase, arg2, arg3, _len, arg); if(result == 2) { output[3] = 255; output[0] = get_global_id(0); output[1] = get_global_id(1); output[2] = get_global_id(2); } else if(result == -1) output[3] = 127; return; } the reason why I doubt the correctness of readbuffer is there is no 255 int at place 3, while something was read from somewhere, so I guess it's looking at the wrong place ... and yes, the global id (0.0.0) will give a result of 2, at least it does with amdpro and pocl it might be, but I'd have expected that to show up more frequently, though it might be related to your hw configuration. does a simpler kernel work? kernel void outTest(global int* out) { out[get_global_id(0)] = get_global_id(0); } yes, a simpler kernel works perfectly. Though I think it only makes the search more difficult: the combination of these functions does not work (no 255 is visible): in external_function_call: while (left > 0) { count++; extern2(2048, count, digest); plen = left > SHA1_MAC_LEN ? SHA1_MAC_LEN : left; // result = get_global_id(0) | get_global_id(1) | get_global_id(2); result += os_memcmp(digest, pos, plen); pos += plen; left -= plen; } return result; in startTest: if(result == 2) { output[3] = 255; output[0] = get_global_id(0); output[1] = get_global_id(1); output[2] = get_global_id(2); } ---------------------------------------- while result = get_global_id(0) | get_global_id(1) | get_global_id(2); // result += os_memcmp(digest, pos, plen); and if(result == 0) { does work! (255 int included) :s I suspect an overflow happens somewhere, but is it possible oclgrind does not detect it? (In reply to Janpieter Sollie from comment #4) > yes, a simpler kernel works perfectly. Though I think it only makes the > search more difficult: yes, it might be miscompilation on clover/llvm part, or the program relies on undefined behaviour that is just implemented differently in mesa/clover. > the combination of these functions does not work (no 255 is visible): > in external_function_call: > while (left > 0) { > count++; > extern2(2048, count, digest); > plen = left > SHA1_MAC_LEN ? SHA1_MAC_LEN : left; > // result = get_global_id(0) | get_global_id(1) | > get_global_id(2); > result += os_memcmp(digest, pos, plen); Is the 'result' variable initialized at the beginning of this function? > pos += plen; > left -= plen; > } > return result; > in startTest: > if(result == 2) { > output[3] = 255; > output[0] = get_global_id(0); > output[1] = get_global_id(1); > output[2] = get_global_id(2); > } > ---------------------------------------- > while > result = get_global_id(0) | get_global_id(1) | > get_global_id(2); > // result += os_memcmp(digest, pos, plen); > > and > if(result == 0) { > > does work! (255 int included) :s > > I suspect an overflow happens somewhere, but is it possible oclgrind does > not detect it? in external_funtion (where the while loop is executed): unsigned char count = 0, result = 0; uchar left = 32, plen; private unsigned char digest[80]; I will try to upgrade to llvm 4.0.1 and see if this changes the behaviour upgraded llvm/clang 4.0.0 -> 4.0.1, re-installed mesa, clc and libdrm, but no change. Even GET_DEVICE_NAME reports itself as 4.0.0 :( no result, after the move to 4.0.1, the output is still useless (In reply to Janpieter Sollie from comment #8) > no result, after the move to 4.0.1, the output is still useless you can try running with CLOVER_DEBUG=clc,llvm,native CLOVER_DEBUG_FILE=dump and upload the results (files dump.*.ll dump.*.asm) Note that this is more useful if you have a minimal reproducer rather than a full program. It will also include your sources (or their compiled version) if you're worried about posting them publicly. you might also consider trying llvm-5 or llvm-6 to see if it fixes the issue. Created attachment 133112 [details]
the dump file as requested by mr Vesely.
I will explain what this program does:
I tried to create a program which proves that wpa/wpa2 can easily be brute-forced if you have a WiFi password of only 8 chars (and you have a GPU miner), and that the minimal requirement should be 12 ... I know my targets can easily be put to the wrong side, but please take my word for it: if I really had bad intentions, I would not have posted it ;)
Created attachment 133114 [details]
original program sources
the bug seems to be located within this function: char os_memcmp(private const void* dest, constant const void* src, const uchar amount) { char j; const int* destination = (const int*) dest; constant int* source = (constant int*) src; for(j = (amount >> 2) - 1; j >= 0; j--) if(destination[j] != source[j]) return 0; return amount; } if you comment out the for loop, the program behaves as expected, if you don't, it returns trash Created attachment 133117 [details]
dump optimized for the os_memcmp file (not functional)
Created attachment 133119 [details]
dump optimized for os_memcmp analysis (functional)
Today I installed LLVM/CLANG 5.0.0 (git) and Mesa 17.2-rc2 this seems to work: the bug is resolved. do you need any additional info from me to figure out whether it is LLVM or Mesa's fault? (In reply to Janpieter Sollie from comment #15) > Today I installed LLVM/CLANG 5.0.0 (git) and Mesa 17.2-rc2 > this seems to work: the bug is resolved. > do you need any additional info from me to figure out whether it is LLVM or > Mesa's fault? no. There were only ~4 patches to clover between 17.1 and 17.2, so I'd say it's LLVM. AFAIK, there also wasn't any plan for LLVM 4.0.2 so upgrading to LLVM 5 is the proper fix. thanks for reporting and testing. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 133073 [details] command log with GDB showing the results of a library switch from amdgpu-pro to mesa Hardware: cpu: 2x Opteron 6276 RAM: 128GB Video card: on-board matrox G200 and AMD Nano (Fiji) Software: Linux: 4.10.17/DRM 3.9.0 (gentoo) LLVM: 4.0.0 gcc: 7.1.0 mesa: 17.1.5 problem: when I render a program using the AMD pro libraries, the clEnqueueReadBuffer command returns the correct output. When I use the mesa libs, the output is trash. A write to the buffer is performed in both cases. before, the whole buffer is 0, and after: AMD: { 0, 0, 0, 255 } MESA: {4177618488, 4177945519, 16468310, 12159302} for both cases, the standard amdgpu kernel module was used. incorrect clReadBuffer results make the system unusable.