Bug 64201 - bfgminer OpenCL usage result segmentation fault on r600g with HD6850
Summary: bfgminer OpenCL usage result segmentation fault on r600g with HD6850
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 99553
 
Reported: 2013-05-03 21:55 UTC by Erdem U. Altınyurt
Modified: 2017-03-22 15:41 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
Possible fix (953 bytes, patch)
2013-05-04 06:00 UTC, Tom Stellard
Pyrit Error log with Debug (50.59 KB, text/plain)
2013-05-04 12:27 UTC, Erdem U. Altınyurt
Pyrit Debug log with Patched LLVM-trunk (50.38 KB, text/plain)
2013-05-04 12:36 UTC, Erdem U. Altınyurt
GPU Lockup with bfgminer -v1 --benchmark kernel messages (2.75 KB, text/plain)
2013-05-04 23:53 UTC, Erdem U. Altınyurt
Output of R600_DEBUG=trace_cs,nodma for bfgminer after lockup (487.59 KB, text/x-csrc)
2013-05-10 16:07 UTC, Aaron Watry
dmesg lines that correspond to cs trace in previous attachment (2.74 KB, text/plain)
2013-05-10 16:08 UTC, Aaron Watry
flush testing (4.09 KB, patch)
2013-05-10 17:02 UTC, Alex Deucher
Possible Fix #2 (1.05 KB, patch)
2013-09-17 17:52 UTC, Tom Stellard
bfgminer debug (9.15 KB, text/plain)
2013-09-17 19:36 UTC, darkbasic
debug radeonsi nopatch (338.10 KB, text/plain)
2013-09-19 20:46 UTC, darkbasic
Possible Fix #3 (6.60 KB, patch)
2013-09-28 00:23 UTC, Tom Stellard

Description Erdem U. Altınyurt 2013-05-03 21:55:16 UTC
I am using openSUSE 12.3 x86_64 with a 3.9 kernel and an ATI HD 6850 GPU.

I am just experimenting with OpenCL, but I cannot get it to work with the open-source tools.

I compiled LLVM and Clang with:
./configure --libdir=/usr/lib64 --prefix=/usr --enable-{optimized,pic,shared} --disable-{assertions,docs,timestamps} --enable-targets="x86_64" --enable-experimental-targets="R600"
(For the LLVM build I followed http://llvm.org/docs/GettingStarted.html.)

Then I compiled Mesa trunk with:
./configure --with-gallium-drivers=r600 --prefix=/usr --libdir=/usr/lib64 --enable-{vdpau,texture-float} --with-dri-drivers="" --enable-{gallium-llvm,r600-llvm-compiler,opencl} --enable-glx-tls --enable-shared-{glapi,dricore}

However, every utility I tried crashes with a segmentation fault :-/


[ 3453.462803] python[7401]: segfault at 60 ip 00007f55c16292c0 sp 00007fff191b9138 error 4 in pipe_r600.so[7f55c14c0000+299000]
[ 3465.707476] pyrit[7529]: segfault at 0 ip 00007f53614fb7cb sp 00007f535f2bc5c0 error 6 in libLLVM-3.3svn.so[7f5360eb1000+1004000]
[ 3674.192257] cgminer[8773]: segfault at 20 ip 00007f2b65088710 sp 00007fffc1f24908 error 4 in libLLVM-3.3svn.so[7f2b648af000+1004000]
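The three kernel messages above can be decoded mechanically. Below is a minimal Python sketch (not part of any tool mentioned in this report; `decode_segfault` is a hypothetical helper) that extracts the fault address, the offset of the faulting instruction inside the shared object, and the low bits of the x86 page-fault error code. Error 4 means a user-mode read of a page that is not present, and error 6 a user-mode write; tiny fault addresses such as 0x0, 0x20 or 0x60 usually point to a NULL pointer dereference, possibly through a struct field at that offset. The offset calculation assumes the bracketed range starts at the library's load address, which holds when the code segment is the library's first mapping.

```python
import re

# Matches x86 kernel "segfault at" lines like the ones quoted in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def decode_segfault(line):
    """Return a dict describing one dmesg segfault line, or None if it doesn't match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Instruction offset inside the object, assuming the bracketed
        # range starts at the library's load address.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # Low bits of the x86 page-fault error code.
        "protection_violation": bool(err & 1),  # 0 = page not present
        "write": bool(err & 2),                 # 0 = read access
        "user_mode": bool(err & 4),             # 1 = fault in user space
    }


if __name__ == "__main__":
    line = ("python[7401]: segfault at 60 ip 00007f55c16292c0 sp 00007fff191b9138 "
            "error 4 in pipe_r600.so[7f55c14c0000+299000]")
    info = decode_segfault(line)
    print(info["object"], hex(info["offset"]), hex(info["fault_addr"]))
```

For the pipe_r600.so line this prints `pipe_r600.so 0x1692c0 0x60`; on a build with debug info, that offset could then be passed to `addr2line -e pipe_r600.so 0x1692c0` to find the source line.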


The most detailed report comes from pyrit, run with the benchmark argument:

> pyrit benchmark
Pyrit 0.4.1-dev (svn r308) (C) 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Calibrating... 0x7faf109a72e0: i32 = GlobalAddress<i32 (i32, i32, i32)* @llvm.AMDGPU.bit.extract.u32.> 0
Undefined function
UNREACHABLE executed at /run/media/death/OldRoot/temp/llvm/lib/Target/R600/AMDGPUISelLowering.h:56!
Stack dump:
0.	Running pass 'Function Pass Manager' on module 'radeon'.
1.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@sha1_process'
Aborted

Regards,
Erdem
Comment 1 vincent 2013-05-03 22:30:18 UTC
Can you post the output with the R600_DEBUG=cs env var set?
Comment 2 Tom Stellard 2013-05-03 23:02:53 UTC
For pyrit, it looks like we aren't handling one of the intrinsics produced by the AMDILPeephole optimizer, but can you still post the output with the RADEON_DEBUG=cs env variable set?

Do you have a link to where I can download this program?

Also, what other programs didn't work for you?
Comment 3 Tom Stellard 2013-05-04 06:00:51 UTC
Created attachment 78831
Possible fix

This patch should fix the error you were seeing with pyrit.  However, it's possible you will now see a different error.
Comment 4 Erdem U. Altınyurt 2013-05-04 12:26:39 UTC
Hi friends,

@Vincent, I will add output as an attachment.

@Tom Stellard

For pyrit : svn checkout http://pyrit.googlecode.com/svn/trunk/ pyrit-read-only

I haven't patched it yet. I will report back with your patch as soon as possible.

The other programs are cgminer: https://github.com/ckolivas/cgminer

and the python-opencl package from the SUSE repo, which gives the pipe_r600.so fault.

No other program that I tried works on my card yet.

Thanks.
Comment 5 Erdem U. Altınyurt 2013-05-04 12:27:13 UTC
Created attachment 78838
Pyrit Error log with Debug
Comment 6 Erdem U. Altınyurt 2013-05-04 12:36:36 UTC
Created attachment 78839
Pyrit Debug log with Patched LLVM-trunk

Hi again,
I attached the debug log with the patch applied, for examination.
Regards,
Erdem
Comment 7 Tom Stellard 2013-05-04 21:49:20 UTC
(In reply to comment #4)
> Hi friends,
> 
> @Vincent, I will add output as an attachment.
> 
> @Tom Stellard
> 
> For pyrit : svn checkout http://pyrit.googlecode.com/svn/trunk/
> pyrit-read-only
> 
> I don't patched it. Will report with your patch asap.
> 
> Other programs are cgminer : https://github.com/ckolivas/cgminer
> 

I would recommend using bfgminer for bitcoin mining. It auto-detects the Mesa platform and disables unsupported features. All you need to do to get it to work is pass the -v1 flag.

> And python-opencl package from suse repo that give  pipe_r600.so fault.
> 

Can you open a separate bug for this or any other programs you've tried?

> Any other program that I trued doesn't work on my card, yet.



> 
> Thanks.
Comment 8 Erdem U. Altınyurt 2013-05-04 23:52:16 UTC
(In reply to comment #7)
> I would recommend using bfgminer for bitcoin mining.  It auto-detects the
> mesa platform, and disabled unsupported features.  All you need to do to get
> it to work is pass the -v1 flag.

I just want to try OpenCL programs. I don't intend to waste energy on mining, at least not until I produce my own energy with solar panels...

With "bfgminer -v1 --benchmark" it seems to start working at first. But then I got GPU lockups, which I think is a serious bug! :-/
I attached the kernel messages.

Anyway, I got 1.6 Mhash/s.


Also, "bfgminer -v1 --benchmark --scrypt" gives a segmentation fault.
Attaching the debug output with R600_DEBUG=cs.



> > And python-opencl package from suse repo that give  pipe_r600.so fault.
> Can you open a separate bug for this or any other programs you've tried.
ACK
Comment 9 Erdem U. Altınyurt 2013-05-04 23:53:59 UTC
Created attachment 78868
GPU Lockup with bfgminer -v1 --benchmark kernel messages
Comment 10 Erdem U. Altınyurt 2013-05-05 00:18:35 UTC
I also hit the same lockups with GIMP's OpenCL support; otherwise it seems to start working, slowwwwly. :)
If the bfgminer fix doesn't also fix GIMP, I will open a bug report for it too.
Thanks.
Comment 11 Aaron Watry 2013-05-06 20:15:00 UTC
If you're experiencing GPU lockups with OpenCL, I'd recommend making sure that your Mesa includes:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=4539f8e20af286d1f521eb016c89c6d9af0b801c

This fixed a lot of common CL-based lockups on my Radeon 6850.
Comment 12 Erdem U. Altınyurt 2013-05-06 20:59:29 UTC
(In reply to comment #11)
> If you're experiencing GPU lockups w/ OpenCL, I'd recommend making sure that
> you mesa includes:
> http://cgit.freedesktop.org/mesa/mesa/commit/
> ?id=4539f8e20af286d1f521eb016c89c6d9af0b801c
> 
> This fixed a lot of common CL-based lockups on my Radeon 6850.

Nope, I am using Mesa trunk, which already includes this patch, but it doesn't seem to help much...
Comment 13 Erdem U. Altınyurt 2013-05-08 00:42:52 UTC
Oops! LLVM 3.3 was branched today.
I hope this bug is gone from LLVM before the 3.3 release in June.

I updated to today's Mesa and LLVM trunks...
Without the "-v1" flag, "bfgminer --benchmark" reports:

[2013-05-08 03:37:54] Error -11: Building Program (clBuildProgram)
[2013-05-08 03:37:54] input.cl:197:7: error: initializing 'uint' (aka 'unsigned int') with an expression of incompatible type 'unsigned int __attribute__((ext_vector_type(2)))'
        uint r = rot(W[3].x,25u)^rot(W[3].x,14u)^((W[3].x)>>3U);



@Aaron, could you use pyrit or any other OpenCL tools with your HD 6850 without lockups?
Regards.
Comment 14 Aaron Watry 2013-05-10 03:16:43 UTC
> @Aaron, could you use pyrit or any other OpenCL tools with yours HD6850
> without lockup?

I get the same error from pyrit (cpyrit-opencl) as you do:

> awatry@veer:~/src/opencl_applications/cpyrit-opencl-0.4.0$ pyrit benchmark
> Pyrit 0.4.0 (C) 2008-2011 Lukas Lueg http://pyrit.googlecode.com
> This code is distributed under the GNU General Public License v3+
> 
> Calibrating... LLVM ERROR: Not supported instr: <MCInst 206 <MCOperand Reg:1046> <MCOperand Reg:1031> <MCOperand Imm:0> <MCOperand Imm:0>>

I'm not sure right now what instruction is causing that error.

I do get lockups when I run:
bfgminer -v1 --benchmark

Kernel: http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-next/2013-05-03-raring/
Mesa: git master as of yesterday
LLVM: git master as of a70d02ff284, with a few additional patches on top
drm: git master 040f6b015ef7d9c

A simpler test case might be the piglit min() builtin CL test.  Does that test lock up for you as well?  You might have to run it a few times.
Comment 15 Tom Stellard 2013-05-10 04:24:33 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I would recommend using bfgminer for bitcoin mining.  It auto-detects the
> > mesa platform, and disabled unsupported features.  All you need to do to get
> > it to work is pass the -v1 flag.
> 
> I just want try openCL programs. Not intended to waste energy with those
> yet, until I produce my energy by solar panels...
> 
> With "bfgminer -v1 --benchmark", I think it start working first. But hey! I
> got GPU lockups which is serious bug I think! :-/
> I attached the kernel messages.
> 

I'm not sure that benchmark mode works for bfgminer. Can you try without --benchmark?
Comment 16 Aaron Watry 2013-05-10 13:26:43 UTC
> I'm not sure that benchmark mode works for bfgminer.  Can you try without
> --benchmark

I just signed up with a pool and tried:
bfgminer -v1 -o [super_secret] -u [even_more_secret] -p [hi!!!]

About 20-30 seconds after the bfgminer UI showed up I got a lockup (and the GPU didn't reset after 10+ seconds).

I'm trying to find an older kernel which will still work on my machine to test against...
Comment 17 Aaron Watry 2013-05-10 16:05:49 UTC
I can at least confirm that the lock-ups also happen on a CEDAR (HD 5400) with the 3.9.0 kernel and latest drm/llvm/mesa master.  This machine at least doesn't hard lock (GPU resets after 10 seconds...), so I'll attach the results of:

R600_DEBUG=trace_cs,nodma bfgminer -v1
Comment 18 Aaron Watry 2013-05-10 16:07:10 UTC
Created attachment 79106
Output of R600_DEBUG=trace_cs,nodma for bfgminer after lockup
Comment 19 Aaron Watry 2013-05-10 16:08:56 UTC
Created attachment 79107
dmesg lines that correspond to cs trace in previous attachment
Comment 20 Tom Stellard 2013-05-10 16:20:24 UTC
(In reply to comment #17)
> I can at least confirm that the lock-ups also happen on a CEDAR (HD 5400)
> with the 3.9.0 kernel and latest drm/llvm/mesa master.  This machine at
> least doesn't hard lock (GPU resets after 10 seconds...), so I'll attach the
> results of:
> 
> R600_DEBUG=trace_cs,nodma bfgminer -v1

One thing you may want to check is that Mesa is picking up the correct LLVM shared object.  The LLVM version was just bumped and the current shared object is now called libLLVM-3.4svn.so.  I just ran into this problem myself.
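One way to check which LLVM library a running OpenCL program has actually loaded is to read its Linux memory map, /proc/<pid>/maps. Unlike running ldd on libOpenCL.so, this also covers objects opened at run time, such as the pipe_r600.so driver that the gallium pipe loader loads. Below is a minimal Python sketch of that check; the function names are hypothetical and not part of Mesa or bfgminer.

```python
import os


def mapped_llvm_libs(maps_text):
    """Return the sorted, de-duplicated libLLVM* paths in a /proc/<pid>/maps dump."""
    libs = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]; the pathname may contain spaces.
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and os.path.basename(fields[5]).startswith("libLLVM"):
            libs.add(fields[5])
    return sorted(libs)


def mapped_llvm_libs_for_pid(pid):
    """Read /proc/<pid>/maps for a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_llvm_libs(f.read())
```

Calling `mapped_llvm_libs_for_pid()` with the PID of a running bfgminer should list a single library; seeing libLLVM-3.3svn.so after a rebuild that installs libLLVM-3.4svn.so (or seeing both) would indicate that a stale shared object is still being picked up.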
Comment 21 Alex Deucher 2013-05-10 17:02:10 UTC
Created attachment 79110
flush testing

I suspect there may still be issues with flushing.  You might try playing with that.  This patch has some possibilities to try.  Ideally we'd use the SURFACE_SYNC stuff rather than the FLUSH_AND_INV events on evergreen+.
Comment 22 Aaron Watry 2013-05-10 19:39:06 UTC
I've gone back to kernel 3.6.11 and still get GPU locks when running this program (and the min() CL builtin in piglit).

I'm leaning towards the issue being in mesa, which I'll have a much easier time bisecting than kernel changes.

Tom, I've also verified that the LLVM .so version is correct (after deleting the old one and rebuilding mesa).

I've tried Alex's patch as is, and also when swapping in the commented line in evergreen_compute.c...  It still locks, but we'll see if I can bisect to find a culprit.
Comment 23 Tom Stellard 2013-05-14 18:23:58 UTC
The pyrit failure should be a separate bug. The failure is caused by the lack of proper private address space support. This same issue also affects a few of the GEGL filters. New bugs should be opened for pyrit and GEGL. Let's keep the focus of this bug on hangs in bfgminer.
Comment 24 Aaron Watry 2013-05-17 17:51:12 UTC
I haven't managed to finish bisecting this yet, but I believe that this bug has been present since at least January 9th, 2013:

mesa:  4f2d9a8f520cda
llvm:  1db9b6957c
clang: 50767d8c8f2f667255bd

libclc was latest from my git repo on fd.o, with the following commits reverted to make it build against older llvm:
f617f2dfa68
1114e99b296

I'll keep going in trying to nail down the commit that this broke in, although I'll soon be back into having to build Tom's llvm tree to get the AMDGPU back-end again.
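For a bisection like this, `git bisect run` can drive the search automatically: an exit status of 0 marks a revision good, 125 tells git to skip it (useful when an old revision does not build against the available LLVM), and other codes from 1 to 127 mark it bad. Below is a minimal Python sketch of such a helper. `make -j4`, `./repro.sh` and the 120-second timeout are hypothetical placeholders for your own build step and reproducer, and a hang is treated as bad via the timeout. This is only practical on a machine that recovers from the lockup (like the CEDAR box that resets the GPU after about 10 seconds), not one that hard-locks.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Hypothetical placeholders: replace with your own build and reproducer.
BUILD_CMD = ["make", "-j4"]
TEST_CMD = ["./repro.sh"]
TEST_TIMEOUT_S = 120


def classify(build_rc, test_rc, timed_out):
    """Map build/test outcomes to a `git bisect run` exit code."""
    if build_rc != 0:
        return SKIP  # revision doesn't build here, so don't blame it
    if timed_out or test_rc != 0:
        return BAD   # reproducer failed or hung
    return GOOD


def main():
    build = subprocess.run(BUILD_CMD)
    if build.returncode != 0:
        return classify(build.returncode, 0, False)
    try:
        test = subprocess.run(TEST_CMD, timeout=TEST_TIMEOUT_S)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return classify(0, 0, True)
    return classify(0, test.returncode, False)


if __name__ == "__main__":
    sys.exit(main())
```

Saved under a name of your choice (for example bisect_step.py), it would be invoked with `git bisect run python3 bisect_step.py` after marking one good and one bad commit.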
Comment 25 Olivier Langlois 2013-05-24 16:00:51 UTC
Hi,

I have a segfault when running bfgminer. I have compiled llvm,libclc and mesa yesterday from git/svn trunk.

The crash seems to happen while LLVM compiles the CL program, so it happens before anything gets executed on the GPU. Maybe LLVM compiles differently depending on the target.

I am currently rebuilding LLVM in debug mode to get more info, but in the meantime, here is the backtrace in case you recognize something you've seen before:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f090591a5e5 in llvm::Linker::linkInModule(llvm::Module*, unsigned int, std::string*) () from /usr/lib/llvm/libLLVM-3.4svn.so
(gdb) where
#0  0x00007f090591a5e5 in llvm::Linker::linkInModule(llvm::Module*, unsigned int, std::string*) () from /usr/lib/llvm/libLLVM-3.4svn.so
#1  0x00007f090591ccaf in llvm::Linker::LinkModules(llvm::Module*, llvm::Module*, unsigned int, std::string*) ()
   from /usr/lib/llvm/libLLVM-3.4svn.so
#2  0x00007f09070fbecc in clover::compile_program_llvm(clover::compat::string const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&) () from /usr/lib/libOpenCL.so
#3  0x00007f09070deb19 in _cl_program::build(std::vector<_cl_device_id*, std::allocator<_cl_device_id*> > const&, char const*) ()
   from /usr/lib/libOpenCL.so
#4  0x00007f09070efa70 in clBuildProgram () from /usr/lib/libOpenCL.so
#5  0x000000000043ac60 in ?? ()
#6  0x0000000000436d1d in ?? ()
#7  0x0000000000406af3 in ?? ()
#8  0x00007f0908b92a15 in __libc_start_main () from /usr/lib/libc.so.6
#9  0x0000000000408175 in ?? ()

marie-eve@Kimper /usr/lib $ bfgminer -n
 [2013-05-24 11:59:21] CL Platform 0 vendor: Mesa                    
 [2013-05-24 11:59:21] CL Platform 0 name: Default                    
 [2013-05-24 11:59:21] CL Platform 0 version: OpenCL 1.1 MESA 9.2.0                    
 [2013-05-24 11:59:21] Platform 0 devices: 1                    
 [2013-05-24 11:59:21] 	0	AMD RS780                    
 [2013-05-24 11:59:21] Unable to load ati adl library                    
 [2013-05-24 11:59:21] 1 GPU devices max detected
Comment 26 Tom Stellard 2013-05-24 16:08:26 UTC
> 
> marie-eve@Kimper /usr/lib $ bfgminer -n
>  [2013-05-24 11:59:21] CL Platform 0 vendor: Mesa                    
>  [2013-05-24 11:59:21] CL Platform 0 name: Default                    
>  [2013-05-24 11:59:21] CL Platform 0 version: OpenCL 1.1 MESA 9.2.0
>  [2013-05-24 11:59:21] Platform 0 devices: 1                    
>  [2013-05-24 11:59:21] 	0	AMD RS780                    

Sorry, OpenCL is not supported for RS780.

>  [2013-05-24 11:59:21] Unable to load ati adl library                    
>  [2013-05-24 11:59:21] 1 GPU devices max detected
Comment 27 Olivier Langlois 2013-05-24 17:50:35 UTC
(In reply to comment #26)
> > 
> > marie-eve@Kimper /usr/lib $ bfgminer -n
> >  [2013-05-24 11:59:21] CL Platform 0 vendor: Mesa                    
> >  [2013-05-24 11:59:21] CL Platform 0 name: Default                    
> >  [2013-05-24 11:59:21] CL Platform 0 version: OpenCL 1.1 MESA 9.2.0
> >  [2013-05-24 11:59:21] Platform 0 devices: 1                    
> >  [2013-05-24 11:59:21] 	0	AMD RS780                    
> 
> Sorry, OpenCL is not supported for RS780.
> 
OK. I'm a bit surprised, given that this GPU is in the R600 row at:

http://www.x.org/wiki/RadeonFeature#Decoder_ring_for_engineering_vs_marketing_names

I'll stop my effort to investigate this problem then.

Thank you for your prompt reply.
Comment 28 Olivier Langlois 2013-05-24 18:50:09 UTC
if this can help:

llvm::Linker::SrcM is NULL

(gdb) where
#0  _M_data (this=0x90) at /usr/include/c++/4.8.0/bits/basic_string.h:293
#1  _M_rep (this=0x90) at /usr/include/c++/4.8.0/bits/basic_string.h:301
#2  size (this=0x90) at /usr/include/c++/4.8.0/bits/basic_string.h:716
#3  empty (this=0x90) at /usr/include/c++/4.8.0/bits/basic_string.h:812
#4  run (this=0x7fffe1bd7e30) at LinkModules.cpp:1152
#5  llvm::Linker::linkInModule (this=this@entry=0x7fffe1bd8190, Src=Src@entry=0x0, Mode=Mode@entry=0, 
    ErrorMsg=ErrorMsg@entry=0x7fffe1bd83c0) at LinkModules.cpp:1302
#6  0x00007f74fe1870ef in llvm::Linker::LinkModules (Dest=<optimized out>, Src=0x0, Mode=0, 
    ErrorMsg=0x7fffe1bd83c0) at LinkModules.cpp:1322
#7  0x00007f74ff96eecc in clover::compile_program_llvm(clover::compat::string const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&) () from /usr/lib/libOpenCL.so
#8  0x00007f74ff951b19 in _cl_program::build(std::vector<_cl_device_id*, std::allocator<_cl_device_id*> > const&, char const*) () from /usr/lib/libOpenCL.so
#9  0x00007f74ff962a70 in clBuildProgram () from /usr/lib/libOpenCL.so
#10 0x000000000043ac60 in ?? ()
#11 0x0000000000436d1d in ?? ()
#12 0x0000000000406af3 in ?? ()
#13 0x00007f7501405a15 in __libc_start_main () from /usr/lib/libc.so.6
#14 0x0000000000408175 in ?? ()
(gdb) up
#4  run (this=0x7fffe1bd7e30) at LinkModules.cpp:1152
1152	  if (!SrcM->getDataLayout().empty() && !DstM->getDataLayout().empty() &&
(gdb) p *this
$3 = {DstM = 0x23c7030, SrcM = 0x0, TypeMap = {<llvm::ValueMapTypeRemapper> = {
      _vptr.ValueMapTypeRemapper = 0x7f74fec64810 <vtable for (anonymous namespace)::TypeMapTy+16>}, 
    MappedTypes = {<llvm::DenseMapBase<llvm::DenseMap<llvm::Type*, llvm::Type*, llvm::DenseMapInfo<llvm::Type*> >, llvm::Type*, llvm::Type*, llvm::DenseMapInfo<llvm::Type*> >> = {<No data fields>}, Buckets = 0x0, 
      NumEntries = 0, NumTombstones = 0, NumBuckets = 0}, 
    SpeculativeTypes = {<llvm::SmallVectorImpl<llvm::Type*>> = {<llvm::SmallVectorTemplateBase<llvm::Type*, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Type*, void>> = {<llvm::SmallVectorBase> = {
              BeginX = 0x7fffe1bd7e78, EndX = 0x7fffe1bd7e78, CapacityX = 0x7fffe1bd7ef8}, 
            FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
                buffer = "P\346\061\002\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {{<llvm::AlignedCharArray<8ul, 8ul>> = {
              buffer = "\001\000\000\000\000\000\000"}, <No data fields>},
Comment 29 Erdem U. Altınyurt 2013-06-09 22:19:21 UTC
The error still occurs.
Could you diagnose what the problem is?
The latest error message from bfgminer is:

EmitRawText called on an MCStreamer that doesn't support it,  something must not be fully mc'ized
Stack dump:
0.	Running pass 'Function Pass Manager' on module 'radeon'.
1.	Running pass 'AMDGPU Assembly Printer' on function '@search'
Aborted
Comment 30 Tom Stellard 2013-06-09 22:41:27 UTC
(In reply to comment #29)
> Error is still true.
> Could you diagnose what the problem is?
> Latest error message of bfgminer is:
> 
> EmitRawText called on an MCStreamer that doesn't support it,  something must
> not be fully mc'ized
> Stack dump:
> 0.	Running pass 'Function Pass Manager' on module 'radeon'.
> 1.	Running pass 'AMDGPU Assembly Printer' on function '@search'
> Aborted

Rebuilding libclc should fix this.
Comment 31 Erdem U. Altınyurt 2013-06-10 23:28:25 UTC
I have now rebuilt libclc from your repo.
I also updated, rebuilt, and installed LLVM, Gallium, and bfgminer.

The same error still exists.
Regards
Erdem
Comment 32 Tom Stellard 2013-06-10 23:30:59 UTC
(In reply to comment #31)
> I rebuilt libclc from your repo now.
> Also update;rebuild;install the llvm,gallium,bfgminer.
> 
> Same error still exists.
> Regards
> Erdem


Did you build it with the same version of clang that you are using with Mesa?
Comment 33 Erdem U. Altınyurt 2013-06-14 00:30:04 UTC
Yes, I have the same version...
Now the error is gone. (I cleaned the LLVM and Mesa repos, updated from trunk, and rebuilt.)

"bfgminer --benchmark" generates this error:

[2013-06-14 03:24:24] Error -11: Building Program (clBuildProgram)
 int') with an expression of incompatible type 'unsigned int __attribute__((ext_vector_type(2)))'
        uint r = rot(W[3].x,25u)^rot(W[3].x,14u)^((W[3].x)>>3U);
             ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 [2013-06-14 03:24:24] Failed to init GPU thread 0, disabling device 0
 [2013-06-14 03:24:24] Restarting the GPU from the 


"bfgminer --benchmark -v1" locks up the GPU, and the GPU doesn't restart...
Thanks.
Comment 34 Erdem U. Altınyurt 2013-06-28 21:30:29 UTC
With the latest LLVM and Mesa trunk,
"bfgminer --benchmark" generates this error:

EmitRawText called on an MCStreamer that doesn't support it,  something must not be fully mc'ized
Stack dump:
0.Running pass 'Function Pass Manager' on module 'radeon'.
1.Running pass 'AMDGPU Assembly Printer' on function '@search'
Aborted
Comment 35 Tom Stellard 2013-06-28 21:33:23 UTC
(In reply to comment #34)
> From latest llvm & mesa trunk:
> "bfgminer --benchmark" generate error of:
> 
> EmitRawText called on an MCStreamer that doesn't support it,  something must
> not be fully mc'ized
> Stack dump:
> 0.Running pass 'Function Pass Manager' on module 'radeon'.
> 1.Running pass 'AMDGPU Assembly Printer' on function '@search'
> Aborted

This error means you need to rebuild libclc using whatever version of clang you linked Mesa against.
Comment 36 Erdem U. Altınyurt 2013-06-28 21:38:17 UTC
Oops. I forgot to update libclc.

After compiling it from trunk:

 [2013-06-29 00:32:23] Started bfgminer 3.1.1
 [2013-06-29 00:32:23] CL Platform 0 vendor: Mesa
 [2013-06-29 00:32:23] CL Platform 0 name: Default
 [2013-06-29 00:32:23] CL Platform 0 version: OpenCL 1.1 MESA 9.2.0-devel
 [2013-06-29 00:32:23] Platform 0 devices: 1
 [2013-06-29 00:32:23] 	0	AMD BARTS
 [2013-06-29 00:32:23] Unable to load ati adl library
 [2013-06-29 00:32:23] Init GPU thread 0 GPU 0 virtual GPU 0
 [2013-06-29 00:32:23] CL Platform vendor: Mesa
 [2013-06-29 00:32:23] CL Platform name: Default
 [2013-06-29 00:32:23] CL Platform version: OpenCL 1.1 MESA 9.2.0-devel
 [2013-06-29 00:32:23] List of devices:
 [2013-06-29 00:32:23] 	0	AMD BARTS
 [2013-06-29 00:32:23] Selected 0: AMD BARTS
 [2013-06-29 00:32:23] Selecting phatk kernel for Mesa
 [2013-06-29 00:32:24] Initialising kernel phatk121016.cl without bitalign, 2 vectors and worksize 128
 [2013-06-29 00:32:24] initCl() finished. Found AMD BARTS
 [2013-06-29 00:32:24] 1 gpu miner threads started
 [2013-06-29 00:32:24] Pool 0 not providing work fast enough
LLVM ERROR: Cannot select: 0x7fc57026bd40: v4i32,ch = load 0x7fc5701aadf0, 0x7fc57026b960, 0x7fc570268c40<LD8[undef], zext from i64> [ID=157]
  0x7fc57026b960: i32 = Constant<92> [ID=16]
  0x7fc570268c40: i32 = undef [ID=2]
In function: search
Comment 37 Erdem U. Altınyurt 2013-07-10 01:47:11 UTC
Any news?
Comment 38 Aaron Watry 2013-07-12 13:21:51 UTC
(In reply to comment #37)
> Any news?

Sorry to leave you hanging here...  I was still able to reproduce the lockups as of a week ago on a Cedar (5400), but I replaced my desktop 6850 with a 7850 about a month back and I've been focusing most of my efforts there.
Comment 39 Erdem U. Altınyurt 2013-08-12 02:37:45 UTC
I still get lockups with my HD 6850,
using the latest trunk with the 3.11-rc4 kernel... :(
Comment 40 Aaron Watry 2013-08-14 17:37:50 UTC
Can you apply the following two LLVM patch series, build the latest upstream libclc version, and try again?

LLVM Patch series:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184088.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184089.html

Libclc:
git clone http://llvm.org/git/libclc.git

With that, and the latest mesa git code, I've managed to run bfgminer for ~5 minutes on a Radeon 5400 without a single lockup.  This same GPU used to lock up within 5 seconds of starting: bfgminer -v1 --benchmark
Comment 41 Erdem U. Altınyurt 2013-08-16 03:20:35 UTC
I think those patches are already merged into LLVM trunk; applying them manually gives me failed hunks.

I updated, built, and installed LLVM trunk, libclc trunk, and Mesa trunk, and I still get lockups!???
(Using 3.11-rc5 with DPM disabled.)

---


If you have clean patches, I can test them too, because these patches give me this error at compile time:

/temp/llvm/lib/Target/R600/SIInstrInfo.td:39:1: error: def 'SIload_constant' already defined
Thanks
Comment 42 Aaron Watry 2013-08-16 03:45:42 UTC
(In reply to comment #41)
> I think those patches are merged with llvm-trunk. Manually installation give
> me hunks and fails.

The second LLVM series is not yet merged upstream, but the first one is.  I was able to use a clean LLVM/Clang checkout from today with the second series on my Cedar-based machine.
Comment 43 Erdem U. Altınyurt 2013-08-16 05:46:14 UTC
I patched llvm with

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184089.html

using

patch -N -p1 -i p2.patch

But I still get lockups.
Any hints? :/
Comment 44 Aaron Watry 2013-09-04 17:02:43 UTC
I just retested on my CEDAR with fresh git mesa/llvm. bfgminer is still working correctly for 5+ minutes without a single lock-up on my machine.  Kernel is 3.11 final with DPM enabled.

The following are the only patches that I have applied that are not yet upstream:

LLVM:
http://llvm-reviews.chandlerc.com/D1449

Mesa:
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043932.html
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043951.html
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043686.html

You probably don't need that last patch, since it's targeted at Southern Islands.

I honestly don't know if any of those patches would improve anything for you, but it's what I just successfully tested my evergreen with. You may need to do some slight re-basing of the first patch to take into account some of the recent changes in r600g.

I do have my 6850 installed in an old Athlon 64 machine at home.  I was already thinking of hooking that machine back up (it's in storage at the moment), so I may be able to re-test this on my 6850 in the next few days... assuming I don't get sidetracked by house maintenance issues.
Comment 45 Aaron Watry 2013-09-09 03:37:22 UTC
(In reply to comment #43)
> I patched llvm with
> 
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184089.
> html
> 
> using
> 
> patch -N -p1 -i p2.patch
> 
> But still got lock-ups.
> Any hints? :/

I just did a fresh install of 64-bit Ubuntu 13.04 on an Athlon 64 with a radeonsi 6850.

I did a checkout of LLVM/clang master and applied only the following patch:
http://llvm-reviews.chandlerc.com/file/data/lzktdebskepaizauyiqg/PHID-FILE-ke45wuyidarlnthfbtcg/D1449.diff

Without that patch, libclc won't currently build for the radeonsi target.

Along with that I checked out git copies of Mesa, drm, and libclc.  I upgraded the kernel to 3.11 from the Ubuntu mainline ppa, but I didn't enable dpm (machine wouldn't boot with it enabled, and I was too busy to debug that).

With this done and all of the aforementioned packages compiled and installed to /usr/local, I have been running 'bfgminer -v1 --benchmark' without a single lockup for 20 minutes now on a radeonsi 6850 that used to exhibit the same lockups that we have been discussing, and which also seem to have been cured on the 5400 card I was testing on before.
Comment 46 Aaron Watry 2013-09-09 03:39:34 UTC
Autocorrect got the best of me...  s/radeonsi/radeon/g
Comment 47 Erdem U. Altınyurt 2013-09-10 20:26:05 UTC
Hi Aaron,
Well, I can confirm that even without the patch, bfgminer now works on my HD 6850, using Mesa/LLVM/libclc trunk.

Thanks.
Comment 48 darkbasic 2013-09-16 20:10:50 UTC
bfgminer -v1 --scrypt keeps segfaulting (without scrypt it works fine).
Scrypt is needed for litecoin mining.
Comment 49 Tom Stellard 2013-09-17 17:52:07 UTC
Created attachment 86004
Possible Fix #2

Can you try this patch, and if it doesn't work post the output with R600_DEBUG=cs
Comment 50 darkbasic 2013-09-17 19:06:23 UTC
It doesn't work, but at least now it hangs at:

R600_DEBUG=cs bfgminer --scrypt -o stratum+tcp://stratum2.wemineltc.com:3334 -u user.1 -p password -I 13 -v1

 [2013-09-17 21:03:59] Started bfgminer 3.2.0
 [2013-09-17 21:03:59] Started bfgminer 3.2.0
 [2013-09-17 21:03:59] Probing for an alive pool
 [2013-09-17 21:04:00] Network difficulty changed to 1.03k ( 7.39Gh/s)
 [2013-09-17 21:04:00] Stratum from pool 0 detected new block
 [2013-09-17 21:04:00] Pool 0 is hiding block contents from us
 [2013-09-17 21:04:24] Killing OCL 0
Segmentation fault

and as soon as I press CTRL+C I get a segfault in dmesg:
[ 1275.112334] bfgminer[6407]: segfault at 2d0 ip 00007f6a3d18ef01 sp 00007fffc765cc10 error 4 in libpthread-2.15.so[7f6a3d185000+17000]

I don't know where to find the output of R600_DEBUG=cs, sorry.
Comment 51 darkbasic 2013-09-17 19:36:20 UTC
Created attachment 86012
bfgminer debug

Here is the stderr output of RADEON_DUMP_SHADERS=1 bfgminer --scrypt -o stratum+tcp://stratum2.wemineltc.com:3334 -u user.1 -p password -v1

Thanks
Comment 52 darkbasic 2013-09-19 20:46:17 UTC
Created attachment 86167
debug radeonsi nopatch

With the patch there are no shaders in the output, so here is the output without the patch.

RADEON_DUMP_SHADERS=1 bfgminer --scrypt -o stratum+tcp://stratum2.wemineltc.com:3334 -u user.1 -p password -v1 2> debug.txt
Comment 53 Tom Stellard 2013-09-28 00:23:14 UTC
Created attachment 86762
Possible Fix #3

Can you try this patch?
Comment 54 darkbasic 2013-09-28 01:53:23 UTC
No, sorry, it doesn't help.
Comment 55 Tom Stellard 2013-09-28 03:06:02 UTC
(In reply to comment #54)
> No sorry but it doesn't help.

Sorry, I should have mentioned this will only help with pre-SI GPUs.
Comment 56 Jan Vesely 2017-03-02 20:43:08 UTC
This is an old bug. Is this still an issue?

pyrit works OK on my TURKS as well as on CARRIZO+ICELAND:
#2: 'OpenCL-Device 'AMD TURKS (DRM 2.48.0 / 4.9.12-200.fc25.x86_64, LLVM 5.0.0)'': 10480.1 PMKs/s (RTT 2.3)

#1: 'OpenCL-Device 'AMD CARRIZO (DRM 3.1.0 / 4.4.0-ROC, LLVM 4.0.0)'': 16205.3 PMKs/s (RTT 1.6)
#2: 'OpenCL-Device 'AMD ICELAND (DRM 3.1.0 / 4.4.0-ROC, LLVM 4.0.0)'': 2072.7 PMKs/s (RTT 1.5)

bfgminer works as well:
OCL0        | 20s:106.8 avg:79.71 u:23.09 Mh/s | A:1 R:0+0(none) HW:0/none

OCL0        | 20s:184.9 avg:182.3 u:75.27 Mh/s | A:2 R:2+0( 50%) HW:0/none
OCL1        | 20s:48.50 avg:47.25 u:37.64 Mh/s | A:1 R:1+0( 50%) HW:0/none

cgminer removed its GPU code a long time ago.
Comment 57 darkbasic 2017-03-02 20:48:28 UTC
Can't test right now, I'm reinstalling my desktop with the HD7950.
Comment 58 Vedran Miletić 2017-03-22 15:41:46 UTC
Resolving per comment 56. Darkbasic, if you find that this is still an issue, please reopen.