Bug 99488

Summary: [r600g]ImageMagick issues in Gaussian Blur kernel
Product: Mesa Reporter: nixscripter
Component: Drivers/Gallium/r600 Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 99553    
Attachments: assert on infinite loop
Fix-ALU-clause-markers-use-detection
numRegs Assert Extra Debugger Info

Description nixscripter 2017-01-22 07:23:32 UTC
(Bug number three. I can almost taste victory...)

If I compile ImageMagick with OpenCL support, the current version will hang when I invoke the Gaussian blur operation. More specifically, the CPU will run at 100%, and memory consumed by the process will increase indefinitely at 19 MB per second, but nothing will happen.

The blur operation is done internally for many other operations, so this issue is more crippling than it first appears.

To reproduce:

1. Get the current tree of ImageMagick from GitHub: https://github.com/ImageMagick/ImageMagick

2. Compile it with OpenCL and HDRI support (--enable-opencl --enable-hdri flags).

3. Get an image, and try to perform a simple Gaussian blur:

convert input.png -blur 0x20 output.png

4. The hang will occur... at least, on my system.

I mentioned this in another bug, and was told that on different hardware than mine (I have a Radeon HD 5700, which is an Evergreen chipset) all of the ImageMagick self-tests pass.

The source code for all OpenCL kernels is in MagickCore/accelerate-kernels-private.h. Several functions are involved with blurring, and unfortunately it's difficult to identify the source of the hang. I would be happy to provide debugging information if I could be advised on how to collect it.

LLVM version: r292714
Mesa version: commit bb5db5564

ImageMagick version: commit 43a09ba75 (Note: the next git revision was broken by upstream)
Comment 1 nixscripter 2017-02-03 02:02:41 UTC
I'm still trying some versions in order to help you guys pin this down (it's not always easy to tell which reinstall is having which effect, since Arch Linux has three packages involved). In the meantime, I did the basics on the process in its hung state.

It's currently running three threads, two blocked, one continuing to run:

(gdb) info threads 
  Id   Target Id         Frame 
* 1    Thread 0x39ac9cdf7c0 (LWP 3806) "display" 0x0000039abefef921 in llvm::MachineInstr::findRegisterDefOperandIdx(unsigned int, bool, bool, llvm::TargetRegisterInfo const*) const () from /usr/lib/libLLVM-5.0svn.so
  2    Thread 0x39abd04f700 (LWP 3809) "radeon_cs:0" 0x0000039ac6b0310f in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  3    Thread 0x39abadd4700 (LWP 3814) "display" futex_wait (val=8, 
    addr=0x25349d4)
    at /build/gcc-multilib/src/gcc/libgomp/config/linux/x86/futex.h:44
(gdb)


What is that call to findRegisterDefOperandIdx doing? It's not entirely clear, but it's sucking up a lot of memory. Running strace confirms that: 

strace: Process 3806 attached with 3 threads
strace: [ Process PID=3806 runs in x32 mode. ]
[pid  3809] futex(0x2599e64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  3814] futex(0x25349d4, FUTEX_WAIT_PRIVATE, 8, NULL <unfinished ...>
[pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x640f4000
strace: [ Process PID=3806 runs in 64 bit mode. ]
[pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a638f3000
[pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a630f2000
[pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a628f1000
[...]

And down the address space it goes, 0x1000 bytes (4k) a time or two per second.

Looking at the function name, I'm thinking about what Jan said on another bug:

> the hang is probably a separate bug. ImageMagick test suite results on my Turks GPU are:
> # TOTAL: 86
> # PASS:  78
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  3
> # XPASS: 0
> # ERROR: 5
>
> the errors and failures are accompanied by:
> Assertion `i < getNumRegs() && "Register number out of range!"' failed.

Could this be perhaps the same registers that were out of range on a different card?

Either way, I will continue to investigate, and hope to narrow down the issue soon.
Comment 2 nixscripter 2017-02-03 02:48:39 UTC
Correction on the steps to reproduce:

1. Get the current tree of ImageMagick from GitHub: https://github.com/ImageMagick/ImageMagick

2. Compile it with OpenCL and HDRI support (--enable-opencl --enable-hdri flags).

3. Create a JPEG image, and try to perform a simple Gaussian blur on it:

convert rose: input.jpg && convert input.jpg -blur 0x20 output.jpg

The blur does work for other image formats, strangely enough. Perhaps there is a difference in data types?
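
Since the failure mode is a hang rather than a crash, it can help to wrap the command in a time limit when scripting the reproduction. A minimal sketch, assuming GNU coreutils `timeout` is available; the helper name `run_with_limit` and the 120-second limit are made up for illustration and do not come from ImageMagick or Mesa:

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and classify the result.
# Relies on GNU coreutils `timeout`, which exits with status 124 when the
# limit expires.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed with status $status"
    fi
}

# Example invocation for the steps above (the limit is an arbitrary guess):
#   convert rose: input.jpg
#   run_with_limit 120 convert input.jpg -blur 0x20 output.jpg
```

A "timed out" result only says the command exceeded the limit; whether that is the hang described here still has to be confirmed with a debugger.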
Comment 3 Jan Vesely 2017-02-03 03:39:04 UTC
(In reply to nixscripter from comment #1)
> I'm still trying some versions in order to help you guys pin this down (it's
> not always easy to tell what reinstall is having what effect, since Arch
> Linux has three packages involved). In the meantime, I did the basics on
> the process in its hung state.
> 
> It's currently running three threads, two blocked, one continuing to run:
> 
> (gdb) info threads 
>   Id   Target Id         Frame 
> * 1    Thread 0x39ac9cdf7c0 (LWP 3806) "display" 0x0000039abefef921 in
> llvm::MachineInstr::findRegisterDefOperandIdx(unsigned int, bool, bool,
> llvm::TargetRegisterInfo const*) const () from /usr/lib/libLLVM-5.0svn.so

can you get backtrace of this thread?
does it ever leave this function? you can check by adding breakpoint on that function and checking if it gets hit.
this can be repeated going up the stack to find the function that won't exit.

>   2    Thread 0x39abd04f700 (LWP 3809) "radeon_cs:0" 0x0000039ac6b0310f in
> pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
>   3    Thread 0x39abadd4700 (LWP 3814) "display" futex_wait (val=8, 
>     addr=0x25349d4)
>     at /build/gcc-multilib/src/gcc/libgomp/config/linux/x86/futex.h:44
> (gdb)
> 
> 
> What is that call to findRegisterDefOperandIdx doing?

there's a loop, it can't be infinite, but if the num of operands is corrupted, it can take a very long time to finish. can you check "p e" in gdb?

> It's not entirely
> clear, but it's sucking up a lot of memory. Running strace confirms that: 
> 
> strace: Process 3806 attached with 3 threads
> strace: [ Process PID=3806 runs in x32 mode. ]
> [pid  3809] futex(0x2599e64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> [pid  3814] futex(0x25349d4, FUTEX_WAIT_PRIVATE, 8, NULL <unfinished ...>
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x640f4000
> strace: [ Process PID=3806 runs in 64 bit mode. ]
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a638f3000
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a630f2000
> [pid  3806] mmap(NULL, 8392704, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x39a628f1000
> [...]
> 
> And down the address space it goes, 0x1000 bytes (4k) a time or two per
> second.

the above mmaps show 8M (+4K, probably for bookkeeping) allocations. are there any others, not shown? I haven't found anything in the mentioned function that would need such a big amount of memory, the hang is probably higher in the call stack.
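
The arithmetic behind that reading can be checked with plain shell arithmetic, using only the numbers from the strace output quoted above. A small sketch (the helper names are hypothetical):

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers in the strace output.

# Bytes by which a mapping size exceeds 8 MiB.
excess_over_8mib() {
    echo $(( $1 - 8 * 1024 * 1024 ))
}

# Distance between two mapping addresses, printed in hex.
delta_hex() {
    printf '0x%x\n' $(( $1 - $2 ))
}

excess_over_8mib 8392704                  # 4096, i.e. one 4 KiB page
delta_hex 0x39a638f3000 0x39a630f2000     # 0x801000, i.e. 8392704 bytes
delta_hex 0x39a630f2000 0x39a628f1000     # 0x801000 again
```

So each successive mapping sits exactly one mapping size (8 MiB + 4 KiB) below the previous one, rather than 4 KiB below it.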

> 
> Looking at the function name, I'm thinking about what Jan said on another
> bug:
> 
> > the hang is probably a separate bug. ImageMagick test suite results on my Turks GPU are:
> > # TOTAL: 86
> > # PASS:  78
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  3
> > # XPASS: 0
> > # ERROR: 5
> >
> > the errors and failures are accompanied by:
> > Assertion `i < getNumRegs() && "Register number out of range!"' failed.
> 
> Could this be perhaps the same registers that were out of range on a
> different card?

all cards of one class have the same number of architecturally available registers.
I see you have debug symbols, is that a debug build? if not, it can be that the assert is not hit, and the hang is just fallout.

> 
> Either way, I will continue to investigate, and hope to narrow down the
> issue soon.

thanks.
Comment 4 Michel Dänzer 2017-02-03 09:01:33 UTC
(In reply to Jan Vesely from comment #3)
> > (gdb) info threads 
> >   Id   Target Id         Frame 
> > * 1    Thread 0x39ac9cdf7c0 (LWP 3806) "display" 0x0000039abefef921 in
> > llvm::MachineInstr::findRegisterDefOperandIdx(unsigned int, bool, bool,
> > llvm::TargetRegisterInfo const*) const () from /usr/lib/libLLVM-5.0svn.so
> 
> can you get backtrace of this thread?
> does it ever leave this function? you can check by adding breakpoint on that
> function and checking if it gets hit.
> this can be repeated going up the stack to find the function that won't exit.

FWIW, for this purpose I usually just use "finish" repeatedly until it doesn't terminate.
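
For collecting backtraces without an interactive session, gdb can also be driven in batch mode. A rough sketch, assuming a gdb that supports `-p`, `-batch` and `-ex`; the function name and the `GDB` override variable are hypothetical conveniences:

```shell
#!/bin/sh
# Hypothetical helper: print backtraces of all threads of a running process.
# The debugger binary can be overridden through the GDB variable.
dump_backtraces() {
    pid=$1
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt'
}

# Example: dump_backtraces 3806
```

Running it a few times and comparing the outputs shows which frames stay on the stack between samples; the innermost frame that never changes is a candidate for the function that does not return.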
Comment 5 Jan Vesely 2017-02-04 22:47:34 UTC
Created attachment 129340 [details] [review]
assert on infinite loop

this patch adds an assert for possible infinite loop in emit clause markers.
Comment 6 nixscripter 2017-02-05 16:37:04 UTC
Replicating the issue again, here is the backtrace you requested:

(gdb) bt
#0  0x000003b11dea95d4 in llvm::MachineInstr::findRegisterUseOperandIdx(unsigned int, bool, llvm::TargetRegisterInfo const*) const ()
   from /usr/lib/libLLVM-5.0svn.so
#1  0x000003b11ed8c96e in (anonymous namespace)::R600EmitClauseMarkers::MakeALUClause(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) () from /usr/lib/libLLVM-5.0svn.so
#2  0x000003b11ed8d7fe in (anonymous namespace)::R600EmitClauseMarkers::runOnMachineFunction(llvm::MachineFunction&) () from /usr/lib/libLLVM-5.0svn.so
#3  0x000003b11dea3bf1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /usr/lib/libLLVM-5.0svn.so
#4  0x000003b11dd19342 in llvm::FPPassManager::runOnFunction(llvm::Function&)
    () from /usr/lib/libLLVM-5.0svn.so
#5  0x000003b11dd193e3 in llvm::FPPassManager::runOnModule(llvm::Module&) ()
   from /usr/lib/libLLVM-5.0svn.so
#6  0x000003b11dd19d94 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
   from /usr/lib/libLLVM-5.0svn.so
#7  0x000003b120ede12c in ?? () from /usr/lib/libMesaOpenCL.so.1
#8  0x000003b120ede790 in ?? () from /usr/lib/libMesaOpenCL.so.1
#9  0x000003b120eda9ad in ?? () from /usr/lib/libMesaOpenCL.so.1
#10 0x000003b120ecb9f9 in ?? () from /usr/lib/libMesaOpenCL.so.1
#11 0x000003b120ea98dc in ?? () from /usr/lib/libMesaOpenCL.so.1
#12 0x000003b1220f450b in clBuildProgram () from /usr/lib/libOpenCL.so
#13 0x000003b127156afe in CompileOpenCLKernels ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#14 0x000003b12715728d in InitOpenCLEnvInternal ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#15 0x000003b1271573f1 in AcceleratePerfEvaluator ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#16 0x000003b127157e5a in autoSelectDevice ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#17 0x000003b127158944 in InitOpenCLEnv ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#18 0x000003b127051400 in checkOpenCLEnvironment ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#19 0x000003b127054a2d in AccelerateBlurImage ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#20 0x000003b1270f0c13 in BlurImageChannel ()
   from /usr/lib/libMagickCore-6.Q16HDRI.so.4
#21 0x000003b126d9efd0 in MogrifyImage ()
   from /usr/lib/libMagickWand-6.Q16HDRI.so.4
#22 0x000003b126da6200 in MogrifyImages ()
   from /usr/lib/libMagickWand-6.Q16HDRI.so.4
#23 0x000003b126d2bb86 in ConvertImageCommand ()
   from /usr/lib/libMagickWand-6.Q16HDRI.so.4
#24 0x000003b126d9b3ee in MagickCommandGenesis ()
   from /usr/lib/libMagickWand-6.Q16HDRI.so.4
#25 0x00000000004007c7 in main ()

As you can see, it's a different function at the top of the stack this time. That's because, as you suspected, it is not the culprit. I was just lucky finding findRegisterDefOperandIdx at the top of the stack when I tested this before several times.

Per your suggestion, I did "finish" repeatedly until I could find the hanging function:

(gdb) finish
Run till exit from #0  0x000003b11dea95d4 in llvm::MachineInstr::findRegisterUseOperandIdx(unsigned int, bool, llvm::TargetRegisterInfo const*) const ()
   from /usr/lib/libLLVM-5.0svn.so
0x000003b11ed8c96e in (anonymous namespace)::R600EmitClauseMarkers::MakeALUClause(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) () from /usr/lib/libLLVM-5.0svn.so
(gdb) finish
Run till exit from #0  0x000003b11ed8c96e in (anonymous namespace)::R600EmitClauseMarkers::MakeALUClause(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) () from /usr/lib/libLLVM-5.0svn.so
0x000003b11ed8d7fe in (anonymous namespace)::R600EmitClauseMarkers::runOnMachineFunction(llvm::MachineFunction&) () from /usr/lib/libLLVM-5.0svn.so
(gdb) finish
Run till exit from #0  0x000003b11ed8d7fe in (anonymous namespace)::R600EmitClauseMarkers::runOnMachineFunction(llvm::MachineFunction&) ()
   from /usr/lib/libLLVM-5.0svn.so
[ ... hangs... ]

So it seems your patch is on the right track.

In addition, I was making a release build, because that was the default in the recipe file. I am building a debug build as I write this, including your attached patch.
Comment 7 Jan Vesely 2017-02-05 21:57:04 UTC
Thanks for the info. I think it's the same bug that hangs GROMACS kernel (my patch was originally written to debug GROMACS).
you can change the assert to if and use "I->dump()" to print the triggering instruction.
If it's "MOVA_INT_eg" then it's the same bug that hangs GROMACS.
Comment 8 Jan Vesely 2017-02-21 23:20:15 UTC
Created attachment 129815 [details] [review]
Fix-ALU-clause-markers-use-detection

This patch fixes gromacs build for me. I tested blur and it now results in "Register number out of range!" failure.
Comment 9 nixscripter 2017-02-24 03:09:14 UTC
Thanks for your continued work on this.

I've been fighting with Arch Linux packaging for a week of quiet frustration. It's also really slow to try and fix it, because the debug build seems to be 10x the size of the release version (7 GB total instead of 600 MB).

I will spend some more time on it in the coming days, and let you know how your current patch goes when I am able.
Comment 10 nixscripter 2017-02-26 04:46:06 UTC
I have finally gotten a build done, and the patch does indeed fix the hang. I get the same assertion.

Once that patch lands, I will verify that SVN build number, and close this bug.

(And then, I'll be opening another bug for the assertion, once I've gathered more information about it. There is an even simpler case in the ImageMagick self-test suite that I'm trying to figure out how to run in a debuggable manner.)
Comment 11 nixscripter 2017-02-26 04:47:44 UTC
Also, a link to the patch in the build system would be nice, so I know when that happens.
Comment 12 Jan Vesely 2017-02-26 06:38:00 UTC
The first patch is already in the mainline (https://reviews.llvm.org/D29792)
The second one is under review (https://reviews.llvm.org/D30230)

Feel free to leave this bug open until jpeg conversion works.
The kernel uses calls to sinpi/cospi functions which are rather register hungry atm (they fail even on their own).
Comment 13 nixscripter 2017-02-26 17:24:18 UTC
I have updated the bug's title to expand the scope in order to continue work on the assert.

With my debug build and that patch, I can now run the ImageMagick self-tests and easily replicate the assert. Here is the easiest way:

1. Before you configure ImageMagick, set the CFLAGS environment variable to "-O -ggdb".
2. Build ImageMagick with "make".
3. Run "make check" to build all the self-tests (it's not done by default).
4. From the top level of the source tree, run "bash test/tests/validate-formats-memory.tap".

This is the output you will see:

[... snip ...]
  test 856: XWD/Undefined/TrueColor/12-bits... pass
  test 857: XWD/Undefined/TrueColor/16-bits... pass
lt-validate: /home/admin/Software/r600/llvm-svn/src/llvm/include/llvm/MC/MCRegisterInfo.h:64: unsigned int llvm::MCRegisterClass::getRegister(unsigned int) const: Assertion `i < getNumRegs() && "Register number out of range!"' failed.
  test 858: YUV/Undefined/TrueColor/8-bits

That last test is where the issue is.

Here is the backtrace (now that I have debug symbols for everything):

(gdb) bt
#0  0x00000381c351c04f in raise () from /usr/lib/libc.so.6
#1  0x00000381c351d47a in abort () from /usr/lib/libc.so.6
#2  0x00000381c3514ea7 in __assert_fail_base () from /usr/lib/libc.so.6
#3  0x00000381c3514f52 in __assert_fail () from /usr/lib/libc.so.6
#4  0x00000381ac85cd47 in llvm::MCRegisterClass::getRegister (
    i=<optimized out>, this=<optimized out>)
    at /home/admin/Software/r600/llvm-svn/src/llvm/include/llvm/MC/MCRegisterInfo.h:64
#5  llvm::TargetRegisterClass::getRegister (this=<optimized out>, 
    i=<optimized out>)
    at /home/admin/Software/r600/llvm-svn/src/llvm/include/llvm/Target/TargetRegisterInfo.h:81
#6  0x00000381adc07b0c in llvm::TargetRegisterClass::getRegister (
    this=<optimized out>, i=<optimized out>)
    at /home/admin/Software/r600/llvm-svn/src/llvm/include/llvm/MC/MCRegisterInfo.h:64
#7  llvm::R600InstrInfo::buildIndirectRead (this=this@entry=0x1a29020, 
    MBB=0x1aae9a0, I=I@entry=..., ValueReg=1611, Address=<optimized out>, 
    OffsetReg=OffsetReg@entry=1621, AddrChan=2)
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/Target/AMDGPU/R600InstrInfo.cpp:1156
#8  0x00000381adc07f6a in llvm::R600InstrInfo::expandPostRAPseudo (
    this=0x1a29020, MI=...)
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/Target/AMDGPU/R600InstrInfo.cpp:1070                                                                     
#9  0x00000381acb1bc7b in (anonymous namespace)::ExpandPostRA::runOnMachineFunction (this=0xcc6b10, MF=...)                                                     
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/CodeGen/ExpandPostRAPseudos.cpp:200                                                                      
#10 0x00000381acc03ea4 in llvm::MachineFunctionPass::runOnFunction (            
    this=0xcc6b10, F=...)                                                       
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:62                                                                       
#11 0x00000381aca54caf in llvm::FPPassManager::runOnFunction (this=0xbfd9d0,    
    F=...)                                                                      
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/IR/LegacyPassManager.cpp:1513                                                                            
#12 0x00000381aca54d5c in llvm::FPPassManager::runOnModule (this=0xbfd9d0,      
    M=...)                                                                      
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/IR/LegacyPassManager.cpp:1534                                                                            
#13 0x00000381aca55930 in (anonymous namespace)::MPPassManager::runOnModule (   
    M=..., this=<optimized out>)                                                
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/IR/LegacyPassManager.cpp:1590                                                                            
#14 llvm::legacy::PassManagerImpl::run (this=0x1b04d20, M=...)                  
    at /home/admin/Software/r600/llvm-svn/src/llvm/lib/IR/LegacyPassManager.cpp:1693
#15 0x00000381afb7e4ec in (anonymous namespace)::emit_code(llvm::Module&, clover::llvm::target const&, llvm::TargetMachine::CodeGenFileType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
   from /usr/lib/libMesaOpenCL.so.1
#16 0x00000381afb7eb50 in clover::llvm::build_module_native(llvm::Module&, clover::llvm::target const&, clang::CompilerInstance const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
   from /usr/lib/libMesaOpenCL.so.1
#17 0x00000381afb7ad5d in clover::llvm::link_program(std::vector<clover::module, std::allocator<clover::module> > const&, pipe_shader_ir, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
   from /usr/lib/libMesaOpenCL.so.1
#18 0x00000381afb6b9a1 in clover::program::link(clover::ref_vector<clover::device> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clover::ref_vector<clover::program> const&) ()
   from /usr/lib/libMesaOpenCL.so.1
#19 0x00000381afb494ec in clBuildProgram () from /usr/lib/libMesaOpenCL.so.1
#20 0x00000381b15bb50b in clBuildProgram () from /usr/lib/libOpenCL.so
#21 0x00000381c4213f12 in CompileOpenCLKernel (exception=0xac7490, 
    signature=<optimized out>, 
    options=0x3d8f0b54500 "-cl-single-precision-constant -cl-mad-enable -DMAGICKCORE_HDRI_SUPPORT=1 -DCLQuantum=float -DCLSignedQuantum=float -DCLPixelType=float4 -DQuantumRange=65535.000000f -DQuantumScale=0.000015 -DCharQuant"..., 
    kernel=<optimized out>, device=0xac5760) at MagickCore/opencl.c:1361
#22 HasOpenCLDevices (clEnv=clEnv@entry=0xac4d50, 
    exception=exception@entry=0xac7490) at MagickCore/opencl.c:2104
#23 0x00000381c4215327 in InitializeOpenCL (clEnv=clEnv@entry=0xac4d50, 
    exception=exception@entry=0xac7490) at MagickCore/opencl.c:2321
#24 0x00000381c411c19b in getOpenCLEnvironment (
    exception=exception@entry=0xac7490) at MagickCore/accelerate.c:223
#25 0x00000381c411d729 in AccelerateBlurImage (image=image@entry=0xb250d0, 
    radius=radius@entry=10, sigma=sigma@entry=3.5, 
    exception=exception@entry=0xac7490) at MagickCore/accelerate.c:773
#26 0x00000381c41afa48 in BlurImage (image=image@entry=0xb250d0, 
    radius=radius@entry=10, sigma=sigma@entry=3.5, 
    exception=exception@entry=0xac7490) at MagickCore/effect.c:789
#27 0x00000381c421392c in RunOpenCLBenchmark (is_cpu=is_cpu@entry=MagickFalse)
    at MagickCore/opencl.c:1048
#28 0x00000381c4215aa3 in RunDeviceBenckmark (device=0xac5760, 
    testEnv=0xac4d50, clEnv=0xa729d0) at MagickCore/opencl.c:1088
#29 BenchmarkOpenCLDevices (clEnv=0xa729d0) at MagickCore/opencl.c:1177
#30 AutoSelectOpenCLDevices (clEnv=0xa729d0) at MagickCore/opencl.c:975
#31 InitializeOpenCL (clEnv=clEnv@entry=0xa729d0, 
    exception=exception@entry=0x98b990) at MagickCore/opencl.c:2328
#32 0x00000381c411c19b in getOpenCLEnvironment (
    exception=exception@entry=0x98b990) at MagickCore/accelerate.c:223
#33 0x00000381c4124a45 in AccelerateResizeImage (image=image@entry=0xa72c50, 
    resizedColumns=resizedColumns@entry=70, resizedRows=resizedRows@entry=46, 
    resizeFilter=resizeFilter@entry=0xa71bf0, 
    exception=exception@entry=0x98b990) at MagickCore/accelerate.c:4430
#34 0x00000381c4274c9f in ResizeImage (image=image@entry=0xa72c50, 
    columns=columns@entry=70, rows=rows@entry=46, 
    filter=filter@entry=TriangleFilter, exception=exception@entry=0x98b990)
    at MagickCore/resize.c:2877
#35 0x00000381b17d11f4 in WriteYUVImage (image_info=0xa67840, image=0xa72c50, 
    exception=0x98b990) at coders/yuv.c:649
#36 0x00000381c416bcb2 in WriteImage (image_info=image_info@entry=0x98ca10, 
    image=image@entry=0xa72c50, exception=exception@entry=0x98b990)
    at MagickCore/constitute.c:1114
#37 0x00000000004021bb in ValidateImageFormatsInMemory (
    image_info=image_info@entry=0x98ca10, 
    reference_filename=reference_filename@entry=0x3d8f0b5abb0 "/tmp/magick-308288PpulFoRmPFl", 
    output_filename=output_filename@entry=0x3d8f0b5bbb0 "/tmp/magick-30828ayJzZh43jT2f", fail=fail@entry=0x3d8f0b5aba8, exception=exception@entry=0x98b990)
    at tests/validate.c:1623
#38 0x000000000040548a in main (argc=<optimized out>, argv=<optimized out>)
    at tests/validate.c:2629
Comment 14 nixscripter 2017-02-26 17:26:20 UTC
Created attachment 129933 [details]
numRegs Assert Extra Debugger Info

I also played around in the debugger a bit, which may or may not be helpful.
Comment 15 nixscripter 2017-02-26 17:36:51 UTC
Oh, and I almost forgot: if you attach GDB, hit the assert, and leave that bash script running, it will assume the test is hung after a couple minutes and kill it.

To prevent this, you'll have to suspend the bash process, and let GDB react to the signal its child got, and then you can debug in peace.
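
One way to do the suspending is with the STOP and CONT signals. A sketch, assuming the PID of the bash process running the .tap script is already known (looking it up is left out); the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: pause and resume the shell that supervises the test,
# so that it cannot kill the test while it is being inspected in the debugger.
suspend_runner() {
    kill -STOP "$1"
}

resume_runner() {
    kill -CONT "$1"
}

# Example:
#   suspend_runner "$runner_pid"   # before inspecting the assert in gdb
#   resume_runner "$runner_pid"    # once debugging is finished
```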
Comment 16 nixscripter 2017-05-07 00:39:58 UTC
Good news! I tried a new version:

LLVM r302002
Mesa commit 3bf3f9866c

And the unit tests on the ImageMagick master branch don't hang anymore!

A little more testing today, and I may be able to close this one.
Comment 17 nixscripter 2017-05-24 05:12:03 UTC
It took longer than I expected, but I'm calling it good. Thanks for all your work on this!

Marking RESOLVED FIXED.
Comment 18 Jan Vesely 2017-05-30 06:38:56 UTC
(In reply to nixscripter from comment #17)
> It took longer than I expected, but I'm calling it good. Thanks for all your
> work on this!
> 
> Marking RESOLVED FIXED.

not sure 'fixed' is the right word. I see 8 ERRORS/FAILS in ImageMagick test suite, each corresponding to an assertion failure.
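
For comparing runs, that count can be pulled out of a test summary mechanically. A sketch with awk, assuming the summary has the same "# FAIL:" / "# ERROR:" shape as the one quoted in comment 1; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: add up the FAIL and ERROR counts from a test summary
# read on standard input. XFAIL and the other lines are ignored.
count_failures() {
    awk '$1 == "#" && ($2 == "FAIL:" || $2 == "ERROR:") { sum += $3 }
         END { print sum + 0 }'
}

# Example with the numbers from the summary in comment 1 (3 + 5, prints 8):
printf '%s\n' '# TOTAL: 86' '# PASS:  78' '# FAIL:  3' '# ERROR: 5' | count_failures
```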
Comment 19 nixscripter 2017-06-05 01:38:24 UTC
The tests, alas, are stupid. They hard-code a particular Microsoft font that, to my knowledge, is not available on Linux (Arial vs Helvetica).

By downloading the MS Core Fonts bundle, installing those, and then changing the font in the test to use those, those errors magically disappear.
