Bug 96488 - [r600g]OpenCL driver causes segfault in ImageMagick's Histogram kernel
Summary: [r600g]OpenCL driver causes segfault in ImageMagick's Histogram kernel
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600 (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-11 00:55 UTC by nixscripter
Modified: 2017-01-22 06:59 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Histogram kernel backtrace (2.55 KB, text/plain)
2016-06-11 00:55 UTC, nixscripter
Details

Note You need to log in before you can comment on or make changes to this bug.
Description nixscripter 2016-06-11 00:55:26 UTC
Created attachment 124469 [details]
Histogram kernel backtrace

If I compile ImageMagick with OpenCL support, the current version segfaults when I attempt to do any operation which computes a histogram. This is actually done frequently internally, so it's more crippling than it first appears.

To reproduce:

1. Get the current latest ImageMagick from GitHub: https://github.com/ImageMagick/ImageMagick

2. Compile it with OpenCL support (--enable-opencl flag)

3. Get an image, and try to equalize its colors (based on the histogram):

convert input.png -equalize output.png

4. The segfault will occur

LLVM version: r272184
Mesa version: commit d5491a8
ImageMagick version: ade0d8e

The source code for the Histogram kernel is in MagickCore/accelerate-kernels-private.h, starting on line 1283.
Comment 1 Jan Vesely 2016-06-11 03:26:20 UTC
That's because global atomics are not implemented in LLVM backend (despite driver advertising the extensions).
Shouldn't be too hard to them.
Comment 2 Vedran Miletić 2016-06-11 11:45:23 UTC
(In reply to jano.vesely from comment #1)
> That's because global atomics are not implemented in LLVM backend (despite
> driver advertising the extensions).
> Shouldn't be too hard to them.

Are you sure the hardware has the necessary instructions? Tom Stellard commented on #radeon back in November that implementing global atomics for r600g will be very very difficult[1].

[1] https://people.freedesktop.org/~cbrill/dri-log/index.php?date=2015-11-27&channel=radeon
Comment 3 nixscripter 2016-06-11 13:47:02 UTC
I suspected it was that, based on the word "Atomic" being at fault.

Unfortunately, I cannot assist, as I'm not an expert on parallel architecture and spending the last half hour reading AMD's documentation won't make me one. But I would ask one question: is anyone even working on this? I'd hate to write off ImageMagick on this platform.
Comment 4 Jan Vesely 2016-06-11 19:15:15 UTC
The instructions are there for EG+. They're tricky to implement because of the register requirements. However, there might be some weird interaction when using CB with the current stores, and loads from the same WI.

I had a version that passed noret variant of global atomic piglit tests, but I'm not sure piglit tests cover all situations.
Comment 5 Vedran Miletić 2016-06-12 12:41:53 UTC
Can you dig the patch somewhere? I have a Caicos at hand and I would really like to test atomic compare-and-swap.
Comment 6 Jan Vesely 2016-06-12 23:13:42 UTC
(In reply to Vedran Miletić from comment #5)
> Can you dig the patch somewhere? I have a Caicos at hand and I would really
> like to test atomic compare-and-swap.

you can try my github branch [0], you'll need the last 5 patches (2-3 can be skipped since it's just revert).
but I rebase a lot so things might brake.

It passes noret piglit tests, and _noret should work OK, _return variants will give LLVM ERROR: Cannot select.
uncommenting the FIXME parts enables return variants, but it will brake for more than one WI as every WI uses address 0 as scratchpad.

Jan

[0] https://github.com/jvesely/llvm
Comment 7 nixscripter 2016-12-23 05:45:34 UTC
Normally, I would be asking in this comment if there is any progress on this, but I can tell there has been... change, at least.

Now instead of crashing, it deadlocks the calling process waiting on a mutex. Forever, as far as I can tell.

LLVM version: r289770
Mesa version: commit 47351b843a

Perhaps this is more easily fixed? Or is this getting to the heart of the matter implied in comment 2?
Comment 8 Jan Vesely 2016-12-23 15:28:19 UTC
(In reply to nixscripter from comment #7)
> Normally, I would be asking in this comment if there is any progress on
> this, but I can tell there has been... change, at least.
> 
> Now instead of crashing, it deadlocks the calling process waiting on a
> mutex. Forever, as far as I can tell.
> 
> LLVM version: r289770
> Mesa version: commit 47351b843a
> 
> Perhaps this is more easily fixed? Or is this getting to the heart of the
> matter implied in comment 2?

the hang is probably a separate bug. ImageMagick test suite results on my Turks GPU are:
# TOTAL: 86
# PASS:  78
# SKIP:  0
# XFAIL: 0
# FAIL:  3
# XPASS: 0
# ERROR: 5

the errors and failures are accompanied by:
Assertion `i < getNumRegs() && "Register number out of range!"' failed.

The patches to add support for noret atomics (should be good enough for histogram) are at:
https://reviews.llvm.org/D27989
https://reviews.llvm.org/D28067
Comment 9 nixscripter 2016-12-29 03:25:52 UTC
I have downloaded LLVM r290690 which contains your first diff, manually applied your 2nd diff, and compiled it. The resulting library seems to resolve the issue.

Once that patch is reviewed and applied to the mainline, I will do one final verification and (hopefully) close this bug.

Thanks for all your efforts on this!
Comment 10 Jan Vesely 2017-01-16 21:36:09 UTC
(In reply to nixscripter from comment #9)
> I have downloaded LLVM r290690 which contains your first diff, manually
> applied your 2nd diff, and compiled it. The resulting library seems to
> resolve the issue.
> 
> Once that patch is reviewed and applied to the mainline, I will do one final
> verification and (hopefully) close this bug.
> 
> Thanks for all your efforts on this!

I have just landed the patch. Note that if the kernel uses sub-int(short,char) vectors. it will generate incorrect code until D27964 (or alternative) is merged.

Jan
Comment 11 nixscripter 2017-01-22 06:59:10 UTC
I have just tried LLVM r292714, and it does indeed work as advertised. Marking RESOLVED FIXED.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.