Bug 108572 - Could not start gimp (probably due to opencl)
Status: RESOLVED DUPLICATE of bug 108879
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Blocks: 99553
Reported: 2018-10-26 22:14 UTC by Marco
Modified: 2019-02-18 01:48 UTC (History)

Attachments
clinfo output (10.89 KB, text/x-log)
2018-11-28 22:36 UTC, Marco

Description Marco 2018-10-26 22:14:46 UTC
Probably a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=74973 (but that was an old bug)

Latest git broke opencl on radeonsi.
I could not start gimp (with opencl enabled), and even clinfo gives a segmentation fault.

Bisected to:
first bad commit: [4fd8d2df9c65396319619fa0784378600fc834d0] radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband


This is my hw (glxinfo)
OpenGL vendor string: X.Org
OpenGL renderer string: AMD KABINI (DRM 3.26.0, 4.18.16-bfq-zstd+, LLVM 7.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.3.0-devel (git-0e0dc596a2)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 18.3.0-devel (git-0e0dc596a2)
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.3.0-devel (git-0e0dc596a2)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20


Marco
Comment 1 Marco 2018-11-28 22:34:31 UTC
Taking inspiration from:
https://bugs.freedesktop.org/show_bug.cgi?id=108879
I tried commenting out the call to si_clear_buffer, and clinfo started working again.

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index c487ef4..04678fb 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -620,10 +620,10 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
 
        if (sctx->chip_class == CIK) {
                /* Clear the NULL constant buffer, because loads should return zeros. */
-               uint32_t clear_value = 0;
-               si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0,
-                               sctx->null_const_buf.buffer->width0,
-                               &clear_value, 4, SI_COHERENCY_SHADER);
+               uint32_t clear_value = 0xCCCCCCCC;
+               //si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0,
+               //              sctx->null_const_buf.buffer->width0,
+               //              &clear_value, 4, SI_COHERENCY_SHADER);
        }
        return &sctx->b;
 fail:
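
For context on what the patch above skips: the removed call fills the NULL constant buffer with a repeated 4-byte value so that later loads read zeros. The following is only a CPU-side sketch of that behaviour, not the driver code. model_clear_buffer and model_loads_return_zero are hypothetical names, and the real si_clear_buffer does this work on the GPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side model; the real si_clear_buffer runs on the GPU. */
static void model_clear_buffer(uint8_t *dst, size_t offset, size_t size,
                               const void *clear_value, size_t clear_value_size)
{
    if (clear_value_size == 0)
        return; /* nothing to replicate */

    /* Replicate the clear_value pattern across [offset, offset + size). */
    for (size_t i = 0; i + clear_value_size <= size; i += clear_value_size)
        memcpy(dst + offset + i, clear_value, clear_value_size);
}

/* Returns true if every byte of the buffer reads back as zero. */
static bool model_loads_return_zero(const uint8_t *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (buf[i] != 0)
            return false;
    }
    return true;
}
```

In this model, a buffer that is never cleared keeps whatever bytes it held before, so skipping the clear avoids the hang but no longer guarantees that loads return zeros. That is why the patch is a workaround rather than a fix.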


This is my hardware:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Root Complex
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kabini [Radeon HD 8330]
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 01)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 3a)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 02)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 5
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun PRO [Radeon HD 8570A/8570M]
02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01)
03:00.0 Ethernet controller: Qualcomm Atheros QCA8172 Fast Ethernet (rev 10)
Comment 2 Marco 2018-11-28 22:36:15 UTC
Created attachment 142654 [details]
clinfo output

This is the output of clinfo after applying the patch (commenting out si_clear_buffer).
Comment 3 Timothy Arceri 2018-11-29 00:56:43 UTC
If you build radeonsi with debug symbols, i.e. use --enable-debug, you should be able to run gimp in gdb and see exactly where the segfault is. It could be as simple as a missing NULL check somewhere.
Comment 4 Marco 2018-11-29 11:04:48 UTC
Gimp does not segfault, it just freezes.

This is the stack trace:
#0  0x00007fb1f9872a79 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb1ccbfcb5c in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#2  0x00007fb1ccc37271 in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#3  0x00007fb1ccc39176 in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#4  0x00007fb1ccc392ff in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#5  0x00007fb1ccc490eb in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#6  0x00007fb1ccc49961 in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#7  0x00007fb1ccc0fc81 in amdgpu_winsys_create () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#8  0x00007fb1ccadcc07 in  () at /usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
#9  0x00007fb1d1afb3f1 in  () at /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#10 0x00007fb1d1b2245c in  () at /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#11 0x00007fb1d1b2e058 in  () at /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#12 0x00007fb1d1afaa76 in  () at /usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1
#13 0x00007fb1fbe1c0ca in  () at /lib64/ld-linux-x86-64.so.2
#14 0x00007fb1fbe1c1d6 in  () at /lib64/ld-linux-x86-64.so.2
#15 0x00007fb1fbe20253 in  () at /lib64/ld-linux-x86-64.so.2
#16 0x00007fb1f98b3adf in _dl_catch_exception () at /lib/x86_64-linux-gnu/libc.so.6
#17 0x00007fb1fbe1fb1a in  () at /lib64/ld-linux-x86-64.so.2
#18 0x00007fb1f9961276 in  () at /lib/x86_64-linux-gnu/libdl.so.2
#19 0x00007fb1f98b3adf in _dl_catch_exception () at /lib/x86_64-linux-gnu/libc.so.6
#20 0x00007fb1f98b3b6f in _dl_catch_error () at /lib/x86_64-linux-gnu/libc.so.6
#21 0x00007fb1f9961975 in  () at /lib/x86_64-linux-gnu/libdl.so.2
#22 0x00007fb1f9961331 in dlopen () at /lib/x86_64-linux-gnu/libdl.so.2
#23 0x00007fb1d2ff69af in  () at /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
#24 0x00007fb1d2ff76ab in clGetPlatformIDs () at /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
#25 0x00007fb1fa821784 in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#26 0x00007fb1fa822563 in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#27 0x00007fb1fa7c3b8d in  () at /usr/lib/x86_64-linux-gnu/libgegl-0.4.so.0
#28 0x00007fb1f9c26b6d in g_closure_invoke () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#29 0x00007fb1f9c398f3 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#30 0x00007fb1f9c42882 in g_signal_emit_valist () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#31 0x00007fb1f9c42ecf in g_signal_emit () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#32 0x00007fb1f9c2b1d4 in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#33 0x00007fb1f9c2ab0e in  () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0

clinfo also gets stuck with an analogous trace.
Comment 5 Marco 2019-02-11 22:34:10 UTC
The same freeze occurs even on mesa master (19.1.0) from today.


This is the stack trace (taken after pressing Ctrl+C on the frozen clinfo):

#0  0x00007ffff7eb4289 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff17ae0ac in sys_futex (val3=-1, addr2=0x0, timeout=0x0, val1=2, op=9, addr1=0x5555558ddf88) at ../src/util/futex.h:38
#2  futex_wait (timeout=0x0, value=2, addr=0x5555558ddf88) at ../src/util/futex.h:50
#3  do_futex_fence_wait (fence=fence@entry=0x5555558ddf88, timeout=timeout@entry=false, abs_timeout=abs_timeout@entry=0) at ../src/util/u_queue.c:115
#4  0x00007ffff17ae6b9 in _util_queue_fence_wait (fence=fence@entry=0x5555558ddf88) at ../src/util/u_queue.c:130
#5  0x00007ffff17e9981 in util_queue_fence_wait (fence=0x5555558ddf88) at ../src/util/u_queue.h:161
#6  si_bind_compute_state (ctx=0x55555581cbf0, state=0x5555558ddf70) at ../src/gallium/drivers/radeonsi/si_compute.c:277
#7  0x00007ffff17ebd47 in si_compute_do_clear_or_copy (sctx=sctx@entry=0x55555581cbf0, dst=dst@entry=0x55555586a920, dst_offset=dst_offset@entry=0, 
    src=src@entry=0x0, src_offset=src_offset@entry=0, size=size@entry=16, clear_value=0x7fffffffd780, clear_value_size=4, coher=SI_COHERENCY_SHADER)
    at ../src/gallium/drivers/radeonsi/si_compute_blit.c:161
#8  0x00007ffff17ec355 in si_clear_buffer (sctx=sctx@entry=0x55555581cbf0, dst=0x55555586a920, offset=offset@entry=0, size=16, 
    clear_value=clear_value@entry=0x7fffffffd780, clear_value_size=clear_value_size@entry=4, coher=SI_COHERENCY_SHADER)
    at ../src/gallium/drivers/radeonsi/si_compute_blit.c:248
#9  0x00007ffff17b19e3 in si_create_context (screen=screen@entry=0x5555555d03a0, flags=flags@entry=0)
    at ../src/gallium/drivers/radeonsi/si_pipe.c:619
#10 0x00007ffff17b21f9 in radeonsi_screen_create (ws=<optimized out>, config=<optimized out>) at ../src/gallium/drivers/radeonsi/si_pipe.c:1124
#11 0x00007ffff1832f1c in amdgpu_winsys_create (fd=fd@entry=4, config=config@entry=0x7fffffffd930, 
    screen_create=0x7ffff17b1bb0 <radeonsi_screen_create>) at ../src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:413
#12 0x00007ffff17178d7 in create_screen (fd=4, config=0x7fffffffd930) at ../src/gallium/targets/pipe-loader/pipe_radeonsi.c:15
#13 0x00007ffff6c19611 in pipe_loader_create_screen (dev=0x5555555bf260) at ../src/gallium/auxiliary/pipe-loader/pipe_loader.c:137
#14 0x00007ffff6c073bc in clover::device::device (this=0x5555555c5a50, platform=..., ldev=<optimized out>)
    at ../src/gallium/state_trackers/clover/core/device.cpp:47
#15 0x00007ffff6c13140 in create<clover::device, clover::platform&, pipe_loader_device*&> ()
    at ../src/gallium/state_trackers/clover/util/pointer.hpp:230
#16 clover::platform::platform (this=0x7ffff7db94c0 <(anonymous namespace)::_clover_platform>)
    at ../src/gallium/state_trackers/clover/core/platform.cpp:36
#17 0x00007ffff6be3666 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at ../src/gallium/state_trackers/clover/api/platform.cpp:31
#18 _GLOBAL__sub_I_platform.cpp(void) () at ../src/gallium/state_trackers/clover/api/platform.cpp:141
#19 0x00007ffff7fe430a in ?? () from /lib64/ld-linux-x86-64.so.2
#20 0x00007ffff7fe4406 in ?? () from /lib64/ld-linux-x86-64.so.2
#21 0x00007ffff7fe8263 in ?? () from /lib64/ld-linux-x86-64.so.2
#22 0x00007ffff7ef505f in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x00007ffff7fe7b4a in ?? () from /lib64/ld-linux-x86-64.so.2
#24 0x00007ffff7f82256 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#25 0x00007ffff7ef505f in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#26 0x00007ffff7ef50ef in _dl_catch_error () from /lib/x86_64-linux-gnu/libc.so.6
#27 0x00007ffff7f82975 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#28 0x00007ffff7f822e6 in dlopen () from /lib/x86_64-linux-gnu/libdl.so.2
#29 0x00007ffff7f8b7e0 in ?? () from /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
#30 0x00007ffff7f8c4e3 in clGetPlatformIDs () from /usr/lib/x86_64-linux-gnu/libOpenCL.so.1
#31 0x000055555555a478 in ?? ()
#32 0x00007ffff7de409b in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x000055555555a7ba in ?? ()
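
Reading the trace above: frames #1 to #6 show si_bind_compute_state blocked in util_queue_fence_wait with timeout=0x0, i.e. a wait with no time limit on a fence that has not been signalled. Below is a minimal sketch, not Mesa code, of why such a wait never returns and how a bounded wait would report the problem instead. demo_fence and its helpers are hypothetical names, and polling is used only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a driver fence; not Mesa's util_queue_fence. */
struct demo_fence {
    atomic_int signalled; /* 0 = job still pending, 1 = job finished */
};

static void demo_fence_init(struct demo_fence *f)
{
    atomic_init(&f->signalled, 0);
}

static void demo_fence_signal(struct demo_fence *f)
{
    atomic_store(&f->signalled, 1);
}

/* Monotonic clock in milliseconds. */
static int64_t demo_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until the fence is signalled or timeout_ms elapses.
 * Returns true if signalled, false on timeout. */
static bool demo_fence_wait_timeout(struct demo_fence *f, int64_t timeout_ms)
{
    const int64_t deadline = demo_now_ms() + timeout_ms;
    const struct timespec nap = { 0, 1000000 }; /* sleep 1 ms per poll */

    while (!atomic_load(&f->signalled)) {
        if (demo_now_ms() >= deadline)
            return false; /* nobody signalled the fence in time */
        nanosleep(&nap, NULL);
    }
    return true;
}
```

Called on a fence that nobody signals, demo_fence_wait_timeout returns false once the timeout expires. An unbounded wait in the same situation blocks forever, which matches the frozen clinfo.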
Comment 6 Marco 2019-02-11 23:21:13 UTC
As previously said, commenting out si_clear_buffer makes clinfo and opencl work again (with mesa master from git).


index c6f93e7..1523919 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -616,9 +616,9 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen,
        if (sctx->chip_class == CIK) {
                /* Clear the NULL constant buffer, because loads should return zeros. */
                uint32_t clear_value = 0;
-               si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0,
+               /*si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0,
                                sctx->null_const_buf.buffer->width0,
-                               &clear_value, 4, SI_COHERENCY_SHADER);
+                               &clear_value, 4, SI_COHERENCY_SHADER);*/
        }
        return &sctx->b;
 fail:
Comment 7 Jan Vesely 2019-02-18 01:48:03 UTC
This looks like a duplicate of 108879. I'm going to close this one, as 108879 is more generic.
KABINI is, afaik, a CIK-class GPU. Unfortunately I can't investigate, as I don't have any CIK hw.

*** This bug has been marked as a duplicate of bug 108879 ***

