Summary: | [CIK] [regression] All opencl apps hangs indefinitely in si_create_context | ||
---|---|---|---|
Product: | Mesa | Reporter: | Vedran Miletić <vedran> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | critical | ||
Priority: | medium | CC: | galkin-vv, mpiazza, steffen.klee |
Version: | 18.3 | ||
Hardware: | All | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=110045 | ||
Bug Blocks: | 99553 | ||
Attachments: |
clinfo output with mesa-opencl-icd downgraded to 18.2.8
Stacktrace from debian's 18.3.0 build |
Description
Vedran Miletić
2018-11-27 12:48:26 UTC
Confirmed on both:

00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R5 Graphics] [1002:1315]
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT GL [FirePro W9100] [1002:67a0]

I hit the same problem on a CIK GPU: clinfo has hung at startup since 18.3.0. The stack trace is the same: the sys_futex call never returns. The issue reproduces every time.

Most importantly, it affects ALL the OpenCL applications I tried (clinfo, a fresh manual build of https://github.com/ihaque/memtestCL, and the closed-source Geeks3D GpuTest). They all hang at initialization with a similar stack trace. I'm renaming the bug to indicate that all apps are affected.

The GPU is:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81) (prog-if 00 [VGA controller])

("Tobago" is a less common variant of the "Bonaire" GPUs.)

Reproduced on two different motherboards (however, both are PCIe 2.0/1.1, so no PCIe 3.0 atomics, in case that is related). Changing kernels within the 4.17-4.20 range makes no difference; for example, vanilla 4.20.0 with the Ubuntu config (4.20.0-042000-generic) can be used to reproduce the issue. The distro doesn't matter either: I tried Debian and a vanilla Mesa build on Arch Linux.

Kernel parameters are:

BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000-generic root=UUID=7917286f-3223-4003-8d58-a2bff30a7730 ro quiet intel_iommu=on amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.dc=1 acpi_enforce_resources=off radeon.si_support=0 radeon.cik_support=0 radeon.modeset=1 nouveau.modeset=1 zswap.enabled=1 zswap.zpool=zsmalloc zswap.compressor=lz4hc zswap.max_pool_percent=42

(intel_iommu is actually DISABLED in the BIOS, so I don't think it is related.)

Unlike OpenCL, both Vulkan and OpenGL work completely fine.

Downgrading mesa-opencl-icd to 18.2.8 fixes the problem. This deb package downgrades only:

/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_nouveau.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r300.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_r600.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_radeonsi.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_swrast.so
/usr/lib/x86_64-linux-gnu/gallium-pipe/pipe_vmwgfx.so
/usr/lib/x86_64-linux-gnu/libMesaOpenCL.so.1.0.0

The other Mesa libs are kept at 18.3.0.

Created attachment 142960 [details]
clinfo output with mesa-opencl-icd downgraded to 18.2.8
Created attachment 142961 [details]
Stacktrace from debian's 18.3.0 build
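The kernel parameters quoted in this report set `amdgpu.cik_support=1` together with `radeon.cik_support=0`, which hands CIK GPUs to the amdgpu kernel driver instead of radeon. When comparing setups across reports, a small parser can make that explicit. The following is only a sketch: `parse_cmdline` and `expected_cik_driver` are hypothetical helper names, quoting inside the command line is not handled, and the result reflects only these two parameters, not the kernel's built-in defaults.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def expected_cik_driver(params):
    """Return 'amdgpu' or 'radeon' if the two cik_support parameters
    select one driver unambiguously, otherwise None."""
    amdgpu = params.get("amdgpu.cik_support")
    radeon = params.get("radeon.cik_support")
    if amdgpu == "1" and radeon == "0":
        return "amdgpu"
    if amdgpu == "0" and radeon == "1":
        return "radeon"
    return None
```

On a running system the input string could be read from `/proc/cmdline`; for the parameters quoted in this report the function returns `"amdgpu"`.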
The dmesg output is completely clean, with no errors, and the system keeps working fine. Even the hung clinfo can be interrupted with Ctrl+C.
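Since the hung process can still be interrupted, the hang can be checked for automatically by running the OpenCL program under a timeout. This is a minimal sketch, not part of the original report: `is_hung` is a hypothetical helper name, and the 30-second default is an arbitrary assumption rather than a measured threshold.

```python
import subprocess


def is_hung(cmd, timeout_s=30.0):
    """Run cmd and return True if it is still running after timeout_s seconds.

    subprocess.run kills the child process when the timeout expires,
    so a hung program does not linger after the check.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,  # a non-zero exit status is not a hang
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

For example, `is_hung(["clinfo"])` would be expected to return `True` on an affected setup and `False` after downgrading or applying a fix, assuming `clinfo` is installed and on the `PATH`.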
(In reply to Vasily Galkin from comment #2)
> GPU is
>
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81) (prog-if 00 [VGA
> controller])
>
> ("Tobago" is less common variant of "Bonaire" gpus).
>
> Reproduced on two different motherboards (however, both are PCIe 2.0/1.1 -
> so no PCIe3.0 atomics if it is related). Changing kernels in 4.17-4.20
> range doesn't matter. For example vanilla 4.20.0 with ubuntu config -
> 4.20.0-042000-generic can be used for issue reproduction. Distros also
> doesn't matter I tried debian and vanilla mesa build on archlinux
>
> Kernel parameters are: BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000-generic
> root=UUID=7917286f-3223-4003-8d58-a2bff30a7730 ro quiet intel_iommu=on
> amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.dc=1
> acpi_enforce_resources=off radeon.si_support=0 radeon.cik_support=0
> radeon.modeset=1 nouveau.modeset=1 zswap.enabled=1 zswap.zpool=zsmalloc
> zswap.compressor=lz4hc zswap.max_pool_percent=42

Have you tried using the radeon module?

*** Bug 108572 has been marked as a duplicate of this bug. ***

Has this patch affected the status?
https://lists.freedesktop.org/archives/mesa-dev/2019-February/215057.html

I applied the two patches:
https://lists.freedesktop.org/archives/mesa-dev/2019-February/215057.html
https://lists.freedesktop.org/archives/mesa-dev/2019-February/215058.html
but the problem persists on my card:
AMD KABINI (DRM 3.27.0, 4.20.8-bfq-zstd+, LLVM 8.0.0)
AMD Radeon HD 8500M Series (HAINAN, DRM 3.27.0, 4.20.8-bfq-zstd+, LLVM 8.0.0)
The result is the same: clinfo freezes with the same stack trace.

AMD R9 390 (Linux 4.14, LLVM 8.0.0, AMDGPU kernel driver, Mesa 19.0.1)

Also experiencing hangs when running clinfo and other OpenCL software. Applying the mentioned patches results in segfaults when starting graphical applications as well as OpenCL software.

However, when applying just the workaround in duplicate bug 108572, comment 6, clinfo and other OpenCL software start working again.

(In reply to Steffen Klee from comment #9)
> AMD R9 390 (Linux 4.14, LLVM 8.0.0, AMDGPU kernel driver, Mesa 19.0.1)
>
> Also experiencing hangs when running clinfo and other OpenCL software.
> Applying mentioned patches results in segfaults when starting graphical
> applications as well as OpenCL software.
>
> However, when just applying the workaround in duplicate bug 108572, comment
> 6, clinfo and other OpenCL software start working again.

Thanks for the update. Can you try running piglit cl-api-enqueue-copy-buffer after applying the workaround? It might be just an early initialization issue rather than a problem with compute shader clears in general.

cl-api-enqueue-copy-buffer passes when using the workaround.

Should be fixed by b58e5fb6f317be771326f98d498483e45942beaf

Closing.