Summary: | Any OpenCL application causes "*ERROR* ring gfx timeout" on Vega 64 | ||
---|---|---|---|
Product: | Mesa | Reporter: | Alexander Mezin <mezin.alexander> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Attachments: | kernel log |
And the same thing with 4.19.40. And with manually built drm-next.

Also, not only libreoffice, but literally any application that uses OpenCL triggers the same problem. It seems that I just can't use OpenCL at all.
Other examples: Luxmark, Geekbench (`./geekbench4 --compute`)

More likely a bug in the mesa OpenCL code. If you want functional OpenCL, you should use the ROCm OpenCL packages.

(In reply to Alex Deucher from comment #3)
> More likely a bug in the mesa OpenCL code. If you want functional OpenCL,
> you should use the ROCm OpenCL packages.

I thought that buggy userspace shouldn't cause a complete GPU lockup like this...
The kernel log says "GPU reset succeeded", but it leaves GNOME session completely unusable (hung, artifacts on screen), restarting GDM also fails, the only way to recover is rebooting (sometimes only through Sysrq)

And also I get a very similar lockup (the same messages in kernel log, again with all kernel/firmware versions I tried) in a few Steam games running on Proton/DXVK.

Will try different mesa versions

And BTW with kernel 4.19.40 and latest git firmware (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=2579167548be33afb1fe2a9a5c141561ee5a8bbe) monitors switch off on boot as soon as amdgpu driver loads and never turn on again

Can you post the output of 'clinfo'?
GPU hangs in clover are usually signs of old LLVM, or old mesa (that does not catch function calls).
Do you use ocl-icd? if yes can you confirm if the games hang when running with OCL_ICD_VENDORS=/var/empty/ ?
(alternatively, you can just move libMesaOpenCL.* out of library path)

(In reply to Alex Deucher from comment #3)
> More likely a bug in the mesa OpenCL code. If you want functional OpenCL,
> you should use the ROCm OpenCL packages.

I doubt that. clover uses the same LLVM code generation paths.
also note: "the same problem with multiple games", I doubt those use OpenCL. the above steps should confirm that.
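
The `OCL_ICD_VENDORS` check suggested above only means something if the variable actually reaches the process under test. The following is a minimal sketch of how that could be verified, assuming the ocl-icd loader is in use; the helper names `env_without_opencl_icds` and `child_sees_variable` are hypothetical, and a temporary empty directory stands in for `/var/empty/`.

```python
import os
import subprocess
import sys
import tempfile


def env_without_opencl_icds(empty_dir, base_env=None):
    """Return a copy of the environment with OCL_ICD_VENDORS set to empty_dir.

    Hypothetical helper: with ocl-icd, pointing the variable at a directory
    that contains no .icd files should leave the loader with no platforms.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OCL_ICD_VENDORS"] = empty_dir
    return env


def child_sees_variable(env):
    """Start a child Python process and return the OCL_ICD_VENDORS it observes."""
    probe = "import os; print(os.environ.get('OCL_ICD_VENDORS', ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as empty_dir:
        env = env_without_opencl_icds(empty_dir)
        # The probe only proves the variable survives one process boundary;
        # launchers that rewrite the environment need their own check.
        print("child sees:", child_sees_variable(env))
```

To test a real program, the probe command could be swapped for `clinfo` or the application itself, started with the same `env`.
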
My guess is that compute shaders are busted (irrespective of the API).

GPU reset has never worked correctly on any AMD GPU that I've ever used.

(In reply to Jan Vesely from comment #6)
> Can you post the output of 'clinfo'?

Sure

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.0.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     Radeon RX Vega (VEGA10, DRM 3.30.0, 5.1.0-arch1-1-ARCH, LLVM 8.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 19.0.4
  Driver Version                                  19.0.4
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               64
  Max clock frequency                             1630MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          8 / 8 (cl_khr_fp16)
    float                                         4 / 4
    double                                        2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8573157376 (7.984GiB)
  Error Correction support                        No
  Max memory allocation                           6858525900 (6.387GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.30.0, 5.1.0-arch1-1-ARCH, LLVM 8.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.30.0, 5.1.0-arch1-1-ARCH, LLVM 8.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.30.0, 5.1.0-arch1-1-ARCH, LLVM 8.0.0)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2

(In reply to Jan Vesely from comment #6)
> if yes can you confirm if the games hang when running with
> OCL_ICD_VENDORS=/var/empty/ ?
> (alternatively, you can just move libMesaOpenCL.* out of library path)

No, setting OCL_ICD_VENDORS didn't change anything (though I'm not completely sure that Steam and then Proton don't discard environment variables somewhere).
However, upgrading Mesa to 19.0.4 fixed game hangs. OpenCL issues are still here.

Tried Mesa 19.1.0-rc3
Geekbench hangs, but there are no immediate errors in dmesg. It looks like gpu is doing something based on 'sensors' output (~130 W power consumption, at idle it is <20W). And power consumption doesn't go down even when I kill geekbench. When I try to reboot, the system hangs.

(In reply to Alex Deucher from comment #3)
> More likely a bug in the mesa OpenCL code. If you want functional OpenCL,
> you should use the ROCm OpenCL packages.

Do you mean "Mesa OpenCL is not supported/unmaintained"?

I still can't get any OpenCL application to work (even "Hello World" examples).
Mesa 18.3.4, 19.0.x - GPU hangs then resets. But judging by power consumption (hwmon, 70W - higher than usual idle power consumption) GPU continues to do something even after reset
Mesa 19.1.x, git master - GPU doesn't hang but applications themselves hang on the first clFinish. Power consumption stays higher than typical idle power again.
Building ROCm from source is a huge pain.

(In reply to Alexander Mezin from comment #10)
> (In reply to Alex Deucher from comment #3)
> > More likely a bug in the mesa OpenCL code. If you want functional OpenCL,
> > you should use the ROCm OpenCL packages.
>
> Do you mean "Mesa OpenCL is not supported/unmaintained"?

Correct.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1402.
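
Several comments above judge whether the GPU is still busy from its power draw (about 130 W or 70 W against an idle figure below 20 W). A minimal sketch of that check follows, assuming the amdgpu hwmon file `power1_average` reports microwatts and that the GPU is `card0`; both are assumptions about the local system, and the function names are hypothetical.

```python
import glob

# Idle figure quoted in the thread ("at idle it is <20W").
IDLE_LIMIT_WATTS = 20.0

# Assumed sysfs location; the card index may differ on other systems.
HWMON_PATTERN = "/sys/class/drm/card0/device/hwmon/hwmon*/power1_average"


def microwatts_to_watts(text):
    """Convert the raw hwmon text (assumed to be microwatts) to watts."""
    return int(text.strip()) / 1_000_000


def looks_busy(watts, idle_limit=IDLE_LIMIT_WATTS):
    """Return True when the power draw is above the quoted idle figure."""
    return watts > idle_limit


if __name__ == "__main__":
    for path in glob.glob(HWMON_PATTERN):
        with open(path) as handle:
            watts = microwatts_to_watts(handle.read())
        print(f"{path}: {watts:.1f} W, busy={looks_busy(watts)}")
```

Sampling this before starting an OpenCL program, while it runs, and after killing it would show whether the draw ever returns to the idle range.
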
Created attachment 144191
kernel log

Open LibreOffice, enable OpenCL in settings, restart it.

Result:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=698, emitted seq=700
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process soffice.bin pid 2517 thread soffice.bi:cs0 pid 2545
amdgpu 0000:67:00.0: GPU reset begin!
amdgpu 0000:67:00.0: GPU BACO reset
amdgpu: [powerplay] Failed message: 0x5, input parameter: 0x2000000, error code: 0xffffffff
amdgpu 0000:67:00.0: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[drm] PSP is resuming...
[drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[drm] UVD and UVD ENC initialized successfully.
[drm] VCE initialized successfully.
[drm] recover vram bo from shadow start
[drm] recover vram bo from shadow done
[drm] Skip scheduling IBs!
[drm] Skip scheduling IBs!
amdgpu 0000:67:00.0: GPU reset(2) succeeded!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...

Also the same problem with multiple games, so probably not OpenCL-related, just the easiest way to trigger it.

linux 5.1.arch1-1 (same results with 5.0.13, will also retest with 4.9)
linux-firmware 20190502.92e17d0-1 (same results with 20190424.4b6cf2b-1)
opencl-mesa 19.0.3-1
libdrm 2.4.98-1
libreoffice-fresh 6.2.3-2
GNOME on X.org with modesetting driver
Sapphire Vega 64 Nitro+, no overclocking
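
When comparing kernel, firmware and Mesa combinations, it can help to pull the timeout events out of the kernel log mechanically. The following is a rough sketch that matches only the two message shapes quoted in this report; the regular expressions and the name `find_ring_timeouts` are illustrative assumptions, and other kernel versions may word these lines differently.

```python
import re

# Patterns written against the two log lines quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)
PROCESS_RE = re.compile(
    r"Process information: process (.+?) pid (\d+) thread (.+?) pid (\d+)"
)

SAMPLE_LOG = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=698, emitted seq=700
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process soffice.bin pid 2517 thread soffice.bi:cs0 pid 2545
amdgpu 0000:67:00.0: GPU reset begin!
"""


def find_ring_timeouts(log_text):
    """Return one dict per ring timeout, with the offending process if logged."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            events.append({
                "ring": match.group(1),
                "signaled_seq": int(match.group(2)),
                "emitted_seq": int(match.group(3)),
                "process": None,
                "pid": None,
            })
            continue
        match = PROCESS_RE.search(line)
        # Attach process details to the most recent timeout that lacks them.
        if match and events and events[-1]["process"] is None:
            events[-1]["process"] = match.group(1)
            events[-1]["pid"] = int(match.group(2))
    return events


if __name__ == "__main__":
    for event in find_ring_timeouts(SAMPLE_LOG):
        print(event)
```

Feeding it saved `dmesg` output from each configuration gives a quick list of which process triggered each timeout.
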