Bug 94040

Summary: clGetPlatformIDs causes futex race condition
Product: Mesa
Reporter: bob
Component: Other
Assignee: mesa-dev
Status: RESOLVED WORKSFORME
QA Contact: mesa-dev
Severity: normal
Priority: medium
CC: currojerez
Version: 11.0
Hardware: Other
OS: Linux (All)
Bug Depends on:    
Bug Blocks: 99553    
Attachments: Blender GDB backtrace
Complete Blender backtraces
Complete Blender backtraces with ocl-icd debuginfos

Description bob 2016-02-08 07:02:27 UTC
I originally opened this bug on the Redhat bugzilla[1], but I was instructed to take this upstream. Just so everyone's on the same page, I'm going to copy-paste the report from RH.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1273131
Comment 1 bob 2016-02-08 07:02:51 UTC
User-Agent:       Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build Identifier: 

Apologies if I've mischaracterised anything above, I'm going off of my uninformed analysis of the attached backtrace from GDB.

Attempting to use Blender in any meaningful way results in the UI hanging. strace clued me in to the blocking futex call, and gdb seems to point the finger at libOpenCL.so.

Oddly enough, clinfo works as it should

Reproducible: Always

Steps to Reproduce:
1. Open Blender
2. Open system tab of preferences panel, switch to the Cycles renderer etc
Actual Results:  
UI freeze, application hang

Expected Results:  
Application doesn't hang

rpm -q fedora-release
fedora-release-23-0.17.noarch

rpm -q blender       
blender-2.75-4.fc23.x86_64

rpm -q ocl-icd
ocl-icd-2.2.7-2.git20150606.ebbc4c1.fc23.x86_64
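
The report above places the hang inside clGetPlatformIDs while noting that clinfo works. A minimal sketch of a standalone probe follows, assuming the ICD loader is installed under the soname libOpenCL.so.1 (an assumption, not something stated in the report). The helper name probe_platform_ids and the timeout value are hypothetical. The call runs in a child process so that a blocked futex cannot hang the caller. If the probe returns normally, as clinfo does, that would suggest the hang depends on the state of the Blender process rather than on the call alone.

```python
import subprocess
import sys

# Child program: load the ICD loader and ask only for the platform count.
# "libOpenCL.so.1" is an assumed soname; adjust it for the local system.
CHILD = r"""
import ctypes
lib = ctypes.CDLL("libOpenCL.so.1")
count = ctypes.c_uint(0)
# cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms,
#                         cl_uint *num_platforms)
status = lib.clGetPlatformIDs(0, None, ctypes.byref(count))
print(status, count.value)
"""


def probe_platform_ids(timeout_s=10.0):
    """Return 'hang', 'failed', or the child's output ('<status> <count>')."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after blocking longer than timeout_s.
        return "hang"
    if done.returncode != 0:
        # Covers a missing library as well as a crash in the driver.
        return "failed"
    return done.stdout.strip()
```

Calling print(probe_platform_ids()) prints either the status code and platform count reported by the loader, or one of the two sentinel strings.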
Comment 2 bob 2016-02-08 07:03:35 UTC
Created attachment 121583 [details]
Blender GDB backtrace
Comment 3 bob 2016-02-08 07:04:52 UTC
Fabian Deutsch <fdeutsch@redhat.com>

The backtrace shows that clover is used.
Component: ocl-icd → mesa
Comment 4 bob 2016-02-08 07:05:07 UTC
I've tried a variety of configurations to check whether this bug might be setup-specific. I normally use two GPUs from mixed vendors, but I also tried a single GTX570, HD6870, HD6950, 9800GT, and just the plain Intel HD4000, and the bug persists in every case.
Comment 5 bob 2016-02-08 07:05:31 UTC
Fabian Deutsch <fdeutsch@redhat.com>

I strongly suspect that it's an issue in clover/mesa's OpenCL tracker. Please file a bug upstream or retry with the latest release from rawhide.
Status: NEW → CLOSED
Comment 6 Francisco Jerez 2016-02-08 20:42:46 UTC
Can you also provide backtraces for any concurrently running threads?  I suspect that reverting commit d5b1731178378b3d828c74368f6bfe85edc10618 may fix the deadlock, any chance you could try?

Thanks.
Comment 7 bob 2016-02-09 05:15:05 UTC
In the specific case of blender, there are 31 hung up threads with around 9 each waiting on

__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
do_futex_wait () at ../sysdeps/unix/sysv/linux/futex-internal.h:205

The remaining threads are waiting on poll () at ../sysdeps/unix/syscall-template.S:84, with the last being the one I provided a backtrace for. Is there any one in particular you're interested in? After sampling some of the backtraces they seem to be unrelated, either blender-specific or just mainloops for udev and pulseaudio waiting for events, but please let me know if there are any backtraces you'd like.
Comment 8 Francisco Jerez 2016-02-09 05:30:53 UTC
(In reply to bob from comment #7)
> In the specific case of blender, there are 31 hung up threads with around 9
> each waiting on
> 
> __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> pthread_cond_wait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> do_futex_wait () at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> 
> The remaining threads are waiting on poll () at
> ../sysdeps/unix/syscall-template.S:84 with the last being the one I provided
> a backtrace for. Is there any one you're interested in particular? After
> sampling some of the backtraces they seem to be unrelated, either
> blender-specific or just mainloops for udev and pulseaudio waiting for
> events, but if please let me know if there's any backtraces you'd like

Is there any other hung thread showing mesa function calls in the stack trace?  That would be particularly interesting.  In any case the more backtraces you can provide the better :)
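
A minimal sketch of how a full dump could be searched for threads that pass through the OpenCL libraries, assuming the dump has the shape gdb prints for `thread apply all bt`. The helper name threads_matching is hypothetical, the sample text is made up and not taken from the attachments, and libMesaOpenCL is assumed to be the name of the clover library on the affected system.

```python
import re

# Header line gdb prints before each thread's frames, for example
# "Thread 3 (Thread 0x7f... (LWP 1234)):".
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")


def threads_matching(dump, pattern):
    """Return the ids of threads whose backtrace contains `pattern`."""
    needle = re.compile(pattern)
    hits = []
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A new thread section starts here.
            current = int(header.group(1))
        elif current is not None and needle.search(line) and current not in hits:
            hits.append(current)
    return hits


# Made-up sample shaped like gdb output; not taken from the attachments.
SAMPLE = """\
Thread 2 (Thread 0x7f0000000002 (LWP 102)):
#0  0x0000000000000001 in poll () from /lib64/libc.so.6
#1  0x0000000000000002 in g_main_context_iterate () from /lib64/libglib-2.0.so.0

Thread 1 (Thread 0x7f0000000001 (LWP 101)):
#0  0x0000000000000003 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000000000000004 in clGetPlatformIDs () from /lib64/libOpenCL.so.1
"""
```

With the sample above, threads_matching(SAMPLE, r"libOpenCL|libMesaOpenCL") selects only thread 1, the one blocked under clGetPlatformIDs.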
Comment 9 bob 2016-02-20 01:27:07 UTC
Created attachment 121851 [details]
Complete Blender backtraces
Comment 10 bob 2016-02-20 01:27:36 UTC
Sorry, I completely forgot you asked for those. Enjoy ;)
Comment 11 bob 2016-02-20 01:45:46 UTC
Created attachment 121852 [details]
Complete Blender backtraces with ocl-icd debuginfos
Comment 12 Francisco Jerez 2016-02-20 23:16:28 UTC
Thanks.  Could you check if this workaround from an earlier bug report helps?

https://bugs.freedesktop.org/attachment.cgi?id=117708
Comment 13 bob 2016-02-24 06:52:57 UTC
I wasn't successful in getting mesa to compile with `--enable-opencl`, but something odd has happened today: I had to reinstall Fedora because gdm exploded, and now no such lockup occurs.

➜  ~ rpm -q mesa-libOpenCL
mesa-libOpenCL-11.1.0-2.20151218.fc23.x86_64
Comment 14 Vedran Miletić 2017-03-22 16:28:24 UTC
Closing per comment #13.
