Bug 100262

Summary: libswrAVX2.so Causes hang with QOpenGLWidget
Product: Mesa Reporter: chris
Component: Drivers/Gallium/swr Assignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: normal    
Priority: medium CC: chris
Version: 13.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Minimal Qt Project sample widget showing issue

Description chris 2017-03-17 19:29:13 UTC
Created attachment 130297 [details]
Minimal Qt Project sample widget showing issue

The attached Qt example code, taken straight from the Qt repository, runs properly with the Gallium llvmpipe driver, and with the Gallium swr driver on AVX-enabled machines (libswrAVX.so).

Running the same demo on AVX2-enabled machines (libswrAVX2.so) hangs the machine at 100% CPU usage, and it never recovers. The library detects AVX2, but the GL calls in paintEvent appear to cause the hang (i.e., with no GL calls there is no issue).


Please see the attached QOpenGLWidget example (Qt v5).
Comment 1 Tim Rowley 2017-03-30 18:50:54 UTC
Not seeing a hang here, but a transparent area where the OpenGL should be rendered to.

What version of Mesa and Qt were you using?

If you don't mind running an experiment, could you try setting KNOB_MAX_WORKER_THREADS=4 in the environment before running hellogl2?  This will stop swr from binding threads which might possibly be confusing Qt's internal threading.
Comment 2 chris 2017-03-30 20:11:08 UTC
(In reply to Tim Rowley from comment #1)
> Not seeing a hang here, but a transparent area where the OpenGL should be
> rendered to.
> 
> What version of Mesa and Qt were you using?
> 
> If you don't mind running an experiment, could you try setting
> KNOB_MAX_WORKER_THREADS=4 in the environment before running hellogl2?  This
> will stop swr from binding threads which might possibly be confusing Qt's
> internal threading.

Hi Tim!

I tested on Qt 4.8.6 and 5.6.1; both hang for me when using the SWR driver on Mesa 13.0.1 and 13.0.5.

Setting KNOB_MAX_WORKER_THREADS=4 seems to correct the issue!

Can you tell me if this is a permanent fix? Also, does this value need to change with the number of CPUs available? I have field deployments experiencing these issues.

Thanks!
Comment 3 Tim Rowley 2017-03-30 21:43:55 UTC
Ok, looks like I need to check and make sure there hasn't been a regression for this since Mesa 13.  Are you using the dri version of the driver, or libgl-x11?

Interesting that setting the variable fixed the problem for you.  We've seen similar problems before with TBB (Threading Building Blocks), where if we bound threads inside swr, their threading code would think no CPUs were available for its use.  Previously, the workaround we've suggested is to initialize the threading library before creating an OpenGL context.  If that's possible in Qt, that would be the cleanest way forward, though I could see that potentially being hard to do since it renders the UI with OpenGL as well.

If used, for maximum performance KNOB_MAX_WORKER_THREADS should be the number of cores minus one (we have an API thread that feeds the workers).
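
The "cores minus one" rule above can be sketched in C++ for an application that wants to set the variable itself. This is only a sketch under stated assumptions: the helper names swrWorkerThreads and exportSwrWorkerThreads are hypothetical, std::thread::hardware_concurrency() reports logical hardware threads (which may exceed the physical core count, or be 0 when unknown), and the thread does not confirm at what point swr reads the variable. Exporting it in the launching shell or a wrapper script, as in the experiment from comment 1, remains the tested route.

```cpp
#include <stdlib.h>   // setenv (POSIX)
#include <string>
#include <thread>

// Hypothetical helper: worker count for a machine with `hardwareThreads`
// logical CPUs, leaving one for swr's API thread. Never returns less than 1,
// which also covers hardware_concurrency() returning 0 ("unknown").
unsigned swrWorkerThreads(unsigned hardwareThreads)
{
    return hardwareThreads > 1 ? hardwareThreads - 1 : 1;
}

// Hypothetical helper: export KNOB_MAX_WORKER_THREADS for this process.
// A value already present in the environment is left untouched.
// Returns true if setenv succeeded.
bool exportSwrWorkerThreads()
{
    const unsigned n = swrWorkerThreads(std::thread::hardware_concurrency());
    const std::string value = std::to_string(n);
    // Third argument 0: do not overwrite a value the user has already set.
    return ::setenv("KNOB_MAX_WORKER_THREADS", value.c_str(), 0) == 0;
}
```

If used, exportSwrWorkerThreads() would be called at the very top of main(), before QApplication is constructed and before any OpenGL context exists, on the assumption that the driver picks up the variable when it is first loaded.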
Comment 4 chris 2017-03-30 22:26:33 UTC
(In reply to Tim Rowley from comment #3)
> Ok, looks like I need to check and make sure there hasn't been a regression
> for this since Mesa 13.  Are you using the dri version of the driver, or
> libgl-x11?
> 
> Interesting that setting the variable fixed the problem for you.  We've seen
> similar problems before with TBB (Threading Building Blocks), where if we
> bound threads inside swr, their threading code would think no CPUs were
> available for its use.  Previously, the workaround we've suggested is to
> initialize the threading library before creating an OpenGL context.  If
> that's possible in Qt, that would be the cleanest way forward, though I could
> see that potentially being hard to do since it renders the UI with OpenGL as
> well.
> 
> If used, for maximum performance KNOB_MAX_WORKER_THREADS should be the
> number of cores minus one (we have an API thread that feeds the workers).

I am using the libgl-x11 version (--disable-dri option).

Thanks for the info, I will try and putz around with the threading library and see if it helps.
Comment 5 Bruce Cherniak 2017-06-30 16:07:42 UTC
Hi Chris,

Just checking back.  Have you had a chance to 'putz around'?  Is this still an issue?  Depending on your application, and the interaction of the two threading libraries, it may be necessary to set KNOB_MAX_WORKER_THREADS, as Tim described, in the production environment.
Comment 6 Bruce Cherniak 2017-11-29 01:30:01 UTC
In a separate email, Chris indicated that setting KNOB_MAX_WORKER_THREADS has resolved this issue.  Closing as resolved/fixed.
