Bug 97102 - [dri][swr] stack overflow / infinite loop with GALLIUM_DRIVER=swr
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/swr
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
 
Reported: 2016-07-28 08:01 UTC by Jan Ziak
Modified: 2017-01-24 02:04 UTC

Attachments
gl.c (1.50 KB, text/x-csrc)
2016-07-28 08:01 UTC, Jan Ziak
gdb.log (9.18 KB, text/x-log)
2016-07-28 08:01 UTC, Jan Ziak
/proc/cpuinfo (4.84 KB, text/plain)
2017-01-06 16:50 UTC, Jan Ziak

Description Jan Ziak 2016-07-28 08:01:13 UTC
Created attachment 125360 [details]
gl.c

$ gcc -o gl gl.c -lGL -lglfw
$ LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=swr ./gl
Comment 1 Jan Ziak 2016-07-28 08:01:54 UTC
Created attachment 125361 [details]
gdb.log
Comment 2 Bruce Cherniak 2017-01-05 20:18:15 UTC
Going back and addressing some older bugs; can't reproduce this on mesa-master.  Looking at the attached gdb.log, I'm not certain this was an issue within OpenSWR.
Comment 3 Jan Ziak 2017-01-06 09:38:15 UTC
I am unable to confirm whether this bug has been resolved.

$ LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=swr
SWR detected AVX
Segmentation fault

$ gdb
(gdb) bt
#0  CreateThreadPool (pContext=pContext@entry=0x63a340, pPool=pPool@entry=0x63a510) at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/gallium/drivers/swr/rasterizer/core/threads.cpp:840                                                   
#1  0x00007ffff584c1ce in SwrCreateContext (pCreateInfo=pCreateInfo@entry=0x7fffffffcf60) at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/gallium/drivers/swr/rasterizer/core/api.cpp:109                                                 
#2  0x00007ffff5835d5b in swr_create_context (p_screen=0x7715f0, priv=0x0, flags=<optimized out>) at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/gallium/drivers/swr/swr_context.cpp:466                                                 
#3  0x00007ffff73454ce in st_api_create_context (stapi=<optimized out>, smapi=0x757f60, attribs=0x7fffffffd110, error=0x7fffffffd10c, shared_stctxi=0x0)                                                                                              
    at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/mesa/state_tracker/st_manager.c:662                                                                                                                                                   
#4  0x00007ffff74b34ba in dri_create_context (api=<optimized out>, visual=0x77c480, cPriv=0x63a2d0, major_version=<optimized out>, minor_version=<optimized out>, flags=<optimized out>, notify_reset=false, error=0x7fffffffd2dc,                    
    sharedContextPrivate=0x0) at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/gallium/state_trackers/dri/dri_context.c:123                                                                                                                
#5  0x00007ffff74b298f in driCreateContextAttribs (screen=0x617ae0, api=<optimized out>, config=0x77c480, shared=<optimized out>, num_attribs=<optimized out>, attribs=<optimized out>, error=0x7fffffffd2dc, data=0x63a130)                          
    at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/mesa/drivers/dri/common/dri_util.c:448                                                                                                                                                
#6  0x000000366d041baf in drisw_create_context_attribs (base=0x630b10, config_base=0x784240, shareList=<optimized out>, num_attribs=<optimized out>, attribs=<optimized out>, error=0x7fffffffd2dc)                                                   
    at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/glx/drisw_glx.c:476                                                                                                                                                                   
#7  0x000000366d01abc0 in glXCreateContextAttribsARB (dpy=0x603070, config=0x784240, share_context=0x0, direct=1, attrib_list=0x7fffffffd330) at /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999/src/glx/create_context.c:78                     
#8  0x00007ffff7d96e55 in _glfwCreateContextGLX () from /usr/lib64/libglfw.so.3                                                                                                                                                                       
#9  0x00007ffff7d9394d in _glfwPlatformCreateWindow () from /usr/lib64/libglfw.so.3                                                                                                                                                                   
#10 0x00007ffff7d8dd3d in glfwCreateWindow () from /usr/lib64/libglfw.so.3                                                                                                                                                                            
#11 0x0000000000400bf0 in main ()
Comment 4 Jan Ziak 2017-01-06 09:39:00 UTC
(In reply to Jan Ziak from comment #3)
> $ LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=swr

$ LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=swr ./gl
Comment 5 Bruce Cherniak 2017-01-06 14:18:58 UTC
This is quite a different bt than you had attached originally.  From this, something is definitely going on in SWR.  I'll continue to take a look.

Lately, I've run quite a bit of stuff through OpenSWR with DRI drivers.  Although this isn't our primary mode (most of our customers simply use the standalone gallium libGL.so.1.5.0), we definitely support it.
Comment 6 Bruce Cherniak 2017-01-06 14:50:09 UTC
What are your configure and run options?  I am not installing these as the default drivers, but rather using a test sandbox <test-drivers>.

I am simply configuring using:
--prefix=<test-drivers>
--with-dri-drivers=swrast
--with-gallium-drivers=swrast,swr

And then to run, I have set:
LD_LIBRARY_PATH prepends <test-drivers>/lib
LIBGL_DRIVERS_PATH=<test-drivers>/lib/dri
LIBGL_ALWAYS_SOFTWARE=1
GALLIUM_DRIVER=swr
Comment 7 Bruce Cherniak 2017-01-06 14:59:18 UTC
I also see you are on an AVX capable processor.  It would be helpful to know which model.

The segfault you've referenced below is in a section of code that figures out processor topology.
Comment 8 Jan Ziak 2017-01-06 16:49:05 UTC
(In reply to Bruce Cherniak from comment #7)
> I also see you are on an AVX capable processor.  It would be helpful to know
> which model.

AMD A10-7850K

> The segfault you've referenced below is in a section of code that figures
> out processor topology.

The minimal "physical id" in /proc/cpuinfo on my machine is 1. It isn't 0.

In function CreateThreadPool():
(gdb) p nodes[0].cores
$2 = std::vector of length 0, capacity 0

Adding "numaId--" to function CalculateProcessorTopology() fixes the segmentation fault.

$ zgrep NUMA /proc/config.gz 
# CONFIG_NUMA is not set
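
To illustrate the failure mode described above, here is a minimal sketch of a topology builder that indexes nodes directly by "physical id". This is not the Mesa code: the Core and Node structs and the buildTopology() function are hypothetical, simplified stand-ins, and only the "processor", "physical id" and "core id" keys of /proc/cpuinfo are read.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified topology types (not the Mesa definitions).
struct Core { std::vector<unsigned> threadIds; };
struct Node { std::vector<Core> cores; };

// Build a node list indexed directly by "physical id" from
// /proc/cpuinfo-style text. Assumes "physical id" precedes "core id"
// within each processor entry, as it does in /proc/cpuinfo on x86.
std::vector<Node> buildTopology(const std::string& cpuinfo)
{
    std::vector<Node> nodes;
    std::istringstream in(cpuinfo);
    std::string line;
    unsigned threadId = 0;
    unsigned physId = 0;

    while (std::getline(in, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos)
            continue;

        // Key is the text before the colon, with trailing blanks removed.
        std::string key = line.substr(0, colon);
        key.erase(key.find_last_not_of(" \t") + 1);
        const std::string value = line.substr(colon + 1);

        if (key == "processor") {
            threadId = static_cast<unsigned>(std::stoul(value));
        } else if (key == "physical id") {
            physId = static_cast<unsigned>(std::stoul(value));
        } else if (key == "core id") {
            const unsigned coreId = static_cast<unsigned>(std::stoul(value));
            // Growing the vector up to physId creates empty nodes for
            // every physical id that never appears in the input.
            if (nodes.size() <= physId)
                nodes.resize(physId + 1);
            Node& node = nodes[physId];
            if (node.cores.size() <= coreId)
                node.cores.resize(coreId + 1);
            node.cores[coreId].threadIds.push_back(threadId);
        }
    }
    return nodes;
}
```

When every entry in the input reports "physical id : 1", this sketch returns two nodes, and the first one has no cores. Any later code that assumes nodes[0] is populated would then read from an empty vector, which matches the gdb output above.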
Comment 9 Jan Ziak 2017-01-06 16:50:25 UTC
Created attachment 128795 [details]
/proc/cpuinfo
Comment 10 Tim Rowley 2017-01-06 17:26:36 UTC
I think the right fix for CalculateProcessorTopology is to prune empty nodes at the end:

    for (auto it = out_nodes.begin(); it != out_nodes.end(); ) {
        if ((*it).cores.size() == 0)
            it = out_nodes.erase(it);
        else
            ++it;
    }

However, the rest of the topology logic with that cpuinfo comes to the conclusion that there are only two cores, and so will only generate two threads, with one being dedicated to the API.  We'll need to adjust that logic as well.
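
For reference, the pruning loop above can also be written with the standard erase-remove idiom. The following is a self-contained sketch only: the Core and Node structs and the pruneEmptyNodes() name are hypothetical stand-ins, not the types or functions used in the Mesa source.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified topology types (not the Mesa definitions).
struct Core { std::vector<unsigned> threadIds; };
struct Node { std::vector<Core> cores; };

// Remove every node that has no cores, preserving the order of the rest.
void pruneEmptyNodes(std::vector<Node>& nodes)
{
    nodes.erase(std::remove_if(nodes.begin(), nodes.end(),
                               [](const Node& n) { return n.cores.empty(); }),
                nodes.end());
}
```

Both forms behave the same for this purpose. The erase-remove version shifts the surviving elements once instead of once per erased node, which makes no practical difference for the handful of nodes involved here.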
Comment 11 Bruce Cherniak 2017-01-12 21:13:03 UTC
As Tim suggests, pruning empty nodes is probably the best solution for the crash.

For performance, however, I'm not sure how many cores to expose in your case.  cpuinfo shows that there are 4 threads across 2 cores, which we detect as 2 cores, with 2 hyperthreads.  Due to the way OpenSWR loads the processor, we have found that not using the hyperthreads as OpenSWR workers yields the best performance.  This may or may not be the case with your processor.

Something you can try is to set the environment variable KNOB_MAX_THREADS_PER_CORE=0.  This will allow OpenSWR to use all 4 threads.

Please report back on how this affects performance.
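
As a rough illustration of the effect of that setting, the sketch below counts usable hardware threads under a per-core limit, where a limit of 0 means "no limit". This is not the OpenSWR implementation: countUsableThreads() is a hypothetical helper, and the default limit of 1 used in the usage note is an assumption inferred from the description above, not a value taken from the source.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// threadsPerCore[i] is the number of hardware threads detected on core i.
// maxThreadsPerCore == 0 is treated as "no limit" (assumed semantics).
std::size_t countUsableThreads(const std::vector<std::size_t>& threadsPerCore,
                               std::size_t maxThreadsPerCore)
{
    std::size_t total = 0;
    for (const std::size_t n : threadsPerCore)
        total += (maxThreadsPerCore == 0) ? n : std::min(n, maxThreadsPerCore);
    return total;
}
```

For a machine detected as 2 cores with 2 threads each, countUsableThreads({2, 2}, 1) gives 2 and countUsableThreads({2, 2}, 0) gives 4, which corresponds to the "all 4 threads" case mentioned above.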
Comment 12 Jan Ziak 2017-01-16 10:07:35 UTC
(In reply to Bruce Cherniak from comment #11)
> As Tim suggests, pruning empty nodes is probably the best solution for the
> crash.
> 
> For performance, however, I'm not sure how many cores to expose in your
> case.  cpuinfo shows that there are 4 threads across 2 cores, which we
> detect as 2 cores, with 2 hyperthreads.  Due to the way OpenSWR loads the
> processor, we have found that not using the hyperthreads as OpenSWR workers
> yields the best performance.  This may or may not be the case with your
> processor.
> 
> Something you can try is to set the environment variable
> KNOB_MAX_THREADS_PER_CORE=0.  This will allow OpenSWR to use all 4 threads.
> 
> Please report back on how this affects performance.

An AMD dual core x86 module is in terms of performance close to two separate x86 cores:

- Kaveri/Steamroller module: 1 instruction fetch unit, 2 instruction decoders, 2 integer cores, 1 AVX core, 1 L1i cache, 2 L1d caches

- Two separate cores: 2 instruction {fetch,decode} units, 2 {integer,AVX} cores, 2 L1{i,d} caches

In my experience, the statement that an x86 module is close to 2 separate cores is generally true. Many programs (for example gcc with make -j4) scale on a module nearly as well as they do on two separate x86 cores.

----

# export LIBGL_ALWAYS_SOFTWARE=1
# export GALLIUM_DRIVER=swr
# glxgears
350.080 FPS

# KNOB_MAX_THREADS_PER_CORE=0 glxgears
615.980 FPS

----

Unigine Sanctuary 1.6.3 1024x768_windowed:

Default: 0.166578 FPS
KNOB_MAX_THREADS_PER_CORE=0: 0.440662 FPS
Comment 13 Bruce Cherniak 2017-01-24 02:04:17 UTC
I pushed the change that corrects the crash you were seeing.  This is the same as Tim's suggestion to prune empty nodes.

I don't know all the other topologies well enough to suggest a change that heuristically determines the optimal number of threads.  So, I would recommend that, for now, you continue to use KNOB_MAX_THREADS_PER_CORE=0 if that gives you better performance.

In any case, this bug is resolved and applications can run with OpenSWR under DRI.

