Bug 54773 - Crashing caused by shader compiling in multiple threads.
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 8.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2012-09-11 13:43 UTC by w_tresspassers
Modified: 2017-02-10 22:38 UTC

Description w_tresspassers 2012-09-11 13:43:37 UTC

    
Comment 1 w_tresspassers 2012-09-11 13:52:56 UTC
Sorry.
I submitted the report with empty description by mistake.
Though I found a few similar bugs had been reported,
I hope this report may add some pieces of information for debugging.

-------------------

Mesa version: 8.0.3 (also reproducible with 8.0.4)
OS: CentOS 6.3
System Hardware: Intel i7 Ivy Bridge with HD4000 GPU (/usr/lib64/i965_dri.so is used)

Symptom:
--------
My proprietary application crashes with segmentation fault 2 times out of 10.
A typical stack trace is attached below, but sometimes it crashes at different locations.

My application runs the following 2 threads and the problem seems to be caused
by that multi-threaded rendering, though I believe OpenGL and Mesa are designed
to be "thread-safe".

  Thread 1 (main):
    A Qt program for GUI, driven by the GUI event loop.
    Thread 1 receives rendered FBOs from Thread 2 and renders them on the screen.

  Thread 2 (sub):
    Video image processing code with multiple shader programs, driven by video frame rate.

Other observed facts:
--------------------
Another single-threaded version of the application never crashes.

The same 2 threaded application never crashes when compiled with NVIDIA's OpenGL library
and executed on a NVIDIA card.

The same 2 threaded application never crashes when compiled with Mesa 7.11 and
executed on Intel i7 Sandy Bridge.

Reproduction code:
------------------
Sorry, I can't provide a simple reproduction code.

My investigation on the issue:
-------------------------------
The segmentation fault occurs because the value of "instructions->tail_pred" given to
"_mesa_ast_to_hir()" (and to the functions it calls) is invalid (0 or some other bad value).
Oddly, the value of "exec_list *instructions" given to "_mesa_ast_to_hir()"
is different from the value of the original "shader->ir"
in its caller "_mesa_glsl_compile_shader()", program/ir_to_mesa.cpp:line 3342.

While the data structure pointed by "instructions" looks corrupted,
the original data structure pointed by "shader->ir" looks healthy.
(See the example below)

>>>>>

(gdb) down
#6  0x00007fffe7c16610 in _mesa_ast_to_hir (instructions=0x7fffcda33450, 
    state=0x7fffcda16870) at ast_to_hir.cpp:63
63         _mesa_glsl_initialize_variables(instructions, state);
(gdb) print instructions
$2 = (exec_list *) 0x7fffcda33450
(gdb) print *instructions
$3 = {head = 0x43676172465f6c67, tail = 0x64726f6f, tail_pred = 0x0}
(gdb) up
#7  0x00007fffe7bf39bd in _mesa_glsl_compile_shader (ctx=0x7fffd255b040, 
    shader=0x3cb8810) at program/ir_to_mesa.cpp:3342
3342          _mesa_ast_to_hir(shader->ir, state);
(gdb) print shader->ir
$4 = (exec_list *) 0x3c89a50
(gdb) print *(shader->ir)
$5 = {head = 0x3c89ae8, tail = 0x0, tail_pred = 0x3cc06f8}

<<<<<<
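
One detail in the dump above may be worth decoding. This is an observation, not a diagnosis: on a little-endian machine such as the x86-64 system in this report, the "head" and "tail" values of the corrupted exec_list read as printable ASCII. The sketch below shows the decoding; "decode_le_ascii" is a hypothetical helper written for this illustration and is not part of Mesa or gdb.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: read `value` as little-endian bytes (lowest byte
// first, as it sits in memory on x86-64) and return the characters up to
// the first NUL. Non-printable bytes are shown as '.'.
std::string decode_le_ascii(std::uint64_t value, int width = 8) {
    assert(width >= 0 && width <= 8);
    std::string out;
    for (int i = 0; i < width; ++i) {
        const unsigned char byte =
            static_cast<unsigned char>((value >> (8 * i)) & 0xffu);
        if (byte == 0) {
            break;  // stop at the terminating NUL
        }
        out.push_back(std::isprint(byte) ? static_cast<char>(byte) : '.');
    }
    return out;
}
```

Calling decode_le_ascii(0x43676172465f6c67ULL) gives "gl_FragC" and decode_le_ascii(0x64726f6fULL) gives "oord", so the 24 bytes that gdb prints as an exec_list begin with the NUL-terminated string "gl_FragCoord". That is the same name passed to add_variable() in frame #1 of the backtrace. It suggests the memory at 0x7fffcda33450 holds string data rather than a list at the time of the crash, though it does not by itself say how that happened.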
 

The observed facts above make me suspect that compilation of shader programs is not "thread-safe".

A shader program is compiled "on-the-fly" when it is executed for the first time
in a "run" of an application (right?).

So, I forced "thread-2" to sleep for a while before it starts rendering
so that "thread-1" finishes its first cycle of rendering and
all of its shader codes are compiled.

This workaround seems to be effective in my experiment (no crashing in 1500 trials),
but it is not a desirable solution.

I investigated the source code of Mesa 8.0.3 and found that there are some
static and global variables that are not protected by a mutex (as far as I can see).
Don't they make Mesa vulnerable to threading problems?

In glsl/glsl_type.cpp, at line 611:
hash_table *glsl_type::array_types = NULL;
hash_table *glsl_type::record_types = NULL;
void *glsl_type::mem_ctx = NULL;

In glsl/ralloc.c, at line 284:
static void *autofree_context = NULL;
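
To make the concern concrete, here is a minimal sketch of the general pattern being asked about: a lazily created, process-wide table that several threads look up and insert into, with a mutex around every access. It is not Mesa's code and makes no claim about how glsl_type or ralloc actually behave; "TypeRegistry", "get_or_create" and "register_from_threads" are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared, lazily populated type table.
class TypeRegistry {
public:
    // Return a stable id for `name`, creating the entry on first request.
    // The whole find-or-insert sequence runs under one lock, so two
    // threads cannot both decide that the entry is missing.
    static int get_or_create(const std::string &name) {
        std::lock_guard<std::mutex> guard(mutex());
        std::map<std::string, int> &entries = table();
        auto it = entries.find(name);
        if (it != entries.end()) {
            return it->second;
        }
        const int id = static_cast<int>(entries.size());
        entries.emplace(name, id);
        return id;
    }

    static std::size_t size() {
        std::lock_guard<std::mutex> guard(mutex());
        return table().size();
    }

private:
    // Function-local statics are initialised thread-safely since C++11.
    static std::mutex &mutex() {
        static std::mutex m;
        return m;
    }
    static std::map<std::string, int> &table() {
        static std::map<std::string, int> t;
        return t;
    }
};

// Demo: many threads request the same two names concurrently and the
// function returns how many entries exist afterwards.
std::size_t register_from_threads(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] {
            TypeRegistry::get_or_create("vec4[2]");
            TypeRegistry::get_or_create("float[3]");
        });
    }
    for (std::thread &t : threads) {
        t.join();
    }
    return TypeRegistry::size();
}
```

With the lock in place, register_from_threads(8) leaves exactly two entries in the table. Without the lock, concurrent find-or-insert on the same container would be a data race with undefined behaviour, which is the kind of problem the question above is asking about.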


------------ stack backtrace begin

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1030700 (LWP 15994)]
0x00007fffe79f65f2 in exec_list::push_tail (this=0x7fffcda33450, 
    n=0x7fffcda19888) at ../../../../../src/glsl/list.h:368
368           n->prev->next = n;
(gdb) where
#0  0x00007fffe79f65f2 in exec_list::push_tail (this=0x7fffcda33450, 
    n=0x7fffcda19888) at ../../../../../src/glsl/list.h:368
#1  0x00007fffe7c20bc0 in add_variable (instructions=0x7fffcda33450, 
    symtab=0x7fffcda19c40, name=0x7fffe7d47bd3 "gl_FragCoord", 
    type=0x7fffe8065418, mode=ir_var_in, slot=0) at builtin_variables.cpp:415
#2  0x00007fffe7c20e7e in add_builtin_variable (instructions=0x7fffcda33450, 
    symtab=0x7fffcda19c40, proto=0x7fffe804f200) at builtin_variables.cpp:484
#3  0x00007fffe7c21c72 in generate_110_fs_variables (
    instructions=0x7fffcda33450, state=0x7fffcda16870)
    at builtin_variables.cpp:797
#4  0x00007fffe7c22116 in initialize_fs_variables (
    instructions=0x7fffcda33450, state=0x7fffcda16870)
    at builtin_variables.cpp:968
#5  0x00007fffe7c221ef in _mesa_glsl_initialize_variables (
    instructions=0x7fffcda33450, state=0x7fffcda16870)
    at builtin_variables.cpp:998
#6  0x00007fffe7c16610 in _mesa_ast_to_hir (instructions=0x7fffcda33450, 
    state=0x7fffcda16870) at ast_to_hir.cpp:63
#7  0x00007fffe7bf39bd in _mesa_glsl_compile_shader (ctx=0x7fffd255b040, 
    shader=0x3cb8810) at program/ir_to_mesa.cpp:3342
#8  0x00007fffe7abfbba in compile_shader (ctx=0x7fffd255b040, shaderObj=3)
    at main/shaderapi.c:734
#9  0x00007fffe7ac032e in _mesa_CompileShaderARB (shaderObj=3)
    at main/shaderapi.c:1019
#10 0x00007fffe7bde284 in meta_glsl_clear_init (ctx=0x7fffd255b040, 
    clear=0x3ad7538) at drivers/common/meta.c:1773
#11 0x00007fffe7bde5e5 in _mesa_meta_glsl_Clear (ctx=0x7fffd255b040, 
    buffers=256) at drivers/common/meta.c:1844
#12 0x00007fffe7980bc4 in intelClear (ctx=0x7fffd255b040, mask=256)
    at intel_clear.c:192
#13 0x00007fffe7c76070 in _mesa_Clear (mask=16384) at main/clear.c:242
#14 0x0000000000426ffd in RenderTarget::setFBOAsRenderTarget (
    this=0x7fffcc116210, frameIndex=1) at RenderTarget.cpp:82
#15 0x00000000004246b2 in PGMOutput::renderContents (this=0x7fffcc123870, 
    srcTexIndex=0, tgtFBOIndex=1) at PGMOutput.cpp:107
#16 0x0000000000424674 in PGMOutput::render (this=0x7fffcc123870, 
    srcTexIndex=0, tgtFBOIndex=1) at PGMOutput.cpp:98
#17 0x000000000042a467 in GPURenderer::GPURenderer::render (this=0x3c71970)
    at GPURenderer.cpp:345
#18 0x0000000000417d88 in GPURenderer::GPURendererCtrl::render (this=0x3c713c0)
    at GPURendererCtrl.cpp:212
#19 0x000000000041785d in GPURenderer::GPURendererCtrl::operator() (
    this=0x3c713c0) at GPURendererCtrl.cpp:66
#20 0x0000000000415850 in GPURenderer::GPURendererThreadHolder::operator() (
    this=0x3c71c50) at GPURendererThreadHolder.h:24
#21 0x00000000004171c8 in boost::detail::thread_data<GPURenderer::GPURendererThreadHolder>::run (this=0x3c71b20)
    at /usr/include/boost/thread/detail/thread.hpp:56
#22 0x00007ffff62c8d97 in thread_proxy ()
   from /usr/lib64/libboost_thread-mt.so.5
#23 0x00007ffff64d9851 in start_thread () from /lib64/libpthread.so.0
#24 0x00007ffff55e011d in clone () from /lib64/libc.so.6

------------- end
Comment 2 w_tresspassers 2012-09-11 14:47:56 UTC
I need to add some additional lines to my previous comment.

> Though I found a few similar bugs had been reported,
> I hope this report may add some pieces of information for debugging.

The following reports were listed as "possible duplicates",
but I think NONE of them duplicates this report though 47236 might be related.

NEW 47236, 48058
RESOLVED 21873, 25172, 29737, 35603

> The same 2 threaded application never crashes
> when compiled with NVIDIA's OpenGL library and executed on a NVIDIA card.

I meant "NVIDIA's proprietary OpenGL library and driver" here, not Nouveau.
Comment 3 Ian Romanick 2012-10-19 22:59:29 UTC
Tell me a bit more about your application.

What are you calling concurrently from multiple threads?  glCompileShader?  glLinkProgram?  Drawing?

Does your application have multiple contexts?  Are these contexts in the same share group?
Comment 4 w_tresspassers 2012-10-22 14:55:36 UTC
(In reply to comment #3)
Hello.
Thank you for your attention to this issue.

> What are you calling concurrently from multiple threads?
> glCompileShader?  glLinkProgram?  Drawing?

Drawing.

I don't call glCompileShader or glLinkProgram.
I understand that a shader code is compiled at the first drawing
with the shader code. Correct?

It was the reason why this problem was hard to reproduce.
I hope I can work around the problem by compiling all shader codes
with glCompileShader() in a mutex-ed safe section before actual drawing starts.
Thank you for telling me the option.
(Currently, I avoid the problem by having one thread sleep for a while
until the other thread finishes its first drawing loop.)
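
A minimal sketch of the "mutex-ed safe section" idea, as an alternative to the sleep: each thread runs its first-use work through one process-wide lock. Note that in the backtrace from comment 1 the compile is reached from inside glClear (frames #9 to #13), so such a section would probably have to cover the first clear or draw on each context, not only explicit glCompileShader() calls. The names "gl_warmup_mutex", "run_warmup_serialized" and "max_concurrent_warmups" are hypothetical, and the callbacks stand in for real OpenGL calls, which are left out so the sketch runs without a context.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide lock shared by all rendering threads.
std::mutex &gl_warmup_mutex() {
    static std::mutex m;
    return m;
}

// Run `first_use` (for example the first clear/draw on a context) while
// holding the lock, so that no two threads are in their warm-up at once.
void run_warmup_serialized(const std::function<void()> &first_use) {
    assert(static_cast<bool>(first_use));
    std::lock_guard<std::mutex> guard(gl_warmup_mutex());
    first_use();
}

// Demo: start `thread_count` threads that all warm up through the helper
// and return the highest number of callbacks seen running simultaneously.
int max_concurrent_warmups(int thread_count) {
    std::atomic<int> running{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&running, &peak] {
            run_warmup_serialized([&running, &peak] {
                const int now = ++running;
                int seen = peak.load();
                // Record a new maximum if this thread observed one.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                }
                std::this_thread::yield();  // give other threads a chance to overlap
                --running;
            });
        });
    }
    for (std::thread &t : threads) {
        t.join();
    }
    return peak.load();
}
```

Because every callback runs under the same lock, max_concurrent_warmups(4) reports a peak of 1. Unlike the sleep, this does not depend on guessing how long the other thread's first rendering cycle takes, but it only helps if all first-use paths really go through the helper.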

> Does your application have multiple contexts?  Are these contexts in the
> same share group?

Yes.
My application has multiple contexts in the same share group.

The following is what I do.

Thread A: main thread, a Qt4.6 application with GUI.
Thread B: a video rendering thread that feeds texture image to the GUI.

(0) Thread A has its own drawing context created by the Qt framework.
(1) Thread A creates another context in the same share group
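    with the following code.

// Context currently bound on Thread A (created by the Qt framework).
const QGLContext *parentContext = QGLContext::currentContext();
// New context with the same format and paint device as the parent.
QGLContext *renderContext =
    new QGLContext(parentContext->format(), parentContext->device());
// Create it in the parent's share group.
renderContext->create(parentContext);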

(2) Thread A creates Thread B and passes "renderContext" to Thread B.
(3) Thread B calls renderContext->makeCurrent().
(4) Thread B starts rendering with the new context.
(3') Thread A also goes into its GUI event loop with the original context.

If I'm doing anything wrong, please let me know.

I found this problem with Mesa 8.0.3, but I haven't tested Mesa 9.0 yet.

Thank you.
Comment 5 Matt Turner 2016-11-02 06:51:31 UTC
Our shader-db program compiles shaders from multiple threads at once and I have not seen such an issue. Please test a new version of Mesa and mark as REOPENED if you can reproduce and RESOLVED/* if you cannot reproduce.
Comment 6 Annie 2017-02-10 22:38:44 UTC
Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.

