Bug 92633

Summary: FBO creation/destruction issue in brw_meta_resolve_color causing assertion in gen8_emit_null_surface_state
Product: Mesa
Component: Mesa core
Reporter: Samuel Maroy <samuel.maroy>
Assignee: mesa-dev
QA Contact: mesa-dev
Status: RESOLVED MOVED
Severity: critical
Priority: medium
CC: ben, chadversary, idr
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments:
    Trace of FBO creation and destruction in brw_meta_resolve_color (brw_meta_fast_clear.c)
    patch to fix this issue

Description Samuel Maroy 2015-10-23 10:17:37 UTC
Created attachment 119137
Trace of FBO creation and destruction in brw_meta_resolve_color (brw_meta_fast_clear.c)

Mesa triggers an assertion in gen8_emit_null_surface_state.

I've traced the issue back to the brw_meta_resolve_color function. There seems to be a threading issue with the creation/destruction of the FBO when using multiple contexts.
I patched brw_meta_resolve_color to print each FBO ID as it is generated and deleted (reusing _mesa_error() to dump the info). After a variable amount of time (sometimes 10 seconds, sometimes a minute), the following appears:

Mesa: User error: GL_FALSE in Generated FBO 2371, ctx: 0x7fff48003fe0, thread: 1933
Mesa: User error: GL_FALSE in Generated FBO 2371, ctx: 0x3765160, thread: 1927
Mesa: User error: GL_FALSE in Deleted FBO 2371, ctx: 0x3765160, thread: 1927

Right after this, the assertion triggers. In the trace above, two different contexts on two different threads both report generating FBO 2371. The problem is probably that both contexts point to the same FBO: one context has already deleted it while the other still wants to use it.
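
One way two contexts could end up with the same FBO name is a name allocator that looks up a free name and records it as taken in two separate steps. The sketch below replays that interleaving on a single thread so it is deterministic. It is only a model under that assumption: in_use, find_free_name, insert_name, delete_name and replay_duplicate_name are hypothetical names, not Mesa functions, and the report does not confirm that this is the exact mechanism.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NAMES 16

/* Hypothetical stand-in for a name table shared by all contexts. */
static bool in_use[MAX_NAMES];

/* Step 1: return the lowest free name. Name 0 is reserved and doubles
 * as "table full". */
static unsigned find_free_name(void)
{
   for (unsigned name = 1; name < MAX_NAMES; name++) {
      if (!in_use[name])
         return name;
   }
   return 0;
}

/* Step 2: record a name as taken. */
static void insert_name(unsigned name)
{
   assert(name > 0 && name < MAX_NAMES);
   in_use[name] = true;
}

/* Release a name again. */
static void delete_name(unsigned name)
{
   assert(name > 0 && name < MAX_NAMES);
   in_use[name] = false;
}

/* Replays the suspected interleaving: both contexts finish step 1 before
 * either reaches step 2. Returns true if they received the same name. */
static bool replay_duplicate_name(unsigned *name_a, unsigned *name_b)
{
   *name_a = find_free_name();   /* context A, step 1 */
   *name_b = find_free_name();   /* context B, step 1 */
   insert_name(*name_a);         /* context A, step 2 */
   insert_name(*name_b);         /* context B, step 2 */
   return *name_a == *name_b;
}
```

In this model, once context A calls delete_name() on its name, the name that context B still holds is no longer marked as in use, which matches the "Generated, Generated, Deleted" pattern in the trace. Holding one lock across both steps would rule out the interleaving.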

I've included the full trace as an attachment.
I can reproduce this on both the git master and on the 11.0.3 release.

If there is a way to work around this issue from within our application, please let me know.

Regards,
Samuel


Backtrace
---------
gen8_surface_state.c:370: gen8_emit_null_surface_state: Assertion `(fieldval & ~ (((1<<((13)-(0)+1))-1)<<(0))) == 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff5e7fc700 (LWP 4256)]
0x00007fffeee02107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fffeee02107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fffeee034e8 in __GI_abort () at abort.c:89
#2  0x00007fffeedfb226 in __assert_fail_base (fmt=0x7fffeef31ce8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fffd6edddc0 "(fieldval & ~ (((1<<((13)-(0)+1))-1)<<(0))) == 0", file=file@entry=0x7fffd6edf9af "gen8_surface_state.c", 
    line=line@entry=370, function=function@entry=0x7fffd6edfb70 <__PRETTY_FUNCTION__.45075> "gen8_emit_null_surface_state") at assert.c:92
#3  0x00007fffeedfb2d2 in __GI___assert_fail (assertion=assertion@entry=0x7fffd6edddc0 "(fieldval & ~ (((1<<((13)-(0)+1))-1)<<(0))) == 0", file=file@entry=0x7fffd6edf9af "gen8_surface_state.c", line=line@entry=370, 
    function=function@entry=0x7fffd6edfb70 <__PRETTY_FUNCTION__.45075> "gen8_emit_null_surface_state") at assert.c:101
#4  0x00007fffd6e0c5c7 in gen8_emit_null_surface_state (brw=<optimized out>, width=0, height=0, samples=<optimized out>, out_offset=<optimized out>) at gen8_surface_state.c:370
#5  0x00007fffd6df00ff in brw_update_renderbuffer_surfaces (brw=0x7fff48003fe0, fb=0x7fff48646010, render_target_start=0, surf_offset=0x7fff48028cbc) at brw_wm_surface_state.c:757
#6  0x00007fffd6df01f1 in update_renderbuffer_surfaces (brw=0x7fff48003fe0) at brw_wm_surface_state.c:775
#7  0x00007fffd6dc2ff8 in check_and_emit_atom (atom=0x7fff48029400, state=<synthetic pointer>, brw=0x7fff48003fe0) at brw_state_upload.c:667
#8  brw_upload_pipeline_state (pipeline=BRW_RENDER_PIPELINE, brw=0x7fff48003fe0) at brw_state_upload.c:768
#9  brw_upload_render_state (brw=0x7fff48003fe0) at brw_state_upload.c:790
#10 0x00007fffd6d48c10 in brw_try_draw_prims (indirect=<optimized out>, max_index=<optimized out>, min_index=<optimized out>, ib=<optimized out>, nr_prims=<optimized out>, prims=<optimized out>, arrays=<optimized out>, ctx=<optimized out>) at brw_draw.c:516
#11 brw_draw_prims (ctx=0x7fff48003fe0, prims=0x7fff5e7fb570, nr_prims=6, nr_prims@entry=1, ib=0x0, index_bounds_valid=255 '\377', index_bounds_valid@entry=1 '\001', min_index=0, max_index=2, unused_tfb_object=0x0, stream=0, indirect=0x0) at brw_draw.c:606
#12 0x00007fffd6dabaf3 in brw_draw_rectlist (ctx=ctx@entry=0x7fff48003fe0, rect=rect@entry=0x7fff5e7fb5c0, num_instances=num_instances@entry=1) at brw_meta_fast_clear.c:201
#13 0x00007fffd6dac67f in brw_meta_resolve_color (brw=0x7fff48003fe0, mt=0x7fff4857a0d0) at brw_meta_fast_clear.c:704
#14 0x00007fffd6e16ff5 in intel_miptree_resolve_color (brw=brw@entry=0x7fff48003fe0, mt=<optimized out>) at intel_mipmap_tree.c:1933
#15 0x00007fffd6d3f953 in intel_resolve_for_dri2_flush (brw=brw@entry=0x7fff48003fe0, drawable=drawable@entry=0x7fff48066ed0) at brw_context.c:1142
#16 0x00007fffd6d41aae in intel_resolve_for_dri2_flush (brw=brw@entry=0x7fff48003fe0, drawable=drawable@entry=0x7fff48066ed0) at brw_context.c:1119
#17 0x00007fffd6e1e63b in intel_dri2_flush_with_flags (cPriv=<optimized out>, dPriv=0x7fff48066ed0, flags=<optimized out>, reason=__DRI2_THROTTLE_SWAPBUFFER) at intel_screen.c:177
#18 0x00007fffefbf96e3 in dri3_flush (psc=0x1250750, draw=0x7fff48066de0, throttle_reason=__DRI2_THROTTLE_SWAPBUFFER, flags=3) at dri3_glx.c:572
#19 dri3_swap_buffers (pdraw=0x7fff48066de0, target_msc=0, divisor=0, remainder=0, flush=1) at dri3_glx.c:1534
#20 0x00007fffdf6d7b52 in QGLXContext::swapBuffers (this=0x7fff48003050, surface=0x3bbd2e0) at qglxintegration.cpp:403
#21 0x00007ffff06a58d6 in QOpenGLContext::swapBuffers (this=0x7fff48003300, surface=<optimized out>) at kernel/qopenglcontext.cpp:902
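
The assertion text is a range check: the value must fit entirely within bits 13:0 of the dword. The sketch below spells that check out in C. The helper names fits_in_field and encode_width are hypothetical, and the "width minus one" encoding is an assumption suggested by width=0 in frame #4, not something stated in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if fieldval has no bits set outside [high:low]. For high = 13 and
 * low = 0 this is the same condition as the failed assertion. */
static bool fits_in_field(uint32_t fieldval, unsigned high, unsigned low)
{
   assert(low <= high && high < 31);
   uint32_t mask = ((1u << (high - low + 1)) - 1) << low;
   return (fieldval & ~mask) == 0;
}

/* Assumption: the surface width is stored as (width - 1). With
 * width == 0 the unsigned subtraction wraps to 0xFFFFFFFF. */
static uint32_t encode_width(uint32_t width)
{
   return width - 1;
}
```

Under that assumption, a framebuffer with zero width yields a field value with every bit set, which cannot fit in 14 bits, so the check fails in the way the backtrace shows.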


System details
--------------
GPU: Intel(R) HD Graphics 5500 (Broadwell GT2)

OS: Debian Jessie
    mesa: Mesa 11.1.0-devel (git-7182498)
    xf86-video-intel: 2.99.917
    xorg: 1.17.1
    libdrm: 2.4.64


Patched brw_meta_resolve_color function used to generate trace
--------------------------------------------------------------
void
brw_meta_resolve_color(struct brw_context *brw,
                       struct intel_mipmap_tree *mt)
{
   struct gl_context *ctx = &brw->ctx;
   GLuint fbo, rbo;
   struct rect rect;
   pid_t tid;

   brw_emit_mi_flush(brw);

   _mesa_meta_begin(ctx, MESA_META_ALL);

   _mesa_GenFramebuffers(1, &fbo);
   tid = syscall(SYS_gettid);
   _mesa_error(ctx, GL_NO_ERROR, "Generated FBO %d, ctx: %p, thread: %d", fbo, ctx, tid);   /* <-- added for tracing */

   rbo = brw_get_rb_for_slice(brw, mt, 0, 0, false);

   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
   _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
                                 GL_COLOR_ATTACHMENT0,
                                 GL_RENDERBUFFER, rbo);
   _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);

   brw_fast_clear_init(brw);

   use_rectlist(brw, true);

   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);

   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_RESOLVE_ENABLE);

   mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
   get_resolve_rect(brw, mt, &rect);

   brw_draw_rectlist(ctx, &rect, 1);

   set_fast_clear_op(brw, 0);
   use_rectlist(brw, false);

   _mesa_DeleteRenderbuffers(1, &rbo);
   _mesa_DeleteFramebuffers(1, &fbo);
   _mesa_error(ctx, GL_NO_ERROR, "Deleted FBO %d, ctx: %p, thread: %d", fbo, ctx, tid);   /* <-- added for tracing */

   _mesa_meta_end(ctx);

   /* We're typically called from intel_update_state() and we're supposed to
    * return with the state all updated to what it was before
    * brw_meta_resolve_color() was called.  The meta rendering will have
    * messed up the state and we need to call _mesa_update_state() again to
    * get back to where we were supposed to be when resolve was called.
    */
   if (ctx->NewState)
      _mesa_update_state(ctx);
}

Comment 1 Ian Romanick 2015-10-28 22:33:54 UTC
Adding Chad and Ben to CC as they've worked on this code in the past.

Comment 2 Ian Romanick 2015-10-28 22:34:52 UTC
Do you know if this used to work on older versions of Mesa?  If it did, is it possible for you to bisect?  That will speed the process of finding the problem.

Comment 3 Ben Widawsky 2015-10-28 23:08:42 UTC
Any chance we can reproduce this locally?

Comment 4 Samuel Maroy 2015-10-29 08:40:02 UTC
I did my tests with the git master branch at commit 718249843b915decf8fccec92e466ac1a6219934

I have an update regarding this issue: I've made a patch (see attachment) that solves it. Since the fix is in src/mesa/main/fbobject.c, I think this bug should probably be reassigned to the 'mesa-core' component.

Can you please confirm?

Comment 5 Samuel Maroy 2015-10-29 08:40:59 UTC
Created attachment 119282
patch to fix this issue

Comment 6 Ian Romanick 2015-11-11 05:20:24 UTC
Samuel: Can you send your patch (with proper Author data) to the mesa-dev mailing list?  I think it's right, but I'd like it to have wider review.

Note for future self:  This could use a piglit test, and I'm sure other object types suffer from the same problem.  The test would be to have several threads (more than there are CPU cores) call Gen for, say, 32 or 64 objects in a tight loop for several iterations.  Then check that all the threads got unique IDs.  We'd need this for every Gen and every Create (e.g., glCreateTextures).
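
The shape of such a test can be sketched without a GL context. The version below is not a piglit test: gen_names is a hypothetical, lock-protected stand-in for a Gen call, and the thread count, iteration count and batch size are illustrative values. It only shows the structure described above: many threads generate names in a tight loop, then every returned name is checked for uniqueness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS 16        /* ideally more than the number of CPU cores */
#define ITERATIONS 100
#define NAMES_PER_CALL 64
#define NAMES_PER_THREAD (ITERATIONS * NAMES_PER_CALL)
#define TOTAL_NAMES (NUM_THREADS * NAMES_PER_THREAD)

/* Hypothetical stand-in for a Gen call. The lock is held across the
 * whole allocation so no two callers can receive the same name. */
static pthread_mutex_t name_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned next_name = 1;

static void gen_names(size_t n, unsigned *names)
{
   pthread_mutex_lock(&name_lock);
   for (size_t i = 0; i < n; i++)
      names[i] = next_name++;
   pthread_mutex_unlock(&name_lock);
}

static unsigned results[NUM_THREADS][NAMES_PER_THREAD];

/* Each thread fills its own row of results in a tight loop. */
static void *worker(void *arg)
{
   unsigned *out = arg;
   assert(out != NULL);
   for (int i = 0; i < ITERATIONS; i++)
      gen_names(NAMES_PER_CALL, out + (size_t)i * NAMES_PER_CALL);
   return NULL;
}

/* Runs all threads, then returns true if no name was handed out twice. */
static bool all_names_unique(void)
{
   static bool seen[TOTAL_NAMES + 1];
   pthread_t threads[NUM_THREADS];
   int started = 0;
   bool ok = true;

   next_name = 1;                       /* no worker is running yet */
   memset(seen, 0, sizeof seen);

   for (; started < NUM_THREADS; started++) {
      if (pthread_create(&threads[started], NULL, worker,
                         results[started]) != 0) {
         ok = false;
         break;
      }
   }
   for (int t = 0; t < started; t++)
      pthread_join(threads[t], NULL);
   if (!ok)
      return false;

   for (int t = 0; t < NUM_THREADS; t++) {
      for (int i = 0; i < NAMES_PER_THREAD; i++) {
         unsigned name = results[t][i];
         if (name == 0 || name > TOTAL_NAMES || seen[name])
            return false;
         seen[name] = true;
      }
   }
   return true;
}
```

A real test would replace gen_names with the GL entry point under test, give each thread its own context sharing objects with the others, and repeat the check for every Gen and Create function.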

Comment 7 GitLab Migration User 2019-09-18 20:24:21 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/994.
