Bug 102038 - assertion failure in update_framebuffer_size
Summary: assertion failure in update_framebuffer_size
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/swr (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-04 15:33 UTC by Brad King
Modified: 2017-09-05 16:55 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
apitrace of test that crashes (66.93 KB, application/octet-stream)
2017-08-04 15:33 UTC, Brad King
Details
proposed patch (1.97 KB, patch)
2017-08-21 20:12 UTC, Brian Paul
Details | Splinter Review
simple-msaa.c (2.04 KB, text/x-csrc)
2017-08-21 23:17 UTC, Bruce Cherniak
Details
new patch to try. (4.42 KB, patch)
2017-08-22 21:46 UTC, Brian Paul
Details | Splinter Review
Patch to fix swr after "new patch to try" (2.16 KB, patch)
2017-08-23 20:22 UTC, Bruce Cherniak
Details | Splinter Review
Updated swr patch to fix msaa once new_patch_to_try has been applied. (2.16 KB, patch)
2017-08-24 00:19 UTC, Bruce Cherniak
Details | Splinter Review

Description Brad King 2017-08-04 15:33:43 UTC
Created attachment 133247 [details]
apitrace of test that crashes

Since commit 6839d33699 (st/mesa: fix handling of NumSamples=1 (v2), 2017-08-01) many of VTK's tests crash with Mesa with an assertion failure.  I've attached an apitrace of one of the tests.

The test aborts with an assertion failure:

```
state_tracker/st_atom_framebuffer.c:62: update_framebuffer_size: Assertion `surface' failed.
```

The backtrace in gdb is

```
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7a6e3fa in __GI_abort () at abort.c:89
#2  0x00007ffff7a65e37 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7fffec7516a4 "surface", 
    file=file@entry=0x7fffec751680 "state_tracker/st_atom_framebuffer.c", 
    line=line@entry=62, 
    function=function@entry=0x7fffec7517a0 <__PRETTY_FUNCTION__.39021> "update_framebuffer_size") at assert.c:92
#3  0x00007ffff7a65ee2 in __GI___assert_fail (
    assertion=0x7fffec7516a4 "surface", 
    file=0x7fffec751680 "state_tracker/st_atom_framebuffer.c", line=62, 
    function=0x7fffec7517a0 <__PRETTY_FUNCTION__.39021> "update_framebuffer_size") at assert.c:101
#4  0x00007fffec336bad in update_framebuffer_size (framebuffer=0x7fffffffce70, 
    surface=0x0) at state_tracker/st_atom_framebuffer.c:62
#5  0x00007fffec336f3e in st_update_framebuffer_state (st=0x555555ae3b90)
    at state_tracker/st_atom_framebuffer.c:181
#6  0x00007fffec33538c in st_validate_state (st=0x555555ae3b90, 
    pipeline=ST_PIPELINE_CLEAR) at state_tracker/st_atom.c:251
#7  0x00007fffec3428ea in st_Clear (ctx=0x555555abeb30, mask=18)
    at state_tracker/st_cb_clear.c:411
#8  0x00007fffec0a2b4a in clear (no_error=false, mask=16640, ctx=0x555555abeb30)
    at main/clear.c:221
#9  _mesa_Clear (mask=16640) at main/clear.c:242
...(vtk code)...
```

For reference, I'm building Mesa as follows:

    ./autogen.sh \
      --prefix="$prefix" \
      --enable-debug \
      --disable-dri \
      --disable-egl \
      --disable-gbm \
      --disable-gles1 \
      --disable-gles2 \
      --disable-shared-glapi \
      --with-platforms=x11 \
      --enable-glx=gallium-xlib \
      --enable-gallium-osmesa \
      --with-gallium-drivers=swrast \
      --enable-gallium-llvm=yes \
        LLVM_CONFIG=/usr/bin/llvm-config-3.8 \
      --enable-llvm-shared-libs \
      --with-gl-lib-name=MesaGL \
      --with-osmesa-lib-name=MesaOSMesa
Comment 1 Brian Paul 2017-08-04 16:59:38 UTC
Sorry for the breakage, Brad.

I'll try to investigate ASAP, but I'm about to leave town for a week.

I can't repro with llmvpipe, fwiw.  I suspect it may be an issue in the swr driver.  I'm building it now, but may run out of time.
Comment 2 Brian Paul 2017-08-04 17:11:50 UTC
Hmm, swr isn't working at all for me.  It's hanging in a swr_fence_finish() call with everything I've tried.  Even glxinfo hangs (but elsewhere).

Maybe one of the swr developers can take a look.
Comment 3 Brian Paul 2017-08-17 18:43:38 UTC
OK, with a patch from Bruce, swr is working for me, but I can't repro this assertion.  Are you still seeing it, Brad?
Comment 4 Bruce Cherniak 2017-08-17 19:40:59 UTC
I was able to reproduce, but not through the trace.  It runs fine.  And, only on llvmpipe, the swr driver doesn't fail.

Running VTK test "ctest -R TestGL2PSBillboardTextActor3D" I get a backtrace like Brad's.

I'll dig a little deeper.
Comment 5 Bruce Cherniak 2017-08-18 16:49:57 UTC
Looks like a problem in llvmpipe_is_format_supported.  If should be returning false for all "sample_count > 0", not 1.

That works for me, at least.  I'm running all VTK tests against it right now.
Comment 6 Roland Scheidegger 2017-08-18 17:54:01 UTC
(In reply to Bruce Cherniak from comment #5)
> Looks like a problem in llvmpipe_is_format_supported.  If should be
> returning false for all "sample_count > 0", not 1.
No, not really. In gallium there's no distinction between sample count 0 and 1 - there are no surfaces with no samples, after all. (Ordinary surfaces should actually always have 1 sample, that 0 is allowed to is more of a historical accident.)

Maybe VTK is trying to use msaa surfaces with just one sample, despite that the max sample count supported we announce is 0.
Albeit I'm thinking we should have refused that configuration earlier somewhere, if the sample_count gets upgraded later...
Comment 7 Bruce Cherniak 2017-08-18 18:29:02 UTC
I understand that this isn't a solution, but it points to where things go wrong.  I'll keep digging for the correct fix.
Comment 8 Bruce Cherniak 2017-08-21 19:43:04 UTC
Roland, you are correct, VTK is trying to use msaa surfaces with just one sample.

Here's what's happening:

The rendertarget color buffer is being created by:
st_api_make_current
  st_framebuffer_validate
    xmesa_st_frambuffer_validate_textures
      llvmpipe_resource_create
with nr_samples = 1

Then later in st_framebuffer_validate "changed" is set when the surface is created.  This causes the following:

_mesa_resize_framebuffer  (loops until it gets to the depth attachment)
  st_renderbuffer_alloc_storage

until we get to:
160├──> if (rb->NumSamples > 0) {
161│       unsigned i;
162│
163│       for (i = MAX2(2, rb->NumSamples); i <= ctx->Const.MaxSamples; i++) {
164│          format = st_choose_renderbuffer_format(st, internalFormat, i);
165│
166│          if (format != PIPE_FORMAT_NONE) {
167│             rb->NumSamples = i;
168│             break;
169│          }
170│       }
171│    } else {
172│       format = st_choose_renderbuffer_format(st, internalFormat, 0);
173│    }
174│
175│    /* Not setting gl_renderbuffer::Format here will cause
176│     * FRAMEBUFFER_UNSUPPORTED and ValidateFramebuffer will not be called.
177│     */
178|    if (format == PIPE_FORMAT_NONE) {
179│       return GL_TRUE;
180│    }
  
rb->NumSamples = 1, but ctx->Const.MaxSamples = 0 so the loop @ 163 is skipped and format remains  PIPE_FORMAT_NONE.  Because to this, we don't allocate a depth buffer and boom.

But, this works:
-   if (rb->NumSamples > 0) {
+   if (rb->NumSamples > 1) {
       unsigned i;
 
       for (i = MAX2(2, rb->NumSamples); i <= ctx->Const.MaxSamples; i++) {
@@ -169,7 +170,7 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
          }
       }
    } else {
-      format = st_choose_renderbuffer_format(st, internalFormat, 0);
+      format = st_choose_renderbuffer_format(st, internalFormat, rb->NumSamples);
    }

I'm not sure if it honors the original intent.  But, in addition to fixing the VTK tests ext_framebuffer_multisample-blit-mismatched-formats now passes.
Comment 9 Brian Paul 2017-08-21 20:12:53 UTC
Created attachment 133654 [details] [review]
proposed patch

Bruce, can you try the attached patch?
Comment 10 Bruce Cherniak 2017-08-21 22:38:10 UTC
For llvmpipe, this fails in the same way as before.  Unless llvmpipe_is_format_supported is changed in the same way that you modified svga_is_format_supported:

-   if (sample_count > 1)
+   if (sample_count > 0)

However, this is the change that Roland has issue with.
Comment 11 Bruce Cherniak 2017-08-21 23:17:39 UTC
Created attachment 133661 [details]
simple-msaa.c

Hi Brian,

This simple msaa sample fails in the same way when run with "sample-msaa 1".
Comment 12 Brian Paul 2017-08-22 21:46:56 UTC
Created attachment 133687 [details] [review]
new patch to try.

Hi Bruce, here's another patch to try.
Comment 13 Bruce Cherniak 2017-08-23 16:21:35 UTC
Hi Brian,

This appears to fix VTK tests on llvmpipe.  However, it breaks SWR.  Please let me see what's going on there before submitting this patch for review.
Comment 14 Bruce Cherniak 2017-08-23 20:22:59 UTC
Created attachment 133726 [details] [review]
Patch to fix swr after "new patch to try"

Hi Brian,

I found the reason that swr starts failing with your patch.  I was treating any num_samples > 0 as msaa and turning off the fake_msaa cap.  The attached (in conjunction with you patch) works correctly.

If this makes sense to you, will you please include it with your patch so that we don't end up with a regression?
Comment 15 Bruce Cherniak 2017-08-24 00:15:14 UTC
Comment on attachment 133726 [details] [review]
Patch to fix swr after "new patch to try"

Nevermind that patch, I found a bug in validating MSAA override value.  Updating with a new version.
Comment 16 Bruce Cherniak 2017-08-24 00:19:34 UTC
Created attachment 133732 [details] [review]
Updated swr patch to fix msaa once new_patch_to_try has been applied.

Here's a patch that fixes the MSAA override validation.  "new-msaa-swr.patch" should be included along with the "new_patch_to_try".
Comment 17 Brad King 2017-08-25 18:49:00 UTC
After applying the two patches I can confirm that the VTK test I used to produce the apitrace now passes again.  Thanks!
Comment 18 Brian Paul 2017-09-05 16:11:24 UTC
The state tracker patch has been committed.
I'll leave the swr driver patch to Bruce.
Comment 19 Bruce Cherniak 2017-09-05 16:54:36 UTC
The swr driver patch has been committed, as well.
Comment 20 Brad King 2017-09-05 16:55:52 UTC
The VTK test suite passes again since these two patches were merged.  Thanks!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.