Bug 41495 - i830: intel_get_vb_max / intel_batchbuffer_space mismatch.
i830: intel_get_vb_max / intel_batchbuffer_space mismatch.
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
unspecified
Other All
: medium normal
Assigned To: Ian Romanick
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-05 15:58 UTC by Kurt Roeckx
Modified: 2012-03-02 17:37 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Compute number of vertices to fit inside remaining batch (2.18 KB, patch)
2011-11-09 14:25 UTC, Chris Wilson
Details | Splinter Review
Use minimum size of batch.map and batch.bo (734 bytes, patch)
2011-11-13 12:10 UTC, Kurt Roeckx
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Kurt Roeckx 2011-10-05 15:58:43 UTC
Hi,

Trying to debug a crash, because it's writing to places it shouldn't, I finally came to this function in intel_batchbuffer.h:
static INLINE unsigned
intel_batchbuffer_space(struct intel_context *intel)
{
   return (intel->batch.state_batch_offset - intel->batch.reserved_space)
      - intel->batch.used*4;
}

(gdb) p intel->batch.state_batch_offset
$83 = 4096
(gdb) p intel->batch.reserved_space
$84 = 16
(gdb) p intel->batch.used
$85 = 7894
(gdb) p (intel->batch.state_batch_offset - intel->batch.reserved_space) - intel->batch.used*4
$86 = 4294939800

Which of course doesn't make any sense.

This comes from this function:
#0  intel_extend_inline (dwords=7812, intel=0x81fbf50) at intel_tris.c:137

static GLuint *intel_extend_inline(struct intel_context *intel, GLuint dwords)
{
   GLuint *ptr;

   assert(intel->prim.flush == intel_flush_inline_primitive);

   if (intel_batchbuffer_space(intel) < dwords * sizeof(GLuint))
      intel_wrap_inline(intel);

/*    printf("."); */

   intel->vtbl.assert_not_dirty(intel);

   ptr = intel->batch.map + intel->batch.used;
   intel->batch.used += dwords;

   return ptr;
}

It doesn't check that it actually has room for those 7812 dwords.
It gets called from:
#1  intel_get_prim_space (intel=0x81fbf50, count=1116) at intel_tris.c:164
164	      return intel_extend_inline(intel, count * intel->vertex_size);


But the real error probably is here:
#2  0xb7611352 in intel_render_triangles_verts (ctx=0x81fbf50, start=0, 
    count=2730, flags=20) at ../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:294

static void TAG(render_triangles_verts)( struct gl_context *ctx,
                                         GLuint start,
                                         GLuint count,
                                         GLuint flags )
{
   LOCAL_VARS;
   int dmasz = (GET_SUBSEQUENT_VB_MAX_VERTS()/3) * 3;
   int currentsz;
   GLuint j, nr;

   INIT(GL_TRIANGLES);

   currentsz = (GET_CURRENT_VB_MAX_VERTS()/3) * 3;

   /* Emit whole number of tris in total.  dmasz is already a multiple
    * of 3.
    */
   count -= (count-start)%3;

   if (currentsz < 8)
      currentsz = dmasz;

   for (j = start; j < count; j += nr) {
      nr = MIN2( currentsz, count - j );
      TAG(emit_verts)( ctx, j, nr, ALLOC_VERTS(nr) );
      currentsz = dmasz;
   }
}

Where GET_CURRENT_VB_MAX_VERTS() is this in intel_render.c:
static INLINE GLuint intel_get_current_max(struct intel_context *intel)
{

   if (intel->intelScreen->no_vbo)
      return intel_get_vb_max(intel);
   else
      return (INTEL_VB_SIZE - intel->prim.current_offset) / (intel->vertex_size * 4);
}

static INLINE GLuint intel_get_vb_max(struct intel_context *intel)
{
   GLuint ret;

   if (intel->intelScreen->no_vbo)
      ret = sizeof(intel->batch.map) - 1500;
   else
      ret = INTEL_VB_SIZE;
   ret /= (intel->vertex_size * 4);
   return ret;
}

With no_vbo == 1 and sizeof(intel->batch.map) == 32768, you get the 1116

Which doesn't agree with the intel_batchbuffer_space above.

The 4096 seems to come from:
void
intel_batchbuffer_reset(struct intel_context *intel)
{
   if (intel->batch.last_bo != NULL) {
      drm_intel_bo_unreference(intel->batch.last_bo);
      intel->batch.last_bo = NULL;
   }
   intel->batch.last_bo = intel->batch.bo;

   clear_cache(intel);

   intel->batch.bo = drm_intel_bo_alloc(intel->bufmgr, "batchbuffer",
                                        intel->maxBatchSize, 4096);

   intel->batch.reserved_space = BATCH_RESERVED;
   intel->batch.state_batch_offset = intel->batch.bo->size;
   intel->batch.used = 0;
}

With
   if (intel->gen < 4)
      intel->maxBatchSize = 4096;
   else
      intel->maxBatchSize = sizeof(intel->batch.map);

I hope this gives a good overview of how things go wrong for me.  I'm not really sure about the solution.

I also wonder if that last sizeof() makes sense and shouldn't be divided by 4.


I'm seeing this problem with the texgen piglit test on an i830.


Kurt
Comment 1 Kurt Roeckx 2011-10-16 15:21:35 UTC
Hi,

So after looking at this some more there is an additional problem that the intel_start_inline() call in intel_wrap_inline() already adds things to the buffer.

In my test case it ends up with 82 in intel->batch.used after the intel_start_inline() call.

If I make sure that it's limited to (4096-16-4*82) / (intel->vertex_size * 4), I can get this test to pass.

But I'm not sure what the proper way to fix all this is.


Kurt
Comment 2 Chris Wilson 2011-11-09 14:25:30 UTC
Created attachment 53351 [details] [review]
Compute number of vertices to fit inside remaining batch

Kurt, can you please try this patch.
Comment 3 Kurt Roeckx 2011-11-09 18:20:36 UTC
(In reply to comment #2)
> Created attachment 53351 [details] [review] [review]
> Compute number of vertices to fit inside remaining batch
> 
> Kurt, can you please try this patch.

It doesn't fix the problem.

The stack trace looks like:
Program terminated with signal 11, Segmentation fault.
#0  0xb6d845f2 in intel_get_prim_space (intel=0x9c67ef0, count=426)
    at intel_tris.c:163
163	   if (intel->intelScreen->no_vbo) {
(gdb) bt
#0  0xb6d845f2 in intel_get_prim_space (intel=0x9c67ef0, count=426)
    at intel_tris.c:163
#1  0xb6d51abb in intel_render_triangles_verts (ctx=0x9c67ef0, start=0, 
    count=2730, flags=20) at ../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:294
#2  0xb6d524b4 in intel_run_render (ctx=0x9c67ef0, stage=0x9ce0d98)
    at intel_render.c:253
#3  0xb6e91391 in _tnl_run_pipeline (ctx=0x9c67ef0) at tnl/t_pipeline.c:163
#4  0xb6da2ffb in intelRunPipeline (ctx=0x9c67ef0) at intel_tris.c:1095
#5  0xb6e9269e in _tnl_draw_prims (ctx=0x9c67ef0, arrays=0x9cceca4, 
    prim=0x9ccd600, nr_prims=1, ib=0x0, min_index=0, max_index=2729)
    at tnl/t_draw.c:538
#6  0xb6e92436 in _tnl_vbo_draw_prims (ctx=0x9c67ef0, arrays=0x9cceca4, 
    prim=0x9ccd600, nr_prims=1, ib=0x0, index_bounds_valid=1 '\001', 
    min_index=0, max_index=2729) at tnl/t_draw.c:438
#7  0xb6e7b8b5 in vbo_exec_vtx_flush (exec=0x9ccd3f0, keepUnmapped=0 '\000')
    at vbo/vbo_exec_draw.c:379
#8  0xb6e623bb in vbo_exec_wrap_buffers (exec=0x9ccd3f0)
    at vbo/vbo_exec_api.c:89
#9  0xb6e6248c in vbo_exec_vtx_wrap (exec=0x9ccd3f0) at vbo/vbo_exec_api.c:124
#10 0xb6e63562 in vbo_Vertex3fv (v=0x9e41844) at vbo/vbo_attrib_tmp.h:202
#11 0x080d12f3 in GLEAN::GeomRenderer::sendVertex (this=0xbff783dc, 
    vertexIndex=455) at /home/kurt/piglit/tests/glean/geomrend.cpp:381
#12 0x080d0c55 in GLEAN::GeomRenderer::renderPrimitives (this=0xbff783dc, 
    mode=4) at /home/kurt/piglit/tests/glean/geomrend.cpp:214
#13 0x0814353e in GLEAN::TexgenTest::renderSphere (this=0x81cfce0, 
    retainedMode=0, sphereRenderer=...)
    at /home/kurt/piglit/tests/glean/ttexgen.cpp:353
#14 0x08142df0 in GLEAN::TexgenTest::runOne (this=0x81cfce0, r=...)
    at /home/kurt/piglit/tests/glean/ttexgen.cpp:262
#15 0x080de4e6 in GLEAN::BaseTest<GLEAN::BasicResult>::run (this=0x81cfce0, 
    environment=...) at /home/kurt/piglit/tests/glean/tbase.h:325
#16 0x080d4d79 in main (argc=9, argv=0xbff787c4)
    at /home/kurt/piglit/tests/glean/main.cpp:141

The problem is that the code now assumes that the first call is limited to the size of GET_CURRENT_VB_MAX_VERTS() / intel_get_current_max() and the following calls are limited to GET_SUBSEQUENT_VB_MAX_VERTS() / intel_get_vb_max().  Which isn't the case because intel_get_vb_max() is not changed and still is way too big.

The change I tested before limited intel_get_vb_max() to 134 ((4096-16-4*82)/28) because that was the size I saw I needed, but it now still returns 1116.


Kurt
Comment 4 Kurt Roeckx 2011-11-10 18:48:11 UTC
To summarize my understanding of the problem, you have 3 buffers:
- batch.map, 8K entries, 32K bytes
- batch.bo, 4K bytes
- prim.vb and prim.vb_bo, 32K bytes, but not used for no_vbo

batch.bo seems to have 16 bytes reserved.
There is also a reserved space of 1500, but I'm not sure what it's for exactly, but it used to limit the batch.map.  I assume it's the real upper limit of that "4*82" I used, or something that's close to that upper limit.

The calling functions expect to get a the available space for the first and the following calls.

The functions intel_get_vb_max() / intel_get_current_max() now only take the size of batch.map into account for the no_vbo case, while the intel_batchbuffer_space takes the batch.bo space into account.

My *guess* would be that either intel_get_vb_max() / intel_get_current_max() need to take the size of both batch.map and batch.bo into account, or that batch.bo should be the same size as batch.map.

In case the first one is the correct way, the patch only fixed 1 of the 2 functions.  intel_get_vb_max() would then need to take the minimum of the 2 sizes, which is probably the right thing to do in any case.

But this contains a lot of guesses on my end.


Kurt
Comment 5 Kurt Roeckx 2011-11-13 12:10:17 UTC
Created attachment 53483 [details] [review]
Use minimum size of batch.map and batch.bo

With the combination of the other patch and this patch things work for me.
Comment 6 Eric Anholt 2012-03-02 17:37:35 UTC
Thanks for your work on fixing this, and sorry for being slow in pushing your changes.  I've done some piglit runs, and this pair of patches looks good, though I dropped the map size check (it's always at least batch.bo->size).

commit 024ece7523f1735d2fca0067c0a3bdcf53fde8f9
Author: Kurt Roeckx <kurt@roeckx.be>
Date:   Fri Mar 2 15:34:45 2012 -0800

    i915: Compute maximum number of verts using the actual batchbuffer size.
    
    We were looking at the size of batch.map for how big the batchbuffer
    was, but on 865 we just use a single-page batchbuffer due to hardware
    limits.
    
    v2: Removed check for sizeof map < bo->size, since that's always false.
        [change by anholt]
    NOTE: This is a candidate for release branches.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495

commit 33b07893e92dcee495908c549be872887096c894
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 9 22:21:16 2011 +0000

    i830: Compute initial number of vertices from remaining batch space
    
    In order to prevent an overflow of the batch buffer when emitting
    triangles, we need to limit the initial primitive to fit within the
    current batch. To do we need to measure the remaining space and thence
    compute the maximum number of vertices that fit into that space.
    
    Reported-by: Kurt Roeckx <kurt@roeckx.be>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Eric Anholt <eric@anholt.net>
    NOTE: This is a candidate for release branches.