Bug 107295

Summary: Access violation on glDrawArrays with count >= 2048
Product: Mesa Reporter: Paul <paul>
Component: Drivers/Gallium/llvmpipe    Assignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: major    
Priority: medium    
Version: 18.0   
Hardware: x86 (IA32)   
OS: Windows (All)   
Whiteboard:
Attachments: Sample code

Description Paul 2018-07-19 15:49:42 UTC
I first found this issue on a 10.x version; it is still present in version 18.0.

I use an OpenGL 3.3 Core context and a set of shaders that converts points into simple flat-shaded cones with a geometry shader, given the following:

* Position, normal and color as vertex attributes
* Base radius and length as uniforms

The cones are made out of 8 sides (16 triangles total including the base).

Calling glDrawArrays(GL_POINTS, 0, 3000) causes an access violation in the DLL (opengl32). All points are stored in a single buffer object described by a VAO. Further testing shows that the maximum count that does not crash is 2047. Splitting the call works fine, e.g.:

glDrawArrays(GL_POINTS, 0, 1500)
glDrawArrays(GL_POINTS, 1500, 1500)

According to llvmpipe, the maximum recommended vertex count for draw range elements is 3000 (there is no specific value for plain glDrawArrays). This is only a recommendation, where larger counts should at worst reduce performance, yet the crash already happens well before 3000 is reached.
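
(For reference, the 3000 figure is presumably the GL_MAX_ELEMENTS_VERTICES limit; a minimal query sketch, assuming a current GL context and whatever extension loader the application already uses:)

   static GLint query_recommended_vertex_count(void)
   {
      GLint n = 0;
      /* Recommended maximum vertex count for glDrawRangeElements;
       * llvmpipe reports 3000 here. It is a performance hint only,
       * not a hard limit on glDrawArrays counts. */
      glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &n);
      return n;
   }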
Comment 1 Roland Scheidegger 2018-07-19 16:10:23 UTC
To debug this we'd need some more information.
Backtrace might be a good start but probably not sufficient.
Sample code would be great, as would be an apitrace.
Comment 2 Paul 2018-07-20 09:23:06 UTC
I managed to build a debug version of Mesa3D 10.4.7 (version 18 is a pre-built binary without debugging information), and this is what I got for a callstack:

>	opengl32.dll!draw_pt_emit_linear() Line 235	C++
 	opengl32.dll!emit() Line 324	C++
 	opengl32.dll!llvm_pipeline_generic() Line 461	C++
 	opengl32.dll!llvm_middle_end_linear_run() Line 523	C++
 	opengl32.dll!vsplit_segment_simple_linear() Line 241	C++
 	opengl32.dll!vsplit_run_linear() Line 60	C++
 	opengl32.dll!draw_pt_arrays() Line 151	C++
 	opengl32.dll!draw_vbo() Line 547	C++
 	opengl32.dll!llvmpipe_draw_vbo() Line 137	C++
 	opengl32.dll!cso_draw_vbo() Line 1420	C++
 	opengl32.dll!st_draw_vbo() Line 257	C++
 	opengl32.dll!vbo_draw_arrays() Line 649	C++
 	opengl32.dll!vbo_exec_DrawArrays() Line 800	C++
 	opengl32.dll!glDrawArrays() Line 2005	C++

The code fails when calling:

   translate->run(translate,
                  0,
                  count,
                  draw->start_instance,
                  draw->instance_id,
                  hw_verts);

I tried to duplicate this behavior using a test program with the same shaders and attribute definition, and it works just fine. Further checking shows that the test program follows a different path inside llvmpipe: it calls pipeline instead of emit in llvm_pipeline_generic. The piece of code that determines which method to call is this (Mesa-10.4.7\src\gallium\auxiliary\draw\draw_pt_fetch_shade_pipeline_llvm.c:445):

      if ((opt & PT_SHADE) && gshader) {
         clipped = draw_pt_post_vs_run( fpme->post_vs, vert_info, prim_info );
      }
      if (clipped) {
         opt |= PT_PIPELINE;
      }

      /* Do we need to run the pipeline? We will come here if clipped.
       */
      if (opt & PT_PIPELINE) {
         pipeline( fpme, vert_info, prim_info ); // works fine
      }
      else {
         emit( fpme->emit, vert_info, prim_info ); // crashes further on
      }

The test program is, therefore, setting PT_PIPELINE while the actual program is not (what exactly causes clipped to be true or false in terms of the OpenGL context?). Forcing the code to call emit instead of pipeline by using the debugger does reproduce the crash. Forcing the actual program to call pipeline instead of emit does not crash and does show the proper output.

The vertex struct is defined like this:

        struct VertexAttribute
        {
            float position[3]; // offset = 0
            float normal[3]; // offset = 12
            unsigned char selectionColor[4]; // offset = 24
            float values[3]; // offset = 28
            int flags; // offset = 40
        };

sizeof(VertexAttribute) = 44
Total vertices = 3159

For this particular VAO I'm not using selectionColor, but every other member of the struct is associated with a shader attribute.
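
(A hypothetical attribute setup matching those offsets might look roughly like this; the attribute locations 0-3 are assumptions, not taken from the attached sample, and selectionColor at offset 24 is skipped as described:)

   static void setup_vertex_attribs(GLuint vao, GLuint vbo)
   {
      const GLsizei stride = 44;  /* sizeof(struct VertexAttribute) */

      glBindVertexArray(vao);
      glBindBuffer(GL_ARRAY_BUFFER, vbo);

      glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (const void *)0);   /* position */
      glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (const void *)12);  /* normal   */
      glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, stride, (const void *)28);  /* values   */
      glVertexAttribIPointer(3, 1, GL_INT, stride, (const void *)40);             /* flags    */

      glEnableVertexAttribArray(0);
      glEnableVertexAttribArray(1);
      glEnableVertexAttribArray(2);
      glEnableVertexAttribArray(3);
   }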

Some local values inside draw_pt_emit_linear are:

translate->key.output_stride = 32
translate->key.nr_elements = 2
vert_info->stride = 68
vert_info->count = 101088

I assume these values are associated with the output of the geometry shader. There I specify max_vertices = 32, and it produces 2 outputs: position and color. The 32 could also refer to the number of bytes per output vertex, since position and color are both vec4. The 68 doesn't make sense to me, though. I emit 32 vertices per point, in batches of 4 vertices per triangle-strip primitive.

Is there anything else I may test?
Comment 3 Paul 2018-07-20 09:29:21 UTC
Created attachment 140729 [details]
Sample code

This is the sample program I'm using (I just adjusted an example I found on the Internet). All I removed from the shaders was the part that makes use of flags.

NOTE: The sample code works fine because it calls pipeline instead of emit.
Comment 4 Roland Scheidegger 2018-07-20 15:22:50 UTC
(In reply to Paul from comment #2)
> I managed to build a debug version of Mesa3D 10.4.7 (version 18 is a
> pre-built binary without debugging information), and this is what I got for
> a callstack:
Thanks, but a callstack with a more recent version would have been more useful.

> The test program is, therefore, setting PT_PIPELINE while the actual program
> is not (what exactly causes clipped to be true or false in terms of the
> OpenGL context?).
Simple: if at least some of the vertices are outside the viewport, clipped will be set (if there's no gs, this is determined directly in the generated shader; otherwise draw_pt_post_vs_run() determines it).
(PT_PIPELINE will also be already set if you hit the draw pipeline fallbacks, such as drawing tris in wireframe or point mode, but it appears this isn't the case here.)

> Forcing the code to call emit instead of pipeline by using
> the debugger does reproduce the crash. Forcing the actual program to call
> pipeline instead of emit does not crash and does show the proper output.

> translate->key.output_stride = 32
> translate->key.nr_elements = 2
> vert_info->stride = 68
> vert_info->count = 101088
> 
> I assume the values are associated with the output of the geometry shader.
> There I specify a max_vertices of 32, and it produces 2 outputs: position
> and color. The 32 would also refer to the number of bytes for the output:
> position and color are both vec4. The 68 doesn't make sense, though. I do
> emit 32 vertices in batches of 4 vertices for a triangle strip primitive.
The 68 is indeed the stride of the output vertices of the gs here:
32 bytes of gs outputs plus the vertex header. The header should be 20 bytes (4 floats for the clip position plus another 4 bytes of flag bits), so I'd expect 52; I think it's 68 here rather than 52 because you're using a prehistoric mesa version.
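
(The arithmetic spelled out; the 36-byte header size for the 10.4 branch is inferred from 68 - 32, not read from the draw module source:)

   enum {
      GS_PAYLOAD_BYTES    = 2 * 4 * 4,                              /* position + color, both vec4 = 32 */
      VERTEX_HEADER_NEW   = 4 + 4 * 4,                              /* flag bits + 4-float clip pos = 20 */
      VERTEX_HEADER_10X   = 68 - GS_PAYLOAD_BYTES,                  /* whatever the 10.4 header is  = 36 */
      EXPECTED_STRIDE_NEW = GS_PAYLOAD_BYTES + VERTEX_HEADER_NEW,   /*                              = 52 */
      OBSERVED_STRIDE_10X = GS_PAYLOAD_BYTES + VERTEX_HEADER_10X    /*                              = 68 */
   };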

I can reproduce this here. I think it might be a bug in the translate code but not sure (valgrind shows it's writing past the allocated buffer size for the output (driver) vertices). Or the parameters to it aren't quite right (it does not appear to be a bug in the generated translate code directly, since if that's commented out (see translate_create()) it'll crash in the memcpy used in the generic path instead).

I'll try to look into this.
Comment 5 Roland Scheidegger 2018-07-20 16:02:39 UTC
Ah actually I found the issue.
We cannot handle having more than a ushort number of vertices in alloc_vertices() (possibly elsewhere too).
There's logic to split things into smaller chunks (the vsplit stuff in the backtrace) based on various things, but among others there's an absolute limit of 4096 vertices at once, so that the chunks don't get too big (bad for caches etc.)
But that logic dates from a time before geometry shaders, which can amplify the number of vertices (by quite a lot), and that's exactly what's happening here. The 101088 vertices don't fit into a ushort, and hence the allocated buffer is too small.
(With the pipeline path, things will be split into individual prims and only a couple at once emitted.)
I suspect the easiest fix is to just use uints instead of ushorts there. While we didn't really want such large numbers, we cannot reasonably split things in the front end so that the limit isn't exceeded later, after the gs (the amplification limit is 1024 after all). I suppose another possibility would be to split things again after the gs, but I'm not sure it warrants the complexity...
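
(A minimal standalone sketch of that truncation, not Mesa code; the uint16_t count below stands in for the ushort vertex counts mentioned above:)

   #include <stdint.h>
   #include <stdio.h>

   int main(void)
   {
      /* 3159 input points, each amplified by the GS to max_vertices = 32 */
      uint32_t emitted = 3159u * 32u;              /* 101088 vertices after the GS */

      /* Stand-in for the 16-bit vertex counts in the vertex-allocation path */
      uint16_t allocated_for = (uint16_t)emitted;  /* wraps to 101088 - 65536 = 35552 */

      printf("emitted %u vertices, buffer sized for %u\n", emitted, allocated_for);
      /* The backend allocates room for only 35552 vertices, but translate->run()
       * then writes all 101088, running past the end of the buffer. */
      return 0;
   }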
Comment 6 Roland Scheidegger 2018-07-20 17:26:04 UTC
Actually there might be some trouble with this. The vertex_id values we use internally (in the vertex header) are 16 bit too.
Although I believe they are not actually relevant when using the pt_emit paths, so maybe it'll still work (and the pipeline path should use smaller numbers again). But I forgot how this actually works...
Comment 7 Paul 2018-07-20 17:46:42 UTC
It's good to know you managed to pinpoint the issue, even if it's tricky to solve. Is there a particular reason to choose emit over pipeline? Does it perform better, for example?

I'm also wondering whether this issue is purely llvmpipe specific, or whether it also affects softpipe, swr or any of the hardware based drivers. What about VMware Fusion (I get the idea it's also based on mesa)?
Comment 8 Roland Scheidegger 2018-07-20 20:37:33 UTC
(In reply to Paul from comment #7)
> It's good to know you managed to pinpoint the issue, even if it's tricky to
> solve. Is there a particular reason to choose emit over pipeline? Does it
> perform better, for example?
Yes, pipeline is more costly. emit is a passthrough path which is quite a bit simpler (pipeline will decompose tri strips etc. into individual tris, whereas emit leaves things alone).

> I'm also wondering if this issue is fully llvmpipe specific, or does it also
> affect softpipe, swr or any other of the hardware based drivers?
swr doesn't use draw, so is unaffected. Most hw drivers don't use draw either; some do (i915), and some might (some cut-down r300 based chips), but they don't (I believe) expose geometry shaders.
softpipe is affected - it doesn't use llvm-based draw by default (but you can enable that too), and the non-llvm paths are basically the same there (just set GALLIUM_DRIVER=softpipe and it'll crash all the same).

> What about VMware Fusion (I get the idea it's also based on mesa)?
The GL guest driver is, and it might use draw, but only for fallbacks (I am not sure you'd be able to hit this case; probably not).
The code running on the host side is not, at least not for hw based rendering.
Comment 9 Roland Scheidegger 2018-07-23 21:30:57 UTC
Fixed by 09828feab0bdcb1feb0865fc66e5304f6164c3b8.
