Created attachment 120555: apitrace reproducing this bug (starts at frame 470)
Some vertices seem to be wrongly placed on the (0,0) to (1,1) diagonal when using a geometry shader on Sandy Bridge.
Screenshots of this bug:
http://i.imgur.com/c8WhpF5.png
http://linkmauve.fr/files/dolphin.png (from the previous trace)
Related Dolphin bug report: https://bugs.dolphin-emu.org/issues/9166
Created attachment 121075: Screenshot from trace
What's not obvious in the previous screenshot is that all these lines stop at the diagonal. It looks like there's a fan vs strip type of situation going on here on top of whatever's causing the lines to appear.
The first draw with a GS shows the bad behaviour:
813290 @0 glDrawRangeElementsBaseVertex(mode = GL_POINTS, start = 0, end = 24630, count = 24630, type = GL_UNSIGNED_SHORT, indices = 0x15eb10, basevertex = 413659)
So it takes points in and outputs a triangle_strip. Perhaps that's where things are going wrong?
And with master I'm getting:
brw_vec4.cpp:1797: void brw::vec4_visitor::convert_to_hw_regs(): Assertion `brw_is_single_value_swizzle(inst->src[i].swizzle)' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff607f167 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff607f167 in raise () from /lib64/libc.so.6
#1 0x00007ffff60804ca in abort () from /lib64/libc.so.6
#2 0x00007ffff6078296 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff6078342 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff286d630 in brw::vec4_visitor::convert_to_hw_regs (this=this@entry=0x4b59490) at brw_vec4.cpp:1797
#5 0x00007ffff286d991 in brw::vec4_visitor::run (this=this@entry=0x4b59490) at brw_vec4.cpp:1960
#6 0x00007ffff287b2d4 in brw::brw_compile_gs (compiler=0x9185d0, log_data=log_data@entry=0x7ffff7fd0040, mem_ctx=mem_ctx@entry=0x4bc6510, key=key@entry=0x7fffffffd910, prog_data=prog_data@entry=0x7fffffffd6e0, src_shader=<optimized out>, shader_prog=shader_prog@entry=0x4b4cac0, shader_time_index=shader_time_index@entry=-1, final_assembly_size=final_assembly_size@entry=0x7fffffffd6d4, error_str=error_str@entry=0x7fffffffd6d8) at brw_vec4_gs_visitor.cpp:914
#7 0x00007ffff27a308c in brw_codegen_gs_prog (brw=brw@entry=0x7ffff7fd0040, prog=prog@entry=0x4b4cac0, gp=gp@entry=0x4b1d940, key=key@entry=0x7fffffffd910) at brw_gs.c:160
#8 0x00007ffff27a3604 in brw_gs_precompile (ctx=0x7ffff7fd0040, shader_prog=0x4b4cac0, prog=0x4b1d940) at brw_gs.c:288
#9 0x00007ffff27a3dc9 in brw_shader_precompile (sh_prog=0x4b4cac0, ctx=0x7ffff7fd0040) at brw_link.cpp:53
#10 brw_link_shader (ctx=0x7ffff7fd0040, shProg=<optimized out>) at brw_link.cpp:282
#11 0x00007ffff266833e in _mesa_glsl_link_shader (ctx=0x7ffff7fd0040, prog=0x4b4cac0) at program/ir_to_mesa.cpp:2962
#12 0x00007ffff256f35a in link_program (ctx=0x7ffff7fd0040, program=<optimized out>) at main/shaderapi.c:1048
The shader in question is, I believe, the same one used in draw call 813290.
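Judging by its name, the failed assertion requires the source swizzle to replicate a single component (xxxx, yyyy, zzzz or wwww). A minimal sketch of such a check follows; the 2-bits-per-channel packing and every identifier here (`make_swizzle`, `swizzle_channel`, `is_single_value_swizzle`, the `SWZ_*` constants) are illustrative assumptions, not Mesa's actual definitions.

```cpp
#include <cstdint>

// Hypothetical channel selectors: x=0, y=1, z=2, w=3.
constexpr unsigned SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3;

// Pack four channel selectors into one byte, 2 bits per channel
// (an assumed layout used only for this illustration).
constexpr uint8_t make_swizzle(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return static_cast<uint8_t>(a | (b << 2) | (c << 4) | (d << 6));
}

// Extract the selector for channel idx (0..3).
constexpr unsigned swizzle_channel(uint8_t swz, unsigned idx)
{
    return (swz >> (2 * idx)) & 0x3;
}

// True only when all four channels select the same source component.
constexpr bool is_single_value_swizzle(uint8_t swz)
{
    return swizzle_channel(swz, 0) == swizzle_channel(swz, 1) &&
           swizzle_channel(swz, 1) == swizzle_channel(swz, 2) &&
           swizzle_channel(swz, 2) == swizzle_channel(swz, 3);
}
```

Under this model a `.ww`-style replicated read passes the check, while an identity `.xyzw` read is the kind of "complex" swizzle that would trip the assertion.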
Here are some debug prints from INTEL_DEBUG=gs (I skipped the GLSL IR bits):
NIR (SSA form) for geometry shader:
shader: MESA_SHADER_GEOMETRY
name: GLSL214
inputs: 0
outputs: 0
uniforms: 0
decl_var uniform INTERP_QUALIFIER_NONE ivec4 ctexoffset (2, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 clinept (1, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 cstereo (0, 0)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] colors_0 (VARYING_SLOT_VAR0, 26)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] pos (VARYING_SLOT_VAR1, 27)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 gl_Position (VARYING_SLOT_POS, 0)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 colors_0@0 (VARYING_SLOT_VAR0, 26)
decl_function main returning void
impl main {
    block block_0:
    /* preds: */
    vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */)
    vec2 ssa_1 = load_const (0xbf800000 /* -1.000000 */, 0x3f800000 /* 1.000000 */)
    vec2 ssa_2 = load_const (0x3f800000 /* 1.000000 */, 0xbf800000 /* -1.000000 */)
    vec1 ssa_3 = load_const (0x00000010 /* 0.000000 */)
    vec4 ssa_4 = intrinsic load_ubo (ssa_0, ssa_3) () ()
    vec1 ssa_5 = frcp ssa_4
    vec1 ssa_6 = fmul ssa_4.w, ssa_5
    vec1 ssa_7 = frcp ssa_4.y
    vec1 ssa_8 = fmul -ssa_4.w, ssa_7
    vec2 ssa_9 = vec2 ssa_6, ssa_8
    vec4 ssa_10 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (27) /* pos */
    vec2 ssa_11 = fmul ssa_9, ssa_10.ww
    vec4 ssa_12 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (26) /* colors_0 */
    vec2 ssa_13 = fadd ssa_10, -ssa_11
    vec4 ssa_14 = vec4 ssa_13, ssa_13.y, ssa_10.z, ssa_10.w
    vec2 ssa_15 = ffma ssa_2, ssa_11, ssa_10
    vec4 ssa_16 = vec4 ssa_15, ssa_15.y, ssa_10.z, ssa_10.w
    vec2 ssa_17 = ffma ssa_1, ssa_11, ssa_10
    vec4 ssa_18 = vec4 ssa_17, ssa_17.y, ssa_10.z, ssa_10.w
    vec2 ssa_19 = fadd ssa_10, ssa_11
    vec4 ssa_20 = vec4 ssa_19, ssa_19.y, ssa_10.z, ssa_10.w
    intrinsic store_output (ssa_14, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_0) () (0)
    vec1 ssa_21 = load_const (0x00000001 /* 0.000000 */)
    intrinsic store_output (ssa_16, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_21) () (0)
    vec1 ssa_22 = load_const (0x00000002 /* 0.000000 */)
    intrinsic store_output (ssa_18, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_22) () (0)
    vec1 ssa_23 = load_const (0x00000003 /* 0.000000 */)
    intrinsic store_output (ssa_20, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_23) () (0)
    vec1 ssa_24 = load_const (0x00000004 /* 0.000000 */)
    intrinsic end_primitive_with_counter (ssa_24) () (0)
    intrinsic set_vertex_count (ssa_24) () ()
    /* succs: block_1 */
    block block_1:
}
NIR (final form) for geometry shader:
shader: MESA_SHADER_GEOMETRY
name: GLSL214
inputs: 0
outputs: 0
uniforms: 0
decl_var uniform INTERP_QUALIFIER_NONE ivec4 ctexoffset (2, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 clinept (1, 0)
decl_var uniform INTERP_QUALIFIER_NONE vec4 cstereo (0, 0)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] colors_0 (VARYING_SLOT_VAR0, 26)
decl_var shader_in INTERP_QUALIFIER_FLAT vec4[1] pos (VARYING_SLOT_VAR1, 27)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 gl_Position (VARYING_SLOT_POS, 0)
decl_var shader_out INTERP_QUALIFIER_NONE vec4 colors_0@0 (VARYING_SLOT_VAR0, 26)
decl_function main returning void
impl main {
    decl_reg vec2 r0
    decl_reg vec4 r1
    decl_reg vec4 r2
    decl_reg vec4 r3
    decl_reg vec4 r4
    block block_0:
    /* preds: */
    vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */)
    vec2 ssa_1 = load_const (0xbf800000 /* -1.000000 */, 0x3f800000 /* 1.000000 */)
    vec2 ssa_2 = load_const (0x3f800000 /* 1.000000 */, 0xbf800000 /* -1.000000 */)
    vec1 ssa_3 = load_const (0x00000010 /* 0.000000 */)
    vec4 ssa_4 = intrinsic load_ubo (ssa_0, ssa_3) () ()
    vec1 ssa_5 = frcp ssa_4
    r0.x = fmul ssa_4.w, ssa_5
    vec1 ssa_7 = frcp ssa_4.y
    r0.y = fmul -ssa_4.w, ssa_7.x
    vec4 ssa_10 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (27) /* pos */
    vec2 ssa_11 = fmul r0, ssa_10.ww
    vec4 ssa_12 = intrinsic load_per_vertex_input (ssa_0, ssa_0) () (26) /* colors_0 */
    r1.xy = fadd ssa_10, -ssa_11
    r1.zw = imov ssa_10
    r2.xy = ffma ssa_2, ssa_11, ssa_10
    r2.zw = imov r1
    r3.xy = ffma ssa_1, ssa_11, ssa_10
    r3.zw = imov r2
    r4.xy = fadd ssa_10, ssa_11
    r4.zw = imov r3
    intrinsic store_output (r1, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_0) () (0)
    vec1 ssa_21 = load_const (0x00000001 /* 0.000000 */)
    intrinsic store_output (r2, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_21) () (0)
    vec1 ssa_22 = load_const (0x00000002 /* 0.000000 */)
    intrinsic store_output (r3, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_22) () (0)
    vec1 ssa_23 = load_const (0x00000003 /* 0.000000 */)
    intrinsic store_output (r4, ssa_0) () (0, 15) /* gl_Position */
    intrinsic store_output (ssa_12, ssa_0) () (26, 15) /* colors_0 */
    intrinsic emit_vertex_with_counter (ssa_23) () (0)
    vec1 ssa_24 = load_const (0x00000004 /* 0.000000 */)
    intrinsic end_primitive_with_counter (ssa_24) () (0)
    intrinsic set_vertex_count (ssa_24) () ()
    /* succs: block_1 */
    block block_1:
}
GS Input VUE map (4 slots, non-SSO)
[0] VARYING_SLOT_PSIZ
[1] VARYING_SLOT_POS
[2] VARYING_SLOT_VAR0
[3] VARYING_SLOT_VAR1
GS Output VUE map (3 slots, non-SSO)
[0] VARYING_SLOT_PSIZ
[1] VARYING_SLOT_POS
[2] VARYING_SLOT_VAR0
GS estimated execution time: 138 cycles
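For readers who don't want to trace the SSA dump by hand, here is a rough CPU-side sketch of what the shader appears to compute: each input point is expanded into four vertices offset by (±ox, ±oy), emitted in order as one strip. `Vec4` and `expand_point` are hypothetical names, `u` stands for the vec4 loaded from the UBO at byte offset 16 (presumably clinept), and reading the unswizzled `frcp ssa_4` as `.x` is an assumption.

```cpp
#include <array>
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Mirrors the arithmetic of the SSA-form NIR above (ssa_5 .. ssa_20).
// Returns the four gl_Position values in emit order.
std::array<Vec4, 4> expand_point(const Vec4& pos, const Vec4& u)
{
    // The shader takes reciprocals of u.x (assumed) and u.y.
    assert(u.x != 0.0f && u.y != 0.0f);

    // ssa_9 = (u.w * rcp(u.x), -u.w * rcp(u.y)); ssa_11 = ssa_9 * pos.ww
    const float ox = (u.w * (1.0f / u.x)) * pos.w;
    const float oy = (-u.w * (1.0f / u.y)) * pos.w;

    return {{
        {pos.x - ox, pos.y - oy, pos.z, pos.w},  // ssa_14: pos - offset
        {pos.x + ox, pos.y - oy, pos.z, pos.w},  // ssa_16: ( 1, -1) * offset + pos
        {pos.x - ox, pos.y + oy, pos.z, pos.w},  // ssa_18: (-1,  1) * offset + pos
        {pos.x + ox, pos.y + oy, pos.z, pos.w},  // ssa_20: pos + offset
    }};
}
```

In the correct output every vertex keeps the input z and w and only x and y move, which is a useful reference when comparing against the broken rendering.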
Created attachment 121097: change copyprop condition
Well, this patch "fixes" it for me, but I don't know if vstride is available at this point in the compiler flow. It does make the condition match what's in convert_to_hw_regs, though. The trace seemingly runs correctly with this.
Actually I think this patch is a little better: http://patchwork.freedesktop.org/patch/70688/
Hi, what is the status of this bug? Has the patch been reviewed? The Intel driver on SNB has been blacklisted in Dolphin for this feature; the blacklist will be lifted once this is resolved. Thanks
I'll handle it. It's on my todo list.
I've sent a patch to the mailing list, in reply to Ilia's initial patch.
Thanks, I can confirm the original bug in Dolphin has been solved with mesa-master.
Thanks for testing!
commit 9f2e22bf343b21d6b44e6a502f00a86d169f5ade
Author: Matt Turner <mattst88@gmail.com>
Date: Sun Jan 17 20:30:14 2016 -0500
    i965/vec4: don't copy ATTR into 3src instructions with complex swizzles
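Going only by the commit subject, the rule can be pictured as a guard in copy propagation: refuse to forward an ATTR source into a three-source instruction unless its swizzle replicates one component. The sketch below is not the actual Mesa change; `RegFile`, `Src`, `is_replicated` and `may_propagate` are made-up names, and the 2-bits-per-channel swizzle byte is an assumed encoding.

```cpp
#include <cstdint>

// Hypothetical register files, for illustration only.
enum class RegFile { GRF, ATTR, UNIFORM, IMM };

struct Src {
    RegFile file;
    uint8_t swizzle;  // assumed packing: 2 bits per channel, x=0 .. w=3
};

// A replicated swizzle (xxxx, yyyy, zzzz, wwww) equals c * 0b01010101,
// where c is the selector of the first channel.
constexpr bool is_replicated(uint8_t swz)
{
    return swz == static_cast<uint8_t>((swz & 0x3) * 0x55);
}

// Guard paraphrasing the commit subject: ATTR values with a non-replicated
// swizzle are not copied into three-source instructions.
constexpr bool may_propagate(Src value, bool dest_is_3src)
{
    return !(dest_is_3src &&
             value.file == RegFile::ATTR &&
             !is_replicated(value.swizzle));
}
```

With this guard an `ffma` like the ones in the dump would keep reading its operand from a temporary instead of directly from the attribute register, at the cost of one extra move.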