Bug 58718

Summary: Crash in src_register() during glClear() call
Product: Mesa Reporter: Sergey Anikin <s_anikin>
Component: OtherAssignee: mesa-dev
Status: RESOLVED NOTOURBUG QA Contact:
Severity: critical    
Priority: medium CC: keith.kriewall
Version: 9.0   
Hardware: x86 (IA32)   
OS: Windows (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=58716
Whiteboard:
i915 platform: i915 features:
Attachments: framebuffer2.trace
fdo58718.zip

Description Sergey Anikin 2012-12-24 13:12:53 UTC
Mesa3d 9.0 crashes in Gallium software rasterizer (gallium/targets/libgl-gdi) + LLVM 3.1 compiled with MS VC++ 2010 for Win32 platform during glClear() call.

OS: Windows 7 64-bit.

The stack dump as reported by MS VC++ 2010 debugger:

	src_register(st_translate * t=0x0046b498, gl_register_file file=52787676, int index=638833664) Line 233	C
 	translate_src(st_translate * t=0x0046b498, const prog_src_register * SrcReg=0x0326ad8c) Line 348	C
 	compile_instruction(st_translate * t=0x0046b498, const prog_instruction * inst=0x0326ad88, unsigned char clamp_dst_color_output=0) Line 698	C
 	st_translate_mesa_program(gl_context * ctx=0x006f0ec0, unsigned int procType=4633752, ureg_program * ureg=0x037353d0, const gl_program * program=0x032579dc, unsigned int numInputs=3, const unsigned int * inputMapping=0x032579dc, const unsigned char * inputSemanticName=0x00000000, const unsigned char * inputSemanticIndex=0x00000000, const unsigned int * interpMode=0x00000000, unsigned int numOutputs=3, const unsigned int * outputMapping=0x03257ae4, const unsigned char * outputSemanticName=0x03257b70, const unsigned char * outputSemanticIndex=0x03257b93, unsigned char passthrough_edgeflags=0, unsigned char clamp_color=0) Line 1250	C
 	st_get_vp_variant(st_context * st=0x03390970, st_vertex_program * stvp=0x2613d400, const st_vp_variant_key * key=0x0046c520) Line 430	C
 	update_vp(st_context * st=0x03390970) Line 154	C
 	st_validate_state(st_context * st=0x03390970) Line 223	C
 	st_Clear(gl_context * ctx=0x006f0ec0, unsigned int mask=256) Line 459	C
 	_mesa_Clear(unsigned int mask=16640) Line 233	C
 	execute_list(gl_context * ctx=0x006f0ec0, unsigned int list=1) Line 7773	C
 	_mesa_BindTexture(unsigned int target=1, unsigned int texName=4652224) Line 1250	C
 	GLWidget::paintGL() Line 181	C++

The sample application to reproduce the problem is attached to https://bugs.freedesktop.org/show_bug.cgi?id=58716.

The strange point is that Mesa tries to do something related to vertex programs, while the sample application does not use shaders at all.
Comment 1 Keith Kriewall 2013-03-07 19:16:27 UTC
I've run into this with Mesa 9.0.2 and LLVM 3.2.  It appears that the bitfields in the prog_src_register struct (/src/mesa/program/prog_instruction.h) do not translate correctly to the src_register() parameter types.  A workaround is to remove the bitfield (:<size>) specifications in the prog_src_register struct.

In translate_src(), the following call ..

   struct ureg_src src = src_register( t, SrcReg->File, SrcReg->Index );

.. with the following data ..

   t       0x0458d958 {ureg=0x03c51258 temps=0x0458d95c constants=0x03c5c3a8 ...} st_translate *
   SrcReg  0x03c597bc {File=0x00000001 Index=0x00000400 Swizzle=0x00000249 ...} const prog_src_register *
      File  0x00000001	unsigned int
      Index 0x00000400	int

.. results in the following parameters to src_register():

>	opengl32.dll!src_register(st_translate * t=0x0458d958, gl_register_file file=0x03c7dd54, int index=0xf78cdac8)  Line 227 + 0x11 bytes	C

You can see that 't' is passed correctly, but the bitfields for SrcReg->File and SrcReg->Index have not.

I tried compiling LLVM and Mesa with all optimizations turned off, with no change.  (The debug build does not fail in this way, only the release build.)
Comment 2 Brian Paul 2013-03-07 19:37:27 UTC
Using gcc or clang?  Which version?
Comment 3 Keith Kriewall 2013-03-07 21:22:02 UTC
Neither, I'm building both with Visual Studio 2010.  LLVM has a project/solution as generated by CMake, and Mesa builds via scons in a Command Prompt environment configured as follows:
====
F:\proj\Mesa-9.0.2>path
PATH=f:\mingw\msys\1.0\bin;c:\python27_32bit;c:\python27_32bit\Scripts;c:\window
s;c:\windows\system32

F:\proj\Mesa-9.0.2>set V
VS100COMNTOOLS=C:\Progra~2\Micros~2.0\Common7\Tools\
====
Comment 4 Jose Fonseca 2013-03-08 22:37:40 UTC
Created attachment 76199 [details]
framebuffer2.trace

The attached trace allows me reproduce the issue with a release/profile MSVC build of mesa.

Still not clear what's happening.
Comment 5 Jose Fonseca 2013-03-09 21:31:21 UTC
Created attachment 76244 [details]
fdo58718.zip

It looks like MSVC is generating invalid code somehow.  I've attached a self-contained test program that shows this.

$ cat fdo58718.txt 
fdo58718.c
File = 1
Index = 1024 (expected 0)
Swizzle = 0x249
Negate = 0x0
Abs = 0
RelAddr = 0
Comment 6 Jose Fonseca 2013-03-09 22:34:09 UTC
I've been fiddling with this, and I only see two things that workaround this MSVC bug:
- disable all optmizations (/O0 isntead of /O2 or /O1)
- force all bit fields from struct prog_src_register to have the same signness (either all unsigned, or all signed), ie, the problem here seems to be the intermixing

Still, this is a scary bug. We use this all over the place.

BTW, it also fails with MSVS 11 Beta.
Comment 7 Keith Kriewall 2013-03-11 17:10:37 UTC
In case it helps, it appears that MSVC always treats enum values as signed int.  E.g. see:
  http://compgroups.net/comp.lang.c++/problem-with-visual-c++-7.1.3088-and-bit-fields/1013665

GCC appears to use unsigned int if no enum values are negative.
  http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html

The implication is that bit-fields may be a bit short if specified as GLuint.
Comment 8 Roland Scheidegger 2013-03-11 17:49:05 UTC
(In reply to comment #7)
> In case it helps, it appears that MSVC always treats enum values as signed
> int.  E.g. see:
>  
> http://compgroups.net/comp.lang.c++/problem-with-visual-c++-7.1.3088-and-bit-
> fields/1013665
> 
> GCC appears to use unsigned int if no enum values are negative.
>  
> http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Structures-unions-enumerations-
> and-bit_002dfields-implementation.html
> 
> The implication is that bit-fields may be a bit short if specified as GLuint.

I think I'm missing how that could cause the bug we're seeing here.
FWIW struct ureg_dst actually looks definitely buggy to me, since IndirectSwizzle should be unsigned, not int, but I just noticed that now by accident and it's not an issue with ureg_src.
Comment 9 Keith Kriewall 2013-03-11 18:29:44 UTC
Sorry, I didn't mean to imply that the signed issue is causing this problem.  I've tried increasing the 'File' bit field size by one, and it made no obvious difference.  I just wanted to note the difference between the compilers, in case it indirectly pertains to the problem.
Comment 10 Brian Paul 2013-03-11 18:43:52 UTC
Can you try reordering the fields such that all the signed fields come before the unsigned fields (or vice versa)?  I was tinkering with a test program and that seemed to help, but I haven't done extensive testing.
Comment 11 Keith Kriewall 2013-03-11 23:06:12 UTC
I just tried that (signed fields ahead of unsigned) and it didn't help in this case.  The modified struct began as:

struct prog_src_register
{
   GLint Index:(INST_INDEX_BITS+1); /**< Extra bit here for sign bit.
                                     * May be negative for relative addressing.
                                     */
   GLint Index2:(INST_INDEX_BITS+1); /**< Extra bit here for sign bit.
                                       * May be negative for relative
                                       * addressing. */
   GLuint File:4;	/**< One of the PROGRAM_* register file values. */
   GLuint Swizzle:12;
. . .
}
Comment 12 Jose Fonseca 2013-03-12 08:39:20 UTC
It looks like Visual Studio 2012 (must be the final version, and not the earlier preview) compiles the fd58718.zip test case correctly.  I don't know if it fully fixes all bit field usage in Mesa, or just this particular one, but I'd recommend you try it rebuilding with 2012 if you can, and post your results.
Comment 13 Keith Kriewall 2013-03-12 17:18:03 UTC
VS 2012 refuses to compile Mesa due to macro definition of "inline".

====
F:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\INCLUDE\xkeycheck.h(199): warning C4005: 'inline' : macro redefinition
        command-line arguments :  see previous definition of 'inline'
F:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\INCLUDE\xkeycheck.h(242): fatal error C1189: #error :  The C++ Standard Library forbids macroizing keywords. Enable warning C4005 to find the forbidden macro.
scons: *** [build\windows-x86\mesa\main\ff_fragment_shader.obj] Error 2
====

I've tried downloading 9.0.3 and 9.1, no change.  I had expected to see inline/INLINE changes in the source per the following ..
  http://lists.freedesktop.org/archives/mesa-dev/2011-July/009586.html
.. but the code appears unchanged.  I've tried commenting out all '#define inline' code (4 Mesa headers with duplicate code), but I still get the error.  LLVM doesn't define 'inline' that I can see.  So at this point, I don't know from where VS 2012 is getting the definition.  Any suggestions would be appreciated.
Comment 14 Jose Fonseca 2013-03-12 20:46:49 UTC
(In reply to comment #13)
> VS 2012 refuses to compile Mesa due to macro definition of "inline".
> 
> I've tried downloading 9.0.3 and 9.1, no change.  I had expected to see
> inline/INLINE changes in the source per the following ..
>   http://lists.freedesktop.org/archives/mesa-dev/2011-July/009586.html
> .. but the code appears unchanged.  I've tried commenting out all '#define
> inline' code (4 Mesa headers with duplicate code), but I still get the
> error.  LLVM doesn't define 'inline' that I can see.  So at this point, I
> don't know from where VS 2012 is getting the definition.  Any suggestions
> would be appreciated.

I posted a series of patches to mesa3d-dev which seems to fix the "inline" issue.
Comment 15 Jose Fonseca 2013-03-13 14:57:32 UTC
(In reply to comment #14)
> I posted a series of patches to mesa3d-dev which seems to fix the "inline"
> issue.

I pushed these now, the most important being

commit 57cd1d1454653f778837eec0ee5d4060bc59c5ba
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue Mar 12 20:37:47 2013 +0000

    include: Fix build with VS 11 (i.e, 2012).
    
    NOTE: Candidate for the stable branches.
    
    Reviewed-by: Brian Paul <brianp@vmware.com>

After rebuilding Mesa with VS 2012 the framebuffer2.trace no longer crashes.
Comment 16 Jose Fonseca 2013-03-15 19:57:43 UTC
Given that this works with MSVS 2012 I see no point in trying to workaround with older versions, given it's so difficult.

I've crossported the MSVS 2012 build fixes to 9.1 stable branch.

So I'm marking this as solved now.

Thanks.
Comment 17 Keith Kriewall 2013-05-08 18:30:13 UTC
I posted this issue to Microsoft and they have released a hotfix for Visual Studio 2010 SP1 to correct it:
  http://support.microsoft.com/kb/2836024
Comment 18 Sergey Anikin 2013-05-12 14:29:30 UTC
(In reply to comment #17)
> I posted this issue to Microsoft and they have released a hotfix for Visual
> Studio 2010 SP1 to correct it:
>   http://support.microsoft.com/kb/2836024

As soon as you try it (and in case if you try it!), could you please reply here if Mesa 9.x built with the patched MS VC++ 2010 works or not?
Comment 19 Keith Kriewall 2013-05-13 15:30:24 UTC
Yes, I've built Mesa 9.0.2 / LLVM 3.2 with the compiler hotfix and it works.  The test application also works.
Comment 20 Sergey Anikin 2013-05-13 15:51:01 UTC
(In reply to comment #19)
> Yes, I've built Mesa 9.0.2 / LLVM 3.2 with the compiler hotfix and it works.
> The test application also works.

Thanks a lot for the confirmation, Keith!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.