Bug 87658

Summary: [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M
Product: Mesa Reporter: Chris Paulson-Ellis <chris>
Component: OtherAssignee: mesa-dev
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: david
Version: 10.3   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: 0001-util-improve-fxsave-detection.patch
also_try_this_patch.patch
0001-util-add-check-for-bit_FXSAVE.patch

Description Chris Paulson-Ellis 2014-12-23 23:59:21 UTC
Hi,

I'm trying to run some modern OpenGL 3.3 code on an ancient machine. When I set LIBGL_ALWAYS_SOFTWARE the llvmpipe driver seems to be chosen, but it raises a segv in sse2_has_daz()...

I'm running Fedora mesa-*-10.3.3-1.20141110.fc20.i686


Program received signal SIGSEGV, Segmentation fault.
sse2_has_daz () at util/u_cpu_detect.c:285
285	   __asm __volatile ("fxsave %0" : "+m" (fxarea));
(gdb) bt
#0  sse2_has_daz () at util/u_cpu_detect.c:285
#1  0xb72d4805 in util_cpu_detect () at util/u_cpu_detect.c:378
#2  0xb75c9407 in llvmpipe_create_screen (winsys=winsys@entry=0x8141a98) at lp_screen.c:555
#3  0xb7006b38 in sw_screen_create_named (driver=<optimized out>, winsys=0x8141a98) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:30
#4  sw_screen_create (winsys=0x8141a98) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:57
#5  drisw_create_screen (lf=lf@entry=0xb779d744 <drisw_lf>) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:84
#6  0xb727507e in drisw_init_screen (sPriv=0x812eac8) at drisw.c:363
#7  0xb726ffc6 in driCreateNewScreen2 (scrn=scrn@entry=0, fd=fd@entry=-1, extensions=extensions@entry=0xb7b7d378 <loader_extensions>, 
    driver_extensions=driver_extensions@entry=0xb779d724 <galliumsw_driver_extensions>, driver_configs=driver_configs@entry=0xbfffed30, data=data@entry=0x81311a0) at dri_util.c:159
#8  0xb7270108 in driSWRastCreateNewScreen2 (scrn=0, extensions=0xb7b7d378 <loader_extensions>, driver_extensions=0xb779d724 <galliumsw_driver_extensions>, driver_configs=0xbfffed30, 
    data=0x81311a0) at dri_util.c:214
#9  0xb7b0bc78 in driswCreateScreen (screen=0, priv=0x812ebb0) at drisw_glx.c:686
#10 0xb7ae53f9 in AllocAndFetchScreenConfigs (priv=0x812ebb0, dpy=0x8120a38) at glxext.c:796
#11 __glXInitialize (dpy=dpy@entry=0x8120a38) at glxext.c:902
#12 0xb7ae166e in GetGLXPrivScreenConfig (dpy=dpy@entry=0x8120a38, scrn=scrn@entry=0, ppriv=ppriv@entry=0xbfffedd8, ppsc=ppsc@entry=0xbfffeddc) at glxcmds.c:172
#13 0xb7ae1fe7 in GetGLXPrivScreenConfig (ppsc=0xbfffeddc, ppriv=0xbfffedd8, scrn=0, dpy=0x8120a38) at glxcmds.c:168
#14 glXChooseVisual (dpy=0x8120a38, screen=0, attribList=0xbfffef34) at glxcmds.c:1249
#15 0xb7c47491 in X11_GL_GetVisual (_this=_this@entry=0x8120310, display=display@entry=0x8120a38, screen=screen@entry=0) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:525
#16 0xb7c4770e in X11_GL_InitExtensions (_this=0x8120310) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:323
#17 X11_GL_LoadLibrary (_this=0x8120310, path=<optimized out>) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:218
#18 0xb7c3a16f in SDL_GL_LoadLibrary_REAL (path=path@entry=0x0) at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:2401
#19 0xb7c3bfd8 in SDL_CreateWindow_REAL (title=title@entry=0xb7c612b2 "OpenGL test", x=x@entry=-32, y=y@entry=-32, w=w@entry=32, h=h@entry=32, flags=flags@entry=10)
    at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:1240
#20 0xb7c3bcb1 in ShouldUseTextureFramebuffer () at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:192
#21 SDL_VideoInit_REAL (driver_name=driver_name@entry=0x0) at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:506
#22 0xb7b8aad4 in SDL_InitSubSystem_REAL (flags=16416) at /usr/src/debug/SDL2-2.0.3/src/SDL.c:173
#23 0xb7bbe09d in SDL_Init (a=32) at /usr/src/debug/SDL2-2.0.3/src/dynapi/SDL_dynapi_procs.h:89


$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Mobile Intel(R) Pentium(R) 4 - M CPU 1.80GHz
stepping	: 7
microcode	: 0x39
cpu MHz		: 1200.000
cache size	: 512 KB
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid
bogomips	: 2392.96
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 32 bits virtual
power management:
Comment 1 David Heidelberg (okias) 2014-12-24 18:45:18 UTC
Created attachment 111296 [details] [review]
0001-util-improve-fxsave-detection.patch

This is more improvement than fix.

In case you have GCC >= 4.8, it should also fix issue for you. For older GCCs it won't change anything.
Comment 2 David Heidelberg (okias) 2014-12-24 19:30:09 UTC
Created attachment 111297 [details] [review]
also_try_this_patch.patch

this is way how gcc check for DAZ, so maybe it'll work.

Please try both patches, can't be applied at same time. This patch should work even with older gcc's
Comment 3 Patrick Baggett 2014-12-24 19:35:14 UTC
-#if (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO))
+#if (defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 408))
+   __builtin_ia32_fxsave(&fxarea);
+#elif (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO))

Is GCC >= 408 the right value? I only mention this because I didn't see any gcc 4.0.8 release on the GNU site, and there was some mention of GCC 4.8.
Comment 4 David Heidelberg (okias) 2014-12-24 20:12:02 UTC
(In reply to Patrick Baggett from comment #3)
> -#if (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO))
> +#if (defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 408))
> +   __builtin_ia32_fxsave(&fxarea);
> +#elif (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO))
> 
> Is GCC >= 408 the right value? I only mention this because I didn't see any
> gcc 4.0.8 release on the GNU site, and there was some mention of GCC 4.8.

src/gallium/include/pipe/p_config.h:#define PIPE_CC_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__).

At first moment I also though it's 4.0.8, but in reality it's 4.8.x.

[1] https://www.ocf.berkeley.edu/~pad/tigcc/doc/html/cpp_SEC15_GNUC.html
Comment 5 Chris Paulson-Ellis 2014-12-27 20:04:00 UTC
(In reply to David Heidelberg (okias) from comment #1)
> 0001-util-improve-fxsave-detection.patch

  CC       util/u_cpu_detect.lo
util/u_cpu_detect.c: In function 'sse2_has_daz':
util/u_cpu_detect.c:285:4: error: implicit declaration of function '__builtin_ia32_fxsave' [-Werror=implicit-function-declaration]
    __builtin_ia32_fxsave(&fxarea);
    ^

$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Comment 6 David Heidelberg (okias) 2014-12-27 21:34:59 UTC
(In reply to Chris Paulson-Ellis from comment #5)
> (In reply to David Heidelberg (okias) from comment #1)
> > 0001-util-improve-fxsave-detection.patch
> 
>   CC       util/u_cpu_detect.lo
> util/u_cpu_detect.c: In function 'sse2_has_daz':
> util/u_cpu_detect.c:285:4: error: implicit declaration of function
> '__builtin_ia32_fxsave' [-Werror=implicit-function-declaration]
>     __builtin_ia32_fxsave(&fxarea);
>     ^
> 
> $ gcc --version
> gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)

try compile with CFLAGS="-mfxsr", anyway try second patch :)
Comment 7 Chris Paulson-Ellis 2014-12-27 22:23:03 UTC
(In reply to David Heidelberg (okias) from comment #2)
> also_try_this_patch.patch

Program received signal SIGSEGV, Segmentation fault.
sse2_has_daz () at util/u_cpu_detect.c:285
285	   __asm __volatile ("fxsave %0" : "=m" (fxarea) : "m" (fxarea));
Comment 8 Chris Paulson-Ellis 2014-12-28 11:17:46 UTC
(In reply to David Heidelberg (okias) from comment #6)
> > > 0001-util-improve-fxsave-detection.patch
> try compile with CFLAGS="-mfxsr"

Somewhat inevitably, given the result of the second patch...

Program received signal SIGSEGV, Segmentation fault.
sse2_has_daz () at util/u_cpu_detect.c:285
285	   __builtin_ia32_fxsave(&fxarea);
Comment 9 David Heidelberg (okias) 2014-12-28 14:21:18 UTC
I'd say it's kernel (more probably) or gcc bug.

For gcc, try 4.7.x and/or 4.9.x if you can.

For kernel, I think it may be(?) related to [1]- > Paranoid restore. send a SIGSEGV if we fail to restore the state.

[1] http://lkml.iu.edu/hypermail/linux/kernel/0807.3/0474.html

Otherwise, I think I'm out of ideas, maybe someone more experienced could help :)
Comment 10 Chris Paulson-Ellis 2014-12-28 14:42:49 UTC
(In reply to David Heidelberg (okias) from comment #9)
> I'd say it's kernel (more probably) or gcc bug.

For the record...

$ uname -r
3.17.7-200.fc20.i686

$ dmesg | grep gcc
[    0.000000] Linux version 3.17.7-200.fc20.i686 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Wed Dec 17 04:08:31 UTC 2014
Comment 11 ubizjak 2014-12-28 15:25:54 UTC
(In reply to David Heidelberg (okias) from comment #9)
> I'd say it's kernel (more probably) or gcc bug.

Nope. You have to check FXSAVE bit in cpuid.

> For gcc, try 4.7.x and/or 4.9.x if you can.
> 
> For kernel, I think it may be(?) related to [1]- > Paranoid restore. send a
> SIGSEGV if we fail to restore the state.
> 
> [1] http://lkml.iu.edu/hypermail/linux/kernel/0807.3/0474.html
> 
> Otherwise, I think I'm out of ideas, maybe someone more experienced could
> help :)

This is how gcc sets FTZ and DAZ for -ffast-math. Please see how to check for DAZ, and don't forget ((force_align_arg_pointer)) attribute.

--cut here--
#define MXCSR_DAZ (1 << 6)	/* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15)	/* Enable flush to zero mode */

#ifndef __x86_64__
/* All 64-bit targets have SSE and DAZ;
   only check them explicitly for 32-bit ones. */
#include "cpuid.h"
#endif

static void __attribute__((constructor))
#ifndef __x86_64__
/* The i386 ABI only requires 4-byte stack alignment, so this is necessary
   to make sure the fxsave struct gets correct alignment.
   See PR27537 and PR28621.  */
__attribute__ ((force_align_arg_pointer))
#endif
set_fast_math (void)
{
#ifndef __x86_64__
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return;

  if (edx & bit_SSE)
    {
      unsigned int mxcsr;
  
      if (edx & bit_FXSAVE)
	{
	  /* Check if DAZ is available.  */
	  struct
	    {
	      unsigned short cwd;
	      unsigned short swd;
	      unsigned short twd;
	      unsigned short fop;
	      unsigned int fip;
	      unsigned int fcs;
	      unsigned int foo;
	      unsigned int fos;
	      unsigned int mxcsr;
	      unsigned int mxcsr_mask;
	      unsigned int st_space[32];
	      unsigned int xmm_space[32];
	      unsigned int padding[56];
	    } __attribute__ ((aligned (16))) fxsave;

	  /* This is necessary since some implementations of FXSAVE
	     do not modify reserved areas within the image.  */
	  fxsave.mxcsr_mask = 0;

	  __builtin_ia32_fxsave (&fxsave);

	  mxcsr = fxsave.mxcsr;

	  if (fxsave.mxcsr_mask & MXCSR_DAZ)
	    mxcsr |= MXCSR_DAZ;
	}
      else
	mxcsr = __builtin_ia32_stmxcsr ();

      mxcsr |= MXCSR_FTZ;
      __builtin_ia32_ldmxcsr (mxcsr);
    }
#else
  unsigned int mxcsr = __builtin_ia32_stmxcsr ();
  mxcsr |= MXCSR_DAZ | MXCSR_FTZ;
  __builtin_ia32_ldmxcsr (mxcsr);
#endif
}
--cut here--
Comment 12 David Heidelberg (okias) 2014-12-28 16:33:29 UTC
(In reply to ubizjak from comment #11)
> (In reply to David Heidelberg (okias) from comment #9)
> > I'd say it's kernel (more probably) or gcc bug.
> 
> Nope. You have to check FXSAVE bit in cpuid.

Interesting, fxsave should be supported since Pentium II, 100% on Pentium III + All CPU's with SSE [1].

It must be really unusual cpu, when fxsave support isn't there.

[1] http://forum.osdev.org/viewtopic.php?f=1&t=10743
Comment 13 David Heidelberg (okias) 2014-12-28 16:55:46 UTC
Created attachment 111431 [details] [review]
0001-util-add-check-for-bit_FXSAVE.patch

In case your CPU doesn't have FXSAVE support, this patch should 100% help.
Comment 14 Chris Paulson-Ellis 2014-12-28 20:49:40 UTC
(In reply to David Heidelberg (okias) from comment #13)
> Created attachment 111431 [details] [review] [review]
> 0001-util-add-check-for-bit_FXSAVE.patch
> 
> In case your CPU doesn't have FXSAVE support, this patch should 100% help.

Hmmm - the bit you tested for seems to be present...

(gdb) p /x regs
$9 = {0x2, 0x756e6547, 0x6c65746e, 0x49656e69}
(gdb) s
377	         util_cpu_caps.has_daz = util_cpu_caps.has_sse3 ||
(gdb) 
378	            (util_cpu_caps.has_sse2 &&
(gdb) 
379	             ((regs2[3] >> 24) & 1 /* bit_FXSAVE */ ) && sse2_has_daz());
(gdb) 
sse2_has_daz () at util/u_cpu_detect.c:276
276	{
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
sse2_has_daz () at util/u_cpu_detect.c:285
285	   __builtin_ia32_fxsave(&fxarea);
Comment 15 Chris Paulson-Ellis 2014-12-28 21:36:55 UTC
Oops - printed the wrong array...

(gdb) p /x regs2
$11 = {<optimized out>, 0x1080e, 0x4400, 0xbfebf9ff}
Comment 16 Patrick Baggett 2014-12-28 22:04:43 UTC
That is strange...I would have expected an "illegal instruction" (SIGILL) if the CPU simply did not support this, but this is SIGSEGV. Is the 'fxsave' struct really aligned properly? It looks like should be...but it is still giving SIGSEGV. Can you print the 'fxsave' address when it crashes?
Comment 17 ubizjak 2014-12-29 11:35:08 UTC
(In reply to Patrick Baggett from comment #16)
> That is strange...I would have expected an "illegal instruction" (SIGILL) if
> the CPU simply did not support this, but this is SIGSEGV. Is the 'fxsave'
> struct really aligned properly? It looks like should be...but it is still
> giving SIGSEGV. Can you print the 'fxsave' address when it crashes?

As said above, with 32bit gcc, you need to decorate the function with

'force_align_arg_pointer'
     On the Intel x86, the 'force_align_arg_pointer' attribute may be
     applied to individual function definitions, generating an alternate
     prologue and epilogue that realigns the run-time stack if
     necessary.  This supports mixing legacy codes that run with a
     4-byte aligned stack with modern codes that keep a 16-byte stack
     for SSE compatibility.

"New" x86_32 ABI mandates 16-byte stack alignment, and the above attribute forces realignment of the incoming stack pointer. Maybe there is a hand-crafted assembly code in the MESA source, libc or somewhere else (that conforms to "old" x86_32 ABI with 4-byte stack alignment) that misaligns the SP.

Alternatively, you can use posix_memalign, aligned_alloc or similar function to allocate aligned memory.
Comment 18 Patrick Baggett 2014-12-29 15:10:44 UTC
Right, I thought Chris Paulson-Ellis tried your patch which includes this alignment, but it still didn't work due to alignment. I think we're both on the same page though.
Comment 19 Roland Scheidegger 2014-12-29 22:34:19 UTC
Maybe using
PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void)
would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't working otherwise. Not sure though this works on inlined functions?
(Rant mode on: I _really_ hate detection of daz flag on x86-32. Most complicated way to do simple feature detection ever. What the hell were intel thinking???)
Otherwise if this still doesn't work could do as ubizjak suggested could use malloc'ed memory I guess.
Checking for fxsave support should not be necessary, since we call this only when we have sse2 in the first place, which should be impossible without fxsave.
Comment 20 Chris Paulson-Ellis 2014-12-30 17:56:57 UTC
(In reply to Roland Scheidegger from comment #19)
> Maybe using
> PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void)
> would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't
> working otherwise. Not sure though this works on inlined functions?

This seems to work. Without PIPE_ALIGN_STACK, fxarea has an 8-byte aligned address, but with PIPE_ALIGN_STACK it is 16-byte aligned and the SEGV goes away.

If preparing a final patch, remember that I had to add -mfxsr to CFLAGS to get the patch using __builtin_ia32_fxsave to compile.

Thanks,
Chris.
Comment 21 Roland Scheidegger 2014-12-31 16:30:22 UTC
(In reply to Chris Paulson-Ellis from comment #20)
> (In reply to Roland Scheidegger from comment #19)
> > Maybe using
> > PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void)
> > would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't
> > working otherwise. Not sure though this works on inlined functions?
> 
> This seems to work. Without PIPE_ALIGN_STACK, fxarea has an 8-byte aligned
> address, but with PIPE_ALIGN_STACK it is 16-byte aligned and the SEGV goes
> away.
> 
> If preparing a final patch, remember that I had to add -mfxsr to CFLAGS to
> get the patch using __builtin_ia32_fxsave to compile.
> 
> Thanks,
> Chris.

I don't feel like messing with compile flags for now, so I'm not going to change anything not strictly required to fix this bug.
Comment 22 Roland Scheidegger 2015-01-05 17:00:49 UTC
Fixed by b59c7ed0ab1ac5b6d9f8d409f1a90401ab7775b6.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.