Hi, I'm trying to run some modern OpenGL 3.3 code on an ancient machine. When I set LIBGL_ALWAYS_SOFTWARE the llvmpipe driver seems to be chosen, but it raises a segv in sse2_has_daz()... I'm running Fedora mesa-*-10.3.3-1.20141110.fc20.i686 Program received signal SIGSEGV, Segmentation fault. sse2_has_daz () at util/u_cpu_detect.c:285 285 __asm __volatile ("fxsave %0" : "+m" (fxarea)); (gdb) bt #0 sse2_has_daz () at util/u_cpu_detect.c:285 #1 0xb72d4805 in util_cpu_detect () at util/u_cpu_detect.c:378 #2 0xb75c9407 in llvmpipe_create_screen (winsys=winsys@entry=0x8141a98) at lp_screen.c:555 #3 0xb7006b38 in sw_screen_create_named (driver=<optimized out>, winsys=0x8141a98) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:30 #4 sw_screen_create (winsys=0x8141a98) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:57 #5 drisw_create_screen (lf=lf@entry=0xb779d744 <drisw_lf>) at ../../../../src/gallium/auxiliary/target-helpers/inline_sw_helper.h:84 #6 0xb727507e in drisw_init_screen (sPriv=0x812eac8) at drisw.c:363 #7 0xb726ffc6 in driCreateNewScreen2 (scrn=scrn@entry=0, fd=fd@entry=-1, extensions=extensions@entry=0xb7b7d378 <loader_extensions>, driver_extensions=driver_extensions@entry=0xb779d724 <galliumsw_driver_extensions>, driver_configs=driver_configs@entry=0xbfffed30, data=data@entry=0x81311a0) at dri_util.c:159 #8 0xb7270108 in driSWRastCreateNewScreen2 (scrn=0, extensions=0xb7b7d378 <loader_extensions>, driver_extensions=0xb779d724 <galliumsw_driver_extensions>, driver_configs=0xbfffed30, data=0x81311a0) at dri_util.c:214 #9 0xb7b0bc78 in driswCreateScreen (screen=0, priv=0x812ebb0) at drisw_glx.c:686 #10 0xb7ae53f9 in AllocAndFetchScreenConfigs (priv=0x812ebb0, dpy=0x8120a38) at glxext.c:796 #11 __glXInitialize (dpy=dpy@entry=0x8120a38) at glxext.c:902 #12 0xb7ae166e in GetGLXPrivScreenConfig (dpy=dpy@entry=0x8120a38, scrn=scrn@entry=0, ppriv=ppriv@entry=0xbfffedd8, ppsc=ppsc@entry=0xbfffeddc) at glxcmds.c:172 #13 0xb7ae1fe7 in GetGLXPrivScreenConfig (ppsc=0xbfffeddc, ppriv=0xbfffedd8, scrn=0, dpy=0x8120a38) at glxcmds.c:168 #14 glXChooseVisual (dpy=0x8120a38, screen=0, attribList=0xbfffef34) at glxcmds.c:1249 #15 0xb7c47491 in X11_GL_GetVisual (_this=_this@entry=0x8120310, display=display@entry=0x8120a38, screen=screen@entry=0) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:525 #16 0xb7c4770e in X11_GL_InitExtensions (_this=0x8120310) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:323 #17 X11_GL_LoadLibrary (_this=0x8120310, path=<optimized out>) at /usr/src/debug/SDL2-2.0.3/src/video/x11/SDL_x11opengl.c:218 #18 0xb7c3a16f in SDL_GL_LoadLibrary_REAL (path=path@entry=0x0) at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:2401 #19 0xb7c3bfd8 in SDL_CreateWindow_REAL (title=title@entry=0xb7c612b2 "OpenGL test", x=x@entry=-32, y=y@entry=-32, w=w@entry=32, h=h@entry=32, flags=flags@entry=10) at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:1240 #20 0xb7c3bcb1 in ShouldUseTextureFramebuffer () at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:192 #21 SDL_VideoInit_REAL (driver_name=driver_name@entry=0x0) at /usr/src/debug/SDL2-2.0.3/src/video/SDL_video.c:506 #22 0xb7b8aad4 in SDL_InitSubSystem_REAL (flags=16416) at /usr/src/debug/SDL2-2.0.3/src/SDL.c:173 #23 0xb7bbe09d in SDL_Init (a=32) at /usr/src/debug/SDL2-2.0.3/src/dynapi/SDL_dynapi_procs.h:89 $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 2 model name : Mobile Intel(R) Pentium(R) 4 - M CPU 1.80GHz stepping : 7 microcode : 0x39 cpu MHz : 1200.000 cache size : 512 KB fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid bogomips : 2392.96 clflush size : 64 cache_alignment : 128 address sizes : 36 bits physical, 32 bits virtual power management:
Created attachment 111296 [details] [review] 0001-util-improve-fxsave-detection.patch This is more improvement than fix. In case you have GCC >= 4.8, it should also fix issue for you. For older GCCs it won't change anything.
Created attachment 111297 [details] [review] also_try_this_patch.patch this is way how gcc check for DAZ, so maybe it'll work. Please try both patches, can't be applied at same time. This patch should work even with older gcc's
-#if (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO)) +#if (defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 408)) + __builtin_ia32_fxsave(&fxarea); +#elif (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO)) Is GCC >= 408 the right value? I only mention this because I didn't see any gcc 4.0.8 release on the GNU site, and there was some mention of GCC 4.8.
(In reply to Patrick Baggett from comment #3) > -#if (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO)) > +#if (defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 408)) > + __builtin_ia32_fxsave(&fxarea); > +#elif (defined(PIPE_CC_GCC) || defined(PIPE_CC_SUNPRO)) > > Is GCC >= 408 the right value? I only mention this because I didn't see any > gcc 4.0.8 release on the GNU site, and there was some mention of GCC 4.8. src/gallium/include/pipe/p_config.h:#define PIPE_CC_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__). At first moment I also though it's 4.0.8, but in reality it's 4.8.x. [1] https://www.ocf.berkeley.edu/~pad/tigcc/doc/html/cpp_SEC15_GNUC.html
(In reply to David Heidelberg (okias) from comment #1) > 0001-util-improve-fxsave-detection.patch CC util/u_cpu_detect.lo util/u_cpu_detect.c: In function 'sse2_has_daz': util/u_cpu_detect.c:285:4: error: implicit declaration of function '__builtin_ia32_fxsave' [-Werror=implicit-function-declaration] __builtin_ia32_fxsave(&fxarea); ^ $ gcc --version gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
(In reply to Chris Paulson-Ellis from comment #5) > (In reply to David Heidelberg (okias) from comment #1) > > 0001-util-improve-fxsave-detection.patch > > CC util/u_cpu_detect.lo > util/u_cpu_detect.c: In function 'sse2_has_daz': > util/u_cpu_detect.c:285:4: error: implicit declaration of function > '__builtin_ia32_fxsave' [-Werror=implicit-function-declaration] > __builtin_ia32_fxsave(&fxarea); > ^ > > $ gcc --version > gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) try compile with CFLAGS="-mfxsr", anyway try second patch :)
(In reply to David Heidelberg (okias) from comment #2) > also_try_this_patch.patch Program received signal SIGSEGV, Segmentation fault. sse2_has_daz () at util/u_cpu_detect.c:285 285 __asm __volatile ("fxsave %0" : "=m" (fxarea) : "m" (fxarea));
(In reply to David Heidelberg (okias) from comment #6) > > > 0001-util-improve-fxsave-detection.patch > try compile with CFLAGS="-mfxsr" Somewhat inevitably, given the result of the second patch... Program received signal SIGSEGV, Segmentation fault. sse2_has_daz () at util/u_cpu_detect.c:285 285 __builtin_ia32_fxsave(&fxarea);
I'd say it's kernel (more probably) or gcc bug. For gcc, try 4.7.x and/or 4.9.x if you can. For kernel, I think it may be(?) related to [1]- > Paranoid restore. send a SIGSEGV if we fail to restore the state. [1] http://lkml.iu.edu/hypermail/linux/kernel/0807.3/0474.html Otherwise, I think I'm out of ideas, maybe someone more experienced could help :)
(In reply to David Heidelberg (okias) from comment #9) > I'd say it's kernel (more probably) or gcc bug. For the record... $ uname -r 3.17.7-200.fc20.i686 $ dmesg | grep gcc [ 0.000000] Linux version 3.17.7-200.fc20.i686 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Wed Dec 17 04:08:31 UTC 2014
(In reply to David Heidelberg (okias) from comment #9) > I'd say it's kernel (more probably) or gcc bug. Nope. You have to check FXSAVE bit in cpuid. > For gcc, try 4.7.x and/or 4.9.x if you can. > > For kernel, I think it may be(?) related to [1]- > Paranoid restore. send a > SIGSEGV if we fail to restore the state. > > [1] http://lkml.iu.edu/hypermail/linux/kernel/0807.3/0474.html > > Otherwise, I think I'm out of ideas, maybe someone more experienced could > help :) This is how gcc sets FTZ and DAZ for -ffast-math. Please see how to check for DAZ, and don't forget ((force_align_arg_pointer)) attribute. --cut here-- #define MXCSR_DAZ (1 << 6) /* Enable denormals are zero mode */ #define MXCSR_FTZ (1 << 15) /* Enable flush to zero mode */ #ifndef __x86_64__ /* All 64-bit targets have SSE and DAZ; only check them explicitly for 32-bit ones. */ #include "cpuid.h" #endif static void __attribute__((constructor)) #ifndef __x86_64__ /* The i386 ABI only requires 4-byte stack alignment, so this is necessary to make sure the fxsave struct gets correct alignment. See PR27537 and PR28621. */ __attribute__ ((force_align_arg_pointer)) #endif set_fast_math (void) { #ifndef __x86_64__ unsigned int eax, ebx, ecx, edx; if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)) return; if (edx & bit_SSE) { unsigned int mxcsr; if (edx & bit_FXSAVE) { /* Check if DAZ is available. */ struct { unsigned short cwd; unsigned short swd; unsigned short twd; unsigned short fop; unsigned int fip; unsigned int fcs; unsigned int foo; unsigned int fos; unsigned int mxcsr; unsigned int mxcsr_mask; unsigned int st_space[32]; unsigned int xmm_space[32]; unsigned int padding[56]; } __attribute__ ((aligned (16))) fxsave; /* This is necessary since some implementations of FXSAVE do not modify reserved areas within the image. */ fxsave.mxcsr_mask = 0; __builtin_ia32_fxsave (&fxsave); mxcsr = fxsave.mxcsr; if (fxsave.mxcsr_mask & MXCSR_DAZ) mxcsr |= MXCSR_DAZ; } else mxcsr = __builtin_ia32_stmxcsr (); mxcsr |= MXCSR_FTZ; __builtin_ia32_ldmxcsr (mxcsr); } #else unsigned int mxcsr = __builtin_ia32_stmxcsr (); mxcsr |= MXCSR_DAZ | MXCSR_FTZ; __builtin_ia32_ldmxcsr (mxcsr); #endif } --cut here--
(In reply to ubizjak from comment #11) > (In reply to David Heidelberg (okias) from comment #9) > > I'd say it's kernel (more probably) or gcc bug. > > Nope. You have to check FXSAVE bit in cpuid. Interesting, fxsave should be supported since Pentium II, 100% on Pentium III + All CPU's with SSE [1]. It must be really unusual cpu, when fxsave support isn't there. [1] http://forum.osdev.org/viewtopic.php?f=1&t=10743
Created attachment 111431 [details] [review] 0001-util-add-check-for-bit_FXSAVE.patch In case your CPU doesn't have FXSAVE support, this patch should 100% help.
(In reply to David Heidelberg (okias) from comment #13) > Created attachment 111431 [details] [review] [review] > 0001-util-add-check-for-bit_FXSAVE.patch > > In case your CPU doesn't have FXSAVE support, this patch should 100% help. Hmmm - the bit you tested for seems to be present... (gdb) p /x regs $9 = {0x2, 0x756e6547, 0x6c65746e, 0x49656e69} (gdb) s 377 util_cpu_caps.has_daz = util_cpu_caps.has_sse3 || (gdb) 378 (util_cpu_caps.has_sse2 && (gdb) 379 ((regs2[3] >> 24) & 1 /* bit_FXSAVE */ ) && sse2_has_daz()); (gdb) sse2_has_daz () at util/u_cpu_detect.c:276 276 { (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. sse2_has_daz () at util/u_cpu_detect.c:285 285 __builtin_ia32_fxsave(&fxarea);
Oops - printed the wrong array... (gdb) p /x regs2 $11 = {<optimized out>, 0x1080e, 0x4400, 0xbfebf9ff}
That is strange...I would have expected an "illegal instruction" (SIGILL) if the CPU simply did not support this, but this is SIGSEGV. Is the 'fxsave' struct really aligned properly? It looks like should be...but it is still giving SIGSEGV. Can you print the 'fxsave' address when it crashes?
(In reply to Patrick Baggett from comment #16) > That is strange...I would have expected an "illegal instruction" (SIGILL) if > the CPU simply did not support this, but this is SIGSEGV. Is the 'fxsave' > struct really aligned properly? It looks like should be...but it is still > giving SIGSEGV. Can you print the 'fxsave' address when it crashes? As said above, with 32bit gcc, you need to decorate the function with 'force_align_arg_pointer' On the Intel x86, the 'force_align_arg_pointer' attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the run-time stack if necessary. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. "New" x86_32 ABI mandates 16-byte stack alignment, and the above attribute forces realignment of the incoming stack pointer. Maybe there is a hand-crafted assembly code in the MESA source, libc or somewhere else (that conforms to "old" x86_32 ABI with 4-byte stack alignment) that misaligns the SP. Alternatively, you can use posix_memalign, aligned_alloc or similar function to allocate aligned memory.
Right, I thought Chris Paulson-Ellis tried your patch which includes this alignment, but it still didn't work due to alignment. I think we're both on the same page though.
Maybe using PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void) would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't working otherwise. Not sure though this works on inlined functions? (Rant mode on: I _really_ hate detection of daz flag on x86-32. Most complicated way to do simple feature detection ever. What the hell were intel thinking???) Otherwise if this still doesn't work could do as ubizjak suggested could use malloc'ed memory I guess. Checking for fxsave support should not be necessary, since we call this only when we have sse2 in the first place, which should be impossible without fxsave.
(In reply to Roland Scheidegger from comment #19) > Maybe using > PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void) > would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't > working otherwise. Not sure though this works on inlined functions? This seems to work. Without PIPE_ALIGN_STACK, fxarea has an 8-byte aligned address, but with PIPE_ALIGN_STACK it is 16-byte aligned and the SEGV goes away. If preparing a final patch, remember that I had to add -mfxsr to CFLAGS to get the patch using __builtin_ia32_fxsave to compile. Thanks, Chris.
(In reply to Chris Paulson-Ellis from comment #20) > (In reply to Roland Scheidegger from comment #19) > > Maybe using > > PIPE_ALIGN_STACK static INLINE boolean sse2_has_daz(void) > > would do the trick. I guess the PIPE_ALIGN_VAR on a stack'ed variable isn't > > working otherwise. Not sure though this works on inlined functions? > > This seems to work. Without PIPE_ALIGN_STACK, fxarea has an 8-byte aligned > address, but with PIPE_ALIGN_STACK it is 16-byte aligned and the SEGV goes > away. > > If preparing a final patch, remember that I had to add -mfxsr to CFLAGS to > get the patch using __builtin_ia32_fxsave to compile. > > Thanks, > Chris. I don't feel like messing with compile flags for now, so I'm not going to change anything not strictly required to fix this bug.
Fixed by b59c7ed0ab1ac5b6d9f8d409f1a90401ab7775b6.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.