So 32bit ut4.1 timedemo segfaults, while 64bit one works fine and also 32bit exe via wine works fine, bisected to:
Works fine before commit, issue present in current git too.
Hardware is Athlon 5350 APU which has sse4.1 supported.
Steps to reproduce:
0. download ut4.1 for example from here: http://www.moddb.com/downloads/start/9738/all?referer=http%3A%2F%2Fwww.moddb.com%2Fmods%2Furban-terror%2Fdownloads%2Furban-terror-41-full-installer-zip
2. open quake console and start demo with 'demo TUTORIAL' or just
click on Demos and start Tutorial demo... and:
Received signal 11, exiting...
smoki, please get a backtrace of the crash.
(In reply to Michel Dänzer from comment #1)
> smoki, please get a backtrace of the crash.
Well there it is:
Program received signal SIGSEGV, Segmentation fault.
_mesa_uint_array_min_max (ui_indices=0x89ca3e0 <tess>, min_index=0xbfff6500, max_index=0xbfff6504, count=1488) at main/sse_minmax.c:61
61 __m128i max_ui4 = _mm_setzero_si128();
#0 _mesa_uint_array_min_max (ui_indices=0x89ca3e0 <tess>, min_index=0xbfff6500, max_index=0xbfff6504, count=1488) at main/sse_minmax.c:61
#1 0xa5292fd9 in vbo_get_minmax_index (ctx=<error reading variable: Cannot access memory at address 0xbfff654c>,
ctx@entry=<error reading variable: Cannot access memory at address 0xbfff6548>, prim=<error reading variable: Cannot access memory at address 0xbfff6550>,
ib=<error reading variable: Cannot access memory at address 0xbfff6554>, min_index=<error reading variable: Cannot access memory at address 0xbfff6558>,
max_index=<error reading variable: Cannot access memory at address 0xbfff655c>, count=<error reading variable: Cannot access memory at address 0xbfff6560>)
It seems to be an alignment issue and I'm not sure I fully understand why its happening but does changing:
unsigned vec_count __attribute__ ((aligned (16)));
on line 60 fix it?
(In reply to Timothy Arceri from comment #3)
> It seems to be an alignment issue and I'm not sure I fully understand why
> its happening but does changing:
> unsigned vec_count;
> unsigned vec_count __attribute__ ((aligned (16)));
> on line 60 fix it?
No, that does fix it.
(In reply to smoki from comment #4)
> No, that does fix it.
Does *not* i mean of course.
> unsigned vec_count __attribute__ ((aligned (16)));
This won;'t work on 32-bits unless you use -mstackrealign or -mincoming-stack-boundary=2.
This is because some years ago gcc started to assume the stack alignment is 16-bytes. But old applications were still built with 4-byte stack aligment. And because Mesa drivers can't get to choose, they must assume the worst, ie. 4 bytes stack alignment.
BTW, wefore we start investing on custom SSE code paths, we really should be using -mfpmath=sse on Linux 32bits too. It's recommend by Intel and the default on MSVC.
This patch might help:
diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
index 222ac14..c3ec420 100644
@@ -31,6 +31,9 @@
_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
unsigned *max_index, const unsigned count)
See https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bforce_005falign_005farg_005fpointer_007d-attribute-3040 for more info.
But this is still not safe, because gcc might still use msse behind our backs. We really must modify configure.ac and Makefiles to use -mstackrealign -mincoming-stack-boundary=2 whenever we use -msse* on x86 platforms. (x86_64 is not necessary)
(In reply to José Fonseca from comment #7)
> This patch might help:
> diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
> index 222ac14..c3ec420 100644
> --- a/src/mesa/main/sse_minmax.c
> +++ b/src/mesa/main/sse_minmax.c
> @@ -31,6 +31,9 @@
> #include <stdint.h>
> +#if !defined(__x86_64__)
> _mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
> unsigned *max_index, const unsigned count)
If you need confirmation, yes that one works for me.
I pushed a quick fix on a1fc6a91e5c6ab098fa8576e63b3a070852aa2a7 .
Though I have to say I'm a bit disappointed that pointing out the problem/solution wasn't enough: I had to prepare the fix myself, and that my advice on the proper fix fell on deaf ears.
If this is the level of ownership one is to expect from these SSE optimizations, then it would be better to never get them committed in the first place. In fact, if this goes on like this, I'll propose to revert them. As I'm not convinced the meager performance improvements justify this sort of headaches.
Thanks for your fix and explanation of the problem. Your suggested solution did not fall on deaf ears, I still intend to implement a proper solution. I had a family member fall gravely ill over the four days between your suggestion and when you sent your patch to the mailing list and so didn't have much time for Mesa, thankfully they are now doing much better. I saw your patch was already on the mailing list when I finally had a bit of spare time to work on the real solution.
A fix is still required for the 10.4 branch as the SSE patch landed before the branch occurred. I'll try to get a change sent out tonight or maybe your patch should be pushed to the 10.4 branch for the time being.
Finally I questioned if the optimisation was worth doing when I first sent the patch but people seemed interested so I followed it up. Once we have a proper build solution in place this kind of issue should be greatly lessened so I dont think there is a need to purge everything. I have tried to provide a good level of ownership to all my Mesa changes and volunteer my personnal time to working on Mesa because I want to help make things better.
Created attachment 110506 [details] [review]
Realign via config/make
Are you able to give this patch a try against master?
(In reply to Timothy Arceri from comment #11)
> Created attachment 110506 [details] [review] [review]
> Realign via config/make
> Hi smoki,
> Are you able to give this patch a try against master?
Yep quick tried UT4.1 now, yeah that patch works too.
Sorry for jumping into wrong conclusions, and thanks for pursuing a more comprehensive fix. I appreciate it.
Fix is commited to mesa f1b5f2b157a092e93590bd43544fbf2671edab36