Bug 33069

Summary:	Failing Test for 0.20.0 and 0.21.2 under Windows 7 (with MMX or SSE2 enabled)
Product:	pixman	Reporter:	Brent Fulgham <bfulgham>
Component:	pixman	Assignee:	Søren Sandmann Pedersen <soren.sandmann>
Status:	RESOLVED FIXED	QA Contact:	Søren Sandmann Pedersen <soren.sandmann>
Severity:	major
Priority:	medium	CC:	jmuizelaar, jon.turney, kibi, siarhei.siamashka
Version:	other
Hardware:	x86 (IA32)
OS:	Windows (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	Build log for Pixman Win32 build file Assembly from Debug build Assembly from Release build 0001-Workaround-for-_mm_empty-related-miscompilation.patch

Description Brent Fulgham 2011-01-13 09:52:46 UTC

I built Pixman 0.20.2 from the source tarball on the website using Visual Studio 2008 with the stock Makefile.win32.  This completed successfully.

I then built all tests that did not require GTK.  Of those tests, one failed with the FLOAT_REGS_CORRUPTION_DETECTOR_FINISH assertion when MMX and SSE2 were enabled:

affine-test.c
"Assertion failed: frcd_canary_variable1 == frcd_volatile_constant1, file affine-test.c, line 250"

When built with MMX (but not SSE2) the FLOAT_REGS_CORRUPTION_DETECTOR_FINISH assertion did not fire, but the "blitters-test" failed with a checksum error:

blitters test failed! (checksum=CA054E72, expected 1DB8BDF8)

Comment 1 Brent Fulgham 2011-01-13 09:54:58 UTC

Created attachment 41978 [details]
Build log for Pixman

Build log showing some warnings that might be relevant.

Comment 2 Brent Fulgham 2011-01-13 10:08:22 UTC

The scaling-test also fails with a FLOAT_REGS_CORRUPTION_DETECTOR_FINISH when built with MMX and SSE2.

scaling-test.c:
"Assertion failed: frcd_canary_variable == frcd_volatile_constant1, file scaling-test.c, line 239"

The test passes with MMX (but not SSE2).

Comment 3 Brent Fulgham 2011-01-13 10:10:28 UTC

Created attachment 41979 [details]
Win32 build file

This is the Makefile I use to build the tests.  It only includes those that build cleanly under Windows.  A couple of tests could be easily added by handling the lack of snprintf in Windows.

Comment 4 Brent Fulgham 2011-01-13 10:50:18 UTC

Note: This bug seems to have been introduced in 0.19.4; the 0.19.2 release does not suffer from this issue.

Comment 5 Siarhei Siamashka 2011-01-14 07:00:08 UTC

Does it help to pass tests if you reduce/disable optimizations in Visual Studio?

Regarding the checksum error in blitters-test, you can try commenting out functions in 'mmx_fast_paths' array until the problem disappears.

Regarding the assertion error, warnings from your log about missing EMMS instructions are indeed most likely related. I wonder about this particular one: "function 'fast_composite_scaled_nearest_sse2_8888_8888_none_OVER' has no EMMS instruction". This code is used in affine and scaling tests and is the most likely culprit. But 'scaled_nearest_scanline_sse2_8888_8888_OVER' function which gets inlined there actually has '_mm_empty ()' intrinsic at the end. And it should normally expand into EMMS instruction. Something is really fishy.

Comment 6 Søren Sandmann Pedersen 2011-01-15 06:51:21 UTC

I can't reproduce any of the problems in Linux with gcc, so we are going to need some more information.

As Siarhei says, the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER function is a like culprit for the float corruption issue. 

It would be helpful if you can verify this by commenting out lines in this block:

    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
    SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

in pixman-sse2.c until the float corruption goes away.

Similarly, if you can determine which of the fast paths in the mmx_fast_paths table in pixman-mmx.c is causing blitters-test.c to fail, that would be helpful too.

Finally, most of the warnings about "not inlined" we should probably fix by switching to force_inline.

Comment 7 Siarhei Siamashka 2011-01-15 15:58:35 UTC

(In reply to comment #6)
> I can't reproduce any of the problems in Linux with gcc, so we are going to
> need some more information.
> 
> As Siarhei says, the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER
> function is a like culprit for the float corruption issue. 

Actually I remember a similar problem in Linux, which was almost certainly a bug in some specific version of gcc with some very specific optimization flags: http://bugs.gentoo.org/show_bug.cgi?id=347947

But it is interesting that Visual Studio may have hit the same problem.

Comment 8 Brent Fulgham 2011-01-15 21:37:42 UTC

(In reply to comment #5)
> Does it help to pass tests if you reduce/disable optimizations in Visual
> Studio?

Since the build is from the Makefile.win32, you can see that very few optimization flags are active (e.g., things like link-time code generation, automatic generation of vectorized code, etc.)  So, I simply changed from the default -O2 to -O1, which did not stop the assertion from firing.

The various compiler options in Visual Studio have changed in the last couple of years, so just for grins I tried turning on all the compiler optimizations:

-Ox -Ob2 -Oi -Ot -Oy -GT -GL /arch:SSE2

Perhaps unsurprisingly this did not change anything.

Comment 9 Brent Fulgham 2011-01-15 22:01:32 UTC

(In reply to comment #6)
> It would be helpful if you can verify this by commenting out lines in this
> block:
> 
>     SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

These lines can be included, either singly or in the entire block of four.

>     SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

If any of the first three of these lines are included, the assertion fires.  If the last line is included (but not the last line of the SIMPLE_NEAREST_FAST_PATH_PAD), the assertion is NOT fired.

>     SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

If any of the first three of these lines are included, the assertion fires.  If the last line is included, the assertion does NOT fire -- unless the last line of SIMPLE_NEAREST_FAST_PATH_NONE is included, in which case it DOES fire.

So you can choose from either the following:
>     SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),
>     SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888),

> Similarly, if you can determine which of the fast paths in the mmx_fast_paths
> table in pixman-mmx.c is causing blitters-test.c to fail, that would be helpful
> too.

I'll try to narrow it down.

Comment 10 Brent Fulgham 2011-01-15 22:47:12 UTC

(In reply to comment #5)
> Regarding the checksum error in blitters-test, you can try commenting out
> functions in 'mmx_fast_paths' array until the problem disappears.

I commented out the entire array (except for the PIXMAN_OP_NONE line), and continued to get the error:

$ ./release/blitters-test.exe
blitters test failed! (checksum=CA054E72, expected 1DB8BDF8)

Comment 11 Søren Sandmann Pedersen 2011-01-16 06:48:32 UTC

Can you attach the generated assembly for the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER() function?

And just to be clear, if you disable both MMX and SSE2, then blitters-test passes? If so, you might try disabling all the combiners too:

    imp->combine_32[PIXMAN_OP_OVER] = mmx_combine_over_u;
    imp->combine_32[PIXMAN_OP_OVER_REVERSE] = mmx_combine_over_reverse_u;
    imp->combine_32[PIXMAN_OP_IN] = mmx_combine_in_u;
    ...

in pixman-mmx.c

Comment 12 Siarhei Siamashka 2011-01-16 17:06:21 UTC

Could you check if the patches from the following branch resolve assert problems in Visual Studio? http://cgit.freedesktop.org/~siamashka/pixman/log/?h=builtin_constant_p

The whole point of having 'force_inline' scanline processing function was that in the case of processing padding pixels, 'vx' and 'unit_x' arguments are known at compile time and set to 0. Also for NONE repeat and OVER operator, the whole use of inlined functions for padding pixels can be optimized out because the source pixels are also zero. This is what is happening for C fast path implementation. Or at least this was happening before commit http://cgit.freedesktop.org/pixman/commit/?id=55bbccf84e475b2e3c4536606cd08c946c041fd0 (unfortunately it appears to hinder this optimization). Adding const modifiers for source scanline buffers solves the problem: http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=builtin_constant_p&id=a317580c0e46757a682564a7e31f4feb9a38c734

But for SSE2 code the compiler is not so clever, so we have both excessive inlining of redundant code and have higher chance of triggering bugs in the compilers. That branch converts SSE2 scanline processing into a normal non-inlined function.

Comment 13 Siarhei Siamashka 2011-01-17 07:44:16 UTC

It's interesting that pixman MMX/SSE2 code works fine for GCC, Intel C and Clang. And fails for Solaris Studio and Microsoft Visual Studio. I wonder if the problem is in these compilers or pixman just tries to use some "unspecified implementation defined" behaviour of the intrinsics.

Comment 14 Brent Fulgham 2011-01-17 17:15:24 UTC

It's interesting that even though the assertion fires, if you don't halt on assert the checksum is correct.

Comment 15 Brent Fulgham 2011-01-17 21:50:53 UTC

Created attachment 42143 [details]
Assembly from Debug build

Comment 16 Brent Fulgham 2011-01-17 22:14:07 UTC

Created attachment 42144 [details]
Assembly from Release build

Comment 17 Brent Fulgham 2011-01-17 22:14:55 UTC

(In reply to comment #11)
> Can you attach the generated assembly for the
> fast_composite_scaled_nearest_sse2_8888_8888_none_OVER() function?
> 

I've attached the assembly (and source) output for both release and debug modes.

Comment 18 Søren Sandmann Pedersen 2011-01-18 12:16:12 UTC

The last lines from the release assembly:

$LN539@fast_compo@2:
  00f94	0f 77		 emms
  00f96	0f 6f 0d 00 00
	00 00		 movq	 mm1, MMWORD PTR _mask_x0080
  00f9d	0f 6f 05 00 00
	00 00		 movq	 mm0, MMWORD PTR _mask_x0101

                         ^^^^  Here we do MMX stuff after an EMMS

$LN1255@fast_compo@2:
  00fa4	83 6d 38 01	 sub	 DWORD PTR _height$[ebp], 1
  00fa8	0f 89 62 f2 ff
	ff		 jns	 $LL1271@fast_compo@2
$LN14@fast_compo@2:
  00fae	8b 8c 24 ac 00
	00 00		 mov	 ecx, DWORD PTR __$ArrayPad$[esp+176]
  00fb5	5f		 pop	 edi
  00fb6	5e		 pop	 esi
  00fb7	5b		 pop	 ebx
  00fb8	33 cc		 xor	 ecx, esp
  00fba	e8 00 00 00 00	 call	 @__security_check_cookie@4
  00fbf	8b e5		 mov	 esp, ebp
  00fc1	5d		 pop	 ebp
  00fc2	c3		 ret	 0

                         ^^^^ And we can fall through to the return.

_fast_composite_scaled_nearest_sse2_8888_8888_none_OVER ENDP
_TEXT	ENDS


So an EMMS is definitely being misplaced by someone in this function. Possibly Visual Studio.

Comment 19 Siarhei Siamashka 2011-01-19 10:12:49 UTC

Also adding Jeff Muizelaar to CC because I think that Mozilla might be interested in ensuring that pixman works properly with Visual Studio.

Regarding the things that are common between Visual Studio and Solaris Studio other than having 'Studio' in their names ;-) There are some #ifdefs in MMX/SSE2 code specifically for these compilers. While the rest of the bunch are reasonably gcc compatible (Clang, Intel C).

Comment 20 Cyril Brulebois 2011-02-15 12:17:30 UTC

Hi,

not sure it's relevant to this bug, but a regression looks similar on a Cygwin Tinderbox, while updating master: 8d359b0..8e41002

PASS: blitters-test.exe
assertion "frcd_canary_variable1 == frcd_volatile_constant1" failed: file "/opt/jhbuild/git/xorg/lib/pixman/test/scaling-test.c", line 357, function: test_composite
/bin/sh: line 5:  4332 Aborted                 (core dumped) ${dir}$tst
FAIL: scaling-test.exe
affine test passed (checksum=4B5D1852)

Build log:
  http://tinderbox.x.org/builds/2011-02-15-0008/logs/pixman/#check

KiBi.

Comment 21 Siarhei Siamashka 2011-02-15 15:39:17 UTC

Thanks for the report. This surely looks related (and we need to do something about this problem). I have reported a bug to gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47759

Comment 22 Siarhei Siamashka 2011-02-15 15:40:51 UTC

Created attachment 43408 [details] [review]
0001-Workaround-for-_mm_empty-related-miscompilation.patch

If you have access to this Cygwin machine, could you try the attached patch?

Comment 23 Cyril Brulebois 2011-02-15 15:47:01 UTC

> If you have access to this Cygwin machine, could you try the attached patch?

I don't, but I've just forwarded your request to its maintainer.

KiBi.

Comment 24 Søren Sandmann Pedersen 2011-02-16 03:44:46 UTC

Can we simply add a function emms() that does mm_empty() and then hope that the compilers don't try to reschedule a function call?

Comment 25 Jon Turney 2011-02-16 05:57:41 UTC

(In reply to comment #22)
> Created an attachment (id=43408) [details]
> 0001-Workaround-for-_mm_empty-related-miscompilation.patch
> 
> If you have access to this Cygwin machine, could you try the attached patch?

This seems to allow 'make check' to succeed under cygwin.

On the tinderbox:
$ gcc --version
gcc (GCC) 4.3.4 20090804 (release) 1

Comment 26 Siarhei Siamashka 2011-02-16 07:52:26 UTC

(In reply to comment #24)
> Can we simply add a function emms() that does mm_empty() and then hope that the
> compilers don't try to reschedule a function call?

Yes, I think we can. But I guess Microsoft Visual Studio will still issue warnings about missing EMMS if it will be moved into a separate function.

And it's quite common usage pattern to have some function full of intrinsics and mm_empty() at the end of it, so this should be much less likely to be miscompiled. We went into uncharted waters when tried to inline such function into something else and this happened to be quite a challenge for instruction schedulers in different compilers.

Comment 27 Siarhei Siamashka 2011-02-16 09:37:30 UTC

(In reply to comment #25)
> (In reply to comment #22)
> > Created an attachment (id=43408) [details] [details]
> > 0001-Workaround-for-_mm_empty-related-miscompilation.patch
> > 
> > If you have access to this Cygwin machine, could you try the attached patch?
> 
> This seems to allow 'make check' to succeed under cygwin.

OK, thanks for confirming. Sent this patch to the mailing list too: http://lists.freedesktop.org/archives/pixman/2011-February/001022.html

Comment 28 Søren Sandmann Pedersen 2011-02-22 03:15:08 UTC

This should now be fixed in master. I will release 0.21.6 later today with the fixes. Feel free to reopen if there is still a problem.

Comment 29 Brent Fulgham 2011-02-23 09:38:33 UTC

I can confirm that I no longer get the assertion in the affine-test.c run (with either SSE2=yes,MMX=yes or SSE2=no,MMX=yes).

scaling-test.c, composite-traps-test.c, and blitters-test.c return 127.  The other tests return 0.

I'm not sure if the 127 return codes are expected from these three tests.  No error message output is shown on the console.

Comment 30 Søren Sandmann Pedersen 2011-02-24 09:51:45 UTC

If you look at fuzzer_test_main(), I don't see any way these programs could return anything other than 0 or 1. How are you determining what the exit code is?

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.