Summary: | Failing Test for 0.20.0 and 0.21.2 under Windows 7 (with MMX or SSE2 enabled) | ||
---|---|---|---|
Product: | pixman | Reporter: | Brent Fulgham <bfulgham> |
Component: | pixman | Assignee: | Søren Sandmann Pedersen <soren.sandmann> |
Status: | RESOLVED FIXED | QA Contact: | Søren Sandmann Pedersen <soren.sandmann> |
Severity: | major | ||
Priority: | medium | CC: | jmuizelaar, jon.turney, kibi, siarhei.siamashka |
Version: | other | ||
Hardware: | x86 (IA32) | ||
OS: | Windows (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Build log for Pixman
Win32 build file Assembly from Debug build Assembly from Release build 0001-Workaround-for-_mm_empty-related-miscompilation.patch |
Description
Brent Fulgham
2011-01-13 09:52:46 UTC
Created attachment 41978 [details]
Build log for Pixman
Build log showing some warnings that might be relevant.
The scaling-test also fails with a FLOAT_REGS_CORRUPTION_DETECTOR_FINISH when built with MMX and SSE2. scaling-test.c: "Assertion failed: frcd_canary_variable == frcd_volatile_constant1, file scaling-test.c, line 239" The test passes with MMX (but not SSE2). Created attachment 41979 [details]
Win32 build file
This is the Makefile I use to build the tests. It only includes those that build cleanly under Windows. A couple of tests could be easily added by handling the lack of snprintf in Windows.
Note: This bug seems to have been introduced in 0.19.4; the 0.19.2 release does not suffer from this issue. Does it help to pass tests if you reduce/disable optimizations in Visual Studio? Regarding the checksum error in blitters-test, you can try commenting out functions in 'mmx_fast_paths' array until the problem disappears. Regarding the assertion error, warnings from your log about missing EMMS instructions are indeed most likely related. I wonder about this particular one: "function 'fast_composite_scaled_nearest_sse2_8888_8888_none_OVER' has no EMMS instruction". This code is used in affine and scaling tests and is the most likely culprit. But 'scaled_nearest_scanline_sse2_8888_8888_OVER' function which gets inlined there actually has '_mm_empty ()' intrinsic at the end. And it should normally expand into EMMS instruction. Something is really fishy. I can't reproduce any of the problems in Linux with gcc, so we are going to need some more information. As Siarhei says, the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER function is a like culprit for the float corruption issue. It would be helpful if you can verify this by commenting out lines in this block: SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), in pixman-sse2.c until the float corruption goes away. Similarly, if you can determine which of the fast paths in the mmx_fast_paths table in pixman-mmx.c is causing blitters-test.c to fail, that would be helpful too. Finally, most of the warnings about "not inlined" we should probably fix by switching to force_inline. (In reply to comment #6) > I can't reproduce any of the problems in Linux with gcc, so we are going to > need some more information. > > As Siarhei says, the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER > function is a like culprit for the float corruption issue. Actually I remember a similar problem in Linux, which was almost certainly a bug in some specific version of gcc with some very specific optimization flags: http://bugs.gentoo.org/show_bug.cgi?id=347947 But it is interesting that Visual Studio may have hit the same problem. (In reply to comment #5) > Does it help to pass tests if you reduce/disable optimizations in Visual > Studio? Since the build is from the Makefile.win32, you can see that very few optimization flags are active (e.g., things like link-time code generation, automatic generation of vectorized code, etc.) So, I simply changed from the default -O2 to -O1, which did not stop the assertion from firing. The various compiler options in Visual Studio have changed in the last couple of years, so just for grins I tried turning on all the compiler optimizations: -Ox -Ob2 -Oi -Ot -Oy -GT -GL /arch:SSE2 Perhaps unsurprisingly this did not change anything. (In reply to comment #6) > It would be helpful if you can verify this by commenting out lines in this > block: > > SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_COVER (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), These lines can be included, either singly or in the entire block of four. > SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), If any of the first three of these lines are included, the assertion fires. If the last line is included (but not the last line of the SIMPLE_NEAREST_FAST_PATH_PAD), the assertion is NOT fired. > SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, x8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, x8b8g8r8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8r8g8b8, a8r8g8b8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), If any of the first three of these lines are included, the assertion fires. If the last line is included, the assertion does NOT fire -- unless the last line of SIMPLE_NEAREST_FAST_PATH_NONE is included, in which case it DOES fire. So you can choose from either the following: > SIMPLE_NEAREST_FAST_PATH_NONE (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), > SIMPLE_NEAREST_FAST_PATH_PAD (OVER, a8b8g8r8, a8b8g8r8, sse2_8888_8888), > Similarly, if you can determine which of the fast paths in the mmx_fast_paths > table in pixman-mmx.c is causing blitters-test.c to fail, that would be helpful > too. I'll try to narrow it down. (In reply to comment #5) > Regarding the checksum error in blitters-test, you can try commenting out > functions in 'mmx_fast_paths' array until the problem disappears. I commented out the entire array (except for the PIXMAN_OP_NONE line), and continued to get the error: $ ./release/blitters-test.exe blitters test failed! (checksum=CA054E72, expected 1DB8BDF8) Can you attach the generated assembly for the fast_composite_scaled_nearest_sse2_8888_8888_none_OVER() function? And just to be clear, if you disable both MMX and SSE2, then blitters-test passes? If so, you might try disabling all the combiners too: imp->combine_32[PIXMAN_OP_OVER] = mmx_combine_over_u; imp->combine_32[PIXMAN_OP_OVER_REVERSE] = mmx_combine_over_reverse_u; imp->combine_32[PIXMAN_OP_IN] = mmx_combine_in_u; ... in pixman-mmx.c Could you check if the patches from the following branch resolve assert problems in Visual Studio? http://cgit.freedesktop.org/~siamashka/pixman/log/?h=builtin_constant_p The whole point of having 'force_inline' scanline processing function was that in the case of processing padding pixels, 'vx' and 'unit_x' arguments are known at compile time and set to 0. Also for NONE repeat and OVER operator, the whole use of inlined functions for padding pixels can be optimized out because the source pixels are also zero. This is what is happening for C fast path implementation. Or at least this was happening before commit http://cgit.freedesktop.org/pixman/commit/?id=55bbccf84e475b2e3c4536606cd08c946c041fd0 (unfortunately it appears to hinder this optimization). Adding const modifiers for source scanline buffers solves the problem: http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=builtin_constant_p&id=a317580c0e46757a682564a7e31f4feb9a38c734 But for SSE2 code the compiler is not so clever, so we have both excessive inlining of redundant code and have higher chance of triggering bugs in the compilers. That branch converts SSE2 scanline processing into a normal non-inlined function. It's interesting that pixman MMX/SSE2 code works fine for GCC, Intel C and Clang. And fails for Solaris Studio and Microsoft Visual Studio. I wonder if the problem is in these compilers or pixman just tries to use some "unspecified implementation defined" behaviour of the intrinsics. It's interesting that even though the assertion fires, if you don't halt on assert the checksum is correct. Created attachment 42143 [details]
Assembly from Debug build
Created attachment 42144 [details]
Assembly from Release build
(In reply to comment #11) > Can you attach the generated assembly for the > fast_composite_scaled_nearest_sse2_8888_8888_none_OVER() function? > I've attached the assembly (and source) output for both release and debug modes. The last lines from the release assembly: $LN539@fast_compo@2: 00f94 0f 77 emms 00f96 0f 6f 0d 00 00 00 00 movq mm1, MMWORD PTR _mask_x0080 00f9d 0f 6f 05 00 00 00 00 movq mm0, MMWORD PTR _mask_x0101 ^^^^ Here we do MMX stuff after an EMMS $LN1255@fast_compo@2: 00fa4 83 6d 38 01 sub DWORD PTR _height$[ebp], 1 00fa8 0f 89 62 f2 ff ff jns $LL1271@fast_compo@2 $LN14@fast_compo@2: 00fae 8b 8c 24 ac 00 00 00 mov ecx, DWORD PTR __$ArrayPad$[esp+176] 00fb5 5f pop edi 00fb6 5e pop esi 00fb7 5b pop ebx 00fb8 33 cc xor ecx, esp 00fba e8 00 00 00 00 call @__security_check_cookie@4 00fbf 8b e5 mov esp, ebp 00fc1 5d pop ebp 00fc2 c3 ret 0 ^^^^ And we can fall through to the return. _fast_composite_scaled_nearest_sse2_8888_8888_none_OVER ENDP _TEXT ENDS So an EMMS is definitely being misplaced by someone in this function. Possibly Visual Studio. Also adding Jeff Muizelaar to CC because I think that Mozilla might be interested in ensuring that pixman works properly with Visual Studio. Regarding the things that are common between Visual Studio and Solaris Studio other than having 'Studio' in their names ;-) There are some #ifdefs in MMX/SSE2 code specifically for these compilers. While the rest of the bunch are reasonably gcc compatible (Clang, Intel C). Hi, not sure it's relevant to this bug, but a regression looks similar on a Cygwin Tinderbox, while updating master: 8d359b0..8e41002 PASS: blitters-test.exe assertion "frcd_canary_variable1 == frcd_volatile_constant1" failed: file "/opt/jhbuild/git/xorg/lib/pixman/test/scaling-test.c", line 357, function: test_composite /bin/sh: line 5: 4332 Aborted (core dumped) ${dir}$tst FAIL: scaling-test.exe affine test passed (checksum=4B5D1852) Build log: http://tinderbox.x.org/builds/2011-02-15-0008/logs/pixman/#check KiBi. Thanks for the report. This surely looks related (and we need to do something about this problem). I have reported a bug to gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47759 Created attachment 43408 [details] [review] 0001-Workaround-for-_mm_empty-related-miscompilation.patch If you have access to this Cygwin machine, could you try the attached patch? > If you have access to this Cygwin machine, could you try the attached patch?
I don't, but I've just forwarded your request to its maintainer.
KiBi.
Can we simply add a function emms() that does mm_empty() and then hope that the compilers don't try to reschedule a function call? (In reply to comment #22) > Created an attachment (id=43408) [details] > 0001-Workaround-for-_mm_empty-related-miscompilation.patch > > If you have access to this Cygwin machine, could you try the attached patch? This seems to allow 'make check' to succeed under cygwin. On the tinderbox: $ gcc --version gcc (GCC) 4.3.4 20090804 (release) 1 (In reply to comment #24) > Can we simply add a function emms() that does mm_empty() and then hope that the > compilers don't try to reschedule a function call? Yes, I think we can. But I guess Microsoft Visual Studio will still issue warnings about missing EMMS if it will be moved into a separate function. And it's quite common usage pattern to have some function full of intrinsics and mm_empty() at the end of it, so this should be much less likely to be miscompiled. We went into uncharted waters when tried to inline such function into something else and this happened to be quite a challenge for instruction schedulers in different compilers. (In reply to comment #25) > (In reply to comment #22) > > Created an attachment (id=43408) [details] [details] > > 0001-Workaround-for-_mm_empty-related-miscompilation.patch > > > > If you have access to this Cygwin machine, could you try the attached patch? > > This seems to allow 'make check' to succeed under cygwin. OK, thanks for confirming. Sent this patch to the mailing list too: http://lists.freedesktop.org/archives/pixman/2011-February/001022.html This should now be fixed in master. I will release 0.21.6 later today with the fixes. Feel free to reopen if there is still a problem. I can confirm that I no longer get the assertion in the affine-test.c run (with either SSE2=yes,MMX=yes or SSE2=no,MMX=yes). scaling-test.c, composite-traps-test.c, and blitters-test.c return 127. The other tests return 0. I'm not sure if the 127 return codes are expected from these three tests. No error message output is shown on the console. If you look at fuzzer_test_main(), I don't see any way these programs could return anything other than 0 or 1. How are you determining what the exit code is? |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.