101033 – Testsuite fails on i386

Bug 101033 - Testsuite fails on i386

Summary: Testsuite fails on i386

Status:	RESOLVED FIXED

Alias:	None

Product:	uchardet
Classification:	Unclassified
Component:	general (show other bugs)
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Jehan
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2017-05-13 16:07 UTC by James Cowgill
Modified:	2018-03-16 19:12 UTC (History)
CC List:	3 users (show)

See Also:
i915 platform:
i915 features:

Attachments
0001-nsSBCSGroupProber.cpp-accound-to-FPU-imprecision.patch (1.71 KB, patch) 2017-11-05 23:05 UTC, Sergei Trofimovich	Details \| Splinter Review
0001-build-turn-TARGET_ARCHITECTURE-into-option.patch (1.34 KB, patch) 2018-01-21 14:18 UTC, Coacher	Details \| Splinter Review
View All

Description James Cowgill 2017-05-13 16:07:46 UTC

The uchardet testsuite fails when run on i386 (although strangely not on other 32-bit architectures).

Full list of Debian architectures:
https://buildd.debian.org/status/logs.php?pkg=uchardet&ver=0.0.6-1&suite=experimental

In the above logs the th:tis-620 test fails. On the master branch these tests fail:
> The following tests FAILED:
> 	 29 - fi:iso-8859-1 (Failed)
> 	 37 - ga:iso-8859-1 (Failed)
> 	106 - th:tis-620 (Failed)
The reported (incorrect) encodings are:
> $ src/tools/uchardet ../test/fi/iso-8859-1.txt 
> ISO-8859-15
> $ src/tools/uchardet ../test/ga/iso-8859-1.txt 
> WINDOWS-1252
> $ src/tools/uchardet ../test/th/tis-620.txt 
> ISO-8859-11
The fact that it only affects one architecture makes me think that undefined behavior is being invoked somewhere, but neither ASan or UBSan pick up anything specific to these test cases.

Comment 1 Jehan 2017-05-14 11:50:00 UTC

That's weird.

Thanks for reporting. I will have to test an i386 build. If ever you find more information or even have a patch, do not hesitate. :)

Comment 2 James Cowgill 2017-05-29 20:36:37 UTC

It seems this is an x86 FPU accuracy issue. If I compile uchardet with "-msse2 -mfpmath=sse" so that all FPU math uses SSE instead of the legacy x86 FPU unit, then all the tests pass.

Unfortunately this isn't the greatest solution for Debian because packages are supposed to work on really old CPUs without SSE.

Comment 3 Jehan 2017-05-30 10:54:29 UTC

Interesting. Thanks.

I came up a few months ago on a similar issue, which is that gcc usually sets SSE flags by default (so you don't need to bother usually), yet I came on a case on a small 32-bit notebook where it didn't happen and I had to make sure that the flags are set (through the build system): https://git.gnome.org/browse/gimp/commit/?id=e957347dd693d8e2a96e5ed194a4136d50fae976

This is likely similar even though a little more subtle and hidden since no intrinsics functions were used (so it still builds fine; simply accuracy is lessened).

I guess, what we should do here is:
- Add a cmake test for SSE capability of the compiler.
- Add the -msse2 -mfpmath=sse flags explicitly when available.
- Just warn at end of configuration when the flags are not available.

This won't block the build on non-SSE compatible processors, but would simply warn. This way, it would still work on older machines, yet with potential accuracy issues.

Patches welcome if you want to add such CMake test. :-)

Comment 4 Jehan 2017-05-30 11:04:23 UTC

Oh but I guess your last message meant that you'd want to build the Debian package for 32-bit without SSE even if the build machine has SSE. So that won't suit you, won't it?

Well if we manage to find out which calls result in SSE instructions and how to replace them without deteriorating uchardet otherwise and make uchardet fully deterministic again whatever the CPU it runs on, I'm all good. That's an alternative, though a much more complicated one.

Patches are also welcome in this direction.

Comment 5 James Cowgill 2017-05-30 11:45:02 UTC

> I came up a few months ago on a similar issue, which is that gcc usually sets
> SSE flags by default
This will depend on your distribution. I know Debian does not enable SSE by default, but I think recent versions of Fedora do (for example).

> Well if we manage to find out which calls result in SSE instructions and how to
> replace them without deteriorating uchardet otherwise and make uchardet fully
> deterministic again whatever the CPU it runs on, I'm all good. That's an
> alternative, though a much more complicated one.
Pretty much all floating point code is going to be affected by this. I'm not sure its worth "fixing" uchardet to workaround i386's strange FPU stuff.

I found the -ffloat-store option which fixes the testsuite on i386 by trying to keep floating point variables on the stack instead of in registers. I could enable this in Debian. The only downside is the performance cost, but I don't think uchardet is too floating point heavy?

Comment 6 Jehan 2017-05-30 12:35:21 UTC

(In reply to James Cowgill from comment #5)
> > I came up a few months ago on a similar issue, which is that gcc usually sets
> > SSE flags by default
> This will depend on your distribution. I know Debian does not enable SSE by
> default, but I think recent versions of Fedora do (for example).

I see. Indeed I use Fedora and the SSE flags are enabled by default. I think the small notebook was on a lightweight derivative of Debian indeed. This may have been the issue. At the time I assumed it was more about being a 32-bit gcc.

> > Well if we manage to find out which calls result in SSE instructions and how to
> > replace them without deteriorating uchardet otherwise and make uchardet fully
> > deterministic again whatever the CPU it runs on, I'm all good. That's an
> > alternative, though a much more complicated one.
> Pretty much all floating point code is going to be affected by this. I'm not
> sure its worth "fixing" uchardet to workaround i386's strange FPU stuff.
> 
> I found the -ffloat-store option which fixes the testsuite on i386 by trying
> to keep floating point variables on the stack instead of in registers. I
> could enable this in Debian. The only downside is the performance cost, but
> I don't think uchardet is too floating point heavy?

Just like this, I don't see much floating point computation. The multi-byte probers use state machines. The single byte prober uses frequency count of characters (and sequence of characters). There is barely any floating point computation there.
When you run `make test`, it tells you how long it took. Does it look like it takes significantly slower? (well that's definitely not a benchmark, but at least we could see if there is obvious difference)

This said, I'm not sure I am all that fond of the idea of potentially decreasing performance  on all machines for the sake of old machines either. One of the point of uchardet is still to be pretty fast and instant on most data (well on reasonably sized-data for personal use at least).

Funnily reading man gcc, I discover about -ffloat-store that it is for machine where register floating point have more precision than the standard (x86 is indeed cited as one of them). So the problem was that it had too much precision, not the opposite. Funny. Apparently it looks like -fexcess-precision=standard could work too.

Anyway maybe if we could have some build test so that the build option is only set when needed, this would be ideal. Maybe test SSE and only enable a floating precision build flag when SSE is not available. `man gcc` indeed confirms that the issue disappears when SSE is set:

> On the x86, it also has no effect if -mfpmath=sse […] IEEE semantics apply without excess precision

Obviously there should also be a configure option for you to deactivate SSE even if the build machine has it (so that you can disable SSE for the x86 Debian packages).

Comment 7 Jehan 2017-05-30 12:43:29 UTC

I just noticed that line in the man about -fexcess-precision=standard:

> This option is enabled by default for C if a strict conformance option such as -std=c99 is used.

I wonder if that applies to C++ too, because just 2 days ago, I added a conformance to C++11. I didn't notice any noticeable performance issue.
If this note applies to C++ conformance as well, maybe the option is already set now. Could you try to build git master without adding any flag on your own and see if all the tests pass now?

Comment 8 James Cowgill 2017-06-01 10:37:28 UTC

> When you run `make test`, it tells you how long it took. Does it look like it
> takes significantly slower? (well that's definitely not a benchmark, but at
> least we could see if there is obvious difference)
There doesn't seem to be any noticable difference in the speed of the testsuite with -ffloat-store.

> Apparently it looks like -fexcess-precision=standard could work too.
Unfortunately it looks like -fexcess-precision=standard doesn't work with C++. If I use it I get "cc1plus: sorry, unimplemented: -fexcess-precision=standard for C++"

> This said, I'm not sure I am all that fond of the idea of potentially
> decreasing performance  on all machines for the sake of old machines either.
> One of the point of uchardet is still to be pretty fast and instant on most
> data (well on reasonably sized-data for personal use at least).
Well 99% of users are going to have SSE2 so it probably makes sense to do the compiler detection and enable -msse2 -mfpmath=sse by default, as long as there's a way for Debian to disable that. For Debian, one option to keep performance is to compile uchardet twice (one with float-store, one with sse) and use glibc's HWCAP support which allows it to load the "best" library at runtime. However, if uchardet doesn't do a lot of floating point math anyway, I'm not sure it's worth it.

Comment 9 Jehan 2017-09-20 21:18:31 UTC

I have tested today on a i686 (an old eeepc) and `make test` worked fine. I still tested to add to add `-ffloat-store` to see if there is any performance cost, and I could see the test suite takes 15 to 25% longer on this slow machine (that was easier to see than on my fast dev laptop; the impact was quite obvious here).

So I am not really happy to make such a change.
Ideally we would only add such flag when there is no SSE support to make uchardet consistent on all machines while not impacting performance when unneeded.

Comment 10 Coacher 2017-10-08 19:06:05 UTC

Similarly fails on i686: https://bugs.gentoo.org/631852
This one is reproducible in i686 chroot on amd64 machine.

Comment 11 Coacher 2017-10-08 19:32:35 UTC

Also test failures from comment#0 are only seen here with uchardet built from master and not with 0.0.6.

Comment 12 Coacher 2017-10-08 19:41:29 UTC

Erm, exact test failures from comment#0 are only seen with git master, but th:tis-620 detection is broken with 0.0.6 too.

Comment 13 Jehan 2017-10-08 19:51:34 UTC

Just to make sure this is the same issue of floating point precision, have you tested one of the solutions above to see if it fixes the tests?

That is, either force building with SSE2 if your processor supports it: "-msse2 -mfpmath=sse"

Alternatively build with "-ffloat-store".

Either of these should fix the tests. Can you confirm?

Comment 14 Coacher 2017-10-08 20:07:54 UTC

(In reply to Jehan from comment #13)
> Just to make sure this is the same issue of floating point precision, have
> you tested one of the solutions above to see if it fixes the tests?
> 
> That is, either force building with SSE2 if your processor supports it:
> "-msse2 -mfpmath=sse"
> 
> Alternatively build with "-ffloat-store".
> 
> Either of these should fix the tests. Can you confirm?
Both of these workarounds make tests pass successfully.

Comment 15 Jehan 2017-10-08 20:24:52 UTC

(In reply to Coacher from comment #14)
> (In reply to Jehan from comment #13)
> > Either of these should fix the tests. Can you confirm?
> Both of these workarounds make tests pass successfully.

Ok well that's indeed the same issue then.
As said earlier though, I don't want to set "-ffloat-store" all the time though, because it decrease performance. On the other hand, we can't force SSE2 because the original poster was saying this library needs to be built on very old CPUs without SSE too.

So the solution would be to have a configure test for SSE and force SSE flags when it is available, and only fallback to ffloat-store on last resort.

I would make such a change at some point when I can make the time. But if anyone wishes to do it first and propose a patch, you would be very welcome to do so and I would push your commit. ;-)

Comment 16 Coacher 2017-10-23 21:10:41 UTC

Please check out discussion at https://github.com/gentoo/gentoo/pull/5993

Comment 17 Sergei Trofimovich 2017-11-05 23:05:02 UTC

Created attachment 135251 [details] [review]
0001-nsSBCSGroupProber.cpp-accound-to-FPU-imprecision.patch

The attached patch roughly does:

  @@ -301,7 +302,7 @@ float nsSBCSGroupProber::GetConfidence(void)
         if (!mIsActive[i])
           continue;
         cf = mProbers[i]->GetConfidence();
  -      if (bestConf < cf)
  +      if (cf - bestConf > FLT_EPSILON)
         {
           bestConf = cf;
           mBestGuess = i;

to account for FPU instability.

Comment 18 Jehan 2017-11-05 23:52:55 UTC

>   +      if (cf - bestConf > FLT_EPSILON)

This does not feel right.
What you are telling here is that "cf" beats bestconf only if it is higher than bestconf + FLT_ESPILON. But no, it is higher if it is higher and that's all. You would usually use an epsilon value when you want to test for an equality so that to consider that 2 double values are basically the same when they are close enough. But to compare one as higher as the other, it does not really make sense to use an epsilon IMO (at least not for this specific issue).

Also you can see it is not good with this simple counterexample: let's assume that we have encoding1 and encoding2 and our text returns 5.6 for encoding1 and 5.65 for encoding2; and epsilon is 0.1 in our example (of course, actual epsilons should be much lower, but for the sake of the example…). If we test against encoding1 first, then encoding2, the proposed algorithm will return encoding1 (since 5.65 - 5.6 < 0.1). BUT if we test encoding2 first, then encoding1, the same algorithm returns encoding2!
Basically it would return a different result depending on the test order. This is really not a good idea. :-/

Here the issue is (as far as we could understand) just that different accuracies are used on different architectures (in particular without SSE), and this messes up the computation reproducibility when confidences are very close or equal.
So we need to keep known accuracies on these non-SSE machines. And for this, let's just add the proper flags at compilation time when we detect such a machine.

Comment 19 Jehan 2017-11-06 01:01:49 UTC

Ok so this seems to affect too many people, I just made some time to have this fixed. The CMake configuration step will now set SSE2 flags when the options are available, or force IEEE floating point definition otherwise.

I also added the ENABLE_SSE2 option to the CMake so that one can simply disable the SSE flag and always use -ffloat-store instead. This is not by default since it may decrease performance, hence is not advised when the SSE alternative exists. Yet it will allow to build portably if you also targets old x86 machines.

Please test and reopen the bug report if this does not fix the issue.
Thanks! :-)

commit 5996bbd995aed5045cc22e4d1fab08c989377983 (HEAD -> master, origin/master, origin/HEAD)
Author: Jehan <jehan@girinstud.io>
Date:   Mon Nov 6 01:41:50 2017 +0100

    Bug 101033 - Testsuite fails on i386.
    
    Floating point accuracy may be different depending on the architecture.
    In particular some architectures may store floating values with
    different precision, resulting in unreliable results across various
    machines. It would seem in particular true on older x86 machines without
    SSE support, which were reported cases.
    The proposed solution is to test for SSE support and explicitly add the
    proper flags (even though they are set by default anyway on modern x86).
    When this is not available (on older machines or simply when not on x86
    processors), I replace sse2 flags with -ffloat-store, which forces IEEE
    floating point definition.
    The reason why not to always force -ffloat-store is because it seems to
    decrease performance on some machines. SSE is prefered if available.
    
    I also add a ENABLE_SSE2 option on the CMake file to allow builders to
    use -ffloat-store even though SSE2 may be available on the build
    machine. This would allow to build portable binaries which can also be
    installed on older machines.

 CMakeLists.txt | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comment 20 James Cowgill 2017-11-06 20:46:26 UTC

Could you adjust your patch so that -ffloat-store is only used on x86? I don't think you should penalise other architectures needlessly.

Comment 21 Jehan 2017-11-06 23:44:06 UTC

> Could you adjust your patch so that -ffloat-store is only used on x86? I don't think you should penalise other architectures needlessly.

I guess you are right. The problem with -ffloat-store is that its meaning is not to get IEEE floating point, but to not store floats in register (whose side effect is to ensure IEEE standard). What we would really need is -fexcess-precision=standard but that does not work for C++. :-/

Ideally we should just test for the architecture's register precision.
Problem with just not setting -ffloat-store on any non-x86 is that (I guess), we may end up with the same problem of non-consistency of computations on some machines.

All this to say that I am really not sure if this is the right solution. But I'll do so for now anyway. In worst case, I'll just get new bug reports for non-x86 architectures and we'll look for a more appropriate resolution. :-)

commit cd617d181de03a7a13c2020e6c73cd14585e24b6 (HEAD -> master, origin/master, origin/HEAD)
Author: Jehan <jehan@girinstud.io>
Date:   Tue Nov 7 00:37:54 2017 +0100

    CMake: do not check/set SSE and float-store options on non-x86 targets.
    
    Not sure if that's right. I guess we might also find non-x86 machines
    where floating point computation won't follow IEEE standard as well. But
    let's do this for now to prevent from useless performance hit.

 CMakeLists.txt | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

Comment 22 Coacher 2017-12-30 18:58:55 UTC

This problem still occurs on some machines.

You rely on CMAKE_SYSTEM_PROCESSOR, which AFAIK relies on info from uname. Thus if a system doesn't have x86 or amd64 in uname info, you won't be able to recognize it. Such x86 systems exist: https://bugs.gentoo.org/641716

Moreover, you don't provide a way to override TARGET_ARCHITECTURE variable, so it's impossible to workaround this issue.

Please provide a way to override TARGET_ARCHITECTURE from cmake command line, or make arch detection better if possible.

Comment 23 Jehan 2017-12-30 19:01:09 UTC

(In reply to Coacher from comment #22)
> This problem still occurs on some machines.
> 
> You rely on CMAKE_SYSTEM_PROCESSOR, which AFAIK relies on info from uname.
> Thus if a system doesn't have x86 or amd64 in uname info, you won't be able
> to recognize it. Such x86 systems exist: https://bugs.gentoo.org/641716
> 
> Moreover, you don't provide a way to override TARGET_ARCHITECTURE variable,
> so it's impossible to workaround this issue.
> 
> Please provide a way to override TARGET_ARCHITECTURE from cmake command
> line, or make arch detection better if possible.

Ok.
As always, patches welcome. ;-)

Comment 24 Coacher 2017-12-30 19:17:48 UTC

Smth like this should work:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9d25a47..bf24fae 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -37,7 +37,10 @@ if (BUILD_SHARED_LIBS)
        option(BUILD_STATIC "Build static library" ON)
 endif (BUILD_SHARED_LIBS)

-string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} TARGET_ARCHITECTURE)
+if(NOT DEFINED TARGET_ARCHITECTURE)
+    string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} TARGET_ARCHITECTURE)
+endif(NOT DEFINED TARGET_ARCHITECTURE)
+
 if (TARGET_ARCHITECTURE MATCHES ".*(x86)|(amd).*")
     CHECK_C_COMPILER_FLAG(-msse2 SUPPORTS_CFLAG_SSE2)
     CHECK_C_COMPILER_FLAG(-mfpmath=sse SUPPORTS_CFLAG_SSE_MATH)

Comment 25 Jehan 2017-12-30 22:32:15 UTC

It would be better to have a CMake option so that the feature is visible (a "hidden" feature is barely better than no feature).

Also please, I prefer a git-formatted patch, this way I have the proper authorship that you expect on your commit and a commit message.
Just make your commit in a local repository, then run `git format-patch origin/master` and it will generate a file which you can upload here.
Thanks!

Comment 26 Coacher 2018-01-21 14:18:02 UTC

Created attachment 136877 [details] [review]
0001-build-turn-TARGET_ARCHITECTURE-into-option.patch

(In reply to Jehan from comment #25)
> It would be better to have a CMake option so that the feature is visible (a
> "hidden" feature is barely better than no feature).
> 
> Also please, I prefer a git-formatted patch, this way I have the proper
> authorship that you expect on your commit and a commit message.
> Just make your commit in a local repository, then run `git format-patch
> origin/master` and it will generate a file which you can upload here.
> Thanks!

Here you go. Please merge.

Comment 27 Jehan 2018-01-21 15:02:02 UTC

Comment on attachment 136877 [details] [review]
0001-build-turn-TARGET_ARCHITECTURE-into-option.patch

Review of attachment 136877 [details] [review]:
-----------------------------------------------------------------

Code looks good.
Tested and still looking good.
Pushed!

commit f136d434f0809e064ac195b5bc4e0b50484a474c (HEAD -> master, origin/master, origin/HEAD)
Author: Ilya Tumaykin <itumaykin@gmail.com>
Date:   Sun Jan 21 17:07:27 2018 +0300

    build: turn TARGET_ARCHITECTURE into option
    
    Default value is autodetected if not specified by user.

 CMakeLists.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 28 Jehan 2018-01-21 15:14:54 UTC

I could just close now, but maybe it is possible to improve the system detection by a case-by-case basis. Would it be possible to know the value which CMake put into CMAKE_SYSTEM_PROCESSOR for this machine?

I looked in the logs on the bug report in comment 22, but unless I missed it, I can't see it. If we could have the exact value (not just the uname), I could check if we can find a pattern where to add the compiler flags as well.

Of course, that does not remove any of the interest of your commit (which I pushed already, cf. previous commit) for further unprocessed case, but maybe at least there will be one less case to handle manually. :-)

Comment 29 Jehan 2018-03-14 11:24:00 UTC

Well no answer, so I assume you are fine enough with the previous commit and don't need to attempt case-by-case detection.

If you finally decide it is worth it, the question still holds: Would it be possible to know the value which CMake put into CMAKE_SYSTEM_PROCESSOR for this machine?
Then answer and reopen the report.

Other than this, let's consider this as FIXED. :-)

Comment 30 Coacher 2018-03-16 19:12:34 UTC

Sorry, your previous comment slipped off my radar.
Yes, the current solution works perfectly. Thank you very much!
Let's consider this as FIXED, indeed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.