Bug 103559

Summary: Incorrect code generation for atomic operations if HAVE_INTEL_ATOMIC_PRIMITIVES defined
Product: cairo Reporter: Mikhail Fludkov <fludkov.me>
Component: generalAssignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: cairo-bugs mailing list <cairo-bugs>
Severity: normal    
Priority: medium CC: fludkov.me
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Fix the code generation when using GCC legacy atomic operations

Description Mikhail Fludkov 2017-11-03 12:27:33 UTC
It is a follow up after https://bugs.freedesktop.org/show_bug.cgi?id=103037. The patch has the loop:
    while (_cairo_atomic_int_get(once) != CAIRO_ATOMIC_ONCE_INITIALIZED) {}

When the library is compiled with HAVE_INTEL_ATOMIC_PRIMITIVES:
    /* whether memory barriers are needed around atomic operations */
    /* #undef ATOMIC_OP_NEEDS_MEMORY_BARRIER */
    ...
    #define HAVE_INTEL_ATOMIC_PRIMITIVES 1

The _cairo_atomic_int_get will be defined as:
   # define _cairo_atomic_int_get(x) (*x)

Which will make this loop to hot-loop forever, because gcc will optimize it and read once instead of reading on every loop iteration.

Does anybody actually use the implementation under HAVE_INTEL_ATOMIC_PRIMITIVES define? According to git history, the code under HAVE_CXX11_ATOMIC_PRIMITIVES define landed after the one HAVE_INTEL_ATOMIC_PRIMITIVES. But should not it have completely replaced it?
Comment 1 Uli Schlachter 2017-11-05 08:48:41 UTC
Quick comment that doesn't actually answer any question:

   # define _cairo_atomic_int_get(x) (*x)

I think this needs to be a volatile read, at least. So, shouldn't 

   typedef int cairo_atomic_int_t;

instead use volatile int instead of int? I guess this might make this work correctly, but I am not sure (I don't think I heard much about this intel atomics before) and I certainly don't use this code anyway.
Comment 2 Adrian Johnson 2017-11-06 08:34:43 UTC
I would just use __sync_fetch_and_add(x, 0) for the get. The C++11 atomics were added in gcc 4.7 (about 5 years ago). Anyone caring about speed should be using a recent version of gcc.

What version of gcc were you using? Maybe the old versions that don't support C++11 atomics do not optimize away the read.
Comment 3 Mikhail Fludkov 2017-11-06 10:49:08 UTC
(In reply to Adrian Johnson from comment #2)
> I would just use __sync_fetch_and_add(x, 0) for the get. The C++11 atomics
> were added in gcc 4.7 (about 5 years ago). Anyone caring about speed should
> be using a recent version of gcc.

Agree it will be explicit and less "magical". What about using the same implementation regardless of ATOMIC_OP_NEEDS_MEMORY_BARRIER define?

The naming is a bit confusing though. When I read HAVE_INTEL_ATOMIC_PRIMITIVES define, I immediately thought about Intel Compiler and the libraries that come with it. Maybe HAVE_GCC_41_ATOMICS, or HAVE_GCC_OLD_ATOMICS names are better?

> What version of gcc were you using? Maybe the old versions that don't
> support C++11 atomics do not optimize away the read.
We backported the patch (https://bugs.freedesktop.org/show_bug.cgi?id=103037) on top of libcairo version which didn't have the code under HAVE_CXX11_ATOMIC_PRIMITIVES define. Compiled with gcc 4.9, ended up with a hot loop.
Comment 4 Adrian Johnson 2017-11-06 11:17:49 UTC
The gcc docs calls it legacy atomics. So maybe HAVE_GCC_LEGACY_ATOMICS.

I think the code is trying to be too clever. My view is just use the compiler builtins and let the compiler figure out the optimal output. If you think the compiler output is not optimal, file a bug with the compiler.

I would be in favor of cleaning it up to just use the builtins and get rid of the broken optimizations.
Comment 5 Mikhail Fludkov 2017-11-06 12:11:18 UTC
Created attachment 135257 [details] [review]
Fix the code generation when using GCC legacy atomic operations

I think this should do
Comment 6 Mikhail Fludkov 2017-11-06 12:23:41 UTC
I've double checked that the patch fixes it. The code generated for the loop I mentioned above looks like now:
   36f00: mfence 
   36f03: cmpl   $0x2,0x2e871e(%rip)        # 31f628 <once.11793>
   36f0a: jne    36f00 <_cairo_image_traps_compositor_get+0x30>

previously it looked like:
0x7f3fd67d4710 <_cairo_image_traps_compositor_get+48>   cmp $0x2, %eax
0x7f3fd67d4713 <_cairo_image_traps_compositor_get+51>   jne 0x7f3fd67d4710 <_cairo_image_traps_compositor_get+48>
Comment 7 Mikhail Fludkov 2017-11-16 11:12:48 UTC
Hello.
Did anybody have a chance to have a look at the patch?
Comment 8 Bill Spitzak 2017-11-16 18:23:08 UTC
Apparently the old commands come from the "Intel Itanium Processor-specific Application Binary Interface, section 7.4". This is according to the gcc page https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html. The Intel docuement: https://uclibc.org/docs/psABI-ia64.pdf

Therefore calling these "GCC_LEGACY" is probably wrong. "INTEL" is more accurate but certainly leads people to believe it has something to do with I86. I'm thinking they should be called "ICC" (the name of the Intel C Compiler is ICC) or maybe IIPsABI using the initial of the document title, or some variation of that.
Comment 9 Adrian Johnson 2017-11-16 19:52:21 UTC
(In reply to Bill Spitzak from comment #8)
> Therefore calling these "GCC_LEGACY" is probably wrong.

Why is it wrong? That is exactly what the gcc manual refers to them as:

https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins

> I'm thinking they should be called "ICC" (the name of the Intel C
> Compiler is ICC)

We are talking about gcc builtins. Why would we call them "ICC"?

> or maybe IIPsABI using the initial of the document title,
> or some variation of that.

The gcc manual says they should not be used for new code. So I don't see the point of coming up with any name other than something intended to discourage use of them like "legacy" or "obsolete".
Comment 10 Bill Spitzak 2017-11-16 23:11:36 UTC
From the referenced document, first line:

"The following built-in functions are intended to be compatible with those described in the Intel Itanium Processor-specific Application Binary Interface, section 7.4. As such, they depart from normal GCC practice by not using the ‘__builtin_’ prefix and also by being overloaded so that they work on multiple types."

Sure sounds like they existed BEFORE gcc, still exist as the only option in some other compilers, and thus "GCC legacy" is wrong in any documentation that is not gcc-only. Cairo in theory can be compiled with ICC and that may require turning this switch on.

To be honest I think most programmers familiar with these call them the "__sync" functions, not "Intel" or "GCC" anything. But it still seems like the comment could be improved.

My recomendation is to call the define HAVE_ICC_SYNC_PRIMITIVES or HAVE_SYNC_PRIMITIVES.
Comment 11 Adrian Johnson 2017-11-26 09:53:07 UTC
Pushed

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.