Bug 23403

Summary: compiler padding causes reply parsing to use incorrect offsets
Product: XCB Reporter: Aaron <m.aaron.young>
Component: LibraryAssignee: xcb mailing list dummy <xcb>
Status: RESOLVED MOVED QA Contact: xcb mailing list dummy <xcb>
Severity: normal    
Priority: medium CC: adrake, guillem, robert.ancell
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Test case showing incorrect offset handling in xcb_sync_systemcounter_name

Description Aaron 2009-08-18 20:49:49 UTC
This is known to affect xcb_sync_systemcounter_t, which should have a size of 14 but due to padding has a size of 16.  This causes xcb_sync_systemcounter_name to miss the first two bytes of the name, and each call to xcb_sync_systemcounter_next gets off by two bytes as well.

It's possible that other structures are also affected by this same issue.
Comment 1 Alan Coopersmith 2012-05-11 22:17:49 UTC
Created attachment 61472 [details]
Test case showing incorrect offset handling in xcb_sync_systemcounter_name

I just hit this again today, and extracted the attached test case to prove it,
then found this bug when searching before filing a new one.

When compiled with either -m32 or -m64, it outputs (run against Xorg post-1.12
git master):

4c: VICEIDLETIME 7 (resolution: 4)
  Actual name: DEVICEIDLETIME 7
4b: VICEIDLETIME 6 (resolution: 4)
  Actual name: DEVICEIDLETIME 6
4a: VICEIDLETIME 5�� (resolution: 4)
  Actual name: DEVICEIDLETIME 5
49: VICEIDLETIME 4 (resolution: 4)
  Actual name: DEVICEIDLETIME 4
48: VICEIDLETIME 3 (resolution: 4)
  Actual name: DEVICEIDLETIME 3
47: VICEIDLETIME 2 (resolution: 4)
  Actual name: DEVICEIDLETIME 2
3f: LETIME  (resolution: 4)
  Actual name: IDLETIME
3e: RVERTIMEh (resolution: 4)
  Actual name: SERVERTIME
sizeof(xcb_sync_systemcounter_t) = 16
Comment 2 Alan Coopersmith 2012-05-11 22:24:43 UTC
Quick, nasty hack workaround:

#define xcb_sync_systemcounter_name(sc) (((char *) &(sc)->name_len) + 2)
Comment 3 David Faure 2012-07-26 11:30:40 UTC
I just hit this issue while porting the KDE Framework kidletime to XCB (since Qt5 uses XCB).

xcb/sync.h says (with my comments added about field size) :

typedef struct xcb_sync_systemcounter_t {
    xcb_sync_counter_t counter; /**<  */                  // 4 bytes
    xcb_sync_int64_t   resolution; /**<  */               // 8 bytes
    uint16_t           name_len; /**<  */                 // 2 bytes
} xcb_sync_systemcounter_t; // sizeof says 16 bytes due to padding

but sync.c (in the XCB sources) says:

char *
xcb_sync_systemcounter_name (const xcb_sync_systemcounter_t *R  /**< */)
{
    return (char *) (R + 1);
}

Shouldn't that be

char *
xcb_sync_systemcounter_name (const xcb_sync_systemcounter_t *R  /**< */)
{
    return (char *) (R) + 14;
}
(gdb confirms that the name is after the struct, as I expected).

Apparently this comes from c_client.py, in the code that says
        if field.prev_varsized_field == None:
            _c('    i.data = (%s *) (R + 1);', field.c_field_type)
which comes from field.type.fixed_size(), but I can't find the definition of fixed_size() anywhere...

I don't really understand how the presence of 2 bytes of padding is causing a "+14" to turn into a "+1"?

Alan: thanks for the workaround.
Comment 4 Josh Triplett 2012-07-26 14:23:53 UTC
(In reply to comment #3)
> xcb/sync.h says (with my comments added about field size) :
> 
> typedef struct xcb_sync_systemcounter_t {
>     xcb_sync_counter_t counter; /**<  */                  // 4 bytes
>     xcb_sync_int64_t   resolution; /**<  */               // 8 bytes
>     uint16_t           name_len; /**<  */                 // 2 bytes
> } xcb_sync_systemcounter_t; // sizeof says 16 bytes due to padding

Yeah.  Normally, the X protocol padding rules always agree with compiler padding rules, but not when a struct ends with a smaller-than-4-byte field.

> but sync.c (in the XCB sources) says:
> 
> char *
> xcb_sync_systemcounter_name (const xcb_sync_systemcounter_t *R  /**< */)
> {
>     return (char *) (R + 1);
> }
> 
> Shouldn't that be
> 
> char *
> xcb_sync_systemcounter_name (const xcb_sync_systemcounter_t *R  /**< */)
> {
>     return (char *) (R) + 14;
> }
> (gdb confirms that the name is after the struct, as I expected).
> 
> Apparently this comes from c_client.py, in the code that says
>         if field.prev_varsized_field == None:
>             _c('    i.data = (%s *) (R + 1);', field.c_field_type)
> which comes from field.type.fixed_size(), but I can't find the definition of
> fixed_size() anywhere...
> 
> I don't really understand how the presence of 2 bytes of padding is causing a
> "+14" to turn into a "+1"?

Notice that the +1 occurs inside the parentheses, and R has type "const xcb_sync_systemcounter_t *".  C pointer math applies: +1 advances by sizeof(xcb_sync_systemcounter_t).  In this case, the structure has the wrong size, so that advances by 16 bytes instead of 14.

I don't think it makes sense to leave protocol structs incorrectly padded and manually compute their real size, ignoring the compiler.  While using something like __attribute__((packed)) doesn't work portably across compilers, assuming that the padding only occurs at the *end* of the structure (and not, for instance, right before the last field) doesn't work portably either.
Comment 5 Andrew Drake 2012-12-17 11:01:14 UTC
This also affects xcb_sync_systemcounter_next, leading to all sorts of undesirable behavior if you try to actually iterate with it.

Given that there's already assumptions about compiler padding behavior as Josh notes, would the project accept a patch that makes c_client.py add explicit packing attributes to protocol structures? I could also add compile-time asserts to prevent bugs of this form from cropping up again.
Comment 6 Daniel Martin 2012-12-17 15:19:21 UTC
(In reply to comment #4)
> I don't think it makes sense to leave protocol structs incorrectly padded
> and manually compute their real size, ignoring the compiler.  While using
> something like __attribute__((packed)) doesn't work portably across
> compilers, assuming that the padding only occurs at the *end* of the
> structure (and not, for instance, right before the last field) doesn't work
> portably either.

I think adding the packed attribute and thereby possibly breaking it for other compilers is a minor issue then leaving it broken for all (mainly gcc).

But, I don't have any experience with other compilers. What do they do if they find an unsupported attribute (warning, error, ignore it silently)?
Comment 7 Daniel Martin 2012-12-17 15:35:10 UTC
(In reply to comment #5)
> Given that there's already assumptions about compiler padding behavior as
> Josh notes, would the project accept a patch that makes c_client.py add
> explicit packing attributes to protocol structures? I could also add
> compile-time asserts to prevent bugs of this form from cropping up again.

Imho asserts at compile time are to late. The generator should complain about that.

My suggestion would be:

a) handle "packed" from the beginning
   - add a "packed" attribute to the struct in the schema (xcb.xsd)
   - handle it in xtypes.py
   - handle it in c_client.py
b) preventive measures:
   - enhance xtypes.py to watch out for unpadded structs

From my pov those additions don't sound hard to realize. (Well, I played with length handling during the past 1,5 week for xinput.)
Comment 8 Daniel Martin 2012-12-31 10:15:43 UTC
Forgotten to link my patch:
http://lists.freedesktop.org/archives/xcb/2012-December/008032.html
Comment 9 Peter Harris 2013-01-03 21:06:57 UTC
In general, I am opposed to language-specific cruft in the protocol description.

If I may make a suggestion, all structs/requests/replies/etc should be treated as packed. Instead of adding a "packed" attribute, explicit padding should be added to the few remaining objects that currently contain implicit padding.

c_client.py (and/or xtypes.py) could then be enhanced to automatically notice structs that don't fit the default C alignment rules and add whichever annotations the C compiler needs.

What do you think?
Comment 10 Guillem Jover 2017-10-30 10:32:59 UTC
Hmm, very recently I was trying to use the XCB SYNC bindings and spent several hours wondering whether my code was wrong, or even if my system was wrong, when I realized this was a problem with the bindings implementation, then to notice this bug report.

Please, could the proposed patches be merged right away, even if someone else wants to implement something better (more automatic or transparent) later? Because as it stands this is exposing a completely broken interface, and even if the fixes might be considered perhaps suboptimal, they are certainly better in any measurable way, as they make the interface work at all. Otherwise please consider removing the bindings for this specific part, which would be better to avoid confusing people? :(
Comment 11 Daniel Stone 2019-02-16 19:46:43 UTC
Unfortunately Bugzilla fails to generate any well-formed XML for this bug, so we can't copy it to GitLab automatically. But it should definitely be raised and preserved there.
Comment 12 Uli Schlachter 2019-02-18 08:05:44 UTC
Now this is really moved: https://gitlab.freedesktop.org/xorg/lib/libxcb/issues/36

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.