Bug 49166

Summary: Sequence number byte swapping error in GLX replies after 2^16
Product: xorg Reporter: Kevin POCHAT <kpochat>
Component: Server/Ext/GLXAssignee: Adam Jackson <ajax>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: jeremyhu
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard: 2012BRB_Reviewed
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 44202    

Description Kevin POCHAT 2012-04-26 01:18:47 UTC
Description of problem: 
When displaying remote OpenGL applications (using DISPLAY environment
variables) from a big-endian system to a Linux system using X.org x server (tested on xserver releases 1.6 (in SLES 11), 1.7 (Ubuntu Lucid Lynx), 1.10 (Ubuntu Oneiric Ocelot), and 1.12 (latest Ubuntu beta)), the sequence numbers applied to the GLX replies are badly byte-swapped.
All the GLXGet[Float|Integer|...][v]() methods on (from the libglx.so file) seem to swap the X.org internal sequence number (which is an integer) as a 16 bit short without masking it with 0xFFFF. This results in all replies sent after the 65635th having wrong sequence numbers.

The following behavior is observed : the sequence numbers from 0x0 to 0xFFFF are correctly swapped, i.e. 0x1234 -> 0x3412, but once 0x10000 is reached the bits matching the 0xFF0000 mask are binary-or'd with the ones matching the 0xFF mask, i.e. 0x12345 -> 0x4623 (instead of 0x4523).

The exact behavior can be observed in X.org's libglx.so, where the issue can be found in src/glx/indirect_util.c where the __glXSendReplySwap() method does a "bswap16( client->sequence )" (line 176 in current X.org git) while it should do "bswap16( client->sequence & 0xFFFF )". 


Steps to reproduce:
1. Run an openGL application from a big-endian system (Solaris/SPARC here), with DISPLAY set to a linux X.org server.
2. Use a tool like wireshark or tcpdump to observe the GLX replies sent by the X.org server to the client.
3. Manipulate the OpenGL application so that it sends more than 65635 GLX requests (resizing the window seems to trigger a lot of these, when window manager is configured to keep drawing the client while resizing its window)

Actual result: 
The sequence numbers field of the replies is not increasing in a linear way after the 65635th one (0x0000, 0x0001 ... 0xFFFF, 0x0100, 0x0200).

Expected result: 
The sequence numbers shall always increase linearily, rolling back to 0 after the 65635th one (0x0000, 0x0001 ... 0xFFFF, 0x0000, 0x0001)

Note : the same issue can be reproduced using ATI's fglrx driver (which provides its own libglx.so), and the corresponding bug was filed in ATI's unofficial linux driver bug tracker. Maybe the same error can be found in NVidia's proprietary driver release too.
Comment 1 Michel Dänzer 2012-04-26 01:53:01 UTC
(In reply to comment #1)
> [...] the issue can be found in src/glx/indirect_util.c where the
> __glXSendReplySwap() method does a "bswap16( client->sequence )" (line 176 in
> current X.org git) while it should do "bswap16( client->sequence & 0xFFFF )".

I'd argue that shouldn't be necessary, or there's a bug in bswap_16(). With current xserver Git, which implementation does glx/glxbyteorder.h end up using for you? I can definitely see how the fallback implementation:

#define bswap_16(value)  \
        ((((value) & 0xff) << 8) | ((value) >> 8))

could cause this problem, as it doesn't mask the value shifted down.


P.S. The reporter normally receives all bug mail anyway, you shouldn't need to be in the CC list.
Comment 2 Kevin POCHAT 2012-04-26 02:06:14 UTC
(In reply to comment #1)
> I can definitely see how the fallback implementation:
> 
> #define bswap_16(value)  \
>         ((((value) & 0xff) << 8) | ((value) >> 8))
> 
> could cause this problem, as it doesn't mask the value shifted down.

Indeed. I hesitated between to two fixes options in my report, but as for me it concerned specifically the sequence numbers (32 bits integer internally, but 16 bit short in the protocol), I chose the other one :)

> 
> P.S. The reporter normally receives all bug mail anyway, you shouldn't need to
> be in the CC list.

Good to know. First time using a bugzilla tracker in almost 7 years. Thanks.
Comment 3 Kevin POCHAT 2012-04-26 02:09:56 UTC
(In reply to comment #1)
> 
> I'd argue that shouldn't be necessary, or there's a bug in bswap_16(). With
> current xserver Git, which implementation does glx/glxbyteorder.h end up using
> for you? 

I'd settle for the fallback one, but cannot guarantee it 100%.
Comment 4 Kevin POCHAT 2012-04-26 02:18:19 UTC
Link for the ATI driver bug showing same behavior : http://ati.cchtml.com/show_bug.cgi?id=490
Comment 5 Jeremy Huddleston Sequoia 2012-06-12 03:43:59 UTC
Still waiting on feedback from originator.
Comment 6 Kevin POCHAT 2012-06-12 04:19:27 UTC
What additional feedback do you expect ?
Everything was described in the first post. I can only add that it definitely uses the "fallback" method cited earlier.

Best regards,
Comment 7 Jeremy Huddleston Sequoia 2012-06-12 09:10:04 UTC
Your response in comment 3 doesn't really answer the question, and it's confusing.  From your comment 6, I take it to mean that in your case it is using the macro that Michel pointed you to.

Have you tried the fix that he suggested?
Comment 8 Kevin POCHAT 2012-06-18 04:54:55 UTC
Indeed, comment 6 meant that it is using that macro.

What fix are you referring to ?
Comment 9 Jeremy Huddleston Sequoia 2012-06-18 09:42:35 UTC
(In reply to comment #8)
> Indeed, comment 6 meant that it is using that macro.
> 
> What fix are you referring to ?

From comment 1, "mask the value shifted down"
Comment 10 Jeremy Huddleston Sequoia 2012-06-18 09:43:18 UTC
ie, change it to:

#define bswap_16(value)  \
        ((((value) & 0xff) << 8) | ((value & 0xff00) >> 8))
Comment 11 Jeremy Huddleston Sequoia 2012-06-18 09:43:42 UTC
(In reply to comment #10)
> ie, change it to:
> 
> #define bswap_16(value)  \
>         ((((value) & 0xff) << 8) | ((value & 0xff00) >> 8))

or rather:

#define bswap_16(value)  \
        ((((value) & 0xff) << 8) | (((value) & 0xff00) >> 8))
Comment 12 Kevin POCHAT 2012-06-21 08:06:21 UTC
Our current setup does not allow us to recompile X.org with this modification. Moreover, this lib is overwritten by the binary drivers that we use, that contain the same issue. We worked around this by using an X11 proxy that corrects "on-the-fly" those sequence numbers.
Comment 13 Matt Dew 2013-02-28 06:26:27 UTC
Hi folks,
  Has any progress been made on this?
Comment 14 Adam Jackson 2018-04-23 20:24:40 UTC
Fixed (somewhat accidentally) by:

commit 563b6ee873b898c0f3e3671cf6adaf91def5d92a
Author: Eric Anholt <eric@anholt.net>
Date:   Mon Mar 27 14:59:06 2017 -0700

    Rewrite the byte swapping macros.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.