Bug 26511

Summary: Assertion while calling XPending() (xcb_io.c:242)
Product: xorg
Reporter: Leonardo Chiquitto <leonardo>
Component: Server/General
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: xcb
Version: 7.4 (2008.09)
Hardware: Other
OS: All
Attachments (flags: none):
* traffic captured on the ppc32 machine
* traffic captured on the ppc32 machine (v2)
* Proposed xserver patch

Description Leonardo Chiquitto 2010-02-10 04:40:30 UTC
This problem can be reproduced consistently with the following combination:

* Machine A (x86-64) running openSUSE 11.3 (current development version)
* Machine B (ppc32) running openSUSE 11.3

How to reproduce:

* Run Epiphany or Firefox on machine B exporting the display to machine A

The following assertion failure will happen every 1-10 minutes:

  epiphany: xcb_io.c:242: process_responses: Assertion `(((long)
  (dpy->last_request_read) - (long) (dpy->request)) <= 0)' failed.

or

  firefox: xcb_io.c:242: process_responses: Assertion `(((long)
  (dpy->last_request_read) - (long) (dpy->request)) <= 0)' failed.

Here's the (almost) complete back trace from Epiphany:

#0  0x0d8c1a5c in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0d8c3714 in abort () at abort.c:88
#2  0x0d8b8a48 in __assert_fail (assertion=
    0xfecfdd8 "(((long) (dpy->last_request_read) - (long) (dpy->request)) <= 0)", file=0xfecfcf0 "xcb_io.c", line=242, function=0xfecfc9c "process_responses")
    at assert.c:78
#3  0x0fe4645c in process_responses (dpy=0x10118800, wait_for_first_event=0, 
    current_error=0x0, current_request=0) at xcb_io.c:242
#4  0x0fe46f64 in _XEventsQueued (dpy=0x10118800, mode=<value optimized out>)
    at xcb_io.c:256
#5  0x0fe2a7ac in XPending (dpy=0x10118800) at Pending.c:56
#6  0x0e0d8e20 in gdk_check_xpending (display=
    Traceback (most recent call last):
  File "/usr/share/glib-2.0/gdb/gobject.py", line 72, in to_string
    name = g_type_name_from_instance (self.val)
  File "/usr/share/glib-2.0/gdb/gobject.py", line 59, in g_type_name_from_instance
    name = g_type_to_name (gtype)
  File "/usr/share/glib-2.0/gdb/gobject.py", line 26, in g_type_to_name
    return glib.g_quark_to_string (typenode["qname"])
  File "/usr/share/glib-2.0/gdb/glib.py", line 13, in g_quark_to_string
    val = read_global_var ("g_quarks")
  File "/usr/share/glib-2.0/gdb/glib.py", line 5, in read_global_var
    return gdb.selected_frame().read_var(symname)
ValueError: variable 'g_quarks' not found
) at gdkevents-x11.c:154
#7  0x0e0d8fd8 in gdk_event_check (source=<value optimized out>)
    at gdkevents-x11.c:2347
#8  0x0dc51918 in g_main_context_check () from /usr/lib/libglib-2.0.so.0
#9  0x0dc5225c in ?? () from /usr/lib/libglib-2.0.so.0
#10 0x0dc52b54 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#11 0x0e38fec4 in IA__gtk_main () at gtkmain.c:1217
#12 0x100251b4 in main (argc=1, argv=0xbffff614) at ephy-main.c:778
Comment 1 Julien Cristau 2010-02-10 05:09:03 UTC
cc:ing the xcb people in case they can help.
Comment 2 Michel Dänzer 2010-02-10 08:18:53 UTC
I can't seem to reproduce this between Debian sid powerpc/amd64 machines. What versions of xserver, XCB and libX11 are you using?
Comment 3 Leonardo Chiquitto 2010-02-10 09:07:29 UTC
> I can't seem to reproduce this between Debian sid powerpc/amd64 machines. What
> versions of xserver, XCB and libX11 are you using?

On the powerpc machine:

 xorg-x11-server-7.4-66.15.ppc
 xorg-x11-libX11-7.4-15.2.ppc
 xorg-x11-libxcb-7.4-15.2.ppc

On the x86_64 machine:

 xorg-x11-server-7.4-67.4.x86_64
 xorg-x11-libX11-7.4-16.4.x86_64
 xorg-x11-libxcb-7.4-15.4.x86_64

Something that might be relevant: here, the x86_64 machine is a laptop. If I'm running on AC power and disconnect the cable, the browsers will die immediately. Although this is a way to trigger the problem "at will", I have to mention that it also happens during regular use and is not dependent on the laptop being on AC or battery.
Comment 4 Jamey Sharp 2010-02-10 11:17:12 UTC
This assert means that libX11 got responses from the X server for requests that it doesn't believe have been sent. I don't have a hypothesis yet about how that could happen.
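
For readers following along, here is a minimal sketch of the check the assertion performs. The expression is copied from the assertion text in the description; the function name and the sample values are hypothetical, and this is not the actual libX11 source. It assumes the counters are small enough that casting to `long` is harmless.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the expression in the failed assertion: the sequence number of
 * the last response read must not be ahead of the last request sent.
 * Hypothetical helper name; not part of libX11. */
bool response_is_plausible(unsigned long last_request_read,
                           unsigned long request)
{
    return ((long) last_request_read - (long) request) <= 0;
}
```

With `request` at 0x0105, a response numbered 0x0102 passes the check. A response that claims sequence number 0x0201 does not, and that is the situation in which `process_responses` aborts.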

I don't immediately see how the architecture of either machine could matter here. Can you check whether you can reproduce this bug on either the ppc32 machine or the x86-64 machine alone?

I'll probably need you to print the values of dpy->last_request_read and dpy->request at the point where the assertion fails, and it may help if you could attach a capture of the X network traffic in the same failing session using something like wireshark.

Judging by the assert line number, I think your libX11 must be at least version 1.1.99.2, but no later than 1.3. The only more recent change to xcb_io.c is a Cygwin build fix, which had better not matter.

I hope the two commits in between those versions don't matter on a 32-bit client, but I'm not certain that "Avoid datatype overflow on AMD64 and friends" was correct, so it'd be nice to know if that commit is involved. (I hadn't noticed it before today.)

Is OpenSUSE applying any patches to libX11's xcb_io.c? I'd guess not, but if so that would be important to know.

I suspect the versions of the server and libxcb don't matter, which is fortunate since the OpenSUSE version numbers are meaningless to me.

> Something that might be relevant: here, the x86_64 machine is a laptop. If I'm
> running on AC power and disconnect the cable, the browsers will die
> immediately.

Sounds like the problem occurs when an event arrives, which makes sense. I'm curious what event your desktop environment is triggering on the switch to battery, but it doesn't matter.

I'm also curious how you got a Python traceback in the middle of a gdb stack trace...
Comment 5 Peter Harris 2010-02-10 11:44:12 UTC
(In reply to comment #4)
> This assert means that libX11 got responses from the X server for requests that
> it doesn't believe have been sent. I don't have a hypothesis yet about how that
> could happen.

Hypothesis: Your X server forgot to swap the sequence number in some event or reply.

E.g., a commit similar to 3f2e4b9867 may be required to fix the bug (if it hasn't been applied already).

> I'll probably need you to print the values of dpy->last_request_read and
> dpy->request at the point where the assertion fails, and it may help if you
> could attach a capture of the X network traffic in the same failing session
> using something like wireshark.

Definitely take a Wireshark trace. It will prove or disprove my hypothesis.
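
A rough sketch of what this hypothesis would look like on the wire, assuming a least-significant-byte-first server (the x86-64 laptop) and a most-significant-byte-first client (the ppc32 machine). All function names are hypothetical and the code only models a single 16-bit sequence number field, not real server code.

```c
#include <stdint.h>

/* Decode a 16-bit field the way an MSB-first client would. */
uint16_t read_msb_first(const uint8_t bytes[2])
{
    return (uint16_t) ((bytes[0] << 8) | bytes[1]);
}

/* Store a 16-bit field in the server's native LSB-first order. */
void write_lsb_first(uint16_t value, uint8_t bytes[2])
{
    bytes[0] = (uint8_t) (value & 0xff);
    bytes[1] = (uint8_t) (value >> 8);
}

/* Reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t value)
{
    return (uint16_t) ((value << 8) | (value >> 8));
}

/* Sequence number the MSB-first client ends up with, depending on
 * whether the server swapped the field before writing it out. */
uint16_t sequence_seen_by_client(uint16_t seq, int server_swaps)
{
    uint8_t wire[2];

    write_lsb_first(server_swaps ? swap16(seq) : seq, wire);
    return read_msb_first(wire);
}
```

For a true sequence number of 0x0102, the client reads 0x0102 when the server swaps and 0x0201 when it does not. The second value is far ahead of anything the client has sent, which would be enough to trip the assertion.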
Comment 6 Leonardo Chiquitto 2010-02-12 06:25:38 UTC
> I don't immediately see how the architecture of either machine could matter
> here. Can you check whether you can reproduce this bug on either the ppc32
> machine or the x86-64 machine alone?

No, I can't reproduce the problem on ppc32 or x86_64 when running Epiphany/Firefox locally. I can't say for sure that this is an architecture-dependent bug, but it looks that way to me (the two machines differ in endianness).

> I'll probably need you to print the values of dpy->last_request_read and
> dpy->request at the point where the assertion fails, and it may help if you
> could attach a capture of the X network traffic in the same failing session
> using something like wireshark.

I'll attach the traffic capture. It was collected with:

  # tcpdump -s 0 -n -w xorg-epiphany.cap -i eth0 port 6000

After I started tcpdump and Epiphany, it took less than one minute for the assertion failure to happen. Please let me know if the values you mentioned above are not in the capture and I'll patch the library to print them.

> Judging by the assert line number, I think your libX11 must be at least version
> 1.1.99.2, but no later than 1.3. The only more recent change to xcb_io.c is a
> Cygwin build fix, which had better not matter.

It's libX11 1.2.2. Sorry for not providing useful version numbers before.

> Is OpenSUSE applying any patches to libX11's xcb_io.c? I'd guess not, but if so
> that would be important to know.

We have 13 patches on xorg-x11-libX11, but none of them touch xcb files.

> I suspect the versions of the server and libxcb don't matter, which is
> fortunate since the OpenSUSE version numbers are meaningless to me.

Here are the correct version numbers:

libxcb 1.5
xorg-server 1.6.5

Thanks for your prompt responses and attention.
Comment 7 Leonardo Chiquitto 2010-02-12 06:27:42 UTC
Created attachment 33252
traffic captured on the ppc32 machine
Comment 8 Leonardo Chiquitto 2010-02-12 06:39:00 UTC
Created attachment 33253
traffic captured on the ppc32 machine (v2)

While the first attachment was captured without "external interference" (i.e., I just started Epiphany and waited for it to crash, without touching keyboard or mouse), this one was captured during the following sequence of events:

1. Started tcpdump and Epiphany (same use case: running on ppc32 with $DISPLAY
   pointing to the x86_64 laptop)
2. Unplugged laptop's power cable
3. Assertion failure happened immediately.

I believe the cause is the same and both traffic captures will be similar, but just in case...
Comment 9 Peter Harris 2010-02-12 07:01:13 UTC
Created attachment 33254
Proposed xserver patch

Thanks for the wireshark trace.

As I suspected, it is a swapping bug in your X server. Please try this patch, and let us know if it fixes the problem (I don't have an MSBFirst machine handy).
Comment 10 Julien Cristau 2010-02-12 07:11:42 UTC
Moving this bug to the server per Peter's analysis.
Comment 11 Leonardo Chiquitto 2010-02-12 10:34:14 UTC
I confirm that the patch in comment #9 resolves this problem. Thanks a lot Peter and everyone involved for the extremely quick response time and fix!
Comment 12 Julien Cristau 2010-02-12 15:02:51 UTC
Fixed in xserver master, thanks for the report!

commit 97b03037f4d99fcebc7603011f41c3aff9871ce2
Author: Peter Harris <pharris@opentext.com>
Date:   Fri Feb 12 15:36:30 2010 -0500

    Don't double-swap the RandR PropertyNotify event
    
    The event is already swapped in randr.c/SRROutputPropertyNotifyEvent, so
    it should not be swapped here.
    
    X.Org Bugzilla #26511: http://bugs.freedesktop.org/show_bug.cgi?id=26511
    
    Tested-by: Leonardo Chiquitto <leonardo@ngdn.org>
    Acked-by: Adam Jackson <ajax at redhat.com>
    Reviewed-by: Julien Cristau <jcristau at debian.org>
    Signed-off-by: Peter Harris <pharris@opentext.com>
    Signed-off-by: Keith Packard <keithp@keithp.com>
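
To illustrate why a double swap is as harmful as no swap at all, here is a small model with two layers that can each byte-swap the sequence number. The layer names and functions are hypothetical stand-ins and are not taken from the xserver source; only the idea (one layer already swaps, the other must not) comes from the commit message above.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
uint16_t toy_swap16(uint16_t value)
{
    return (uint16_t) ((value << 8) | (value >> 8));
}

/* Stand-in for the event-specific routine, which already swaps the
 * field for a client of the opposite byte order. */
uint16_t toy_event_layer(uint16_t seq)
{
    return toy_swap16(seq);
}

/* Stand-in for the delivery path; swap_again != 0 models the bug. */
uint16_t toy_delivery_layer(uint16_t seq, int swap_again)
{
    return swap_again ? toy_swap16(seq) : seq;
}

/* Value that leaves the server after both layers have run. */
uint16_t toy_value_sent(uint16_t seq, int swap_again)
{
    return toy_delivery_layer(toy_event_layer(seq), swap_again);
}
```

Because the swap is its own inverse, `toy_value_sent(0x0102, 1)` returns 0x0102, the server's native value, so a client of the opposite byte order decodes it incorrectly. With the second swap removed, `toy_value_sent(0x0102, 0)` returns 0x0201, the correctly swapped form.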
