Bug 54372 - GLX_INTEL_swap_event crashes driver when swapping window buffers
Summary: GLX_INTEL_swap_event crashes driver when swapping window buffers
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: GLX (show other bugs)
Version: 10.1
Hardware: x86-64 (AMD64) Linux (All)
: highest critical
Assignee: Eric Anholt
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
: 79711 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-09-02 02:46 UTC by danmanj
Modified: 2014-10-26 02:16 UTC (History)
5 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Edited backtrace of glx crash (1.27 KB, text/plain)
2014-06-07 05:37 UTC, danmanj
Details
patch to fix glxusefont (1.51 KB, text/plain)
2014-06-22 20:29 UTC, danmanj
Details

Description danmanj 2012-09-02 02:46:05 UTC
Hi, I sent this to debian's BTS, they told me to send it to mesa-dev, and mesa-dev ignored it, so I'm sending it to your BTS as well.

Have great day!


Package: libgl1-mesa-glx
Version: 8.0.4-1
Severity: important
Tags: patch, upstream

Dear Maintainer,
When enabling GLX_INTEL_swap_event for windows:
in the file src/glx/dri2.c:
DRI2WireToEvent assumes the event it is generating is asociated with a
GLXPixmap, and tries to look up the pixmap in the glxDrawHash hash
table, but as the buffer swap is for a _WINDOW_ not a pixmap the result
is NULL. DRI2WireToEvent then dereferences the null pointer, and SEGVs.

The attached patch handles this situation gracefully.

diff --git b/src/glx/dri2.c a/src/glx/dri2.c
index b1b5013..2cbe968 100644
--- b/src/glx/dri2.c
+++ a/src/glx/dri2.c
@@ -130,10 +130,17 @@ DRI2WireToEvent(Display *dpy, XEvent *event, xEvent *wire)
         aevent->msc = ((CARD64)awire->msc_hi<<  32) | awire->msc_lo;

         glxDraw = GetGLXDrawable(dpy, pdraw->drawable);
-      if (awire->sbc<  glxDraw->lastEventSbc)
-        glxDraw->eventSbcWrap += 0x100000000;
-      glxDraw->lastEventSbc = awire->sbc;
-      aevent->sbc = awire->sbc + glxDraw->eventSbcWrap;
+      if(glxDraw)
+      {
+         if (awire->sbc<  glxDraw->lastEventSbc)
+           glxDraw->eventSbcWrap += 0x100000000;
+         glxDraw->lastEventSbc = awire->sbc;
+         aevent->sbc = awire->sbc + glxDraw->eventSbcWrap;
+      }
+      else
+      {
+         aevent->sbc = awire->sbc;
+      }
Comment 1 danmanj 2012-12-15 19:35:47 UTC
Hi guys, It's been over 3 months since I reported this CRASH bug complete with a patch that fixes it, but I have seen zero response from y'all.

In fact I have tried to get this fixed via the debian X team, the mesa-dev list, and the freedesktop.org bug tracker, but no response. I don't know how else to get your attention.

Can somebody please take a look and apply the patch?

Thanks a ton,
Dan
Comment 2 Kenneth Graunke 2012-12-15 21:53:12 UTC
Eric, mind taking a look at this?  You understand GLX swap events better than anyone at this point.
Comment 3 danmanj 2013-01-05 23:30:47 UTC
Hi guys, 

The GLX_INTEL_swap_event extension is still broken, and upon further testing things are slightly worse: The extension crashes on intel, as it has for several months now, but it appears that the extension is erroneously reported as being supported for non-intel chipsets, even though the swap events are never actually generated.

I've had to write code into my application to work around the fact that the extension is reported as supported, but doesn't actually do anything, defeating the whole point of having an extension string in the first place.

Can somebody please get mesa to not report this extension for the free ati/nvidia drivers since they do not work, and would somebody please look into applying the patch I sent months ago to fix the crash in intel?

At this point the extension is 100% useless except on my machine with my patch applied.
Comment 4 danmanj 2013-06-28 06:56:01 UTC
Hi guys. It's been 9 months since I reported this crash bug, and sent you a patch that fixes it. I just tested your 9.1.3 drivers and they seem to crash the same way as the 8.0 version that I wrote this patch for.

There has been no visible progress, no additional email messages asking for clarification, no criticism on the patch being good/bad. Nothing.

For 9 whole months. Nothing.

The code still crashes without the patch, and I've run out of patience. I wasted an entire day of my life writing the fix, and several hours trying to get y'all to respond and actually integrate it.

Will SOMEBODY with commit access please just apply the damn patch already, or explain what is wrong with the patch that makes it's preferable to just have applications that attempt to use the INTEL_swap_event extension crash?

If you have no intention of supporting this extension then please, FOR THE LOVE OF GOD stop reporting it in the gl extension string. It forces me and other application writers to maintain a blacklist of mesa versions that LIE about what extensions they support. And we have to keep that code around FOREVER.
Comment 5 danmanj 2014-06-06 04:07:37 UTC
Hi,

Nearly 2 Years ago I sent you this patch to fix a crash bug in mesa 8.0.x.

We are now up to Mesa 10.1.4 and the library still crashes.

_EVERY SINGLE_ version of mesa's GLX libraries from 8. to 10.1 has had this CRASH bug, even though a patch has been sitting here at your doorstep.

AFICT nobody has ever even looked at the patch.

I'm going to go into "public shaming" mode soon...
Comment 6 Tapani Pälli 2014-06-06 10:37:15 UTC
Do you have some small test app to trigger this bug? I haven't worked on GLX code but it would help to understand the problem better.

In Piglit test suite there is a test "./tests/glx/glx-swap-event.c" that creates a window, sets event mask to GLX_BUFFER_SWAP_COMPLETE_INTEL_MASK. On my machine (SNB) it receives those events and works fine.
Comment 7 Tapani Pälli 2014-06-06 10:39:24 UTC
Also please include some information on your setup, which version of X server, Mesa and so on.
Comment 8 danmanj 2014-06-07 05:37:56 UTC
Created attachment 100587 [details]
Edited backtrace of glx crash
Comment 9 danmanj 2014-06-07 05:52:11 UTC
Attached is a backtrace from the crash with the 10.1.4 glx libs.

The same patch attached here still fixes the crash.

It would take me days to produce a small test program for you. My program is quite complex.

Let me give you my take on what is happening <plus 2 years of foggy memmory>.

A bufferswap event has been generated and the app process has received it.

DRI2 WireToEvent is supposed to translate the serialized event into a memory-resident XEvent struct.

DRI2WireToEvent looks for a GLX Drawable that matches the input event with GetGLXDrawable. It needs to do this just in case there has been an overflow in the swap count to do basic ccounting <apparently>.

UNEXPECTEDLY GetGLXDrawable returns NULL. This is allowed. If somebody sends you a bogus XEvent you SHOULD NOT CRASH.

DRI2WireToEvent doesn't bother checking if the pointer is NULL. Dereferences it, and CRASH occurs.

My opinion: Even if you decide there is a separate bug causing the NULL return you still need this patch. As things stand, any random delayed XEvent can crash any glx program.
Comment 10 danmanj 2014-06-07 05:57:39 UTC
Regarding your request for info about my system, X server etc:

I have tested my application with over 40 GPUs, and 7 host systems. The full matrix is just a distraction.

In good news the radeon r300 driver finally actually sends the buffer_swap events. It previously reported that it supported the extension, but never generated the events, leaving the app hanging.
Comment 11 Ian Romanick 2014-06-07 07:29:55 UTC
I don't know how I never saw this bug before.  Sorry about that. :(  I'm also adding Jesse to the CC list... because git-blame says he wrote all of this code.

The NULL pointer from GetGLXDrawable is a pretty catastrophic event at this point in the code.  GetGLXDrawable and dri2GetGlxDrawableFromXDrawableId (earlier in the function) look up the drawable IDs in separate hash tables.  The entry for dri2GetGlxDrawableFromXDrawableId is added in dri2CreateDrawable, and the entry for GetGLXDrawable is added in InitGLXDrawable.  Both of these functions are called together (see CreateDrawable in glx_pbuffer.c for an example).

The entries are removed from the hash tables by DestroyGLXDrawable and DestroyDRIDrawable.  These are also called together (see DestroyDrawable in glx_pbuffer.c).

So... if one hash table has the entry and the other does not, that suggests that we're between the DestroyGLXDrawable and DestroyDRIDrawable calls... which seems hella crazy because your backtrace is in the middle of a glXSwapBuffers call (though perhaps not for the same drawable?).

Assuming you can recreate this crash at will, at the time of the crash:

1. What is the value of awire->drawable?

2. What is the value of pdraw->drawable?

3. In the dri2SwapBuffers frame, what is the value of pdraw?  This is just from the backtrace.

4. In the glXSwapBuffers frame, what is the value of drawable?  This is just from the backtracke.

5. Is your app multithreaded?
Comment 12 danmanj 2014-06-08 03:37:54 UTC
Ok, I have spent another day looking into all this. I am more confident than before that this patch is _EXACTLY_ correct. Let me break it down for posterity:

Prior to GLX 1.3 there was the glxMakeCurrent() function that took a single drawable handle. The Drawable could be either a bare XID for a Window or an XID for a glxpixmap.

GLX 1.3 added glxMakeContextCurrent that takes 2 handles: one for reading, one for writing. Nowadays the old glxMakeCurrent call is implemented as a call to glxMakeContextCurrent with the single handle duplicated.

Because of this it is allowed to use a plain-old Window ID as an argument to glxMakeContextCurrent, although nobody really documents this sort of thing. The manpage for the NEW call specifies the arguments as GLXPixmaps, but the actual code accepts Window XIDs too, and handles them correctly.

Similarly, the glxSelectEvents function can also take a bare Window XID.

The "piglit" tests all use GLXWindows and/or GLXPixmaps. You never tested swap events with a bare Window XID. That is what my app was doing.

The swap_events code worked with Window XIDs in mesa 7.x.y. The new code added in versions 8, 9, and 10 assumes that all buffer swap events have a GLXPixmap associated with them. Because of the historical quirks above THIS IS NOT TRUE. Swap events for bare Window XIDs do NOT have a glxpixmap.

I repeat ____THIS IS NOT TRUE____. It will crash.

So... any app that uses the old school glxMakeCurrent call with a Window XID while trying to use swap_events will crash when the libs try to lookup the nonexistent GLXPixmap associated with the incoming swap event.

I believe that the people who wrote the spec overlooked this, because the "sbc" field comes from the OML_sync extension that is defined in terms of glxpixmaps only.

Regardless... The only reason that the library is looking up the glxpixmap is to handle counter overflows nicely. I can tell you that nobody cares about that. If the choice is a crash or handling counter wraps manually I'll take counter manual wraps.

Please just apply the patch already and update the spec to say counter wraps only work nicely with glxpixmaps.

I can work around this bug by creating glxpixmaps instead of using a raw Window XID for my main window. But you're still stuck having to handle this case. I've thought deeply about your options, and my patch is the least painful.

I could write more but I'm hoping that you are convinced by now.
Comment 13 Tapani Pälli 2014-06-08 04:07:33 UTC
(In reply to comment #12)
 
> Prior to GLX 1.3 there was the glxMakeCurrent() function that took a single
> drawable handle. The Drawable could be either a bare XID for a Window or an
> XID for a glxpixmap.
> 
> GLX 1.3 added glxMakeContextCurrent that takes 2 handles: one for reading,
> one for writing. Nowadays the old glxMakeCurrent call is implemented as a
> call to glxMakeContextCurrent with the single handle duplicated.
> 
> Because of this it is allowed to use a plain-old Window ID as an argument to
> glxMakeContextCurrent, although nobody really documents this sort of thing.
> The manpage for the NEW call specifies the arguments as GLXPixmaps, but the
> actual code accepts Window XIDs too, and handles them correctly.

Can you pinpoint to such documentation? I can't find historical API reference where glxMakeCurrent would take XID as parameter (for example GLX 1.2 does not, it requires a GLXDrawable handle).

I think in theory you should be getting GLXBadDrawable error here.

("GLXBadDrawable is generated if drawable is not a valid GLX drawable.")

> The "piglit" tests all use GLXWindows and/or GLXPixmaps. You never tested
> swap events with a bare Window XID. That is what my app was doing.

We could add such a test but it should probably fail in the above way. I don't know enough of X server to be 100% sure though, I would imagine it has unique handles for every resource but this starts to look like it does not.

> I can work around this bug by creating glxpixmaps instead of using a raw
> Window XID for my main window. But you're still stuck having to handle this
> case. I've thought deeply about your options, and my patch is the least
> painful.

I would consider this not a workaround but a fix to your application as you need to use today's API documentation and specs. I do agree that there might be a problem with the API too to detect this situation.
Comment 14 danmanj 2014-06-08 04:11:00 UTC
from
https://www.opengl.org/sdk/docs/man2/xhtml/glXSelectEvent.xml

draw    Specifies a GLX drawable. Must be a GLX pixel buffer or a window.
                

from
http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN3/2118____.HTM

drawable      Specifies a GLX drawable. Must be either an X window ID or a GLX pixmap ID.
Comment 15 danmanj 2014-06-08 04:25:18 UTC
addemdum:

Since buffer_swap is the newest feature it's the lest painful place to make a change. Changing the way old-school functions work is a bad idea, even if you think you can language-lawyer your way into it by saying "use the new modern API". This change is so trivial I would be amazed if you chose another option.
Comment 16 danmanj 2014-06-08 04:51:47 UTC
BTW kudos on the swap_events extension in general. I was very excited about this when it first came out (although my excitement waned in the last 22 months or so). Today I tested the modern r300, radeon, nouvea, GM45, and HD2000 drivers, and they all work as expected. I hope nvidia/ATI are close to implementing this in the proprietary drivers, it really makes application logic easier, and smooths out rendering.
Comment 17 Alexander Monakov 2014-06-08 09:40:22 UTC
This may be related to bug 54080, comment 3: glXMakeCurrent on plain X windows (which is allowed by the GLX specification, see glx1.4.pdf, section 2.1, last paragraph: "For backwards compatibility, ...") does not properly register the drawable in X server's GLX subsystem.
Comment 18 Tapani Pälli 2014-06-08 10:17:16 UTC
(In reply to comment #17)
> This may be related to bug 54080, comment 3: glXMakeCurrent on plain X
> windows (which is allowed by the GLX specification, see glx1.4.pdf, section
> 2.1, last paragraph: "For backwards compatibility, ...") does not properly
> register the drawable in X server's GLX subsystem.

Yes, seems related. Thanks for the reference (and links in comment #14). Looks like a backwards compatibility issue, did not know the API allows such behavior.
Comment 19 danmanj 2014-06-08 16:15:28 UTC
*** Bug 79711 has been marked as a duplicate of this bug. ***
Comment 20 Jesse Barnes 2014-06-09 15:26:45 UTC
Ooh I'm going to have to page in the context on this one...  but your description does sound correct, and I have no problem with applying the patch to avoid your crashes.  I do remember fixing other bugs related to pixmap vs window ID confusion back in the early days on this one, so I'm not surprised that there may be some left.

Ian, do you want to update the spec with the additional info?  Sounds like it's just some corner case handling we need to document (and of course fix the crash), so not a big deal from an app perspective.
Comment 21 danmanj 2014-06-11 05:21:14 UTC
*SIGH*

Well this bug has led to a new bug. I'm just going to post here since I have your attention :)

I changed my app to use glxwindows to avoid the crash, and it turns out that this breaks glXUseXFont()

:'(


In detail:

DRI_glXUseXFont() creates a Pixmap and X11 graphics context relative to the current glx context (see the line 
win = CC->currentDrawable;
)

But now that I'm using glxwindows the currentDrawable isn't actually an XID for a Window, so the plain-old X11 subsystem refuses, and all subsequent operations fail. I get a stream of errors like:

[17098]    7.810910: *** ERROR: xid=0x400010 req=53/0 err=9: X_CreatePixmap: BadDrawable (invalid Pixmap or Window parameter)
[17098]    7.810919: *** ERROR: xid=0x4002cf req=55/0 err=9: X_CreateGC: BadDrawable (invalid Pixmap or Window parameter)
[17098]    7.810923: *** ERROR: xid=0x4002cf req=54/0 err=4: X_FreePixmap: BadPixmap (invalid Pixmap parameter)
[17098]    7.810926: *** ERROR: xid=0x400010 req=53/0 err=9: X_CreatePixmap: BadDrawable (invalid Pixmap or Window parameter)
[17098]    7.810930: *** ERROR: xid=0x4002d1 req=70/0 err=9: X_PolyFillRectangle: BadDrawable (invalid Pixmap or Window parameter)
[17098]    7.810934: *** ERROR: xid=0x4002d1 req=56/0 err=13: X_ChangeGC: BadGC (invalid GC parameter)

(1000 or so more lines of errors deleted)

So now I can either have text or I can have swap events, but not both!

===

To follow up on the whole "swap events reported but not actually generated" complaint: It looks like this was fixed in http://lists.x.org/archives/xorg-devel/2013-February/035449.html . I've added code to my app to detect this and work around it by falling back to not using swap events.

===

To follow up on the spec: the line of text "The mask field indicates which of the buffers in the drawable were swapped." Is referring to what must have been a deleted field. Probably want to remove this.
Comment 22 danmanj 2014-06-12 01:10:41 UTC
Ok, so this patch fixes glxUseXFont for glxwindows.

Any idea when these patches will go in?


diff --git a/src/glx/xfont.c b/src/glx/xfont.c
index 316c585..3c34b1d 100644
--- a/src/glx/xfont.c
+++ b/src/glx/xfont.c
@@ -221,6 +221,7 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int lis
    XGCValues values;
    unsigned long valuemask;
    XFontStruct *fs;
+   __GLXDRIdrawable *wtf;
 
    GLint swapbytes, lsbfirst, rowlength;
    GLint skiprows, skippixels, alignment;
@@ -233,6 +234,10 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int lis
    dpy = CC->currentDpy;
    win = CC->currentDrawable;
 
+   wtf = GetGLXDRIDrawable(CC->currentDpy, CC->currentDrawable);
+   if(wtf)
+          win = wtf->xDrawable;
+
    fs = XQueryFont(dpy, font);
    if (!fs) {
       __glXSetError(CC, GL_INVALID_VALUE);
Comment 23 danmanj 2014-06-20 04:36:55 UTC
Hi, it has been a week since my last ping message and your git repository still doesn't have either of these bug fixes in it. What's the schedule?
Comment 24 Ian Romanick 2014-06-20 18:07:07 UTC
I pushed the first patch to master, and it should end up in the next 10.2 release and probably in the final 10.1 release.  Could you send your other patches to the mesa-dev mailing list with a full commit message?  You can CC me on them to reduce the odds that they'll get lost in my inbox flood.

commit 86bd2196b4ccec50443e99e6c8bf1659e1df9f37
Author: Daniel Manjarres <danmanj@gmail.com>
Date:   Fri Jun 20 10:51:33 2014 -0700

    glx: Don't crash on swap event for a Window (non-GLXWindow)
    
    Prior to GLX 1.3 there was the glxMakeCurrent() function that took a
    single drawable handle. The Drawable could be either a bare XID for a
    Window or an XID for a glxpixmap.
    
    GLX 1.3 added glxMakeContextCurrent that takes 2 handles: one for
    reading, one for writing. Nowadays the old glxMakeCurrent call is
    implemented as a call to glxMakeContextCurrent with the single handle
    duplicated.
    
    Because of this it is allowed to use a plain-old Window ID as an
    argument to glxMakeContextCurrent, although nobody really documents this
    sort of thing. The manpage for the NEW call specifies the arguments as
    GLXPixmaps, but the actual code accepts Window XIDs too, and handles
    them correctly.
    
    Similarly, the glxSelectEvents function can also take a bare Window XID.
    
    The "piglit" tests all use GLXWindows and/or GLXPixmaps. You never
    tested swap events with a bare Window XID. That is what my app was
    doing.
    
    The swap_events code worked with Window XIDs in mesa 7.x.y. The new code
    added in versions 8, 9, and 10 assumes that all buffer swap events have
    a GLXPixmap associated with them. Because of the historical quirks
    above, this is not true. Swap events for bare Window XIDs do NOT have a
    glxpixmap resulting in a segfault.
    
    Any app that uses the old school glxMakeCurrent call with a Window XID
    while trying to use swap_events will crash when the libs try to lookup
    the nonexistent GLXPixmap associated with the incoming swap event.
    
    I believe that the people who wrote the spec overlooked this, because
    the "sbc" field comes from the OML_sync extension that is defined in
    terms of glxpixmaps only.
    
    v2 (idr): Formatting changes.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
    Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Comment 25 danmanj 2014-06-22 20:29:10 UTC
Created attachment 101536 [details]
patch to fix glxusefont
Comment 26 danmanj 2014-06-22 20:29:39 UTC
Sent, waiting for moderator approval.
Comment 27 danmanj 2014-08-03 02:30:04 UTC
Sorry to overload this bug, but I sent the glxusexfont patch to mesa-dev June 22, and it seems not to have been merged yet...... Ian can you help get this done?

http://lists.freedesktop.org/archives/mesa-dev/2014-June/062007.html
Comment 28 Emil Velikov 2014-10-26 02:16:01 UTC
The latter patch is part of master and mesa 10.3.2.
Thanks for the patches Daniel.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.