Summary: | x11-libs/libxcb media-libs/mesa segfault in __glXGetString | ||
---|---|---|---|
Product: | Mesa | Reporter: | Richard Freeman <rich0> |
Component: | GLX | Assignee: | mesa-dev |
Status: | RESOLVED WORKSFORME | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | devurandom, idr, jon.turney |
Version: | 9.0 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | backtrace, backtrace -O0, Output of glxinfo | ||
Description
Richard Freeman
2012-11-21 14:38:47 UTC
Created attachment 70391
backtrace -O0

I've been doing some debugging, and have some observations:

1. xcb_glx_get_string is returning a cookie.
2. xcb_connection_has_error is 0 both before and after xcb_glx_get_string_reply is called.

I'm slowly teaching myself far more xcb than I ever expected to learn, but if there is anything I can do to capture more useful info let me know - I can patch any libraries as needed as long as it isn't disruptive to the rest of the system.

One other note - I'm running xorg-server 1.13.0. That was upgraded at the same time as xcb/mesa, so the impactful change could be on the server side. From reviewing git it is apparent that none of the xcb/mesa code involved has changed recently (well, as of the versions I'm using, which are libxcb 1.9, mesa 9.0, and xcb-proto 1.8). Might be some kind of race condition where a change in the server side triggered a failure to handle some case on the client side.

After further testing I discovered that this segfault does not occur if I use x2goserver as my X11 server. This is on the same host with the same versions of libxcb/mesa/qt/etc. I suspect that this means that either the problem lies in driver-specific mesa code (which would obviously be on the server side), or with the interaction with xorg-server. I am on ATI hardware. I'll upload my glxinfo.

Created attachment 70533
Output of glxinfo
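
The observation above that xcb_connection_has_error stays 0 does not by itself rule out a failed request. In XCB, a request-level X error leaves the connection healthy, and the corresponding `_reply` function returns NULL instead of a reply. Whether that is what happens here is an assumption; nothing in this report confirms it. The following is only a sketch of the defensive pattern a caller could apply, using a hypothetical stand-in type (`fake_string_reply`) and a hypothetical helper (`copy_reply_string`) rather than the real xcb or Mesa code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GetString reply. This is NOT the real
 * xcb_glx_get_string_reply_t; it only models "a length plus bytes". */
struct fake_string_reply {
    size_t length;     /* number of bytes in data */
    const char *data;  /* not assumed to be NUL-terminated */
};

/* Return a malloc'd, NUL-terminated copy of the reply string, or NULL
 * if the reply is missing or malformed. The caller frees the result. */
char *copy_reply_string(const struct fake_string_reply *reply)
{
    /* A NULL reply is the case that would crash an unchecked caller. */
    if (reply == NULL)
        return NULL;
    if (reply->data == NULL && reply->length != 0)
        return NULL;
    if (reply->length == SIZE_MAX)   /* length + 1 would overflow */
        return NULL;

    char *out = malloc(reply->length + 1);
    if (out == NULL)
        return NULL;

    if (reply->length != 0)
        memcpy(out, reply->data, reply->length);
    out[reply->length] = '\0';       /* always terminate the copy */
    return out;
}
```

With a check like this, a missing reply turns into a NULL return that the caller can report, instead of a dereference of a NULL pointer.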
I've got some more data points here. While the segfault occurs client-side, it appears to be triggered by a change server-side.

I can reproduce the problem if the X11 server is xorg-server 1.13.0-r1 (Gentoo version number). The problem does not occur if the X11 server is 1.12.4. The behavior is the same if either mesa 8.0.4-r1 or mesa 9.0 is installed on either the server or the client side. The behavior is reproducible if the client is running on a separate host from the X11 server, if the X11 server is running 1.13.0.

Note also that the problem does not occur with xorg-server 1.12.4 even with the same versions of xf86-input-evdev/keyboard/mouse and xf86-video-ati as were running in 1.13 (though they do need to be recompiled against whatever server version is in use). So, I can make the problem appear or go away by only changing the version of xorg-server, even when using the older server with the newer drivers.

I'm tempted to break out the commit log and start bisecting it next, but if anybody has any pointers I'm all ears...

One other data point - the problem does not occur for xorg-server 1.13.0 if it is running in virtualbox. So either the virtualbox video driver or the fact that the server is running in virtualbox eliminates the problem.

I've determined that the segfault does not occur with xserver commit 90aa2486e394c0344aceb2a70432761665a79333, and it does occur with xserver commit ed6daa15a7dcf8dba930f67401f4c1c8ca2e6fac. So, that appears to be the commit that introduces this behavior:

commit ed6daa15a7dcf8dba930f67401f4c1c8ca2e6fac
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Wed Jul 4 15:21:09 2012 -0700

    glx/dri2: Enable GLX_ARB_create_context_robustness

    If the driver supports __DRI2_ROBUSTNESS, then enable
    GLX_ARB_create_cotnext_robustness as well. If robustness values are passed
    to glXCreateContextAttribsARB and the driver doesn't support
    __DRI2_ROBUSTNESS, existing drivers will already generate the correct error
    values (so that the correct GLX errors are generated).

    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Keith Packard <keithp@keithp.com>

I'll see if I can get xorg-1.13.0 patched to be missing just this commit.

Ok, it turns out that I can eliminate this issue in xserver 1.13.0 if I revert two commits:

ed6daa15a7dcf8dba930f67401f4c1c8ca2e6fac (mentioned above)
bcbf95b1bafa6ffe724768b9309295e2fdb4b860

Author: Jon TURNEY <jon.turney@dronecode.org.uk>
Date: Thu Jul 12 00:36:10 2012 +0100

    Revert bogus GlxPushProvider() in commit a1d41e3

    a1d41e3 "Move extension initialisation prototypes into extinit.h"
    also includes a change to GlxExtensionInit to install the swrast GLX
    provider.

    Since b86aa74 "GLX: Insert swrast provider from GlxExtensionInit"
    already does this (correctly, by installing the swrast provider
    at the end of the chain, rather than at the beginning), and since this
    would seem to have the effect of making the swrast provider the most
    preferred provider, I'm guessing this wasn't intended.

    Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
    Reviewed-by: Daniel Stone <daniel@fooishbar.org>
    Reviewed-by: Colin Harrison <colin.harrison@virgin.net>

Parts of that second commit appear to have gotten reverted along the way already. I suspect that the GlxExtensionInit change is the part of the second commit that is relevant.

So, I haven't had any time to try to decipher what is going on - I suspect this will be more obvious to those who introduced the code in the first place. However, taking a look at this is my next step if nobody gets to it first. Appropriate action might be to either change xserver or mesa depending on which is actually wrong.
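
The second commit message turns on where a provider is inserted into a chain that is searched front to back. The sketch below illustrates only that ordering effect. All names in it (`struct provider`, `push_front`, `push_back`, `pick_provider`) are hypothetical and are not the X server's actual data structures or functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider record; the real X server structures differ. */
struct provider {
    const char *name;
    int accepts_screen;      /* nonzero if this provider can drive the screen */
    struct provider *next;
};

/* Insert at the head of the chain: the new provider is tried first. */
void push_front(struct provider **chain, struct provider *p)
{
    assert(chain != NULL && p != NULL);
    p->next = *chain;
    *chain = p;
}

/* Insert at the tail of the chain: the new provider is tried last,
 * which makes it a fallback. */
void push_back(struct provider **chain, struct provider *p)
{
    assert(chain != NULL && p != NULL);
    p->next = NULL;
    while (*chain != NULL)
        chain = &(*chain)->next;
    *chain = p;
}

/* Return the first provider in the chain that accepts the screen,
 * or NULL if none does. */
const struct provider *pick_provider(const struct provider *chain)
{
    for (; chain != NULL; chain = chain->next)
        if (chain->accepts_screen)
            return chain;
    return NULL;
}
```

In this model, appending a software provider with `push_back` leaves a hardware provider preferred, while `push_front` makes the software provider win whenever it accepts the screen. That matches the concern in the commit message about swrast becoming the most preferred provider.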
(In reply to comment #9)
> Ok, it turns out that I can eliminate this issue in xserver 1.13.0 if I
> revert two commits:
> ed6daa15a7dcf8dba930f67401f4c1c8ca2e6fac (mentioned above)
> bcbf95b1bafa6ffe724768b9309295e2fdb4b860

Not sure if you are saying both or either of these need to be reverted.

> Author: Jon TURNEY <jon.turney@dronecode.org.uk>
> Date: Thu Jul 12 00:36:10 2012 +0100
>
> Revert bogus GlxPushProvider() in commit a1d41e3
>
> a1d41e3 "Move extension initialisation prototypes into extinit.h"
> also includes a change to GlxExtensionInit to install the swrast GLX
> provider.
>
> Since b86aa74 "GLX: Insert swrast provider from GlxExtensionInit"
> already does this (correctly, by installing the swrast provider
> at the end of the chain, rather than at the beginning), and since this
> would seem to have the effect of making the swrast provider the most
> preferred provider, I'm guessing this wasn't intended.

I think this may be a bit of a red herring. I would guess with this commit reverted, the X server ends up using swrast, which perhaps masks the problem you are seeing.

(In reply to comment #10)
> (In reply to comment #9)
> > Ok, it turns out that I can eliminate this issue in xserver 1.13.0 if I
> > revert two commits:
> > ed6daa15a7dcf8dba930f67401f4c1c8ca2e6fac (mentioned above)
> > bcbf95b1bafa6ffe724768b9309295e2fdb4b860
>
> Not sure if you are saying both or either of these need to be reverted.

Both of those need to be reverted to prevent the segfault in 1.13.0. You get a segfault if either commit is still present (when I reverted only the first one it still segfaulted in 1.13.0, but not in git versions prior to the second commit, and if I build 1.13.0 without both of those commits then there is no error).

> I think this may be a bit of a red herring. I would guess with this commit
> reverted, the X server ends up using swrast, which perhaps masks the problem you
> are seeing.

Entirely possible, which is probably why the virtualbox drivers work as well. However, unless 1.12.4 was also stuck in swrast it seems like there is something else going on.

Note that I'm not suggesting that YOU should actually revert those commits or anything - just that I don't get the segfault without them. I guess I could try leaving xserver alone and start messing with the ati drivers instead (stock 1.13.0 with earlier builds of the drivers).

Does this problem still occur?

(In reply to comment #12)
> Does this problem still occur?

I haven't seen this in a long time, but it is just as likely because the software that was generating the errors made some substantial Qt/OpenGL changes. If you want to close this feel free, and if it comes up again I'll happily resubmit with updated info (I'm well past xorg 1.13 at this point).