Bug 24226 - libGL tries using direct mode when it should use indirect (only tries indirect if forced)
Status: NEW
Product: Mesa
Classification: Unclassified
Component: GLX
Version: 7.5
Hardware: Other All
Importance: medium normal
Assigned To: mesa-dev
Duplicates: 24590 28415
Depends on:
Blocks:
Reported: 2009-09-30 11:14 UTC by Jeremy Huddleston
Modified: 2011-10-06 10:46 UTC (History)

Attachments
glxinfo.txt (116.53 KB, text/plain)
2009-10-01 21:11 UTC, Jeremy Huddleston
no empy configs (6.97 KB, patch)
2009-10-01 23:50 UTC, Chia-I Wu

Description Jeremy Huddleston 2009-09-30 11:14:26 UTC
If I ssh to a remote box, and I try to run 'glxinfo' on the remote system, it fails unless I explicitly force indirect mode via LIBGL_ALWAYS_INDIRECT.

The problem is that without LIBGL_ALWAYS_INDIRECT set, glx_direct gets set to true in glxext.c:

---
   glx_direct = (getenv("LIBGL_ALWAYS_INDIRECT") == NULL);
   glx_accel = (getenv("LIBGL_ALWAYS_SOFTWARE") == NULL);

   /*
    ** Initialize the direct rendering per display data and functions.
    ** Note: This _must_ be done before calling any other DRI routines
    ** (e.g., those called in AllocAndFetchScreenConfigs).
    */
   if (glx_direct && glx_accel) {
      dpyPriv->dri2Display = dri2CreateDisplay(dpy);
      dpyPriv->driDisplay = driCreateDisplay(dpy);
   }
   if (glx_direct)
      dpyPriv->driswDisplay = driswCreateDisplay(dpy);
---

Then in AllocAndFetchScreenConfigs(), we get our visuals and fbconfigs via:
      getVisualConfigs(dpy, priv, i);
      getFBConfigs(dpy, priv, i);

but then they get thrashed by:
      if (psc->driScreen == NULL && priv->driswDisplay)
         psc->driScreen = (*priv->driswDisplay->createScreen) (psc, i, priv);


which is driCreateScreen in drisw_glx.c and ends up doing:
   psc->configs = driConvertConfigs(psc->core, psc->configs, driver_configs);

That leaves psc->configs set to NULL, which causes GetGLXPrivScreenConfig to return GLX_BAD_VISUAL, so glxinfo bails.
Comment 1 Chia-I Wu 2009-09-30 22:08:11 UTC
glxinfo creates a context that allows direct rendering, and libGL is capable of direct rendering (through drisw).  I think this is reasonable behavior.  To get indirect rendering, you can specify -i when executing glxinfo.

As for driConvertConfigs, it should not return NULL normally.  It returns NULL when the original psc->configs and driver_configs have no config in common.  Can you check what's in the original psc->configs?
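A minimal sketch of the "no config in common" case, for illustration only. The struct cfg type and the cfg_equal and keep_common helpers are hypothetical and heavily reduced; they are not the actual driConvertConfigs code, which works on __GLcontextModes with many more fields.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a GLX config. */
struct cfg {
    int rgb_bits;
    int depth_bits;
    int double_buffer;
};

/* Strict equality on every field (assumed here for illustration). */
static int cfg_equal(const struct cfg *a, const struct cfg *b)
{
    return a->rgb_bits == b->rgb_bits &&
           a->depth_bits == b->depth_bits &&
           a->double_buffer == b->double_buffer;
}

/* Copy into out[] every server config that has an equal driver config.
 * Returns the number kept; 0 corresponds to the "NULL list" case.
 * out[] must have room for n_server entries. */
static size_t keep_common(const struct cfg *server, size_t n_server,
                          const struct cfg *driver, size_t n_driver,
                          struct cfg *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_server; i++) {
        for (size_t j = 0; j < n_driver; j++) {
            if (cfg_equal(&server[i], &driver[j])) {
                out[kept++] = server[i];
                break;  /* one match is enough for this server config */
            }
        }
    }
    return kept;
}
```

If keep_common returns 0, the caller has nothing left to offer the application, which is the situation described in the original report.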
Comment 2 Jeremy Huddleston 2009-10-01 01:17:41 UTC
You miss the point.  This isn't about glxinfo, this is about *ALL* glx applications.  glxinfo is just an example.

You said, "glxinfo creates a context that allows direct rendering" ... but direct rendering is not available because the client is remote from the server.  In older versions of mesa (6.5 for sure; I'm not sure up to which version), it would detect a remote connection and use indirect.  This behavior seems to have regressed.

Now, when remote, it tries to use drisw, but it results in an empty set of visuals and fbconfigs as described by the codepath in my initial report.
Comment 3 Chia-I Wu 2009-10-01 20:29:39 UTC
I see your point.  I guess the old behavior was simply because there was no swrast_dri at the time.

As for swrast_dri, it has _direct_ access to the (pure software) OpenGL pipeline, and is thus chosen for direct rendering.  The fact that the xserver is remote is not taken into consideration.  I am not sure which behavior is desired/correct though.  Other people should have a better answer than I do.

The empty visual/fbconfig list you saw might be some other bug.  That's why I would like to know the original value of psc->configs.  I can run glxgears from a remote machine under "ssh -X" just fine, and it uses swrast_dri.  My remote machine runs mesa 7.5.1.
Comment 4 Jeremy Huddleston 2009-10-01 21:11:29 UTC
Created attachment 29991
glxinfo.txt

Here's glxinfo's output when run as a local client.

numvisuals was 800 before calling into drisw to prune them, and it looks the same as the output from the remote host when forcing INDIRECT.
Comment 5 Chia-I Wu 2009-10-01 23:50:01 UTC
Created attachment 29992
no empy configs

Do you have an nvidia card on your local machine?

driConvertConfigs filters out any visual/fbconfig that has no matching DRI config.  It could be that none of the visuals/fbconfigs reported by nvidia matches a DRI config (the matching rule is strict).

The patch (against git master) makes the conversion fail in that case and skips the failing DRI screen.  It should hopefully skip swrast_dri in your case.
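The intended effect of skipping a failing DRI screen can be sketched roughly as follows. This is not the attached patch: struct screen, create_screen_fn, and pick_screen are hypothetical names, and the two loaders only simulate a screen that fails to initialize and one that succeeds.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real libGL screen type is richer. */
struct screen { const char *name; };

typedef struct screen *(*create_screen_fn)(void);

/* Try each loader in priority order and return the first screen that
 * initializes.  NULL means every direct path failed, so the caller
 * would fall back to indirect rendering. */
static struct screen *pick_screen(const create_screen_fn *loaders, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (loaders[i] == NULL)
            continue;           /* this display type was never created */
        struct screen *s = loaders[i]();
        if (s != NULL)
            return s;           /* first working screen wins */
    }
    return NULL;
}

static struct screen swrast_screen = { "drisw" };

/* Simulates a loader whose config conversion found no match. */
static struct screen *create_failing(void) { return NULL; }

/* Simulates a loader that succeeds. */
static struct screen *create_swrast(void) { return &swrast_screen; }
```

With only failing loaders, pick_screen returns NULL and the indirect path remains available instead of the client erroring out.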

Can you help verify it?  I don't have any machine with nvidia cards.
Comment 6 Jeremy Huddleston 2009-10-02 00:02:05 UTC
Yes, I have an nVidia card on this machine, but it also happens when I have an ATI card on the server machine.  The vendor card information is abstracted away and we just query information straight from our OpenGL.framework, so the drivers are not directly involved.

Even if you had an nVidia card, that wouldn't help unless you were running OSX as well.

Can you tell me or point me to a spec that details the matching rule?  We just generate a series of configurations based on the reported details from OpenGL.framework, but it's possible that we are missing a few or can add a few others for compatibility with drisw.

http://cgit.freedesktop.org/xorg/xserver/tree/hw/xquartz/GL/visualConfigs.c

I'll test the patch and let you know.
Comment 7 Chia-I Wu 2009-10-02 00:43:07 UTC
There is no spec on the rules.  The visuals/fbconfigs reported by the server are converted to __GLcontextModes in mesa.  You can have a look at driConfigEqual in src/glx/x11/dri_common.c.  
Comment 8 Chia-I Wu 2009-10-02 00:57:36 UTC
I had a quick look at the link you gave.  It might be maxPbufferWidth/maxPbufferHeight that fails the matching test.  But there is no point in adjusting those two fields (and maybe some others) only to pass the test.
Comment 9 Jeremy Huddleston 2009-10-02 01:08:27 UTC
Excellent.  With that patch in place, we end up with AIGLX rather than erroring out.  Looks good to me.
Comment 10 Chia-I Wu 2009-10-08 03:10:44 UTC
Any comment on the patch?  IMO, the issue is a general one that can be seen on XQuartz and other non-DRI based X servers.
Comment 11 Jon TURNEY 2009-10-08 04:03:16 UTC
(In reply to comment #10)
> Any comment on the patch?  IMO, the issue is a general one that can be seen on
> XQuartz and other non-DRI based X servers.

From my experience working on a similar problem with the Cygwin/X DDX, I think the real problem is that the config matching code expects to exactly match bindToTexture and maxPbuffer with a value of -1 (don't care), hence if these are actually set to report our capabilities, no configs remain after the attempt to find the common configs.
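To make the -1 issue concrete, here is a small sketch under stated assumptions. struct caps and both comparison functions are hypothetical; caps_equal_wildcard is an illustrative alternative only and is not what the matching code does.

```c
#define DONT_CARE (-1)  /* the "don't care" value mentioned above */

/* Hypothetical reduced config holding only the two fields in question. */
struct caps {
    int bind_to_texture;
    int max_pbuffer_width;
};

/* Exact match: a driver value of -1 only matches a server value of -1. */
static int caps_equal_strict(const struct caps *server,
                             const struct caps *driver)
{
    return server->bind_to_texture == driver->bind_to_texture &&
           server->max_pbuffer_width == driver->max_pbuffer_width;
}

/* Illustrative alternative: a driver value of -1 matches anything. */
static int field_matches(int server, int driver)
{
    return driver == DONT_CARE || server == driver;
}

static int caps_equal_wildcard(const struct caps *server,
                               const struct caps *driver)
{
    return field_matches(server->bind_to_texture, driver->bind_to_texture) &&
           field_matches(server->max_pbuffer_width, driver->max_pbuffer_width);
}
```

A server that reports real capabilities, say { 1, 4096 }, fails the strict comparison against a driver config of { -1, -1 } but would pass the wildcard one.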

Your patch to fall back to indirect if we can't find any common configs makes sense, but I don't actually think that should be happening.

(In reply to the bug title)

It's currently a policy in libGL to use swrast in preference to indirect, unless forced, and swrast is direct (Comment #3)

It would be nice if for Xservers which can only do indirect acceleration, there was a way to cause local clients to automatically use the indirect path, but I'm not sure how that could be done cleanly.
Comment 12 Dan Nicholson 2009-10-08 14:03:26 UTC
(In reply to comment #11)
> 
> It would be nice if for Xservers which can only do indirect acceleration, there
> was a way to cause local clients to automatically use the indirect path, but
> I'm not sure how that could be done cleanly.

I might not be understanding the issue correctly, but you can build GLX to only support indirect rendering. Basically, the code needs to build without -DGLX_DIRECT_RENDERING. With configure, this is --disable-driglx-direct.
Comment 13 Chia-I Wu 2009-10-08 20:49:45 UTC
(In reply to comment #11)
> From my experience working on a similar problem with the Cygwin/X DDX, I think
> the real problem is that the config matching code expects to exactly match
> bindToTexture and maxPbuffer with a value of -1 (don't care), hence if these
> are actually set to report our capabilities, no configs remain after the
> attempt to find the common configs.
> Your patch to fall back to indirect if we can't find any common configs makes
> sense, but I don't actually think that should be happening.
IMO, driConfigEqual is doing fine.  It should look for an exact match.  I think GLX_DONT_CARE is only assigned to those attributes that are not common to visuals and fbconfigs, so that driConfigEqual can work without caring whether a __GLcontextModes came from a visual or an fbconfig.

In the patch, driConvertConfigs fails only when _none_ of the configs reported by the server has a matching dri config.  It should be quite safe.  But I am also starting to think that it should fail if _any_ of the configs reported by the server has no matching dri config...

> (In reply to the bug title)
> It's currently a policy in libGL to use swrast in preference to indirect,
> unless forced, and swrast is direct (Comment #3)
But only in some sense.  It is hard to say which interpretation is desirable.

> It would be nice if for Xservers which can only do indirect acceleration, there
> was a way to cause local clients to automatically use the indirect path, but
> I'm not sure how that could be done cleanly.
A config runs with reduced performance when GLX_CONFIG_CAVEAT reports GLX_SLOW_CONFIG.  It is not about direct or indirect though.
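The two failure policies mentioned earlier in this comment ("none" versus "any") can be compared with a small sketch. enum fail_policy and conversion_fails are hypothetical names used only for illustration; the patch itself implements only the first policy.

```c
#include <stddef.h>

enum fail_policy {
    FAIL_IF_NONE_MATCH,     /* fail only when no server config matches */
    FAIL_IF_ANY_UNMATCHED   /* fail as soon as one server config has no match */
};

/* matched[i] is nonzero if server config i has a matching DRI config.
 * Returns 1 if the conversion should be treated as failed. */
static int conversion_fails(const int *matched, size_t n,
                            enum fail_policy policy)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (matched[i])
            hits++;
    }
    if (policy == FAIL_IF_NONE_MATCH)
        return hits == 0;
    return hits < n;
}
```

For a server where only some configs match, the first policy keeps the DRI screen and the second one skips it.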
Comment 14 Chia-I Wu 2009-10-08 21:00:10 UTC
(In reply to comment #12)
> I might not be understanding the issue correctly, but you can build GLX to only
> support indirect rendering. Basically, the code needs to build without
> -DGLX_DIRECT_RENDERING. With configure, this is --disable-driglx-direct.
The question is, when libGL.so is compiled with direct rendering support, how to decide if direct rendering is viable at runtime?  The connection to xserver may be remote or local.  The configs from xserver and dri driver may or may not match.  How do they affect the decision?
Comment 15 George Sapountzis 2010-03-27 05:56:07 UTC
(In reply to comment #14)
> The question is, when libGL.so is compiled with direct rendering support, how
> to decide if direct rendering is viable at runtime?  The connection to xserver
> may be remote or local.  The configs from xserver and dri driver may or may not
> match.  How do they affect the decision?
> 

I don't think there is a simple way to decide. One way to answer this is to change glxext.c from:

   if (glx_direct)
      dpyPriv->driswDisplay = driswCreateDisplay(dpy);

to:

   if (glx_direct && !glx_accel)
      dpyPriv->driswDisplay = driswCreateDisplay(dpy);

and see if people complain :-(

It will mainly affect developers, who usually set LIBGL_ALWAYS_SOFTWARE explicitly. Also, wikis won't have to be updated because they usually instruct people to set the envvar.
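The effect of the proposed condition can be tabulated with a short sketch. The USE_* flags and the two functions are hypothetical; they only mirror the if-conditions quoted in this report, where glx_direct and glx_accel are true when the corresponding environment variable is unset.

```c
/* Hypothetical bit flags for the three display types. */
enum { USE_DRI2 = 1, USE_DRI = 2, USE_DRISW = 4 };

/* Conditions as they appear in glxext.c in this report. */
static unsigned displays_current(int glx_direct, int glx_accel)
{
    unsigned mask = 0;
    if (glx_direct && glx_accel)
        mask |= USE_DRI2 | USE_DRI;
    if (glx_direct)
        mask |= USE_DRISW;      /* drisw is always tried for direct */
    return mask;
}

/* Conditions with the change proposed in this comment. */
static unsigned displays_proposed(int glx_direct, int glx_accel)
{
    unsigned mask = 0;
    if (glx_direct && glx_accel)
        mask |= USE_DRI2 | USE_DRI;
    if (glx_direct && !glx_accel)
        mask |= USE_DRISW;      /* drisw only with LIBGL_ALWAYS_SOFTWARE set */
    return mask;
}
```

With neither variable set (glx_direct = 1, glx_accel = 1), the current code also creates the drisw display, while the proposed code does not, so a failed DRI/DRI2 setup would fall through to indirect rendering.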
Comment 16 Michel Dänzer 2010-06-10 03:46:29 UTC
*** Bug 28415 has been marked as a duplicate of this bug. ***
Comment 17 Jeremy Huddleston 2010-10-14 12:46:22 UTC
More users are complaining about this... ping...
Comment 18 Nathan Kidd 2010-10-14 13:52:26 UTC
1) To further document "lots of people affected by this" (and I'm well aware I haven't gotten off my posterior to fix this either):

SuSE brought this up last year[1] and ended up shipping with this patch[2] (a band-aid):

-    if (glx_direct)
-	dpyPriv->driswDisplay = driswCreateDisplay(dpy);
+//    if (glx_direct)
+//	dpyPriv->driswDisplay = driswCreateDisplay(dpy);

[1] http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg06684.html
[2] https://bugzilla.novell.com/show_bug.cgi?id=469280


2) Bear in mind that making the visuals match exactly merely allows the software renderer to be used; it doesn't allow indirect GLX (check your GLX server strings carefully).

LIBGL_ALWAYS_INDIRECT is quite a pain (more so for XDMCP sessions). In my experience client-side rendering is what most people want if the display is remote.  If the network load is too heavy for client-side, the render load will very likely be too heavy for the software rasterizer. (And apps can use display lists to alleviate the network load.)
Comment 19 Jeremy Huddleston 2011-10-06 10:46:16 UTC
*** Bug 24590 has been marked as a duplicate of this bug. ***