Bug 27333

Summary: Problem with libopencascade - salome-platform
Product: Mesa
Component: Drivers/DRI/i965
Reporter: Paulo César Pereira de Andrade <pcpa>
Assignee: Eric Anholt <eric>
Status: RESOLVED MOVED
Severity: critical
Priority: medium
CC: geromanas, przanoni
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=30509
Attachments: Mesa-7.8.1-salome.patch

Description Paulo César Pereira de Andrade 2010-03-26 12:39:49 UTC
This problem is reproduced with the same procedure as in bug #27332.

I am working on a "salome" package for mandriva:
http://www.salome-platform.org, that uses opencascade in the "geometry" module.
  It appears to work correctly in software rendering (i.e. force it to
use swraster, usually by switching the Xorg module to fbdev or vesa as
a "fast hack"), but with radeon drm it doesn't work correctly.

  Running the command:

% MESA_DEBUG=FP LIBGL_DEBUG=verbose runSalome

then loading a sample file and attaching gdb to the pid of SALOME_Session_Server,
when I select the geometry module (which causes it to load the opencascade
module/library), it crashes before reporting any debug output from mesa:

(gdb) c
Continuing.
[New Thread 0xb1717b70 (LWP 28308)]

Program received signal SIGSEGV, Segmentation fault.
0xa8e003d0 in _glapi_set_dispatch () from /usr/lib/dri/i965_dri.so
(gdb) bt
#0  0xa8e003d0 in _glapi_set_dispatch () from /usr/lib/dri/i965_dri.so
#1  0xa8df3e93 in _glapi_set_dispatch () from /usr/lib/dri/i965_dri.so
#2  0xa8df6948 in _glapi_set_dispatch () from /usr/lib/dri/i965_dri.so
#3  0xa8e581b3 in _glapi_set_dispatch () from /usr/lib/dri/i965_dri.so
#4  0xa9536ee8 in TelInitWS (ws=203, w=1008, h=406, bgcolr=0.600000024, bgcolg=0.600000024, bgcolb=0.600000024)
    at ../../../src/OpenGl/OpenGl_telem_util.c:433
#5  0xa9534587 in call_subr_open_ws (aview=0xbfcf0920) at ../../../src/OpenGl/OpenGl_subrvis.c:285
#6  0xa954409c in call_togl_view (aview=0xbfcf0920) at ../../../src/OpenGl/OpenGl_togl_view.c:50
#7  0xa94fa297 in OpenGl_GraphicDriver::View (this=0x2001cfcc, ACView=<value optimized out>)
    at ../../../src/OpenGl/OpenGl_GraphicDriver_7.cxx:424
#8  0xaf544d3b in Visual3d_View::SetWindow (this=0x9079a7c, AWindow=...) at ../../../src/Visual3d/Visual3d_View.cxx:501
#9  0xaf5293af in V3d_View::SetWindow (this=0x8efbbc4, TheWindow=...) at ../../../src/V3d/V3d_View.cxx:485
#10 0xaf6a00fc in OCCViewer_ViewPort3d::attachWindow(Handle_V3d_View const&, Handle_Aspect_Window const&) ()
   from /usr/lib/salome/libOCCViewer.so.0

  I don't see anything suspicious, i.e.:
(gdb) frame 4
#4  0xa9536ee8 in TelInitWS (ws=203, w=1008, h=406, bgcolr=0.600000024, bgcolg=0.600000024, bgcolb=0.600000024)
    at ../../../src/OpenGl/OpenGl_telem_util.c:433
433                 glClear(GL_COLOR_BUFFER_BIT);
(gdb) l
428             }
429             else
430             {
431                 glDrawBuffer(GL_FRONT_AND_BACK);
432                 glClearColor(bgcolr, bgcolg, bgcolb, ( float )1.0);
433                 glClear(GL_COLOR_BUFFER_BIT);
434                 glDrawBuffer(GL_BACK);
435             }
436     #else
437             glDrawBuffer(GL_FRONT_AND_BACK);


  If I tell gdb to continue, the interface will show a popup about a
segfault at address 34, and then some debug from mesa is printed:

libGL: OpenDriver: trying /usr/lib/dri/i965_dri.so
th. 3036104400 - Trace SALOME_Session_Server.cxx [97] : Debug: connect
libGL: OpenDriver: trying /usr/lib/dri/i965_dri.so
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/pcpa/.drirc: No such file or directory.
th. 3036104400 - Trace SALOME_Session_Server.cxx [100] : Warning: QWidget::repaint: Recursive repaint detected

  But it doesn't create a display window or draw anything; it only gives
errors about a segfault or an attempt to access a null object.
Comment 1 Paulo Zanoni 2010-05-04 13:07:05 UTC
I tried to reproduce the bug with the "default" stuff in the distro (not forcing anything, but with runSalome still doing LIBGL_ALWAYS_INDIRECT=true).

Distro is using mesa 7.8.1, x11-server 1.7.7, drm 2.4.20, kernel 2.6.33, intel 2.11.0

This is the backtrace I got:


#0  intel_region_buffer (intel=0x2403ee0, region=0x0, flag=2)
    at intel_regions.c:498
#1  0x00007fbb6301ba56 in intelClearWithBlit (ctx=0x2403ee0, mask=3)
    at intel_blit.c:268
#2  0x00007fbb6301434a in intelClear (ctx=0x2403ee0, 
    mask=<value optimized out>) at intel_clear.c:169
#3  0x00007fbb6456b293 in __glXDisp_Render (cl=<value optimized out>, 
    pc=0x20367b8 "\b") at glxcmds.c:1823
#4  0x00007fbb6456f4be in __glXDispatch (client=0x20363e0) at glxext.c:578
#5  0x000000000043886c in Dispatch () at dispatch.c:439
#6  0x000000000042271c in main (argc=1, argv=0x7d9408, 
    envp=<value optimized out>) at main.c:285


The segfault happens because at intel_regions.c:498 region=0x0 and we have this:
   if (region->pbo) {
Comment 2 Paulo César Pereira de Andrade 2010-05-04 15:12:26 UTC
  Just for clarification: I asked Paulo Zanoni to look at it because my
previous workaround of setting LIBGL_ALWAYS_INDIRECT=true to get a
working package now causes the X Server to crash...

  Without LIBGL_ALWAYS_INDIRECT set, the behavior now appears to be
very similar to that of the ati driver, which always triggers a
SIGFPE, but in the client code, which handles it... And the root cause
appears to be the same; at least the same mesa debug output is visible:

Mesa: User error: GL_INVALID_ENUM in glIsEnabled(0xb72)
Mesa: User error: GL_INVALID_ENUM in glEnable(0xb72)

https://bugs.freedesktop.org/show_bug.cgi?id=27332 describes the problem
with the ati driver.

(changing Importance to highest because now the package is unusable with
the intel driver)
Comment 3 Paulo César Pereira de Andrade 2010-05-07 13:11:32 UTC
Created attachment 35503
Mesa-7.8.1-salome.patch

  I suggested applying this patch in Mandriva, based on the backtrace Pzanoni
reported, but any feedback about it is welcome.

  The patch corrects the problem, and at least on the computer I tested, it
did not cause problems with multiple outputs either.
Comment 4 Paulo César Pereira de Andrade 2010-05-08 05:12:19 UTC
  I will check on Monday whether opencascade sources are available for
the prebuilt salome binaries (I asked in the salome-platform forum for
some hints just after I got a working package, but got no responses so far:
 http://www.salome-platform.org/forum/forum_12/459079973). I am
almost sure they use OSMesa, as the problem doesn't happen there, but
the binaries are for a much older Mandriva distro and require installing
some old packages/libraries to get them working in Mandriva cooker.

  Changing priority because the intel driver is very common, and the
current version makes the salome package almost unusable (almost, because
only the geometry module has the problem; but I understand it is not a
"common" package, nor does it have many users...)
Comment 5 Jesse Barnes 2010-05-10 11:32:28 UTC
Sounds like a DRI driver bug...
Comment 6 Paulo César Pereira de Andrade 2010-05-12 13:08:54 UTC
  BTW, the attached patch was applied to mandriva packages, and the
related mandriva bug marked as resolved/fixed.

https://qa.mandriva.com/show_bug.cgi?id=59084
Comment 7 Magnus Kessler 2010-08-17 04:47:45 UTC
A crash in intel_region_buffer can still be obtained with the latest DRI driver as of 2010-08-16 (post glsl2 merge) through kwin-4.5.0 when full-screen OpenGL applications (e.g. FlightGear's fgfs) exit.

I got this backtrace:

[KCrash Handler]
#6  intel_region_buffer (intel=0x2dabaf0, region=0x0, flag=2) at intel_regions.c:505
#7  0x00007fa97ad44ab9 in intelClearWithBlit (ctx=0x2dabaf0, mask=2) at intel_blit.c:266
#8  0x00007fa97ad46c3a in intelClear (ctx=0x2dabaf0, mask=<value optimized out>) at intel_clear.c:173
#9  0x00007fa9918a2bb5 in KWin::SceneOpenGL::paintBackground (this=<value optimized out>, region=<value optimized out>) at /usr/src/debug/kde-base/kwin-4.5.0/kwin-4.5.0/kwin/scene_opengl.cpp:892
#10 0x00007fa99189a5ce in KWin::Scene::paintGenericScreen (this=0x21f1170, orig_mask=32) at /usr/src/debug/kde-base/kwin-4.5.0/kwin-4.5.0/kwin/scene.cpp:187
#11 0x00007fa9918990ca in KWin::Scene::finalPaintScreen (this=0x21f1170, mask=32, region=<value optimized out>, data=<value optimized out>)
    at /usr/src/debug/kde-base/kwin-4.5.0/kwin-4.5.0/kwin/scene.cpp:177
#12 0x00007fa9918afd6f in KWin::EffectsHandlerImpl::paintScreen (this=<value optimized out>, mask=32, region=<value optimized out>, data=...)
Comment 8 Kristian Høgsberg 2010-10-01 05:54:35 UTC
Is this still present in master?
Comment 9 Paulo César Pereira de Andrade 2010-10-18 12:40:04 UTC
(In reply to comment #8)
> Is this still present in master?

  Sorry for the delay. I was waiting for the xorg/mesa packages to be
updated, so this is not git master; sorry, I have not been following
Xorg/Mesa closely for some time. The problem still happens with these packages:

$ rpm -q x11-server-xorg x11-driver-video-intel mesa libdrm2
x11-server-xorg-1.9.0.902-1mdv2011.0
x11-driver-video-intel-2.13.0-4mdv2011.0
mesa-7.9-1mdv2011.0
libdrm2-2.4.22-1mdv2011.0

[ I guess I will need to rewrite the patch at some point so that users with
an intel card can use the salome package; actually, the last updates forced
me to run my own computer with the vesa driver, as the ati driver
broke for me, segv at startup...]

  After installing the -debug packages, and attaching gdb to the Xserver,
the backtrace is:
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
intel_region_buffer (intel=0xbf246f8, region=0x0, flag=2)
    at intel_regions.c:514
514        if (region->pbo) {
(gdb) bt
#0  intel_region_buffer (intel=0xbf246f8, region=0x0, flag=2)
    at intel_regions.c:514
#1  0xb6c0c95a in intelClearWithBlit (ctx=0xbf246f8, mask=3)
    at intel_blit.c:266
#2  0xb6c0ee68 in intelClear (ctx=0xbf246f8, mask=3) at intel_clear.c:173
#3  0xb6c87638 in _mesa_Clear (mask=<value optimized out>) at main/clear.c:179
#4  0xb7251037 in ?? () from /usr/lib/xorg/modules/extensions/libglx.so
#5  0xb727c659 in ?? () from /usr/lib/xorg/modules/extensions/libglx.so
#6  0xb727f2bf in ?? () from /usr/lib/xorg/modules/extensions/libglx.so
#7  0x0806f777 in ?? ()
#8  0x080625e5 in _start ()
(gdb) p region
$1 = (struct intel_region *) 0x0


  If I apply this pseudo patch:
--- /usr/bin/runSalome
-export LIBGL_ALWAYS_INDIRECT=true
+#export LIBGL_ALWAYS_INDIRECT=true

it shows a dialog message about a segmentation violation at address 0x30
(which happens to be the offset of the pbo field), and pressing OK just
keeps showing another "Attempt to access null object" dialog. So the
"solution" I found the first time would still require remaking my
mesa 7.8.1 patch.
Comment 10 Paulo César Pereira de Andrade 2010-11-17 14:42:28 UTC
Tested with a build of salome version 5.1.4 and it no longer crashes
when using LIBGL_ALWAYS_INDIRECT. The only other difference is x11-server
1.9.2. But it still segfaults if LIBGL_ALWAYS_INDIRECT is not set.
Also, the image in the opencascade viewer "keeps jumping" most of the
time, like when attempting to make a selection.
Comment 11 Eric Anholt 2011-01-07 18:30:42 UTC
I'm pretty sure this is fixed by:

commit 94ed481131e4f5ba2c83fe7f3b12715661660133
Author: Eric Anholt <eric@anholt.net>
Date:   Sun Jan 2 17:04:57 2011 -0800

    intel: Handle forced swrast clears before other clear bits.
    
    Fixes a potential segfault on a non-native depthbuffer, and possible
    accidental swrast fallback on extra color buffers.

I couldn't get the app installed to try it myself.
Comment 12 Paulo César Pereira de Andrade 2011-01-09 09:31:47 UTC
(In reply to comment #11)
> I'm pretty sure this is fixed by:
> 
> commit 94ed481131e4f5ba2c83fe7f3b12715661660133
> Author: Eric Anholt <eric@anholt.net>
> Date:   Sun Jan 2 17:04:57 2011 -0800
> 
>     intel: Handle forced swrast clears before other clear bits.
> 
>     Fixes a potential segfault on a non-native depthbuffer, and possible
>     accidental swrast fallback on extra color buffers.
> 
> I couldn't get the app installed to try it myself.

  Thanks. Looking at the patch, it looks like it will
correct the issue. In that case, I will probably remove/comment out
the "export LIBGL_ALWAYS_INDIRECT=true" from the script. I will
test it as soon as possible. Updated Mandriva mesa packages
should be available soon.

  To test the application you would probably want to use
Mandriva Cooker, as I don't know if any other distro packages
salome, and afaik the binaries from www.salome-platform.org
have some patches to render to an offscreen pixmap, or something
related; the sources are available, but you need to register a free
account to download them, and I only had a quick look at them some
months ago (there are plenty of other things to look at when
packaging such a large package :-)...
Comment 13 Paulo César Pereira de Andrade 2011-01-12 10:35:29 UTC
(In reply to comment #11)
> I'm pretty sure this is fixed by:
> 
> commit 94ed481131e4f5ba2c83fe7f3b12715661660133
> Author: Eric Anholt <eric@anholt.net>
> Date:   Sun Jan 2 17:04:57 2011 -0800
> 
>     intel: Handle forced swrast clears before other clear bits.
> 
>     Fixes a potential segfault on a non-native depthbuffer, and possible
>     accidental swrast fallback on extra color buffers.
> 
> I couldn't get the app installed to try it myself.

I tested with:

$ rpm -q x11-server-xorg libdri-drivers mesa
x11-server-xorg-1.9.3-3mdv2011.0.i586
libdri-drivers-7.10-1-mdv2011.0.i586
mesa-7.10-1-mdv2011.0.i586

(the mesa and dri rpms are experimental rpms from pzanoni)

And it would always fail with a weird "Unknwon Exception" dialog
message, and attaching gdb to all salome related processes would
not help. So, it was failing with some error code somewhere that
would not trigger a signal.

But, when doing some minor testing with driconf options, if enabling
"Enable flush batchbuffer after each draw call" it crashes the
X Server, with the backtrace:

Program received signal SIGSEGV, Segmentation fault.
intel_region_buffer (intel=0xbb363d0, region=0x0, flag=2)
    at intel_regions.c:514
514        if (region->pbo) {
(gdb) bt
#0  intel_region_buffer (intel=0xbb363d0, region=0x0, flag=2)
    at intel_regions.c:514
#1  0xb6b383fa in intelClearWithBlit (ctx=0xbb363d0, mask=3)
    at intel_blit.c:262
#2  0xb6b3ad5b in intelClear (ctx=0xbb363d0, mask=3) at intel_clear.c:174
#3  0xb6d4e2f8 in _mesa_Clear (mask=<value optimized out>) at main/clear.c:241
#4  0xb71c9fe7 in __glXDisp_Clear (pc=0xbb15fc4 "") at indirect_dispatch.c:1335
#5  0xb71f5609 in __glXDisp_Render (cl=0xbb0c1e8, pc=<value optimized out>)
    at glxcmds.c:1847
#6  0xb71f826f in __glXDispatch (client=0xbb0c110) at glxext.c:600
#7  0x0806f777 in Dispatch () at dispatch.c:432
#8  0x080625e5 in main (argc=8, argv=0xbf8f4234, envp=0xbf8f4258) at main.c:291

Weirdly enough, removing ~/.drirc or changing the driconf option
back does not revert it to the state of not crashing the X Server;
rebooting, powering down, etc. do not revert it either (I guess it
may need powering down and leaving it off for a significant time...).

I tested Mesa 7.10 on a x86_64 with an ati card and the salome package
works.

If I remake the Mesa-7.8.1-salome.patch (to apply on 7.10) and rebuild the package,
it works again. But I need to set "export LIBGL_ALWAYS_INDIRECT=true"
or it fails with SIGFPEs when loading a sample file/project.
Comment 14 Paulo Zanoni 2011-01-24 11:51:46 UTC
(In reply to comment #13)
> Program received signal SIGSEGV, Segmentation fault.
> intel_region_buffer (intel=0xbb363d0, region=0x0, flag=2)
>     at intel_regions.c:514
> 514        if (region->pbo) {
> (gdb) bt
> #0  intel_region_buffer (intel=0xbb363d0, region=0x0, flag=2)
>     at intel_regions.c:514
> #1  0xb6b383fa in intelClearWithBlit (ctx=0xbb363d0, mask=3)
>     at intel_blit.c:262
> #2  0xb6b3ad5b in intelClear (ctx=0xbb363d0, mask=3) at intel_clear.c:174
> #3  0xb6d4e2f8 in _mesa_Clear (mask=<value optimized out>) at main/clear.c:241
> #4  0xb71c9fe7 in __glXDisp_Clear (pc=0xbb15fc4 "") at indirect_dispatch.c:1335
> #5  0xb71f5609 in __glXDisp_Render (cl=0xbb0c1e8, pc=<value optimized out>)
>     at glxcmds.c:1847
> #6  0xb71f826f in __glXDispatch (client=0xbb0c110) at glxext.c:600
> #7  0x0806f777 in Dispatch () at dispatch.c:432
> #8  0x080625e5 in main (argc=8, argv=0xbf8f4234, envp=0xbf8f4258) at main.c:291
> 

I was able to see this backtrace on a SandyBridge, mesa 7.10, when playing extremetuxracer. Maybe the driver developers will find it easier to debug with etracer. Please see bug #33422
Comment 15 Eric Anholt 2012-10-03 19:49:58 UTC
The code appearing in the last backtrace is gone -- do you still have problems?

Also, seriously, stop setting LIBGL_ALWAYS_INDIRECT.  We don't handle bugs when you do that.
Comment 16 Paulo César Pereira de Andrade 2012-10-04 16:51:27 UTC
(In reply to comment #15)
> The code appearing in the last backtrace is gone -- do you still have
> problems?

  I do not know whether, or how soon, I will be able to rebuild
the dependencies to have it working again. It should be ok to
close the bug now.

> Also, seriously, stop setting LIBGL_ALWAYS_INDIRECT.  We don't handle bugs
> when you do that.

  At that time, I used it as a workaround to just display a
dialog message about a segmentation fault at address 0x30
as described in #c9, instead of crashing the X Server.
