Bug 64810 - EGL/Gles/Weston give segfault on RADEONSI with egl_gallium.so
Summary: EGL/Gles/Weston give segfault on RADEONSI with egl_gallium.so
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium enhancement
Assignee: Default DRI bug account
Duplicates: 68273
Reported: 2013-05-21 04:58 UTC by Rafael Castillo
Modified: 2014-01-15 23:17 UTC

Attachments
fix multiple symbols bug (281.14 KB, patch)
2013-08-28 02:25 UTC, tux_mind
move shared data on the top (5.47 KB, patch)
2013-08-29 13:35 UTC, tux_mind
link libradeon only once (11.73 KB, patch)
2013-09-10 11:44 UTC, Johannes Obermayr
new attemp with LIBADD += $(ELF_LIB) (13.91 KB, patch)
2013-09-18 02:46 UTC, Johannes Obermayr
New version. (107.44 KB, patch)
2013-09-20 01:09 UTC, Johannes Obermayr

Description Rafael Castillo 2013-05-21 04:58:23 UTC
Just reporting this since I can't see it in the TODO (I assume these parts aren't ready yet), so this is just for information.

GLX and OpenGL work well enough.

Hardware
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770 GHz Edition]
Linux localhost 3.10.0-rc1 #1 SMP PREEMPT Sun May 12 11:20:57 VET 2013 x86_64 AMD FX(tm)-6100 Six-Core Processor AuthenticAMD GNU/Linux

Software

mesa/libdrm/glamor/ddx/llvm/wayland/weston, all built daily from git

Issue

EGL is not working beyond eglinfo; eglgears and eglscreen_XX segfault.
GLES is not working at all; any related command segfaults, either es1_ or es2_.
Wayland/Weston consequently doesn't work either.

GDB reports:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4346cde in r600_init_query_functions () from /usr/lib64/egl/egl_gallium.so
Comment 1 Michel Dänzer 2013-05-22 11:15:27 UTC
Does setting the environment variable EGL_DRIVER=egl_dri2 help?
Comment 2 Rafael Castillo 2013-05-22 23:28:20 UTC
Yes, setting this environment variable seems to work fine.

The only thing is that Weston gives this warning; I'm not sure whether it is important:

EGL_EXT_buffer_age not supported. Performance could be affected
Comment 3 Michel Dänzer 2013-05-23 07:33:19 UTC
I think the problem is that egl_gallium.so links both radeonsi and r600g, which have some conflicting symbols.
Comment 4 Michel Dänzer 2013-05-23 07:35:45 UTC
BTW, if you don't need OpenVG support, building Mesa without --enable-gallium-egl (or with --disable-gallium-egl) might avoid this problem.
Comment 5 Rafael Castillo 2013-05-23 14:51:15 UTC
Well, I'm using Gentoo, so /etc/env.d/00egl did the trick to make it global, and it seems to be working that way (kwin_gles works fine now).

should i close the report? 

many thanks for your help

By the way, I hit this other bug, https://bugzilla.kernel.org/show_bug.cgi?id=58621; maybe you know someone who could check it out or give me some advice on how to debug it further?
Comment 6 Michel Dänzer 2013-05-23 15:31:38 UTC
(In reply to comment #5)
> should i close the report? 

No. It's just a workaround, not a fix.
Comment 7 Laurent carlier 2013-08-19 13:17:22 UTC
*** Bug 68273 has been marked as a duplicate of this bug. ***
Comment 8 Johannes Obermayr 2013-08-22 14:55:41 UTC
(In reply to comment #3)
> I think the problem is that egl_gallium.so links both radeonsi and r600g,
> which have some conflicting symbols.

Maybe this patch helps:
https://github.com/jobermayr/mesa/commit/27db605
Comment 9 tux_mind 2013-08-27 22:00:42 UTC
Hello there, I'm having the same issue, and I recompiled Mesa with debugging symbols.
I'm using mesa-9.2.0_rc1 on Gentoo.
I attached gdb and found that the issue is at src/gallium/drivers/r600/r600_query.c:743.

735     void r600_init_query_functions(struct r600_context *rctx)
736     {
737             rctx->context.create_query = r600_create_query;
738             rctx->context.destroy_query = r600_destroy_query;
739             rctx->context.begin_query = r600_begin_query;
740             rctx->context.end_query = r600_end_query;
741             rctx->context.get_query_result = r600_get_query_result;
742
743             if (rctx->screen->info.r600_num_backends > 0)

Breaking in gdb on line 743, I got this: http://pastebin.com/A4JANB5F

As you can see, rctx->screen is NULL (0x0). Can you please fix that?

I hope this helps.
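
One plausible reading of the NULL rctx->screen above, assuming the symbol clash suggested in comment 3: if radeonsi code ends up calling the r600g copy of a same-named function, that function reads the context using the r600g struct layout. The sketch below is only a toy model of that effect. Both struct layouts and every name ending in _sketch are hypothetical and are not Mesa's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the common gallium context header. */
struct pipe_context_sketch {
    void *priv;
};

/* Hypothetical r600g-style layout: screen directly follows the base. */
struct r600g_context_sketch {
    struct pipe_context_sketch context;
    void *screen;
    void *ws;
};

/* Hypothetical radeonsi-style layout: same field names, different order. */
struct si_context_sketch {
    struct pipe_context_sketch context;
    void *ws;
    void *screen;
};

/* Models a function compiled against the r600g layout: it fetches
 * "screen" from the r600g offset, whatever object it is handed. */
void *screen_via_r600g_layout(const void *ctx)
{
    void *screen;
    memcpy(&screen,
           (const char *)ctx + offsetof(struct r600g_context_sketch, screen),
           sizeof(screen));
    return screen;
}

/* Builds a radeonsi-style context whose screen is set but whose ws is
 * still NULL, then reads it through the r600g-layout function. */
void *mismatched_read_sketch(void *real_screen)
{
    struct si_context_sketch sctx;
    memset(&sctx, 0, sizeof(sctx));
    sctx.ws = NULL;
    sctx.screen = real_screen;
    assert(sctx.screen != NULL);   /* the owning driver sees a valid screen */
    return screen_via_r600g_layout(&sctx);
}
```

In this toy model, mismatched_read_sketch() hands back the ws slot instead of screen, so the caller sees NULL even though the radeonsi-style context was filled in correctly.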
Comment 10 tux_mind 2013-08-28 01:08:02 UTC
(In reply to comment #3)
> I think the problem is that egl_gallium.so links both radeonsi and r600g,
> which have some conflicting symbols.

you're right!

http://pastebin.com/Zq3NDDeX

i'm patching mesa 9.2.0_rc1 to get this working.
Comment 11 tux_mind 2013-08-28 02:25:24 UTC
Created attachment 84756
fix multiple symbols bug
Comment 12 Michel Dänzer 2013-08-28 07:51:21 UTC
(In reply to comment #11)
> fix multiple symbols bug

Thanks for the patch, but I think it's more invasive than necessary. We'd like to retain the possibility of sharing code between radeonsi and r600g in the future.

Can you try if Johannes' patch from comment 8 fixes the problem?
Comment 13 tux_mind 2013-08-29 13:35:31 UTC
Created attachment 84840
move shared data on the top

just a hint
Comment 14 tux_mind 2013-08-29 13:40:46 UTC
(In reply to comment #12)
> Can you try if Johannes' patch from comment 8 fixes the problem?

I tried to use the relevant part of the patch:
I added '__attribute__((visibility("default")))' in front of r600_init_query_functions,
but it didn't work.

I'm not a Mesa developer, but I think you should do something like this:
edit the shared structs so that the shared data is at the top (see attachment 84840);
find every shared function and check that the data it accesses is in the shared part;
if it is not, you have to do an "if(sizeof(arg) == sizeof(struct r600_struct))" to determine how to read/write that data.

A cleaner solution would be to make a folder with all the shared code (functions and structs) and use hardware-specific wrapper functions.
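
The "shared data at the top plus hardware-specific wrappers" idea can be sketched roughly as follows. This is a minimal illustration with hypothetical struct and function names (everything ending in _sketch), not the layout of attachment 84840 or of Mesa itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared part: the only fields common code may touch. */
struct radeon_common_sketch {
    int num_backends;
};

/* Each hypothetical driver context embeds the shared part first,
 * then adds its own private fields. */
struct r600g_ctx_sketch {
    struct radeon_common_sketch common;   /* must stay the first member */
    int r600g_private;
};

struct si_ctx_sketch {
    struct radeon_common_sketch common;   /* must stay the first member */
    double si_private;
};

/* Shared code only ever sees the common part, so it cannot depend on
 * either driver's private layout. */
int common_has_backends_sketch(const struct radeon_common_sketch *c)
{
    assert(c != NULL);
    return c->num_backends > 0;
}

/* Hardware-specific wrappers pass the embedded common struct. */
int r600g_has_backends_sketch(const struct r600g_ctx_sketch *ctx)
{
    return common_has_backends_sketch(&ctx->common);
}

int si_has_backends_sketch(const struct si_ctx_sketch *ctx)
{
    return common_has_backends_sketch(&ctx->common);
}
```

Note that sizeof is evaluated at compile time from the static type of its operand, so a run-time check such as sizeof(arg) == sizeof(struct ...) cannot tell the two contexts apart. If shared code really had to branch per driver, an explicit tag field in the shared part would be needed instead.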

happy developing guys :)
Comment 15 Johannes Obermayr 2013-08-29 21:55:38 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > Can you try if Johannes' patch from comment 8 fixes the problem?
> 
> I tried to use the relevant part of the patch:
> I added '__attribute__((visibility("default")))' in front of
> r600_init_query_functions,
> but it didn't work.

No. The relevant part of my patch is to build libllvmradeon shared again.

It seems somebody ignored my warning:
https://bugs.freedesktop.org/show_bug.cgi?id=62226#c12
in this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=914d797

> 
> i'm not a mesa developer but [...]

you should try this patch:
https://github.com/jobermayr/mesa/commit/619c251
Comment 16 tux_mind 2013-08-29 23:51:40 UTC
I applied your patch and got a missing mesalibdir error in automake.

added this:
dnl Where to install internal libraries
mesalibdir="\$(libdir)/mesa-${VERSION}"
AC_SUBST([mesalibdir])

to configure.ac to get automake to work.

After a bit of compiling I get:
libtool: link: x86_64-pc-linux-gnu-gcc -shared  -fPIC -DPIC  .libs/libllvmradeon9.2.0-rc1_la-radeon_uvd.o .libs/libllvmradeon9.2.0-rc1_la-radeon_setup_tgsi_llvm.o .libs/libllvmradeon9.2.0-rc1_la-radeon_llvm_emit.o .libs/libllvmradeon9.2.0-rc1_la-radeon_llvm_util.o  -Wl,--whole-archive ../../../../src/gallium/auxiliary/.libs/libgallium.a -Wl,--no-whole-archive  -L/usr/lib64/llvm -lz -lpthread -lffi -ldl -lm -Wl,--as-needed -lrt -lelf  -march=corei7 -O0 -Wl,--no-undefined -Wl,-R -Wl,/usr/lib64/llvm -Wl,-O1   -Wl,-soname -Wl,libllvmradeon9.2.0-rc1.so -o .libs/libllvmradeon9.2.0-rc1.so
.libs/libllvmradeon9.2.0-rc1_la-radeon_setup_tgsi_llvm.o: In function `tgsi2llvmtype':
/var/tmp/portage/media-libs/mesa-9.2.0_rc1/work/Mesa-9.2.0-rc1/src/gallium/drivers/radeon/radeon_llvm.h:139: undefined reference to `LLVMInt32TypeInContext'

followed by hundreds of:
undefined reference to `LLVM$whatever'

Probably a -lllvm is missing when linking the shared object,
but I read that the LLVM linker flags are something special and need to be taken from the llvm-config --ldflags output.
After a little search in the Mesa sources I found that a scons script (I had never heard of this language before) calls env.ParseConfig on the llvm-config output:
http://www.scons.org/doc/1.2.0/HTML/scons-user/c1814.html
https://github.com/jobermayr/mesa/blob/master/scons/llvm.py

How can I fix that?
Thanks for your support :)

PS: I keep using my dirty-patched version without problems ;)
Comment 17 tux_mind 2013-08-29 23:56:14 UTC
I forgot to give you the output of my llvm-config --ldflags:
-L/usr/lib64/llvm -Wl,-R -Wl,/usr/lib64/llvm  -lz -lpthread -lffi -lrt -ldl -lm
Comment 18 Johannes Obermayr 2013-08-30 16:36:34 UTC
This one should compile (I forgot to adapt a few things after I moved it up in my patch series):
https://github.com/jobermayr/mesa/commit/7e31d65
Comment 19 Johannes Obermayr 2013-09-10 11:44:43 UTC
Created attachment 85550
link libradeon only once

Please try this patch.
libradeon will be linked in egl_gallium.so only once.
Comment 20 Rafael Castillo 2013-09-18 02:03:00 UTC
testing
Comment 21 Rafael Castillo 2013-09-18 02:09:46 UTC
bOOM

../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_blit.o): In function `r600_blit_decompress_depth':
r600_blit.c:(.text+0x1f00): multiple definition of `r600_blit_decompress_depth'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_blit.o):r600_blit.c:(.text+0x1fa0): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_blit.o): In function `r600_decompress_color_textures':
r600_blit.c:(.text+0x23c0): multiple definition of `r600_decompress_color_textures'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_blit.o):r600_blit.c:(.text+0x2780): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_query.o): In function `r600_init_query_functions':
r600_query.c:(.text+0x280): multiple definition of `r600_init_query_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_query.o):r600_query.c:(.text+0x2000): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_resource.o): In function `r600_init_screen_resource_functions':
r600_resource.c:(.text+0x40): multiple definition of `r600_init_screen_resource_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_resource.o):r600_resource.c:(.text+0x80): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_resource.o): In function `r600_init_context_resource_functions':
r600_resource.c:(.text+0x80): multiple definition of `r600_init_context_resource_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_resource.o):r600_resource.c:(.text+0xc0): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_texture.o): In function `r600_init_flushed_depth_texture':
r600_texture.c:(.text+0x14e0): multiple definition of `r600_init_flushed_depth_texture'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_texture.o):r600_texture.c:(.text+0x2e60): first defined here
../../../../src/gallium/drivers/radeon/.libs/libradeon.a(libradeon_la-radeon_llvm_emit.o): In function `radeon_llvm_compile':
radeon_llvm_emit.c:(.text+0x1eb): undefined reference to `elf_version'
radeon_llvm_emit.c:(.text+0x243): undefined reference to `elf_memory'
radeon_llvm_emit.c:(.text+0x255): undefined reference to `elf_getshdrstrndx'
radeon_llvm_emit.c:(.text+0x267): undefined reference to `elf_nextscn'
radeon_llvm_emit.c:(.text+0x27e): undefined reference to `gelf_getshdr'
radeon_llvm_emit.c:(.text+0x29b): undefined reference to `elf_strptr'
radeon_llvm_emit.c:(.text+0x2cf): undefined reference to `elf_getdata'
radeon_llvm_emit.c:(.text+0x37e): undefined reference to `elf_getdata'
collect2: error: ld returned 1 exit status
gmake[3]: *** [egl_gallium.la] Error 1
Comment 22 Johannes Obermayr 2013-09-18 02:46:47 UTC
Created attachment 86034
new attemp with LIBADD += $(ELF_LIB)

I hate this automake / libtool, because mine seems to be smart and I don't see these errors ...
Comment 23 Rafael Castillo 2013-09-18 03:32:05 UTC
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_blit.o): In function `r600_blit_decompress_depth':
r600_blit.c:(.text+0x1f00): multiple definition of `r600_blit_decompress_depth'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_blit.o):r600_blit.c:(.text+0x1fa0): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_blit.o): In function `r600_decompress_color_textures':
r600_blit.c:(.text+0x23c0): multiple definition of `r600_decompress_color_textures'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_blit.o):r600_blit.c:(.text+0x2780): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_query.o): In function `r600_init_query_functions':
r600_query.c:(.text+0x280): multiple definition of `r600_init_query_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_query.o):r600_query.c:(.text+0x2000): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_resource.o): In function `r600_init_screen_resource_functions':
r600_resource.c:(.text+0x40): multiple definition of `r600_init_screen_resource_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_resource.o):r600_resource.c:(.text+0x80): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_resource.o): In function `r600_init_context_resource_functions':
r600_resource.c:(.text+0x80): multiple definition of `r600_init_context_resource_functions'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_resource.o):r600_resource.c:(.text+0xc0): first defined here
../../../../src/gallium/drivers/radeonsi/.libs/libradeonsi.a(r600_texture.o): In function `r600_init_flushed_depth_texture':
r600_texture.c:(.text+0x14e0): multiple definition of `r600_init_flushed_depth_texture'
../../../../src/gallium/drivers/r600/.libs/libr600.a(r600_texture.o):r600_texture.c:(.text+0x2e60): first defined here
collect2: error: ld returned 1 exit status
gmake[3]: *** [egl_gallium.la] Error 1
gmake[3]: Leaving directory `/var/tmp/portage/media-libs/mesa-9999/work/Mesa-9999-amd64/src/gallium/targets/egl-static'
gmake[2]: *** [all-recursive] Error 1
gmake[2]: Leaving directory `/var/tmp/portage/media-libs/mesa-9999/work/Mesa-9999-amd64/src/gallium/targets'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/var/tmp/portage/media-libs/mesa-9999/work/Mesa-9999-amd64/src'
make: *** [all-recursive] Error 1

I'm using Gentoo with mesa-9999 from the x11 overlay and llvm-9999 as well; maybe that is the cause.
Comment 24 Johannes Obermayr 2013-09-18 03:47:44 UTC
$ find src/gallium/drivers/ -name "*.la" -delete -o -name "*.a" -delete -o -name "*.so" -delete
$ make

to rebuild libradeon as well as libr600 and libradeonsi without linked-in libradeon ...
Comment 25 Rafael Castillo 2013-09-18 04:43:51 UTC
Nope, same result. I guess you have something in your branch that differs from Mesa master.
Comment 26 Johannes Obermayr 2013-09-20 01:09:18 UTC
Created attachment 86169
New version.

I renamed r600_ functions to si_ in radeonsi to avoid duplicate symbols.
Some duplicate code is now in radeon/r600_{blit,texture}_common.{c,h}.
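
As a rough illustration of why the rename avoids the clash: once each driver's externally visible entry point carries its own prefix, both can be linked into one binary, and each fills the hook table with its own implementation. The sketch below uses a hypothetical, much-reduced hook struct and names ending in _sketch; it is not the code from the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a context with one hook. */
struct query_hooks_sketch {
    int (*create_query)(void);
};

/* Marked static, so these helpers have internal linkage and cannot
 * clash with same-named functions in other object files. */
static int r600_create_query_sketch(void) { return 600; }
static int si_create_query_sketch(void)   { return 1000; }

/* Externally visible entry points: one unique prefix per driver. */
void r600_init_query_functions_sketch(struct query_hooks_sketch *h)
{
    assert(h != NULL);
    h->create_query = r600_create_query_sketch;
}

void si_init_query_functions_sketch(struct query_hooks_sketch *h)
{
    assert(h != NULL);
    h->create_query = si_create_query_sketch;
}
```

With distinct names, a call through the hook table reaches the implementation that the owning driver installed, rather than whichever same-named definition the linker happened to keep.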

Tested on AMD Fusion [Radeon HD 6310].
Built with:
../configure --enable-xvmc --enable-vdpau --enable-texture-float --enable-debug --with-dri-drivers="" --with-gallium-drivers=r600,radeonsi --enable-dri --enable-glx --enable-osmesa --enable-gles1 --enable-gles2 --enable-openvg --enable-gbm --enable-xa --enable-gallium-egl --enable-gallium-llvm --enable-gallium-gbm --enable-r600-llvm-compiler --enable-opencl
Comment 27 Rafael Castillo 2013-09-20 14:24:45 UTC
Many thanks for your hard work; I'll test it tonight and report back.
Comment 28 Rafael Castillo 2013-09-20 23:44:44 UTC
Sadly, this patch breaks Mesa to the point that glamor/xinit/weston hang my 7770.
Comment 29 Rafael Castillo 2013-09-23 16:02:04 UTC
I have a working memory saving patch but I am not authorised to publish it.
Comment 30 Fabio Pedretti 2013-11-03 12:48:44 UTC
Is this still an issue?

(In reply to comment #29)
> I have a working memory saving patch but I am not authorised to publish it.

What do you mean?
Comment 31 Rafael Castillo 2013-11-04 16:19:49 UTC
(In reply to comment #30)
> Is this still an issue?
> 
> (In reply to comment #29)
> > I have a working memory saving patch but I am not authorised to publish it.
> 
> What do you mean?

Yes, it is still an issue.

Johannes Obermayr gave me a patch that fixed it, but it seems there is some issue with upstream, and I don't have his authorization to publish the patch.
Comment 32 farmboy0+freedesktop 2014-01-12 22:13:26 UTC
I have a Radeon HD 7750 and I am hit by the same problem with Mesa 10.0.2.
Is there anything I can do to help you fix this bug?
Comment 33 Michel Dänzer 2014-01-14 09:17:53 UTC
Does this still happen with current Mesa Git master? There should no longer be any symbol clashes between radeonsi and r600g.
Comment 34 Laurent carlier 2014-01-14 09:52:22 UTC
(In reply to comment #33)
> Does this still happen with current Mesa Git master? There should no longer
> be any symbol clashes between radeonsi and r600g.

Yes, fixed here with radeonsi and mesa-git
Comment 35 Marek Olšák 2014-01-14 10:48:56 UTC
(In reply to comment #34)
> (In reply to comment #33)
> > Does this still happen with current Mesa Git master? There should no longer
> > be any symbol clashes between radeonsi and r600g.
> 
> Yes, fixed here with radeonsi and mesa-git

I'm closing this then.
Comment 36 Rafael Castillo 2014-01-14 12:20:04 UTC
Yes, it seems fully fixed.
Comment 37 Johannes Obermayr 2014-01-14 15:41:15 UTC
(In reply to comment #36)
> Yes, it seems fully fixed.

Is runtime memory usage as low as with my patch?
Comment 38 Rafael Castillo 2014-01-15 23:17:39 UTC
(In reply to comment #37)
> (In reply to comment #36)
> > Yes, it seems fully fixed.
> 
> Is runtime memory usage as low as with my patch?

It seems so, but I'm not really sure ksysguard is that exact.

