Bug 111376 - [bisected] Steam crashes when newest Iris built with LTO
Summary: [bisected] Steam crashes when newest Iris built with LTO
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/Iris
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Mark Janes
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2019-08-12 09:46 UTC by Mike Lothian
Modified: 2019-09-25 18:47 UTC

Description Mike Lothian 2019-08-12 09:46:17 UTC
I've bisected back to https://gitlab.freedesktop.org/mesa/mesa/commit/0fd4359733e6920d5cac9596eeada753a587a246

I was seeing steam[1804860]: segfault at 0 ip 00000000f6371b97 sp 00000000fff096c0 error 6 in iris_dri.so[f5cf7000+1134000] in my dmesg 
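
A minimal sketch of how the fields in that dmesg line can be read, assuming the standard x86 page-fault error code bits (bit 0: protection fault vs. not-present page, bit 1: write access, bit 2: user mode). The struct and function names are made up for illustration. The offset is relative to the start of the mapping shown in brackets, which is not necessarily the file offset a symbolizing tool expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed in the "error N" field. */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access, 1: write access */
#define PF_USER  0x4u /* 0: kernel mode, 1: user mode */

/* Hypothetical helper type, not part of any kernel or Mesa API. */
struct fault_info {
   bool page_present;   /* fault hit a mapped page (protection fault) */
   bool is_write;       /* faulting access was a write */
   bool user_mode;      /* fault happened in user mode */
   uint32_t map_offset; /* faulting ip relative to the mapping start */
};

/* ip and map_base come from "ip <addr>" and "lib.so[<base>+<size>]". */
static struct fault_info
decode_fault(uint32_t ip, uint32_t map_base, unsigned error_code)
{
   struct fault_info info;

   info.page_present = (error_code & PF_PROT) != 0;
   info.is_write = (error_code & PF_WRITE) != 0;
   info.user_mode = (error_code & PF_USER) != 0;
   info.map_offset = ip - map_base;

   return info;
}
```

With the values from the line above (ip 0xf6371b97, base 0xf5cf7000, error 6), decode_fault() reports a user-mode write to a not-present page at mapping offset 0x67ab97. Together with "segfault at 0", that is consistent with a write through a NULL pointer inside iris_dri.so.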

Here's the build:

meson --buildtype plain --libdir lib --localstatedir /var/lib --prefix /usr --sysconfdir /etc --wrap-mode nodownload --cross-file /var/tmp/portage/media-libs/mesa-9999/temp/meson.i686-pc-linux-gnu.x86 -Dplatforms=surfaceless,x11,wayland,drm -Dllvm=true -Dlmsensors=true -Dlibunwind=false -Dgallium-nine=true -Dgallium-va=true -Dva-libs-path=/usr/lib/va/drivers -Dgallium-vdpau=true -Dgallium-xa=false -Dgallium-xvmc=false -Dgallium-opencl=disabled -Dglx-read-only-text=false -Dosmesa=none -Dbuild-tests=false -Dglx=dri -Dshared-glapi=true -Ddri3=true -Degl=true -Dgbm=true -Dgles1=false -Dgles2=true -Dglvnd=false -Dselinux=false -Dvalgrind=false -Ddri-drivers= -Dgallium-drivers=iris,radeonsi,swrast -Dvulkan-drivers=amd,intel -Dvulkan-overlay-layer=true --buildtype plain -Db_ndebug=true /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999 /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999-abi_x86_32.x86
The Meson build system
Version: 0.51.1
Source dir: /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999
Build dir: /var/tmp/portage/media-libs/mesa-9999/work/mesa-9999-abi_x86_32.x86
Build type: cross build
Program python found: YES (/var/tmp/portage/media-libs/mesa-9999/temp/python3.7/bin/python)
Project name: mesa
Project version: 19.2.0-devel
Appending CFLAGS from environment: '-O3 -march=native -pipe -flto=8'
Appending LDFLAGS from environment: '-O3 -march=native -pipe -flto=8 -Wl,-O2 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,--build-id=sha1'
C compiler for the build machine: x86_64-pc-linux-gnu-gcc -m32 (gcc 9.1.0 "x86_64-pc-linux-gnu-gcc (Gentoo 9.1.0-r1 p1.1) 9.1.0")
Appending CXXFLAGS from environment: '-O3 -march=native -pipe -flto=8'
Appending LDFLAGS from environment: '-O3 -march=native -pipe -flto=8 -Wl,-O2 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,--build-id=sha1'
C++ compiler for the build machine: x86_64-pc-linux-gnu-g++ -m32 (gcc 9.1.0 "x86_64-pc-linux-gnu-g++ (Gentoo 9.1.0-r1 p1.1) 9.1.0")
C compiler for the host machine: x86_64-pc-linux-gnu-gcc -m32 (gcc 9.1.0 "x86_64-pc-linux-gnu-gcc (Gentoo 9.1.0-r1 p1.1) 9.1

And a backtrace from an unstripped Mesa build:

Thread 1 "steam" received signal SIGSEGV, Segmentation fault.
0xf632d4ad in ralloc_steal () from /usr/lib/dri/iris_dri.so
(gdb) bt
#0  0xf632d4ad in ralloc_steal () from /usr/lib/dri/iris_dri.so
#1  0xf63efe93 in steal_memory(ir_instruction*, void*) [clone .lto_priv.0] () from /usr/lib/dri/iris_dri.so
#2  0xf63ee7ca in ir_hierarchical_visitor::visit_enter(ir_function*) () from /usr/lib/dri/iris_dri.so
#3  0xf63ee0b1 in ir_function::accept(ir_hierarchical_visitor*) () from /usr/lib/dri/iris_dri.so
#4  0xf6678787 in _mesa_get_fixed_func_fragment_program () from /usr/lib/dri/iris_dri.so
#5  0xf675c65b in update_program () from /usr/lib/dri/iris_dri.so
#6  0xf6772b2f in _mesa_update_state_locked () from /usr/lib/dri/iris_dri.so
#7  0xf6773237 in _mesa_update_state () from /usr/lib/dri/iris_dri.so
#8  0xf6497467 in _mesa_Clear () from /usr/lib/dri/iris_dri.so
#9  0xed1ae806 in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/vgui2_s.so
#10 0xed1bd4ed in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/vgui2_s.so
#11 0xf054bc6d in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#12 0xf054bef5 in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#13 0xf053e28f in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#14 0xf0491eaa in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#15 0xf0493c2e in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#16 0x5658e1b0 in RunSteam(int, char**, bool) ()
#17 0x5658f0ab in ?? ()
#18 0x5657a06c in ?? ()
#19 0xf78a5021 in __libc_start_main () from /lib/libc.so.6
#20 0x5657dd29 in _start ()
Comment 1 Denis 2019-08-12 15:40:03 UTC
Hi, confirming the crash. In my case I trimmed the build configuration (see below).
I believe that simply enabling iris is enough to reproduce it.

meson setup . mbuild_dbg_x64 \
-Dplatforms=surfaceless,x11,wayland,drm \
-Dprefix=/home/den/mesa64/mesa-commit_test/ \
-Dlmsensors=true \
-Dlibunwind=false \
-Dgallium-nine=false \
-Dgallium-xa=false \
-Dgallium-xvmc=false \
-Dgallium-opencl=disabled \
-Dglx-read-only-text=false \
-Dosmesa=none \
-Dbuild-tests=false \
-Dglx=dri \
-Dshared-glapi=true \
-Ddri3=true \
-Degl=true \
-Dgbm=true \
-Dgles1=false \
-Dgles2=true \
-Dglvnd=false \
-Dselinux=false \
-Dvalgrind=false \
-Ddri-drivers= \
-Dgallium-drivers=iris \
-Dvulkan-drivers= \
-Dvulkan-overlay-layer=true \
-Db_ndebug=true \
-Dbuildtype=debug
Comment 2 Mark Janes 2019-08-12 17:10:55 UTC
Mike,

Your last comment does not configure LTO/O3.  Does this reproduce for you on debug builds?

What is your hardware and kernel?

thanks!
Comment 3 Kenneth Graunke 2019-08-12 17:36:14 UTC
I suspect it's something crashing in iris_monitor_init_metrics.  The only thing in that commit of real relevance is the new driver hooks, and...

[Sunday, August 11, 2019] [4:13:07 PM PDT] <Kayden> does it help if you drop the iris_screen.c changes?  + pscreen->get_driver_query_group_info = iris_get_monitor_group_info;    and   +   pscreen->get_driver_query_info = iris_get_monitor_info;
[Sunday, August 11, 2019] [4:29:12 PM PDT] <FireBurn>   Yip that worked

I have no idea why LTO would matter.
Comment 4 Mike Lothian 2019-08-12 21:35:07 UTC
If I enable debugging and still pass the LTO flags, I don't see the issue.

I'll see if I can get anything more useful out of it.
Comment 5 Mike Lothian 2019-08-12 21:49:13 UTC
Here's a slightly different backtrace:

Thread 1 "steam" received signal SIGSEGV, Segmentation fault.
0xf6411277 in ir_function::clone(void*, hash_table*) const () from /usr/lib/dri/iris_dri.so
(gdb) bt
#0  0xf6411277 in ir_function::clone(void*, hash_table*) const () from /usr/lib/dri/iris_dri.so
#1  0xf641ce8c in clone_ir_list(void*, exec_list*, exec_list const*) () from /usr/lib/dri/iris_dri.so
#2  0xf5e0c875 in link_intrastage_shaders(void*, gl_context*, gl_shader_program*, gl_shader**, unsigned int, bool) [clone .constprop.0] () from /usr/lib/dri/iris_dri.so
#3  0xf640c187 in link_shaders(gl_context*, gl_shader_program*) [clone .part.0] () from /usr/lib/dri/iris_dri.so
#4  0xf64c70f3 in _mesa_glsl_link_shader () from /usr/lib/dri/iris_dri.so
#5  0xf66787a2 in _mesa_get_fixed_func_fragment_program () from /usr/lib/dri/iris_dri.so
#6  0xf675c63b in update_program () from /usr/lib/dri/iris_dri.so
#7  0xf6772b0f in _mesa_update_state_locked () from /usr/lib/dri/iris_dri.so
#8  0xf6773217 in _mesa_update_state () from /usr/lib/dri/iris_dri.so
#9  0xf64974a7 in _mesa_Clear () from /usr/lib/dri/iris_dri.so
#10 0xed1a3806 in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/vgui2_s.so
#11 0xed1b24ed in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/vgui2_s.so
#12 0xf054bc6d in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#13 0xf054bef5 in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#14 0xf053e28f in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#15 0xf0491eaa in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#16 0xf0493c2e in ?? () from /home/fireburn/.local/share/Steam/ubuntu12_32/steamui.so
#17 0x5658e1b0 in RunSteam(int, char**, bool) ()
#18 0x5658f0ab in ?? ()
#19 0x5657a06c in ?? ()
#20 0xf78a5021 in __libc_start_main () from /lib/libc.so.6
#21 0x5657dd29 in _start ()
Comment 6 Eric Engestrom 2019-08-12 22:12:27 UTC
As a workaround, this disables LTO on GCC just for iris:
---8<---
diff --git a/src/gallium/drivers/iris/meson.build b/src/gallium/drivers/iris/meson.build
index 3f611c2b5698be71ba08..c9f62a877c0df6889411 100644
--- a/src/gallium/drivers/iris/meson.build
+++ b/src/gallium/drivers/iris/meson.build
@@ -85,8 +85,8 @@ libiris = static_library(
     # these should not be necessary, but main/macros.h...
     inc_mesa, inc_mapi
   ],
-  c_args : [c_vis_args, c_sse2_args],
-  cpp_args : [cpp_vis_args, c_sse2_args],
+  c_args : [c_vis_args, c_sse2_args, gcc_lto_quirk],
+  cpp_args : [cpp_vis_args, c_sse2_args, gcc_lto_quirk],
   dependencies : [dep_libdrm, dep_valgrind, idep_genxml, idep_libintel_common],
   link_with : [
     iris_gen_libs, libintel_compiler, libintel_dev, libisl,
--->8---

Does this help?

Have you checked whether LTO causes any issues with Clang, for instance?
Comment 7 Mike Lothian 2019-08-12 22:27:05 UTC
Thanks, I'm currently working around it with:

diff --git a/src/gallium/drivers/iris/iris_screen.c b/src/gallium/drivers/iris/iris_screen.c
index e92685d4ae6..97eaeb15d4d 100644
--- a/src/gallium/drivers/iris/iris_screen.c
+++ b/src/gallium/drivers/iris/iris_screen.c
@@ -684,8 +684,6 @@ iris_screen_create(int fd, const struct pipe_screen_config *config)
    pscreen->flush_frontbuffer = iris_flush_frontbuffer;
    pscreen->get_timestamp = iris_get_timestamp;
    pscreen->query_memory_info = iris_query_memory_info;
-   pscreen->get_driver_query_group_info = iris_get_monitor_group_info;
-   pscreen->get_driver_query_info = iris_get_monitor_info;
 
    return pscreen;
 }
Comment 8 Mike Lothian 2019-08-12 22:47:04 UTC
Using clang-10 from git also works around the issue
Comment 9 Mike Lothian 2019-08-13 23:43:00 UTC
I think this could be related to glibc 2.30

I've seen another issue in libacl with systemd-tmpfiles that looks similar.
Comment 10 Mark Janes 2019-09-04 18:27:21 UTC
Hi Mike,

Did you get any more information about glibc 2.30 / 32bit LTO issues?  I'm stumped as to how the bisected commit could trigger a segfault in a completely different code path.

I might be able to change the initialization to defer querying the performance counters until first use.  However, it seems to me that we shouldn't complicate mesa to work around a bug in glibc.
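
A rough sketch of what deferring the counter query until first use could look like. This is not Mesa code: `struct screen`, `probe_counters`, `get_metrics` and the placeholder group count are all hypothetical stand-ins, and the real driver hooks have different signatures.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from Mesa. */
struct perf_metrics {
   bool initialized;
   int num_groups;
};

struct screen {
   struct perf_metrics metrics;
   /* Hook installed at screen creation, called later by the frontend. */
   int (*get_group_count)(struct screen *screen);
};

/* Counts how often the expensive probe actually runs. */
static int expensive_probe_calls = 0;

/* Stand-in for the costly performance counter enumeration. */
static void
probe_counters(struct perf_metrics *m)
{
   expensive_probe_calls++;
   m->num_groups = 3; /* placeholder value for illustration */
}

/* Runs the probe on first use only, then returns the cached result. */
static struct perf_metrics *
get_metrics(struct screen *screen)
{
   if (!screen->metrics.initialized) {
      probe_counters(&screen->metrics);
      screen->metrics.initialized = true;
   }
   return &screen->metrics;
}

static int
lazy_get_group_count(struct screen *screen)
{
   return get_metrics(screen)->num_groups;
}

/* Screen creation installs the hook but does not touch the counters. */
static void
screen_init(struct screen *screen)
{
   screen->metrics.initialized = false;
   screen->metrics.num_groups = 0;
   screen->get_group_count = lazy_get_group_count;
}
```

Here screen_init() only installs the hook; probe_counters() runs on the first call to get_group_count() and not again. The sketch is not thread-safe, so a real driver would need a lock or a once-style primitive around the first-use check.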
Comment 11 Mark Janes 2019-09-15 13:58:53 UTC
Since this issue can't yet be confirmed as a mesa bug (vs glibc), I'm removing it from the 19.2 release tracker.
Comment 12 Mike Lothian 2019-09-15 15:28:44 UTC
No probs, this is Iris only anyway so shouldn't be a blocker at all
Comment 13 GitLab Migration User 2019-09-25 18:47:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1358.

