Bug 107655 - X segfaults on startup in r300_dri.so, making system unusable
Summary: X segfaults on startup in r300_dri.so, making system unusable
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/swr
Version: 18.1
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: mesa-dev
QA Contact: mesa-dev
URL: https://bugzilla.opensuse.org/show_bu...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-22 05:15 UTC by Sergey Kondakov
Modified: 2019-09-18 18:24 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Xorg.pid-1154.gdb.log (12.23 KB, text/plain)
2018-08-22 05:15 UTC, Sergey Kondakov
Xorg.0.log (17.60 KB, text/plain)
2018-08-22 05:16 UTC, Sergey Kondakov
Asus_F3Ke.dmesg (64.65 KB, text/plain)
2018-08-22 05:18 UTC, Sergey Kondakov

Description Sergey Kondakov 2018-08-22 05:15:48 UTC
Created attachment 141236
Xorg.pid-1154.gdb.log

I have updated my old Asus F3Ke notebook, which has an "ATI Mobility Radeon X2300 (ChipID = 0x718a)", after neglecting it for 1-2 years, and now I can't launch a desktop session because X segfaults immediately. The attached gdb log is what I was able to get after installing the debug info and running the Xgdb script.
Comment 1 Sergey Kondakov 2018-08-22 05:16:27 UTC
Created attachment 141237
Xorg.0.log

Normal X log.
Comment 2 Sergey Kondakov 2018-08-22 05:18:18 UTC
Created attachment 141238
Asus_F3Ke.dmesg

dmesg from the affected machine.
Comment 3 Michel Dänzer 2018-08-22 08:27:44 UTC
GCC's libstdc++ code crashes trying to use an instruction not supported by your CPU. You need to report this to your distro.
Comment 4 Sergey Kondakov 2018-08-27 14:47:48 UTC
(In reply to Michel Dänzer from comment #3)
> GCC's libstdc++ code crashes trying to use an instruction not supported by
> your CPU. You need to report this to your distro.

So, I've bothered my distro's Bugzilla, and GCC's, and then figured out why it was crashing: Mesa doesn't like being built with clang/gold and ThinLTO (Mesa doesn't build with gcc and LTO, and openSUSE's OBS couldn't handle gcc's LTO implementation even if it did). I don't know the actual reason for the crash, but the guys there figured out that the crash was coming from an AVX instruction in Mesa's SWR code. The affected machine does not support any kind of AVX, so it threw the error. But it's unclear why SWR was even trying to initialize during the load of r300_dri. When built with gcc and without any custom {C,LD}FLAGS, nothing crashes, even with SWR built and installed. And on an AVX-capable amdgpu/radeonsi machine there is no trace of SWR doing anything at boot, even with the clang build.
Comment 5 Michel Dänzer 2018-08-27 15:05:32 UTC
(In reply to Sergey Kondakov from comment #4)
> I don't know the actual reason for the crash, but the guys there figured out that
> the crash was coming from an AVX instruction in Mesa's SWR code. The affected
> machine does not support any kind of AVX, so it threw the error. But it's
> unclear why SWR was even trying to initialize during the load of r300_dri.

I think it's the combination of two things:

* All Gallium drivers are linked into a single binary (so-called mega-driver)

* SWR is compiled with AVX support and has initializers which are automatically executed when the above binary is dlopen()ed.

Until there's a solution for this, SWR cannot be enabled in a build which has to run on non-AVX capable CPUs.
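A minimal single-file sketch of the mechanism described above, assuming GCC or Clang on x86-64. A function marked `__attribute__((constructor))` runs automatically when the object containing it is loaded (before `main()` in an executable, or inside `dlopen()` for a shared library such as the mega-driver), so AVX code reachable from it executes before the application has chosen a driver. The CPU check via `__builtin_cpu_supports` only illustrates how such an initializer could avoid running AVX code on older CPUs; the names `backend_ctor`, `init_avx_tables` and `swr_like_backend_available` are hypothetical and not taken from SWR.

```c
/* Hypothetical flag: set to 1 only if the AVX-only setup ran. */
static int backend_available = 0;

/* Stand-in for setup code built with AVX enabled. The target attribute
 * allows the compiler to emit AVX instructions in this function only. */
__attribute__((target("avx")))
static void init_avx_tables(void)
{
    /* Real code would, e.g., precompute tables with AVX here. */
    backend_available = 1;
}

/* Runs automatically at load time: before main() in an executable,
 * or inside dlopen() when this code lives in a shared library. */
__attribute__((constructor))
static void backend_ctor(void)
{
    /* Required when CPU feature builtins are used from a constructor,
     * which may run before the compiler runtime's own CPU detection. */
    __builtin_cpu_init();

    if (__builtin_cpu_supports("avx"))
        init_avx_tables();
    /* Otherwise leave the backend disabled instead of executing code
     * that may contain AVX instructions. */
}

/* Lets the host program check the result after loading. */
int swr_like_backend_available(void)
{
    return backend_available;
}
```

If the constructor instead called `init_avx_tables()` unconditionally, any AVX instruction the compiler emitted there would execute at load time on every CPU, whichever Gallium driver was actually requested, which matches the failure mode described in this comment.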
Comment 6 Ilia Mirkin 2018-08-27 15:09:07 UTC
The intention (and original function) was that this would be safe -- swr would check for AVX support and load the relevant library (libAVX/libAVX2), or bail on load.

Something must have regressed in there and no one noticed since AVX has become fairly common.
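The intended behaviour described here, probing the CPU first and only then loading an AVX- or AVX2-specific backend, can be sketched as follows. This is not Mesa's loader: the file names `libswrAVX.so`/`libswrAVX2.so` and both function names are assumptions chosen to mirror the "libAVX/libAVX2" split mentioned in the comment. The selection logic is kept separate from the CPU probe so it can be tested on any machine.

```c
#include <stddef.h>

/* Hypothetical backend file names (assumed, modelled on libAVX/libAVX2). */
static const char SWR_AVX_LIB[]  = "libswrAVX.so";
static const char SWR_AVX2_LIB[] = "libswrAVX2.so";

/* Pure selection logic: prefer AVX2, fall back to AVX, otherwise return
 * NULL so the caller can bail instead of loading AVX code. */
const char *pick_swr_backend(int has_avx, int has_avx2)
{
    if (has_avx && has_avx2)
        return SWR_AVX2_LIB;
    if (has_avx)
        return SWR_AVX_LIB;
    return NULL;
}

/* Probe the running CPU (GCC/Clang builtins, x86 only). This function
 * itself must be compiled without -mavx, or the probe could crash. */
const char *pick_swr_backend_for_this_cpu(void)
{
    __builtin_cpu_init();
    return pick_swr_backend(__builtin_cpu_supports("avx") != 0,
                            __builtin_cpu_supports("avx2") != 0);
}
```

A caller would pass the returned name to `dlopen()` and treat `NULL` as "SWR unavailable". As comment 5 points out, this check only protects non-AVX CPUs if no AVX-compiled code runs earlier, for example from a load-time initializer in the same mega-driver binary.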
Comment 7 mirh 2018-08-27 19:17:26 UTC
(In reply to Ilia Mirkin from comment #6)
> The intention (and original function) was that this would be safe -- swr
> would check for AVX support and load the relevant library (libAVX/libAVX2),
> or bail on load.
> 
> Something must have regressed in there and no one noticed since AVX has
> become fairly common.

Or, more simply, it might be a problem with that particular build root?
AVX might be common nowadays, but not *that* common (especially considering that all Intel CPUs below the i3 line lack it).

I, for one, have no problem with these switches on a Zacate (which is pretty close in terms of supported extensions):
https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=mesa-git#n48
Comment 8 GitLab Migration User 2019-09-18 18:24:29 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/197.

