Bug 104762

Summary: Various segfaults/problems in qt/plasma
Product: Mesa
Reporter: Christoph Haag <haagch>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: ao, devurandom, mike, t_arceri, vedran
Version: git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
  mostly useless gdb backtrace
  systemsettings black
  better gdb backtrace
  possible fix

Description Christoph Haag 2018-01-24 08:51:33 UTC
Created attachment 136931 [details]
mostly useless gdb backtrace

I think it's related to the series that ended with this commit "st/glsl_to_tgsi: add ARB_get_program_binary support using TGSI"
https://cgit.freedesktop.org/mesa/mesa/commit/?id=a20016d8277f9cd68620784417a57ae227783a04
but I have to try around a bit more.

It doesn't always happen. I think it's related to the shader cache. Sometimes deleting the shader cache helps, sometimes it doesn't. I think MESA_EXTENSION_OVERRIDE=-GL_ARB_get_program_binary helps.
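
The override mentioned above can be limited to a single process instead of being exported for the whole session. A minimal sketch: `run_without_program_binary` is a hypothetical helper name (not something Mesa ships), and `systemsettings5` stands in for any of the affected applications.

```shell
# Hypothetical helper, not part of Mesa: run one command with
# GL_ARB_get_program_binary hidden from it. The variable is set only
# for the child process, so the calling shell stays untouched.
run_without_program_binary() {
    env MESA_EXTENSION_OVERRIDE=-GL_ARB_get_program_binary "$@"
}
```

Usage would look like `run_without_program_binary systemsettings5`. If the application behaves with the override and misbehaves without it, that points at the program binary path rather than at rendering in general.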

Usually export LD_LIBRARY_PATH=... LIBGL_DRIVERS_PATH=... to another mesa build, starting the application once and then starting it with the system mesa again helps. It could be related to mesa first deleting the other shader cache before making a new one. Some weird race condition?

Applications I've seen affected:
sddm-greeter (segfaulting)
plasmashell (segfaulting)
systemsettings (rendering black window content, segfaulting)
krunner (rendering breaking, segfaulting)

As far as I can tell it doesn't happen with mesa debug builds...

More investigation is needed.
Comment 1 Michel Dänzer 2018-01-24 09:00:23 UTC
Can you try running an affected application in valgrind (either with a debug build of radeonsi_dri.so, or at least a release build with -g)?
Comment 2 Christoph Haag 2018-01-24 09:26:53 UTC
Created attachment 136932 [details]
systemsettings black

I think I have tried running with valgrind before. I tried valgrind with an -Og mesa build while it was rendering the black window, but no (relevant) errors come up.

To describe one case:
I just tried a mesa debug build and everything worked normally.
Then, with the same options but a release build additionally using CFLAGS=-Og CXXFLAGS=-Og, systemsettings5 and plasmashell render completely black. No segfaults this time. Starting plasmashell and systemsettings with MESA_EXTENSION_OVERRIDE=-GL_ARB_get_program_binary makes them display normally. But starting them without the variable after that makes them black again. Deleting the shader cache does not help in this case.

But after starting systemsettings5 once with another mesa installation, it also starts working normally with the mesa build that had produced the black window just before.

It's possible the segfaults and the black windows are different problems but they started happening at the same time.
Comment 3 Mike Lothian 2018-01-24 10:33:06 UTC
I made this go away by deleting ~/.cache/qtshadercache and ~/.cache/mesa_shader_cache
Comment 4 Christoph Haag 2018-01-24 14:59:52 UTC
Created attachment 136940 [details]
better gdb backtrace

Managed to get a segfault (happens every time now) with the same mesa build that previously was just showing a black screen, now with a better backtrace.

Deleting only the qt shadercache (TIL that's a thing) doesn't help. Deleting only the mesa shader cache doesn't help. Deleting both does help. Weird.
Comment 5 Timothy Arceri 2018-01-25 01:31:24 UTC
Created attachment 136950 [details] [review]
possible fix

I wasn't able to reproduce the issue, but can you give this patch a try?
Comment 6 Dieter Nützel 2018-01-25 03:59:12 UTC
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

I have been seeing it since the commit went in.
Deleting both, like Mike suggested, has worked here since that time, too.
I ignored it 'cause I thought that I'm running devel stuff and my son's account switched daily from Mesa devel to release back and forth...;-)
Comment 7 Christoph Haag 2018-01-25 08:36:11 UTC
(In reply to Timothy Arceri from comment #5)
> Created attachment 136950 [details] [review]
> possible fix
> 
> I wasn't able to reproduce the issue, but can you give this patch a try?

Preliminary result: it does not help. After booting today with the patch, plasmashell was segfaulting again until I ran
rm -rf ~/.cache/mesa_shader_cache/ ~/.cache/qtshadercache
Comment 8 Dieter Nützel 2018-01-25 10:33:56 UTC
Sometimes I wiped this, too.

.cache/plasmashell/qmlcache/
.cache/ksycoca5_de_***
.cache/plasma-svgelements-***
.cache/plasma_theme_***
.cache/icon-cache.kcache

All of the above files are regenerated automatically.
After this, KDE5/Plasma5 started with a nice/clean desktop every time again.
Comment 9 Mike Lothian 2018-01-25 10:56:09 UTC
It might be worth keeping these somewhere rather than deleting them. Once everything is working, copy them back and see if the issue can be reproduced that way.
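
A rough sketch of that idea, assuming the two cache directories named earlier in this thread (mesa_shader_cache and qtshadercache) live directly under ~/.cache. The function names and the `shader-cache-stash` directory are made up for illustration.

```shell
# Hypothetical helpers. Arguments: cache root (default ~/.cache) and
# stash directory (default <cache root>/shader-cache-stash).

# Move the suspect caches aside instead of deleting them.
stash_shader_caches() (
    cache_root=${1:-$HOME/.cache}
    stash_dir=${2:-$cache_root/shader-cache-stash}
    mkdir -p "$stash_dir" || exit 1
    for name in mesa_shader_cache qtshadercache; do
        if [ -e "$stash_dir/$name" ]; then
            # Refuse to overwrite an earlier stash.
            echo "already stashed: $stash_dir/$name" >&2
            exit 1
        fi
        if [ -d "$cache_root/$name" ]; then
            mv "$cache_root/$name" "$stash_dir/$name" || exit 1
        fi
    done
)

# Put the stashed caches back, replacing whatever was regenerated.
restore_shader_caches() (
    cache_root=${1:-$HOME/.cache}
    stash_dir=${2:-$cache_root/shader-cache-stash}
    for name in mesa_shader_cache qtshadercache; do
        if [ -d "$stash_dir/$name" ]; then
            rm -rf "$cache_root/$name"
            mv "$stash_dir/$name" "$cache_root/$name" || exit 1
        fi
    done
)
```

Run `stash_shader_caches` while the problem is present, confirm the desktop works with freshly generated caches, then log out and run `restore_shader_caches` to check whether the old cache contents bring the problem back.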
Comment 10 Timothy Arceri 2018-01-26 00:58:28 UTC
I was able to reproduce the problem. Fix sent to list:

https://patchwork.freedesktop.org/series/37147/
Comment 11 Timothy Arceri 2018-01-26 23:11:08 UTC
Should be fixed by the following commit. Please reopen if the issue continues.

commit 041b18cf23a0acf7b0eddf63cd7a2a10192432a1
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Fri Jan 26 11:56:50 2018 +1100

    st/shader_cache: restore num_tgsi_tokens when loading from cache
    
    Without this we will fail to correctly serialise programs when
    using glGetProgramBinary() if the program was retrieved from
    the disk cache rather than freshly compiled.
    
    Fixes: c69b0dd6817b "st/glsl_to_tgsi: store num_tgsi_tokens in st_*_program"
    
    Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104762
Comment 12 Michel Dänzer 2018-01-30 09:25:25 UTC
*** Bug 104806 has been marked as a duplicate of this bug. ***
Comment 13 Mike Lothian 2018-01-30 10:13:55 UTC
That commit needs to be cherry picked to the 18.0 branch
Comment 14 Dennis Schridde 2018-01-31 01:18:06 UTC
Confirming: 041b18cf23a0acf7b0eddf63cd7a2a10192432a1 applied to 18.0.0_rc3, followed by cleaning ~/.cache of root, sddm and my user stops the crashes.
Comment 15 Fireball 2018-02-05 07:07:21 UTC
I still experience this with mesa 18.0.0_rc3 and r600 driver.
Resolution first looked ok, user processes (plasmashell etc.) stopped crashing, but screen locker and logout screen crash (not right away, after a few hours of working).

As always, clearing caches in /root/.cache and /var/lib/sddm/.cache helps, but this problem would occur again.
I use Qt 5.9.4.
Backtrace of /usr/lib64/libexec/ksmserver-logout-greeter mentions /usr/lib64/dri/r600_dri.so.
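
Clearing the caches for several accounts at once could be scripted roughly as follows. This is only a sketch: `clear_shader_caches` is a hypothetical name, and it removes just the two shader cache directories named in this thread rather than the whole cache directory.

```shell
# Hypothetical helper: remove the Mesa and Qt shader caches under each
# cache root given as an argument. Other cache contents are left alone.
clear_shader_caches() (
    for cache_root in "$@"; do
        # Skip empty arguments so we never build a path like "/mesa_shader_cache".
        [ -n "$cache_root" ] || continue
        rm -rf "$cache_root/mesa_shader_cache" "$cache_root/qtshadercache"
    done
)
```

For the locations mentioned above, that would be `clear_shader_caches /root/.cache /var/lib/sddm/.cache` run as root, plus `clear_shader_caches "$HOME/.cache"` for the regular user.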
Comment 16 Dennis Schridde 2018-02-05 08:43:01 UTC
(In reply to Fireball from comment #15)
> I still experience this with mesa 18.0.0_rc3 and r600 driver.

It is not supposed to be fixed by rc3, since Timothy's patch 041b18cf23a0acf7b0eddf63cd7a2a10192432a1 only got applied after that version was released.  It also was not yet backported to the 18.0 branch [1], so for now you need to apply it manually (e.g. by placing it in /etc/portage/patches/media-libs/mesa-18.0.0_rc3, if you are on Gentoo).

[1]: https://cgit.freedesktop.org/mesa/mesa/log/?h=18.0
Comment 17 Timothy Arceri 2018-02-05 09:41:43 UTC
Please don't reopen this bug, the fix is already in master and has been tagged as a fix in the commit message so it will be picked up in the next stable version.
Comment 18 Christoph Haag 2018-02-12 09:00:03 UTC
To be fair, the segfaults are fixed, but sddm and plasmashell randomly not working/rendering until the shader cache(s) are deleted is still happening. Unfortunately that's a bit harder to debug, but there might still be a similar issue hiding somewhere.
Comment 19 Christoph Haag 2018-02-14 22:41:35 UTC
(In reply to Christoph Haag from comment #18)
> To be fair, the segfaults are fixed, but sddm and plasmashell randomly not
> working/rendering until the shader cache(s) are deleted is still happening.
> Unfortunately that's a bit harder to debug, but there might still be a
> similar issue hiding somewhere.

This may rather be bug 105065, which is really a Qt bug: https://bugreports.qt.io/browse/QTBUG-66420
