Summary: | [Debug mesa]. Dirt 4 crashes after launching | ||
---|---|---|---|
Product: | Mesa | Reporter: | Denis <denys.kostin> |
Component: | Drivers/Vulkan/intel | Assignee: | Caio Marcelo de Oliveira Filho <caio.oliveira> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | jason, leozinho29_eu |
Version: | git | Keywords: | bisected, regression |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | | i915 features: | |
Bug Depends on: | |||
Bug Blocks: | 111444 | ||
Attachments: | Valgrind, gdb and shader cache |
Description
Denis
2019-08-09 11:48:09 UTC
Hmm, I am frustrated 8-/
I had yesterday's build of mesa, commit => 207026d29e
and got crashes.
Then I started a bisection and, in the middle of it, switched from "debug" to "release" to speed up the first game launch. And the crashes disappeared.
OK, I built the latest (today's) mesa, both release and debug => 5e38db0c47ca57c6e904f44d0d0e9ef299d14f3c
and I couldn't reproduce the crash.
I tried to bisect between yesterday's and today's commits (4 steps only) and was pointed to a 100% wrong commit (related to virgl).
Then I re-built mesa on commit 207026d29e again, and now I can't reproduce the crashes. I removed the mesa shader cache (~/.cache/mesa...) and still couldn't reproduce them.

So there are two assumptions:
1. the crash was really fixed in today's mesa version
2. the crash is flaky, and more time is needed to catch it.

I will try to get a gdb backtrace for it

I just finished the git bisect. The result is:

b6d475356846f57a034e662ab9245d11ed0dd4a0 is the first bad commit
nir/large_constants: De-duplicate constants

The log:

git bisect start
# good: [e4e6a3deaff4f84f0fb99b4dec950dc498d507ed] panfrost: Implement FIXED formats
git bisect good e4e6a3deaff4f84f0fb99b4dec950dc498d507ed
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# good: [e4e6a3deaff4f84f0fb99b4dec950dc498d507ed] panfrost: Implement FIXED formats
git bisect good e4e6a3deaff4f84f0fb99b4dec950dc498d507ed
# good: [e4e6a3deaff4f84f0fb99b4dec950dc498d507ed] panfrost: Implement FIXED formats
git bisect good e4e6a3deaff4f84f0fb99b4dec950dc498d507ed
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# bad: [5a898e2a652843dbb9b013437b0715c3563cafdb] pan/midgard: Disassemble load/store barrel shift
git bisect bad 5a898e2a652843dbb9b013437b0715c3563cafdb
# good: [e8917dcadb376168150b36d2390644186724bc25] radv: do not decompress levels without DCC with the compute path
git bisect good e8917dcadb376168150b36d2390644186724bc25
# good: [637b168470190507c89eca8a7d0479103fe236ae] nir/linker: Initialize UniformDataDefaults when using SPIR-V
git bisect good 637b168470190507c89eca8a7d0479103fe236ae
# bad: [58ee973e8737441a78c3ca49d3f8fe9db29447d0] radv/gfx10: do not use the fast depth or stencil clear bytes path
git bisect bad 58ee973e8737441a78c3ca49d3f8fe9db29447d0
# bad: [bfaca7259ca898b5aaab0e592b76eb20e593e9f9] radeonsi/gfx10: deduplicate code for esvert_lds_size
git bisect bad bfaca7259ca898b5aaab0e592b76eb20e593e9f9
# good: [06e5daf5758ffdc06a5a96ab0fe58552732e35d1] spirv_extensions: add list of extensions and to_string method
git bisect good 06e5daf5758ffdc06a5a96ab0fe58552732e35d1
# good: [40e760960319bc8c9ee943c3d8136e23ef474d59] v3d: Fix assertion failures in debug builds.
git bisect good 40e760960319bc8c9ee943c3d8136e23ef474d59
# bad: [09a8a39940ad02951b62454a5d222af669fef694] util: use standard name for strchrnul()
git bisect bad 09a8a39940ad02951b62454a5d222af669fef694
# bad: [e38b93087638781ef83c9b3cc3bb424e448a5380] nir/lower_clip: add a find_clipvertex_and_position_outputs() helper
git bisect bad e38b93087638781ef83c9b3cc3bb424e448a5380
# bad: [d56f92502e21767b7f755fa7a093502b2d01ed91] panfrost: Shrink tiler heap
git bisect bad d56f92502e21767b7f755fa7a093502b2d01ed91
# good: [61098baf42fc0026900a67b86336ad90fc0966a2] freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.
git bisect good 61098baf42fc0026900a67b86336ad90fc0966a2
# good: [0d8a4c67cf44604d648696e007740bd9fa9faa4c] freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.
git bisect good 0d8a4c67cf44604d648696e007740bd9fa9faa4c
# bad: [b6d475356846f57a034e662ab9245d11ed0dd4a0] nir/large_constants: De-duplicate constants
git bisect bad b6d475356846f57a034e662ab9245d11ed0dd4a0
# good: [d9b67ad0796612620b82b7ea11a720735ce7df3f] nir/large_constants: Use ralloc for var_infos
git bisect good d9b67ad0796612620b82b7ea11a720735ce7df3f
# first bad commit: [b6d475356846f57a034e662ab9245d11ed0dd4a0] nir/large_constants: De-duplicate constants

Some of the dmesg errors while bisecting:

[51890.577528] traps: F3DWarmer.2[31485] general protection ip:7fe51bab73c0 sp:7fe4877d0490 error:0 in libvulkan_intel.so[7fe51b819000+526000]
[52467.973758] traps: IdxD3D11_1[3725] general protection ip:7f8b1c6a43c0 sp:7f8a82fd74b0 error:0 in libvulkan_intel.so[7f8b1c406000+526000]
[55452.557728] traps: IdxD3D11_1[1552] general protection ip:7fb3f60623c0 sp:7fb35afde4b0 error:0 in libvulkan_intel.so[7fb3f5dc4000+526000]
[55862.007032] traps: IdxD3D11_1[8383] general protection ip:7f8ee8ea5060 sp:7f8e4ffe04b0 error:0 in libvulkan_intel.so[7f8ee8c07000+526000]
[56103.707751] traps: IdxD3D11_1[13176] general protection ip:7fe334a2b060 sp:7fe29afde4b0 error:0 in libvulkan_intel.so[7fe33478d000+526000]
[57049.055105] traps: IdxD3D11_1[28081] general protection ip:7f499bab7060 sp:7f490d7db4b0 error:0 in libvulkan_intel.so[7f499b819000+526000]

Sadly, I can't reproduce the crash even on the specified commit.
Leozinho, did you clean the game cache before running the game? Maybe that could be related to it?
/run/media/manjaro/a244962e-96b2-4c41-a8df-5609424527a0/SteamLibrary/steamapps/shadercache/421020
and
~/.cache/mesa_shader_cache/

I am testing on CFL; in your case, as I remember, it is SKL. But I don't believe that it is specific to the platform, because I reproduced it as well: I got 1 crash after about 10 runs, with different mesa versions (even on the bisected commit). So for some reason it became random for me.
>I will try to get a gdb backtrace for it
It was useless because it showed only ??? instead of function names.
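When gdb shows only ??? frames, the dmesg trap lines quoted earlier in this report still carry some information: each one gives the faulting instruction pointer and the start address of the libvulkan_intel.so mapping, so the difference is an offset that does not depend on where the library happened to be loaded. The following is a minimal Python sketch of that arithmetic, not a tool used in this bug; the regular expression and the function name `trap_offsets` are assumptions written to match the lines as pasted above.

```python
import re

# Matches the kernel "traps:" lines in the form quoted in this report.
TRAP_RE = re.compile(
    r"traps: (?P<proc>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def trap_offsets(dmesg_text):
    """Return (process, library, offset) for every trap line found.

    The offset is the instruction pointer minus the mapping start that the
    kernel printed, so it is relative to that mapping.
    """
    results = []
    for m in TRAP_RE.finditer(dmesg_text):
        offset = int(m["ip"], 16) - int(m["base"], 16)
        results.append((m["proc"], m["lib"], hex(offset)))
    return results

# The six lines from this report, copied verbatim.
SAMPLE = """\
[51890.577528] traps: F3DWarmer.2[31485] general protection ip:7fe51bab73c0 sp:7fe4877d0490 error:0 in libvulkan_intel.so[7fe51b819000+526000]
[52467.973758] traps: IdxD3D11_1[3725] general protection ip:7f8b1c6a43c0 sp:7f8a82fd74b0 error:0 in libvulkan_intel.so[7f8b1c406000+526000]
[55452.557728] traps: IdxD3D11_1[1552] general protection ip:7fb3f60623c0 sp:7fb35afde4b0 error:0 in libvulkan_intel.so[7fb3f5dc4000+526000]
[55862.007032] traps: IdxD3D11_1[8383] general protection ip:7f8ee8ea5060 sp:7f8e4ffe04b0 error:0 in libvulkan_intel.so[7f8ee8c07000+526000]
[56103.707751] traps: IdxD3D11_1[13176] general protection ip:7fe334a2b060 sp:7fe29afde4b0 error:0 in libvulkan_intel.so[7fe33478d000+526000]
[57049.055105] traps: IdxD3D11_1[28081] general protection ip:7f499bab7060 sp:7f490d7db4b0 error:0 in libvulkan_intel.so[7f499b819000+526000]
"""

for proc, lib, offset in trap_offsets(SAMPLE):
    print(proc, lib, offset)
```

For these six lines the first three traps share offset 0x29e3c0 and the last three share 0x29e060, presumably because the library was rebuilt at a different commit in between. Turning such an offset into a function name would still need the exact libvulkan_intel.so build with symbols, and assumes the reported mapping starts at file offset zero.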
I tried deleting the cache directories ~/.cache/mesa_shader_cache/ and steamapps/shadercache/421020. After deleting them, I tested both commits d9b67ad0796612620b82b7ea11a720735ce7df3f (the last good commit) and b6d475356846f57a034e662ab9245d11ed0dd4a0 (the first bad commit). I still got the crash with the bad commit, and the game worked (but was affected by bug 110295) with the good commit.

:(
I took my SKL. I was able to reproduce the issue on mesa-master, but after about 5-10 crashes it stopped crashing and launched normally. Interestingly, whether or not I swapped the mesa libs after that, it no longer took a long time to "load" (as you mentioned, the first launch with "custom" libs usually takes about 5-10 minutes).

I also built a "debug" mesa and tried to get a gdb backtrace, and it does not look good.
@Leozinho, could you please build mesa with gdb symbols and try to read the gdb output?
Below you can find how I built mesa, and my core-dump output:

export CFLAGS='-O0 -ggdb3 -g'
export CXXFLAGS='-O0 -ggdb3 -g'
meson setup . mbuild_dbg_x64 -Dbuildtype=debug --prefix=/home/ubuntu/mesa_versions/mesa-git-15.08 -Dvalgrind=false -Ddri-drivers=i965 -Dgallium-drivers=iris -Dvulkan-drivers=intel -Dgallium-omx="disabled" -Dplatforms=x11,drm,surfaceless -Dtools=intel -Db_ndebug=true
ninja -C ./mbuild_dbg_x64/ install

coredump:

ubuntu@ubuntu:~/mesa$ gdb '/home/ubuntu/.steam/steam/steamapps/common/DiRT 4/bin/Dirt4' '/home/ubuntu/.steam/steam/steamapps/common/DiRT 4/bin/core'
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/ubuntu/.steam/steam/steamapps/common/DiRT 4/bin/Dirt4...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 28855]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/ubuntu/.steam/steam/steamapps/common/DiRT 4/bin/Dirt4'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000001dbf10f in ?? ()
(gdb) bt
#0 0x0000000001dbf10f in ()
#1 0x0000000001daa8eb in ()
#2 0x0000000001e5a612 in ()
#3 0x0000000001d9da45 in ()
#4 0x0000000001ddd94d in ()
#5 0x0000000001ddd592 in ()
#6 0x0000000001d9a6eb in ()
#7 0x0000000001da5c9f in ()
#8 0x0000000001d9fbc8 in ()
#9 0x0000000001d9fefb in ()
#10 0x0000000001a9252a in ()
#11 0x0000000001951a9c in ()
#12 0x000000000195339e in ()
#13 0x0000000002840cef in ()
#14 0x00007fd6ac6bf6db in start_thread (arg=0x7fd4ebaa1700) at pthread_create.c:463
#15 0x00007fd6a1f1c88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I tried building Mesa with the same settings as yours, using both gcc-7 and gcc-8 (and their g++ counterparts), and in both cases the game worked well, with no crashes.

Probably what is triggering this bug is some difference in build settings.
My build commands are based on the build settings used on Ubuntu to build Mesa, so the meson command is really, really long:

env PREFIX="/usr/local/mesa64" \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wall" \
CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wall" \
FCFLAGS="-g -O2 -fstack-protector-strong" FFLAGS="-g -O2 -fstack-protector-strong" \
GCJFLAGS="-g -O2 -fstack-protector-strong" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" \
OBJCFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security" \
OBJCXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security" \
PKG_CONFIG_PATH="/usr/local/mesa64/lib/pkgconfig" CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 /usr/local/bin/meson build/ \
-Dprefix="/usr/local/mesa64" -Dlibdir="/usr/local/mesa64/lib" \
-Dplatforms="x11,drm,surfaceless" -Ddri3=true \
-Ddri-drivers="i965" -Dgallium-drivers="iris,swrast,virgl" -Dgallium-vdpau=false -Dgallium-xvmc=false \
-Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled \
-Dvulkan-drivers="intel" -Dshader-cache=true -Dshared-glapi=true -Dgles1=true -Dgles2=true -Dopengl=true -Dgbm=true \
-Dglx=dri -Degl=true -Dglvnd=true -Dasm=true -Dllvm=true -Dlmsensors=true -Dosmesa=gallium -Dosmesa-bits=8 -Dglx-direct=true

I will try disabling certain settings and removing certain environment variables to check which parameter is causing the crash.
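To narrow down which parameter matters, it can help to list only the options on which two builds differ before toggling them one at a time. The following is a rough Python sketch, not something used in this report; `meson_options` and `diff_options` are hypothetical helper names, and the two strings are abbreviated excerpts of the meson command lines quoted in this thread.

```python
import shlex

def meson_options(command):
    """Collect -Dname=value arguments from a meson command line."""
    options = {}
    for arg in shlex.split(command):  # shlex removes the shell quoting
        if arg.startswith("-D") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            options[name] = value
    return options

def diff_options(a, b):
    """Return {name: (value_in_a, value_in_b)} for options that differ.

    None means the option was not given on that command line, so meson
    would fall back to its default for it.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

# Abbreviated excerpts of the two command lines quoted in this report.
BUILD_A = ('meson setup . mbuild_dbg_x64 -Dbuildtype=debug -Dvalgrind=false '
           '-Ddri-drivers=i965 -Dgallium-drivers=iris -Dvulkan-drivers=intel '
           '-Dgallium-omx="disabled" -Dplatforms=x11,drm,surfaceless')
BUILD_B = ('meson build/ -Dplatforms="x11,drm,surfaceless" -Ddri-drivers="i965" '
           '-Dgallium-drivers="iris,swrast,virgl" -Dgallium-omx=disabled '
           '-Dvulkan-drivers="intel" -Dosmesa=gallium -Dglx-direct=true')

DIFF = diff_options(meson_options(BUILD_A), meson_options(BUILD_B))
for name, (a, b) in DIFF.items():
    print(f"{name}: {a!r} vs {b!r}")
```

The sketch only compares -D options; compiler flags and other environment variables such as CFLAGS or PKG_CONFIG_PATH would have to be compared separately.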
After removing those environment variables and setting -Dglx-direct=false (I don't know why this was relevant), I got the following backtrace:

(gdb) bt
#0 0x00007fffb9f9f874 in unsafe_free (info=0x7fffc6486a60) at ../src/util/ralloc.c:297
#1 0x00007fffb9f9f847 in unsafe_free (info=0x7fffc5cdf790) at ../src/util/ralloc.c:292
#2 0x00007fffb9f9f847 in unsafe_free (info=0x7fffc43a3050) at ../src/util/ralloc.c:292
#3 0x00007fffb9f9f768 in ralloc_free (ptr=0x7fffc43a3080) at ../src/util/ralloc.c:262
#4 0x00007fffb9cbb2f7 in anv_pipeline_compile_graphics (pipeline=0x7fffc4a1d320, cache=0x7fff9538bd50, info=0x7fffc4a2d4e0) at ../src/intel/vulkan/anv_pipeline.c:1428
#5 0x00007fffb9cbc7a3 in anv_pipeline_init (pipeline=0x7fffc4a1d320, device=0x7fff9538bf80, cache=0x7fff9538bd50, pCreateInfo=0x7fffc4a2d4e0, alloc=0x7fff9538bf88) at ../src/intel/vulkan/anv_pipeline.c:1911
#6 0x00007fffb9d98618 in gen9_graphics_pipeline_create (_device=0x7fff9538bf80, cache=0x7fff9538bd50, pCreateInfo=0x7fffc4a2d4e0, pAllocator=0x0, pPipeline=0x7fff23fee268) at ../src/intel/vulkan/genX_pipeline.c:2115
#7 0x00007fffb9d9900d in gen9_CreateGraphicsPipelines (_device=0x7fff9538bf80, pipelineCache=0x7fff9538bd50, count=1, pCreateInfos=0x7fffc4a2d4e0, pAllocator=0x0, pPipelines=0x7fff23fee268) at ../src/intel/vulkan/genX_pipeline.c:2365
#8 0x00007fffb8ba8078 in () at /home/usuario/.local/share/Steam/ubuntu12_64/libVkLayer_steam_fossilize.so
#9 0x00007fffba5cf492 in vkCreateGraphicsPipelines () at /usr/local/mesa64/lib/libvulkan.so.1
#10 0x0000000001dd2fb1 in ()
#11 0x0000000001dd3b6a in ()
#12 0x0000000001dd41eb in ()
#13 0x0000000001d9f293 in ()
#14 0x0000000001d9f69c in ()
#15 0x000000000196c336 in ()
#16 0x0000000001a9219a in ()
#17 0x00000000019517a8 in ()
#18 0x000000000195339e in ()
#19 0x0000000002840cef in ()
#20 0x00007ffff6ef76db in start_thread (arg=0x7fff23fef700) at pthread_create.c:463
#21 0x00007fffec75488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Using -Dosmesa=gallium, DiRT 4 crashes even before any stage element can be seen. Using -Dosmesa=none or -Dosmesa=classic, SOMETIMES I am able to watch the introduction of the stage, see the positions, configure the car, start the race, race a bit, and then the game crashes. Sometimes the game just crashes.

I noticed I can tell that the game is going to crash before the stage loads. If messages like:

SPIR-V WARNING:
In file ../src/compiler/spirv/spirv_to_nir.c:826
Decoration not allowed on struct members: SpvDecorationRestrict
1388 bytes into the SPIR-V binary

appear, the game crashes before anything from the stage appears. If these messages do not appear, then I am able to play the race for a few seconds, and the game crashes later.

Hopefully these backtraces provide useful information.

After trying enough, I discovered that the problem is not a specific build setting or environment variable. The problem is the Vulkan version used to build Mesa. If I use the default version from Ubuntu 18.04, then the game always works, no matter the Mesa git version.

However, once I build Mesa git against Vulkan git (the current Vulkan-Headers version is 23b2e8e64bdf3f25b3d73f1593e72977ebfcd39b and the Vulkan-Loader version is fdc5ec43b00e03db432cb8b8bc9bdafc9599c522), the results from the bisect I did are valid.

I used PKG_CONFIG_PATH with the meson and ninja commands to point to the Vulkan git versions when building Mesa git and doing the bisect.

Has this test case been run with Valgrind? It's usually good at helping find intermittent memory problems like this.

I may try to use valgrind for this, but I need some guidance. When I try to open Steam using Valgrind, I get the following:

$ DEBUGGER=valgrind steam steam://rungameid/421020
Running Steam on ubuntu 18.04 64-bit
STEAM_RUNTIME is enabled automatically
Pins up-to-date!
==23154== Memcheck, a memory error detector
==23154== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==23154== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==23154== Command: /home/usuario/.local/share/Steam/ubuntu12_32/steam steam://rungameid/421020
==23154==
==23154== FATAL: can't open suppressions file "ubuntu12_32/steam.supp"

And searching the internet wasn't fruitful. I tried many things; some just ignored valgrind, others made the game end up in an infinite loop before the launch screen opens. What is the correct procedure? Opening DiRT 4 without Steam isn't useful, as it uses different libraries, causing unrelated crashes at different moments. This crash happens using Steam and happens just before or just after the stage scenario finishes loading.

Another quick test you can do is to use the environment variable `NIR_SKIP=nir_opt_large_constants` (this 'works' for all the passes, but there's no guarantee all passes are safe to be skipped). With that, do we get a failure?

Also: the WARNING about the restrict decoration is a "false alarm" -- just some info we are currently ignoring.

Just guessing:
1. FATAL: can't open suppressions file "ubuntu12_32/steam.supp"
That is valgrind's suppression file. Just try to create a simple one (see https://wiki.wxwidgets.org/Valgrind_Suppression_File_Howto) or copy one from /usr/lib/valgrind/default.supp
2. Most likely you will need to run the game binary directly as the input for valgrind (from a path like /home/user/.steam/steam/steamapps/common/some_game/game_bin)

(In reply to leozinho29_eu from comment #12)
> And searching in the internet wasn't fruitful. Tried many things, some just
> ignored valgrind, others made the game end in an infinite loop before the
> launch screen opens. What is the correct procedure? Opening DiRT 4 without
> Steam isn't useful, as it uses different libraries causing unrelated crashes
> in different moments. This crash happens using Steam and happens just before
> or just after the stage scenario finishes loading.

I wouldn't waste time trying to get the app to run under valgrind. Between wine and the game, you're likely to see piles of valgrind errors and trying to find the mesa issue will be near impossible. One thing to try would be to capture the pipelines with fossilize (https://github.com/ValveSoftware/Fossilize) and then run valgrind on a fossilize replay. If you can repro with a fossilize replay, it also makes debugging way easier because you can use normal GDB instead of the wine version.

I tested using NIR_SKIP=nir_opt_large_constants. With this environment variable set, the game worked correctly without crashing.

DiRT 4 used Wine and DXVK back when I reported bug 110295. Recently, DiRT 4 received a native port for Linux, which gave the game a significant performance increase. But even with it being native now, the script that sets certain environment variables and checks whether Steam is running makes certain debugging operations difficult. I have even noticed that the game may use a different Vulkan version every time I open it! Sometimes it uses the intended libvulkan.so, which is the .1.121, but sometimes it uses the libvulkan.so from Steam, which is the .1.73, so I have to check the maps every time I open the game, because the library being used may change on every launch. Of course, this complicates debugging.

I still have to check both Fossilize and the valgrind configuration file. I think it is relevant to comment that the game works using NIR_SKIP=nir_opt_large_constants.

Created attachment 145299 [details]
Valgrind, gdb and shader cache
I managed to get the data using Fossilize + valgrind. I needed some time to understand how to use Fossilize, however.
The attached file has the shader cache for DiRT 4, the command I used to reproduce the crash, the gdb output and valgrind output.
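Since the game may pick up a different libvulkan.so on each launch, as noted earlier in this report, checking the maps by hand can be scripted. Below is a minimal Python sketch under stated assumptions: `mapped_libraries` is a hypothetical helper, the sample text is made up for illustration (addresses and inode numbers are invented; only the /usr/local/mesa64 paths appear in this report), and the field layout is the usual one for /proc/<pid>/maps.

```python
def mapped_libraries(maps_text, needle="libvulkan.so"):
    """Return the sorted, de-duplicated file paths in a /proc/<pid>/maps
    dump whose file name contains `needle`.

    The default needle matches the Vulkan loader but not the driver
    (libvulkan_intel.so), because of the ".so" right after "libvulkan".
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no path
        path = fields[5].strip()
        if needle in path.rsplit("/", 1)[-1]:
            paths.add(path)
    return sorted(paths)

# Illustrative sample only: addresses and inodes are invented.
SAMPLE_MAPS = (
    "7fffba5a0000-7fffba600000 r-xp 00000000 08:01 1234 /usr/local/mesa64/lib/libvulkan.so.1\n"
    "7fffba600000-7fffba610000 rw-p 00060000 08:01 1234 /usr/local/mesa64/lib/libvulkan.so.1\n"
    "7fffb9a00000-7fffb9f26000 r-xp 00000000 08:01 5678 /usr/local/mesa64/lib/libvulkan_intel.so\n"
    "7fffc4000000-7fffc4021000 rw-p 00000000 00:00 0 \n"
)

print(mapped_libraries(SAMPLE_MAPS))
```

Against a running game, the same function could be fed the contents of /proc/<pid>/maps, for example `mapped_libraries(open(f"/proc/{pid}/maps").read())`, where `pid` is the game's process id.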
If someone from Valve is reading this: make the text box to add launch options bigger, please. It's unreadable once two environment variables are set.
Hopefully the data attached will be helpful.
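One possible way around the cramped launch-options box is to keep the environment variables in a small wrapper and put only the wrapper in front of the game command. The sketch below is an assumption-laden illustration, not the setup used in this report: `build_env` and `run` are hypothetical names, and NIR_SKIP=nir_opt_large_constants is the only variable taken from this thread.

```python
import os
import subprocess

# Variables to force for the game process. NIR_SKIP comes from this report;
# further entries can be added here instead of in the launch options box.
OVERRIDES = {"NIR_SKIP": "nir_opt_large_constants"}

def build_env(overrides, base=None):
    """Copy the base environment (default: os.environ) and apply overrides."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env

def run(argv, overrides=OVERRIDES):
    """Run argv with the overrides applied and return its exit status."""
    return subprocess.call(argv, env=build_env(overrides))
```

A script that calls `run(sys.argv[1:])` could then be placed before the game command in the launch options, assuming Steam passes the game command line through to it.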
If 'NIR_SKIP=nir_opt_large_constants' helps, then maybe, just in case, you (@leozinho) could check with https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1802 ?

I have tested the commit d89d075589964b88c98f885d07fa45a6ec9d066c with the patch. The SPIR-V WARNINGs were present and the game froze for a considerable time (more than one minute) when the stage loading finished, but it didn't crash. The game worked after applying the patch.

Using Mesa 19.3.0-devel (git-eea6f21cbd), DiRT 4 is no longer crashing, as https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1802 was merged. The only remaining issues are the longer loading times and the SPIR-V warnings.

The SPIR-V warning is an app (actually GLSLang) bug which we can't do anything about. It's also harmless, which is why we warn instead of fail. The long load times seem unrelated to this. If they're longer than they used to be, then maybe we have a problem and that should be its own bug. It would also be very helpful to have a bisect on that one; otherwise chasing down "my load times are long" is really difficult.

I think we can close this now.