Created attachment 141004 [details]
full backtrace of gnome-shell crashing on SIGSEGV in image_get_buffers

When resuming from a blank screen in GNOME Shell (blanking can be configured in Settings->Power), the GNOME Shell process sometimes crashes with a segmentation fault (around 40% to 50% of cases in my experience). The backtraces vary slightly, but mostly they point to the intel_update_image_buffers function in the src/mesa/drivers/dri/i965/brw_context.c file.

There is a related bug report on GNOME Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=795537
One person suggested that a crash inside mesa should never happen, so I am reporting it here as well.

One of the backtraces that I have starts like this:

Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f74143eed33 in image_get_buffers (driDrawable=0x56353e8ca4d0, format=4098, stamp=0x56353e8ca500, loaderPrivate=0x56353e8d4510, buffer_mask=1, buffers=0x7ffd94a282c0)
    at ../../../src/gbm/backends/dri/gbm_dri.c:132
#1  0x00007f7405c29741 in intel_update_image_buffers (drawable=0x56353e8ca4d0, brw=0x56353e7a9980)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1751
#2  intel_update_renderbuffers (context=context@entry=0x56353e7c82d0, drawable=drawable@entry=0x56353e8ca4d0)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1427
#3  0x00007f7405c29dc1 in intel_prepare_render (brw=brw@entry=0x56353e7a9980)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1448
#4  0x00007f7405c2c012 in brw_prepare_drawing (max_index=4294967295, min_index=0, index_bounds_valid=<optimized out>, ib=0x7ffd94a28430, arrays=<optimized out>, ctx=0x56353e7a9980)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:730
#5  brw_draw_prims (ctx=0x56353e7a9980, prims=0x7ffd94a28450, nr_prims=1, ib=0x7ffd94a28430, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, gl_xfb_obj=0x0, stream=0, indirect=0x0)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:992
#6  0x00007f74059c1b59 in vbo_validated_drawrangeelements (ctx=<optimized out>, mode=<optimized out>, index_bounds_valid=<optimized out>, start=0, end=<optimized out>, count=<optimized out>, type=5123, indices=0x0, basevertex=0, numInstances=1, baseInstance=0)
    at ../../../src/mesa/vbo/vbo_exec_array.c:843
#7  0x00007f74059c252e in vbo_exec_DrawElements (mode=4, count=1302, type=5123, indices=0x0)
    at ../../../src/mesa/vbo/vbo_exec_array.c:1001

The full version is attached.
To be fair, I've had a fair number of crashes in gnome-shell too. So far they have not ended up with a backtrace in mesa; instead they end with malloc aborting (https://gitlab.gnome.org/GNOME/gnome-shell/issues/472), showing there was memory corruption. It's not clear where the bug lies. I'm not sure how best to figure out where things go wrong... Valgrind? (although it might make the system unusable)
Created attachment 141008 [details] full backtrace of gnome-shell crashing on SIGSEGV in intel_update_image_buffers
(In reply to Lionel Landwerlin from comment #1)
> So far it has not ended up with a backtrace in mesa

Then I believe that your issue is different. In my case, the crash always happens in mesa code. The last stack frame varies slightly, probably because the previous stack frame already references invalid memory. I have attached the most recent backtrace, where the topmost stack frame is unknown to the debugger, probably because the previous stack frame has already attempted to use a wrong memory address. Not sure.

As for how to debug this properly, I believe that a core dump may be sufficient for someone familiar with the code. I can provide a lot of them if needed. If there are too many optimized-out values to properly understand the issue, it would be necessary to compile some parts without optimizations and try again.
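Since the innermost frames differ between core dumps, it may help to compare backtraces mechanically. Below is a minimal sketch, not an established tool: it assumes gdb's usual "#N [0xADDR in] function (args) at file:line" frame layout, and the helper names parse_frames and innermost_frame_in are made up for illustration. It splits a backtrace into frames (even when the text has been wrapped or run together) and picks the innermost frame whose source path contains a given fragment such as "src/mesa/".

```python
import re

# Start of a gdb frame: "#N", an optional "0xADDR in", then the function name.
FRAME_HEAD_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
# Source location at the end of a frame: "at FILE:LINE".
LOCATION_RE = re.compile(r"\sat\s+(\S+):(\d+)")


def parse_frames(text):
    """Return a list of frame dicts parsed from gdb backtrace text."""
    # Locate every "#N " marker so frames can be cut apart regardless of
    # how the text was wrapped.
    starts = [m.start() for m in re.finditer(r"#\d+\s", text)]
    ends = starts[1:] + [len(text)]
    frames = []
    for begin, end in zip(starts, ends):
        chunk = text[begin:end]
        head = FRAME_HEAD_RE.match(chunk)
        if head is None:
            continue  # not a frame after all
        loc = LOCATION_RE.search(chunk)
        frames.append({
            "num": int(head.group(1)),
            "func": head.group(2),
            # Frames without debug info have no "at FILE:LINE" part.
            "file": loc.group(1) if loc else None,
            "line": int(loc.group(2)) if loc else None,
        })
    return frames


def innermost_frame_in(frames, path_fragment):
    """Return the first (innermost) frame whose file contains path_fragment."""
    for frame in frames:
        if frame["file"] and path_fragment in frame["file"]:
            return frame
    return None
```

Running parse_frames over each saved backtrace and tallying innermost_frame_in(frames, "src/mesa/") would show whether the crashes really converge on intel_update_image_buffers or only appear to.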
Hi Peter, please provide information about your hardware and software: kernel and mesa versions, and CPU/GPU models. How many displays do you have connected to the PC?

I took your OS from the original report:
> Arch Linux
> GNOME Shell 3.28.1 on Wayland

For now I have a desktop PC with a Coffee Lake CPU and the same OS, so I can try to reproduce it.
I am sorry, I should have provided the system and relevant applications-related information earlier. The information you have quoted is from Simon, the reporter of the Gnome Shell bug. My setup is similar, but not quite the same:

CPU model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
CPU microcode: 0x84
Operating System: Debian GNU/Linux buster/sid
Kernel: Linux 4.17.0-1-amd64
Architecture: x86-64
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.1.5
OpenGL version string: 3.0 Mesa 18.1.5
OpenGL shading language version string: 1.30
GNOME Shell 3.28.3 running on Xwayland 1.20.0
You were also asking about the display configuration. I use a single screen connected via DisplayPort at 3840x2160@60fps with "hidpi" scaling enabled in Gnome Shell (Settings->Display->Scale is set to 200%).
Thanks for the info. My configuration:

CPU model name: Intel(R) Core(TM) i7-7700U CPU @ 4.20GHz
HD Graphics 620 (Kaby Lake GT2)
Operating System: Debian GNU/Linux buster/sid
Kernel: Linux 4.17.0-1-amd64
GNOME Shell 3.28.3
xwayland 1.20
Mesa 18.1.5
login manager - gdm3 (not sure if this is important)

I have a laptop, so I made about 10-15 attempts to reproduce the issue on it (built-in display). (An interesting thing: with lightdm and wayland I couldn't put the laptop into sleep mode, which is why I switched to gdm3.) It didn't reproduce. Tomorrow I will check with 4k and 2k monitors, also via DisplayPort.
Interesting, I reproduced a crash, but it doesn't look like yours.

Steps:
1. Turn off built-in display
2. Connect 4k display via display port and set 200%
3. Follow your steps (put PC into sleep mode)
4. Wake up by "Enter" button

Result: the login screen appears. Both screens are turned on (built-in and 4k).
Reproducibility: 2 out of 5 tries so far. Continuing the investigation.

[11419.667456] gnome-shell[7093]: segfault at 1c ip 00007f9caa2cd7f0 sp 00007fff838c6c28 error 4 in libmutter-2.so.0.0.0[7f9caa2b7000+c0000]
[11419.691017] rfkill: input handler enabled
[11420.127671] wlp2s0: deauthenticating from 20:a6:cd:d3:89:50 by local choice (Reason: 3=DEAUTH_LEAVING)
[11420.138715] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11431.909028] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11433.397711] wlp2s0: authenticate with 20:a6:cd:d3:89:50
[11433.402293] wlp2s0: send auth to 20:a6:cd:d3:89:50 (try 1/3)
[11433.404083] wlp2s0: authenticated
[11433.407310] wlp2s0: associate with 20:a6:cd:d3:89:50 (try 1/3)
[11433.408502] wlp2s0: RX AssocResp from 20:a6:cd:d3:89:50 (capab=0x1411 status=0 aid=7)
[11433.409910] wlp2s0: associated
[11433.435356] wlp2s0: Limiting TX power to 33 (33 - 0) dBm as advertised by 20:a6:cd:d3:89:50
[11433.575183] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[11434.341799] rfkill: input handler disabled
[11435.442383] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[11555.060700] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[11556.442206] gnome-shell[7613]: segfault at 1c ip 00007f8f0a0b27f0 sp 00007ffe1d185dc8 error 4 in libmutter-2.so.0.0.0[7f8f0a09c000+c0000]
[11556.482070] rfkill: input handler enabled
[11556.868104] wlp2s0: deauthenticating from 20:a6:cd:d3:89:50 by local choice (Reason: 3=DEAUTH_LEAVING)
[11556.878860] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11564.855046] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11566.355255] wlp2s0: authenticate with 20:a6:cd:d3:89:50
[11566.361166] wlp2s0: send auth to 20:a6:cd:d3:89:50 (try 1/3)
[11566.363045] wlp2s0: authenticated
[11566.365002] wlp2s0: associate with 20:a6:cd:d3:89:50 (try 1/3)
[11566.366224] wlp2s0: RX AssocResp from 20:a6:cd:d3:89:50 (capab=0x1411 status=0 aid=7)
[11566.367681] wlp2s0: associated
[11566.453582] wlp2s0: Limiting TX power to 33 (33 - 0) dBm as advertised by 20:a6:cd:d3:89:50
[11566.526028] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[11567.397441] rfkill: input handler disabled
[11568.531937] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
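The two "segfault at" lines in the log can be decoded a little further. The following is a rough sketch, assuming the x86 kernel's usual report format (hexadecimal fault address, instruction pointer and error code, followed by the mapping's start address and size in brackets). The function name decode_segfault is made up for illustration, and the offset is only meaningful for addr2line if the bracketed mapping starts at file offset 0 of the library, which is an assumption here.

```python
import re

# x86 kernel report for a user-space segfault, as seen in dmesg.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one 'segfault at ...' line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # x86 page-fault error code bits
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        # Address whose access faulted; small values hint at a NULL-based access.
        "fault_address": int(m.group("addr"), 16),
        # Instruction pointer relative to the start of the reported mapping.
        "offset_in_mapping": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),   # bit 0: 0 = page not present
        "access": "write" if err & 2 else "read",  # bit 1
        "user_mode": bool(err & 4),      # bit 2
    }
```

Applied to the two reports above, both decode to the same offset, 0x167f0, inside libmutter-2.so.0.0.0, with a user-mode read of address 0x1c on a non-present page. That is consistent with the same instruction reading a field through a NULL pointer each time, though it does not prove it. With debug symbols for libmutter installed, the offset could be looked up with addr2line -e.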
Update: actually, I only reproduced it 2 times. After that, I did not see it again during about 1-2 hours of trying :(
Just for the record: I have the libgbm1 (18.1.5) and libmutter2-dev (3.28.3-2) packages installed. This may be relevant.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1747.