Bug 107520 - Segmentation fault in Gnome Shell when resuming from blank screen
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 18.1
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2018-08-07 20:58 UTC by Peter Bašista
Modified: 2019-09-25 19:13 UTC

Attachments
full backtrace of gnome-shell crashing on SIGSEGV in image_get_buffers (20.69 KB, text/plain)
2018-08-07 20:58 UTC, Peter Bašista
full backtrace of gnome-shell crashing on SIGSEGV in intel_update_image_buffers (17.53 KB, text/plain)
2018-08-08 05:36 UTC, Peter Bašista

Description Peter Bašista 2018-08-07 20:58:42 UTC
Created attachment 141004
full backtrace of gnome-shell crashing on SIGSEGV in image_get_buffers

When resuming from a blank screen in Gnome Shell (the blanking delay can be configured in Settings->Power), the Gnome Shell process sometimes crashes with a segmentation fault (in around 40% to 50% of cases in my experience). The backtraces vary slightly, but most of them point to the intel_update_image_buffers function in src/mesa/drivers/dri/i965/brw_context.c.

There is a related bug report on Gnome Bugzilla:

https://bugzilla.gnome.org/show_bug.cgi?id=795537

One person suggested that a crash inside mesa should never happen, so I am reporting it here as well.

One of the backtraces that I have starts like this:

Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x00007f74143eed33 in image_get_buffers (driDrawable=0x56353e8ca4d0, format=4098, stamp=0x56353e8ca500, loaderPrivate=0x56353e8d4510, buffer_mask=1, buffers=0x7ffd94a282c0)
    at ../../../src/gbm/backends/dri/gbm_dri.c:132
#1  0x00007f7405c29741 in intel_update_image_buffers (drawable=0x56353e8ca4d0, brw=0x56353e7a9980) at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1751
#2  intel_update_renderbuffers (context=context@entry=0x56353e7c82d0, drawable=drawable@entry=0x56353e8ca4d0) at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1427
#3  0x00007f7405c29dc1 in intel_prepare_render (brw=brw@entry=0x56353e7a9980) at ../../../../../../src/mesa/drivers/dri/i965/brw_context.c:1448
#4  0x00007f7405c2c012 in brw_prepare_drawing (max_index=4294967295, min_index=0, index_bounds_valid=<optimized out>, ib=0x7ffd94a28430, arrays=<optimized out>, ctx=0x56353e7a9980)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:730
#5  brw_draw_prims (ctx=0x56353e7a9980, prims=0x7ffd94a28450, nr_prims=1, ib=0x7ffd94a28430, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, 
    gl_xfb_obj=0x0, stream=0, indirect=0x0) at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:992
#6  0x00007f74059c1b59 in vbo_validated_drawrangeelements (ctx=<optimized out>, mode=<optimized out>, index_bounds_valid=<optimized out>, start=0, end=<optimized out>, 
    count=<optimized out>, type=5123, indices=0x0, basevertex=0, numInstances=1, baseInstance=0) at ../../../src/mesa/vbo/vbo_exec_array.c:843
#7  0x00007f74059c252e in vbo_exec_DrawElements (mode=4, count=1302, type=5123, indices=0x0) at ../../../src/mesa/vbo/vbo_exec_array.c:1001


The full version is attached.
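
The raw integers in frames #0, #4 and #7 are easier to read in hexadecimal. The following is a minimal sketch, not part of the original report: the values are copied from the backtrace above, while the symbolic names are assumptions taken from the public OpenGL headers and Mesa's dri_interface.h as commonly defined, and should be checked against the headers of the Mesa version in use. The helper name `describe` is hypothetical.

```python
# Values copied from the backtrace above. The symbolic names are ASSUMPTIONS
# (public GL headers / Mesa dri_interface.h); verify against your own headers.
KNOWN_NAMES = {
    ("mode", 4): "GL_TRIANGLES",
    ("type", 5123): "GL_UNSIGNED_SHORT",
    ("format", 4098): "__DRI_IMAGE_FORMAT_XRGB8888",
    ("buffer_mask", 1): "__DRI_IMAGE_BUFFER_BACK",
    ("max_index", 4294967295): "UINT32_MAX",
}


def describe(name, value):
    """Render one backtrace argument as decimal, hex, and a best-guess name."""
    label = KNOWN_NAMES.get((name, value), "unknown")
    return f"{name}={value} (0x{value:x}) -> {label}"


if __name__ == "__main__":
    for (name, value) in KNOWN_NAMES:
        print(describe(name, value))
```

If the names are right, the crashing call is an ordinary indexed triangle draw that asks the loader for the back buffer of an XRGB8888 surface, so the arguments themselves look unremarkable.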
Comment 1 Lionel Landwerlin 2018-08-07 21:26:57 UTC
To be fair, I've had a fair number of crashes in gnome-shell too. So far none of them has ended with a backtrace in mesa; instead they end with malloc aborting (https://gitlab.gnome.org/GNOME/gnome-shell/issues/472), which shows that memory was corrupted.
It's not clear where the bug lies, and I'm not sure how best to figure out where things go wrong... Valgrind? (although it might make the system unusable)
Comment 2 Peter Bašista 2018-08-08 05:36:48 UTC
Created attachment 141008
full backtrace of gnome-shell crashing on SIGSEGV in intel_update_image_buffers
Comment 3 Peter Bašista 2018-08-08 05:47:39 UTC
(In reply to Lionel Landwerlin from comment #1)
> So far it has not ended up with a backtrace in mesa

Then I believe that your issue is different. In my case, the crash always happens in mesa code. The last stack frame varies slightly, probably because the previous stack frame has already referenced the wrong memory.

I have attached the most recent backtrace, in which the topmost stack frame is unknown to the debugger, probably because the previous stack frame had already attempted to use a wrong memory address. I am not sure.

As for how to debug this properly, I believe that a core dump may be sufficient for someone familiar with the code. I can provide a lot of them if needed.

If there are too many optimized-out values to properly understand the issue, it would be necessary to compile some parts without optimizations and try again.
Comment 4 Denis 2018-08-08 07:48:24 UTC
Hi Peter, please provide information about your HW and SW, such as the kernel and mesa versions and the CPU/GPU models. How many displays do you have connected to the PC?

I took your OS from the original report:

>Arch Linux
>GNOME Shell 3.28.1 on Wayland

For now I have a desktop PC with a Coffee Lake CPU and the same OS, so I can try to reproduce it.
Comment 5 Peter Bašista 2018-08-08 08:20:17 UTC
I am sorry, I should have provided details about my system and the relevant applications earlier. The information you have quoted is from Simon, the reporter of the Gnome Shell bug. My setup is similar, but not quite the same:

CPU model name: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
CPU microcode: 0x84

Operating System: Debian GNU/Linux buster/sid
Kernel: Linux 4.17.0-1-amd64
Architecture: x86-64

OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.1.5
OpenGL version string: 3.0 Mesa 18.1.5
OpenGL shading language version string: 1.30

GNOME Shell 3.28.3 running on Xwayland 1.20.0
Comment 6 Peter Bašista 2018-08-08 08:29:48 UTC
You also asked about the display configuration. I use a single screen connected via DisplayPort at 3840x2160@60fps with "hidpi" scaling enabled in Gnome Shell (Settings->Display->Scale is set to 200%).
Comment 7 Denis 2018-08-08 16:17:11 UTC
Thanks for the info.

My configuration:

CPU model name: Intel(R) Core(TM) i7-7700U CPU @ 4.20GHz
HD Graphics 620 (Kaby Lake GT2)
Operating System: Debian GNU/Linux buster/sid
Kernel: Linux 4.17.0-1-amd64
GNOME Shell 3.28.3
xwayland 1.20
Mesa 18.1.5
login manager - gdm3 (not sure if this is important).

I have a laptop, so I made about 10-15 attempts to reproduce the issue on its built-in display. (Interestingly, with lightdm and Wayland I couldn't put the laptop into sleep mode, which is why I switched to gdm3.) The issue did not reproduce.

Tomorrow I will check with 4k and 2k monitors, also via DisplayPort.
Comment 8 Denis 2018-08-09 10:12:38 UTC
Interesting: I reproduced a crash, but it doesn't look like yours.
Steps:
1. Turn off the built-in display
2. Connect a 4k display via DisplayPort and set scaling to 200%
3. Follow your steps (put the PC into sleep mode)
4. Wake it up with the "Enter" key
Result: the login screen appears. Both screens are turned on (built-in and 4k).
Reproducibility: 2 out of 5 tries so far. Continuing the investigation.

[11419.667456] gnome-shell[7093]: segfault at 1c ip 00007f9caa2cd7f0 sp 00007fff838c6c28 error 4 in libmutter-2.so.0.0.0[7f9caa2b7000+c0000]
[11419.691017] rfkill: input handler enabled
[11420.127671] wlp2s0: deauthenticating from 20:a6:cd:d3:89:50 by local choice (Reason: 3=DEAUTH_LEAVING)
[11420.138715] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11431.909028] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11433.397711] wlp2s0: authenticate with 20:a6:cd:d3:89:50
[11433.402293] wlp2s0: send auth to 20:a6:cd:d3:89:50 (try 1/3)
[11433.404083] wlp2s0: authenticated
[11433.407310] wlp2s0: associate with 20:a6:cd:d3:89:50 (try 1/3)
[11433.408502] wlp2s0: RX AssocResp from 20:a6:cd:d3:89:50 (capab=0x1411 status=0 aid=7)
[11433.409910] wlp2s0: associated
[11433.435356] wlp2s0: Limiting TX power to 33 (33 - 0) dBm as advertised by 20:a6:cd:d3:89:50
[11433.575183] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[11434.341799] rfkill: input handler disabled
[11435.442383] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[11555.060700] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[11556.442206] gnome-shell[7613]: segfault at 1c ip 00007f8f0a0b27f0 sp 00007ffe1d185dc8 error 4 in libmutter-2.so.0.0.0[7f8f0a09c000+c0000]
[11556.482070] rfkill: input handler enabled
[11556.868104] wlp2s0: deauthenticating from 20:a6:cd:d3:89:50 by local choice (Reason: 3=DEAUTH_LEAVING)
[11556.878860] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11564.855046] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[11566.355255] wlp2s0: authenticate with 20:a6:cd:d3:89:50
[11566.361166] wlp2s0: send auth to 20:a6:cd:d3:89:50 (try 1/3)
[11566.363045] wlp2s0: authenticated
[11566.365002] wlp2s0: associate with 20:a6:cd:d3:89:50 (try 1/3)
[11566.366224] wlp2s0: RX AssocResp from 20:a6:cd:d3:89:50 (capab=0x1411 status=0 aid=7)
[11566.367681] wlp2s0: associated
[11566.453582] wlp2s0: Limiting TX power to 33 (33 - 0) dBm as advertised by 20:a6:cd:d3:89:50
[11566.526028] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[11567.397441] rfkill: input handler disabled
[11568.531937] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
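
The two gnome-shell segfault lines in the log above can be decoded mechanically. The sketch below is not part of the original comment; it assumes the usual x86 kernel message layout (`segfault at ADDR ip IP sp SP error CODE in LIB[BASE+SIZE]`) and the standard x86 page-fault error-code bits, and the name `parse_segfault` is hypothetical. Subtracting the mapping base from the instruction pointer gives the offset inside libmutter-2.so.0.0.0, which could then be passed to a symbolizer if matching debug symbols are installed.

```python
import re

# Message layout assumed from the two segfault lines quoted in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return decoded fields of a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"))
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction inside the mapped library.
        "offset": ip - base,
        # x86 page-fault error-code bits (assumed standard meaning).
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


LOG_LINES = [
    "[11419.667456] gnome-shell[7093]: segfault at 1c ip 00007f9caa2cd7f0 "
    "sp 00007fff838c6c28 error 4 in libmutter-2.so.0.0.0[7f9caa2b7000+c0000]",
    "[11556.442206] gnome-shell[7613]: segfault at 1c ip 00007f8f0a0b27f0 "
    "sp 00007ffe1d185dc8 error 4 in libmutter-2.so.0.0.0[7f8f0a09c000+c0000]",
]

if __name__ == "__main__":
    for line in LOG_LINES:
        info = parse_segfault(line)
        print(info["library"], hex(info["offset"]), hex(info["fault_addr"]))
```

Both crashes resolve to the same offset, 0x167f0, and the faulting address 0x1c is a user-mode read close to zero. That pattern is consistent with reading a field through a NULL pointer in libmutter, but this is only an inference from the log, not a confirmed diagnosis.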
Comment 9 Denis 2018-08-09 12:24:43 UTC
Update: actually, I reproduced it only 2 times. After that, I didn't see it again during about 1-2 hours of trying :(
Comment 10 Denis 2018-08-10 12:18:12 UTC
Just for the record: I have the libgdm1 (18.1.5) and libmutter2-dev (3.28.3-2) packages installed. This may be relevant.
Comment 11 GitLab Migration User 2019-09-25 19:13:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1747.