Created attachment 91966 [details] gdb backtrace 1 When using weston with logind + multihead setup, rapid vt-switching triggers segmentation fault in drm_output_set_cursor(). The SEGV stack trace is not consistent however. See attached gdb backtraces for more info. kernel 3.11.10-301.fc20.x86_64 systemd (master) heads/master-0-g63966da wayland (master) 1.3.92-0-gc102c20 drm (master) libdrm-2.4.50-0-g4c5de72 mesa (master) heads/master-0-g532b1fe libva (master) libva-1.2.1-0-g88ed1eb intel-driver (master) 1.2.1-0-g8f306e3 weston (master) 1.3.92-0-gb637a40
Created attachment 91967 [details] gdb backtrace 2
Created attachment 91968 [details] gdb backtrace 3
Additionally, I could not reproduce this issue when: 1. using weston-launch (i.e. no logind) 2. using weston + logind + single-head
(In reply to comment #3) > Additionally, I could not reproduce this issue when: > > 1. using weston-launch (i.e. no logind) > 2. using weston + logind + single-head Reproduced #2 with Weston 1.3.93 (i.e. with single-head): systemd (master) heads/master-0-g63966da wayland (master) heads/master-0-g47bbc6b drm (master) libdrm-2.4.50-0-g4c5de72 mesa (master) heads/master-0-g4cd8011 libva (master) libva-1.2.1-0-g88ed1eb intel-driver (master) 1.2.1-0-g8f306e3 weston (master) 1.3.93-0-g7cccfca
Created attachment 92527 [details] 1.3.93 gdb backtrace 1
Created attachment 92528 [details] 1.3.93 gdb backtrace 2
(In reply to comment #4) > (In reply to comment #3) > > Additionally, I could not reproduce this issue when: > > > > 1. using weston-launch (i.e. no logind) > > 2. using weston + logind + single-head > > Reproduced #2 with Weston 1.3.93 (i.e. with single-head): > > systemd (master) heads/master-0-g63966da > wayland (master) heads/master-0-g47bbc6b > drm (master) libdrm-2.4.50-0-g4c5de72 > mesa (master) heads/master-0-g4cd8011 > libva (master) libva-1.2.1-0-g88ed1eb > intel-driver (master) 1.2.1-0-g8f306e3 > weston (master) 1.3.93-0-g7cccfca Hrmm... we are hitting a different bug now since vt-switching is also broken without logind usage. Will file a new bug for that.
Looks like two different stack traces; perhaps two different bugs in a race condition as to which will crash first? Two of the backtraces show a crash in the wl_list_insert routine: elm->next->prev = elm; Tracing up the stack, this implies that: new_output->destroy_signal.listener_list.prev->next is invalid (e.g. maybe NULL). Not sure what to make of that though. I think this will require more than just a stack trace to sort out; perhaps instrumenting the weston_view_assign_output() routine could reveal something?
The other two backtraces show a crash in drm_output_set_cursor() stride = wl_shm_buffer_get_stride(buffer->shm_buffer); s = wl_shm_buffer_get_data(buffer->shm_buffer); wl_shm_buffer_begin_access(buffer->shm_buffer); for (i = 0; i < ev->surface->height; i++) memcpy(buf + i * 64, s + i * stride, ev->surface->width * 4); But both s and stride appear to be invalid here: #1 0x00007f31e46dd58f in drm_output_set_cursor (output=0x13f20f0) at compositor-drm.c:1007 ev = 0x142e810 buffer = 0x7f31e5cb8a88 <main_arena+808> stride = -439645528 buf = {0 <repeats 4096 times>} x = 4223888 c = 0x1291940 handle = 0 bo = 0x12a8ea0 s = 0x7f31cb971560 <Address 0x7f31cb971560 out of bounds> i = 0 y = 32561 In the other stacktrace, s looks ok, but stride looks invalid still: #1 0x00007fc70e76a506 in drm_output_set_cursor (output=0x1685ef0) at compositor-drm.c:1004 ev = 0x182d7f0 buffer = 0x7fc70fd45a88 <main_arena+808> stride = 265575080 buf = {0 <repeats 4096 times>} x = 4223888 c = 0x166e940 handle = 0 bo = 0x17cec50 s = 0x1685ef0 "" i = 0 y = 32711 My guess would be that buffer->shm_buffer is uninitialized when this code gets hit, or has been partially destroyed. If this is relatively easily reproduced, perhaps instrumenting the wl_shm_buffer creation and destruction routines would reveal something. Maybe also check reference counts in shm_pool_unref().
commit b3955b095317466afd05b474666eef938b7f72d0 Author: Kristian Høgsberg <krh@bitplanet.net> Date: Thu Jan 23 16:25:06 2014 -0800 compositor-drm: Set cursor surface to NULL when pageflip fails If we VT switch away between picking a cursor surface and actually doing the pageflip in drm_output_repaint(), we never set output->cursor_view to NULL. Then we unplug all the input devices and as the last pointer device goes away we destroy the cursor surface. Then when we switch back, we call drm_output_set_cursor() with an invalid surface and crashes. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=73566
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.