Bug 73711

Summary: Invalid user data pointer passed to "pointer_handle_enter"
Product: Wayland Reporter: Gabriel Jacobo <gabomdq>
Component: westonAssignee: Wayland bug list <wayland-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: testgles2 debug log

Description Gabriel Jacobo 2014-01-16 20:12:02 UTC
While testing the Wayland backend for SDL2 I noticed a repeatable problem with the user data pointer passed back to the pointer listeners.
When the mouse enters the application, the corresponding callback is fired (pointer_handle_enter in our case), but the first parameter is invalid (it seems to be consistently 0xffffffffc). The rest of the parameters seem sane.

This error does not happen with Wayland/Weston 1.3, but it is present in 1.3.91 and the latest from git.

The invalid pointer eventually produces a segfault.
Comment 1 Kristian Høgsberg 2014-01-17 22:28:47 UTC
Is this a problem in the SDL2 wayland code?
Comment 2 Gabriel Jacobo 2014-01-17 22:40:24 UTC
It may very well be a problem with SDL, but I doubt it, the same code works as is with Wayland 1.3, and the user data pointer is set correctly, but comes back incorrectly when the callback is fired. FWIW, I was also told about this issue before the patch landed in the main repo. Anyway, if you need help debugging this let me know what I can do to help (should be easy to reproduce though)
Comment 3 Gabriel Jacobo 2014-01-28 15:11:19 UTC
It turns out that we were getting two calls to pointer_handle_enter. The first one was a call with surface="window surface" (which had the right user data), while the second one was surface="mouse pointer surface". I'm not sure if this is the intended behavior, I fixed the issue in SDL by making sure we set the mouse pointer user data to NULL and checking for that in the pointer_handle_enter function (as well as the other pointer handling callbacks).

If what I've described is the expected behavior, feel free to close. Thanks!
Comment 4 Pekka Paalanen 2014-01-28 15:20:10 UTC
You mean literally two enters without a leave in between? Does WAYLAND_DEBUG=client confirm?

I think it should be impossible to have two enters with the same pointer object without a leave in between, and also cursor surfaces should not get enters at all. So a possibility for two different bugs in Weston. A log of WAYLAND_DEBUG=client set for the application should show the fundamental problems in Weston if there are some.

IIRC wether you clear the input region of the cursor surface or not should be irrelevant.
Comment 5 Gabriel Jacobo 2014-01-28 15:38:47 UTC
The log looks like this:

[2021284.639] wl_pointer@9.enter(1907, wl_surface@11, 5.000000, 0.000000)
INFO: wl_surface_get_user_data 0x2257760, 00000000
[2021284.668] wl_pointer@9.motion(3648448518, 5.000000, 0.000000)
[2021284.685] wl_pointer@9.motion(3648448518, 5.000000, 0.000000)
[2021284.700] wl_display@1.delete_id(14)
[2021300.164] wl_callback@23.done(3648448528)
[2021300.200]  -> wl_surface@20.frame(new id wl_callback@14)
[2021300.216]  -> wl_surface@20.attach(wl_buffer@22, 0, 0)
[2021300.234]  -> wl_surface@20.damage(0, 0, 640, 480)
[2021300.252]  -> wl_surface@20.commit()
[2021300.273] wl_pointer@9.leave(1911, wl_surface@11)
[2021300.287] wl_pointer@9.enter(1912, wl_surface@20, 635.000000, 354.000000)
INFO: wl_surface_get_user_data 0x22c1430, 0x22e37b0
[2021300.321]  -> wl_pointer@9.set_cursor(0, wl_surface@11, 5, 0)
[2021300.341] wl_shell_surface@18.ping(1913)
[2021300.350]  -> wl_shell_surface@18.pong(1913)
[2021300.359] wl_pointer@9.motion(3648448534, 634.000000, 353.000000)
[2021300.379] wl_pointer@9.motion(3648448534, 632.000000, 353.000000)


wl_surface_get_user_data is a log I added, it logs the wl_surface* and the user data.

As you can see we get two "enter" events (apparently followed correctly by the corresponding leave), yet only one of those is for the window surface. The other is the mouse cursor surface.

So, if there's a bug in here it's probably that the cursor surface is getting a pointer enter event, right?
Comment 6 Pekka Paalanen 2014-01-29 06:44:48 UTC
Ok, so enter/leave events are paired properly. Yes, I suspect the cursor surface should never get an enter as a cursor surface makes no sense to have an input region nor input focus.

What I am uncertain of is whether the compositor is supposed to prevent focusing the surface immediately when it becomes a cursor, or do we just rely on clients to set the input region to empty (which I assume you do not do?).

Would have been nice to get the whole trace from the very beginning as an attachment, so we could actually see how the cursor surface gets set, when surfaces are created, etc. It would show what sequence of messages leads to this problem.
Comment 7 Gabriel Jacobo 2014-01-29 13:23:38 UTC
Created attachment 92993 [details]
testgles2 debug log

Find the full log attached. We don't set an input region on the mouse cursor surface AFAICT.

The code is here: https://hg.libsdl.org/SDL/file/8372466c003c/src/video/wayland/SDL_waylandmouse.c
Comment 8 Pekka Paalanen 2014-01-29 13:47:04 UTC
Thanks, the trace confirms. You never set an empty input region, so it defaults to the whole surface. However, the mistake is not yours.

The protocol specification for wl_pointer.set_cursor says: "The current and pending input regions of the wl_surface are cleared, and wl_surface.set_input_region is ignored until the wl_surface is no longer used as the cursor."

Therefore this is indeed a Weston bug: it should never pick the cursor surface into focus.
Comment 9 Ander Conselvan de Oliveira 2014-01-31 13:35:47 UTC
(In reply to comment #8)
> Thanks, the trace confirms. You never set an empty input region, so it
> defaults to the whole surface. However, the mistake is not yours.
> 
> The protocol specification for wl_pointer.set_cursor says: "The current and
> pending input regions of the wl_surface are cleared, and
> wl_surface.set_input_region is ignored until the wl_surface is no longer
> used as the cursor."
> 
> Therefore this is indeed a Weston bug: it should never pick the cursor
> surface into focus.

Indeed the problem is in Weston. The input region of the cursor surface is set to empty in pointer_cursor_surface_configure(). Since during the commit process this function is called before the pending input region is made current, it empties surface->pending.input instead of the pending input region instead of surface->input.

But pointer_cursor_surface_configure() is also called from pointer_set_cursor() in order to map the cursor even if there isn't a subsequent attach and commit to the cursor surface. In that case, surface->input is never emptied, since the configure function emptied only the pending input region and there wasn't a commit that made it effective.

Here's the sequence of requests that triggers the problem. 

[3269633.625]  -> wl_compositor@3.create_surface(new id wl_surface@11)
[3269633.685]  -> wl_surface@11.attach(wl_buffer@12, 0, 0)
[3269633.704]  -> wl_surface@11.damage(0, 0, 24, 24)
[3269633.730]  -> wl_surface@11.commit()
[3269633.739]  -> wl_pointer@9.set_cursor(0, wl_surface@11, 5, 0)
Comment 10 Kristian Høgsberg 2014-02-01 09:12:47 UTC
commit 23900f70e57277805db652316b76d18b2d59281c
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Fri Jan 31 16:07:51 2014 +0200

    input: Empty the current input region when configuring pointer surfaces
    
    The input region of the cursor surface is set to empty in
    pointer_cursor_surface_configure(). Since during the commit process
    this function is called before the pending input region is made
    current, it empties surface->pending.input instead of surface->input.
    
    But pointer_cursor_surface_configure() is also called from
    pointer_set_cursor() in order to map the cursor even if there isn't a
    subsequent attach and commit to the cursor surface. In that case,
    surface->input is never emptied, since the configure function emptied
    only the pending input region and there wasn't a commit that made it
    effective.
    
    Fix this by emptying both pending and current input regions. The latter
    shouldn't cause problems since the surface can't have a role prior to
    being assigned the cursor role, so it shouldn't be mapped in the first
    place.
    
    Also change toytoolkit so that it triggers the bug.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=73711

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.