Bug 55619

Summary: spice-gtk crash while unplug usb strorage
Product: Spice Reporter: clydezhou <zhouwei54321>
Component: usbredirAssignee: Hans de Goede <jwrdegoede>
Status: RESOLVED FIXED QA Contact:
Severity: major    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: Potential fix
libusbx patch: usbi_handle_disconnect: Fix race condition

Description clydezhou 2012-10-04 11:49:54 UTC
spice-gtk version spice-gtk-0.11
usb-redir version 0.4.3

It can be reproduce,I plug and unplug about 30 times,and this coredump occured. I guest it crash when logging  below is the coredump info. if you use export SPICE_DEBUG=1, it will crash more frequently.


GSpice-Message: main channel: opened
going to redirect usb device 2

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: resetting device: -5
libusb:error [_open_sysfs_attr] open /sys/bus/usb/devices/1-3/descriptors failed ret=-1 errno=2

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: resetting device: -5
libusb:error [_open_sysfs_attr] open /sys/bus/usb/devices/2-3/descriptors failed ret=-1 errno=2

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: resetting device: -5
libusb:error [_open_sysfs_attr] open /sys/bus/usb/devices/2-3/descriptors failed ret=-1 errno=2

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: submitting bulk transfer on ep 02: -4

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: submitting bulk transfer on ep 02: -4

(noc-client:1916): GSpice-CRITICAL **: usbredirhost error: submitting bulk transfer on ep 81: -4
Segmentation fault (core dumped)
noc@ubuntu:~$ gdb /usr/local/bin/noc-client core 
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/local/bin/noc-client...done.
[New LWP 1916]
[New LWP 1995]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Core was generated by `spicy -h 172.16.0.7 -p 6100'.
Program terminated with signal 11, Segmentation fault.
#0  usbredirhost_bulk_packet_complete (libusb_transfer=0x2248880) at usbredirhost.c:1934
1934    usbredirhost.c: No such file or directory.
        in usbredirhost.c
(gdb) bt
#0  usbredirhost_bulk_packet_complete (libusb_transfer=0x2248880) at usbredirhost.c:1934
#1  0x00007fec7e32343a in usbredirparser_call_type_func (parser_pub=0x22492d0) at usbredirparser.c:673
#2  usbredirparser_do_read (parser_pub=0x22492d0) at usbredirparser.c:777
#3  0x0000000000439691 in usbredir_handle_msg (c=<optimized out>, in=0x25044a0) at channel-usbredir.c:609
#4  0x000000000042aa04 in spice_channel_recv_msg (channel=0x2248880, msg_handler=0x4390f0 <spice_usbredir_handle_msg>, data=0x0) at spice-channel.c:1835
#5  0x000000000042b451 in spice_channel_iterate_read (channel=0x2248880) at spice-channel.c:1997
#6  spice_channel_iterate_read (channel=0x2248880) at spice-channel.c:1981
#7  0x00000000004291d4 in spice_channel_iterate (channel=0x2248880) at spice-channel.c:2055
#8  spice_channel_coroutine (data=0x2248880) at spice-channel.c:2208
#9  0x00000000004bc65b in coroutine_trampoline (cc=<optimized out>) at coroutine_ucontext.c:56
#10 0x00000000004c8163 in continuation_trampoline (i0=<optimized out>, i1=<optimized out>) at continuation.c:49
#11 0x00007fec7bdaf180 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000002248d00 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) p host
$1 = <optimized out>
(gdb) p host->log_func
value has been optimized out
(gdb) p host->lock
value has been optimized out
(gdb) p libusb_transfer->buffer
$2 = (unsigned char *) 0x0
(gdb) p libusb_transfer->actual_length
$3 = 0

I dont know why the pointer libusb_transfer->buffer is zero, I saw hot->log_func is 0 at previous core dump.
Comment 1 clydezhou 2012-10-04 11:54:11 UTC
I rename spicy to noc-client
Comment 2 clydezhou 2012-10-08 02:02:16 UTC
core diffs from time to time


(gdb) bt
#0  usbredirhost_handle_disconnect (host=0x0) at usbredirhost.c:239
#1  0x00007ffff64792d5 in libusb_status_or_error_to_redir_status (host=<optimized out>, status=<optimized out>) at usbredirhost.c:284
#2  0x00007ffff647a354 in usbredirhost_bulk_packet_complete (libusb_transfer=0x7fffd80010d8) at usbredirhost.c:1950
#3  0x00007ffff606443a in usbredirparser_call_type_func (parser_pub=0x798830) at usbredirparser.c:673
#4  usbredirparser_do_read (parser_pub=0x798830) at usbredirparser.c:777
#5  0x0000000000439691 in usbredir_handle_msg (c=<optimized out>, in=0x7fffd8001160) at channel-usbredir.c:609
#6  0x000000000042aa04 in spice_channel_recv_msg (channel=0x7968d0, msg_handler=0x4390f0 <spice_usbredir_handle_msg>, data=0x0) at spice-channel.c:1835
#7  0x000000000042b451 in spice_channel_iterate_read (channel=0x7968d0) at spice-channel.c:1997
#8  spice_channel_iterate_read (channel=0x7968d0) at spice-channel.c:1981
#9  0x00000000004291d4 in spice_channel_iterate (channel=0x7968d0) at spice-channel.c:2055
#10 spice_channel_coroutine (data=0x7968d0) at spice-channel.c:2208
#11 0x00000000004bc65b in coroutine_trampoline (cc=<optimized out>) at coroutine_ucontext.c:56
#12 0x00000000004c8163 in continuation_trampoline (i0=<optimized out>, i1=<optimized out>) at continuation.c:49
#13 0x00007ffff3af0180 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000796d50 in ?? ()
#15 0x0000000000000000 in ?? ()
(gdb) p host->disconnect_lock
Cannot access memory at address 0x10
Comment 3 Hans de Goede 2012-10-08 08:48:29 UTC
Hi,

Hmm, it seems as if sometimes some parts of usbredir internals get overwritten with zeros. Could be a use after free scenario, but you're the only one reporting this!

Which distro are you using? And which version of spice-gtk, libusb and usbredir do you have ?

Regards,

Hans
Comment 4 Hans de Goede 2012-10-08 22:10:55 UTC
Created attachment 68294 [details] [review]
Potential fix

I hit a similar issue as yours today, and I've created a patch for this. Hopefully this also fixes your issue, can you give the attached patch a try please?
Comment 5 clydezhou 2012-10-09 06:24:20 UTC
Thanks a lot, but it still happens, I know your patch is for the latest code from the git, newer than 0.14, I update spice-gtk and patched, and I still use usb-redir-0.4.3, need I update usb-redir to 0.5.2 and qemu to master branch to fix this problem? 

now the coredump info is like below 

Core was generated by `noc-client -h 192.168.11.185 -p 6100'.
Program terminated with signal 6, Aborted.
#0  0x00007fa13e2b63a5 in ?? ()
(gdb) bt
#0  0x00007fa13e2b63a5 in ?? ()
#1  0x00007fa13e2b9b0b in ?? ()
#2  0x0000000000000000 in ?? ()

it's appear that the stack has been overwrited.
Comment 6 Hans de Goede 2012-10-09 08:00:26 UTC
(In reply to comment #5)
> Thanks a lot, but it still happens, I know your patch is for the latest code
> from the git, newer than 0.14, I update spice-gtk and patched, and I still
> use usb-redir-0.4.3, need I update usb-redir to 0.5.2 and qemu to master
> branch to fix this problem? 

You could try updating usb-redir to 0.4.4, but I don't think that will fix it
(there is one possible fix in there which might help, but it is unlikely).

A better bet would be to update your libusb to the latest libusbx release:
http://sourceforge.net/projects/libusbx/files/releases/1.0.14/

Note that the libusbx-1.0.x releases are drop in bugfix releases
for libusb-1.0, A group of people (including me) forked libusb as
the original upstream was failing to do releases for a period
of over 2 years.

You could try building the latest qemu, in that case please use
this git tree:
http://cgit.freedesktop.org/~jwrdegoede/qemu/log/?h=qemu-kvm-1.2-usbredir

And it that case you do need usbredir-0.5.x (where as if you stay with
an older qemu you need 0.4.x) But I don't think updating qemu is
useful since this clearly is a bug on the client side, not the host.
Comment 7 clydezhou 2012-10-10 07:23:26 UTC
still coredump
libusb-1.0.14
usbredir 0.5.2
spice-gtk, patched version
can you tell me what version you are using? 

noc@ubuntu:~$ gdb /usr/local/bin/spicy core
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/local/bin/spicy...done.
[New LWP 4006]
[New LWP 3958]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Core was generated by `spicy -h 192.168.23.14 -p 6100'.
Program terminated with signal 11, Segmentation fault.
#0  list_del (entry=0x125d048) at libusbi.h:112
112             entry->next->prev = entry->prev;
(gdb) bt
#0  list_del (entry=0x125d048) at libusbi.h:112
#1  usbi_handle_transfer_completion (itransfer=<optimized out>, status=LIBUSB_TRANSFER_NO_DEVICE) at io.c:1460
#2  0x00007f86dbc2497c in usbi_handle_disconnect (handle=0x129eec0) at io.c:2447
#3  0x00007f86dbc269c5 in op_handle_events (ctx=0xf9edf0, fds=0x7f86c8002e90, nfds=3, num_ready=0) at os/linux_usbfs.c:2490
#4  0x00007f86dbc235dc in handle_events (ctx=0xf9edf0, tv=<optimized out>) at io.c:1940
#5  0x00007f86dbc241d7 in libusb_handle_events_timeout_completed (ctx=0xf9edf0, tv=<optimized out>, completed=0x0) at io.c:2020
#6  0x00007f86dbc242cf in libusb_handle_events (ctx=<optimized out>) at io.c:2096
#7  0x00007f86df30ff50 in spice_usb_device_manager_usb_ev_thread (user_data=<optimized out>) at usb-device-manager.c:867
#8  0x00007f86ddab02b6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x00007f86dd5acefc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f86dd2e759d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()

p entry->prev 
$1 = (struct list_head *)0x0
Comment 8 Hans de Goede 2012-10-10 07:41:08 UTC
Hi,

You've hit the jackpot with that last backtrace, you seem to be hitting a bug in libusb's disconnection handling.

I need to make some time to think about how to best fix this, but in the mean time, can you please try
editing libusb/os/linux_usbfs.c, around line 2490 you should have the following code block:

                if (pollfd->revents & POLLERR) {
                        usbi_remove_pollfd(HANDLE_CTX(handle), hpriv->fd);
                        usbi_handle_disconnect(handle);
                        continue;
                }

If I'm right simply removing that entire code block, and then doing a make; make install, should fix the issue you are seeing, that might in the end also be the best fix, but as said I need to investigate this a bit more.

Regards,

Hans
Comment 9 clydezhou 2012-10-12 02:46:00 UTC
yes, It did work, no crash was found during plug and unplug, but cause spicy hang, I have to kill spicy.
Comment 10 Hans de Goede 2013-05-03 15:47:21 UTC
Hi,

I've not forgotten about this, I've merely been busy with other stuff. So I've spend today reading all your backtraces and I believe I know what is going now, here is a rather detailed step by step of the race I believe you're hitting:

1) app calls libusb_submit_transfer
2) libusb_submit_transfer locks itransfer->lock
3) libusb_submit_transfer adds the transfer to the flying list
4) *thread switch*
5) other thread notices POLL_ERR on device fd, calls usbi_handle_disconnect
6) usbi_handle_disconnect find the transfer which is in progress of being
   submitted in the flying list
7) usbi_handle_disconnect calls usbi_backend->clear_transfer_priv on the
   transfer, this blocks waiting on itransfer->lock (this makes the race
   more likely to trigger)
8) *thread switch*
9) libusb_submit_transfer actually tries to submit the transfer now,
   calls usbi_backend->submit_transfer, which fails with -ENODEV
10) libusb_submit_transfer *removes* the transfer from the flying list,
   unlocks itransfer->lock and returns an error to its caller
11) the caller frees the transfer, meaning the to_cancel pointer in
   usbi_handle_disconnect now points to free-ed memory, for extra mayhem
12) *thread switch*
13) usbi_handle_disconnect calls usbi_handle_transfer_completion
14) usbi_handle_transfer_completion tries to remove the transfer from
    the flying list *for the 2nd time*
    But the first call done from libusb_submit_transfer has already done
    this. list_del looks like this:

    static inline void list_del(struct list_head *entry)
    {
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        entry->next = entry->prev = NULL;
    }

    So the first call sets it next and prev to NULL, and then the 2nd call
    tries to deref next -> BOOM

    For an example of this, see:
    https://bugs.freedesktop.org/show_bug.cgi?id=55619#c7

So this clearly makes this a libusb issue. I've a plan to fix this, which I hope to finish tonight and then give you a patch to test.

Regards,

Hans
Comment 11 Hans de Goede 2013-05-03 19:27:13 UTC
Created attachment 78821 [details] [review]
libusbx patch: usbi_handle_disconnect: Fix race condition

Hi,

Here is a patch for libusbx which should fix this, can you try building a recent libusbx with this patch and see if it fixes things?

Thanks,

Hans
Comment 12 clydezhou 2013-05-06 12:43:36 UTC
(In reply to comment #11)
> Created attachment 78821 [details] [review] [review]
> libusbx patch: usbi_handle_disconnect: Fix race condition
> 
> Hi,
> 
> Here is a patch for libusbx which should fix this, can you try building a
> recent libusbx with this patch and see if it fixes things?
> 
> Thanks,
> 
> Hans

Thanks a lot, I'll try it soon.
Comment 13 Marc-Andre Lureau 2014-11-03 00:51:01 UTC
clydezhou, any update?

Hans, is the fix still required?
Comment 14 clydezhou 2014-11-03 02:27:48 UTC
(In reply to Marc-Andre Lureau from comment #13)
> clydezhou, any update?
> 
> Hans, is the fix still required?

My reponsibility shifts to another department, I do not need this fix anymore, but I think it's a improvement of user experience and the quality of whole spice-project, thank you very much

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.