Bug 43977

Summary: red_channel_remove_client: ASSERT pthread_equal(pthread_self(), rcc->channel->thread_id) failed
Product: Spice
Reporter: Hans de Goede <jwrdegoede>
Component: server
Assignee: Yonit Halperin <yhalperi>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: don't reset display channel

Description Hans de Goede 2011-12-20 06:04:36 UTC
Hi,

I hit this when I reconnected after my client machine's kernel crashed (so no TCP disconnect packets went from client to server). spice-server disconnected the currently attached client (as it was not configured for multi client), and when it did that I hit this assert. Here is some debug output generated before the assert:

reds_handle_auth_mechanism: Auth method: 1
reds_handle_main_link: 
reds_disconnect: 
reds_client_disconnect: 
red_client_destroy: destroy client with #channels 5
red_channel_client_disconnect: 0x555556a6cac0 (channel 0x5555569edc90 type 9 id 2)
red_channel_client_disconnect: 0x555556a6c350 (channel 0x555556521b40 type 3 id 0)
red_channel_client_disconnect: 0x7fff98620e90 (channel 0x7fff9846d4d0 type 2 id 0)
red_channel_remove_client: ASSERT pthread_equal(pthread_self(), rcc->channel->thread_id) failed

Regards,

Hans
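
For readers unfamiliar with this kind of check, here is a minimal sketch of a thread-ownership assert like the one in the summary. It is not spice-server code: the Channel struct and every function name below are hypothetical, and only pthread_self() and pthread_equal() are taken from the failing ASSERT itself. The idea is that a channel remembers the thread that owns it and refuses to be modified from any other thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified channel; NOT spice-server's real structure. */
typedef struct Channel {
    pthread_t thread_id; /* thread that owns the channel */
    int n_clients;       /* number of attached clients */
} Channel;

/* Record the calling thread as the owner of the channel. */
static void channel_init(Channel *channel, int n_clients)
{
    channel->thread_id = pthread_self();
    channel->n_clients = n_clients;
}

/* True only when called from the owning thread; this is the
 * condition the ASSERT in the summary tests. */
static bool channel_called_from_owner(const Channel *channel)
{
    return pthread_equal(pthread_self(), channel->thread_id) != 0;
}

/* Aborts (like the reported ASSERT) when called from a foreign thread. */
static void channel_remove_client(Channel *channel)
{
    assert(channel_called_from_owner(channel));
    channel->n_clients--;
}

typedef struct OwnerCheck {
    const Channel *channel;
    bool from_owner;
} OwnerCheck;

/* Thread entry point: evaluates the ownership check on this thread. */
static void *owner_check_thread(void *arg)
{
    OwnerCheck *check = arg;
    check->from_owner = channel_called_from_owner(check->channel);
    return NULL;
}

/* Runs the ownership check on a freshly created thread.
 * Returns 1 if that thread counts as the owner, 0 if not,
 * and -1 if the thread could not be created or joined. */
static int channel_check_from_new_thread(const Channel *channel)
{
    OwnerCheck check = { channel, false };
    pthread_t tid;

    if (pthread_create(&tid, NULL, owner_check_thread, &check) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return check.from_owner ? 1 : 0;
}
```

Calling channel_remove_client() from the thread that ran channel_init() succeeds, while the same call made from any other thread would abort, which matches the symptom in the log above.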
Comment 1 Alon Levy 2011-12-20 08:52:32 UTC
I'm trying to reproduce this. Here is what I did:

1. run server (running server/tests test_display_no_ssl)
2. launch client
3. suspend client - easiest way I have to emulate falling off the grid.
4. launch second client
5. wait for timeout - DISPLAY_CLIENT_TIMEOUT 15000000000ULL //nano
6. switches to second client, no assert seen.

Can you verify? Do you have any suggestions for reproducing it?
Comment 2 Hans de Goede 2011-12-20 23:38:46 UTC
Hi,

(In reply to comment #1)
> I'm trying to reproduce, I did:
> 
> 1. run server (running server/tests test_display_no_ssl)
> 2. launch client
> 3. suspend client - easiest way I have to emulate falling off the grid.

The TCP stack will then still be alive, as it lives in the kernel. Suggestion:
to truly emulate this, use a separate machine for the client and yank the network cable.

> 4. launch second client
> 5. wait for timeout - DISPLAY_CLIENT_TIMEOUT 15000000000ULL //nano
> 6. switches to second client, no assert seen.

Was this with multi-client enabled? What happened in my case (AFAIK), and what is standard spice behavior, is that connecting the 2nd client automatically caused the server to disconnect the 1st immediately.

Regards,

Hans
Comment 3 Yonit Halperin 2011-12-21 00:19:01 UTC
(In reply to comment #0)

I assume that the client that was connected before the 2 clients discussed above was disconnected with "update timeout" in flush_display_commands. Then red_disconnect_all_display_TODO_remove_me (!!!) was called. It sets worker->display_channel = NULL (it shouldn't do that).
Then, when the next client connected, ensure_display_channel_created was called and created a new channel (we shouldn't do this either, since the display channel is always alive) with the default disconnect callback instead of the dispatcher's one.
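
The sequence described in this comment can be modelled with a small sketch. It is a toy model under stated assumptions, not spice-server code: the Worker and Channel structs, the callback type, and every function name are hypothetical, and the "fixed" variant only follows the idea of not resetting the display channel (the actual patch is not reproduced here). The model tracks just one thing: which disconnect callback the display channel ends up with.

```c
#include <assert.h>
#include <stddef.h>

/* Where the channel teardown would run (hypothetical model). */
typedef enum {
    TEARDOWN_ON_CALLER_THREAD, /* default callback: runs inline, wrong thread */
    TEARDOWN_QUEUED_TO_WORKER  /* dispatcher callback: handed to the worker */
} DisconnectPath;

typedef struct Channel Channel;
typedef DisconnectPath (*DisconnectCallback)(Channel *channel);

struct Channel {
    DisconnectCallback on_disconnect;
};

typedef struct Worker {
    Channel channel_storage;  /* backing storage for the single channel */
    Channel *display_channel; /* NULL means "no channel exists" */
} Worker;

static DisconnectPath default_disconnect(Channel *channel)
{
    (void)channel;
    return TEARDOWN_ON_CALLER_THREAD;
}

static DisconnectPath dispatcher_disconnect(Channel *channel)
{
    (void)channel;
    return TEARDOWN_QUEUED_TO_WORKER;
}

/* Initial setup: the channel is created with the dispatcher's callback. */
static void worker_init(Worker *worker)
{
    worker->channel_storage.on_disconnect = dispatcher_disconnect;
    worker->display_channel = &worker->channel_storage;
}

/* Buggy timeout path: forgets the channel entirely. */
static void timeout_disconnect_buggy(Worker *worker)
{
    worker->display_channel = NULL;
}

/* Assumed fix: drop the client but keep the channel alive. */
static void timeout_disconnect_fixed(Worker *worker)
{
    (void)worker;
}

/* On the next connect: re-create the channel only if it is missing,
 * and in that case it gets the default callback. */
static void ensure_display_channel(Worker *worker)
{
    if (worker->display_channel != NULL)
        return;
    worker->channel_storage.on_disconnect = default_disconnect;
    worker->display_channel = &worker->channel_storage;
}

/* Disconnect the current client through whatever callback is installed. */
static DisconnectPath client_disconnect(Worker *worker)
{
    assert(worker->display_channel != NULL);
    return worker->display_channel->on_disconnect(worker->display_channel);
}
```

In this model, a timeout handled by timeout_disconnect_buggy() followed by ensure_display_channel() leaves the channel with default_disconnect, so a later client_disconnect() reports TEARDOWN_ON_CALLER_THREAD. With timeout_disconnect_fixed() the dispatcher's callback survives and the teardown stays queued to the worker.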
Comment 4 Hans de Goede 2011-12-21 00:50:00 UTC
Hi,

(In reply to comment #3)
> I assume that the client that was connected before the 2 clients discussed
> above, was disconnected with "update timeout" in flush_display_commands.

That may be the case. I honestly don't know. All I can tell you for sure is that:
1) The server is latest git master and it is *not* running in multi client mode
2) I likely did connect / disconnect a client normally several times before this
happened
3) A client was connected over the network, and the client kernel panicked,
causing the client to not send a single network packet anymore, iow it did
not disconnect cleanly
4) I rebooted the crashed client machine, re-started the client and then hit
the assert.
Comment 5 Hans de Goede 2011-12-21 07:19:39 UTC
I believe the analysis from comment #3 is correct. I've (accidentally) managed to reproduce this today. I crashed the client machine twice (seems my client side userspace usbredir code manages to trigger some kernel bug).

1) After the first crash I was a bit slow with rebooting and I got:
flush_display_commands: update timeout
red_channel_client_disconnect: 0x7fff990bdd50 (channel 0x7fff980458e0 type 2 id 0)
display_channel_client_on_disconnect: 

2) Then after reboot the client connected fine.

3) Then I rebooted again, and this time I connected before another update timeout occurred and got the assert from $summary again.
Comment 6 Yonit Halperin 2011-12-22 02:59:52 UTC
Created attachment 54677
don't reset display channel
