Bug 47429 - Support SalutContact and SautContact-manager with Bonjour backend
Summary: Support SalutContact and SautContact-manager with Bonjour backend
Status: RESOLVED FIXED
Alias: None
Product: Telepathy
Classification: Unclassified
Component: salut (show other bugs)
Version: unspecified
Hardware: All Windows (All)
: medium normal
Assignee: Siraj Razick
QA Contact: Telepathy bugs list
URL: http://cgit.collabora.com/git/user/si...
Whiteboard: review+
Keywords: patch
Depends on:
Blocks:
 
Reported: 2012-03-17 02:33 UTC by Siraj Razick
Modified: 2012-03-26 08:39 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Siraj Razick 2012-03-17 02:33:04 UTC
Implement bonjour-contact and bonjour-contact-manager similar to avahi-contact
and avahi-contact-manager.
Comment 1 Olli Salli 2012-03-17 06:32:01 UTC
Thanks for having that initial commit which removes the unneeded avahi-specific code, that makes the rest of the diffs much more readable.

Sharing the code for DNSServiceRef FD management is a good idea in principle; however the implementation is currently rather broken.

+void
+salut_bonjour_cleanup_socket (DNSServiceRef *service)
+{
+  GIOChannel *channel = NULL;
+
+  channel = g_io_channel_win32_new_socket (
+     DNSServiceRefSockFD ((*service)));
+
+  g_io_channel_unref (channel);
+}

This is totally silly. You create a new GIOChannel and destroy it whereas I think you'd rather want to clean up the old one. Currently it'll stay in existence and still be registered as a watch.

To clean up the old one you need to look it up somehow. But you don't have any way to do that. Alternatives for fixing that:

1) Return the GIOChannel pointer or a structure with that + the DNSServiceRef from add_socket_watch and pass that to cleanup_socket instead of just the ServiceRef
2) have a GHashTable from DNSServiceRef * to GIOChannel * which you insert to in add_socket_watch

1 would be messy IMO, as then the point of this code that users don't have to care about the GIOChannels would be lost.

2 is better; the current simplistic API would be all that's required. But where to put that GHashTable? There is no object where you could put it as a member. Global var would be messy and impossible to clean up. I think cleanest would be to turn this code into BonjourDiscoveryClient member functions, as the users of this functionality always have a ref to the disco client. This could be done as follows:
* add_socket_watch becomes bonjour_discovery_client_watch_svc_ref
* cleanup_socket becomes bonjour_discovery_client_drop_svc_ref
* put a GHashTable DNSSR * -> GIOCh * in BonjourDiscoClientPriv
* clean up all the remaining stuff (and the hash table itself of course) on dispose

The new names reflect what these functions do somewhat better.

With the service ref handling fixed, couldn't you also use it in bonjour-self now? That also illuminated another issue in my mind: shouldn't we DNSServiceRefDeallocate() the service we register in BonjourSelf when it is disposed - otherwise we'll leak it, and leave the service registered on the network? You can fix that in this branch.

In fact, is there a situation where we wouldn't want to dealloc service refs when we remove them from the main loop? It's not very useful anymore at least as new events on it are not processed. You could make drop_svc_ref also deallocate the service ref so you don't forget to do that.

Now to the actual backend code...

BonjourContact:

This class needs to make a decision: does it support multiple services for a single contact, or does it not? Everything has to work properly with multiple services, or everything has to explicitly forbid multiple services.

The Salut Avahi backend supports multiple services per contact perfectly, by the virtue of keeping per-service data in a "resolvers" linked list.

Judging by _mdns_service_browse_callback in http://developer.pidgin.im/viewmtn/revision/file/3ecd9b9823c0c23f80314cbc8f9d07092cccd91e/libpurple/protocols/bonjour/mdns_win32.c, the libpurple Bonjour win32 backend (using the same library as we are using) also supports multiple services per contact ("buddy"), much in the same fashion as our Avahi backend.

So I don't really see how we could justify omitting support for that - especially in the current half-assed fashion of just being silently broken in that case.

Let's now consider how SalutBonjourContact fares with multiple services for a contact. There is a function called salut_bonjour_contact_add_service(), which would indicate multiple services can be added to a single contact.

However, when called the second time for a contact priv->resolve_service will be overwritten - leaving no reference to the previous service's one to stop it later. Also, when a single service is removed with service_remove(), the whole contact will be declared "lost".

SalutBonjourContact has a resolvers SList as well... but it contains... addresses, responses for address resolving calls we make each time we receive a new result for the contact service's TXT record. We'll receive those e.g. when a contact changes their presence - while still having the same address for the service. We'll just keep adding duplicate addresses there - what is the point in that?

On the subject of addresses, you currently salut_contact_found() the contact at the time the first TXT record is received - as I've said numerous times, that is wrong. The contact is not contactable (has any addresses) until the first addresses have been discovered through the GetAddrInfo operation.

Okay, enough pointing out what's wrong. Ideas for fixing this stuff follow.

* store full blown "resolver" objects in the resolvers list
* add one each time add_service is called
* remove the correct one (matching the name, type, domain tuple) when remove_service is called

-> the resolver objects/structs have to contain at least copies of the name, type and domain values, so we know which one is which

* each resolver should have a DNSServiceResolve job - that's its main purpose
* when we've received a result, there should be a GetAddrInfo job as well to resolve the address

-> the struct should also contain DNSServiceRefs for a service resolve and address resolve

* they should still contain the address info as well to be able to implement get_addresses and has_address
* BUT ONLY ONE ADDRESS FOR EACH SERVICE

-> have a single address member, where you copy the service address when GetAddrInfo produces a result

You should also do what AvahiContact does with _lost - declare a contact lost when the last such resolver is removed.

So. As a conceptual recap:
* A contact can have multiple resolvers
* There is a resolver for each (name, type, domain) tuple
* Each resolver watches a TXT record and the SRV record
* And resolves the SRV record values to addresses
* A contact exists when it has parsed at least one service's TXT record and resolved the address from the corresponding SRV record
* A contact ceases to exist when there are no longer any services from the TXT records of which we would receive info for it and through the SRV records be able to contact it

BonjourContactManager:

+  /* Filter out self contact, I don't feel good about, how I do it here.
+     so suggestions are welcome. And secondly g_get_host_name() return the
+     host name in upper case while ServiceName returned in lowercase.
+  */
+  if (g_ascii_strcasecmp (name, g_get_host_name ()) == 0)
+    return;
+

More importantly than case, AIUI the name is a JID not just a host name? Compare against SalutSelf::name? If you populate that correctly in the BonjourSelf subclass, it should be your JID. You might need to normalize/lowercase it there. 

Actually when I look at BonjourSelf now - do you publish the record correctly at all? Avahi uses self->published_name for naming the record, you don't. Have you tested at all how the records appear on the network? You might end up calling yourself "host" instead of "siraj@machine".

  if (!priv->all_for_now)
    {
      g_signal_emit_by_name (self, "all-for-now");
      priv->all_for_now = TRUE;
    }

Shouldn't you consider the MoreComing flag - if bonjour says there's more coming immediately, clearly it's not all for now yet.

The multiple service handling in the ContactManager looks OK - but that doesn't save Contact.
Comment 2 Siraj Razick 2012-03-22 00:06:54 UTC
updated patch. with all the changes requested .
Comment 3 Jonny Lamb 2012-03-22 12:44:38 UTC
> We need to pass published_name@host when registering for the service current
> code passes NULL, which lets mdns choose it's name.

OOI what does mdns choose if you pass NULL?

> + priv->svc_ref_table = NULL;
> + priv->svc_source_table = NULL;

Huh? These are set to NULL in _init and set to real values in _watch_svc_ref. If _watch_svc_ref is never called then g_hash_table_foreach_remove, g_hash_table_remove_all, and g_hash_table_destroy will be all called on NULL. Creating two hash tables in the discovery client isn't so costly; just create them in _init then you can remove the checks in _watch_svc_ref and _drop_svc_ref and not assert in said case.

> + g_hash_table_remove (priv->svc_source_table, channel);
> +
> + return TRUE;

You're calling g_hash_table_foreach_remove which takes a GHRFunc which is defined to  return "TRUE if the key/value pair should be removed from the GHashTable." However, just before returning TRUE you remove the key/value pair from the table anyway..?

In fact, why don't you just make this the value destroy function of the svc_source_table GHashTable? It would simplify this code a lot.

> + priv->svc_source_table = g_hash_table_new_full (g_direct_hash,
> +     g_direct_equal,NULL, NULL);

Missing space.

> + priv->discovery_client = g_value_get_object (value);
> + g_object_ref (priv->discovery_client);

g_value_dup_object

> + socket_address = g_inet_socket_address_new (addr, port);
> + g_object_ref (addr);

Why do you suddenly ref addr? The equivalent code in SalutAvahiContact which you've otherwise used as a base does this:

    socket_address = g_inet_socket_address_new (addr, port);
    g_object_unref (addr);

so I'm confused?

> +     g_array_append_val (addresses, s_address);
> +    }
> +}

Indentation.

> + addresses =
> +   g_array_sized_new (TRUE, TRUE, sizeof (salut_contact_address_t), 0);

It seems odd to use sized_new when you just use 0 and you /could/ find out the exact size if you wanted? Given finding the size is O(n) perhaps it's not worth it though; what do you think? If you used GQueue you could just use .length which would be even nicer.

Every time a GQueue is used I cheer!

> + resolvers_left = g_slist_length (priv->resolvers);
> +
> + g_free (ctx);
> +
> + if (resolvers_left == 0)
> +   salut_contact_lost (SALUT_CONTACT (self));

g_[s]list_length(list) == 0 if and only if list == NULL...

> + ctx = g_new0 (SalutBonjourResolveCtx, 1);

Slice allocating this stuff is preferred these days; use g_slice_new0 and g_slice_free.

> + priv->resolvers = g_slist_prepend (priv->resolvers, ctx);

I really hate this "prepend to a list because it's quicker than appending" thing. Yes, turn priv->resolvers into a GQueue.


> + if (error_type != kDNSServiceErr_NoError)
> +   {
> +     DEBUG ("Resolver failed witih : (%d)", error_type);
> +     return ;
> +   }

*with

There's an extra space after "return".

Should we not still thaw the salut contact here?

+ if (error_type !=kDNSServiceErr_NoError)

Missing space.

> + else if (!salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT
> +     (contact)))
> +   {
> +     g_object_ref (contact);
> +   }

but later on:

+ if (salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT (contact)))
+   {
+     /* We reffed this contact as it has services */
+     g_object_unref (contact);
+   }

So this reffing with services confuses me?
Comment 4 Olli Salli 2012-03-22 14:28:49 UTC
+static void DNSSD_API
+_bonjour_getaddr_cb (DNSServiceRef service_ref,
+                     DNSServiceFlags flags,
+                     uint32_t interfaceIndex,
+                     DNSServiceErrorType error_type,
+                     const char *host_name,
+                     const struct sockaddr *address,
+                     uint32_t ttl,
+                     void *context)
+{

...

+
+  ctx->address = address;
+
+  DEBUG ("Announce Contact Found");
+  salut_contact_found (contact);
+  salut_contact_thaw (contact);
+}
+

Assuming Bonjour doesn't have a bottomless pit of memory it uses to give you forever existing sockaddrs via const pointers, your ctx->address pointer will point to some totally random memory from this point on. Which is to say, always when it will be used.

To fix this, you'll need to actually store the address. Not just a pointer to where the address was stored when Bonjour gave it to you.

So allocate a copy of the address. You can do this e.g. with g_memdup. If address->sa_family is AF_INET, you'll have sizeof(struct sockaddr_in) bytes to allocate & copy, if it's AF_INET6, and you'll have sizeof(struct sockaddr_in6) bytes instead.

And remember to free your address copy when the context goes away!
Comment 5 Olli Salli 2012-03-22 14:31:20 UTC
Please, do test your code with something which makes outgoing link-local XMPP connections so that that address enumeration code is actually used.

tp-yts-glib test client-ping is a good option. Run client-pong in the other end.

Also passing-status together with nosey-status listening for it.
Comment 6 Siraj Razick 2012-03-22 17:29:44 UTC
Thanks for you the review
(In reply to comment #3)
> > We need to pass published_name@host when registering for the service current
> > code passes NULL, which lets mdns choose it's name.
> 
> OOI what does mdns choose if you pass NULL?

just the host name. 

reset i'll fix.
> 
> > + priv->svc_ref_table = NULL;
> > + priv->svc_source_table = NULL;
> 
> Huh? These are set to NULL in _init and set to real values in _watch_svc_ref.
> If _watch_svc_ref is never called then g_hash_table_foreach_remove,
> g_hash_table_remove_all, and g_hash_table_destroy will be all called on NULL.
> Creating two hash tables in the discovery client isn't so costly; just create
> them in _init then you can remove the checks in _watch_svc_ref and
> _drop_svc_ref and not assert in said case.
> 
> > + g_hash_table_remove (priv->svc_source_table, channel);
> > +
> > + return TRUE;
> 
> You're calling g_hash_table_foreach_remove which takes a GHRFunc which is
> defined to  return "TRUE if the key/value pair should be removed from the
> GHashTable." However, just before returning TRUE you remove the key/value pair
> from the table anyway..?
> 
> In fact, why don't you just make this the value destroy function of the
> svc_source_table GHashTable? It would simplify this code a lot.
> 
> > + priv->svc_source_table = g_hash_table_new_full (g_direct_hash,
> > +     g_direct_equal,NULL, NULL);
> 
> Missing space.
> 
> > + priv->discovery_client = g_value_get_object (value);
> > + g_object_ref (priv->discovery_client);
> 
> g_value_dup_object
> 
> > + socket_address = g_inet_socket_address_new (addr, port);
> > + g_object_ref (addr);
> 
> Why do you suddenly ref addr? The equivalent code in SalutAvahiContact which
> you've otherwise used as a base does this:
> 
>     socket_address = g_inet_socket_address_new (addr, port);
>     g_object_unref (addr);
> 
> so I'm confused?
> 
> > +     g_array_append_val (addresses, s_address);
> > +    }
> > +}
> 
> Indentation.
> 
> > + addresses =
> > +   g_array_sized_new (TRUE, TRUE, sizeof (salut_contact_address_t), 0);
> 
> It seems odd to use sized_new when you just use 0 and you /could/ find out the
> exact size if you wanted? Given finding the size is O(n) perhaps it's not worth
> it though; what do you think? If you used GQueue you could just use .length
> which would be even nicer.
> 
> Every time a GQueue is used I cheer!
> 
> > + resolvers_left = g_slist_length (priv->resolvers);
> > +
> > + g_free (ctx);
> > +
> > + if (resolvers_left == 0)
> > +   salut_contact_lost (SALUT_CONTACT (self));
> 
> g_[s]list_length(list) == 0 if and only if list == NULL...
> 
> > + ctx = g_new0 (SalutBonjourResolveCtx, 1);
> 
> Slice allocating this stuff is preferred these days; use g_slice_new0 and
> g_slice_free.
> 
> > + priv->resolvers = g_slist_prepend (priv->resolvers, ctx);
> 
> I really hate this "prepend to a list because it's quicker than appending"
> thing. Yes, turn priv->resolvers into a GQueue.
> 
> 
> > + if (error_type != kDNSServiceErr_NoError)
> > +   {
> > +     DEBUG ("Resolver failed witih : (%d)", error_type);
> > +     return ;
> > +   }
> 
> *with
> 
> There's an extra space after "return".
> 
> Should we not still thaw the salut contact here?
> 
> + if (error_type !=kDNSServiceErr_NoError)
> 
> Missing space.
> 
> > + else if (!salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT
> > +     (contact)))
> > +   {
> > +     g_object_ref (contact);
> > +   }
> 
> but later on:
> 
> + if (salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT (contact)))
> +   {
> +     /* We reffed this contact as it has services */
> +     g_object_unref (contact);
> +   }
> 
> So this reffing with services confuses me?
Comment 7 Siraj Razick 2012-03-22 17:40:22 UTC
(In reply to comment #6)
> Thanks for you the review
> (In reply to comment #3)
> > > We need to pass published_name@host when registering for the service current
> > > code passes NULL, which lets mdns choose it's name.
> > 
> > OOI what does mdns choose if you pass NULL?
> 
> just the host name. 
> 
> reset i'll fix.
> > 
> > > + priv->svc_ref_table = NULL;
> > > + priv->svc_source_table = NULL;
> > 
> > Huh? These are set to NULL in _init and set to real values in _watch_svc_ref.
> > If _watch_svc_ref is never called then g_hash_table_foreach_remove,
> > g_hash_table_remove_all, and g_hash_table_destroy will be all called on NULL.
> > Creating two hash tables in the discovery client isn't so costly; just create
> > them in _init then you can remove the checks in _watch_svc_ref and
> > _drop_svc_ref and not assert in said case.
> > 
> > > + g_hash_table_remove (priv->svc_source_table, channel);
> > > +
> > > + return TRUE;
> > 
> > You're calling g_hash_table_foreach_remove which takes a GHRFunc which is
> > defined to  return "TRUE if the key/value pair should be removed from the
> > GHashTable." However, just before returning TRUE you remove the key/value pair
> > from the table anyway..?
> > 
> > In fact, why don't you just make this the value destroy function of the
> > svc_source_table GHashTable? It would simplify this code a lot.
> > 
> > > + priv->svc_source_table = g_hash_table_new_full (g_direct_hash,
> > > +     g_direct_equal,NULL, NULL);
> > 
> > Missing space.
> > 
> > > + priv->discovery_client = g_value_get_object (value);
> > > + g_object_ref (priv->discovery_client);
> > 
> > g_value_dup_object
> > 
> > > + socket_address = g_inet_socket_address_new (addr, port);
> > > + g_object_ref (addr);
> > 
> > Why do you suddenly ref addr? The equivalent code in SalutAvahiContact which
> > you've otherwise used as a base does this:
> > 
> >     socket_address = g_inet_socket_address_new (addr, port);
> >     g_object_unref (addr);
> > 
> > so I'm confused?
> > 
> > > +     g_array_append_val (addresses, s_address);
> > > +    }
> > > +}
> > 
> > Indentation.
> > 
> > > + addresses =
> > > +   g_array_sized_new (TRUE, TRUE, sizeof (salut_contact_address_t), 0);
> > 
> > It seems odd to use sized_new when you just use 0 and you /could/ find out the
> > exact size if you wanted? Given finding the size is O(n) perhaps it's not worth
> > it though; what do you think? If you used GQueue you could just use .length
> > which would be even nicer.
> > 
> > Every time a GQueue is used I cheer!
> > 
> > > + resolvers_left = g_slist_length (priv->resolvers);
> > > +
> > > + g_free (ctx);
> > > +
> > > + if (resolvers_left == 0)
> > > +   salut_contact_lost (SALUT_CONTACT (self));
> > 
> > g_[s]list_length(list) == 0 if and only if list == NULL...
> > 
> > > + ctx = g_new0 (SalutBonjourResolveCtx, 1);
> > 
> > Slice allocating this stuff is preferred these days; use g_slice_new0 and
> > g_slice_free.
> > 
> > > + priv->resolvers = g_slist_prepend (priv->resolvers, ctx);
> > 
> > I really hate this "prepend to a list because it's quicker than appending"
> > thing. Yes, turn priv->resolvers into a GQueue.
> > 
> > 
> > > + if (error_type != kDNSServiceErr_NoError)
> > > +   {
> > > +     DEBUG ("Resolver failed witih : (%d)", error_type);
> > > +     return ;
> > > +   }
> > 
> > *with
> > 
> > There's an extra space after "return".
> > 
> > Should we not still thaw the salut contact here?
> > 
> > + if (error_type !=kDNSServiceErr_NoError)
> > 
> > Missing space.
> > 
> > > + else if (!salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT
> > > +     (contact)))
> > > +   {
> > > +     g_object_ref (contact);
> > > +   }
> > 
> > but later on:
> > 
> > + if (salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT (contact)))
> > +   {
> > +     /* We reffed this contact as it has services */
> > +     g_object_unref (contact);
> > +   }
> > 
> > So this reffing with services confuses me?

me too, I just followed whats in avahi-contact-manager so it follows a similar logic. how do you propose I change this ?
Comment 8 Siraj Razick 2012-03-23 00:35:05 UTC
updated
Comment 9 Olli Salli 2012-03-23 02:23:40 UTC
(In reply to comment #5)
> Please, do test your code with something which makes outgoing link-local XMPP
> connections so that that address enumeration code is actually used.
> 
> tp-yts-glib test client-ping is a good option. Run client-pong in the other
> end.
> 

Actually on closer inspection, client-ping always contacts the self handle (local user). So it doesn't do this.

> Also passing-status together with nosey-status listening for it.

But passing-status should cause remote connections to be made to those who show an interest for cats (such as anybody running nosey-status).
Comment 10 Siraj Razick 2012-03-24 20:05:22 UTC
(In reply to comment #9)
> (In reply to comment #5)
> > Please, do test your code with something which makes outgoing link-local XMPP
> > connections so that that address enumeration code is actually used.
> > 
> > tp-yts-glib test client-ping is a good option. Run client-pong in the other
> > end.
> > 
> 
> Actually on closer inspection, client-ping always contacts the self handle
> (local user). So it doesn't do this.
> 
> > Also passing-status together with nosey-status listening for it.
> 
> But passing-status should cause remote connections to be made to those who show
> an interest for cats (such as anybody running nosey-status).

patch updated after testing 

1.) client-ping/pong, 
2.) nosey-status, passing-status
3.) and file transfer. (needs a patch from ft bug)
Comment 11 Olli Salli 2012-03-25 07:47:41 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > > > + else if (!salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT
> > > > +     (contact)))
> > > > +   {
> > > > +     g_object_ref (contact);
> > > > +   }
> > > 
> > > but later on:
> > > 
> > > + if (salut_bonjour_contact_has_services (SALUT_BONJOUR_CONTACT (contact)))
> > > +   {
> > > +     /* We reffed this contact as it has services */
> > > +     g_object_unref (contact);
> > > +   }
> > > 
> > > So this reffing with services confuses me?
> 
> me too, I just followed whats in avahi-contact-manager so it follows a similar
> logic. how do you propose I change this ?

I investigated this. It's indeed the same logic as in the Avahi backend. (Arguably a better abstraction would have this multiple service management in teh base class).

The contact manager holds *one* reference to all contacts which have at least one service. So on the first service add for a contact, it adds a ref. (Contact objects can exist without having been discovered from the network e.g. because they've been added to a channel's members, see e.g. salut_muc_channel_add_members).

The latter code excerpt with the g_object_unref is in the dispose_contact virtual method. That is only called by close_all, when disconnecting. Then we won't receive any service remove events so to not leave the contact dangling, if we previously added a ref due to a service add, we remove it. So that much actually makes sense when one ponders it hard with their friend grep(1).

But there is a problem in the Bonjour code in this area: 

+      DEBUG ("Contact Removed : %s", name);
+      contact = g_hash_table_lookup (mgr->contacts, name);
+
+      if (contact != NULL)
+        {
+          salut_bonjour_contact_remove_service (SALUT_BONJOUR_CONTACT (contact),
+              interfaceIndex, name, regtype, domain);
+          g_object_unref (contact);
+        }

When a single service is removed, it unconditionally removes the manager's ref to the contact. Even if it wasn't the last service on that contact. If you have multiple services, by the time the second of them goes away, it'll be dropping references it never had.

This should be like the corresponding Avahi code instead:

  if (contact != NULL)
    {
      salut_avahi_contact_remove_service (SALUT_AVAHI_CONTACT (contact),
          interface, protocol, name, type, domain);
      if (!salut_avahi_contact_has_services (SALUT_AVAHI_CONTACT (contact)))
         g_object_unref (contact);

    }

The alternative would be to ref each time we discover a service for a contact, but then the manager would have to ask the contact via some API how many times it has to unref it in dispose_contact() - or record this in some data structure of its own. Both messy. So just make the service remove code path in service_browse_cb() consider whether it was the last service, and if so, drop our current *single* ref to the contact.

So much about that. Review for rest of the current code:

Aside from that multiple unref trap, SalutBonjourContactManager looks OK to me.

Service ref management
======================

You're storing pointers to DNSServiceRefs in your mappings, and later calling stuff like DNSServiceRefDeallocate by dereferencing the pointer. The DNSServiceRefs are stored in your other modules' private structures, like in BonjourSelf and the BonjourContact resolvers. This means that if watched service refs were left over, you'll end up dereferencing dangling pointers, as the discovery client is destroyed later than some of the modules (they all hold a ref to the disco client, so at most one of them can be destroyed later than it).

DNSServiceRef by itself is a pointer type. So we can copy and store it safely (if we couldn't, we couldn't dereference a pointer to one to pass it to dns-sd API functions). Just change watch_svc_ref and drop_svc_ref to take a DNSServiceRef instead of a pointer to one, and store DNSServiceRefs, not pointers to them, in the tables. Then this ServiceRef management stuff doesn't have to read the actual ref pointers from private structs of your other modules, and all this becomes much less prone to obscure errors of the like where you've left a watch over and are now calling dns-sd functions on something completely different through a dangling pointer.

In drop_svc_ref:

+  if (!priv->svc_ref_table)
+    return;
+
+  if (!priv->svc_source_table)
+    return;
+

With the changes you did according to Jonny's review, these never are NULL so this is totally redundant.

BonjourContact
==============

+  const char *txt_record = ctx->txt_record;
+  tmp = (char *) TXTRecordGetValuePtr (txt_length, txt_record,

This is OK. You pass the pointer to the TXT record bytes to TXTRecordGetValuePtr, as you should be doing.

+  tmp = (char *) TXTRecordGetValuePtr (txt_length, &txt_record,

This OTOH is completely wrong. You're passing a pointer to a pointer (and random stuff following it) in your stack, to be interpreted as TXT record bytes. The TXT record should be interpreted exactly in the same fashion for ALL attributes as for the caps ones.

+gboolean
+salut_bonjour_contact_add_service (SalutBonjourContact *self,
+                                   uint32_t interface,
+                                   const char *name,
+                                   const char *type,
+                                   const char *domain)
+{

...

+  ctx->name = name;
+  ctx->type = type;
+  ctx->domain = domain;

You're storing pointers to strings Bonjour passed to you, in the long lived context structure. They will point to something totally different in a minute, so the resolver comparisons will never find the resolvers correctly. g_strdup them and g_free them when the context is destroyed.

(This might be why you omitted the if () for there still being services for unrefing a contact in the contact manager - because there always happened to be services, as the resolver to be removed couldn't be found?)

This code has these other critical problems with the cleanup actually:

1) If the service is resolved again while it has an ongoing GetAddrInfo (that's a slow network operation so it's very much possible), you'll leak the txt record and address_ref (because the pointers to them are overwritten with new stuff)

Fix: when you free the txt record or drop address ref, set the pointers to them to NULL. Verify both are initialized to that as well. Then you can determine that a GetAddrInfo is currently running in resolve_cb, by them being non-NULL. In that case, cancel the old operation by dropping the svc ref watch and freeing the txt record contents, and only then start another GetAddrInfo which'll lead to parsing the txt record.

2) If remove_service() is called while the corresponding resolver has a GetAddrInfo running, it will continue running... and eventually crash when it gets a result, because the ctx has already been freed.

Fix: Again determine from address_ref and txt_record being non-NULL if a GetAddrInfo is running, and in that case cancel it as well while destroying the resolver.

Now that the resolver context struct gains more to clean up, please add a salut_bonjour_resolve_ctx_free which you use everywhere to deallocate the ctx's, instead of trying to remember to free all heap members separately in those places.

Please test the same cases again after these changes to verify you've implemented them correctly.

ps. Now that you're just doing smaller fixes and not full revamps of the code, please don't amend and amend your patches again - I want to see the fixes, and if they exist, from new commits, not re-review the full code repeatedly.
Comment 12 Olli Salli 2012-03-25 07:51:34 UTC
+             DEBUG ("%s", g_inet_address_to_string (addr));
+  DEBUG ("%s", txt_record);

These are a bit over verbose to have after initial development.
Comment 13 Siraj Razick 2012-03-25 13:05:46 UTC
patch updated , and tested similar to yesterday (ping/pong/nosey/passing-status/ and ft) both ways.
Comment 14 Olli Salli 2012-03-26 01:51:50 UTC
> bonjour-contact-manager: Unref contact only if it has no services

Yes good but...

> +          if (salut_bonjour_contact_has_services
> +              (SALUT_BONJOUR_CONTACT (contact)))
> +            {
> +              g_object_unref (contact);
> +            }

This is the exact opposite. Can you please test your changes in some way? (Or look at the diffs yourself to see if they make sense). In this case, you should see your contacts disposed when they disappear from the network.

Service ref management looks good now.

Contact:

       DEBUG ("Address resolving failed with : (%d)", _error_type);
+      _salut_bonjour_resolve_ctx_free (self, ctx);
       return;

This is wrong - your resolver still exists although this one GetAddrInfo would've somehow failed. Specifically the resolve_cb will be called on a freed ctx from this on. So don't _free here.

       DEBUG ("Resolver failed with : (%d)", error_type);
-      g_free (ctx->txt_record);
-      ctx->txt_length = 0;
-      salut_bonjour_discovery_client_drop_svc_ref (priv->discovery_client,
-          ctx->address_ref);
+      _salut_bonjour_resolve_ctx_free (self, ctx);
       return;

Also this. You'll want to free this particular TXT record and drop the svc ref (setting to NULL so you don't think they exist later), but you don't want to kill the resolver struct.

You still just
>      g_slice_free (SalutBonjourResolveCtx, ctx);
in _dispose - should use ctx_free so you don't leak the dup'd members and still have any running address refs active etc.

You *should keep* the
>       salut_bonjour_discovery_client_drop_svc_ref (priv->discovery_client,
>          ctx->resolve_ref);
here separately though *as it is*, as your ctx_free doesn't drop the resolve_ref.

Otherwise all good. Let's fix these, double-check it works, and merge.
Comment 15 Olli Salli 2012-03-26 05:52:50 UTC
Oh, and make check. Please. Fix any coding style issues it reports to you.
Comment 16 Siraj Razick 2012-03-26 07:59:20 UTC
(In reply to comment #15)
> Oh, and make check. Please. Fix any coding style issues it reports to you.

done, and patch updatead
Comment 17 Olli Salli 2012-03-26 08:39:11 UTC
Merged after some further fixes, will be in 0.7.2.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.