Bug 33404 - Move caps hash stuff to Wocky
Summary: Move caps hash stuff to Wocky
Status: RESOLVED FIXED
Alias: None
Product: Telepathy
Classification: Unclassified
Component: gabble (show other bugs)
Version: git master
Hardware: Other All
: medium normal
Assignee: Telepathy bugs list
QA Contact: Telepathy bugs list
URL: http://cgit.freedesktop.org/~jonny/wo...
Whiteboard: review+
Keywords: patch
Depends on:
Blocks:
 
Reported: 2011-01-24 05:02 UTC by Jonny Lamb
Modified: 2011-05-02 01:43 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Jonny Lamb 2011-01-24 05:02:21 UTC
Well how about that?

So far we have:

12:57 < wjt> jonnylamb: the data form parser should use WockyDataForm
12:57 < wjt> s/use/be replaced by using/
Comment 1 Will Thompson 2011-01-24 05:04:51 UTC
I think WockyDiscoIdentity should probably be a boxed type.
Comment 2 Jonny Lamb 2011-01-24 05:08:29 UTC
http://git.collabora.co.uk/?p=user/jonny/telepathy-gabble.git;a=shortlog;h=refs/heads/caps-hash

This gabble branch is also interesting.
Comment 3 Will Thompson 2011-01-24 05:30:42 UTC
It's actually not entirely clear to me that there's any value in WockyDiscoIdentity having accessors for all its fields, rather than the caller just using identity->type and friends.


 197   if (!source)
 198     return NULL;

g_return_val_if_fail (source != NULL, NULL);

 226   if (!arr)
 227     return;

if (arr == NULL)
  return;


- * caps-hash.c - Computing verification string hash (XEP-0115 v1.5)
- * Copyright (C) 2008 Collabora Ltd.
+ * wocky-caps-hash.c - Computing verification string hash (XEP-0115 v1.5)
+ * Copyright (C) 2008-2010 Collabora Ltd.

It's 2011.

On top of _compute_from_node(), shouldn't there be a function that computes a hash from a list of features, identities, and data forms? Otherwise Gabble/Salut will need to build up a WockyNode whenever they need to hash their caps… oh! they have to do this anyway, I suppose maybe?
Comment 4 Will Thompson 2011-01-24 06:03:25 UTC
+#ifndef __WOCKY_PLUGINS_DISCO_IDENTITY_H__

o rly

Technically we should not be using __-prefixed guards. It should really be WOCKY_PLUGINS_DISCO_IDENTITY_H. But this is pedantry.


 static gint
 identity_cmp (gconstpointer a, gconstpointer b)

I think the guts of this function should be wocky_disco_identity_cmp (WockyDiscoIdentity *a, WockyDiscoIdentity *b). But obviously you'll still need the wrapper for GPtrArray silliness.


  * Computes verification string hash according to XEP-0115 v1.5

Hashes plural.

+ *
+ * Wocky does not do anything with dataforms (XEP-0128) included in
+ * capabilities.  However, it needs to parse them in order to compute the hash
+ * according to XEP-0115.

I don't think this is a particularly useful comment, honestly. The “does not do anything with” bit is liable to become wrong. I imagine we'll ultimately have something like:

  WockyCapabilities *wocky_capabilities_from_node (WockyNode *);
  /* accessors for identities, features, and forms */
  gchar *wocky_capabilities_calculate_hash (WockyCapabilities *, GChecksumType);
Comment 5 Will Thompson 2011-01-24 06:06:45 UTC
You updated the hashing test in Gabble. That test should be in Wocky if the hashing has moved there.

I guess the Gabble branch isn't finished if it still has all the hash computation stuff. :)
Comment 6 Jonny Lamb 2011-01-26 02:33:35 UTC
(In reply to comment #1)
> I think WockyDiscoIdentity should probably be a boxed type.

Done.

(In reply to comment #5)
> You updated the hashing test in Gabble. That test should be in Wocky if the
> hashing has moved there.

Done.

(In reply to comment #0)
> 12:57 < wjt> jonnylamb: the data form parser should use WockyDataForm
> 12:57 < wjt> s/use/be replaced by using/

Done.

(In reply to comment #3)
> It's actually not entirely clear to me that there's any value in
> WockyDiscoIdentity having accessors for all its fields, rather than the caller
> just using identity->type and friends.

Okay, sure, I just copied them from gabble. They're gone.

> g_return_val_if_fail (source != NULL, NULL);

Done.

> if (arr == NULL)
>   return;

Done.

> It's 2011.

You're right!

> On top of _compute_from_node(), shouldn't there be a function that computes a
> hash from a list of features, identities, and data forms? Otherwise
> Gabble/Salut will need to build up a WockyNode whenever they need to hash their
> caps… oh! they have to do this anyway, I suppose maybe?

Done.

(In reply to comment #4)
> +#ifndef __WOCKY_PLUGINS_DISCO_IDENTITY_H__

Fixed.

> I think the guts of this function should be wocky_disco_identity_cmp
> (WockyDiscoIdentity *a, WockyDiscoIdentity *b). But obviously you'll still need
> the wrapper for GPtrArray silliness.

Done.

>   * Computes verification string hash according to XEP-0115 v1.5
> 
> Hashes plural.

Fixed.

> I don't think this is a particularly useful comment, honestly. The “does not do
> anything with” bit is liable to become wrong. I imagine we'll ultimately have
> something like:

You're the boss, removed.

Okay that's that. You're free to take another swing.
Comment 8 Will Thompson 2011-02-25 06:16:11 UTC
“node: add attribute build tag for wocky_stanza_build”: I think you meant to say “xml:lang attribute build tag” but I'll let it slide.


+      GSList *l;
+
+      for (l = node->children; l != NULL; l = l->next)
+        {
+          WockyNode *child = l->data;
+
+          if (!wocky_strdiff (child->name, "value"))
+            {
+              if (type == WOCKY_DATA_FORM_FIELD_TYPE_TEXT_SINGLE)
+                {
+                  type = WOCKY_DATA_FORM_FIELD_TYPE_TEXT_MULTI;
+                  break;
+                }
+              else
+                {
+                  type = WOCKY_DATA_FORM_FIELD_TYPE_TEXT_SINGLE;
+                }
+            }
+        }

type = WOCKY_DATA_FORM_FIELD_TYPE_TEXT_SINGLE;

WockyNodeIter iter;
wocky_node_iter_init (&iter, node, "value");

if (wocky_node_iter_next (&iter, NULL) &&
    wocky_node_iter_next (&iter, NULL))
  type = WOCKY_DATA_FORM_FIELD_TYPE_TEXT_MULTI;

? The patch as it stands would leave type = 0 in some cases?

+      field = g_hash_table_lookup (dataform->fields, "FORM_TYPE");
+      g_assert (field != NULL);

I don't see any reason why this assertion is valid, given that this is data received over the network.

-      g_ptr_array_sort (form->fields, fields_cmp);
-          g_ptr_array_sort (field->values, char_cmp);

I don't see anything sorting the fields or their values in the new code. The XEP says:

    1. Sort the fields by the value of the "var" attribute.

Ah you fixed this in 
And:

    1. Append the value of the "var" attribute, followed by the '<'
       character.
    2. Sort values by the XML character data of the <value/> element.
    3. For each <value/> element, append the XML character data,
       followed by the '<' character.

Please add a test case which hashes two forms which differ only in the order of their values.

Also how does this deal with forms with fields of types other than text-single or text-multi?
 
-  if (!source)
-    return NULL;
+  g_return_val_if_fail (source != NULL, NULL);


-  if (!arr)
+  if (arr == NULL)

Why does copy critical, but free merely early-return?

+  dataform = wocky_data_form_new_from_form (node, &error);
+  if (error != NULL)
+    {
+      DEBUG ("Failed to parse data form: %s\n", error->message);
+      g_clear_error (&error);
     }

We should fail the entire hashing process if the form is malformed, or we'll get confused. *Particularly* if it's our own form that's malformed.


+      return strcmp (g_value_get_string (left_type->default_value),
+          g_value_get_string (left_type->default_value));

One of those should be 'right_type'. Also this will crash if someone specifies >1 value for FORM_TYPE. I suppose parsing the form should fail in that case. It'd be nice to have wocky_data_form_get_form_type ().

Fixed in a later commit, I see.
Comment 9 Jonny Lamb 2011-03-03 05:18:19 UTC
(In reply to comment #8)
> ? The patch as it stands would leave type = 0 in some cases?

I'd changed it to do the same thing as you did later but not as nicely, I'm now using the node iter.

> I don't see any reason why this assertion is valid, given that this is data
> received over the network.

Fair, fixed.

>     1. Append the value of the "var" attribute, followed by the '<'
>        character.
>     2. Sort values by the XML character data of the <value/> element.
>     3. For each <value/> element, append the XML character data,
>        followed by the '<' character.
> 
> Please add a test case which hashes two forms which differ only in the order of
> their values.

There is a test for testing sorting dataforms and not only are the forms are in a different order but the values are jumbled up too. I don't see how a test that just jumbles up value ordering would give any more?

> Also how does this deal with forms with fields of types other than text-single
> or text-multi?

Well, it won't include the field in the hash to SHA. I've included a debug message now.

> Why does copy critical, but free merely early-return?

Well it's like g_free()? I guess not as g_free() early returns too. Okay, I'll do the same here.

> We should fail the entire hashing process if the form is malformed, or we'll
> get confused. *Particularly* if it's our own form that's malformed.

Fixed.

Check out my branch, homey.
Comment 10 Will Thompson 2011-03-10 09:13:14 UTC
(In reply to comment #9)
> There is a test for testing sorting dataforms and not only are the forms are in
> a different order but the values are jumbled up too. I don't see how a test
> that just jumbles up value ordering would give any more?

Okay, reasonable.

> (In reply to comment #8)
> > Also how does this deal with forms with fields of types other than text-single
> > or text-multi?
> 
> Well, it won't include the field in the hash to SHA. I've included a debug
> message now.

I don't think this is right, I'm afraid. The XEP says:

  • For each field other than FORM_TYPE:
      ‣ Append the value of the "var" attribute, followed by the '<'
        character.
      ‣ Sort values by the XML character data of the <value/> element.
      ‣ For each <value/> element, append the XML character data,
        followed by the '<' character.

If WockyDataForm's API doesn't make this possible without contortions, we should make it possible, rather than risking incorrectly calculating the hash (despite having had all the necessary information at some point).

       field = g_hash_table_lookup (dataform->fields, "FORM_TYPE");
-      g_assert (field != NULL);
+
+      if (field == NULL)
+        continue;

I think this is wrong, too. We should fail to calculate the hash in this
case. Arguably parsing the data form should fail if there's no FORM_TYPE
field, but XEP-0004 doesn't actually mandate that that field exist,
afaict.

I think there should be a test for the cases where a form has no
FORM_TYPE, and where the disco reply contains two forms with the same
FORM_TYPE; in both cases, calculating the hash should fail.

We should not cache the supposed hash negatively, though: otherwise,
malicious contacts could trick us into negatively caching a hash that
some other client is using correctly and legitimately. I think it's
reasonable for Gabble to still use the capabilities, though.

From jdev@:
> Zash: variations of empty fields isn't specified well iirc
> Zash: eg <field><value/></field> vs <field/> vs no field at all

Might be worth testing the first two variants do something sensible.

+  WockyStanza *  stanza = wocky_stanza_build (WOCKY_STANZA_TYPE_IQ,

What's going on ^^ here?
Comment 11 Jonny Lamb 2011-03-11 07:27:12 UTC
(In reply to comment #10)
> I don't think this is right, I'm afraid. The XEP says:
> 
>   • For each field other than FORM_TYPE:
>       ‣ Append the value of the "var" attribute, followed by the '<'
>         character.
>       ‣ Sort values by the XML character data of the <value/> element.
>       ‣ For each <value/> element, append the XML character data,
>         followed by the '<' character.
> 
> If WockyDataForm's API doesn't make this possible without contortions, we
> should make it possible, rather than risking incorrectly calculating the hash
> (despite having had all the necessary information at some point).

Okay, so I thought it would be difficult to start supporting all the other field types, but it turned out to be fine. Fixed and added a test for booleans.

One problem I see might be an issue here is that the data form parser treats both "true" and "1" as value booleans for TRUE (and the opposite for FALSE), but the caps hashing code only gets a GValue back and uses 1 and 0 for TRUE and FALSE respectively. We don't get the actual stanza from the data form field so can't check the actual value.

Also, it switches on the WockyDataFormFieldType value. I'm wondering if a good default value is to act as if it's text-single? What do you think?

> I think this is wrong, too. We should fail to calculate the hash in this
> case. Arguably parsing the data form should fail if there's no FORM_TYPE
> field, but XEP-0004 doesn't actually mandate that that field exist,
> afaict.
> 
> I think there should be a test for the cases where a form has no
> FORM_TYPE,

Done.

> ...and where the disco reply contains two forms with the same
> FORM_TYPE; in both cases, calculating the hash should fail.

Done.

> We should not cache the supposed hash negatively, though: otherwise,
> malicious contacts could trick us into negatively caching a hash that
> some other client is using correctly and legitimately. I think it's
> reasonable for Gabble to still use the capabilities, though.

When gabble gets a reply from a disco request that doesn't match the hash we originally received (including us not being able to hash it ourselves) it sets the trust of the hash back to zero but doesn't appear to do much more than that?

> From jdev@:
> > Zash: variations of empty fields isn't specified well iirc
> > Zash: eg <field><value/></field> vs <field/> vs no field at all
> 
> Might be worth testing the first two variants do something sensible.

Okay I added tests for <field var='lol'><value/></field>, <field var='lol'/> and <field/>. The first two fail because of the presence of the var attribute, but the last one fails because the data form parser just ignores fields with no var attribute.

> +  WockyStanza *  stanza = wocky_stanza_build (WOCKY_STANZA_TYPE_IQ,
> 
> What's going on ^^ here?

Fixed.

Check out my updated branch.
Comment 12 Will Thompson 2011-04-01 09:16:26 UTC
I was sure I'd replied to this. But apparently my reply got lost.

(In reply to comment #11)
> (In reply to comment #10)
> > I don't think this is right, I'm afraid. The XEP says:
> > 
> >   • For each field other than FORM_TYPE:
> >       ‣ Append the value of the "var" attribute, followed by the '<'
> >         character.
> >       ‣ Sort values by the XML character data of the <value/> element.
> >       ‣ For each <value/> element, append the XML character data,
> >         followed by the '<' character.
> > 
> > If WockyDataForm's API doesn't make this possible without contortions, we
> > should make it possible, rather than risking incorrectly calculating the hash
> > (despite having had all the necessary information at some point).
> 
> Okay, so I thought it would be difficult to start supporting all the other
> field types, but it turned out to be fine. Fixed and added a test for booleans.
> 
> One problem I see might be an issue here is that the data form parser treats
> both "true" and "1" as value booleans for TRUE (and the opposite for FALSE),
> but the caps hashing code only gets a GValue back and uses 1 and 0 for TRUE and
> FALSE respectively. We don't get the actual stanza from the data form field so
> can't check the actual value.

Then WockyDataForm should be fixed to give you the actual values.
Comment 13 Will Thompson 2011-04-01 09:26:31 UTC
http://cgit.freedesktop.org/~jonny/wocky/commit/?h=caps-hash&id=9d50b575a41fc5e3648023fbae372c06168ce983

I wouldn't do this like this. I'd make the loop that iterates the forms keep a set of FORM_TYPEs that it has already seen. Given that that loop already has code to bail out if the FORM_TYPE is missing, I don't see why this related validation happens ahead of time.
Comment 14 Jonny Lamb 2011-04-04 05:29:03 UTC
(In reply to comment #12)
> Then WockyDataForm should be fixed to give you the actual values.

I've done something which fixes this, although perhaps not the best way ever? Thoughts?

(In reply to comment #13)
> I wouldn't do this like this. I'd make the loop that iterates the forms keep a
> set of FORM_TYPEs that it has already seen. Given that that loop already has
> code to bail out if the FORM_TYPE is missing, I don't see why this related
> validation happens ahead of time.

Okay, it now does the checking during.

Branch updated.
Comment 15 Will Thompson 2011-04-11 08:20:22 UTC
wjt: don't only save the original strings for booleans
wjt: *always* save them all
wjt: and then you can make all this code much easier

wjt: the top patch [caps-hash: check whether there are multiple forms with the same name during, not before] is fine.
Comment 16 Jonny Lamb 2011-04-11 09:12:03 UTC
(In reply to comment #15)
> wjt: don't only save the original strings for booleans
> wjt: *always* save them all
> wjt: and then you can make all this code much easier

le done.
Comment 17 Jonny Lamb 2011-04-12 00:31:40 UTC
Also, I'm not so happy about "default_value_str". I'm open to suggestions.
Comment 18 Will Thompson 2011-04-14 10:46:08 UTC
I meant, keep a list of the strings inside each <value> for *every* field type, not just booleans. The algorithm specified in XEP-0115 does not say anything at all about field types, so nor should our implementation of it.
Comment 19 Will Thompson 2011-04-14 10:53:41 UTC
Argh. I somehow missed the last patch that actually did exactly this. Sorry for jumping the gun.

(In reply to comment #17)
> Also, I'm not so happy about "default_value_str". I'm open to suggestions.

'raw_value_contents', maybe?

I think it would be better if the filling in of default_value_str when parsing the form happened before the type stuff: always call extract_value_list, stash it in default_value_str, and pass that into get_field_value() maybe? This would mean that we'd not risk accidentally ignoring some spurious extra <value> elements when calculating a hash.

But functionally it would only change the behaviour for malformed forms; so maybe I should let it slide.
Comment 20 Will Thompson 2011-04-14 10:55:20 UTC
The Salut branch looks good (obvs. predicated on the Wocky branch).
Comment 21 Jonny Lamb 2011-04-19 04:11:46 UTC
(In reply to comment #19)
> Argh. I somehow missed the last patch that actually did exactly this. Sorry for
> jumping the gun.

Words cannot express my anger.

> 'raw_value_contents', maybe?

OK, changed.

> But functionally it would only change the behaviour for malformed forms; so
> maybe I should let it slide.

Good point.
Comment 22 Will Thompson 2011-04-21 10:25:16 UTC
cork them up in a bottle and drop them in the ocean
Comment 23 Jonny Lamb 2011-04-22 01:17:11 UTC
Cool, thanks. Merged wocky and salut branches. Did you look at the gabble branch?
Comment 24 Will Thompson 2011-04-28 09:27:50 UTC
(In reply to comment #23)
> Cool, thanks. Merged wocky and salut branches. Did you look at the gabble
> branch?

I can't help but think that, as long as all its test cases are in Wocky, Gabble's test-caps-hash.c could be expunged from the record.

But that branch looks fine for Gabble master. (Not the stable branch.)
Comment 25 Jonny Lamb 2011-05-02 01:43:19 UTC
(In reply to comment #24)
> I can't help but think that, as long as all its test cases are in Wocky,
> Gabble's test-caps-hash.c could be expunged from the record.

Done.

> But that branch looks fine for Gabble master. (Not the stable branch.)

Coooool.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.