Bug 85307

Summary: escape_as_identifier issues
Product: Telepathy Reporter: Andy Grover <agrover>
Component: tp-glib Assignee: Telepathy bugs list <telepathy-bugs>
Status: RESOLVED NOTABUG QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:

Description Andy Grover 2014-10-21 22:33:36 UTC
The docstring for tp_escape_as_identifier() starts with: "Escape the given string to be a valid D-Bus object path or service name component, using a reversible encoding to ensure uniqueness."

1) Docstring should clarify that this is for "object path components" and "service name components". It could be read as "object paths" (i.e. an entire path) and "service name components".

2) Recommend replacing references to "service name" with "well-known bus name", since this is what the dbus spec calls them.

3) Recommend separate functions for escaping the two kinds of string. They have different allowable characters: a bus name component additionally allows '-', and an object path component can start with [0-9], whereas a bus name component cannot. Using one function for both could lead to characters being escaped when they needn't be.
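
To make the difference in point 3 concrete, here is a minimal sketch of the two element grammars. The regular expressions are my reading of the D-Bus specification, not something taken from tp-glib, and the function names are hypothetical.

```python
import re

# Assumption: element grammars as I read them from the D-Bus specification.
# Object path element: one or more of [A-Za-z0-9_]; a leading digit is allowed.
OBJECT_PATH_ELEMENT = re.compile(r"[A-Za-z0-9_]+")
# Well-known bus name element: [A-Za-z0-9_-], but must not begin with a digit.
BUS_NAME_ELEMENT = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")


def is_object_path_element(s: str) -> bool:
    """Hypothetical helper: True if s is a valid object path element."""
    return OBJECT_PATH_ELEMENT.fullmatch(s) is not None


def is_bus_name_element(s: str) -> bool:
    """Hypothetical helper: True if s is a valid bus name element."""
    return BUS_NAME_ELEMENT.fullmatch(s) is not None
```

Under these assumptions "local-xmpp" is acceptable only as a bus name element, "0abc" only as an object path element, and "local_xmpp" as both.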
Comment 1 Simon McVittie 2014-10-22 12:17:13 UTC
(In reply to Andy Grover from comment #0)
> 1) Docstring should clarify that this is for "object path components" and
> "service name components". It could be read as "object paths" (i.e. an
> entire path) and "service name components".

Something that is a valid service name component cannot possibly be a valid object path, because object paths must start with "/", and service name components must not contain "/".

> 3) Recommend specific functions escaping the two different things. The two
> string types have different allowable characters: bus name additionally
> allows '-', and object path component can start with [0-9], whereas bus name
> cannot. This could lead to characters being escaped when they needn't be.

A few places in Telepathy want to use strings that correspond 1:1 between an object path and a bus name (e.g. Connection and Client both do this). Using different escaping algorithms for the two would be troublesome for this.

I don't think it's worth introducing additional functions, and potentially confusing API users into using the wrong one, for a minor gain in the number of characters that can remain unescaped.

tp_escape_as_identifier() does what its name says: it outputs a valid C identifier, which corresponds to various other languages' idea of what an identifier is, and is also a strict subset of what is allowed in D-Bus object path components, bus name components, interface name components, member (signal/method) names and so on. We occasionally use it for mechanical generation of parts of C and Python function names, too. It is potentially "too escaped" (i.e. non-optimal for certain situations), but it is never "not escaped enough".
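
For illustration, an encoding with the properties described above could look like the following Python sketch. The specific scheme (empty string becomes "_", every byte outside [A-Za-z0-9] and any leading digit becomes "_" plus two lower-case hex digits) is my recollection of tp-glib's documented behaviour and is not stated in this bug, so treat it as an assumption and check it against the tp-glib reference. The function name is hypothetical.

```python
def escape_as_identifier_sketch(name: str) -> str:
    """Sketch of an identifier-safe escaping; not the tp-glib implementation."""
    data = name.encode("utf-8")
    if not data:
        # Assumption: the empty string maps to a lone underscore.
        return "_"
    out = []
    for i, byte in enumerate(data):
        ch = chr(byte)
        is_alpha = ("a" <= ch <= "z") or ("A" <= ch <= "Z")
        is_digit = "0" <= ch <= "9"
        # Letters pass through; digits pass through except in first position,
        # because an identifier must not start with a digit.
        if is_alpha or (is_digit and i > 0):
            out.append(ch)
        else:
            # Everything else, including "_" itself, is hex-escaped so that
            # distinct inputs always give distinct outputs.
            out.append("_%02x" % byte)
    return "".join(out)
```

The output only ever contains [A-Za-z0-9_] and never starts with a digit, which is why it is "too escaped" for some uses (for example "-" in a bus name) but never "not escaped enough".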

Given any constrained input with particular characteristics / character frequency / whatever, and any reasonable set of output restrictions (e.g. object path component), I expect it to be possible to construct an algorithm better than tp_escape_as_identifier(). For instance, we "escape" Telepathy 0.x protocol names, which look like "local-xmpp", by replacing "-" with "_"[1] and knowing that protocol names are sufficiently constrained that that's reversible and the output will be a valid identifier.

That's not tp_escape_as_identifier()'s purpose; its purpose is to be "efficient enough", particularly for the common case where the input is mostly alphanumeric, while also being fully general so it can't break (except by excessive length).
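
By contrast, the special-purpose protocol-name mapping mentioned above can be sketched in a few lines. The sketch assumes that Telepathy 0.x protocol names never contain "_", which is the kind of constraint that makes the substitution reversible; that exact constraint and both function names are assumptions of mine, not taken from this bug.

```python
def protocol_to_identifier(protocol: str) -> str:
    """Hypothetical helper: map a 0.x protocol name such as "local-xmpp"
    to an identifier-safe form by replacing "-" with "_"."""
    # Assumption: protocol names never contain "_". If one did, the mapping
    # would no longer be reversible, so reject it instead of guessing.
    if "_" in protocol:
        raise ValueError("protocol name must not contain '_'")
    return protocol.replace("-", "_")


def identifier_to_protocol(identifier: str) -> str:
    """Hypothetical inverse, valid only under the same assumption."""
    return identifier.replace("_", "-")
```

This is shorter and produces more readable output than a general escaping, but it only works because the input is constrained; it would not be safe for arbitrary strings such as account names.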

Telepathy deliberately doesn't currently have an inverse of tp_escape_as_identifier(), because we use it in places where uniqueness and debuggability are the only desired characteristics: a human reader reading logs and knowing how tp_escape_as_identifier() works can decode the username from a Connection's object path, but applications are never meant to do so.

systemd does not have the same philosophy for its analogous functionality, which *is* reversible (and puts fewer constraints on the output - it's only intended for object paths and filenames). That's fine, it's their function, not ours, and they can document it however they want.

[1] Telepathy 1.0 changes the definition of protocol names so they look like "local_xmpp", so that we can use them as-is.
Comment 2 Andy Grover 2014-10-22 16:51:09 UTC
Ok, thanks for the thorough explanation! Closing.
