Bug 40675

Summary: Create an unparsable file
Product: Telepathy Reporter: Guillaume Desmottes <guillaume.desmottes>
Component: loggerAssignee: Telepathy bugs list <telepathy-bugs>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: high CC: bigon, nicolas
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: log file
proposed patch

Description Guillaume Desmottes 2011-09-07 01:22:17 UTC
I was trying to debug https://bugzilla.gnome.org/show_bug.cgi?id=653010 and so sent big chunk of data over IM.  The logger ended up writing a unparsable log file:

(empathy-chat:11545): tp-logger-WARNING **: Failed to parse file:'/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log'
Comment 1 Guillaume Desmottes 2011-09-07 01:23:22 UTC
Created attachment 50950 [details]
log file
Comment 2 Nicolas Dufresne 2011-09-07 07:13:36 UTC
Note this may be a bug in GLib g_markup_* methods, one would need to improve the error reports first.
Comment 3 Guillaume Desmottes 2011-09-07 09:09:46 UTC
Actually there are more errors:


/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:83: parser error : xmlParseCharRef: invalid xmlChar value 27
&#x1b;[1m&#x1b;[37mtp_group_mixin_change_flags: emitting group flags changed
      ^
/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:83: parser error : xmlParseCharRef: invalid xmlChar value 27
&#x1b;[1m&#x1b;[37mtp_group_mixin_change_flags: emitting group flags changed
               ^
/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:84: parser error : xmlParseCharRef: invalid xmlChar value 27
  added    : [&#x1b;[22mPROPERTIES&#x1b;[1m]
                    ^
/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:84: parser error : xmlParseCharRef: invalid xmlChar value 27
  added    : [&#x1b;[22mPROPERTIES&#x1b;[1m]
                                        ^
/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:85: parser error : xmlParseCharRef: invalid xmlChar value 27
  removed  : [&#x1b;[22m&#x1b;[1m]



etc etc...
Comment 4 Cosimo Alfarano 2012-04-27 10:10:26 UTC
g_markup_escape says in the doc:

Note also that this function will produce character references in the range of &x1; ... &x1f; for all control sequences except for tabstop, newline and carriage return. The character references in this range are not valid XML 1.0, but they are valid XML 1.1 and will be accepted by the GMarkup parser.

Which is what's happening, we send a char in the invalid range for XML 1.0.

Both libxml and glib are doing the right thing (or at least the documented one).
glib is escaping the char as &#x1...; and libxml is not accepting it.

I think that the problem is moved up to us, to avoid it.
I created a test case for it, which I can reproduce.
Comment 5 Cosimo Alfarano 2012-04-27 10:12:15 UTC
*** Bug 46045 has been marked as a duplicate of this bug. ***
Comment 6 Cosimo Alfarano 2012-05-01 10:35:53 UTC
For the records, the ranges of valid chars:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char

Currently the entire file reading fails, so AIUI no backlogs are shown in empathy, which is probably the most annoying part.


A solution would be copy g_markup_escape_text() from glib locally and modify it in order to not allow invalid chars into the XML.
This would fix new events storing, but not the ones already logged -which is a bit silly - and also it would alter the original data - which is not ideal.

This solution is quite quick to implement, but has side effects and does not fix what has been already sent to the store.

Another solution would be leave the xml writing to g_markup and actually use it for parsing logs too, dropping libxml2.

We should be able to identify the bad <message/> element(s) and feed, excluding it/them, or doing something (ignoring?) with the invalid chars in that element.
This is something, along with basic log parsing, that I think g_markup parser can do alone.
Comment 7 Nicolas Dufresne 2012-05-01 12:49:18 UTC
(In reply to comment #6)
> For the records, the ranges of valid chars:
> http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
> 
> Currently the entire file reading fails, so AIUI no backlogs are shown in
> empathy, which is probably the most annoying part.
> 
> 
> A solution would be copy g_markup_escape_text() from glib locally and modify it
> in order to not allow invalid chars into the XML.
> This would fix new events storing, but not the ones already logged -which is a
> bit silly - and also it would alter the original data - which is not ideal.

I don't like much this solution, we should probably use the xmlWriter code from libxml2. Is there a way to be error resilient using xml2 ?
Comment 8 Sjoerd Simons 2014-01-05 13:22:12 UTC
Created attachment 91517 [details] [review]
proposed patch

Ask libxml2 to be more resilient against parse failures.
Comment 9 Guillaume Desmottes 2014-01-06 09:13:23 UTC
Comment on attachment 91517 [details] [review]
proposed patch

Review of attachment 91517 [details] [review]:
-----------------------------------------------------------------

++

Does it fix bug #73072 as well?
Comment 10 Sjoerd Simons 2014-03-03 09:49:04 UTC
(In reply to comment #9)
> Comment on attachment 91517 [details] [review] [review]
> proposed patch
> 
> Review of attachment 91517 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> ++
> 
> Does it fix bug #73072 as well?

I don't know, would need someone to test that ;)
Comment 11 Debarshi Ray 2015-04-20 11:55:49 UTC
(In reply to Sjoerd Simons from comment #10)
> (In reply to comment #9)
> > Comment on attachment 91517 [details] [review] [review] [review]
> > proposed patch
> > 
> > Review of attachment 91517 [details] [review] [review] [review]:
> > -----------------------------------------------------------------
> > 
> > ++
> > 
> > Does it fix bug #73072 as well?
> 
> I don't know, would need someone to test that ;)

Yes, it does.

I chanced upon these bugs when debugging bug 89595 , which is what a lot of Polari users have been hitting lately [1].

[1] https://bugzilla.gnome.org/show_bug.cgi?id=739816
Comment 12 Debarshi Ray 2015-04-20 11:56:06 UTC
*** Bug 73072 has been marked as a duplicate of this bug. ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.