I was trying to debug https://bugzilla.gnome.org/show_bug.cgi?id=653010 and so sent big chunk of data over IM. The logger ended up writing a unparsable log file: (empathy-chat:11545): tp-logger-WARNING **: Failed to parse file:'/home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log'
Created attachment 50950 [details] log file
Note this may be a bug in GLib g_markup_* methods, one would need to improve the error reports first.
Actually there are more errors: /home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:83: parser error : xmlParseCharRef: invalid xmlChar value 27 [1m[37mtp_group_mixin_change_flags: emitting group flags changed ^ /home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:83: parser error : xmlParseCharRef: invalid xmlChar value 27 [1m[37mtp_group_mixin_change_flags: emitting group flags changed ^ /home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:84: parser error : xmlParseCharRef: invalid xmlChar value 27 added : [[22mPROPERTIES[1m] ^ /home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:84: parser error : xmlParseCharRef: invalid xmlChar value 27 added : [[22mPROPERTIES[1m] ^ /home/dev/.local/share/TpLogger/logs/gabble_jabber_cassidy_2dtest1_40jabber_2ebelnet_2ebe0/cassidy@jabber.belnet.be/20110907.log:85: parser error : xmlParseCharRef: invalid xmlChar value 27 removed : [[22m[1m] etc etc...
g_markup_escape says in the doc: Note also that this function will produce character references in the range of &x1; ... &x1f; for all control sequences except for tabstop, newline and carriage return. The character references in this range are not valid XML 1.0, but they are valid XML 1.1 and will be accepted by the GMarkup parser. Which is what's happening, we send a char in the invalid range for XML 1.0. Both libxml and glib are doing the right thing (or at least the documented one). glib is escaping the char as ...; and libxml is not accepting it. I think that the problem is moved up to us, to avoid it. I created a test case for it, which I can reproduce.
*** Bug 46045 has been marked as a duplicate of this bug. ***
For the records, the ranges of valid chars: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char Currently the entire file reading fails, so AIUI no backlogs are shown in empathy, which is probably the most annoying part. A solution would be copy g_markup_escape_text() from glib locally and modify it in order to not allow invalid chars into the XML. This would fix new events storing, but not the ones already logged -which is a bit silly - and also it would alter the original data - which is not ideal. This solution is quite quick to implement, but has side effects and does not fix what has been already sent to the store. Another solution would be leave the xml writing to g_markup and actually use it for parsing logs too, dropping libxml2. We should be able to identify the bad <message/> element(s) and feed, excluding it/them, or doing something (ignoring?) with the invalid chars in that element. This is something, along with basic log parsing, that I think g_markup parser can do alone.
(In reply to comment #6) > For the records, the ranges of valid chars: > http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char > > Currently the entire file reading fails, so AIUI no backlogs are shown in > empathy, which is probably the most annoying part. > > > A solution would be copy g_markup_escape_text() from glib locally and modify it > in order to not allow invalid chars into the XML. > This would fix new events storing, but not the ones already logged -which is a > bit silly - and also it would alter the original data - which is not ideal. I don't like much this solution, we should probably use the xmlWriter code from libxml2. Is there a way to be error resilient using xml2 ?
Created attachment 91517 [details] [review] proposed patch Ask libxml2 to be more resilient against parse failures.
Comment on attachment 91517 [details] [review] proposed patch Review of attachment 91517 [details] [review]: ----------------------------------------------------------------- ++ Does it fix bug #73072 as well?
(In reply to comment #9) > Comment on attachment 91517 [details] [review] [review] > proposed patch > > Review of attachment 91517 [details] [review] [review]: > ----------------------------------------------------------------- > > ++ > > Does it fix bug #73072 as well? I don't know, would need someone to test that ;)
(In reply to Sjoerd Simons from comment #10) > (In reply to comment #9) > > Comment on attachment 91517 [details] [review] [review] [review] > > proposed patch > > > > Review of attachment 91517 [details] [review] [review] [review]: > > ----------------------------------------------------------------- > > > > ++ > > > > Does it fix bug #73072 as well? > > I don't know, would need someone to test that ;) Yes, it does. I chanced upon these bugs when debugging bug 89595 , which is what a lot of Polari users have been hitting lately [1]. [1] https://bugzilla.gnome.org/show_bug.cgi?id=739816
*** Bug 73072 has been marked as a duplicate of this bug. ***
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.