Originally filed as evince bug http://bugzilla.gnome.org/show_bug.cgi?id=326129 .

With the testcase PDF [http://www.universetoday.com/365days.pdf], some page labels contain data like this:

  Breakpoint 2, poppler_page_get_property (object=0x83da148, prop_id=1, value=0xbf8382e8, pspec=0x82b7938) at poppler-page.cc:753
  753 g_value_set_string (value, label.getCString());
  (gdb) x /4x label.s
  0x83d9d30: 0xfe 0xff 0x69 0x00

i.e. this is a UTF-8 string with a prepended UTF-16 BOM!

The code in glib/poppler-page.cc is:

  GooString label;
  page->document->doc->getCatalog ()->indexToLabel (page->index, &label);
  g_value_set_string (value, label.getCString());
That GooString needs to be changed to UGooString.
Hmm...

  GooString label;
  switch (prop_id)
    {
    case PROP_LABEL:
      page->document->doc->getCatalog ()->indexToLabel (page->index, &label);
      g_value_set_string (value, label.getCString());

page->document is PopplerDocument
->doc is PDFDoc *
->getCatalog() is Catalog *

and Catalog::indexToLabel takes a GooString, not a UGooString, so this is *not* a problem in the glib layer.

---

I've followed it further down to PageLabelInfo::indexToLabel where, for index=4 (the fifth call when opening the testcase with evince), line 308:

  label->append(interval->prefix);

appends the BOM to |label|:

  (gdb) x /2x interval->prefix
  0x82cf2d0: 0xfe 0xff

So the problem is either that PageLabelInfo::indexToLabel appends interval->prefix unconverted, or that it is stored that way in ::prefix in the first place (-> back to general?).
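For illustration, here is a minimal sketch of the conversion that seems to be missing before the prefix is appended. It is not the poppler code or the eventual fix. The helper names (hasUtf16BeBom, appendUtf8, prefixToUtf8) are hypothetical, std::string stands in for GooString, and a prefix without a BOM is assumed to be plain ASCII and passed through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of poppler: true when the byte string
// starts with the UTF-16 big-endian byte order mark (0xFE 0xFF).
bool hasUtf16BeBom(const std::string &s) {
    return s.size() >= 2 &&
           static_cast<unsigned char>(s[0]) == 0xFE &&
           static_cast<unsigned char>(s[1]) == 0xFF;
}

// Append one Unicode code point to out, encoded as UTF-8.
void appendUtf8(std::string &out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Read one big-endian 16-bit code unit starting at byte offset i.
static std::uint32_t unitAt(const std::string &s, std::size_t i) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(s[i])) << 8) |
           static_cast<unsigned char>(s[i + 1]);
}

// Hypothetical helper: convert a page-label prefix to UTF-8.
// A prefix starting with the BOM is decoded as UTF-16BE and the BOM is
// dropped. Anything else is returned unchanged (assumption: ASCII).
std::string prefixToUtf8(const std::string &prefix) {
    if (!hasUtf16BeBom(prefix)) {
        return prefix;
    }
    std::string out;
    for (std::size_t i = 2; i + 1 < prefix.size(); i += 2) {
        std::uint32_t cp = unitAt(prefix, i);
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < prefix.size()) {
            std::uint32_t low = unitAt(prefix, i + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        // Replace any unpaired surrogate with U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

If the prefix really consists of only the two BOM bytes, as the dump above suggests, prefixToUtf8 returns an empty string and the label would come out as just the numeral.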
I just said GooString needs to be changed to UGooString, obviously in PageLabelInfo. I really did not mean to change the component to glib frontend; that was just a mistake ;-)
Fixed in CVS.