Hi.

ustring::to_utf8() creates a MiniIconv ic("UTF-8", "UTF-16"), assuming that iconv(3) uses the native byte order for "UTF-16". On OS X with Intel CPUs this assumption fails (I installed poppler through MacPorts, but the issue is unrelated to that, see below), as a quick check reveals:

    $ echo -n 7 | iconv -t utf-16 | hexdump -C
    00000000  fe ff 00 37                                       |...7|

The output is UTF-16BE. This breaks page labels for me: instead of "78" (UTF-8), they return the bytes e3 9c 80 e3 a0 80 (hex), which is the UTF-8 encoding of U+3700 U+3800, i.e. the code units 0x0037 0x0038 with their bytes swapped.

A fix might be to not "decode" GooString's UTF-16BE to native byte order in detail::unicode_GooString_to_ustring(GooString *str), or to use a source encoding based on the BYTE_ORDER macro instead of just "UTF-16", or to check the BOM character output by iconv(3) (which e.g. ustring::from_utf8(const char *str, int len) currently skips).
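To illustrate the second option, here is a minimal sketch of picking an explicit iconv encoding name from the host byte order. It is not poppler code: host_is_little_endian(), native_utf16_encoding_name() and read_as_big_endian() are hypothetical helpers made up for this example, and the byte order is probed at run time rather than through the BYTE_ORDER macro, only to keep the sketch portable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of poppler): probe the host byte order
// by looking at the first byte of a known 16-bit value.
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 0x0001;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

// Hypothetical helper: an explicit encoding name that matches how
// 16-bit code units are laid out in memory on this host. Passing this
// to iconv instead of plain "UTF-16" avoids depending on the
// implementation's default byte order.
inline const char *native_utf16_encoding_name()
{
    return host_is_little_endian() ? "UTF-16LE" : "UTF-16BE";
}

// Hypothetical helper: the value a big-endian decoder sees when it
// reads a code unit stored in native byte order. On a little-endian
// host this swaps the two bytes, which is the corruption described
// in the report.
inline std::uint16_t read_as_big_endian(std::uint16_t unit)
{
    unsigned char bytes[2];
    std::memcpy(bytes, &unit, 2);
    return static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

Under these assumptions, the converter in ustring::to_utf8() could presumably be constructed as MiniIconv ic("UTF-8", native_utf16_encoding_name()). On a little-endian host, read_as_big_endian(0x0037) yields 0x3700, matching the bytes observed above.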
I don't have access to a Mac, and I'm not sure whether any of the other poppler developers do, so the best way to get this fixed is for you to provide a patch.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/553.