I'm using poppler 0.16.5. In one pdf with weird goto link entries (for example, in vi editor I see an entry "þÿ") the bsearch (row 677 of file poppler/Catalog.cc) for some value does not work correctly. I founded that the reason is a not totally correct implementation of methods "int GooString::cmp(GooString *str) const", "int GooString::cmpN(GooString *str, int n) const", "int GooString::cmp(const char *sA) const", and "int GooString::cmpN(const char *sA, int n) const". In each of these methods, the row "x = *p1 - *p2;" should be replaced with: "x = (unsigned char)*p1 - (unsigned char)*p2;" The rows of file goo/GooString.cc that needs this replace are: 694, 711, 728, 748. I founded that is fine too the use of strcmp and strncmp c library functions. E.g., "int GooString::cmp(GooString *str) const" could be simply implemented like this: int GooString::cmp(GooString *str) const { return strcmp(s, str->s); }
Can you please explain why x = (unsigned char)*p1 - (unsigned char)*p2; is correct if neither p1 nor p2 are unsigned chars?
(In reply to comment #1) > Can you please explain why > x = (unsigned char)*p1 - (unsigned char)*p2; > is correct if neither p1 nor p2 are unsigned chars? I will provide a code example: #include <stdio.h> #include <string.h> int cmp(const char*s, const char *sA) { int n1, i, x; const char *p1, *p2; n1 = strlen(s); for (i = 0, p1 = s, p2 = sA; i < n1 && *p2; ++i, ++p1, ++p2) { x = *p1 - *p2; if (x != 0) { return x; } } if (i < n1) { return 1; } if (*p2) { return -1; } return 0; } int unsigned_cmp(const char*s, const char *sA) { int n1, i, x; const char *p1, *p2; n1 = strlen(s); for (i = 0, p1 = s, p2 = sA; i < n1 && *p2; ++i, ++p1, ++p2) { x = (unsigned char)*p1 - (unsigned char)*p2; if (x != 0) { return x; } } if (i < n1) { return 1; } if (*p2) { return -1; } return 0; } int main() { const char* str1 = "17_scuole"; const char* str2 = "þÿ"; int ret = strcmp(str1, str2); printf("strcmp ret: %d\n", ret); ret = cmp(str1, str2); printf("cmp ret: %d\n", ret); ret = unsigned_cmp(str1, str2); printf("unsigned_cmp ret: %d\n", ret); } The output is: strcmp ret: -1 cmp ret: 110 unsigned_cmp ret: -146 A correct returned value is either -1 or -146, a positive value is instead incorrect. This fools the bsearch (binary tree sarch), pointing the search for the entry "17_scuole" to a incorrect branch, resulting in a NULL even if the entry "17_scuole" is present. I hope this example explains clearly the problem. Regards.
Ok, i see the issue here, entries is "sorted" by the PDF file and the sorting in GooString::cmp does not agree with that sorting so the binary search fails. I understand you have such a file, could you please attach it to the bug?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/186.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.