Bug 37745

Summary: Problems in file goo/GooString.cc
Product: poppler Reporter: fusionefredda
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description fusionefredda 2011-05-30 05:52:13 UTC
I'm using poppler 0.16.5.
In one pdf with weird goto link entries (for example, in vi editor I see an entry "þÿ") the bsearch (row 677 of file poppler/Catalog.cc) for some value does not work correctly. I founded that the reason is a not totally correct implementation of methods "int GooString::cmp(GooString *str) const", "int GooString::cmpN(GooString *str, int n) const", "int GooString::cmp(const char *sA) const", and "int GooString::cmpN(const char *sA, int n) const".
In each of these methods, the row "x = *p1 - *p2;" should be replaced with: 
"x = (unsigned char)*p1 - (unsigned char)*p2;"

The rows of file goo/GooString.cc that needs this replace are:
694, 711, 728, 748.

I founded that is fine too the use of strcmp and strncmp c library functions.
E.g., "int GooString::cmp(GooString *str) const" could be simply implemented like this:

int GooString::cmp(GooString *str) const {
    return strcmp(s, str->s);
}
Comment 1 Albert Astals Cid 2011-05-30 11:37:56 UTC
Can you please explain why
  x = (unsigned char)*p1 - (unsigned char)*p2;
is correct if neither p1 nor p2 are unsigned chars?
Comment 2 fusionefredda 2011-05-31 00:21:55 UTC
(In reply to comment #1)
> Can you please explain why
>   x = (unsigned char)*p1 - (unsigned char)*p2;
> is correct if neither p1 nor p2 are unsigned chars?

I will provide a code example:

#include <stdio.h>
#include <string.h>

int cmp(const char*s, const char *sA) {
    int n1, i, x;
    const char *p1, *p2;

    n1 = strlen(s);
    for (i = 0, p1 = s, p2 = sA; i < n1 && *p2; ++i, ++p1, ++p2) {
        x = *p1 - *p2;
        if (x != 0) {
            return x;
        }
    }
    if (i < n1) {
        return 1;
    }
    if (*p2) {
        return -1;
    }
    return 0;
}

int unsigned_cmp(const char*s, const char *sA) {
    int n1, i, x;
    const char *p1, *p2;

    n1 = strlen(s);
    for (i = 0, p1 = s, p2 = sA; i < n1 && *p2; ++i, ++p1, ++p2) {
        x = (unsigned char)*p1 - (unsigned char)*p2;
        if (x != 0) {
            return x;
        }
    }
    if (i < n1) {
        return 1;
    }
    if (*p2) {
        return -1;
    }
    return 0;
}

int main()
{
    const char* str1 = "17_scuole";
    const char* str2 = "þÿ";
    
    int ret = strcmp(str1, str2);
    printf("strcmp ret: %d\n", ret);

    ret = cmp(str1, str2);
    printf("cmp ret: %d\n", ret);

    ret = unsigned_cmp(str1, str2);
    printf("unsigned_cmp ret: %d\n", ret);
}





The output is:

strcmp ret: -1
cmp ret: 110
unsigned_cmp ret: -146

A correct returned value is either -1 or -146, a positive value is instead incorrect. This fools the bsearch (binary tree sarch), pointing the search for the entry "17_scuole" to a incorrect branch, resulting in a NULL even if the entry "17_scuole" is present. I hope this example explains clearly the problem. 
Regards.
Comment 3 Albert Astals Cid 2011-05-31 15:36:30 UTC
Ok, i see the issue here, entries is "sorted" by the PDF file and the sorting in GooString::cmp does not agree with that sorting so the binary search fails. I understand you have such a file, could you please attach it to the bug?
Comment 4 GitLab Migration User 2018-08-20 22:05:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/186.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.