Bug 97262 - Enumerate PDF named destinations
Summary: Enumerate PDF named destinations
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: glib frontend (show other bugs)
Version: unspecified
Hardware: All All
: medium enhancement
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-09 16:13 UTC by Masamichi Hosoda
Modified: 2018-08-21 11:02 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
foo.pdf (12.02 KB, application/pdf)
2016-08-09 17:37 UTC, Masamichi Hosoda
Details
add-catalog-functions-for-named-dest.tar.gz (1.99 KB, application/x-gzip)
2016-08-20 14:28 UTC, Masamichi Hosoda
Details
0001-Divide-Catalog-findDest.patch (2.91 KB, patch)
2016-08-21 22:42 UTC, Masamichi Hosoda
Details | Splinter Review
0002-Add-functions-for-named-destination-name-in-name-tre.patch (1.79 KB, patch)
2016-08-21 22:42 UTC, Masamichi Hosoda
Details | Splinter Review
0003-Add-functions-for-named-destination-name-in-name-dic.patch (2.01 KB, patch)
2016-08-21 22:43 UTC, Masamichi Hosoda
Details | Splinter Review
0004-Add-poppler_dest_equal-to-glib-frontend.patch (3.14 KB, patch)
2016-08-28 08:07 UTC, Masamichi Hosoda
Details | Splinter Review
0005-Add-poppler_document_get_dests-to-glib-frontend.patch (4.08 KB, patch)
2016-08-28 08:08 UTC, Masamichi Hosoda
Details | Splinter Review
pdfinfo: print dests (4.29 KB, patch)
2016-08-30 13:28 UTC, Adrian Johnson
Details | Splinter Review
utf16.pdf (12.64 KB, application/pdf)
2016-08-31 14:49 UTC, Masamichi Hosoda
Details
pdfinfo: print dests v2 (6.52 KB, patch)
2016-09-01 12:47 UTC, Adrian Johnson
Details | Splinter Review
v2-0001-Divide-Catalog-findDest.patch (2.91 KB, patch)
2016-09-01 14:30 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0002-Add-functions-for-named-destination-name-in-name-.patch (1.78 KB, patch)
2016-09-01 14:31 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0003-Add-functions-for-named-destination-name-in-name-.patch (2.01 KB, patch)
2016-09-01 14:31 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0004-Add-poppler_document_find_dest_bytes-to-glib-fron.patch (4.02 KB, patch)
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0005-Add-poppler_document_build_dests_namelist-to-glib.patch (3.17 KB, patch)
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0006-Add-poppler_dest_equal-to-glib-frontend.patch (3.14 KB, patch)
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
v2-0007-Add-poppler_document_build_dests_hashtable-to-gli.patch (3.66 KB, patch)
2016-09-02 17:41 UTC, Masamichi Hosoda
Details | Splinter Review
v3-tree-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch (4.34 KB, patch)
2016-09-25 18:22 UTC, Masamichi Hosoda
Details | Splinter Review
v3-hash-0001-Add-poppler_dest_equal-to-glib-frontend.patch (3.15 KB, patch)
2016-09-25 18:23 UTC, Masamichi Hosoda
Details | Splinter Review
v3-hash-0002-Add-poppler_document_build_dests_hashtable-to-gli.patch (4.23 KB, patch)
2016-09-25 18:23 UTC, Masamichi Hosoda
Details | Splinter Review
v4-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch (4.32 KB, patch)
2016-12-07 11:49 UTC, Masamichi Hosoda
Details | Splinter Review
pdfinfo: print dests v3 (6.16 KB, patch)
2017-08-15 12:09 UTC, Adrian Johnson
Details | Splinter Review

Description Masamichi Hosoda 2016-08-09 16:13:48 UTC
I'd like to enumerate PDF named destinations in a PDF.

I've tried `poppler_index_iter_get_action ()'.
It can obtain named destinations which are used in the PDF actions.
But, it cannot obtain named destinations which are not used.

I'd like all named destinations.
So I cannot use `poppler_index_iter_get_action ()'.
Comment 1 Jason Crain 2016-08-09 17:12:42 UTC
Do you have an example of such a PDF?
Comment 2 Masamichi Hosoda 2016-08-09 17:37:17 UTC
Created attachment 125645 [details]
foo.pdf
Comment 3 Masamichi Hosoda 2016-08-09 17:38:05 UTC
I've attached the sample PDF.

Here's its source file.

%%% foo.tex %%%

% The following command generates `foo.pdf'.
% $ pdftex foo.tex

\pdfobjcompresslevel=0

\pdfdest name{one} xyz
page 1

\vfill
\eject

\pdfdest name{two} xyz
page 2

\vfill
\end

%%% foo.tex %%%
Comment 4 Jason Crain 2016-08-16 03:16:59 UTC
Correct, the poppler_index_iter functions are only for the document outline.  If you wish to iterate all named dests, that would require a new API.

The usual way of accessing named dests is indirectly, through the outline via poppler_index_iter functions, through links via poppler_page_get_link_mapping, or by looking up dests by name via poppler_document_find_dest.  But the glib frontend currently does not have a way to iterate named dests.
Comment 5 Masamichi Hosoda 2016-08-19 15:03:33 UTC
Thank you for your answer.

I'd like to iterate all named dists.
I understand that current glib frontend could not iterate them.
Could other frontend iterate them?
Comment 6 Jose Aliste 2016-08-19 15:28:26 UTC
You can iterate them in the core "frontend". If you look into poppler_document_find_dest you would get an idea of what to do. If you can provide a patch for the glib frontend that would be great. Since the Dests can be either stored in a Dict or a TreeName, maybe the best way to implement this in glib would be with a a GHashTable. Then you can use g_hash_table_foreach to iterate over all of them.
Comment 7 Masamichi Hosoda 2016-08-19 16:23:54 UTC
Does the core "frontend" contain `poppler/Catalog.h`?

class `Catalog` has `destNameTree` and `getDestNameTree ()` but they are "private" member.
So they can not be accessed from outside the class `Catalog`.

I've tried to access the private members by ugly tricky way.
https://github.com/trueroad/extractpdfmark

It could iterate all named dest.
But it uses ugly tricky way.

Is there another way?
Or, would you accept the patch which changes the private members to public?
Comment 8 Jose Aliste 2016-08-19 19:52:37 UTC
If you provide a sensible patch for the glib frontend, then I don't see a reason to oppose the making the getDestNameTree() public. Actually the getDests API is already public. Please bear in mind that you need to use both, since different versions of PDF spec allow either for a Dests Dictionary or a Dests Tree Name inside the Names Dictionary. 

Also bear in mind that the API inside poppler/ dir is supposed to be used only by the frontends (cpp, glib, qt4, qt5). You can check if other frontends already do what you want, but since the DestNameTree in Catalog is private, I don't think that's the case.
Comment 9 Masamichi Hosoda 2016-08-20 14:28:18 UTC
Created attachment 125919 [details]
add-catalog-functions-for-named-dest.tar.gz
Comment 10 Masamichi Hosoda 2016-08-20 14:28:41 UTC
I've made patches which adds some Catalog functions for named destinations.
How about them?

If you accept them, I'll make the patch for glib frontend.
Comment 11 Albert Astals Cid 2016-08-21 21:12:55 UTC
Please attach the .patch files directly instead a tar.gz, makes things much easier.
Comment 12 Masamichi Hosoda 2016-08-21 22:42:16 UTC
Created attachment 125938 [details] [review]
0001-Divide-Catalog-findDest.patch
Comment 13 Masamichi Hosoda 2016-08-21 22:42:51 UTC
Created attachment 125939 [details] [review]
0002-Add-functions-for-named-destination-name-in-name-tre.patch
Comment 14 Masamichi Hosoda 2016-08-21 22:43:19 UTC
Created attachment 125940 [details] [review]
0003-Add-functions-for-named-destination-name-in-name-dic.patch
Comment 15 Masamichi Hosoda 2016-08-21 22:46:17 UTC
I've attached patch files.
Comment 16 Albert Astals Cid 2016-08-21 22:57:20 UTC
I don't think this is what Jose was asking, i think he was asking for patches to the glib frontend, but i'll let him confirm.
Comment 17 Masamichi Hosoda 2016-08-21 23:06:16 UTC
Yes, this is not glib frontend.
But, current `Catalog` class does not have destination name related interfaces.
So this patch adds these interfaces.

If you accept this patch, I'll create glib frontend patch which uses these interfaces.
Comment 18 Jose Aliste 2016-08-22 02:03:04 UTC
Yeah, I was thinking more of making the gestDests and GetDestNameTree public and then doing all the processing inside the glib frontend.  But these patches however could be good to implement this in qt4 or cpp if ever needed. So, I'd say it's up to you Albert :).
Comment 19 Albert Astals Cid 2016-08-22 21:42:25 UTC
Let me ask again, what's the use case for this?
Comment 20 Masamichi Hosoda 2016-08-22 21:56:31 UTC
I'd like to iterate all named destinations in a PDF.

If you use Ghostscript for the PDF file size reduction, Ghostscript does not preserve named destinations.
I'm making a tool that can extract named destinations as PDFmark from PDF.
You can get the small PDF that has preserved them by using the tool.

Current poppler does not have public interface for iterating named destinations.
I've tried to access the private members by ugly tricky way.
https://github.com/trueroad/extractpdfmark

It could iterate all named dest.
But it uses ugly tricky way.

So this patch adds these interfaces to `Catalog` class.

If you accept this patch, I'll create glib frontend API patch which uses these interfaces.
Then, I'll create the tool which uses the glib frontend API.
Comment 21 Albert Astals Cid 2016-08-22 22:30:54 UTC
Isn't it better that you actually fix Ghostscript than creating a tool to workaround it?
Comment 22 Masamichi Hosoda 2016-08-23 01:37:07 UTC
I've reported this problem to Ghostscript bugzilla.
http://bugs.ghostscript.com/show_bug.cgi?id=696943

Ghostscript developers knew this problem about two years ago.
http://bugs.ghostscript.com/show_bug.cgi?id=695760

But, Ghostscript has not been enhanced yet.
So I need to create a tool to workaround it.
Comment 23 Masamichi Hosoda 2016-08-28 08:07:45 UTC
Created attachment 126078 [details] [review]
0004-Add-poppler_dest_equal-to-glib-frontend.patch
Comment 24 Masamichi Hosoda 2016-08-28 08:08:21 UTC
Created attachment 126079 [details] [review]
0005-Add-poppler_document_get_dests-to-glib-frontend.patch
Comment 25 Masamichi Hosoda 2016-08-28 08:08:39 UTC
I've created a patch for glib frontend (0004, 0005).
It requires class `Catalog` patches (0001, 0002, 0003).

If you want, I'll create a patch for cpp frontend.
Comment 26 Adrian Johnson 2016-08-30 13:23:39 UTC
I've been working on a patch for CairoOutputDev that requires access to a list of named destinations. The core patches would be helpful. Looking at the patches:

0001 - Looks good. Although I suggest renaming "constructLinkDest" to "createLinkDest".

0002, 0003 - I don't think we need separate functions for Name Trees and Dests Dict access. I suggest have one set of functions for accessing names eg:

int numDests();
GooString *getDestsName(int i);
LinkDest *getDestsLink(int i);

Internally, the functions should combine the Name Tree dests and dict dests into one list.

0005 - I know that Jose suggested returning a GHashTable. Since the main purpose of a hash table is mapping keys to values, and the poppler_document_find_dest() function already does this, maybe the names could be returned by either:

1) A list of just the names. The user can use poppler_document_find_dest() to get the dests.

or
2) A binary tree of names and dests. This has the advantage that iterating the names is in alphabetical order instead of the the random order of the hash table.

Also, poppler_document_get_dests() creates the hash table each time it is called. It maybe better to cache the hash table or use a function name like poppler_document_build_dests_hashtable() since "get" implies fast access.

+ key = g_bytes_new (catalog->getDestsName (i),
+				   strlen (catalog->getDestsName (i)));

You are using g_bytes because the name can contain \0 but you are using strlen here. Also the getDestName() in Catalog will need to return a GooString, not char *.

Although it is not clear to me whether destination names can actually contain \0. PDF32000 section 12.3.2.3 states "The keys in the name tree may be
treated as text strings for display purposes".
Comment 27 Adrian Johnson 2016-08-30 13:28:16 UTC
Created attachment 126119 [details] [review]
pdfinfo: print dests

It would also be useful to be able print a list of named destinations in the PDF.

Note this patch requires C++11, the use of which was discussed in
https://lists.freedesktop.org/archives/poppler/2016-May/011844.html

The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in the cmake build.
Comment 28 Jose Aliste 2016-08-30 22:02:52 UTC
Adrian, Does find_dest take O(1) time then to find? (I don't know the details of Dict.cc)  The only reason why I suggested a HashTable is because we have some bug in evince where the outline of a very big file is constructed only by using named destinations... If find_dest is already taking constant time to do the lookup, Of course, there is no reason to construct a HashTable (and the bug in evince is elsewhere... )
Comment 29 Adrian Johnson 2016-08-30 22:16:35 UTC
find_dest is O(log(n))

I think just returning a list of names would be a cleaner API. It avoids the overhead of parsing and creating the link dest for each name if it is not required. And even if it is required it reduces peak memory usage if the application can create and destroy each link dest sequentially instead of filling memory with every link. We could always add the hash table later if someone finds a use case for it.
Comment 30 Jose Aliste 2016-08-30 22:19:09 UTC
(In reply to Adrian Johnson from comment #29)
> find_dest is O(log(n))
> 
> I think just returning a list of names would be a cleaner API. It avoids the
> overhead of parsing and creating the link dest for each name if it is not
> required. And even if it is required it reduces peak memory usage if the
> application can create and destroy each link dest sequentially instead of
> filling memory with every link. We could always add the hash table later if
> someone finds a use case for it.

I agree. I will try to find and fix that evince bug to see if the hashtable actually helps :)
Comment 31 Masamichi Hosoda 2016-08-31 13:41:00 UTC
(In reply to Adrian Johnson from comment #26)
> I've been working on a patch for CairoOutputDev that requires access to a
> list of named destinations. The core patches would be helpful. Looking at
> the patches:

Thank you for your reviewing.

> 0001 - Looks good. Although I suggest renaming "constructLinkDest" to
> "createLinkDest".

I'll change the patch.

> 0002, 0003 - I don't think we need separate functions for Name Trees and
> Dests Dict access. I suggest have one set of functions for accessing names
> eg:
> 
> int numDests();
> GooString *getDestsName(int i);
> LinkDest *getDestsLink(int i);
> 
> Internally, the functions should combine the Name Tree dests and dict dests
> into one list.

Name Tree is recorded in Catalog::destNameTree.
It has GooString for its name.
So getDestName() can return the pointer of GooString.
The return value should not be deleted.

Dests Dict is recoded in Catalog::dests.
However, it does not have GooString for its name.
It has only char* for its name.
So getDestName() have to construct GooString from char*.
The return value should be deleted.

Or, would you like to add a new member variable GooString *destnames to class Catalog?
In this case, Catalog::getDestName() creates destnames array and Catalog::~Catalog() deletes it.
The return value should not be deleted.

> 0005 - I know that Jose suggested returning a GHashTable. Since the main
> purpose of a hash table is mapping keys to values, and the
> poppler_document_find_dest() function already does this, maybe the names
> could be returned by either:
> 
> 1) A list of just the names. The user can use poppler_document_find_dest()
> to get the dests.

Is is something like
GList *poppler_document_build_dests_namelist()
?
It returns the pointer of GList which contains GBytes for destination name.
If you would like it, I'll create a patch.

However, unfortunately, if the name contains \0, poppler_document_find_dest() cannot find it.

> or
> 2) A binary tree of names and dests. This has the advantage that iterating
> the names is in alphabetical order instead of the the random order of the
> hash table.

I'd like this.

> Also, poppler_document_get_dests() creates the hash table each time it is
> called. It maybe better to cache the hash table or use a function name like
> poppler_document_build_dests_hashtable() since "get" implies fast access.

I'll change the patch to poppler_document_build_dests().

> + key = g_bytes_new (catalog->getDestsName (i),
> +				   strlen (catalog->getDestsName (i)));
> 
> You are using g_bytes because the name can contain \0 but you are using
> strlen here. Also the getDestName() in Catalog will need to return a
> GooString, not char *.

It handles Dests Dict.
The name of Dests Dict cannot contain \0.
Only the name of Name Tree can contain \0.

> Although it is not clear to me whether destination names can actually
> contain \0. PDF32000 section 12.3.2.3 states "The keys in the name tree may
> be
> treated as text strings for display purposes".

Almost UTF-16 strings contains \0.
Comment 32 Masamichi Hosoda 2016-08-31 14:49:09 UTC
Created attachment 126133 [details]
utf16.pdf

I've attached PDF which contains UTF-16 destination names.

Here's its source file.

%%% utf16.tex %%%

% The following command generates `utf16.pdf'.
% $ pdftex utf16.tex

\pdfobjcompresslevel=0
\pdfcatalog{/PageMode /UseOutlines}

\catcode0=12

\def\furone{^^ff^^fe^^00f^^00^^fc^^00r^^00 ^^00o^^00n^^00e}
\def\furtwo{^^ff^^fe^^00f^^00^^fc^^00r^^00 ^^00t^^00w^^00o}

\pdfoutline goto name{\furone} {\furone}
\pdfoutline goto name{\furtwo} {\furtwo}

\pdfdest name{\furone} xyz
page 1

\vfill
\eject

\pdfdest name{\furtwo} xyz
page 2

\vfill
\end

%%% utf16.tex %%%
Comment 33 Jose Aliste 2016-08-31 17:51:17 UTC
Please note that there is a patch for working around the '\0' issue in named dests without disrupting the API. See bug #60377
Comment 34 Adrian Johnson 2016-09-01 12:47:46 UTC
Created attachment 126158 [details] [review]
pdfinfo: print dests v2

Updated to read dests and name dests and use mapUnicode to print names.
Comment 35 Adrian Johnson 2016-09-01 12:56:45 UTC
(In reply to Masamichi Hosoda from comment #31)
> > Internally, the functions should combine the Name Tree dests and dict dests
> > into one list.
> 
> Name Tree is recorded in Catalog::destNameTree.
> It has GooString for its name.
> So getDestName() can return the pointer of GooString.
> The return value should not be deleted.
> 
> Dests Dict is recoded in Catalog::dests.
> However, it does not have GooString for its name.
> It has only char* for its name.
> So getDestName() have to construct GooString from char*.
> The return value should be deleted.

Ok leave it as two separate sets of functions.

In the longer term I would like to read all the names into a map or hash. The current implementation that reads the nameTree (which should already be sorted except for when it isn't) then runs qsort on it is not ideal. But that can be the topic of another bug.

> > 1) A list of just the names. The user can use poppler_document_find_dest()
> > to get the dests.
> 
> Is is something like
> GList *poppler_document_build_dests_namelist()
> ?
> It returns the pointer of GList which contains GBytes for destination name.
> If you would like it, I'll create a patch.
> 
> However, unfortunately, if the name contains \0,
> poppler_document_find_dest() cannot find it.

If you escape the \0 as per comment 33 it should work.

> 
> > or
> > 2) A binary tree of names and dests. This has the advantage that iterating
> > the names is in alphabetical order instead of the the random order of the
> > hash table.
> 
> I'd like this.

Carlos, do you have an opinion on whether to return a GList of names or binary tree of names/LinkDests?

> Almost UTF-16 strings contains \0.

Isn't the glib API UTF-8 based?
Comment 36 Masamichi Hosoda 2016-09-01 14:30:33 UTC
Created attachment 126159 [details] [review]
v2-0001-Divide-Catalog-findDest.patch
Comment 37 Masamichi Hosoda 2016-09-01 14:31:05 UTC
Created attachment 126160 [details] [review]
v2-0002-Add-functions-for-named-destination-name-in-name-.patch
Comment 38 Masamichi Hosoda 2016-09-01 14:31:35 UTC
Created attachment 126161 [details] [review]
v2-0003-Add-functions-for-named-destination-name-in-name-.patch
Comment 39 Masamichi Hosoda 2016-09-01 14:45:48 UTC
(In reply to Adrian Johnson from comment #35)
> Ok leave it as two separate sets of functions.
> 
> In the longer term I would like to read all the names into a map or hash.
> The current implementation that reads the nameTree (which should already be
> sorted except for when it isn't) then runs qsort on it is not ideal. But
> that can be the topic of another bug.

Thank you.

I've updated core patches (v2-0001, v2-0002, v2-0003).

> > However, unfortunately, if the name contains \0,
> > poppler_document_find_dest() cannot find it.
> 
> If you escape the \0 as per comment 33 it should work.

I'd like GBytes instead of escape.
I'd like to add something like the following function.

PopplerDest *poppler_document_find_dest_bytes (PopplerDocument*, const GBytes);

> > Almost UTF-16 strings contains \0.
> 
> Isn't the glib API UTF-8 based?

Even if the glib API is UTF-8 based, PDFs can contain UTF-16.
Moreover, if I understand correctly, encoding of the PDF destination names has not been defined.
It can contain UTF-16 and PDFDocEncoding.
Also it can even contain pure binaries.
Comment 40 Masamichi Hosoda 2016-09-02 17:40:01 UTC
Created attachment 126179 [details] [review]
v2-0004-Add-poppler_document_find_dest_bytes-to-glib-fron.patch
Comment 41 Masamichi Hosoda 2016-09-02 17:40:23 UTC
Created attachment 126180 [details] [review]
v2-0005-Add-poppler_document_build_dests_namelist-to-glib.patch
Comment 42 Masamichi Hosoda 2016-09-02 17:40:47 UTC
Created attachment 126181 [details] [review]
v2-0006-Add-poppler_dest_equal-to-glib-frontend.patch
Comment 43 Masamichi Hosoda 2016-09-02 17:41:12 UTC
Created attachment 126182 [details] [review]
v2-0007-Add-poppler_document_build_dests_hashtable-to-gli.patch
Comment 44 Masamichi Hosoda 2016-09-02 17:47:47 UTC
I've added and updated glib frontend patches (v2-0004, v2-0005, v2-0006, v2-0007).
They requires the core patches (v2-0001, v2-0002, v2-0003).
They add the following functions.

PopplerDest *
poppler_document_find_dest_bytes (PopplerDocument *document,
				  GBytes          *link_name);
GList *
poppler_document_build_dests_namelist (PopplerDocument *document);
GHashTable *
poppler_document_build_dests_hashtable (PopplerDocument *document);
Comment 45 Masamichi Hosoda 2016-09-13 13:50:27 UTC
I think that we don't need further discussion on the core patches.
(v2-0001, v2-0002, v3-0003)
So would you commit them?

But, we seem to need more discussion on the glib patches.
(v2-0004, v2-0005, v2-0006, v2-0007)


Again, I'd like to iterate all named destinations in a PDF.
There are two ways, hash and list.

If you just would like all of the names, list would be good.
If you would like all of the names and the destinations, hash would be good.

If only the list interface has been provided, 
you need to find a destination from the name.

But, current poppler_document_find_dest () cannot handle the name which contained \0.
I know that an escape is one of the solutions.
I worry about the behavior of many applications that do not consider the escape.
In addition, I'd like to use raw binary since the escape is troublesome.

So I proposed raw binary finding function poppler_document_find_dest_bytes ().

If the hash interface (my imprementation) has been provided,
you can find any named destination, including \0.
In this case, both poppler_document_find_dest () and poppler_document_find_dest_bytes ()
are unnecessary.
The list interface is also unnecessary.

Only the hash interface,
poppler_document_build_dests_hashtable ()
is necessary.

I think that it is simple.
So I'd like this.
Comment 46 Carlos Garcia Campos 2016-09-17 09:40:11 UTC
Ok, I've just pushed the core patches. Modified the second one to take the catalog lock.
Comment 47 Carlos Garcia Campos 2016-09-17 09:42:07 UTC
(In reply to Adrian Johnson from comment #27)
> Created attachment 126119 [details] [review] [review]
> pdfinfo: print dests
> 
> It would also be useful to be able print a list of named destinations in the
> PDF.
> 
> Note this patch requires C++11, the use of which was discussed in
> https://lists.freedesktop.org/archives/poppler/2016-May/011844.html
> 
> The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in
> the cmake build.

I think you just need to add the flag to CMAKE_CXX_FLAGS somewhere, in CMakeList.txt or maybe in PopplerMacros.cmake, but there flags are only added when building with GCC, I think if it's because those flags are specific to GCC ro because we don't really support clang
Comment 48 Carlos Garcia Campos 2016-09-17 09:48:35 UTC
(In reply to Adrian Johnson from comment #35)
> (In reply to Masamichi Hosoda from comment #31)
> > > Internally, the functions should combine the Name Tree dests and dict dests
> > > into one list.
> > 
> > Name Tree is recorded in Catalog::destNameTree.
> > It has GooString for its name.
> > So getDestName() can return the pointer of GooString.
> > The return value should not be deleted.
> > 
> > Dests Dict is recoded in Catalog::dests.
> > However, it does not have GooString for its name.
> > It has only char* for its name.
> > So getDestName() have to construct GooString from char*.
> > The return value should be deleted.
> 
> Ok leave it as two separate sets of functions.
> 
> In the longer term I would like to read all the names into a map or hash.
> The current implementation that reads the nameTree (which should already be
> sorted except for when it isn't) then runs qsort on it is not ideal. But
> that can be the topic of another bug.
> 
> > > 1) A list of just the names. The user can use poppler_document_find_dest()
> > > to get the dests.
> > 
> > Is is something like
> > GList *poppler_document_build_dests_namelist()
> > ?
> > It returns the pointer of GList which contains GBytes for destination name.
> > If you would like it, I'll create a patch.
> > 
> > However, unfortunately, if the name contains \0,
> > poppler_document_find_dest() cannot find it.
> 
> If you escape the \0 as per comment 33 it should work.
> 
> > 
> > > or
> > > 2) A binary tree of names and dests. This has the advantage that iterating
> > > the names is in alphabetical order instead of the the random order of the
> > > hash table.
> > 
> > I'd like this.
> 
> Carlos, do you have an opinion on whether to return a GList of names or
> binary tree of names/LinkDests?

I guess we could return a GTree, but I'm not sure I understand exactly what we want. Do we need the list to be in a specific order? Do they have an order in the PDF document that we should respect or alphabetical is what want?

> > Almost UTF-16 strings contains \0.
> 
> Isn't the glib API UTF-8 based?

Yes, which means that strings passed to public api methods are expected to be UTF-8 and string returned by public api methods should also be UTF-8.
Comment 49 Carlos Garcia Campos 2016-09-17 09:49:44 UTC
(In reply to Carlos Garcia Campos from comment #47)
> (In reply to Adrian Johnson from comment #27)
> > Created attachment 126119 [details] [review] [review] [review]
> > pdfinfo: print dests
> > 
> > It would also be useful to be able print a list of named destinations in the
> > PDF.
> > 
> > Note this patch requires C++11, the use of which was discussed in
> > https://lists.freedesktop.org/archives/poppler/2016-May/011844.html
> > 
> > The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in
> > the cmake build.
> 
> I think you just need to add the flag to CMAKE_CXX_FLAGS somewhere, in
> CMakeList.txt or maybe in PopplerMacros.cmake, but there flags are only
> added when building with GCC, I think if it's because those flags are
> specific to GCC ro because we don't really support clang

I'm fine with this patch, btw, feel free to push it once you figure out how to fix the CMake build. I'm sure Albert or Pino can help us with that.
Comment 50 Masamichi Hosoda 2016-09-17 12:50:54 UTC
(In reply to Carlos Garcia Campos from comment #48)
> (In reply to Adrian Johnson from comment #35)
> > Carlos, do you have an opinion on whether to return a GList of names or
> > binary tree of names/LinkDests?
> 
> I guess we could return a GTree, but I'm not sure I understand exactly what
> we want. Do we need the list to be in a specific order? Do they have an
> order in the PDF document that we should respect or alphabetical is what
> want?

I don't matter either GHash or GTree.
I'd like either of the followings:
  GTree with GBytes keys and PopplerDest value
  GHash with GBytes keys and PopplerDest value

> > > Almost UTF-16 strings contains \0.
> > 
> > Isn't the glib API UTF-8 based?
> 
> Yes, which means that strings passed to public api methods are expected to
> be UTF-8 and string returned by public api methods should also be UTF-8.

If I understand correctly,
the encoding of named destination names is not defined.

In other words, we cannot get the encoding of a name.
It might be UTF-16, UTF-8, PDFDocEncoding or other encoding.
It might be pure binary instead of text.

Document Management – Portable Document Format – Part 1: PDF 1.7, First Edition
Adobe (Jul 2008)
http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

`12.3.2.3 Named Destinations'
says `..., a destination may be referred to indirectly by means of a name object (PDF 1.1) or a byte string (PDF 1.2).'

So the named destination names are `byte strings' in PDF 1.2+.

`7.9.2.4 Byte String Type'
says `The byte string type shall be used for binary data that shall be represented as a series of bytes, where each byte may be any value representable in 8 bits.'

The point is that the named destination names are not `text strings'.
If `text strings', we can convert to UTF-8 since the encoding is clearly defind.
However, the named destination names are `binary strings'.
So we cannot convert to UTF-8 since the encoding is not known.
Comment 51 Masamichi Hosoda 2016-09-17 17:46:43 UTC
(In reply to Masamichi Hosoda from comment #50)
> I don't matter either GHash or GTree.
> I'd like either of the followings:
>   GTree with GBytes keys and PopplerDest value
>   GHash with GBytes keys and PopplerDest value

My patch uses GBytes but I also don't matter either GBytes, GString or GByteArray.
I'd not like to use gchar* since it cannot handle pure-binary.

Which would you like?
  1. GTree or GHash
  2. GBytes, GString or GByteArray

I'll create a patch.
Comment 52 Carlos Garcia Campos 2016-09-20 16:31:50 UTC
(In reply to Masamichi Hosoda from comment #50)
> (In reply to Carlos Garcia Campos from comment #48)
> > (In reply to Adrian Johnson from comment #35)
> > > Carlos, do you have an opinion on whether to return a GList of names or
> > > binary tree of names/LinkDests?
> > 
> > I guess we could return a GTree, but I'm not sure I understand exactly what
> > we want. Do we need the list to be in a specific order? Do they have an
> > order in the PDF document that we should respect or alphabetical is what
> > want?
> 
> I don't matter either GHash or GTree.
> I'd like either of the followings:
>   GTree with GBytes keys and PopplerDest value
>   GHash with GBytes keys and PopplerDest value
> 

That doesn't answer the questions, Do we need the list to be in a specific order? Because if the order doesn't matter it's probably easier to use GHashTable, otherwise I guess we need a GTree.
Comment 53 Carlos Garcia Campos 2016-09-20 16:32:29 UTC
(In reply to Masamichi Hosoda from comment #51)
> (In reply to Masamichi Hosoda from comment #50)
> > I don't matter either GHash or GTree.
> > I'd like either of the followings:
> >   GTree with GBytes keys and PopplerDest value
> >   GHash with GBytes keys and PopplerDest value
> 
> My patch uses GBytes but I also don't matter either GBytes, GString or
> GByteArray.
> I'd not like to use gchar* since it cannot handle pure-binary.

I see.

> Which would you like?
>   1. GTree or GHash
>   2. GBytes, GString or GByteArray
> 
> I'll create a patch.

I think we should use GBytes then
Comment 54 Masamichi Hosoda 2016-09-21 12:05:49 UTC
(In reply to Carlos Garcia Campos from comment #53)
> (In reply to Masamichi Hosoda from comment #51)
> > (In reply to Masamichi Hosoda from comment #50)
> > > I don't matter either GHash or GTree.
> > > I'd like either of the followings:
> > >   GTree with GBytes keys and PopplerDest value
> > >   GHash with GBytes keys and PopplerDest value
> > 
> > My patch uses GBytes but I also don't matter either GBytes, GString or
> > GByteArray.
> > I'd not like to use gchar* since it cannot handle pure-binary.
> 
> I see.
> 
> > Which would you like?
> >   1. GTree or GHash
> >   2. GBytes, GString or GByteArray
> > 
> > I'll create a patch.
> 
> I think we should use GBytes then

Thank you for your understanding.
Comment 55 Masamichi Hosoda 2016-09-21 12:57:03 UTC
(In reply to Carlos Garcia Campos from comment #52)
> (In reply to Masamichi Hosoda from comment #50)
> > (In reply to Carlos Garcia Campos from comment #48)
> > > (In reply to Adrian Johnson from comment #35)
> > > > Carlos, do you have an opinion on whether to return a GList of names or
> > > > binary tree of names/LinkDests?
> > > 
> > > I guess we could return a GTree, but I'm not sure I understand exactly what
> > > we want. Do we need the list to be in a specific order? Do they have an
> > > order in the PDF document that we should respect or alphabetical is what
> > > want?
> > 
> > I don't matter either GHash or GTree.
> > I'd like either of the followings:
> >   GTree with GBytes keys and PopplerDest value
> >   GHash with GBytes keys and PopplerDest value
> > 
> 
> That doesn't answer the questions, Do we need the list to be in a specific
> order? Because if the order doesn't matter it's probably easier to use
> GHashTable, otherwise I guess we need a GTree.

If I understand correctly, GTree provides sorted order.

If something like ambiguous search is necessary, GTree may be better.
But, I don't think of the other applications which need sorted order.

If exact match search is all that is needed, GHashTable is better.
Although even if sorted order is necessary, we can sort the keys of GHashTable later.
After receiving GHashTable from glib frontend, we can build GTree from all elements of GHashTable.
Comment 56 Masamichi Hosoda 2016-09-25 18:22:24 UTC
Created attachment 126779 [details] [review]
v3-tree-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch

I've created GTree patch.
(v3-tree-0001)
Comment 57 Masamichi Hosoda 2016-09-25 18:23:08 UTC
Created attachment 126780 [details] [review]
v3-hash-0001-Add-poppler_dest_equal-to-glib-frontend.patch

I've revised GHashTable patches.
(v3-hash-0001 and v3-hash-0002)
Comment 58 Masamichi Hosoda 2016-09-25 18:23:43 UTC
Created attachment 126781 [details] [review]
v3-hash-0002-Add-poppler_document_build_dests_hashtable-to-gli.patch

I've revised GHashTable patches.
(v3-hash-0001 and v3-hash-0002)
Comment 59 Masamichi Hosoda 2016-09-25 18:24:44 UTC
I've created GTree patch.
(v3-tree-0001)

I've revised GHashTable patches.
(v3-hash-0001 and v3-hash-0002)

I prefer GHashTable.
So I'd like that v3-hash-0001 and v3-hash-0002 are merged.

But, if you think tree is better, I'd like that v3-tree-0001 is merged.
Comment 60 Albert Astals Cid 2016-09-28 22:04:12 UTC
Before switching on C++11, please let's have the discussion again on the mailing list, because to my understanding it wasn't really agreed we'd move to C++11
Comment 61 Masamichi Hosoda 2016-09-29 12:59:32 UTC
My patches (both v3-hash-* and v3-tree-*) do not require C++11.
I hope that either v3-hash or v3-tree is merged.
Comment 62 Masamichi Hosoda 2016-10-06 14:52:01 UTC
I've noticed that the problem of the hash way.
So if you agree the following explanation, would you merge v3-tree-0001 patch?

If I understand correctly,
in different platform, hash function might be different.
The different hash function causes different order even if same PDF is used.
I believe that it is not good that the order is dependent on the platform.

On the other hand, the tree way is always sorted.
For same PDF, the order is always stable even if platform is different.
Comment 63 Masamichi Hosoda 2016-10-07 13:03:32 UTC
I've noticed that another way.

GTree provides sorted order.
GHashTable provides unstable order.
But, I think that there is a case that the serial (appeared) order in PDF is necessary.
In this case, both GTree and GHashTable cannot be used since the serial order is not preserved.

So I propose that using GPtrArray with the following structure.

struct PopplerNamedDest {
	GBytes *name;
	PopplerDest *dest;
};

If you would like sorted order, you can use g_ptr_array_sort ().
If you would like to find name, you can build GHashTree from GPtrArray.

How about this?
Comment 64 Adrian Johnson 2016-10-07 13:33:33 UTC
(In reply to Masamichi Hosoda from comment #63)
> But, I think that there is a case that the serial (appeared) order in PDF is
> necessary.
> In this case, both GTree and GHashTable cannot be used since the serial
> order is not preserved.

Poppler Dict objects are sorted and PDF Name trees are required to be sorted. The code in Catalog.cc also sorts the name tree as I think there was a buggy PDF that did not sort the name tree. As a result, the names will always be sorted.

I think GTree is best as it has the advantage of both sorted order and fast lookup.
Comment 65 Masamichi Hosoda 2016-10-07 13:52:59 UTC
(In reply to Adrian Johnson from comment #64)
> (In reply to Masamichi Hosoda from comment #63)
> > But, I think that there is a case that the serial (appeared) order in PDF is
> > necessary.
> > In this case, both GTree and GHashTable cannot be used since the serial
> > order is not preserved.
> 
> Poppler Dict objects are sorted and PDF Name trees are required to be
> sorted. The code in Catalog.cc also sorts the name tree as I think there was
> a buggy PDF that did not sort the name tree. As a result, the names will
> always be sorted.
> 
> I think GTree is best as it has the advantage of both sorted order and fast
> lookup.

Thank you for your detailed explanation.

I understand that there is not the serial order.
I withdraw the proposal of GPtrArray.

Therefore, I also think that GTree is best.

Would you merge v3-tree-0001 patch?
Comment 66 Masamichi Hosoda 2016-12-07 11:49:16 UTC
Created attachment 128367 [details] [review]
v4-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch

v3-tree-0001 patch conflicts with the current master.
So, I've created v4 patch that resolved the conflict.

Would you review and merge it?
Comment 67 Adrian Johnson 2017-08-15 12:09:55 UTC
Created attachment 133527 [details] [review]
pdfinfo: print dests v3

v3: Updated to remove the "enable C++11 build" code.

Now the poppler is using c++11 this patch no longer needs be rewritten to c++03.
Comment 68 Albert Astals Cid 2017-08-15 18:12:51 UTC
Adrian: Do you need me to review anything or you'll manage these patches?
Comment 69 Adrian Johnson 2017-08-15 22:11:55 UTC
Carlos has previously reviewed my pdfinfo patch so if you don't mind I'll just push it.

Carlos should review the glib patch.
Comment 70 Albert Astals Cid 2017-08-15 22:34:27 UTC
(In reply to Adrian Johnson from comment #69)
> Carlos has previously reviewed my pdfinfo patch so if you don't mind I'll
> just push it.
> 

Sure
Comment 71 GitLab Migration User 2018-08-21 11:02:09 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/472.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.