Bug 97262 - Enumerate PDF named destinations
Enumerate PDF named destinations
 Status: NEW None poppler Unclassified glib frontend (show other bugs) unspecified All All medium enhancement poppler-bugs

 Reported: 2016-08-09 16:13 UTC by Masamichi Hosoda 2017-08-15 22:34 UTC (History) 0 users

Attachments
foo.pdf (12.02 KB, application/pdf)
2016-08-09 17:37 UTC, Masamichi Hosoda
Details
2016-08-20 14:28 UTC, Masamichi Hosoda
Details
0001-Divide-Catalog-findDest.patch (2.91 KB, patch)
2016-08-21 22:42 UTC, Masamichi Hosoda
Details | Splinter Review
2016-08-21 22:42 UTC, Masamichi Hosoda
Details | Splinter Review
2016-08-21 22:43 UTC, Masamichi Hosoda
Details | Splinter Review
2016-08-28 08:07 UTC, Masamichi Hosoda
Details | Splinter Review
2016-08-28 08:08 UTC, Masamichi Hosoda
Details | Splinter Review
pdfinfo: print dests (4.29 KB, patch)
Details | Splinter Review
utf16.pdf (12.64 KB, application/pdf)
2016-08-31 14:49 UTC, Masamichi Hosoda
Details
pdfinfo: print dests v2 (6.52 KB, patch)
Details | Splinter Review
v2-0001-Divide-Catalog-findDest.patch (2.91 KB, patch)
2016-09-01 14:30 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-01 14:31 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-01 14:31 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-02 17:40 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-02 17:41 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-25 18:22 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-25 18:23 UTC, Masamichi Hosoda
Details | Splinter Review
2016-09-25 18:23 UTC, Masamichi Hosoda
Details | Splinter Review
 Masamichi Hosoda 2016-08-09 16:13:48 UTC I'd like to enumerate PDF named destinations in a PDF. I've tried poppler_index_iter_get_action ()'. It can obtain named destinations which are used in the PDF actions. But, it cannot obtain named destinations which are not used. I'd like all named destinations. So I cannot use poppler_index_iter_get_action ()'. Jason Crain 2016-08-09 17:12:42 UTC Do you have an example of such a PDF? Masamichi Hosoda 2016-08-09 17:37:17 UTC Created attachment 125645 [details] foo.pdf Masamichi Hosoda 2016-08-09 17:38:05 UTC I've attached the sample PDF. Here's its source file. %%% foo.tex %%% % The following command generates foo.pdf'. % $pdftex foo.tex \pdfobjcompresslevel=0 \pdfdest name{one} xyz page 1 \vfill \eject \pdfdest name{two} xyz page 2 \vfill \end %%% foo.tex %%% Jason Crain 2016-08-16 03:16:59 UTC Correct, the poppler_index_iter functions are only for the document outline. If you wish to iterate all named dests, that would require a new API. The usual way of accessing named dests is indirectly, through the outline via poppler_index_iter functions, through links via poppler_page_get_link_mapping, or by looking up dests by name via poppler_document_find_dest. But the glib frontend currently does not have a way to iterate named dests. Masamichi Hosoda 2016-08-19 15:03:33 UTC Thank you for your answer. I'd like to iterate all named dists. I understand that current glib frontend could not iterate them. Could other frontend iterate them? Jose Aliste 2016-08-19 15:28:26 UTC You can iterate them in the core "frontend". If you look into poppler_document_find_dest you would get an idea of what to do. If you can provide a patch for the glib frontend that would be great. Since the Dests can be either stored in a Dict or a TreeName, maybe the best way to implement this in glib would be with a a GHashTable. Then you can use g_hash_table_foreach to iterate over all of them. Masamichi Hosoda 2016-08-19 16:23:54 UTC Does the core "frontend" contain poppler/Catalog.h? class Catalog has destNameTree and getDestNameTree () but they are "private" member. So they can not be accessed from outside the class Catalog. I've tried to access the private members by ugly tricky way. https://github.com/trueroad/extractpdfmark It could iterate all named dest. But it uses ugly tricky way. Is there another way? Or, would you accept the patch which changes the private members to public? Jose Aliste 2016-08-19 19:52:37 UTC If you provide a sensible patch for the glib frontend, then I don't see a reason to oppose the making the getDestNameTree() public. Actually the getDests API is already public. Please bear in mind that you need to use both, since different versions of PDF spec allow either for a Dests Dictionary or a Dests Tree Name inside the Names Dictionary. Also bear in mind that the API inside poppler/ dir is supposed to be used only by the frontends (cpp, glib, qt4, qt5). You can check if other frontends already do what you want, but since the DestNameTree in Catalog is private, I don't think that's the case. Masamichi Hosoda 2016-08-20 14:28:18 UTC Created attachment 125919 [details] add-catalog-functions-for-named-dest.tar.gz Masamichi Hosoda 2016-08-20 14:28:41 UTC I've made patches which adds some Catalog functions for named destinations. How about them? If you accept them, I'll make the patch for glib frontend. Albert Astals Cid 2016-08-21 21:12:55 UTC Please attach the .patch files directly instead a tar.gz, makes things much easier. Masamichi Hosoda 2016-08-21 22:42:16 UTC Created attachment 125938 [details] [review] 0001-Divide-Catalog-findDest.patch Masamichi Hosoda 2016-08-21 22:42:51 UTC Created attachment 125939 [details] [review] 0002-Add-functions-for-named-destination-name-in-name-tre.patch Masamichi Hosoda 2016-08-21 22:43:19 UTC Created attachment 125940 [details] [review] 0003-Add-functions-for-named-destination-name-in-name-dic.patch Masamichi Hosoda 2016-08-21 22:46:17 UTC I've attached patch files. Albert Astals Cid 2016-08-21 22:57:20 UTC I don't think this is what Jose was asking, i think he was asking for patches to the glib frontend, but i'll let him confirm. Masamichi Hosoda 2016-08-21 23:06:16 UTC Yes, this is not glib frontend. But, current Catalog class does not have destination name related interfaces. So this patch adds these interfaces. If you accept this patch, I'll create glib frontend patch which uses these interfaces. Jose Aliste 2016-08-22 02:03:04 UTC Yeah, I was thinking more of making the gestDests and GetDestNameTree public and then doing all the processing inside the glib frontend. But these patches however could be good to implement this in qt4 or cpp if ever needed. So, I'd say it's up to you Albert :). Albert Astals Cid 2016-08-22 21:42:25 UTC Let me ask again, what's the use case for this? Masamichi Hosoda 2016-08-22 21:56:31 UTC I'd like to iterate all named destinations in a PDF. If you use Ghostscript for the PDF file size reduction, Ghostscript does not preserve named destinations. I'm making a tool that can extract named destinations as PDFmark from PDF. You can get the small PDF that has preserved them by using the tool. Current poppler does not have public interface for iterating named destinations. I've tried to access the private members by ugly tricky way. https://github.com/trueroad/extractpdfmark It could iterate all named dest. But it uses ugly tricky way. So this patch adds these interfaces to Catalog class. If you accept this patch, I'll create glib frontend API patch which uses these interfaces. Then, I'll create the tool which uses the glib frontend API. Albert Astals Cid 2016-08-22 22:30:54 UTC Isn't it better that you actually fix Ghostscript than creating a tool to workaround it? Masamichi Hosoda 2016-08-23 01:37:07 UTC I've reported this problem to Ghostscript bugzilla. http://bugs.ghostscript.com/show_bug.cgi?id=696943 Ghostscript developers knew this problem about two years ago. http://bugs.ghostscript.com/show_bug.cgi?id=695760 But, Ghostscript has not been enhanced yet. So I need to create a tool to workaround it. Masamichi Hosoda 2016-08-28 08:07:45 UTC Created attachment 126078 [details] [review] 0004-Add-poppler_dest_equal-to-glib-frontend.patch Masamichi Hosoda 2016-08-28 08:08:21 UTC Created attachment 126079 [details] [review] 0005-Add-poppler_document_get_dests-to-glib-frontend.patch Masamichi Hosoda 2016-08-28 08:08:39 UTC I've created a patch for glib frontend (0004, 0005). It requires class Catalog patches (0001, 0002, 0003). If you want, I'll create a patch for cpp frontend. Adrian Johnson 2016-08-30 13:23:39 UTC I've been working on a patch for CairoOutputDev that requires access to a list of named destinations. The core patches would be helpful. Looking at the patches: 0001 - Looks good. Although I suggest renaming "constructLinkDest" to "createLinkDest". 0002, 0003 - I don't think we need separate functions for Name Trees and Dests Dict access. I suggest have one set of functions for accessing names eg: int numDests(); GooString *getDestsName(int i); LinkDest *getDestsLink(int i); Internally, the functions should combine the Name Tree dests and dict dests into one list. 0005 - I know that Jose suggested returning a GHashTable. Since the main purpose of a hash table is mapping keys to values, and the poppler_document_find_dest() function already does this, maybe the names could be returned by either: 1) A list of just the names. The user can use poppler_document_find_dest() to get the dests. or 2) A binary tree of names and dests. This has the advantage that iterating the names is in alphabetical order instead of the the random order of the hash table. Also, poppler_document_get_dests() creates the hash table each time it is called. It maybe better to cache the hash table or use a function name like poppler_document_build_dests_hashtable() since "get" implies fast access. + key = g_bytes_new (catalog->getDestsName (i), + strlen (catalog->getDestsName (i))); You are using g_bytes because the name can contain \0 but you are using strlen here. Also the getDestName() in Catalog will need to return a GooString, not char *. Although it is not clear to me whether destination names can actually contain \0. PDF32000 section 12.3.2.3 states "The keys in the name tree may be treated as text strings for display purposes". Adrian Johnson 2016-08-30 13:28:16 UTC Created attachment 126119 [details] [review] pdfinfo: print dests It would also be useful to be able print a list of named destinations in the PDF. Note this patch requires C++11, the use of which was discussed in https://lists.freedesktop.org/archives/poppler/2016-May/011844.html The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in the cmake build. Jose Aliste 2016-08-30 22:02:52 UTC Adrian, Does find_dest take O(1) time then to find? (I don't know the details of Dict.cc) The only reason why I suggested a HashTable is because we have some bug in evince where the outline of a very big file is constructed only by using named destinations... If find_dest is already taking constant time to do the lookup, Of course, there is no reason to construct a HashTable (and the bug in evince is elsewhere... ) Adrian Johnson 2016-08-30 22:16:35 UTC find_dest is O(log(n)) I think just returning a list of names would be a cleaner API. It avoids the overhead of parsing and creating the link dest for each name if it is not required. And even if it is required it reduces peak memory usage if the application can create and destroy each link dest sequentially instead of filling memory with every link. We could always add the hash table later if someone finds a use case for it. Jose Aliste 2016-08-30 22:19:09 UTC (In reply to Adrian Johnson from comment #29) > find_dest is O(log(n)) > > I think just returning a list of names would be a cleaner API. It avoids the > overhead of parsing and creating the link dest for each name if it is not > required. And even if it is required it reduces peak memory usage if the > application can create and destroy each link dest sequentially instead of > filling memory with every link. We could always add the hash table later if > someone finds a use case for it. I agree. I will try to find and fix that evince bug to see if the hashtable actually helps :) Masamichi Hosoda 2016-08-31 13:41:00 UTC (In reply to Adrian Johnson from comment #26) > I've been working on a patch for CairoOutputDev that requires access to a > list of named destinations. The core patches would be helpful. Looking at > the patches: Thank you for your reviewing. > 0001 - Looks good. Although I suggest renaming "constructLinkDest" to > "createLinkDest". I'll change the patch. > 0002, 0003 - I don't think we need separate functions for Name Trees and > Dests Dict access. I suggest have one set of functions for accessing names > eg: > > int numDests(); > GooString *getDestsName(int i); > LinkDest *getDestsLink(int i); > > Internally, the functions should combine the Name Tree dests and dict dests > into one list. Name Tree is recorded in Catalog::destNameTree. It has GooString for its name. So getDestName() can return the pointer of GooString. The return value should not be deleted. Dests Dict is recoded in Catalog::dests. However, it does not have GooString for its name. It has only char* for its name. So getDestName() have to construct GooString from char*. The return value should be deleted. Or, would you like to add a new member variable GooString *destnames to class Catalog? In this case, Catalog::getDestName() creates destnames array and Catalog::~Catalog() deletes it. The return value should not be deleted. > 0005 - I know that Jose suggested returning a GHashTable. Since the main > purpose of a hash table is mapping keys to values, and the > poppler_document_find_dest() function already does this, maybe the names > could be returned by either: > > 1) A list of just the names. The user can use poppler_document_find_dest() > to get the dests. Is is something like GList *poppler_document_build_dests_namelist() ? It returns the pointer of GList which contains GBytes for destination name. If you would like it, I'll create a patch. However, unfortunately, if the name contains \0, poppler_document_find_dest() cannot find it. > or > 2) A binary tree of names and dests. This has the advantage that iterating > the names is in alphabetical order instead of the the random order of the > hash table. I'd like this. > Also, poppler_document_get_dests() creates the hash table each time it is > called. It maybe better to cache the hash table or use a function name like > poppler_document_build_dests_hashtable() since "get" implies fast access. I'll change the patch to poppler_document_build_dests(). > + key = g_bytes_new (catalog->getDestsName (i), > + strlen (catalog->getDestsName (i))); > > You are using g_bytes because the name can contain \0 but you are using > strlen here. Also the getDestName() in Catalog will need to return a > GooString, not char *. It handles Dests Dict. The name of Dests Dict cannot contain \0. Only the name of Name Tree can contain \0. > Although it is not clear to me whether destination names can actually > contain \0. PDF32000 section 12.3.2.3 states "The keys in the name tree may > be > treated as text strings for display purposes". Almost UTF-16 strings contains \0. Masamichi Hosoda 2016-08-31 14:49:09 UTC Created attachment 126133 [details] utf16.pdf I've attached PDF which contains UTF-16 destination names. Here's its source file. %%% utf16.tex %%% % The following command generates utf16.pdf'. %$ pdftex utf16.tex \pdfobjcompresslevel=0 \pdfcatalog{/PageMode /UseOutlines} \catcode0=12 \def\furone{^^ff^^fe^^00f^^00^^fc^^00r^^00 ^^00o^^00n^^00e} \def\furtwo{^^ff^^fe^^00f^^00^^fc^^00r^^00 ^^00t^^00w^^00o} \pdfoutline goto name{\furone} {\furone} \pdfoutline goto name{\furtwo} {\furtwo} \pdfdest name{\furone} xyz page 1 \vfill \eject \pdfdest name{\furtwo} xyz page 2 \vfill \end %%% utf16.tex %%% Jose Aliste 2016-08-31 17:51:17 UTC Please note that there is a patch for working around the '\0' issue in named dests without disrupting the API. See bug #60377 Adrian Johnson 2016-09-01 12:47:46 UTC Created attachment 126158 [details] [review] pdfinfo: print dests v2 Updated to read dests and name dests and use mapUnicode to print names. Adrian Johnson 2016-09-01 12:56:45 UTC (In reply to Masamichi Hosoda from comment #31) > > Internally, the functions should combine the Name Tree dests and dict dests > > into one list. > > Name Tree is recorded in Catalog::destNameTree. > It has GooString for its name. > So getDestName() can return the pointer of GooString. > The return value should not be deleted. > > Dests Dict is recoded in Catalog::dests. > However, it does not have GooString for its name. > It has only char* for its name. > So getDestName() have to construct GooString from char*. > The return value should be deleted. Ok leave it as two separate sets of functions. In the longer term I would like to read all the names into a map or hash. The current implementation that reads the nameTree (which should already be sorted except for when it isn't) then runs qsort on it is not ideal. But that can be the topic of another bug. > > 1) A list of just the names. The user can use poppler_document_find_dest() > > to get the dests. > > Is is something like > GList *poppler_document_build_dests_namelist() > ? > It returns the pointer of GList which contains GBytes for destination name. > If you would like it, I'll create a patch. > > However, unfortunately, if the name contains \0, > poppler_document_find_dest() cannot find it. If you escape the \0 as per comment 33 it should work. > > > or > > 2) A binary tree of names and dests. This has the advantage that iterating > > the names is in alphabetical order instead of the the random order of the > > hash table. > > I'd like this. Carlos, do you have an opinion on whether to return a GList of names or binary tree of names/LinkDests? > Almost UTF-16 strings contains \0. Isn't the glib API UTF-8 based? Masamichi Hosoda 2016-09-01 14:30:33 UTC Created attachment 126159 [details] [review] v2-0001-Divide-Catalog-findDest.patch Masamichi Hosoda 2016-09-01 14:31:05 UTC Created attachment 126160 [details] [review] v2-0002-Add-functions-for-named-destination-name-in-name-.patch Masamichi Hosoda 2016-09-01 14:31:35 UTC Created attachment 126161 [details] [review] v2-0003-Add-functions-for-named-destination-name-in-name-.patch Masamichi Hosoda 2016-09-01 14:45:48 UTC (In reply to Adrian Johnson from comment #35) > Ok leave it as two separate sets of functions. > > In the longer term I would like to read all the names into a map or hash. > The current implementation that reads the nameTree (which should already be > sorted except for when it isn't) then runs qsort on it is not ideal. But > that can be the topic of another bug. Thank you. I've updated core patches (v2-0001, v2-0002, v2-0003). > > However, unfortunately, if the name contains \0, > > poppler_document_find_dest() cannot find it. > > If you escape the \0 as per comment 33 it should work. I'd like GBytes instead of escape. I'd like to add something like the following function. PopplerDest *poppler_document_find_dest_bytes (PopplerDocument*, const GBytes); > > Almost UTF-16 strings contains \0. > > Isn't the glib API UTF-8 based? Even if the glib API is UTF-8 based, PDFs can contain UTF-16. Moreover, if I understand correctly, encoding of the PDF destination names has not been defined. It can contain UTF-16 and PDFDocEncoding. Also it can even contain pure binaries. Masamichi Hosoda 2016-09-02 17:40:01 UTC Created attachment 126179 [details] [review] v2-0004-Add-poppler_document_find_dest_bytes-to-glib-fron.patch Masamichi Hosoda 2016-09-02 17:40:23 UTC Created attachment 126180 [details] [review] v2-0005-Add-poppler_document_build_dests_namelist-to-glib.patch Masamichi Hosoda 2016-09-02 17:40:47 UTC Created attachment 126181 [details] [review] v2-0006-Add-poppler_dest_equal-to-glib-frontend.patch Masamichi Hosoda 2016-09-02 17:41:12 UTC Created attachment 126182 [details] [review] v2-0007-Add-poppler_document_build_dests_hashtable-to-gli.patch Masamichi Hosoda 2016-09-02 17:47:47 UTC I've added and updated glib frontend patches (v2-0004, v2-0005, v2-0006, v2-0007). They requires the core patches (v2-0001, v2-0002, v2-0003). They add the following functions. PopplerDest * poppler_document_find_dest_bytes (PopplerDocument *document, GBytes *link_name); GList * poppler_document_build_dests_namelist (PopplerDocument *document); GHashTable * poppler_document_build_dests_hashtable (PopplerDocument *document); Masamichi Hosoda 2016-09-13 13:50:27 UTC I think that we don't need further discussion on the core patches. (v2-0001, v2-0002, v3-0003) So would you commit them? But, we seem to need more discussion on the glib patches. (v2-0004, v2-0005, v2-0006, v2-0007) Again, I'd like to iterate all named destinations in a PDF. There are two ways, hash and list. If you just would like all of the names, list would be good. If you would like all of the names and the destinations, hash would be good. If only the list interface has been provided, you need to find a destination from the name. But, current poppler_document_find_dest () cannot handle the name which contained \0. I know that an escape is one of the solutions. I worry about the behavior of many applications that do not consider the escape. In addition, I'd like to use raw binary since the escape is troublesome. So I proposed raw binary finding function poppler_document_find_dest_bytes (). If the hash interface (my imprementation) has been provided, you can find any named destination, including \0. In this case, both poppler_document_find_dest () and poppler_document_find_dest_bytes () are unnecessary. The list interface is also unnecessary. Only the hash interface, poppler_document_build_dests_hashtable () is necessary. I think that it is simple. So I'd like this. Carlos Garcia Campos 2016-09-17 09:40:11 UTC Ok, I've just pushed the core patches. Modified the second one to take the catalog lock. Carlos Garcia Campos 2016-09-17 09:42:07 UTC (In reply to Adrian Johnson from comment #27) > Created attachment 126119 [details] [review] [review] > pdfinfo: print dests > > It would also be useful to be able print a list of named destinations in the > PDF. > > Note this patch requires C++11, the use of which was discussed in > https://lists.freedesktop.org/archives/poppler/2016-May/011844.html > > The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in > the cmake build. I think you just need to add the flag to CMAKE_CXX_FLAGS somewhere, in CMakeList.txt or maybe in PopplerMacros.cmake, but there flags are only added when building with GCC, I think if it's because those flags are specific to GCC ro because we don't really support clang Carlos Garcia Campos 2016-09-17 09:48:35 UTC (In reply to Adrian Johnson from comment #35) > (In reply to Masamichi Hosoda from comment #31) > > > Internally, the functions should combine the Name Tree dests and dict dests > > > into one list. > > > > Name Tree is recorded in Catalog::destNameTree. > > It has GooString for its name. > > So getDestName() can return the pointer of GooString. > > The return value should not be deleted. > > > > Dests Dict is recoded in Catalog::dests. > > However, it does not have GooString for its name. > > It has only char* for its name. > > So getDestName() have to construct GooString from char*. > > The return value should be deleted. > > Ok leave it as two separate sets of functions. > > In the longer term I would like to read all the names into a map or hash. > The current implementation that reads the nameTree (which should already be > sorted except for when it isn't) then runs qsort on it is not ideal. But > that can be the topic of another bug. > > > > 1) A list of just the names. The user can use poppler_document_find_dest() > > > to get the dests. > > > > Is is something like > > GList *poppler_document_build_dests_namelist() > > ? > > It returns the pointer of GList which contains GBytes for destination name. > > If you would like it, I'll create a patch. > > > > However, unfortunately, if the name contains \0, > > poppler_document_find_dest() cannot find it. > > If you escape the \0 as per comment 33 it should work. > > > > > > or > > > 2) A binary tree of names and dests. This has the advantage that iterating > > > the names is in alphabetical order instead of the the random order of the > > > hash table. > > > > I'd like this. > > Carlos, do you have an opinion on whether to return a GList of names or > binary tree of names/LinkDests? I guess we could return a GTree, but I'm not sure I understand exactly what we want. Do we need the list to be in a specific order? Do they have an order in the PDF document that we should respect or alphabetical is what want? > > Almost UTF-16 strings contains \0. > > Isn't the glib API UTF-8 based? Yes, which means that strings passed to public api methods are expected to be UTF-8 and string returned by public api methods should also be UTF-8. Carlos Garcia Campos 2016-09-17 09:49:44 UTC (In reply to Carlos Garcia Campos from comment #47) > (In reply to Adrian Johnson from comment #27) > > Created attachment 126119 [details] [review] [review] [review] > > pdfinfo: print dests > > > > It would also be useful to be able print a list of named destinations in the > > PDF. > > > > Note this patch requires C++11, the use of which was discussed in > > https://lists.freedesktop.org/archives/poppler/2016-May/011844.html > > > > The patch enable c++11 in configure.ac. I'm not sure how to set gcc flags in > > the cmake build. > > I think you just need to add the flag to CMAKE_CXX_FLAGS somewhere, in > CMakeList.txt or maybe in PopplerMacros.cmake, but there flags are only > added when building with GCC, I think if it's because those flags are > specific to GCC ro because we don't really support clang I'm fine with this patch, btw, feel free to push it once you figure out how to fix the CMake build. I'm sure Albert or Pino can help us with that. Masamichi Hosoda 2016-09-17 12:50:54 UTC (In reply to Carlos Garcia Campos from comment #48) > (In reply to Adrian Johnson from comment #35) > > Carlos, do you have an opinion on whether to return a GList of names or > > binary tree of names/LinkDests? > > I guess we could return a GTree, but I'm not sure I understand exactly what > we want. Do we need the list to be in a specific order? Do they have an > order in the PDF document that we should respect or alphabetical is what > want? I don't matter either GHash or GTree. I'd like either of the followings: GTree with GBytes keys and PopplerDest value GHash with GBytes keys and PopplerDest value > > > Almost UTF-16 strings contains \0. > > > > Isn't the glib API UTF-8 based? > > Yes, which means that strings passed to public api methods are expected to > be UTF-8 and string returned by public api methods should also be UTF-8. If I understand correctly, the encoding of named destination names is not defined. In other words, we cannot get the encoding of a name. It might be UTF-16, UTF-8, PDFDocEncoding or other encoding. It might be pure binary instead of text. Document Management – Portable Document Format – Part 1: PDF 1.7, First Edition Adobe (Jul 2008) http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf 12.3.2.3 Named Destinations' says ..., a destination may be referred to indirectly by means of a name object (PDF 1.1) or a byte string (PDF 1.2).' So the named destination names are byte strings' in PDF 1.2+. 7.9.2.4 Byte String Type' says The byte string type shall be used for binary data that shall be represented as a series of bytes, where each byte may be any value representable in 8 bits.' The point is that the named destination names are not text strings'. If text strings', we can convert to UTF-8 since the encoding is clearly defind. However, the named destination names are binary strings'. So we cannot convert to UTF-8 since the encoding is not known. Masamichi Hosoda 2016-09-17 17:46:43 UTC (In reply to Masamichi Hosoda from comment #50) > I don't matter either GHash or GTree. > I'd like either of the followings: > GTree with GBytes keys and PopplerDest value > GHash with GBytes keys and PopplerDest value My patch uses GBytes but I also don't matter either GBytes, GString or GByteArray. I'd not like to use gchar* since it cannot handle pure-binary. Which would you like? 1. GTree or GHash 2. GBytes, GString or GByteArray I'll create a patch. Carlos Garcia Campos 2016-09-20 16:31:50 UTC (In reply to Masamichi Hosoda from comment #50) > (In reply to Carlos Garcia Campos from comment #48) > > (In reply to Adrian Johnson from comment #35) > > > Carlos, do you have an opinion on whether to return a GList of names or > > > binary tree of names/LinkDests? > > > > I guess we could return a GTree, but I'm not sure I understand exactly what > > we want. Do we need the list to be in a specific order? Do they have an > > order in the PDF document that we should respect or alphabetical is what > > want? > > I don't matter either GHash or GTree. > I'd like either of the followings: > GTree with GBytes keys and PopplerDest value > GHash with GBytes keys and PopplerDest value > That doesn't answer the questions, Do we need the list to be in a specific order? Because if the order doesn't matter it's probably easier to use GHashTable, otherwise I guess we need a GTree. Carlos Garcia Campos 2016-09-20 16:32:29 UTC (In reply to Masamichi Hosoda from comment #51) > (In reply to Masamichi Hosoda from comment #50) > > I don't matter either GHash or GTree. > > I'd like either of the followings: > > GTree with GBytes keys and PopplerDest value > > GHash with GBytes keys and PopplerDest value > > My patch uses GBytes but I also don't matter either GBytes, GString or > GByteArray. > I'd not like to use gchar* since it cannot handle pure-binary. I see. > Which would you like? > 1. GTree or GHash > 2. GBytes, GString or GByteArray > > I'll create a patch. I think we should use GBytes then Masamichi Hosoda 2016-09-21 12:05:49 UTC (In reply to Carlos Garcia Campos from comment #53) > (In reply to Masamichi Hosoda from comment #51) > > (In reply to Masamichi Hosoda from comment #50) > > > I don't matter either GHash or GTree. > > > I'd like either of the followings: > > > GTree with GBytes keys and PopplerDest value > > > GHash with GBytes keys and PopplerDest value > > > > My patch uses GBytes but I also don't matter either GBytes, GString or > > GByteArray. > > I'd not like to use gchar* since it cannot handle pure-binary. > > I see. > > > Which would you like? > > 1. GTree or GHash > > 2. GBytes, GString or GByteArray > > > > I'll create a patch. > > I think we should use GBytes then Thank you for your understanding. Masamichi Hosoda 2016-09-21 12:57:03 UTC (In reply to Carlos Garcia Campos from comment #52) > (In reply to Masamichi Hosoda from comment #50) > > (In reply to Carlos Garcia Campos from comment #48) > > > (In reply to Adrian Johnson from comment #35) > > > > Carlos, do you have an opinion on whether to return a GList of names or > > > > binary tree of names/LinkDests? > > > > > > I guess we could return a GTree, but I'm not sure I understand exactly what > > > we want. Do we need the list to be in a specific order? Do they have an > > > order in the PDF document that we should respect or alphabetical is what > > > want? > > > > I don't matter either GHash or GTree. > > I'd like either of the followings: > > GTree with GBytes keys and PopplerDest value > > GHash with GBytes keys and PopplerDest value > > > > That doesn't answer the questions, Do we need the list to be in a specific > order? Because if the order doesn't matter it's probably easier to use > GHashTable, otherwise I guess we need a GTree. If I understand correctly, GTree provides sorted order. If something like ambiguous search is necessary, GTree may be better. But, I don't think of the other applications which need sorted order. If exact match search is all that is needed, GHashTable is better. Although even if sorted order is necessary, we can sort the keys of GHashTable later. After receiving GHashTable from glib frontend, we can build GTree from all elements of GHashTable. Masamichi Hosoda 2016-09-25 18:22:24 UTC Created attachment 126779 [details] [review] v3-tree-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch I've created GTree patch. (v3-tree-0001) Masamichi Hosoda 2016-09-25 18:23:08 UTC Created attachment 126780 [details] [review] v3-hash-0001-Add-poppler_dest_equal-to-glib-frontend.patch I've revised GHashTable patches. (v3-hash-0001 and v3-hash-0002) Masamichi Hosoda 2016-09-25 18:23:43 UTC Created attachment 126781 [details] [review] v3-hash-0002-Add-poppler_document_build_dests_hashtable-to-gli.patch I've revised GHashTable patches. (v3-hash-0001 and v3-hash-0002) Masamichi Hosoda 2016-09-25 18:24:44 UTC I've created GTree patch. (v3-tree-0001) I've revised GHashTable patches. (v3-hash-0001 and v3-hash-0002) I prefer GHashTable. So I'd like that v3-hash-0001 and v3-hash-0002 are merged. But, if you think tree is better, I'd like that v3-tree-0001 is merged. Albert Astals Cid 2016-09-28 22:04:12 UTC Before switching on C++11, please let's have the discussion again on the mailing list, because to my understanding it wasn't really agreed we'd move to C++11 Masamichi Hosoda 2016-09-29 12:59:32 UTC My patches (both v3-hash-* and v3-tree-*) do not require C++11. I hope that either v3-hash or v3-tree is merged. Masamichi Hosoda 2016-10-06 14:52:01 UTC I've noticed that the problem of the hash way. So if you agree the following explanation, would you merge v3-tree-0001 patch? If I understand correctly, in different platform, hash function might be different. The different hash function causes different order even if same PDF is used. I believe that it is not good that the order is dependent on the platform. On the other hand, the tree way is always sorted. For same PDF, the order is always stable even if platform is different. Masamichi Hosoda 2016-10-07 13:03:32 UTC I've noticed that another way. GTree provides sorted order. GHashTable provides unstable order. But, I think that there is a case that the serial (appeared) order in PDF is necessary. In this case, both GTree and GHashTable cannot be used since the serial order is not preserved. So I propose that using GPtrArray with the following structure. struct PopplerNamedDest { GBytes *name; PopplerDest *dest; }; If you would like sorted order, you can use g_ptr_array_sort (). If you would like to find name, you can build GHashTree from GPtrArray. How about this? Adrian Johnson 2016-10-07 13:33:33 UTC (In reply to Masamichi Hosoda from comment #63) > But, I think that there is a case that the serial (appeared) order in PDF is > necessary. > In this case, both GTree and GHashTable cannot be used since the serial > order is not preserved. Poppler Dict objects are sorted and PDF Name trees are required to be sorted. The code in Catalog.cc also sorts the name tree as I think there was a buggy PDF that did not sort the name tree. As a result, the names will always be sorted. I think GTree is best as it has the advantage of both sorted order and fast lookup. Masamichi Hosoda 2016-10-07 13:52:59 UTC (In reply to Adrian Johnson from comment #64) > (In reply to Masamichi Hosoda from comment #63) > > But, I think that there is a case that the serial (appeared) order in PDF is > > necessary. > > In this case, both GTree and GHashTable cannot be used since the serial > > order is not preserved. > > Poppler Dict objects are sorted and PDF Name trees are required to be > sorted. The code in Catalog.cc also sorts the name tree as I think there was > a buggy PDF that did not sort the name tree. As a result, the names will > always be sorted. > > I think GTree is best as it has the advantage of both sorted order and fast > lookup. Thank you for your detailed explanation. I understand that there is not the serial order. I withdraw the proposal of GPtrArray. Therefore, I also think that GTree is best. Would you merge v3-tree-0001 patch? Masamichi Hosoda 2016-12-07 11:49:16 UTC Created attachment 128367 [details] [review] v4-0001-Add-poppler_document_build_dests_tree-to-glib-fro.patch v3-tree-0001 patch conflicts with the current master. So, I've created v4 patch that resolved the conflict. Would you review and merge it? Adrian Johnson 2017-08-15 12:09:55 UTC Created attachment 133527 [details] [review] pdfinfo: print dests v3 v3: Updated to remove the "enable C++11 build" code. Now the poppler is using c++11 this patch no longer needs be rewritten to c++03. Albert Astals Cid 2017-08-15 18:12:51 UTC Adrian: Do you need me to review anything or you'll manage these patches? Adrian Johnson 2017-08-15 22:11:55 UTC Carlos has previously reviewed my pdfinfo patch so if you don't mind I'll just push it. Carlos should review the glib patch. Albert Astals Cid 2017-08-15 22:34:27 UTC (In reply to Adrian Johnson from comment #69) > Carlos has previously reviewed my pdfinfo patch so if you don't mind I'll > just push it. > Sure