Created attachment 140547 Patch for PDF subtype information (PDF/A) Hi, I have added a poppler document property for the subtype of the PDF format, i.e. PDF/A or PDF/X. This information is read from the PDF Info dictionary using two keys, GTS_PDFA1Version and GTS_PDFXVersion, which return values such as PDF/A-2u:2010 or PDF/X-3:2003. The PDF format subtype attribute indicates whether the PDF claims to be compatible with the PDF/A or PDF/X ISO standards. This patch is mentioned in bug 103530. Kind Regards, Evangelos Rigas
Where is GTS_PDFA1Version mentioned? I can't seem to find it in the PDF spec.
(In reply to Albert Astals Cid from comment #1)
> Where is GTS_PDFA1Version mentioned? I can't seem to find it in the PDF spec.

If you looked in the PDF spec from the other bug, you won't find it, as it is not there. These keys are mentioned in the ISO standards; see https://www.loc.gov/preservation/digital/formats/fdd/fdd000125.shtml for example. That page links to the ISO standard, https://www.iso.org/standard/38920.html, but you have to buy it. However, you can find GTS_PDFA1Version and GTS_PDFXVersion in the code of pdfx (a pdflatex package that adds support for writing PDF/A- and PDF/X-compliant documents using LaTeX). See pages 68 and 70 of http://ctan.math.illinois.edu/macros/latex/contrib/pdfx/pdfx.pdf, where it exports GTS_PDFA1Version or GTS_PDFXVersion to the information dictionary. Additionally, you can find the possible values of GTS_PDFX here: http://www.npes.org/Portals/0/standards/pdf/GTS Registry-March09.pdf Hope it helps!
> Additionaly, you can find the possible values of GTS_PDFX here http://www.npes.org/Portals/0/standards/pdf/GTS Registry-March09.pdf Here is the right link http://www.npes.org/Portals/0/standards/pdf/GTS%20Registry-March09.pdf
Created attachment 140562 Display PDF subtype in the output of pdfinfo if subtype key is present

Hi, I added the PDF subtype to the pdfinfo utility, so if the GTS_* keys exist, pdfinfo will print "PDF subtype: PDF/?-conformance:date" below the PDF version. Example output:

user@pc ~$ pdfinfo Document-1.pdf
Title: Document-1
Subject:
Keywords:
Author:
Creator: Scribus 1.5.0.svn
Producer: Scribus PDF Library 1.5.0.svn
CreationDate: Fri Oct 2 14:59:47 2015 BST
ModDate: Fri Oct 2 14:59:47 2015 BST
Tagged: no
UserProperties: no
Suspects: no
Form: AcroForm
JavaScript: no
Pages: 1
Encrypted: no
Page size: 612 x 792 pts (letter)
Page rot: 0
File size: 155620 bytes
Optimized: no
PDF version: 1.3
PDF subtype: PDF/X-3:2002
Is there any chance we can enum that instead of it being a string? i.e. is the set of possible values fixed?
I spent the last few weeks trying to find information on the ISO standards. From what I gathered, there are 5 ISO standards based on PDF:

ISO 19005 - Document management -- Electronic document file format for long-term preservation (PDF/A)
ISO 24517 - Document management -- Engineering document format using PDF (PDF/E)
ISO 14289 - Document management applications -- Electronic document file format enhancement for accessibility (PDF/UA)
ISO 16612 - Graphic technology -- Variable data exchange (PDF/VT)
ISO 15930 - Graphic technology -- Prepress digital data exchange (PDF/X)

Each standard has multiple parts (i.e. revisions) and different conformance levels. To trim down the enum, I decided to split it into three enums: subtype, part, and conformance. The subtype represents the 5 standards (A, E, UA, VT, X), the part is the part number (1-5), and the conformance is one of the 7 levels of document conformance (A, B, G, N, P, PG, U). These enums are extracted by applying a regular expression to the GTS version string. I have attached two patches. The first is the implementation in both the core and the glib backend, while the second adds support for the subtype in the pdfinfo utility.
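For readers following along, here is a minimal sketch of how a GTS version string could be split into the three enums described above. It is not the code from the attached patches: the enum names, the struct, the function `parseGtsVersion`, and the exact regular expression are all assumptions made for illustration, and the pattern only covers strings shaped like the examples in this thread (PDF/A-2u:2010, PDF/X-3:2002, PDF/X-5pg).

```cpp
#include <cctype>
#include <regex>
#include <string>

// Hypothetical enums; the names are illustrative, not taken from the patch.
enum class PdfSubtype { A, E, UA, VT, X };
enum class PdfConformance { None, A, B, G, N, P, PG, U };

struct PdfSubtypeInfo {
    PdfSubtype subtype;
    int part;                   // part number of the standard, e.g. 2 in PDF/A-2u
    PdfConformance conformance; // None when the string carries no level
};

// Parses strings such as "PDF/A-2u:2010". Returns false (and leaves `out`
// untouched) when the string does not match the assumed pattern.
bool parseGtsVersion(const std::string &gts, PdfSubtypeInfo &out)
{
    // Assumed pattern: PDF/<subtype>-<part>[<conformance>][:<year>]
    static const std::regex re(R"(PDF/(A|E|UA|VT|X)-([0-9])([A-Za-z]{1,2})?(?::([0-9]{4}))?)");
    std::smatch m;
    if (!std::regex_match(gts, m, re))
        return false;

    PdfSubtypeInfo info;
    const std::string kind = m.str(1);
    if (kind == "A") info.subtype = PdfSubtype::A;
    else if (kind == "E") info.subtype = PdfSubtype::E;
    else if (kind == "UA") info.subtype = PdfSubtype::UA;
    else if (kind == "VT") info.subtype = PdfSubtype::VT;
    else info.subtype = PdfSubtype::X;

    info.part = m.str(2)[0] - '0';

    // An unmatched optional group yields an empty string.
    std::string conf = m.str(3);
    for (char &c : conf)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));

    if (conf.empty()) info.conformance = PdfConformance::None;
    else if (conf == "A") info.conformance = PdfConformance::A;
    else if (conf == "B") info.conformance = PdfConformance::B;
    else if (conf == "G") info.conformance = PdfConformance::G;
    else if (conf == "N") info.conformance = PdfConformance::N;
    else if (conf == "P") info.conformance = PdfConformance::P;
    else if (conf == "PG") info.conformance = PdfConformance::PG;
    else if (conf == "U") info.conformance = PdfConformance::U;
    else return false; // letters present, but not one of the 7 known levels

    out = info;
    return true;
}
```

With this sketch, "PDF/X-5pg" would yield subtype X, part 5, conformance PG, while a string with unknown conformance letters is rejected rather than guessed at.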
Created attachment 140978 Read GTS information from info dict
Created attachment 140979 Display PDF subtype info in pdfinfo utility
I see you're using regexec which is not available on windows since it's a posix thing. I don't really care much for windows personally, but people get annoyed when we break the build too much. Can you try using the C++11 regexp support that should be more widely supported? https://en.cppreference.com/w/cpp/regex Sorry about that :/
(In reply to Albert Astals Cid from comment #9)
> I see you're using regexec which is not available on windows since it's a posix thing.
>
> I don't really care much for windows personally, but people get annoyed when we break the build too much.

Makes total sense!

> Can you try using the C++11 regexp support that should be more widely supported?
> https://en.cppreference.com/w/cpp/regex
>
> Sorry about that :/

Done! Changed the regex to C++11 regexp and added a documentation reference in glib.

P.S. Weirdly enough, I couldn't convert between std::string and GooString. In line 522, instead of:

    std::string pdfsubver = pdfSubtypeVersion->toStr();

I had to go with:

    std::string pdfsubver(pdfSubtypeVersion->getCString(),  // Which imitates the
                          pdfSubtypeVersion->getLength());  // toStr() declaration.

Compilation failed with an error saying that toStr is not a member of class GooString. And in line 555, instead of:

    GooString *conf = new GooString(match.str(3));

I went with:

    GooString *conf = new GooString(match.str(3).c_str());

However, performance-wise the two versions are the same.
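As a side note on the workaround above, the pointer-plus-length constructor of std::string copies exactly the given number of bytes, whereas building from the C string alone stops at the first NUL byte. The sketch below illustrates the difference with a hypothetical stand-in class, `RawString`, that exposes only `getCString()` and `getLength()`; it is not poppler's GooString, and `toStdString` is an illustrative helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a string class that exposes only a raw pointer
// and a length. This is NOT poppler's GooString.
class RawString {
public:
    RawString(const char *data, std::size_t len) : buf(data, len) {}
    const char *getCString() const { return buf.c_str(); }
    int getLength() const { return static_cast<int>(buf.size()); }

private:
    std::string buf;
};

// Copies exactly getLength() bytes, so embedded NUL bytes survive.
std::string toStdString(const RawString &s)
{
    return std::string(s.getCString(), static_cast<std::size_t>(s.getLength()));
}

// Copies only up to the first NUL byte.
std::string toStdStringTruncating(const RawString &s)
{
    return std::string(s.getCString());
}
```

For a plain ASCII value such as "PDF/A-2u:2010" both helpers give the same result; they only differ when the data contains a NUL byte.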
Created attachment 141040 Read GTS information from info dict using C++11 regexp
Created attachment 141042 Display PDF subtype info in pdfinfo utility
Created attachment 141043 PDFSubtype documentation in glib
Created attachment 141044 PDFSubtype test documents PDF documents (PDF/A-1b, PDF/A-2u, PDF/E, PDF/VT, PDF/X-5pg) for testing the functionality.
You mean that std::string pdfsubver = pdfSubtypeVersion->toStr(); doesn't work for you? Also there's a few memory leaks in the code, you never free pdfSubtypeVersion nor conf. You should compile poppler with debug mode enabled and then run with valgrind --leak-check=full with pdfinfo and you'll see there's a few leaks. Tell me if you need help understanding/running valgrind/debug.
(In reply to Albert Astals Cid from comment #15) > You mean that > std::string pdfsubver = pdfSubtypeVersion->toStr(); > doesn't work for you? The problem was on my computer, I managed to make it work. > Also there's a few memory leaks in the code, you never free > pdfSubtypeVersion nor conf. Ooops! Sorry about that. I totally forgot that the string returned from getDocInfoStringEntry has to be freed. I have added an extra function to check if a string entry exists in the document's info dictionary, thus solving the issue with the returned strings.
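To illustrate the kind of leak discussed here, the sketch below shows one way an existence check could take ownership of a heap-allocated string so that it is always freed. It is only an illustration under stated assumptions: `lookupInfoEntry` is a hypothetical stand-in for an accessor that returns a new string the caller must delete (or nullptr when the key is absent), and `hasInfoEntry` is an illustrative name, not the function added in the patch.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for an accessor that hands back a heap-allocated
// string owned by the caller, or nullptr when the key is missing.
std::string *lookupInfoEntry(const std::string &key)
{
    if (key == "GTS_PDFXVersion")
        return new std::string("PDF/X-3:2002");
    return nullptr;
}

// Existence check that cannot leak: the unique_ptr deletes the returned
// string (if any) when it goes out of scope.
bool hasInfoEntry(const std::string &key)
{
    std::unique_ptr<std::string> value(lookupInfoEntry(key));
    return value != nullptr;
}
```

Running a debug build under valgrind --leak-check=full, as suggested above, is still the way to confirm that no allocation is left behind.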
Created attachment 141134 Read GTS information from info dict using C++11 regexp
Created attachment 141135 Display PDF subtype info in pdfinfo utility
Created attachment 141136 PDFSubtype documentation in glib
Hi, As 0.68 has been released, I changed the `Since` tag in glib from 0.68 to 0.69. I saw on the mailing list that there is now a GitLab instance, so I have opened a merge request. Hope everything is good for merging.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/363.