Description
Rodrigo Moya
2011-04-28 01:29:28 UTC
I wrote a simple test program to test it, so pasting here just in case it's useful: #include <poppler.h> int main (int argc, char *argv[]) { PopplerDocument *document; GError *error = NULL; g_type_init (); if (argc != 6) { g_print ("Usage: %s filename output new_title new_author new_subject\n", argv[0]); return -1; } document = poppler_document_new_from_file (argv[1], NULL, &error); if (!document) { g_print ("Error opening %s: %s\n", argv[1], error->message); g_error_free (error); return -1; } g_print ("Changing title from %s to %s\n", poppler_document_get_title (document), argv[3]); poppler_document_set_title (document, argv[3]); g_print ("Changing author from %s to %s\n", poppler_document_get_author (document), argv[4]); poppler_document_set_author (document, argv[4]); g_print ("Changing subject from %s to %s\n", poppler_document_get_subject (document), argv[5]); poppler_document_set_subject (document, argv[5]); error = NULL; if (!poppler_document_save (document, argv[2], &error)) { g_print ("Error saving %s: %s\n", argv[2], error->message); g_error_free (error); return -1; } g_object_unref (document); return 0; } Comment on attachment 46136 [details] [review] Patch to add this new API Hey rodrigo, patch looks goot to me. Common parts should be in the core, so that other frontends could use them too. Frontends currently get the info dict directly, so it's probably a good time to add a PDFInfo class in the core with setters and getters. See comments inline. >From 3ce14bcbf198fc93365d664b0db5745680967764 Mon Sep 17 00:00:00 2001 >From: Rodrigo Moya <rodrigo@gnome-db.org> >Date: Thu, 28 Apr 2011 10:24:05 +0200 >Subject: [PATCH] glib: Add _set functions for some PopplerDocument properties > >--- > glib/poppler-document.cc | 154 ++++++++++++++++++++++++++++++++++++++++++++++ > glib/poppler-document.h | 6 ++ > 2 files changed, 160 insertions(+), 0 deletions(-) > >diff --git a/glib/poppler-document.cc b/glib/poppler-document.cc >index 710a5b3..360aa42 100644 >--- a/glib/poppler-document.cc >+++ b/glib/poppler-document.cc >@@ -607,6 +607,46 @@ char *_poppler_goo_string_to_utf8(GooString *s) > return result; > } > >+GooString *_poppler_goo_string_from_utf8(const gchar *source) >+{ >+ char *s, *utf8; >+ gsize bytes_written; >+ GooString *result; >+ >+ utf8 = g_locale_to_utf8 (source, strlen (source), NULL, &bytes_written, NULL); you shouldn't need this, we assume all strings used by the poppler-glib api are always utf-8, like glib/gtk+. >+ s = g_convert (utf8, bytes_written, >+ "UTF-16BE", "UTF-8", NULL, &bytes_written, NULL); >+ >+ result = new GooString (s, bytes_written); that's the only part that should be in the glib forntend >+ if (!result->hasUnicodeMarker()) { >+ result->insert(0, 0xff); >+ result->insert(0, 0xfe); >+ } This should be in the core. >+ g_free (s); >+ g_free (utf8); >+ >+ return result; >+} >+ >+static gboolean >+mark_as_modified (PopplerDocument *document, Object *obj) >+{ >+ gboolean result = FALSE; >+ XRef *xref; >+ Object info_dict; >+ >+ xref = document->doc->getXRef (); >+ if (xref != NULL) { >+ xref->getDocInfoNF (&info_dict); >+ xref->setModifiedObject (obj, info_dict.getRef ()); info_dict should be freed here with info_dict.free(); >+ result = TRUE; >+ } >+ >+ return result; >+} This method should be part fo the core too. > static gchar * > info_dict_get_string (Dict *info_dict, const gchar *key) > { >@@ -763,6 +803,44 @@ poppler_document_get_title (PopplerDocument *document) > } > if we add the PDFDocInfo class, all get/set methods would simply use that class. *** Bug 71999 has been marked as a duplicate of this bug. *** Created attachment 107206 [details] [review] glib info-dict API This is a patch that uses Rodrigo's original patch as a guide for usage of the internal API. It adds support for setting/getting strings from the info dict, with the original impetus to make it possible to add custom metadata in a convenient way. It could be used in a later step to add support for setting title/subject/etc, which would relate it pretty directly to this bug. I'm going to repeat Carlos here "Common parts should be in the core, so that other frontends could use them too." (In reply to Albert Astals Cid from comment #5) > I'm going to repeat Carlos here "Common parts should be in the core, so that > other frontends could use them too." (Sorry about the delay, didn't CC myself before) Okay, I will give that a try. The review also recommends creating a PDFDocInfo class, is that still recommended? And if so, should it replace the return type of the "getDocInfo" method? Created attachment 107317 [details] [review] core document info dict API This adds an API to PDFDoc to get/set string entries in the doc's info dict. Please use proper style for the core code, i.e. don't add those jumps on function declarations. Also no need to check if the xref is there or not, if the doc doesn't have an xref you don't really have a doc and should be calling this anyway. (In reply to Albert Astals Cid from comment #8) > Please use proper style for the core code, i.e. don't add those jumps on > function declarations. > > Also no need to check if the xref is there or not, if the doc doesn't have > an xref you don't really have a doc and should be calling this anyway. Thanks for the comments! I've removed the check on the Doc's XRef and modified the type signature appropriately. What "jumps" on the declarations? Do you mean the newline between the get and set in PDFDoc.h? Is there a style guide somewhere for future reference? +GBool +PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) has to be +GBool PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) There is no document guideline, the file you're editing is your style guideline. Also you're not using the same indentation style. In the core we use the crazy 2 space indentation and 8 spaces tab. (In reply to Albert Astals Cid from comment #10) > +GBool > +PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) > > has to be > > +GBool PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) > > There is no document guideline, the file you're editing is your style > guideline. Also you're not using the same indentation style. > > In the core we use the crazy 2 space indentation and 8 spaces tab. Ah, thanks! Okay I've addressed the useless check and style issues. If I understand that tabs are used every 8 spaces... crazy indeed :). Created attachment 107369 [details] [review] core document info dict API rev 2 This addresses useless checks and style issues. I don't think you should add the unicode marker, the goostring should come correctly formated by the caller. (In reply to Albert Astals Cid from comment #13) > I don't think you should add the unicode marker, the goostring should come > correctly formated by the caller. It seemed that Carlos was asking for this to be in the core in his review of the original patch by Rodrigo. I have no opinion on where it should go, so whatever you feel is best :). Carlos why do you want the core to add that marker? Not using the marker is fine and in my opinion whoever calls the function should pass a correctly formatted GooString, not relying on the core to fixup the GooString. (In reply to Albert Astals Cid from comment #15) > Carlos why do you want the core to add that marker? Not using the marker is > fine and in my opinion whoever calls the function should pass a correctly > formatted GooString, not relying on the core to fixup the GooString. I don't remember, but I agree with you. (In reply to Carlos Garcia Campos from comment #16) > (In reply to Albert Astals Cid from comment #15) > > Carlos why do you want the core to add that marker? Not using the marker is > > fine and in my opinion whoever calls the function should pass a correctly > > formatted GooString, not relying on the core to fixup the GooString. > > I don't remember, but I agree with you. Looking at the code, my guess is that I proposed it because that's what we do when settings form and annot contents in the core. Comment on attachment 107369 [details] [review] core document info dict API rev 2 Review of attachment 107369 [details] [review]: ----------------------------------------------------------------- ::: glib/poppler-document.cc @@ +778,5 @@ > } > > +gchar * > +poppler_document_info_get_string (PopplerDocument *document, > + const gchar *key) Instead of adding API to extend the info dict with new metadata, I would start adding support for modifying the well known entries, like Rodrigo's patch did. I think we could split this patch, first adding the core part and the glib API, so that tthe glib api doesn't block the core part that could be used by qt. ::: poppler/PDFDoc.cc @@ +538,5 @@ > > +// Set/get doc info string entries > +GooString *PDFDoc::getDocInfoStringEntry(const char *key) > +{ > + Object info_obj; I agree we are not always consistent, but in the core we usually prefer camel case instead of underscores, so this would be infoObj, instead of info_obj, and the same for all other cases. @@ +546,5 @@ > + if (info_obj.isDict()) { > + Object result_obj; > + > + if (info_obj.dictLookup(key, &result_obj, 0)->isString()) { > + result = new GooString(result_obj.getString()); You can do result = resultObj.takeString() to avoid a memory allocation. @@ +571,5 @@ > + > +GBool PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) > +{ > + Object info_obj; > + GBool success = gFalse; I'm not sure this could fail, the getter can fail because the entry might not be present, but here we are setting a new entry, how can that fail? @@ +578,5 @@ > + > + if (info_obj.isDict()) { > + Object goo_str_obj; > + > + if (value) { The spec says we shouldn't store entry values, but omit the entries, so we should also check that the string is not empty when not NULL. @@ +593,5 @@ > + > + info_obj.dictSet(key, &goo_str_obj); > + setInfoModified(this, &info_obj); > + success = gTrue; > + } If the document doesn't have an info dict we should create one instead of failing. Why do we need getDocInfoStringEntry? there's already ways to get the doc info. Also on setDocInfoStringEntry I don't see why it should fail if there's no info dict, we can create one, no? (In reply to Albert Astals Cid from comment #19) > Why do we need getDocInfoStringEntry? there's already ways to get the doc > info. It could simplify frontend functions for getting document properties. Created attachment 121375 [details] [review] Poppler core patch I implemented the needed functions for the poppler core for now. A patch for frontend will come later. This way core and frontend patches will be split just as Carlos suggested. Please review. I don't want to write the frontend part knowing that the core API might still change. Created attachment 121377 [details] [review] Poppler core patch This patch is the same as the previous. It just adds copyright info. Forgot to add it in the previous one, sorry. Comment on attachment 121377 [details] [review] Poppler core patch Review of attachment 121377 [details] [review]: ----------------------------------------------------------------- Thanks for the patch, it looks good to me in general, I have a just a few comments. ::: poppler/PDFDoc.cc @@ +569,5 @@ > +void PDFDoc::setDocInfoStringEntry(const char *key, GooString *value) > +{ > + Object infoObj; > + getDocInfo(&infoObj); > + createInfoDictIfNoneExists(&infoObj); Since the doc info dict is part of the xref, and we already have XRef::getDocInfo(), I think we could move this to XRef as something like XRef::createDocInfo() that internally marks the dict as modified, so that we only need the getter. @@ +575,5 @@ > + Object gooStrObj; > + gooStrObj.initNull(); > + > + if (value && value->getLength() != 0) { > + gooStrObj.initString(value); Nit: instead of unconditionally init the object to null I would add an else here, we init it to string or null. @@ +579,5 @@ > + gooStrObj.initString(value); > + } > + > + // gooStrObj is set to value or null by now. The latter will cause a removal. > + infoObj.dictSet(key, &gooStrObj); Ok, so if we pass a null or empty string, the entry is removed from the dict. So, it could be the case of calling this with null for a document that doesn't have an info dict, in which case we are creating an info dict, just to remove it. I think we could return early in such case. @@ +583,5 @@ > + infoObj.dictSet(key, &gooStrObj); > + > + if (infoObj.dictGetLength() == 0) { > + // Info dictionary is empty. Remove it altogether. > + removeDocInfo(); hmm, we could probably leave an empty dict, the problem of doing this is that if your API does something like set("Foo", NULL); set("Bar", "baz"); if "Foo" was the only entry, the dictionary is removed and a new one is created when setting "Bar". So, I would just leave the empty dict if the last entry is removed. ::: poppler/PDFDoc.h @@ +226,5 @@ > Object *getDocInfo(Object *obj) { return xref->getDocInfo(obj); } > Object *getDocInfoNF(Object *obj) { return xref->getDocInfoNF(obj); } > > + // Remove the document's Info dictionary and update the trailer dictionary. > + void removeDocInfo(); Why is this public? (In reply to Carlos Garcia Campos from comment #23) > @@ +583,5 @@ > > + infoObj.dictSet(key, &gooStrObj); > > + > > + if (infoObj.dictGetLength() == 0) { > > + // Info dictionary is empty. Remove it altogether. > > + removeDocInfo(); > > hmm, we could probably leave an empty dict, the problem of doing this is > that if your API does something like set("Foo", NULL); set("Bar", "baz"); if > "Foo" was the only entry, the dictionary is removed and a new one is created > when setting "Bar". So, I would just leave the empty dict if the last entry > is removed. Thanks for the review. So if we don't want to save a document with an empty dict, the dict should be removed in one of PDFDoc::saveSth functions? > ::: poppler/PDFDoc.h > @@ +226,5 @@ > > Object *getDocInfo(Object *obj) { return xref->getDocInfo(obj); } > > Object *getDocInfoNF(Object *obj) { return xref->getDocInfoNF(obj); } > > > > + // Remove the document's Info dictionary and update the trailer dictionary. > > + void removeDocInfo(); > > Why is this public? I thought that if someone wanted to strip a PDF of all properties, instead of calling set(key, NULL) for every single entry, they could just remove the info dict. It would be more "direct" and shorter. I'll fix the things you pointed out. Also we could add functions like set_title, set_author etc. and get_title, get_author etc. I think. At last these are the properties described in the PDF spec. It would make things a bit more elegant in the frontends. What do you think? (The getters should probably be added in a separate patch though, since they don't have anything to do with this bug report.) I think you wrote about it in the comment #2, but I might have misunderstood you. (In reply to Carlos Garcia Campos from comment #23) > ::: poppler/PDFDoc.h > @@ +226,5 @@ > > Object *getDocInfo(Object *obj) { return xref->getDocInfo(obj); } > > Object *getDocInfoNF(Object *obj) { return xref->getDocInfoNF(obj); } > > > > + // Remove the document's Info dictionary and update the trailer dictionary. > > + void removeDocInfo(); > > Why is this public? It could also be implemented in XRef and changed in PDFDoc to an inline function just like PDFDoc::getDocInfo. So if you don't mind leaving it public, I'd do it. Created attachment 121427 [details] [review] Poppler core patch So this would look like this, if we agree on several things: (In reply to Carlos Garcia Campos from comment #23) > @@ +583,5 @@ > > + infoObj.dictSet(key, &gooStrObj); > > + > > + if (infoObj.dictGetLength() == 0) { > > + // Info dictionary is empty. Remove it altogether. > > + removeDocInfo(); > > hmm, we could probably leave an empty dict, the problem of doing this is > that if your API does something like set("Foo", NULL); set("Bar", "baz"); if > "Foo" was the only entry, the dictionary is removed and a new one is created > when setting "Bar". So, I would just leave the empty dict if the last entry > is removed. I don't think that saving a document with an empty Info dict is "the right way". Apparently the authors of pdftk would agree with me on that. And removing the document's Info dictionary in saveAs/saveCompleteRewrite/saveIncrement doesn't feel right either. So I would prefer to keep it in setDocInfoStringEntry and don't mind the unnecessary steps occuring in special cases. Then there is the case of removeDocInfo being public, so I'm waiting for your answer. And, as I said, I will introduce the getters in a separate patch under a seperate bug report once the work with this bug report is done. Created attachment 121601 [details] [review] Poppler core patch A new patch for core - validated by the glib frontend. (I'll send the glib patch in a second.) Created attachment 121602 [details] [review] Poppler glib patch The glib frontend part can be tested with these two programs: https://gist.github.com/yakubin/63a7c94d9fee345a33fb Please keep in mind that date getters may not work as you'd expect, since this: https://bugs.freedesktop.org/show_bug.cgi?id=94035 hasn't been committed yet. Now, I think, the only thing left is to do the same for other frontends. Created attachment 121766 [details] [review] Poppler cpp patch Of course the cpp frontend also has a bug concerning timezones, but it doesn't invalidate this patch, so I'll deal with that later. Created attachment 121929 [details] [review] patch So I've finished writing the code. Of course even though it's a single file, there are several patches in it. (I made it a single file in order to make the applying easier.) I split the core patch in order to make it easier to review. The first core patch is a bit different from the previous one. It's simpler and makes the save functions significantly faster (as can be seen by running this benchmark: https://github.com/yakubin/poppler-bench). Patches for Bug 94173 may cause marge conflicts with these patches, so I'll be happy if these patches are applied first, since this is a big series of patches and it would be harder to resolve here, but of course if they aren't, I will resolve the conflicts. I think now is the time for the final review. Cheers. PS. "These patches" meaning "the patches I've just sent". Created attachment 124457 [details] [review] [PATCH 1/7] Added XRef modification flag Rebased the patches and corrected a couple of things (details in descriptions). This patch enables detection of modification of trailer dict. It also speeds up the process of saving a document by removing an unnecessary loop. Created attachment 124458 [details] [review] [PATCH 2/7] Added DocInfo setters & getters This patch adds the necessary functions in core. No recent changes here. Created attachment 124459 [details] [review] [PATCH 3/7] cpp: Added functions to save a document Made the functions return a bool to give feedback to the user about success or failure. Created attachment 124460 [details] [review] [PATCH 4/7] cpp: Added document property setters & getters Same as above. Created attachment 124461 [details] [review] [PATCH 5/7] qt4: Added document property setters & getters Added Document::removeInfo() which I forgot to add to qt frontend in the previous series of patches. Created attachment 124462 [details] [review] [PATCH 6/7] qt5: Added document property setters & getters Same as above. Created attachment 124463 [details] [review] [PATCH 7/7] glib: Added document property setters & simplified getters Changed the "Since" versions in comments for the new functions to the next version of poppler. If you test the cpp frontend with these patches, you may run into problems resulting from Bug 96426. And aside from that if you run 'tail -n 50' on a modified pdf produced by poppler you may notice that it produces 2 trailer dicts (and 2 metadata objects if you modify it). This bug is not related. Its source is somewhere in saveIncrementalUpdate(). I'm going to investigate it further, but I don't think it should be a blocker for these patches. I reported the bug I described (Bug 96529). You can see that it has nothing to do with this one, since it occurs with the code in the master branch. Created attachment 124569 [details] [review] [PATCH 2/7] Added DocInfo setters & getters Added the lines: "if (infoObj.isNull()) { return NULL; }" Without them a program is aborted when it calls a getter if there is no metadata whatsoever. Created attachment 124570 [details] [review] [PATCH 7/7] glib: Added document property setters & simplified getters Changed the "Since" values in comments to the next version of poppler. Created attachment 124572 [details] [review] [PATCH 1/7] Added XRef modification flag Moved all XRef::setModified() calls to XRef::addIndirectObject(), XRef::removeIndirectObject() and XRef::setModifiedObject(). This way introducing bugs related to it will be less probable and the code will be cleaner. Created attachment 124573 [details] [review] [PATCH 2/7] Added DocInfo setters & getters Same as above. Comment on attachment 124572 [details] [review] [PATCH 1/7] Added XRef modification flag Review of attachment 124572 [details] [review]: ----------------------------------------------------------------- This looks goo to me, I'll push it. Comment on attachment 124573 [details] [review] [PATCH 2/7] Added DocInfo setters & getters Review of attachment 124573 [details] [review]: ----------------------------------------------------------------- Looks good to me, I have only a couple of comments, but to not delay this more I'll push a modified patch with my comments addressed. ::: poppler/PDFDoc.cc @@ +655,5 @@ > + > + GooString *result; > + > + if (entryObj.isString()) { > + result = new GooString(entryObj.getString()); As I suggested in a previous review, we can save the allocation here, by using GooString::takeString(). ::: poppler/PDFDoc.h @@ +234,5 @@ > > + // Create and return the document's Info dictionary if none exists. > + // Otherwise return the existing one. > + Object *createDocInfoIfNoneExists(Object *obj) { return xref->createDocInfoIfNoneExists(obj); } > + Object *createDocInfoIfNoneExistsNF(Object *obj) { return xref->createDocInfoIfNoneExistsNF(obj); } This doesn't seem to be used anywhere, I prefer to not add dead code, if we eventually need this we can just add it. ::: poppler/XRef.cc @@ +1312,5 @@ > + > + return obj; > +} > + > +Object *XRef::createDocInfoIfNoneExistsNF(Object *obj) { Ditto. @@ +1338,5 @@ > +} > + > +void XRef::removeDocInfo() { > + Object infoObjRef; > + getDocInfoNF(&infoObjRef); Since this is public in the end, we should check that it's not null. Comment on attachment 124570 [details] [review] [PATCH 7/7] glib: Added document property setters & simplified getters Review of attachment 124570 [details] [review]: ----------------------------------------------------------------- Same here, I have a few more comments, but I'll push a modified patch. ::: glib/poppler-document.cc @@ +754,3 @@ > > + gchar *utf16 = g_convert (src, -1, "UTF-16BE", "UTF-8", NULL, &outlen, NULL); > + GooString *result = new GooString (utf16, outlen); g_convert returns NULL if the conversion fails for whatever reason, we should handle that case. @@ +758,2 @@ > > + assert(result->getLength() != 0); And then this assert is not needed, I would say. @@ +764,3 @@ > } > > + assert((*result).getLength() != 0); Nor this redundant one. @@ +886,5 @@ > + * poppler_document_set_title: > + * @document: A #PopplerDocument > + * @title: A new title > + * > + * Sets the document's title. If @title is NULL or its length is 0, Title entry NULL -> %NULL We don't normally use empty strings in glib APIs, so better to only handle the case of NULL explicitly. @@ +896,5 @@ > +poppler_document_set_title (PopplerDocument *document, const gchar *title) > +{ > + g_return_if_fail (POPPLER_IS_DOCUMENT (document)); > + GooString *goo_title = _poppler_goo_string_from_utf8(title); > + document->doc->setDocInfoTitle(goo_title); We don't want to remove the entry if the conversion fails, so we should handle the case of NULL given by the user separately from the the case of _poppler_goo_string_from_utf8 returning NULL. These comments apply to all setters. @@ +1122,4 @@ > time_t > poppler_document_get_creation_date (PopplerDocument *document) > { > + g_return_val_if_fail (POPPLER_IS_DOCUMENT (document), -1); I think we should always cast (time_t)-1 @@ +2831,5 @@ > return NULL; > } > > +GooString * > +_poppler_convert_gtime_to_pdf_date (time_t gdate) { We don't need to add a new function for this, there's timeToDateString() in the core. I've reviewed and pushed the core and glib patches, thanks!. The other patches should be reviewed by Pino and/or Albert. (In reply to Carlos Garcia Campos from comment #48) > I've reviewed and pushed the core and glib patches, thanks!. The other > patches should be reviewed by Pino and/or Albert. Thanks a lot for your immense patience! :) Created attachment 125116 [details] [review] glib: make document metatag gobject properties writeable I'd completely forgotten about the existence of GObject properties. Sorry. It may be useful to be able to modify the document properties via GObject properties. Thankfully, this patch is small. (In reply to Jakub Kucharski from comment #50) > Created attachment 125116 [details] [review] [review] > glib: make document metatag gobject properties writeable > > I'd completely forgotten about the existence of GObject properties. Sorry. > It may be useful to be able to modify the document properties via GObject > properties. Thankfully, this patch is small. Pushed, thanks! Commited, can you check QDateTime Document::modificationDate() const and QDateTime Document::creationDate() const in the qt frontends? I think they are leaky. Created attachment 125283 [details] [review] [PATCH 1/2] qt4: fix memory leaks in Document::modificationDate() and Document::creationDate() (In reply to Albert Astals Cid from comment #52) > Commited, can you check QDateTime Document::modificationDate() const and > QDateTime Document::creationDate() const in the qt frontends? I think they > are leaky. You are right. Created attachment 125284 [details] [review] [PATCH 2/2] qt5: fix memory leaks in Document::modificationDate() and Document::creationDate() All pushed as far as i can see. (In reply to Albert Astals Cid from comment #55) > All pushed as far as i can see. Thanks! |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.