Bug 103912

Summary: support tags, links, page labels, outline, metadata in cairo backend
Product: poppler Reporter: Adrian Johnson <ajohnson>
Component: cairo backendAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium CC: carlosgc, novalazy+freedesktop
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Fix some bugs in StructTreeRoot
write document logical structure
copy pdf page labels
pdftocairo: add -fl, -ll options for specify page range using page labels
only output link annotations when setInteractive(true)
thumbnails
outline
metadata

Description Adrian Johnson 2017-11-26 10:29:01 UTC
Current version of cairo can output tagged pdf, hyperlinks, page labels, thumbnails, outline and metadata on PDF output. The following set of patches add support for these features to the cairo backend so that pdftocairo can pass through these features from the source PDF.
Comment 1 Adrian Johnson 2017-11-26 10:32:32 UTC
Created attachment 135717 [details] [review]
Fix some bugs in StructTreeRoot
Comment 2 Adrian Johnson 2017-11-26 10:32:59 UTC
Created attachment 135718 [details] [review]
write document logical structure
Comment 3 Adrian Johnson 2017-11-26 10:33:32 UTC
Created attachment 135719 [details] [review]
copy pdf page labels
Comment 4 Adrian Johnson 2017-11-26 10:34:26 UTC
Created attachment 135720 [details] [review]
pdftocairo: add -fl, -ll options for specify page range using page labels
Comment 5 Adrian Johnson 2017-11-26 10:35:14 UTC
Created attachment 135721 [details] [review]
only output link annotations when setInteractive(true)
Comment 6 Adrian Johnson 2017-11-26 10:35:47 UTC
Created attachment 135722 [details] [review]
thumbnails
Comment 7 Adrian Johnson 2017-11-26 10:36:07 UTC
Created attachment 135723 [details] [review]
outline
Comment 8 Adrian Johnson 2017-11-26 10:36:26 UTC
Created attachment 135724 [details] [review]
metadata
Comment 9 Carlos Garcia Campos 2017-12-12 13:41:04 UTC
Comment on attachment 135717 [details] [review]
Fix some bugs in StructTreeRoot

Review of attachment 135717 [details] [review]:
-----------------------------------------------------------------

LGTM

::: poppler/Object.h
@@ +90,5 @@
> +  bool operator() (const Ref& lhs, const Ref& rhs) const {
> +    if (lhs.num != rhs.num)
> +      return lhs.num < rhs.num;
> +    else
> +      return lhs.gen < rhs.gen;

We don't need the else after a return
Comment 10 Carlos Garcia Campos 2017-12-12 14:11:37 UTC
Comment on attachment 135718 [details] [review]
write document logical structure

Review of attachment 135718 [details] [review]:
-----------------------------------------------------------------

LGTM, but I have a few comments

::: poppler/CairoOutputDev.cc
@@ +170,5 @@
>    actualText = NULL;
> +  logicalStruct = false;
> +  pdfPageNum = 0;
> +  cairoPageNum = 0;
> +  forwardLinkCount = 0;

Initialize firstPage to gFalse here too.

@@ +179,4 @@
>    align_stroke_coords = gFalse;
>    adjusted_stroke_width = gFalse;
>    xref = NULL;
> +  annotations = nullptr;

linkAnnot should be initialized to nullptr here too I guess.

@@ +201,5 @@
>      text->decRefCnt();
>    if (actualText)
> +    delete actualText;
> +  if (annotations)
> +    delete annotations;

Now that we have C++11 support I think it's a good idea to use unique_ptr in newly added code at least. This could be a unique_ptr<Annots> and we don't need this explicit delete.

@@ +227,4 @@
>    }
>  }
>  
> +GBool CairoOutputDev::isPDF()

this could be const

@@ +322,5 @@
> +  if (logicalStruct && isPDF()) {
> +    int numDests = doc->getCatalog()->numDestNameTree();
> +    for (int i = 0; i < numDests; i++) {
> +      GooString *name = doc->getCatalog()->getDestNameTreeName(i);
> +      LinkDest *dest = doc->getCatalog()->getDestNameTreeDest(i);

We could use unique_ptr here too...

@@ +325,5 @@
> +      GooString *name = doc->getCatalog()->getDestNameTreeName(i);
> +      LinkDest *dest = doc->getCatalog()->getDestNameTreeDest(i);
> +      if (dest->isPageRef()) {
> +	destsMap[dest->getPageRef()].insert(
> +	  std::make_pair(std::unique_ptr<GooString>(name->copy()), std::unique_ptr<LinkDest>(dest)));

then we just std::move() the dest here

@@ +327,5 @@
> +      if (dest->isPageRef()) {
> +	destsMap[dest->getPageRef()].insert(
> +	  std::make_pair(std::unique_ptr<GooString>(name->copy()), std::unique_ptr<LinkDest>(dest)));
> +      } else {
> +	delete dest;

And we don't need this.

@@ +336,5 @@
> +    for (int i = 0; i < numDests; i++) {
> +      const char *name = doc->getCatalog()->getDestsName(i);
> +      LinkDest *dest = doc->getCatalog()->getDestsDest(i);
> +      if (dest->isPageRef()) {
> +	destsMap[dest->getPageRef()].insert(

Note that we would need to get the ref here first if we use std::move(), using a local var.

@@ +339,5 @@
> +      if (dest->isPageRef()) {
> +	destsMap[dest->getPageRef()].insert(
> +	  std::make_pair(std::unique_ptr<GooString>(new GooString(name)),
> +			 std::unique_ptr<LinkDest>(dest)));
> +      }

And same here, so we don't leak the dest in case it's not page ref.

@@ +366,5 @@
> +  cairoPageNum++;
> +
> +  if (logicalStruct && isPDF()) {
> +    if (annotations)
> +      delete annotations;

We wouldn't need this

@@ +368,5 @@
> +  if (logicalStruct && isPDF()) {
> +    if (annotations)
> +      delete annotations;
> +    Object obj = doc->getPage(pageNum)->getAnnotsObject(xref);
> +    annotations = new Annots(doc, pageNum, &obj);

and here we would just std::make_unique<Annots>. I don't know if I pushed the patch that adds make_unique impl in the end, though.

@@ +3474,5 @@
> +  } else {
> +    // Link page ref is to a page that has not been emitted.
> +    // Create a named destination and add the name to destsMap so destination will
> +    // be create if/when the page is emitted.
> +    GooString *name = new GooString();

here I would use make_unique and then move below, but it's ok this way too.

::: poppler/CairoOutputDev.h
@@ +263,5 @@
>    void type3D1(GfxState *state, double wx, double wy,
>  	       double llx, double lly, double urx, double ury) override;
>  
> +  virtual void beginMarkedContent(GfxState *state, char *name, Dict *properties) override;
> +  virtual void endMarkedContent(GfxState *state) override;

virtual is not needed here, override is enough

::: poppler/OutputDev.h
@@ +148,4 @@
>    virtual void startPage(int pageNum, GfxState *state, XRef *xref) {}
>  
>    // End a page.
> +  virtual void endPage(GfxState *state) {}

Could we make state default to nullptr? This way we avoid having to modify all the callers to pass nullptr when unused.

::: poppler/UTF.h
@@ +84,5 @@
> +
> +// Convert a PDF Text String to UTF-8
> +//   textStr    - PDF text string
> +//   returns UTF-8 string.
> +char *TextStringToUtf8(GooString *textStr);

Changes to UTF could probably be split to a different patch.
Comment 11 Carlos Garcia Campos 2017-12-12 14:13:40 UTC
Comment on attachment 135719 [details] [review]
copy pdf page labels

Review of attachment 135719 [details] [review]:
-----------------------------------------------------------------

OK!
Comment 12 Carlos Garcia Campos 2017-12-12 14:14:32 UTC
Comment on attachment 135720 [details] [review]
pdftocairo: add -fl, -ll options for specify page range using page labels

Review of attachment 135720 [details] [review]:
-----------------------------------------------------------------

OK
Comment 13 Carlos Garcia Campos 2017-12-12 14:15:25 UTC
Comment on attachment 135721 [details] [review]
only output link annotations when setInteractive(true)

Review of attachment 135721 [details] [review]:
-----------------------------------------------------------------

OK
Comment 14 Carlos Garcia Campos 2017-12-12 14:16:28 UTC
Comment on attachment 135722 [details] [review]
thumbnails

Review of attachment 135722 [details] [review]:
-----------------------------------------------------------------

OK
Comment 15 Albert Astals Cid 2018-01-03 22:47:19 UTC
*** Bug 104203 has been marked as a duplicate of this bug. ***
Comment 16 Adrian Johnson 2018-01-04 05:16:42 UTC
(In reply to Carlos Garcia Campos from comment #9)
> Comment on attachment 135717 [details] [review] [review]
> Fix some bugs in StructTreeRoot
> 
> Review of attachment 135717 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> LGTM
> 
> ::: poppler/Object.h
> @@ +90,5 @@
> > +  bool operator() (const Ref& lhs, const Ref& rhs) const {
> > +    if (lhs.num != rhs.num)
> > +      return lhs.num < rhs.num;
> > +    else
> > +      return lhs.gen < rhs.gen;
> 
> We don't need the else after a return

I've pushed this patch with the above change as it fixes a couple of other bug reports. I'll get onto the other patches soon.
Comment 17 GitLab Migration User 2018-08-20 22:07:59 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/207.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.