Bug 51487 - [PATCH] Add support for TextMarkup Annotations in glib frontend
[PATCH] Add support for TextMarkup Annotations in glib frontend
Status: RESOLVED FIXED
Product: poppler
Classification: Unclassified
Component: glib frontend
unspecified
Other All
: medium normal
Assigned To: poppler-bugs
:
Depends on: 70901
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-27 15:58 UTC by Jose Aliste
Modified: 2014-08-03 11:24 UTC (History)
3 users (show)

See Also:


Attachments
1/2 Patch: Add a wrapper for AnnotQuadriliterals. (4.45 KB, patch)
2012-06-27 15:59 UTC, Jose Aliste
Details | Splinter Review
2/2 Patch: Add PopplerAnnotTextMarkup (12.19 KB, patch)
2012-06-27 15:59 UTC, Jose Aliste
Details | Splinter Review
Patch [1/8]: Add PopplerQuadrilateral boxed type (4.59 KB, patch)
2013-02-01 01:32 UTC, Germán Poo-Caamaño
Details | Splinter Review
Partch [2/8]: Add PopplerAnnotTextMarkup object (8.87 KB, patch)
2013-02-01 01:33 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [3/8]: Add support for PopplerTextAnnotMarkup to the demo (2.14 KB, patch)
2013-02-01 01:34 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [4/8]: Add PopplerAnnotTextHighlight object (6.70 KB, patch)
2013-02-01 01:35 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [5/8]: Add PopplerAnnotTextUnderline object (6.95 KB, patch)
2013-02-01 01:36 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [6/8]: Add PopplerAnnotTextSquiggly object (6.17 KB, patch)
2013-02-01 01:37 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [7/8]: Add PopplerAnnotTextStrikeOut object (7.00 KB, patch)
2013-02-01 01:38 UTC, Germán Poo-Caamaño
Details | Splinter Review
Patch [8/8]: Add references in poppler-sections (2.86 KB, patch)
2013-02-01 01:39 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerQuadrilateral boxed type (5.77 KB, patch)
2013-10-29 01:06 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (15.88 KB, patch)
2013-10-29 01:09 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Add support for PopplerTextAnnotMarkup (5.81 KB, patch)
2013-10-29 07:19 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (15.94 KB, patch)
2013-10-29 07:20 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Add support for PopplerTextAnnotMarkup (13.36 KB, patch)
2013-10-29 07:21 UTC, Germán Poo-Caamaño
Details | Splinter Review
Screenhost that shows the difference between cairo and a list of rectangles (312.91 KB, image/png)
2013-10-29 07:29 UTC, Germán Poo-Caamaño
Details
glib: Add PopplerQuadrilateral boxed type (5.12 KB, patch)
2013-11-19 00:29 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (15.45 KB, patch)
2013-11-20 06:18 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Add support for PopplerTextAnnotMarkup (8.07 KB, patch)
2013-11-20 06:20 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (15.38 KB, patch)
2013-11-20 18:54 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Add support for PopplerTextAnnotMarkup (8.31 KB, patch)
2013-11-20 18:56 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (21.99 KB, patch)
2013-11-20 19:56 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Add support for PopplerTextAnnotMarkup (9.88 KB, patch)
2013-11-20 19:58 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib: Add PopplerAnnotTextMarkup class and subtypes (21.94 KB, patch)
2013-11-21 08:01 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo using poppler_page_get_text_layout_for_area() (2.57 KB, patch)
2013-11-28 08:37 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo using poppler_page_get_text_layout_for_area() (10.84 KB, patch)
2014-01-02 18:58 UTC, Germán Poo-Caamaño
Details | Splinter Review
glib-demo: Fix performance in text markup annotations (2.23 KB, patch)
2014-01-30 22:49 UTC, Germán Poo-Caamaño
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Jose Aliste 2012-06-27 15:58:00 UTC
Please add a PopplerTextMarkup gobject that wraps around AnnotTextMarkup objects on the core poppler.
Comment 1 Jose Aliste 2012-06-27 15:59:13 UTC
Created attachment 63536 [details] [review]
1/2 Patch: Add a wrapper for AnnotQuadriliterals.
Comment 2 Jose Aliste 2012-06-27 15:59:43 UTC
Created attachment 63537 [details] [review]
2/2 Patch: Add PopplerAnnotTextMarkup
Comment 3 Jose Aliste 2012-06-27 16:03:37 UTC
Carlos, the first pacth should easy to review as it is typical code for a boxed type. For the second patch, could you review the API for poppler_annot_text_markup_set_quads and get_quads. I added a basic implementation of get_quads using one API, but I am not sure this is what we want. I'll fix this once we fix the signatures of these funcs. Thanks.
Comment 4 Carlos Garcia Campos 2012-06-28 09:10:58 UTC
Comment on attachment 63536 [details] [review]
1/2 Patch: Add a wrapper for AnnotQuadriliterals.

Review of attachment 63536 [details] [review]:
-----------------------------------------------------------------

Looks good in general.

::: glib/poppler-annot.cc
@@ +184,5 @@
> +                           poppler_annot_quadrilateral_copy,
> +                           poppler_annot_quadrilateral_free)
> +
> +/**
> + * poppler_rectangle_new:

This should be poppler_annot_quadrilateral_new

@@ +186,5 @@
> +
> +/**
> + * poppler_rectangle_new:
> + *
> + * Creates a new #PopplerAnnotQuadrilateral

Add a comment that this should be freed with poppler_annot_quadrilateral_free()

@@ +188,5 @@
> + * poppler_rectangle_new:
> + *
> + * Creates a new #PopplerAnnotQuadrilateral
> + *
> + * Returns: a new #PopplerAnntoQuadrilateral, use poppler_annot_quadrilateral_free() to free it

Add Since: 0.22 tag

@@ +200,5 @@
> +/**
> + * poppler_annot_quadrilateral_copy:
> + * @quad: a #PopplerAnnotQuadrilateral to copy
> + *
> + * Creates a copy of @quad

Add a comment that the returned pointer should be freed with poppler_annot_quadrilateral_free()

@@ +202,5 @@
> + * @quad: a #PopplerAnnotQuadrilateral to copy
> + *
> + * Creates a copy of @quad
> + *
> + * Returns: a new allocated copy of @quadrilateral

parameter is quad, not quadrilateral. Add since tag here too.

::: glib/poppler-annot.h
@@ +186,5 @@
> + *  @x4: x coordinate of the fourth vertex
> + *  @y4: y coordinate of the fourth vertex
> + *  
> + *  A #PopplerAnnotQuadrilateral is used to describe
> + *  locations of annotations on a page.

Since: 0.22

@@ +198,5 @@
> +  gdouble x3;
> +  gdouble y3;
> +  gdouble x4;
> +  gdouble y4;
> +

Extra line here.
Comment 5 Carlos Garcia Campos 2012-06-28 09:26:38 UTC
Comment on attachment 63537 [details] [review]
2/2 Patch: Add PopplerAnnotTextMarkup

Review of attachment 63537 [details] [review]:
-----------------------------------------------------------------

Please, split this patch, moving the demo part into its own patch.

::: glib/poppler-annot.cc
@@ +1204,5 @@
> + * use poppler_annot_get_annot_type
> + *
> + **/
> +void
> +poppler_annot_text_markup_set_annot_type (PopplerAnnotTextMarkup *poppler_annot, 

This is very weird, and annotation shoulnd't be able to change their type once they are created. We have two options here, keep a single object PopplerAnnotTextMarkup and add multiple constructors (poppler_annot_text_markup_new_highlight(), etc.) or make PopplerAnnotTextMarkup an abstract class and add a class for every markup type deriving from PopplerAnnotTextMarkup.

@@ +1238,5 @@
> +    annot->setType (type);
> +}
> +
> +void
> +poppler_annot_text_markup_set_quads (PopplerAnnotTextMarkup    *poppler_annot,

The type is PopplerAnnotQuadrilateral so better name this poppler_annot_text_markup_set_quadrilaterals, even if it's a nit longer.

@@ +1249,5 @@
> +  annot = static_cast<AnnotTextMarkup *>(POPPLER_ANNOT (poppler_annot)->annot);
> +}
> +
> +PopplerAnnotQuadrilateral **
> +poppler_annot_text_markup_get_quads (PopplerAnnotTextMarkup *poppler_annot,

Ditto.
Comment 6 Carlos Garcia Campos 2012-06-28 09:27:34 UTC
Btw, you also need to add the new public symbols to glib/reference/poppler-sections.txt
Comment 7 Germán Poo-Caamaño 2013-02-01 01:32:01 UTC
Created attachment 74021 [details] [review]
Patch [1/8]: Add PopplerQuadrilateral boxed type

Updated patch with comments addressed.
Comment 8 Germán Poo-Caamaño 2013-02-01 01:33:19 UTC
Created attachment 74022 [details] [review]
Partch [2/8]: Add PopplerAnnotTextMarkup object

Updated patch. Addressed comments.
Comment 9 Germán Poo-Caamaño 2013-02-01 01:34:45 UTC
Created attachment 74023 [details] [review]
Patch [3/8]: Add support for PopplerTextAnnotMarkup to the demo

Separated patch addressing the comments
Comment 10 Germán Poo-Caamaño 2013-02-01 01:35:50 UTC
Created attachment 74024 [details] [review]
Patch [4/8]: Add PopplerAnnotTextHighlight object

Create PopplerAnnotTextHighlight as subclass of PopplerAnnotTextMarkup.
Comment 11 Germán Poo-Caamaño 2013-02-01 01:36:33 UTC
Created attachment 74025 [details] [review]
Patch [5/8]: Add PopplerAnnotTextUnderline object

Similar to highlight, but Underline.
Comment 12 Germán Poo-Caamaño 2013-02-01 01:37:30 UTC
Created attachment 74026 [details] [review]
Patch [6/8]: Add PopplerAnnotTextSquiggly object

Similar to highlight, but Squiggly.
Comment 13 Germán Poo-Caamaño 2013-02-01 01:38:08 UTC
Created attachment 74027 [details] [review]
Patch [7/8]: Add PopplerAnnotTextStrikeOut object

Similar to highlight, but StrikeOut.
Comment 14 Germán Poo-Caamaño 2013-02-01 01:39:01 UTC
Created attachment 74028 [details] [review]
Patch [8/8]: Add references in poppler-sections

Addressed comment #c6
Comment 15 Germán Poo-Caamaño 2013-02-01 01:40:34 UTC
FWIW, patches are rebased against 6eebbb9c015f98 (the last commit I can build poppler).
Comment 16 Albert Astals Cid 2013-03-01 18:35:17 UTC
Carlos?
Comment 17 Germán Poo-Caamaño 2013-03-01 19:11:39 UTC
FWIW, these patches are a work in progress.
Comment 18 Martin Spacek 2013-07-31 03:57:09 UTC
Is there anything in particular holding up these patches?
Comment 19 Germán Poo-Caamaño 2013-10-29 01:06:19 UTC
Created attachment 88261 [details] [review]
glib: Add PopplerQuadrilateral boxed type

Updates in the patch:

* Poppler's version
* Add references to poppler-sections.txt
Comment 20 Germán Poo-Caamaño 2013-10-29 01:09:32 UTC
Created attachment 88262 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Update patch that implements Text Markups annotations in poppler-glib.

This implementation is ready to be used.  For the demo, it is convenient
to fix: https://bugs.freedesktop.org/show_bug.cgi?id=69978
Comment 21 Germán Poo-Caamaño 2013-10-29 07:19:24 UTC
Created attachment 88272 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

Update patch. Fixed missing references for gtk-doc.
Comment 22 Germán Poo-Caamaño 2013-10-29 07:20:05 UTC
Created attachment 88273 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Update patch. Fixes missing references for gtk-doc.
Comment 23 Germán Poo-Caamaño 2013-10-29 07:21:57 UTC
Created attachment 88274 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

The demo requires https://bugs.freedesktop.org/show_bug.cgi?id=70901
Comment 24 Germán Poo-Caamaño 2013-10-29 07:29:36 UTC
Created attachment 88275 [details]
Screenhost that shows the difference between cairo and a list of rectangles

The demo add two implementations to set the Quadrilaterals.  Using cairo_region_t or getting a GList of rectangles.

For the GList of rectangles I use a deprecated function of poppler-glib: 
poppler_page_get_selection_region() that deprecated in favor of poppler_page_get_selected_region().

In the screenshot it is possible to see the difference of both implementations:
* With the GList we get one rectangle per line and it is rendered
  with small rectangles with small rounding ends.
* With cairo_region_t we get only 2 rectangles, making one huge rounding ends
  and one smaller.

The issue with cairo_region_t is exacerbated when using underline, squiggly
or strikeout.  Because those are rendered per rectangle. So, instead of
having every line underline or strikeout, there one big underline for the
big rectangle.

Anyhow, you can try the patch setting cairo = TRUE|FALSE inside pgd_annots_update_selected_text().
Comment 25 Carlos Garcia Campos 2013-11-03 12:24:42 UTC
Comment on attachment 88272 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

Review of attachment 88272 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-annot.cc
@@ +202,4 @@
>  {
>  }
>  
> +/* PopplerRectangle type */

Wrong comment

@@ +203,5 @@
>  }
>  
> +/* PopplerRectangle type */
> +
> +POPPLER_DEFINE_BOXED_TYPE (PopplerAnnotQuadrilateral, poppler_annot_quadrilateral,

I think this could be PopplerQuadrilateral, this is very generic boxed type even if it's only used by annots.

@@ +214,5 @@
> + * Creates a new #PopplerAnnotQuadrilateral. It must be freed with poppler_annot_quadrilateral_free() after use.
> + *
> + * Returns: a new #PopplerAnnotQuadrilateral.
> + *
> + * Since: 0.25

0.26

@@ +217,5 @@
> + *
> + * Since: 0.25
> + **/
> +PopplerAnnotQuadrilateral *
> +poppler_annot_quadrilateral_new (void)

I think the constructor should receive the points.

::: glib/poppler-annot.h
@@ +226,5 @@
> +  gdouble y2;
> +  gdouble x3;
> +  gdouble y3;
> +  gdouble x4;
> +  gdouble y4;

If we had PopplerPoint or PopplerVertex we could use it here too.
Comment 26 Carlos Garcia Campos 2013-11-03 12:36:36 UTC
Comment on attachment 88273 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Review of attachment 88273 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-annot.cc
@@ +325,5 @@
> + * located on @rect when added to a page. See poppler_page_add_annot()
> + *
> + * Return value: A newly created #PopplerAnnotTextMarkup annotation
> + *
> + * Since: 0.25

0.26

@@ +329,5 @@
> + * Since: 0.25
> + */
> +PopplerAnnot *
> +poppler_annot_text_markup_new_highlight (PopplerDocument  *doc,
> +					 PopplerRectangle *rect)

quad points is required field in text markup annot dict, so I think the constructors should receive the quadrilaterals.

@@ +1536,5 @@
> + * Since: 0.25
> + **/
> +void
> +poppler_annot_text_markup_set_quadrilaterals (PopplerAnnotTextMarkup *poppler_annot,
> +                                              GList *quad_list)

The length of the quad list is usually known, so I think we could use an array instead of a GList. This way the PopplerQuadrilateral structs can be stack allocated.

@@ +1549,5 @@
> +  g_return_if_fail (POPPLER_IS_ANNOT_TEXT_MARKUP (poppler_annot));
> +
> +  annot = static_cast<AnnotTextMarkup *>(POPPLER_ANNOT (poppler_annot)->annot);
> +
> +  quads = (AnnotQuadrilaterals::AnnotQuadrilateral **) g_new0 (

This will be freed by AnnotQuadrilaterals destructor with gfree, you should use gmalloc intead.

@@ +1594,5 @@
> +
> +  annot = static_cast<AnnotTextMarkup *>(POPPLER_ANNOT (poppler_annot)->annot);
> +
> +  back_quads = annot->getQuadrilaterals();
> +  length_quads = back_quads->getQuadrilateralsLength();

Same here, the size is known, we can return an array instead of a GList.
Comment 27 Carlos Garcia Campos 2013-11-03 12:40:35 UTC
(In reply to comment #24)
> Created attachment 88275 [details]
> Screenghost that shows the difference between cairo and a list of rectangles
> 
> The demo add two implementations to set the Quadrilaterals.  Using
> cairo_region_t or getting a GList of rectangles.
> 
> For the GList of rectangles I use a deprecated function of poppler-glib: 
> poppler_page_get_selection_region() that deprecated in favor of
> poppler_page_get_selected_region().

Maye text layout would be more appropriate than selected_region. We could add poppler_page_get_text_layout_for_rectangle as we discussed in the mailing list, so that you can pass the bbox of the annotations, and then you can build the list of lines. Or even a more convenient method that returns the list of lines instad of characters.

> In the screenshot it is possible to see the difference of both
> implementations:
> * With the GList we get one rectangle per line and it is rendered
>   with small rectangles with small rounding ends.
> * With cairo_region_t we get only 2 rectangles, making one huge rounding ends
>   and one smaller.

Yes, it's clear we really want the list of rectangles.

> The issue with cairo_region_t is exacerbated when using underline, squiggly
> or strikeout.  Because those are rendered per rectangle. So, instead of
> having every line underline or strikeout, there one big underline for the
> big rectangle.
> 
> Anyhow, you can try the patch setting cairo = TRUE|FALSE inside
> pgd_annots_update_selected_text().
Comment 28 Germán Poo-Caamaño 2013-11-19 00:24:57 UTC
(In reply to comment #25)
> Comment on attachment 88272 [details] [review] [review]
> glib-demo: Add support for PopplerTextAnnotMarkup
> 
> Review of attachment 88272 [details] [review] [review]:
> -----------------------------------------------------------------
> [...]
> @@ +217,5 @@
> > + *
> > + * Since: 0.25
> > + **/
> > +PopplerAnnotQuadrilateral *
> > +poppler_annot_quadrilateral_new (void)
> 
> I think the constructor should receive the points.

Shouldn't that be inconsistent with other similar boxed types such as PopplerPoint, PopplerRectangle, PopplerColor?
Comment 29 Germán Poo-Caamaño 2013-11-19 00:29:59 UTC
Created attachment 89435 [details] [review]
glib: Add PopplerQuadrilateral boxed type

Updated the patch addressing the comments, except the one I mentioned in #c28
Comment 30 Carlos Garcia Campos 2013-11-19 17:56:12 UTC
(In reply to comment #28)
> (In reply to comment #25)
> > Comment on attachment 88272 [details] [review] [review] [review]
> > glib-demo: Add support for PopplerTextAnnotMarkup
> > 
> > Review of attachment 88272 [details] [review] [review] [review]:
> > -----------------------------------------------------------------
> > [...]
> > @@ +217,5 @@
> > > + *
> > > + * Since: 0.25
> > > + **/
> > > +PopplerAnnotQuadrilateral *
> > > +poppler_annot_quadrilateral_new (void)
> > 
> > I think the constructor should receive the points.
> 
> Shouldn't that be inconsistent with other similar boxed types such as
> PopplerPoint, PopplerRectangle, PopplerColor?

Good point.
Comment 31 Germán Poo-Caamaño 2013-11-20 06:18:26 UTC
Created attachment 89510 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Updated patch.

Now it uses a GArray instead of a GList, and the memory is pre-allocated.

It is missing to receive the quadrilaterals in the constructors.
Comment 32 Germán Poo-Caamaño 2013-11-20 06:20:32 UTC
Created attachment 89511 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

Updated patch for the demo to make it work with GArrays.

Still using the deprecated method.  Still is pending something like poppler_page_get_text_layout_for_rectangle.
Comment 33 Germán Poo-Caamaño 2013-11-20 18:42:42 UTC
(In reply to comment #26)
> Comment on attachment 88273 [details] [review] [review]
> glib: Add PopplerAnnotTextMarkup class and subtypes
> [...]
> @@ +329,5 @@
> > + * Since: 0.25
> > + */
> > +PopplerAnnot *
> > +poppler_annot_text_markup_new_highlight (PopplerDocument  *doc,
> > +					 PopplerRectangle *rect)
> 
> quad points is required field in text markup annot dict, so I think the
> constructors should receive the quadrilaterals.

While I was implementing this, I noticed that the constructor in Annot.c
adds a dummy quadrilateral.

http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Annot.cc#n3560

I think a good compromise would be to make it clear in the API
documentation that poppler_annot_text_markup_new_highlight adds a dummy
Quadrilateral, and to call poppler_annot_text_markup_set_quadrilaterals()
for changing them. That is, no requirement for quadrilateral in the
constructor (it already provides a dummy one).

My rationale is: if the constructor require quadrilaterals, in an
interactive use (like the proposed in the demo), we would create
a dummy quadrilateral taken from the rectangle.  For a single click,
it would be (0,0)(0,0), which is the same than the dummy quadrilateral
already created in AnnotTextMarkup::AnnotTextMarkup.

I could do it either way, but I wanted to bring the alternatives we have available before to proceed.
Comment 34 Germán Poo-Caamaño 2013-11-20 18:54:57 UTC
Created attachment 89541 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Updated patch. This fixes a leak on the previous patch.
Comment 35 Germán Poo-Caamaño 2013-11-20 18:56:01 UTC
Created attachment 89542 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

Updated patch. This also fixes a leak on the previous patch.
Comment 36 Germán Poo-Caamaño 2013-11-20 19:56:47 UTC
Created attachment 89545 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Updated patch.

I decided to go ahead and implemented both constructors:
* poppler_annot_text_markup_new_highlight
* poppler_annot_text_markup_new_highlight_with_quadrilaterals

per each subtype (highlight, squiggly, underline and strikeout).

I added a note in the former regarding to the dummy quadrilateral.
Comment 37 Germán Poo-Caamaño 2013-11-20 19:58:23 UTC
Created attachment 89546 [details] [review]
glib-demo: Add support for PopplerTextAnnotMarkup

Updated patch.

This used both: poppler_annot_text_markup_new_foo and
poppler_annot_text_markup_new_highlight_with_quadrilaterals. So, it can be seen how to use them.

I would make them all the same in the final patch, but you can get the idea.
Comment 38 Germán Poo-Caamaño 2013-11-21 08:01:37 UTC
Created attachment 89566 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Updated patch (fixes a declaration mistake introduced in the previous patch).
Comment 39 Germán Poo-Caamaño 2013-11-28 08:37:42 UTC
Created attachment 89945 [details] [review]
glib-demo using poppler_page_get_text_layout_for_area()

This is a patch that uses poppler_page_get_text_layout_for_area(). However, it is notoriously slower than using poppler_page_get_selection_region().
Comment 40 Germán Poo-Caamaño 2014-01-02 18:58:03 UTC
Created attachment 91431 [details] [review]
glib-demo using poppler_page_get_text_layout_for_area()

Updated patch, that used poppler_page_get_text_layout_for_area() and combines the rectangles that are in the same line.
Comment 41 Carlos Garcia Campos 2014-01-19 14:04:24 UTC
(In reply to comment #33)
> (In reply to comment #26)
> > Comment on attachment 88273 [details] [review] [review] [review]
> > glib: Add PopplerAnnotTextMarkup class and subtypes
> > [...]
> > @@ +329,5 @@
> > > + * Since: 0.25
> > > + */
> > > +PopplerAnnot *
> > > +poppler_annot_text_markup_new_highlight (PopplerDocument  *doc,
> > > +					 PopplerRectangle *rect)
> > 
> > quad points is required field in text markup annot dict, so I think the
> > constructors should receive the quadrilaterals.
> 
> While I was implementing this, I noticed that the constructor in Annot.c
> adds a dummy quadrilateral.
> 
> http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Annot.cc#n3560
> 
> I think a good compromise would be to make it clear in the API
> documentation that poppler_annot_text_markup_new_highlight adds a dummy
> Quadrilateral, and to call poppler_annot_text_markup_set_quadrilaterals()
> for changing them. That is, no requirement for quadrilateral in the
> constructor (it already provides a dummy one).
> 
> My rationale is: if the constructor require quadrilaterals, in an
> interactive use (like the proposed in the demo), we would create
> a dummy quadrilateral taken from the rectangle.  For a single click,
> it would be (0,0)(0,0), which is the same than the dummy quadrilateral
> already created in AnnotTextMarkup::AnnotTextMarkup.
> 
> I could do it either way, but I wanted to bring the alternatives we have
> available before to proceed.

I think the dummy thing was actually a hack introduced in 4b13085568df09d8b75099f6a5438f025a028fd5 that we should not expose in our API. I think we should require as parameters of the constructors all values that are required by the PDF spec, even if we internally do a new + set. This means we should change the PopplerAnnotLine annotation.
Comment 42 Carlos Garcia Campos 2014-01-19 15:23:33 UTC
Comment on attachment 89566 [details] [review]
glib: Add PopplerAnnotTextMarkup class and subtypes

Review of attachment 89566 [details] [review]:
-----------------------------------------------------------------

Looks good. I've just pushed it with the modifications I suggested. Thanks!

::: glib/poppler-annot.cc
@@ +276,5 @@
> +                                sizeof (AnnotQuadrilaterals::AnnotQuadrilateral *),
> +                                quadrilaterals->len);
> +
> +  for (i = 0; i < quadrilaterals->len; i++) {
> +    quadrilateral = g_array_index (quadrilaterals, PopplerQuadrilateral, i);

You can use &g_array_index (quadrilaterals, PopplerQuadrilateral, i) to get the pinter and avoid copying the struct.

@@ +286,5 @@
> +                              quadrilateral.p4.x, quadrilateral.p4.y);
> +  }
> +
> +  back_quads = new AnnotQuadrilaterals (quads_array, i);
> +  annot->setQuadrilaterals (back_quads);

back_quads is leaked here, because setQuadrilaterals creates a new AnnotQuadrilaterals for the given one.

@@ +289,5 @@
> +  back_quads = new AnnotQuadrilaterals (quads_array, i);
> +  annot->setQuadrilaterals (back_quads);
> +
> +  return _poppler_create_annot (POPPLER_TYPE_ANNOT_TEXT_MARKUP, annot);
> +}

The content of this function is mostly duplicated in poppler_annot_text_markup_set_quadrilaterals, it could be factored out. We could add create_annot_quads_from_poppler_quads and create_poppler_quads_from_annot_quads

@@ +330,5 @@
> +  return _poppler_annot_text_markup_new (annot);
> +}
> +
> +/**
> + * poppler_annot_text_markup_new_highlight_with_quadrilaterals:

As I commented already, I think we should enforce text markup annot to be created with quads.

@@ +352,5 @@
> +  AnnotTextMarkup *annot;
> +  PDFRectangle pdf_rect(rect->x1, rect->y1,
> +			rect->x2, rect->y2);
> +
> +  g_return_val_if_fail (quadrilaterals != NULL, NULL);

You should aso check that the array is not empty.

@@ +1748,5 @@
> +    quadrilateral.p3.y = back_quads->getY3(i);
> +    quadrilateral.p4.x = back_quads->getX4(i);
> +    quadrilateral.p4.y = back_quads->getY4(i);
> +
> +    g_array_append_val (quads_array, quadrilateral);

We can avoid the memmove dancing here. Since we are creating a sized array and we know the size is not going to change, we can set the size right after creating the array and access its contents with g_array_index to fill the already allocated PopplerQuadrilateral directly.
Comment 43 Carlos Garcia Campos 2014-01-19 15:23:50 UTC
Comment on attachment 89435 [details] [review]
glib: Add PopplerQuadrilateral boxed type

Review of attachment 89435 [details] [review]:
-----------------------------------------------------------------

Pushed to git master, thanks!
Comment 44 Carlos Garcia Campos 2014-01-19 15:28:01 UTC
Comment on attachment 91431 [details] [review]
glib-demo using poppler_page_get_text_layout_for_area()

Review of attachment 91431 [details] [review]:
-----------------------------------------------------------------

I've pushed this one as well, adapted to the API changes introduced in previous commit and with some other minor modifications, see below.

::: glib/demo/annots.c
@@ +890,4 @@
>      rect.x2 = demo->stop.x;
>      rect.y2 = height - demo->stop.y;
>  
> +    quads_array = g_array_sized_new (FALSE, FALSE,

We should create this only when creating text markup annots.

@@ +900,5 @@
> +    quad.p2.y = rect.y1;
> +    quad.p3.x = rect.x1;
> +    quad.p3.y = rect.y2;
> +    quad.p4.x = rect.x2;
> +    quad.p4.y = rect.y2;

This could be moved to a helper function, since it's also used when updating the quads.

@@ +902,5 @@
> +    quad.p3.y = rect.y2;
> +    quad.p4.x = rect.x2;
> +    quad.p4.y = rect.y2;
> +
> +    g_array_append_val (quads_array, quad);

We can avoid the memory moves in this case too.

@@ +992,5 @@
> +    if (! poppler_page_get_text_layout_for_area (demo->page, &doc_area,
> +                                                 &rects, &n_rects))
> +        return;
> +
> +    r = g_new0 (PopplerRectangle, 1);

We can use g_slice here.

@@ +1004,5 @@
> +           the same line */
> +        if (ABS(r->y2 - rects[i].y2) > 0.0001) {
> +            if (i > 0)
> +                l_rects = g_list_append (l_rects, r);
> +            r = g_new0 (PopplerRectangle, 1);

Ditto.

@@ +1037,5 @@
> +        quadrilateral.p3.y = height - r->y2;
> +        quadrilateral.p4.x = r->x2;
> +        quadrilateral.p4.y = height - r->y2;
> +
> +        g_array_append_val (quads_array, quadrilateral);

We can also avoid the memory copies here.

@@ +1038,5 @@
> +        quadrilateral.p4.x = r->x2;
> +        quadrilateral.p4.y = height - r->y2;
> +
> +        g_array_append_val (quads_array, quadrilateral);
> +    }

Since we are iterating the list, we can free the rectangles, so that we don't need to iterate it again to free the contents in g_list_free_full
Comment 45 Germán Poo-Caamaño 2014-01-30 18:38:56 UTC
I am reopening this bug because in master the quadrilaterals are not
being combined for glyphs in the same area/rectangle.  For example,
with master highlighting 3 lines returns me 199 quadrilalerals instead
of 3.
Comment 46 Germán Poo-Caamaño 2014-01-30 22:49:55 UTC
Created attachment 93092 [details] [review]
glib-demo: Fix performance in text markup annotations

The performance issue is in the following snippet:

     if (ABS(r->y2 - rects[i].y2) > 0.0001) {
        if (i > 0)
             l_rects = g_list_append (l_rects, r);
         r = g_slice_new (PopplerRectangle);
         r->x1 = rects[i].x1;
         r->y1 = height - rects[i].y1;
         r->x2 = rects[i].x2;
         r->y2 = height - rects[i].y2;
         lines++;
     } else {
         r->x1 = MIN(r->x1, rects[i].x1);
         r->y1 = height - MIN(r->y1, rects[i].y1);
         r->x2 = MAX(r->x2, rects[i].x2);
         r->y2 = height - MAX(r->y2, rects[i].y2);
     }

The condition is never FALSE, r->y2 is never close to
rects[i].y2 because r->y2 is now relative to the height.
Therefore, this makes the text markup slow because it
creates a rectangle per glyph (in my test 199 glyphs in
3 lines).
 
To make it work, we would need something like:

    if (ABS(r->y2 - (height - rects[i].y2)) > 0.0001) {
        if (i > 0)
            l_rects = g_list_append (l_rects, r);
        r = g_slice_new (PopplerRectangle);
        r->x1 = rects[i].x1;
        r->y1 = height - rects[i].y1;
        r->x2 = rects[i].x2;
        r->y2 = height - rects[i].y2;
        lines++;
    } else {
        r->x1 = MIN(r->x1, rects[i].x1);
        r->y1 = MIN(r->y1, height - rects[i].y1 + 14);
        r->x2 = MAX(r->x2, rects[i].x2);
        r->y2 = MAX(r->y2, height - rects[i].y2);
    }

Do you notice the magic 14?  I suspect that that number is
the line height.  Without that number, the quadrilaterals
ends with something that resembles rectangle coordinates
instead of quadrilateral ones, for example:

P1: 114.0, 114.0    P2: 292.0, 553.0
P3: 114.0, 559.0    P4: 292.0, 559.0

instead of the correct one:

P1: 114.0, 114.0    P2: 292.0, 567.0 **
P3: 114.0, 559.0    P4: 292.0, 559.0

The difference between 567 and 553 is 14, the line height
(font size?) in my test.

If there were such API (which I don't know) we could even
set an extra pixel on both top and bottom, that would make
the highlight looks nicer (IMHO). However, the use of API
would become even more complicated than it should be.

I still think that a solution in the line of what
poppler_page_get_selection_region provided would be better.
Because it would make its use simpler, and we could leave
poppler_page_get_text_layout_for_area for special situations.
Comment 47 Carlos Garcia Campos 2014-02-09 14:01:16 UTC
Pushed to git master, thanks!
Comment 48 Jose Aliste 2014-08-03 10:45:49 UTC
What's the problem with using get_selection_region? I was even thinking on adding a poppler_markup_annot_set_to_selection (annot, rectangle) that would internally 
add the quadrilaterals according to the selection given by the rectangle, using a method similar to get_selection_region inside... This would save us from all the Quads->PopplerQuads->EvQuads conversions while constructing the annotation during drag and drop. 

Otherwise, I believe that poppler_page_get_selection_region is exactly the method we should use. I even argue for having a method poppler_page_get_quadrilaterals_from_region or something like that that already constructs the array of quadrilaterals... 

The fact that the rectangles associated to lines overlap is what makes the cairo_region_t method useless (cairo_region_t is optimized for clipping and drawing, so when you make a union_rectangle, it basically merges rectangles when it can). I think that is is fairly trivial to make the rectangles not to overlap, we need to override TextSelectionSizer::visitLine.

Currently it does something like this

  margin = (line->yMax - line->yMin) / 8;
  x1 = line->edge[edge_begin];
  y1 = line->yMin - margin;
  x2 = line->edge[edge_end];
  y2 = line->yMax + margin;

  rect = new PDFRectangle (floor (x1 * scale),
                           floor (y1 * scale),
                           ceil (x2 * scale),
                           ceil (y2 * scale));
so you clearly see that it's adding a margin to the rectangles. We could just override the method in a new class and use this class in the poppler_page_quadrilaterals_from_selection or in the poppler_anno_set_quadrilaterals_from_selection
Comment 49 Carlos Garcia Campos 2014-08-03 11:24:13 UTC
(In reply to comment #48)
> What's the problem with using get_selection_region?

The only problem is that it's deprecated.

> I was even thinking on
> adding a poppler_markup_annot_set_to_selection (annot, rectangle) that would
> internally 
> add the quadrilaterals according to the selection given by the rectangle,
> using a method similar to get_selection_region inside... This would save us
> from all the Quads->PopplerQuads->EvQuads conversions while constructing the
> annotation during drag and drop. 

text markup annotations already take a GArray as constructor parameter.

> Otherwise, I believe that poppler_page_get_selection_region is exactly the
> method we should use. I even argue for having a method
> poppler_page_get_quadrilaterals_from_region or something like that that
> already constructs the array of quadrilaterals... 

Having a new dedicated method for this sounds good to me. I would use something like

GArray *poppler_page_get_quadrilaterals_for_area (PopplerPage *page, PopplerRectangle *area);

because from_region() sounds to me like if the function would receive a cairo_region_t as parameter. 
We should also document that this doesn't return lines, but blocks.

> The fact that the rectangles associated to lines overlap is what makes the
> cairo_region_t method useless (cairo_region_t is optimized for clipping and
> drawing, so when you make a union_rectangle, it basically merges rectangles
> when it can). I think that is is fairly trivial to make the rectangles not
> to overlap, we need to override TextSelectionSizer::visitLine.
> 
> Currently it does something like this
> 
>   margin = (line->yMax - line->yMin) / 8;
>   x1 = line->edge[edge_begin];
>   y1 = line->yMin - margin;
>   x2 = line->edge[edge_end];
>   y2 = line->yMax + margin;
> 
>   rect = new PDFRectangle (floor (x1 * scale),
>                            floor (y1 * scale),
>                            ceil (x2 * scale),
>                            ceil (y2 * scale));
> so you clearly see that it's adding a margin to the rectangles. We could
> just override the method in a new class and use this class in the
> poppler_page_quadrilaterals_from_selection or in the
> poppler_anno_set_quadrilaterals_from_selection

I think we still want some margin for some of the text markup annotation, at least for highlight, no? TextSelectionSizer is only used by getSelectionRegion. Instead of adding a new derived class, we can just add another construct parameter, passed also to getSelectionRegion, to decide whether to use a margin or not.
Comment 50 Carlos Garcia Campos 2014-08-03 11:24:54 UTC
Btw, could you file a new bug report to discuss this instead of using this closed bug?