Bug 64821 - [TAGGEDPDF] Expose the structure tree and attributes in poppler-glib
Summary: [TAGGEDPDF] Expose the structure tree and attributes in poppler-glib
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: glib frontend (show other bugs)
Version: unspecified
Hardware: All All
: medium enhancement
Assignee: Adrian Perez de Castro
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 64815
Blocks: tagged-pdf 67710 74753
  Show dependency treegraph
 
Reported: 2013-05-21 09:51 UTC by Adrian Perez de Castro
Modified: 2014-06-17 17:13 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:


Attachments
[PATCH 5/6] Tagged-PDF: Expose the structure tree in poppler-glib (85.91 KB, patch)
2013-05-29 23:53 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure (10.66 KB, patch)
2013-05-29 23:54 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v3 5/6] Tagged-PDF: Expose the structure tree in poppler-glib (99.49 KB, patch)
2013-06-06 14:49 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v3 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure (12.25 KB, patch)
2013-06-06 14:53 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v4 5/6] Tagged-PDF: Expose the structure tree in poppler-glib (103.13 KB, patch)
2013-06-12 16:10 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v4 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure (12.23 KB, patch)
2013-06-12 16:13 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v5 08/10] Tagged-PDF: Expose the structure tree in poppler-glib (103.40 KB, patch)
2013-06-18 16:08 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v5 09/10] Tagged-PDF: Pane in poppler-glib demo showing the structure (12.23 KB, patch)
2013-06-18 16:08 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v5 10/10] Tagged-PDF: Heuristics in poppler-glib for data/layout table identification (5.17 KB, patch)
2013-06-18 16:09 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v7 7/9] Tagged-PDF: Expose the structure tree in poppler-glib (103.41 KB, patch)
2013-08-01 13:48 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v7 8/9] Tagged-PDF: Pane in poppler-glib demo showing the structure (12.23 KB, patch)
2013-08-01 13:48 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v7 9/9] Tagged-PDF: Heuristics in poppler-glib for data/layout table identification (5.17 KB, patch)
2013-08-01 13:49 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 08/15] glib: Expose the document structure tree (36.32 KB, patch)
2013-09-26 18:47 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 09/15] glib: Private function _poppler_link_mapping_new_from_annot_link() (5.23 KB, patch)
2013-09-26 18:48 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 10/15] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2013-09-26 18:49 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 11/15] glib: Expose inline attributes of structure elements (9.70 KB, patch)
2013-09-26 18:50 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 12/15] glib: Accessors for form fields of structure elements (4.62 KB, patch)
2013-09-26 18:51 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 13/15] glib: Accessors for document structure references (7.45 KB, patch)
2013-09-26 18:51 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 14/15] glib: Implement accessors for element attributes (39.80 KB, patch)
2013-09-26 18:52 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v8 15/15] glib-demo: Pane showing the document structure (12.16 KB, patch)
2013-09-26 18:52 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 05/12] glib: Expose the document structure tree (33.53 KB, patch)
2013-11-11 20:36 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 06/12] glib: Private function _poppler_link_mapping_new_from_annot_link() (5.24 KB, patch)
2013-11-11 20:37 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 07/12] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2013-11-11 20:38 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 08/12] glib: Expose inline attributes of structure elements (14.66 KB, patch)
2013-11-11 20:38 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 09/12] glib: Accessors for form fields of structure elements (4.67 KB, patch)
2013-11-11 20:39 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 10/12] glib: Accessors for document structure references (7.40 KB, patch)
2013-11-11 20:39 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 11/12] glib: Implement accessors for element attributes (39.89 KB, patch)
2013-11-11 20:41 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v10 12/12] glib-demo: Pane showing the document structure (12.17 KB, patch)
2013-11-11 20:42 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v11 07/11] glib: Expose inline attributes of structure elements (13.20 KB, patch)
2013-11-27 20:41 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 04/11] glib: Expose the document structure tree (33.52 KB, patch)
2013-12-04 17:38 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 05/11] glib: Private function _poppler_link_mapping_new_from_annot_link() (5.22 KB, patch)
2013-12-04 17:40 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 06/11] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2013-12-04 17:41 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 07/11] glib: Expose inline attributes of structure elements (13.31 KB, patch)
2013-12-04 17:41 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 08/11] glib: Accessors for form fields of structure elements (4.66 KB, patch)
2013-12-04 17:42 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 09/11] glib: Accessors for document structure references (4.66 KB, patch)
2013-12-04 17:42 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 10/11] glib: Implement accessors for element attributes (39.88 KB, patch)
2013-12-04 17:43 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v12 11/11] glib-demo: Pane showing the document structure (12.17 KB, patch)
2013-12-04 17:44 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v14 09/10] glib: Implement accessors for element attributes (110.88 KB, patch)
2013-12-17 01:44 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v15 8/9] glib: Implement accessors for element attributes (111.33 KB, patch)
2014-02-13 17:15 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v16 8/9] glib: Implement accessors for element attributes (111.33 KB, patch)
2014-02-13 18:55 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 1/4] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2014-02-28 17:25 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 1/5] glib: Private function _poppler_link_mapping_new_from_annot_link() (5.22 KB, patch)
2014-02-28 17:27 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 2/5] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2014-02-28 17:27 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 3/5] glib: Accessors for form fields of structure elements (4.68 KB, patch)
2014-02-28 17:28 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 4/5] glib: Accessors for document structure references (7.91 KB, patch)
2014-02-28 17:29 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v17 5/5] glib: Implement accessors for element attributes (59.62 KB, patch)
2014-02-28 17:30 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v18 1/5] glib: Private function _poppler_link_mapping_new_from_annot_link() (5.22 KB, patch)
2014-03-03 00:19 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v18 2/5] glib: Private function _poppler_link_mapping_new_from_form_field() (3.29 KB, patch)
2014-03-03 00:20 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v18 3/5] glib: Accessors for form fields of structure elements (6.91 KB, patch)
2014-03-03 00:21 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v18 4/5] glib: Methods to obtain link information from structure elements (9.68 KB, patch)
2014-03-03 00:22 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v18 5/5] glib: Implement accessors for element attributes (65.79 KB, patch)
2014-03-03 00:22 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v19 3/5] glib: Implement accessors for element attributes (67.68 KB, patch)
2014-03-03 15:14 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v19 4/5] glib: Accessors for form fields of structure elements (8.38 KB, patch)
2014-03-03 15:14 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v19 5/5] glib: Methods to obtain link information from structure elements (13.66 KB, patch)
2014-03-03 15:15 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v20 4/5] glib: Accessors for form fields of structure elements (7.59 KB, patch)
2014-03-05 09:26 UTC, Adrian Perez de Castro
Details | Splinter Review
[PATCH v20 5/5] glib: Methods to obtain link information from structure elements (13.00 KB, patch)
2014-03-05 09:27 UTC, Adrian Perez de Castro
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Adrian Perez de Castro 2013-05-21 09:51:55 UTC
There are a number of bugs and feature requests in Evince, the GNOME document
viewer, which would benefit from exposing the Tagged-PDF structure of the
document, hence once support for reading it in Poppler is added (bug #64815)
it would be needed to expose it in poppler-glib.

Having the structure tree would enable for tackling (at least the following)
in Evince and other viewers using poppler-glib:

- Greatly improve accessibility support.
- Following logical focus order for form fields.
- Better support for reading multi-column documents
  (https://bugzilla.gnome.org/show_bug.cgi?id=166830)
- Reflow for small screens/windows
  (https://bugzilla.gnome.org/show_bug.cgi?id=556018)
Comment 1 Adrian Perez de Castro 2013-05-29 23:53:58 UTC
Created attachment 79993 [details] [review]
[PATCH 5/6] Tagged-PDF: Expose the structure tree in poppler-glib

The attached patch needs the other patches attached to bug #64815.
Comment 2 Adrian Perez de Castro 2013-05-29 23:54:39 UTC
Created attachment 79994 [details] [review]
[PATCH 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure

The attached patch needs the other patches attached to bug #64815.
Comment 3 Adrian Perez de Castro 2013-06-06 14:49:41 UTC
Created attachment 80416 [details] [review]
[PATCH v3 5/6] Tagged-PDF: Expose the structure tree in poppler-glib

Attached updated version of the 5/6 patch, with the following additions
on top of the previous version:

* Changes and API additions to handle object reference structure elements:

  - poppler_structure_element_is_reference()
  - poppler_structure_element_get_reference_type()

* API additions to get PopplerLinkMapping structures from object reference
  structure elements:

  - poppler_structure_element_get_reference_link()

  ...and to search for a the reference and obtain the PopplerLinkMapping
  from a POPPLER_STRUCTURE_ELEMENT_LINK element:

  - poppler_structure_element_find_link()

* New poppler_structure_element_get_page() function. Obtains the number of
  the page with the content described by the structure element.

* New poppler_structure_element_get_id() function. Returns the identifier
  of a structure element (or NULL if not defined).

* New poppler_structure_element_get_title() function: Returns the title
  of a structure element (or NULL if not defined).

* New popppler_structure_element_get_abbreviation() function: for
  POPPLER_STRUCTURE_ELEMENT_SPAN elements which contain an abbreviation,
  the function returns the expanded form of the abbreviation (or NULL
  if not defined or the element is not an abbreviation).

* New poppler_structure_element_get_alt_text() function: Returns the
  alternate text for an elemement (or NULL if not defined).

* New poppler_structure_element_get_actual_text() function: Returns
  the actual text (textual representation of a text-like graphic element,
  returns NULL if not defined).

* Function poppler_structure_element_get_language() does no longer have
  an argument to specify whether it should find the language by looking
  up recursively in the structure tree. According to the PDF spec, the
  language must always to be inherited from parent elements.

My plan is to update this patch further to add new functions to obtain
form fields from the structure tree, in a similar way in which the link
mappings are obtained, tentatively I would be adding:

* Definition of POPPLER_STRUCTURE_REFERENCE_FORM_FIELD.

* poppler_structure_element_get_reference_form_field(), to be used in an
  object reference structure element, returning a PopplerFormFieldMapping*.

* poppler_structure_element_find_form_field(), to be used in an element
  of type POPPLER_STRUCTURE_FORM, returning a PopplerFormFieldMapping*.
Comment 4 Adrian Perez de Castro 2013-06-06 14:53:31 UTC
Created attachment 80420 [details] [review]
[PATCH v3 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure

Also updated patch for the poppler-glib demo program.
Comment 5 Adrian Perez de Castro 2013-06-12 16:10:24 UTC
Created attachment 80732 [details] [review]
[PATCH v4 5/6] Tagged-PDF: Expose the structure tree in poppler-glib

Update with two new functions to obtain form field objects from structure
elements of type POPPLER_STRUCTURE_ELEMENT_FORM:

- poppler_structure_element_get_form_field()
- poppler_structure_element_get_form_field_mapping()
Comment 6 Adrian Perez de Castro 2013-06-12 16:13:02 UTC
Created attachment 80733 [details] [review]
[PATCH v4 6/6] Tagged-PDF: Pane in poppler-glib demo showing the structure

Updated patch for “poppler-glib-demo”, which a fix to avoid passing a
NULL string to gtk_text_buffer_set_text()
Comment 7 Carlos Garcia Campos 2013-06-16 10:27:12 UTC
Comment on attachment 80732 [details] [review]
[PATCH v4 5/6] Tagged-PDF: Expose the structure tree in poppler-glib

I guess this was marked as obsolete by mistake. I'll assume this is the right one.
Comment 8 Carlos Garcia Campos 2013-06-16 11:44:19 UTC
Comment on attachment 80732 [details] [review]
[PATCH v4 5/6] Tagged-PDF: Expose the structure tree in poppler-glib

Review of attachment 80732 [details] [review]:
-----------------------------------------------------------------

We are very later in the release cycle, and I have a lot of doubts about the API, so I think it'll be better to wait for the next cycle to add this API. We can try to merge the logical structure and tagged pdf support in the core for this cycle though.

For contents extracted from the document like the logical structure we usually use an iterator based API, see fonts, index and layer for example. I wonder if we should do the same for document structure, I'm not sure.

I would split this patch too.

::: glib/poppler-document.cc
@@ +1220,5 @@
>    return retval;
>  }
>  
> +/**
> + * poppler_document_get_structure:

What about using poppler_document_logical_structure? It's a bit longer but makes clear this returns information about the logical structure of the PDF document.

@@ +1224,5 @@
> + * poppler_document_get_structure:
> + * @document: A #PopplerDocument
> + *
> + * Returns the #PopplerStructure of the document. This object is owned by
> + * the called.

You should explain a briefly here what the document logical structure is

@@ +1226,5 @@
> + *
> + * Returns the #PopplerStructure of the document. This object is owned by
> + * the called.
> + *
> + * Return value: (transfer full): The #PopplerStructure of the document.

a #PopplerStructure object representing the logical structure of the document, or %NULL

@@ +1236,5 @@
> +
> +  g_return_val_if_fail (POPPLER_IS_DOCUMENT (document), NULL);
> +
> +  tree_root = document->doc->getStructTreeRoot ();
> +  if (!tree_root) return NULL;

Use two lines for this please.

::: glib/poppler-private.h
@@ +112,5 @@
> +  StructElement *elem;
> +  gchar *id;
> +  gchar *title;
> +  gchar *text;
> +  gchar *text_r;

what does r stand for? please don't use abbreviations

::: glib/poppler-structure-element.cc
@@ +599,5 @@
> +}
> +
> +
> +/**
> + * SECTION:poppler-structure-element

Move this to the beginning.

@@ +617,5 @@
> +{
> +  GObjectClass parent_class;
> +};
> +
> +G_DEFINE_TYPE (PopplerStructureElement, poppler_structure_element, G_TYPE_OBJECT);

Trailing ; is not needed.

@@ +628,5 @@
> +
> +  g_assert (structure);
> +  g_assert (element);
> +
> +  poppler_structure_element = (PopplerStructureElement *) g_object_new (POPPLER_TYPE_STRUCTURE_ELEMENT, NULL, NULL);

What's the second NULL? I you are not providing constructor properties, a single NULL is enough.

@@ +631,5 @@
> +
> +  poppler_structure_element = (PopplerStructureElement *) g_object_new (POPPLER_TYPE_STRUCTURE_ELEMENT, NULL, NULL);
> +  poppler_structure_element->text      = NULL;
> +  poppler_structure_element->text_r    = NULL;
> +  poppler_structure_element->children  = NULL;

GObjects are 0 filled when allocated, you can avoid all NULL/0 initializations.

@@ +636,5 @@
> +  poppler_structure_element->structure = structure;
> +  poppler_structure_element->elem      = element;
> +
> +  if (element->getNumElements ())
> +    poppler_structure_element->children = (PopplerStructureElement**) g_new0 (PopplerStructureElement*,

GPtrArray could be more convenient to use. You can use g_ptr_array_new_full (element->getNumElements (), g_object_unref);

@@ +698,5 @@
> +/**
> + * poppler_structure_element_get_page:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: Number of the page that contains the element, of

of -> or

@@ +702,5 @@
> + * Return value: Number of the page that contains the element, of
> + *    <code>-1</code> if not defined.
> + */
> +gint
> +poppler_structure_element_get_page (PopplerStructureElement *poppler_structure_element)

Use just element or structure_element to make it a bit shorter.

@@ +720,5 @@
> +/**
> + * poppler_structure_element_is_content:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Checks whether an element is actual document content.

You should explain here what document comments means here.

@@ +737,5 @@
> +/**
> + * poppler_structure_element_is_inline:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Checks whether an element is an inline element.

Ditto, what's an inline element?

@@ +754,5 @@
> +/**
> + * poppler_structure_element_is_block:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Checks whether an element is a block element.

Ditto.

@@ +771,5 @@
> +/**
> + * poppler_structure_element_get_n_children:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Gets the number of children of @structure_element.

argument is @poppler_structure_element not @structure_element. I'm fine with the shorter name, but use it consistently.

@@ +791,5 @@
> + * @index: Index of the children element to obtain.
> + *
> + * Return value: (transfer none): A #PopplerStructureElement.
> + */
> +PopplerStructureElement*

PopplerStructureElement *

@@ +796,5 @@
> +poppler_structure_element_get_child (PopplerStructureElement *poppler_structure_element,
> +                                     guint                    index)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_assert (poppler_structure_element->elem);

This should not be needed, it's already checked when the object is created.

@@ +797,5 @@
> +                                     guint                    index)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_assert (poppler_structure_element->elem);
> +  g_assert (poppler_structure_element->elem->getNumElements () >= 0);

Use g_return_val_if_fail

@@ +816,5 @@
> + *
> + * Return value: (transfer none): The identifier of the element (if
> + *    defined), or %NULL.
> + */
> +const gchar*

const gchar *

@@ +1136,5 @@
> + * @attribute: A #PopperStructureAttribute value.
> + * @value (out): A #GValue in which to return the value of the attribute.
> + * @inherit: Whether to look up for inheritable attribute values in the
> + *    ancestors of the element, if the attribute is not defined in the
> + *    element.

Is this something the API user should decided? or are some attributes that are inherited by definition in the spec?

@@ +1276,5 @@
> + */
> +GVariant*
> +poppler_structure_element_get_attribute (PopplerStructureElement  *poppler_structure_element,
> +                                         PopplerStructureAttribute attribute,
> +                                         gboolean                  inherit)

This kind of API where a single method does everything is not convenient to use in the end, the use always have to deal with a GVariant. I prefer have different methods, get_attribute_int, get_attribute_bool, get_attribute_color, etc.

@@ +1450,5 @@
> + *
> + * Return value: Whether the element is a reference to another object.
> + */
> +gboolean
> +poppler_structure_element_is_reference (PopplerStructureElement *poppler_structure_element)

Is this really needed? Object reference are something internal to the PDF structure that I'm not sure we want to expose in the API. Maybe we can merge the content and reference type and have a get_content_type() or something similar where content type can be link, annot, form field, contents (not sure contents is the best name to expose marked content, though), etc.

@@ +1495,5 @@
> +  return reftype;
> +}
> +
> +/**
> + * poppler_structure_element_get_reference_link:

If I understand this correctly, the struct element corresponds to a link in the document, right? Shouldn't it be poppler_structure_element_get_referenced_link, since the links is referenced by the struct element?

@@ +1502,5 @@
> + * Return value: (transfer full): The #PopplerAnnot pointed by the object
> + *    reference, or %NULL of the element is not a reference pointing to
> + *    a #PopplerLink.
> + */
> +PopplerLinkMapping*

Why do we return a mapping? Struct tree elements are supposed to be the document structure independently from where they are physically in the document, no? To get the link mapping we have API for that. There's, however, a bounding box attribute defined for tagged PDFs for illustration elements and tables, but I guess it should be get as an attribute of the struct element.

@@ +1584,5 @@
> + * Return value: (transfer full): A #PopplerLinkMapping, or %NULL if the
> + *    link cannot be found.
> + */
> +PopplerLinkMapping*
> +poppler_structure_element_find_link (PopplerStructureElement *poppler_structure_element)

I think this is something that can be done with poppler_structure_element_get_child  and poppler_structure_element_get_reference_link. I would avoid this kind of convenient methods for now, if we see this is commonly duplicated by API users we can always add it later.

::: glib/poppler-structure-element.h
@@ +29,5 @@
> +#define POPPLER_STRUCTURE_ELEMENT(obj)    (G_TYPE_CHECK_INSTANCE_CAST ((obj), POPPLER_TYPE_STRUCTURE_ELEMENT, PopplerStructureElement))
> +#define POPPLER_IS_STRUCTURE_ELEMENT(obj) (G_TYPE_CHECK_INSTANCE_TYPE ((obj), POPPLER_TYPE_STRUCTURE_ELEMENT))
> +
> +/**
> + * PopplerStructureElementKind:

Use Type instead of Kind for consistency with other enums like PopplerActionType, PopplerDestType, PopplerAnnotType, etc.

::: glib/poppler-structure.cc
@@ +131,5 @@
> +
> +
> +
> +/**
> + * SECTION:poppler-structure

Move this section description to the beginning, after the includes.

@@ +164,5 @@
> +{
> +  GObjectClass parent_class;
> +};
> +
> +G_DEFINE_TYPE (PopplerStructure, poppler_structure, G_TYPE_OBJECT);

The trailing ; is not needed.

@@ +172,5 @@
> +poppler_structure_init (PopplerStructure *poppler_structure)
> +{
> +}
> +
> +

Leve only one empty line between functions

@@ +200,5 @@
> +  gobject_class->finalize = poppler_structure_finalize;
> +}
> +
> +
> +PopplerStructure*

PopplerStructure *

@@ +207,5 @@
> +{
> +  PopplerStructure *poppler_structure;
> +
> +  g_return_val_if_fail (POPPLER_IS_DOCUMENT (poppler_document), NULL);
> +  g_assert (root);

why g_return_val_if_fail for poppler_document but g_assert for root? Use g_assert in both cases, since this is not public API method.

@@ +212,5 @@
> +
> +  poppler_structure = (PopplerStructure*) g_object_new (POPPLER_TYPE_STRUCTURE, NULL, NULL);
> +
> +  poppler_structure->document = (PopplerDocument*) g_object_ref (poppler_document);
> +  poppler_structure->root     = root;

Do not line up the =

@@ +213,5 @@
> +  poppler_structure = (PopplerStructure*) g_object_new (POPPLER_TYPE_STRUCTURE, NULL, NULL);
> +
> +  poppler_structure->document = (PopplerDocument*) g_object_ref (poppler_document);
> +  poppler_structure->root     = root;
> +  poppler_structure->children = NULL;

GObjects are alwats 0 filled when allocated, this is already NULL.

@@ +217,5 @@
> +  poppler_structure->children = NULL;
> +
> +  if (root->getNumElements ())
> +    poppler_structure->children = (PopplerStructureElement**) g_new0 (PopplerStructureElement*,
> +                                                                      root->getNumElements ());

A GPtrArray could be more convenient to use.

@@ +250,5 @@
> +poppler_structure_get_child (PopplerStructure *poppler_structure,
> +                             guint             index)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE (poppler_structure), NULL);
> +  g_assert (poppler_structure->root);

I don't think this is needed, PopplerStructure object is created by poppler internally with a StructTreeRoot and there's already an assert in _poppler_structure_new to make usre it's created with a valid pointer.

@@ +251,5 @@
> +                             guint             index)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE (poppler_structure), NULL);
> +  g_assert (poppler_structure->root);
> +  g_assert (poppler_structure->root->getNumElements () >= 0);

Use g_return_val_if_fail here too

@@ +258,5 @@
> +  if (!poppler_structure->children[index])
> +    {
> +      poppler_structure->children[index] = _poppler_structure_element_new (poppler_structure,
> +                                                                           poppler_structure->root->getElement (index));
> +      g_object_ref_sink (poppler_structure->children[index]);

Why are you doing this? PopplerStructureElement is a GObject not a GInitiallyUnowned, so it's created with a real reference. You are leaking this element.
Comment 9 Adrian Perez de Castro 2013-06-18 16:08:08 UTC
Created attachment 81018 [details] [review]
[PATCH v5 08/10] Tagged-PDF: Expose the structure tree in poppler-glib
Comment 10 Adrian Perez de Castro 2013-06-18 16:08:41 UTC
Created attachment 81019 [details] [review]
[PATCH v5 09/10] Tagged-PDF: Pane in poppler-glib demo showing the  structure
Comment 11 Adrian Perez de Castro 2013-06-18 16:09:06 UTC
Created attachment 81020 [details] [review]
[PATCH v5 10/10] Tagged-PDF: Heuristics in poppler-glib for data/layout table identification
Comment 12 Adrian Perez de Castro 2013-08-01 13:48:05 UTC
Created attachment 83436 [details] [review]
[PATCH v7 7/9] Tagged-PDF: Expose the structure tree in poppler-glib
Comment 13 Adrian Perez de Castro 2013-08-01 13:48:41 UTC
Created attachment 83437 [details] [review]
[PATCH v7 8/9] Tagged-PDF: Pane in poppler-glib demo showing the structure
Comment 14 Adrian Perez de Castro 2013-08-01 13:49:10 UTC
Created attachment 83438 [details] [review]
[PATCH v7 9/9] Tagged-PDF: Heuristics in poppler-glib for data/layout table identification
Comment 15 Adrian Perez de Castro 2013-09-26 18:47:58 UTC
Created attachment 86681 [details] [review]
[PATCH v8 08/15] glib: Expose the document structure tree
Comment 16 Adrian Perez de Castro 2013-09-26 18:48:35 UTC
Created attachment 86682 [details] [review]
[PATCH v8 09/15] glib: Private function _poppler_link_mapping_new_from_annot_link()
Comment 17 Adrian Perez de Castro 2013-09-26 18:49:06 UTC
Created attachment 86683 [details] [review]
[PATCH v8 10/15] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 18 Adrian Perez de Castro 2013-09-26 18:50:43 UTC
Created attachment 86684 [details] [review]
[PATCH v8 11/15] glib: Expose inline attributes of structure elements
Comment 19 Adrian Perez de Castro 2013-09-26 18:51:11 UTC
Created attachment 86685 [details] [review]
[PATCH v8 12/15] glib: Accessors for form fields of structure elements
Comment 20 Adrian Perez de Castro 2013-09-26 18:51:30 UTC
Created attachment 86686 [details] [review]
[PATCH v8 13/15] glib: Accessors for document structure references
Comment 21 Adrian Perez de Castro 2013-09-26 18:52:13 UTC
Created attachment 86687 [details] [review]
[PATCH v8 14/15] glib: Implement accessors for element attributes
Comment 22 Adrian Perez de Castro 2013-09-26 18:52:33 UTC
Created attachment 86688 [details] [review]
[PATCH v8 15/15] glib-demo: Pane showing the document structure
Comment 23 Carlos Garcia Campos 2013-10-15 08:53:38 UTC
Comment on attachment 86681 [details] [review]
[PATCH v8 08/15] glib: Expose the document structure tree

Review of attachment 86681 [details] [review]:
-----------------------------------------------------------------

The API looks good to me in general, I think we should not cache the text in the StructElement, and documentation should be improved a bit. Thanks.

::: glib/poppler-structure-element.cc
@@ +44,5 @@
> + * and poppler_structure_get_child() to enumerate the top level elements.
> + */
> +
> +static PopplerStructureElement *
> +  _poppler_structure_element_new (PopplerDocument *document, StructElement *element);

If this is only used here, simply move the code before it's used so that we don't need the prototype. I think it makes sense to add all the iter implementation after the StructElement impl.

@@ +70,5 @@
> + *
> + * Creates a new #PopplerStructureElementIter as a copy of @iter. The
> + * returned value must be freed with poppler_structure_element_iter_free().
> + *
> + * Return value: a new #PopplerStructureElementIter

(transfer full)

@@ +71,5 @@
> + * Creates a new #PopplerStructureElementIter as a copy of @iter. The
> + * returned value must be freed with poppler_structure_element_iter_free().
> + *
> + * Return value: a new #PopplerStructureElementIter
> + */

You should add Since: 0.26 to all doc comments.

@@ +105,5 @@
> +/**
> + * poppler_structure_element_iter_new:
> + * @poppler_document: a #PopplerDocument.
> + *
> + * Returns the root #PopplerStructureElementIter for @document, or %NULL. The

@document -> @poppler_document or better use just document in parameter name.

@@ +155,5 @@
> +
> +  iter = g_slice_new0 (PopplerStructureElementIter);
> +  iter->document = (PopplerDocument *) g_object_ref (poppler_document);
> +  iter->is_root  = TRUE;
> +  iter->root     = root;

Do not line up the =

@@ +190,5 @@
> + * @iter: a #PopplerStructureElementIter
> + *
> + * Returns the #PopplerStructureElementIter associated with @iter.
> + *
> + * Return value: (transfer full): a new #PopplerStructureElementIter

#PopplerStructureElementIter -> #PopplerStructureElement

@@ +210,5 @@
> +/**
> + * poppler_structure_element_iter_get_child:
> + * @parent: a #PopplerStructureElementIter
> + *
> + * Return value: a new #PopplerStructureElementIter

you should document this can return %NULL if the iter doesn't have any child

@@ +344,5 @@
> +{
> +  GObjectClass parent_class;
> +};
> +
> +G_DEFINE_TYPE (PopplerStructureElement, poppler_structure_element, G_TYPE_OBJECT);

Unneeded trailing semicolon

@@ +355,5 @@
> +
> +  g_assert (POPPLER_IS_DOCUMENT (document));
> +  g_assert (element);
> +
> +  poppler_structure_element = (PopplerStructureElement *) g_object_new (POPPLER_TYPE_STRUCTURE_ELEMENT, NULL, NULL);

What's the second NULL?

@@ +357,5 @@
> +  g_assert (element);
> +
> +  poppler_structure_element = (PopplerStructureElement *) g_object_new (POPPLER_TYPE_STRUCTURE_ELEMENT, NULL, NULL);
> +  poppler_structure_element->document  = (PopplerDocument *) g_object_ref (document);
> +  poppler_structure_element->elem      = element;

Do not line up the =

@@ +401,5 @@
> + *
> + * Return value: A #PopplerStructureElementKind value.
> + */
> +PopplerStructureElementKind
> +poppler_structure_element_get_kind (PopplerStructureElement *poppler_structure_element)

Use structure_element or even just element for the parameter to make everything a bit shorter

@@ +413,5 @@
> +/**
> + * poppler_structure_element_get_page:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: Number of the page that contains the element, of

of -> or

@@ +437,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Checks whether an element is actual document content.
> + *
> + * Return value: Whether the element is content.

%TRUE if @poppler_structure_element is document content, or %FALSE otherwise. And same pattern for all other is_foo methods

@@ +485,5 @@
> +
> +/**
> + * poppler_structure_element_get_id:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *

Maybe we could document here what the element identifier is for and why it's optional?

@@ +487,5 @@
> + * poppler_structure_element_get_id:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: (transfer none): The identifier of the element (if
> + *    defined), or %NULL.

Do not use the parentheses for if defined.

@@ +489,5 @@
> + *
> + * Return value: (transfer none): The identifier of the element (if
> + *    defined), or %NULL.
> + */
> +const gchar*

gchar* -> gchar *

@@ +496,5 @@
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_assert (poppler_structure_element->elem);
> +
> +  if (!poppler_structure_element->id && poppler_structure_element->elem->getID ())
> +    poppler_structure_element->id = _poppler_goo_string_to_utf8 (poppler_structure_element->elem->getID ());

I'm not sure about caching the values in the object. The PopplerStructElement is mostly a read-only object, so users will typically convert them into a different object and will have to duplicate the contents in any case. Or even if the user wraps the PopplerStructElement in another object, she can still cache the values in the wrapper instead. So, I would return the new allocated string like we currently do in other apis like PopplerAnnot.

@@ +509,5 @@
> + * Return value: (transfer none): The title of the element (if defined),
> + *    or %NULL.
> + */
> +const gchar*
> +poppler_structure_element_get_title (PopplerStructureElement *poppler_structure_element)

Same comments for this an all other methods returning a gchar *

@@ +525,5 @@
> + * popppler_structure_element_get_abbreviation:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Acronyms and abbreviations contained in elements of type
> + * #POPPLER_STRUCTURE_ELEMENT_SPAN may have an associated expanded

#POPPLER_STRUCTURE_ELEMENT_SPAN -> %POPPLER_STRUCTURE_ELEMENT_SPAN

@@ +529,5 @@
> + * #POPPLER_STRUCTURE_ELEMENT_SPAN may have an associated expanded
> + * text form, which can be retrieved using this function.
> + *
> + * Return value: (transfer none): Text of the expanded abbreviation, if the
> + *    element text is an abbreviation or acronym.

or %NULL otherwise

@@ +551,5 @@
> + * poppler_structure_element_get_language:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: (transfer none): language and country code, in two-letter
> + *    ISO format, e.g. <code>en_US</code>, or %NULL if not defined.

In general I prefer explanations in the body and keep the return tag simpler

@@ +633,5 @@
> +/**
> + * poppler_structure_element_get_text:
> + * @poppler_structure_element: A #PopplerStructureElement
> + * @recursive: If %TRUE, the text of child elements is gathered recursively
> + *   in logical order and returned as part of the result.

Do we really want to expose this? I mean, why would someone pass FALSE here? What about include_child or something like that? I'm not a fan of boolean parameters in public api. Another possibilidy is making this function private with the bool param, and expose two different methods poppler_structure_element_get_text and poppler_structure_element_get_text_full or using better names (get_text_including_child, get_text_recursive, ...)

@@ +635,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + * @recursive: If %TRUE, the text of child elements is gathered recursively
> + *   in logical order and returned as part of the result.
> + *
> + * Obtains the text enclosed by an element, or the subtree under an element.

I think this is a bit confusing, when recursive is TRUE you get the text enclosed by an element *and* the subtree, no? not the *or* the subtree. I would say something like. When @recursive is %TRUE, the text of the child element is gathered recursively in logical order and included as part of the result. Keeping the parameter documentation simpler with just something like: whether to include the text of the children, or something similar.

@@ +662,5 @@
> +      if (s)
> +        poppler_structure_element->text = _poppler_goo_string_to_utf8 (s);
> +      delete s;
> +    }
> +  return poppler_structure_element->text;

Looking at the implementation and the fact that you are caching the values separately (now I see that _r stands for recursive) I really think this should be split in two different methods.

::: glib/poppler-structure-element.h
@@ +29,5 @@
> +#define POPPLER_STRUCTURE_ELEMENT(obj)    (G_TYPE_CHECK_INSTANCE_CAST ((obj), POPPLER_TYPE_STRUCTURE_ELEMENT, PopplerStructureElement))
> +#define POPPLER_IS_STRUCTURE_ELEMENT(obj) (G_TYPE_CHECK_INSTANCE_TYPE ((obj), POPPLER_TYPE_STRUCTURE_ELEMENT))
> +
> +/**
> + * PopplerStructureElementKind:

Use Type instead of Kind for consistency with all other enums exposed in the api.

@@ +30,5 @@
> +#define POPPLER_IS_STRUCTURE_ELEMENT(obj) (G_TYPE_CHECK_INSTANCE_TYPE ((obj), POPPLER_TYPE_STRUCTURE_ELEMENT))
> +
> +/**
> + * PopplerStructureElementKind:
> + */

You should document every enum value here and add the Since tag

::: glib/reference/poppler-docs.sgml
@@ +23,5 @@
>      <xi:include href="xml/poppler-layer.xml"/>
>      <xi:include href="xml/poppler-media.xml"/>
>      <xi:include href="xml/poppler-movie.xml"/>
> +		<xi:include href="xml/poppler-structure.xml"/>
> +		<xi:include href="xml/poppler-structure-element.xml"/>

These are incorrectly indented.

::: glib/reference/poppler-sections.txt
@@ +536,5 @@
>  
>  <SECTION>
> +<FILE>poppler-structure-element</FILE>
> +<TITLE>PopplerStructureElement</TITLE>
> +PopplerTextSpan

Please, only add symbols exposed by this patch.

@@ +538,5 @@
> +<FILE>poppler-structure-element</FILE>
> +<TITLE>PopplerStructureElement</TITLE>
> +PopplerTextSpan
> +PopplerStructureElement
> +PopplerStructureElementIter

You could add the iter api in a anew subsection
Comment 24 Carlos Garcia Campos 2013-11-05 07:54:05 UTC
Comment on attachment 86682 [details] [review]
[PATCH v8 09/15] glib: Private function _poppler_link_mapping_new_from_annot_link()

Review of attachment 86682 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-page.cc
@@ +1158,5 @@
> +  PopplerLinkMapping *mapping;
> +  double width, height;
> +
> +  g_return_val_if_fail (POPPLER_IS_DOCUMENT (document), NULL);
> +  g_return_val_if_fail (link != NULL, NULL);

Do not use g_return macros in private methods, This is called only internally.

@@ +1160,5 @@
> +
> +  g_return_val_if_fail (POPPLER_IS_DOCUMENT (document), NULL);
> +  g_return_val_if_fail (link != NULL, NULL);
> +
> +  g_assert (page_num >= 0);

This is impossible to fail, it's called from a PopplerPage passing its index.

@@ +1161,5 @@
> +  g_return_val_if_fail (POPPLER_IS_DOCUMENT (document), NULL);
> +  g_return_val_if_fail (link != NULL, NULL);
> +
> +  g_assert (page_num >= 0);
> +  g_assert (page_num < poppler_document_get_n_pages (document));

Ditto.

@@ +1242,4 @@
>        /* Create the mapping */
> +      PopplerLinkMapping *mapping = _poppler_link_mapping_new_from_annot_link (page->document,
> +									       page->index,
> +									       links->getLink (i));

Better split this in two lines.
Comment 25 Carlos Garcia Campos 2013-11-05 07:56:48 UTC
Comment on attachment 86683 [details] [review]
[PATCH v8 10/15] glib: Private function _poppler_link_mapping_new_from_form_field()

Review of attachment 86683 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-page.cc
@@ +1280,5 @@
> +  Page *page;
> +
> +  g_assert (POPPLER_IS_FORM_FIELD (form_field));
> +  g_assert (page_num >= 0);
> +  g_assert (page_num < poppler_document_get_n_pages (form_field->document));

Same comments here, this asserts look a bit too paranoid for me.

@@ +1325,4 @@
>    
>    for (i = 0; i < forms->getNumWidgets (); i++) {
>      PopplerFormFieldMapping *mapping;
> +    PopplerFormField *field = _poppler_form_field_new (page->document,

If you split this too, you can use a single line for _poppler_form_field_new

@@ +1330,3 @@
>  
> +    mapping = _poppler_form_field_mapping_new_from_form_field (field,
> +							       page->index);

Use a single line here too.
Comment 26 Carlos Garcia Campos 2013-11-05 08:07:56 UTC
Comment on attachment 86684 [details] [review]
[PATCH v8 11/15] glib: Expose inline attributes of structure elements

Review of attachment 86684 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-structure-element.cc
@@ +784,5 @@
> +  GList *glist;
> +  char buf[8];
> +  Guint flags;
> +  Guint color;
> +};

I think this implementation could go to the core so that other frontends could use it. Using the goo types (or C++ ones) instead of glib ones. Frontends would only need to iterate a list to build their own list.

@@ +796,5 @@
> + * structures. Each item in the list is a piece of text which share the same
> + * attributes, plus its attributes.
> + *
> + * Return value: (transfer none) (element-type PopplerTextSpan): A #GList
> + *    of #PopplerTextSpan structures.

Since: 0.26

@@ +798,5 @@
> + *
> + * Return value: (transfer none) (element-type PopplerTextSpan): A #GList
> + *    of #PopplerTextSpan structures.
> + */
> +GList*

GList* -> GList *. If we get a list from the core and we know its size beforehand, maybe we could return an array instead of a GList.

@@ +819,5 @@
> +
> +/**
> + * poppler_text_span_is_fixed_width:
> + * @poppler_text_span: a #PopplerTextSpan
> + */

This is missing a bried description, Return value and Since tags.

@@ +829,5 @@
> +
> +/**
> + * poppler_text_span_is_serif_font:
> + * @poppler_text_span: a #PopplerTextSpan
> + */

Ditto.

@@ +837,5 @@
> +  return (poppler_text_span->flags & POPPLER_TEXT_SPAN_SERIF_FONT);
> +}
> +
> +/**
> + * poppler_text_span_is_bols:

bols ->bold

@@ +839,5 @@
> +
> +/**
> + * poppler_text_span_is_bols:
> + * @poppler_text_span: a #PopplerTextSpan
> + */

Ditto.

@@ +849,5 @@
> +
> +/**
> + * poppler_text_span_is_link:
> + * @poppler_text_span: a #PopplerTextSpan
> + */

Ditto.

::: glib/poppler-structure-element.h
@@ +87,5 @@
> +  gchar *text;
> +  gchar *font_name;
> +  gchar *link_target;
> +  guint  flags;
> +  guint  color; /* 0x00RRGGBB */

We could conver this to a PopplerColor when building the span to make it more convenient to use for users.

@@ +88,5 @@
> +  gchar *font_name;
> +  gchar *link_target;
> +  guint  flags;
> +  guint  color; /* 0x00RRGGBB */
> +};

Maybe we could make this private and add getters for all other members as well.
Comment 27 Carlos Garcia Campos 2013-11-05 08:08:49 UTC
Comment on attachment 86684 [details] [review]
[PATCH v8 11/15] glib: Expose inline attributes of structure elements

Review of attachment 86684 [details] [review]:
-----------------------------------------------------------------

New symbols added here should be added to the sections.txt file too.
Comment 28 Carlos Garcia Campos 2013-11-05 08:23:47 UTC
Comment on attachment 86685 [details] [review]
[PATCH v8 12/15] glib: Accessors for form fields of structure elements

Review of attachment 86685 [details] [review]:
-----------------------------------------------------------------

New public symbols should be added to the sections.txt file

::: glib/poppler-structure-element.cc
@@ +859,5 @@
> +
> +/**
> + * poppler_structure_element_get_form_field:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *

You should add a brief explanation here.

@@ +862,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: (transfer full): A #PopplerFormField, or %NULL if
> + *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM.
> + */

Since: 0.26.

@@ +880,5 @@
> +    return NULL;
> +
> +  gint field_id = -1;
> +  const StructElement *child = poppler_structure_element->elem->getElement (0);
> +  if (child->isContent ())

You can return early here to simplify the return below.

@@ +896,5 @@
> +
> +  if (field_id < 0)
> +    return NULL;
> +
> +  return (field_id < 0) ? NULL : poppler_document_get_form_field (poppler_structure_element->document,

You have already returned in the previous if when field_id < 0.

@@ +903,5 @@
> +
> +/**
> + * poppler_structure_element_get_form_field_mapping:
> + * @poppler_structure_element: a #PopplerStructureElement
> + *

Description.

@@ +906,5 @@
> + * @poppler_structure_element: a #PopplerStructureElement
> + *
> + * Return value: (transfer full): A #PopplerFormFieldMapping, or %NULL if
> + *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM
> + */

Since: 0.26

::: glib/poppler-structure-element.h
@@ +118,4 @@
>                                                                                     gboolean                  recursive);
>  const gchar                 *poppler_structure_element_get_alt_text               (PopplerStructureElement  *poppler_structure_element);
>  const gchar                 *poppler_structure_element_get_actual_text            (PopplerStructureElement  *poppler_structure_element);
> +PopplerFormField            *poppler_structure_element_get_form_field             (PopplerStructureElement  *poppler_structure_element);

This is not correctly indented.

@@ +118,5 @@
>                                                                                     gboolean                  recursive);
>  const gchar                 *poppler_structure_element_get_alt_text               (PopplerStructureElement  *poppler_structure_element);
>  const gchar                 *poppler_structure_element_get_actual_text            (PopplerStructureElement  *poppler_structure_element);
> +PopplerFormField            *poppler_structure_element_get_form_field             (PopplerStructureElement  *poppler_structure_element);
> +PopplerFormFieldMapping     *poppler_structure_element_get_form_field_mapping     (PopplerStructureElement  *poppler_structure_element);

Ditto.
Comment 29 Carlos Garcia Campos 2013-11-05 08:25:39 UTC
(In reply to comment #28)
> ::: glib/poppler-structure-element.h
> @@ +118,4 @@
> >                                                                                     gboolean                  recursive);
> >  const gchar                 *poppler_structure_element_get_alt_text               (PopplerStructureElement  *poppler_structure_element);
> >  const gchar                 *poppler_structure_element_get_actual_text            (PopplerStructureElement  *poppler_structure_element);
> > +PopplerFormField            *poppler_structure_element_get_form_field             (PopplerStructureElement  *poppler_structure_element);
> 
> This is not correctly indented.

Forget this, the review UI split this and confused me, sorry.

> @@ +118,5 @@
> >                                                                                     gboolean                  recursive);
> >  const gchar                 *poppler_structure_element_get_alt_text               (PopplerStructureElement  *poppler_structure_element);
> >  const gchar                 *poppler_structure_element_get_actual_text            (PopplerStructureElement  *poppler_structure_element);
> > +PopplerFormField            *poppler_structure_element_get_form_field             (PopplerStructureElement  *poppler_structure_element);
> > +PopplerFormFieldMapping     *poppler_structure_element_get_form_field_mapping     (PopplerStructureElement  *poppler_structure_element);
> 
> Ditto.

Same here.
Comment 30 Adrian Perez de Castro 2013-11-11 17:24:07 UTC
(In reply to comment #26)
> Comment on attachment 86684 [details] [review] [review]
> [PATCH v8 11/15] glib: Expose inline attributes of structure elements
> 
> Review of attachment 86684 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: glib/poppler-structure-element.cc
> @@ +784,5 @@
> > +  GList *glist;
> > +  char buf[8];
> > +  Guint flags;
> > +  Guint color;
> > +};
> 
> I think this implementation could go to the core so that other frontends
> could use it. Using the goo types (or C++ ones) instead of glib ones.
> Frontends would only need to iterate a list to build their own list.

For the moment I am going to provide an updated patch for this, in a
later patch set I will move the SpanBuilder class to Poppler core.

> @@ +798,5 @@
> > + *
> > + * Return value: (transfer none) (element-type PopplerTextSpan): A #GList
> > + *    of #PopplerTextSpan structures.
> > + */
> > +GList*
> 
> GList* -> GList *. If we get a list from the core and we know its size
> beforehand, maybe we could return an array instead of a GList.

I will do this when moving SpanBuilder to Poppler core.

> @@ +819,5 @@
> > +
> > +/**
> > + * poppler_text_span_is_fixed_width:
> > + * @poppler_text_span: a #PopplerTextSpan
> > + */
> 
> This is missing a bried description, Return value and Since tags.

Also, I am renaming it to poppler_text_span_is_fixed_width_font(),
for coherency with poppler_text_span_is_serif_font().
  
> @@ +839,5 @@
> > +
> > +/**
> > + * poppler_text_span_is_bols:
> > + * @poppler_text_span: a #PopplerTextSpan
> > + */
> 
> Ditto.

Also, I am renaming it to poppler_text_span_is_bold_font(),
for coherency with poppler_text_span_is_serif_font().

> ::: glib/poppler-structure-element.h
> @@ +87,5 @@
> > +  gchar *text;
> > +  gchar *font_name;
> > +  gchar *link_target;
> > +  guint  flags;
> > +  guint  color; /* 0x00RRGGBB */
> 
> We could conver this to a PopplerColor when building the span to make it
> more convenient to use for users.
> 
> […]
> 
> Maybe we could make this private and add getters for all other members as
> well.

Sure thing.
Comment 31 Adrian Perez de Castro 2013-11-11 20:36:43 UTC
Created attachment 89048 [details] [review]
[PATCH v10 05/12] glib: Expose the document structure tree
Comment 32 Adrian Perez de Castro 2013-11-11 20:37:28 UTC
Created attachment 89049 [details] [review]
[PATCH v10 06/12] glib: Private function _poppler_link_mapping_new_from_annot_link()
Comment 33 Adrian Perez de Castro 2013-11-11 20:38:13 UTC
Created attachment 89050 [details] [review]
[PATCH v10 07/12] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 34 Adrian Perez de Castro 2013-11-11 20:38:45 UTC
Created attachment 89051 [details] [review]
[PATCH v10 08/12] glib: Expose inline attributes of structure elements
Comment 35 Adrian Perez de Castro 2013-11-11 20:39:20 UTC
Created attachment 89052 [details] [review]
[PATCH v10 09/12] glib: Accessors for form fields of structure elements
Comment 36 Adrian Perez de Castro 2013-11-11 20:39:48 UTC
Created attachment 89053 [details] [review]
[PATCH v10 10/12] glib: Accessors for document structure references
Comment 37 Adrian Perez de Castro 2013-11-11 20:41:36 UTC
Created attachment 89054 [details] [review]
[PATCH v10 11/12] glib: Implement accessors for element attributes

Note that I will still upload one further improved version of this that
adds a getter per attribute. I am posting this version to keep the patch
updated along with v10 of the patch set.
Comment 38 Adrian Perez de Castro 2013-11-11 20:42:04 UTC
Created attachment 89055 [details] [review]
[PATCH v10 12/12] glib-demo: Pane showing the document structure
Comment 39 Adrian Perez de Castro 2013-11-27 20:41:58 UTC
Created attachment 89919 [details] [review]
[PATCH v11 07/11] glib: Expose inline attributes of structure elements
Comment 40 Carlos Garcia Campos 2013-11-28 13:04:40 UTC
Comment on attachment 89919 [details] [review]
[PATCH v11 07/11] glib: Expose inline attributes of structure elements

Review of attachment 89919 [details] [review]:
-----------------------------------------------------------------

My main concern is that we already have both a way to provide font information and text attributes, but I think none of them fit in this API. We have PopplerFontInfo, but it's actually an iterator, and we have PopplerTextAttributes, that contains the offsets of the text in the page text.

::: glib/poppler-structure-element.cc
@@ +718,5 @@
> + * <informalexample><programlisting>
> + * GList *text_spans = poppler_structure_element_get_text_spans (element);
> + * /<!-- -->* Use the text spans *<!-- -->/
> + * g_list_free_full (text_spans,
> + *                   (GDestroyNotify) poppler_text_span_free);

We usually provide a convenient method to do this, see poppler_page_free_link_mapping, poppler_page_free_text_attributes, etc.

@@ +743,5 @@
> +    {
> +      PopplerTextSpan *span = g_slice_new0 (PopplerTextSpan);
> +      span->text = _poppler_goo_string_to_utf8 (i->getText ());
> +
> +      GfxRGB rgb = i->getColor();

Is this copying the struct? should it return a const reference instead or a pointer?

@@ +761,5 @@
> +            span->flags |= POPPLER_TEXT_SPAN_SERIF_FONT;
> +          if (i->getFont ()->isItalic ())
> +            span->flags |= POPPLER_TEXT_SPAN_ITALIC;
> +          if (i->getFont ()->isBold ())
> +            span->flags |= POPPLER_TEXT_SPAN_BOLD;

Why not doing this in the core too and here simply do i->getFlags()?

@@ +775,5 @@
> +              span->flags |= POPPLER_TEXT_SPAN_BOLD;
> +            default:
> +              break;
> +            }
> +        }

I would move this block to a helper function that creates a PopplerTextSpan from a given TextSpan

@@ +796,5 @@
> + */
> +PopplerTextSpan *
> +poppler_text_span_copy (PopplerTextSpan *poppler_text_span)
> +{
> +  PopplerTextSpan *new_span = g_slice_new0 (PopplerTextSpan);

You can use g_slice_dup and then you only need to manually copy the heap allocated members.

@@ +938,5 @@
> +  return poppler_text_span->font_name;
> +}
> +
> +/**
> + * poppler_text_span_get_link_target:

What is a link target? a named destination?
Comment 41 Adrian Perez de Castro 2013-12-04 17:37:44 UTC
(In reply to comment #40)
> Comment on attachment 89919 [details] [review] [review]
> [PATCH v11 07/11] glib: Expose inline attributes of structure elements
> 
> Review of attachment 89919 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> My main concern is that we already have both a way to provide font
> information and text attributes, but I think none of them fit in this API.
> We have PopplerFontInfo, but it's actually an iterator, and we have
> PopplerTextAttributes, that contains the offsets of the text in the page
> text.

Yes, I looked into PopplerTextAttributes first but it is not a good fit
for this kind of access :-(

> ::: glib/poppler-structure-element.cc
> @@ +718,5 @@
> > + * <informalexample><programlisting>
> > + * GList *text_spans = poppler_structure_element_get_text_spans (element);
> > + * /<!-- -->* Use the text spans *<!-- -->/
> > + * g_list_free_full (text_spans,
> > + *                   (GDestroyNotify) poppler_text_span_free);
> 
> We usually provide a convenient method to do this, see
> poppler_page_free_link_mapping, poppler_page_free_text_attributes, etc.

I have added a poppler_structure_element_free_text_spans() function

> @@ +743,5 @@
> > +    {
> > +      PopplerTextSpan *span = g_slice_new0 (PopplerTextSpan);
> > +      span->text = _poppler_goo_string_to_utf8 (i->getText ());
> > +
> > +      GfxRGB rgb = i->getColor();
> 
> Is this copying the struct? should it return a const reference instead or a
> pointer?

In the last version of the patch for Poppler core the TextSpan::getColor()
method returns a reference.

> @@ +761,5 @@
> > +            span->flags |= POPPLER_TEXT_SPAN_SERIF_FONT;
> > +          if (i->getFont ()->isItalic ())
> > +            span->flags |= POPPLER_TEXT_SPAN_ITALIC;
> > +          if (i->getFont ()->isBold ())
> > +            span->flags |= POPPLER_TEXT_SPAN_BOLD;
> 
> Why not doing this in the core too and here simply do i->getFlags()?

Adding TextSpan::getFlags() in core seems redundant to me because
TextSpan provides access to the GfxFont. Also, to be able to do a
plain “span->flags = i->getFlags()” would need to either manually
type the same bits for the flags in the poppler-glib header (not
nice if it changes in the future), or to include the core header
(not an option), or to check each flag from core and set it with
the corresponding poppler-glib value... IMHO it is not worth the
effort to have a TextSpan::getFlags() in core.

> @@ +775,5 @@
> > +              span->flags |= POPPLER_TEXT_SPAN_BOLD;
> > +            default:
> > +              break;
> > +            }
> > +        }
> 
> I would move this block to a helper function that creates a PopplerTextSpan
> from a given TextSpan

Moved.

> @@ +796,5 @@
> > + */
> > +PopplerTextSpan *
> > +poppler_text_span_copy (PopplerTextSpan *poppler_text_span)
> > +{
> > +  PopplerTextSpan *new_span = g_slice_new0 (PopplerTextSpan);
> 
> You can use g_slice_dup and then you only need to manually copy the heap
> allocated members.

Ah, nice. I didn't know about g_slice_dup(), it is quite handy  :)

> @@ +938,5 @@
> > +  return poppler_text_span->font_name;
> > +}
> > +
> > +/**
> > + * poppler_text_span_get_link_target:
> 
> What is a link target? a named destination?

This is removed now, it does not make sense as links are exposed as
a PopplerStructureElement object.
Comment 42 Adrian Perez de Castro 2013-12-04 17:38:32 UTC
Created attachment 90261 [details] [review]
[PATCH v12 04/11] glib: Expose the document structure tree
Comment 43 Adrian Perez de Castro 2013-12-04 17:40:18 UTC
Created attachment 90262 [details] [review]
[PATCH v12 05/11] glib: Private function _poppler_link_mapping_new_from_annot_link()
Comment 44 Adrian Perez de Castro 2013-12-04 17:41:00 UTC
Created attachment 90263 [details] [review]
[PATCH v12 06/11] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 45 Adrian Perez de Castro 2013-12-04 17:41:28 UTC
Created attachment 90264 [details] [review]
[PATCH v12 07/11] glib: Expose inline attributes of structure elements
Comment 46 Adrian Perez de Castro 2013-12-04 17:42:12 UTC
Created attachment 90265 [details] [review]
[PATCH v12 08/11] glib: Accessors for form fields of structure elements
Comment 47 Adrian Perez de Castro 2013-12-04 17:42:50 UTC
Created attachment 90266 [details] [review]
[PATCH v12 09/11] glib: Accessors for document structure references
Comment 48 Adrian Perez de Castro 2013-12-04 17:43:58 UTC
Created attachment 90267 [details] [review]
[PATCH v12 10/11] glib: Implement accessors for element attributes
Comment 49 Adrian Perez de Castro 2013-12-04 17:44:25 UTC
Created attachment 90268 [details] [review]
[PATCH v12 11/11] glib-demo: Pane showing the document structure
Comment 50 Adrian Perez de Castro 2013-12-17 01:44:21 UTC
Created attachment 90852 [details] [review]
[PATCH v14 09/10] glib: Implement accessors for element attributes

New version of the patch with the support for querying structure element
attributes. The change comparing to the previous version is quite huge:

 * Now there is one accessor method per standard attribute, e.g.
   poppler_structure_element_get_color().
 * All the accessors have a flag to optionally lookup inheritable
   attributes or restrict the lookup to the passed structure element.
 * All the accessors return a “gboolean” which indicates whether the
   attribute has been looked up. If it returns FALSE, it means that
   the value could not be found and there is no default value for the
   attribute, or that the attribute was incorrectly formatted.
 * The generic poppler_structure_element_get_attribute() method which
   returns a GVariant —and which may be friendlier in some cases like
   bindings for other languages— is still provided, and is implemented
   in terms of the individual accessors.
Comment 51 Carlos Garcia Campos 2014-02-09 14:46:40 UTC
Comment on attachment 90261 [details] [review]
[PATCH v12 04/11] glib: Expose the document structure tree

Review of attachment 90261 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-structure-element.cc
@@ +44,5 @@
> + * and poppler_structure_get_child() to enumerate the top level elements.
> + */
> +
> +static PopplerStructureElement *
> +  _poppler_structure_element_new (PopplerDocument *document, StructElement *element);

You should move the iter implementation to the bottom of the file and we don't need this prototype here.

@@ +139,5 @@
> + *   poppler_structure_element_iter_free (iter);
> + * }
> + * </programlisting></informalexample>
> + *
> + * Return value: (transfer full): a new #PopplerStructureElementIter

, or %NULL if ....

@@ +417,5 @@
> +PopplerStructureElementKind
> +poppler_structure_element_get_kind (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), POPPLER_STRUCTURE_ELEMENT_UNKNOWN);
> +  g_assert (poppler_structure_element->elem);

Use g_return macros in public methods.

@@ +419,5 @@
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), POPPLER_STRUCTURE_ELEMENT_UNKNOWN);
> +  g_assert (poppler_structure_element->elem);
> +
> +  return _poppler_structelement_type_to_poppler_structure_element_kind (poppler_structure_element->elem->getType ());

This method is only used here, I think we can add th switch here instead.

@@ +436,5 @@
> + */
> +gint
> +poppler_structure_element_get_page (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), POPPLER_STRUCTURE_ELEMENT_UNKNOWN);

POPPLER_STRUCTURE_ELEMENT_UNKNOWN -> -1

@@ +454,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Checks whether an element is actual document content.
> + *
> + * Return value: Whether the element is content.

%TRUE if ..., %FALSE otherwise. And this pattern in all other methods.

@@ +524,5 @@
> +  g_assert (poppler_structure_element->elem);
> +
> +  GooString *string = poppler_structure_element->elem->getID ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

getID() doesn't copy the string

@@ +546,5 @@
> +  g_assert (poppler_structure_element->elem);
> +
> +  GooString *string = poppler_structure_element->elem->getTitle ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

Ditto.

@@ +574,5 @@
> +    return NULL;
> +
> +  GooString *string = poppler_structure_element->elem->getExpandedAbbr ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

Ditto.

@@ +598,5 @@
> +  g_assert (poppler_structure_element->elem);
> +
> +  GooString *string = poppler_structure_element->elem->getLanguage ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

Ditto.

@@ +626,5 @@
> +  g_assert (poppler_structure_element->elem);
> +
> +  GooString *string = poppler_structure_element->elem->getAltText ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

Ditto.

@@ +656,5 @@
> +  g_assert (poppler_structure_element->elem);
> +
> +  GooString *string = poppler_structure_element->elem->getActualText ();
> +  gchar *result = string ? _poppler_goo_string_to_utf8 (string) : NULL;
> +  delete string;

Ditto.

::: glib/reference/poppler-docs.sgml
@@ +23,5 @@
>      <xi:include href="xml/poppler-layer.xml"/>
>      <xi:include href="xml/poppler-media.xml"/>
>      <xi:include href="xml/poppler-movie.xml"/>
> +		<xi:include href="xml/poppler-structure.xml"/>
> +		<xi:include href="xml/poppler-structure-element.xml"/>

Bad indentation here.
Comment 52 Carlos Garcia Campos 2014-02-09 15:49:12 UTC
Comment on attachment 90264 [details] [review]
[PATCH v12 07/11] glib: Expose inline attributes of structure elements

Review of attachment 90264 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-structure-element.cc
@@ +705,5 @@
>  }
> +
> +
> +static PopplerTextSpan *
> +_poppler_convert_text_span (const TextSpan& span)

This could be called text_span_to_poppler_text_span for consistency with other similar methods. Do not use _ prefix for private static methods.

@@ +708,5 @@
> +static PopplerTextSpan *
> +_poppler_convert_text_span (const TextSpan& span)
> +{
> +    PopplerTextSpan *new_span = g_slice_new0 (PopplerTextSpan);
> +    new_span->text = _poppler_goo_string_to_utf8 (span.getText ());

We should check that the returned text is not NULL.

@@ +766,5 @@
> + * guint n_spans;
> + * PopplerTextSpan **text_spans =
> + *    poppler_structure_element_get_text_spans (element, &n_spans);
> + * /<!-- -->* Use the text spans *<!-- -->/
> + * poppler_text_spans_free (text_spans, n_spans);

This is not correct, the free method doesn't return the size of the array.

@@ +769,5 @@
> + * /<!-- -->* Use the text spans *<!-- -->/
> + * poppler_text_spans_free (text_spans, n_spans);
> + * </programlisting></informalexample>
> + *
> + * Return value: (transfer full) (array length=n_text_spans):

You should also add the element type annot so that bindings know how to free the contents.

@@ +770,5 @@
> + * poppler_text_spans_free (text_spans, n_spans);
> + * </programlisting></informalexample>
> + *
> + * Return value: (transfer full) (array length=n_text_spans):
> + *    Null-terminated array of #PopplerTextSpan structures.

I think we should either make the array null terminated and not return the size or return the size making the out param mandatory and not adding the extra null element

@@ +779,5 @@
> +poppler_structure_element_get_text_spans (PopplerStructureElement *poppler_structure_element,
> +                                          guint                   *n_text_spans)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), 0);
> +  g_assert (poppler_structure_element->elem);

Use g_return macros in public methods.

@@ +805,5 @@
> + * Frees the memory used by an array of #PopplerTextSpan elements, typically
> + * obtained using poppler_structure_element_get_text_spans().
> + */
> +void
> +poppler_structure_element_free_text_spans (PopplerTextSpan **poppler_text_spans)

Previous patch returned a GList for which a convenient free method makes more sense. I think case, if we return the size of the array, I think we can avoid this free method and let the user do it.

@@ +828,5 @@
> + * Since: 0.26
> + */
> +PopplerTextSpan *
> +poppler_text_span_copy (PopplerTextSpan *poppler_text_span)
> +{

Use g_return macros in all public methods

@@ +840,5 @@
> +/**
> + * poppler_text_span_free:
> + * @poppler_text_span: A #PopplerTextSpan
> + *
> + * Freed a text span.

Frees

@@ +911,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_text_span_get_color (PopplerTextSpan *poppler_text_span)

I think it would be better to return the color as an out parameter

::: glib/poppler-structure-element.h
@@ +82,4 @@
>    POPPLER_STRUCTURE_ELEMENT_FORM,
>  } PopplerStructureElementKind;
>  
> +typedef struct _PopplerTextSpan PopplerTextSpan;

This should be in poppler.h with all other forward declarations.

@@ +112,4 @@
>  gboolean                     poppler_structure_element_iter_next                  (PopplerStructureElementIter *iter);
>  void                         poppler_structure_element_iter_free                  (PopplerStructureElementIter *iter);
>  
> +PopplerTextSpan             *poppler_text_span_copy                               (PopplerTextSpan *poppler_text_span);

You should expose here the get_type method too

::: glib/reference/poppler-sections.txt
@@ +600,1 @@
>  poppler_structure_element_iter_get_type

poppler_text_span_get_type is missing here.
Comment 53 Carlos Garcia Campos 2014-02-09 16:26:57 UTC
Comment on attachment 90268 [details] [review]
[PATCH v12 11/11] glib-demo: Pane showing the document structure

Review of attachment 90268 [details] [review]:
-----------------------------------------------------------------

::: glib/demo/main.c
@@ +66,4 @@
>  	{ "Attachments",      pgd_attachments_create_widget },
>  	{ "Layers",           pgd_layers_create_widget },
>  	{ "Text",             pgd_text_create_widget },
> +    { "Tagged Structure", pgd_taggedstruct_create_widget },

Bad indented.

::: glib/demo/taggedstruct.c
@@ +52,5 @@
> +
> +static void
> +populate_store_aux (GtkTreeStore *store, GtkTreeIter *parent, PopplerStructureElementIter *iter)
> +{
> +  g_print ("iter %p\n", iter);

Remove this.

@@ +95,5 @@
> +}
> +
> +
> +static void
> +pgd_row_activated (GtkTreeView *tree_view, GtkTreePath *path, GtkTreeViewColumn *column, gpointer user_data)

You can use PgdTaggedStructDemo *demo directly instead of user_data since you are using a cast for the callback in g_signal_connect

@@ +138,5 @@
> +    }
> +
> +  gtk_label_set_text (GTK_LABEL (demo->link_target), "");
> +  if (poppler_structure_element_get_reference_type (element) ==
> +      POPPLER_STRUCTURE_REFERENCE_LINK)

I haven't found the patch that adds the reference link.

@@ +250,5 @@
> +                    G_CALLBACK (pgd_row_activated),
> +                    demo);
> +  g_signal_connect (demo->view, "cursor-changed",
> +                    G_CALLBACK (pgd_cursor_changed),
> +                    demo);

I think GtkTreeSelection::changed is what you want instead of these two signals.
Comment 54 Carlos Garcia Campos 2014-02-09 16:28:49 UTC
I've pushed the patched that I have reviewed so far with my comments addressed.
Comment 55 Carlos Garcia Campos 2014-02-09 16:32:44 UTC
Comment on attachment 90266 [details] [review]
[PATCH v12 09/11] glib: Accessors for document structure references

Review of attachment 90266 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-structure-element.cc
@@ +819,5 @@
>  
>  /**
> + * poppler_structure_element_get_form_field:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *

You should explain briefly here what this method is for.

@@ +822,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Return value: (transfer full): A #PopplerFormField, or %NULL if
> + *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM.
> + */

Since: 0.26

@@ +827,5 @@
> +PopplerFormField *
> +poppler_structure_element_get_form_field (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_assert (poppler_structure_element->elem);

Use g_return macros in public elements.

@@ +861,5 @@
> +                                                                  field_id);
> +}
> +
> +/**
> + * poppler_structure_element_get_form_field_mapping:

I still think this method is a bit weird. I think it should be the page the one returning a mapping.
Comment 56 Adrian Perez de Castro 2014-02-13 17:15:05 UTC
Created attachment 94016 [details] [review]
[PATCH v15 8/9] glib: Implement accessors for element attributes

This version of the patch returns a PopplerRectangle instead of an array
of 4 doubles in poppler_structure_element_get_bounding_box().
Comment 57 Adrian Perez de Castro 2014-02-13 18:55:44 UTC
Created attachment 94018 [details] [review]
[PATCH v16 8/9] glib: Implement accessors for element attributes
Comment 58 Carlos Garcia Campos 2014-02-23 13:08:27 UTC
Comment on attachment 94018 [details] [review]
[PATCH v16 8/9] glib: Implement accessors for element attributes

Review of attachment 94018 [details] [review]:
-----------------------------------------------------------------

I have a lot of doubts about this API.

::: glib/poppler-structure-element.cc
@@ +212,5 @@
> +  const gchar *name;
> +  EnumType     value;
> +
> +  static const EnumNameValue<EnumType> values[];
> +  static const EnumType null = static_cast<EnumType> (-1);

I don't understand this null value, all attributed have a default value, so we should always return the default value when the attribute is not present.

@@ +371,5 @@
> +static EnumType
> +name_to_enum (Object  *name_value,
> +              EnumType default_value = EnumNameValue<EnumType>::null)
> +{
> +  if (!name_value)

Can this really be called with NULL name_value?

@@ +378,5 @@
> +  for (const EnumNameValue<EnumType> *item = EnumNameValue<EnumType>::values ; item->name; item++)
> +    if (name_value->isName (item->name))
> +      return item->value;
> +
> +  return default_value;

Can this happen? in the frontend we assume that invalid attributes are ignored by the core.

@@ +386,5 @@
> +template <typename EnumType>
> +static bool
> +name_to_enum (Object *name_value, EnumType& result)
> +{
> +  return (result = name_to_enum<EnumType> (name_value)) != EnumNameValue<EnumType>::null;

name_to_enum should never return null I think

@@ +410,5 @@
> +              g_variant_builder_add (&builder, "d", item.getNum ());
> +            }
> +          else
> +            {
> +              /* Free the resources in-use by the builder */

Aren't these values already checked in the core?

@@ +450,5 @@
> +              g_free (utf8_string);
> +            }
> +          else
> +            {
> +              /* Free the resources in-use by the builder */

Ditto.

@@ +468,5 @@
> +{
> +  Object *value = Attribute::getDefaultValue (attribute_type);
> +  const Attribute *attr;
> +
> +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))

Are we using inherit with any attribute? Shouldn't findAttribute use the inherit internally only for attributes that are inheritable?

@@ +469,5 @@
> +  Object *value = Attribute::getDefaultValue (attribute_type);
> +  const Attribute *attr;
> +
> +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))
> +    value = attr->getValue ();

Here we could return attr->getValue (); directly, and call Attribute::getDefaultValue (attribute_type) after this when the attr was not found.

@@ +485,5 @@
> +
> +  EnumType result = name_to_enum<EnumType> ((attr == NULL)
> +                                            ? Attribute::getDefaultValue (attribute_type)
> +                                            : attr->getValue ());
> +  if (result_ptr != NULL)

Can this be called with a NULL result_ptr?

@@ +488,5 @@
> +                                            : attr->getValue ());
> +  if (result_ptr != NULL)
> +    *result_ptr = result;
> +
> +  return (result == EnumNameValue<EnumType>::null);

How can this be null if we are using the default value when the attribute is not present?

@@ +1192,5 @@
> + */
> +gboolean
> +poppler_structure_element_get_placement (PopplerStructureElement    *poppler_structure_element,
> +                                         PopplerStructurePlacement *value,
> +                                         gboolean                   inherit)

hmm, I don't understand this generic way of getting the attributes. Since, all attributes have a default value, we should always return the default value when the attribute is not present. If we really need API to let the user know whether an attribute is explicitly present we could add has_foo() methods instead. Also, not all attributes are inheritable, why do we allow the user decide? I think that inheritable attributes should be always looked up  in their ancestors automatically, it should be transparent for the user. We could add in the documentation what the default value is in case of the attr is not found and which once are inheriable. So, I think this method should be something like:

PopplerStructurePlacement 
poppler_structure_element_get_placement (PopplerStructureElement    *poppler_structure_element);

It's not inheritable, and when not present inline will be returned. And the same approach for all other methods.

@@ +1255,5 @@
> +                                                       POPPLER_STRUCTURE_BORDER_STYLE_NONE);
> +        }
> +    }
> +  else
> +    return FALSE;

This should never fail, invalid objects should be ignored by the core and never passed to the forntends.

@@ +1286,5 @@
> +      if (value != NULL)
> +        value[0] = value[1] = value[2] = value[3] = object->getNum ();
> +    }
> +  else
> +    return FALSE;

Ditto.

@@ +1304,5 @@
> +        *value = object->getNum ();
> +      return TRUE;
> +    }
> +  else
> +    return FALSE;

Ditto.

@@ +1311,5 @@
> +static inline gboolean
> +convert_doubles_array (Object *object, gdouble **values, guint *n_values)
> +{
> +  if (!object->isArray ())
> +    return FALSE;

Ditto.

@@ +1353,5 @@
> +        *value = (guint) object->getNum ();
> +      return TRUE;
> +    }
> +  else
> +    return FALSE;

Ditto.

@@ +1360,5 @@
> +static gboolean
> +convert_color (Object *object, PopplerColor *color)
> +{
> +  if (!object->isArray () || object->arrayGetLength () != 3)
> +    return FALSE;

Ditto, and same in all  other cases.

@@ +2885,5 @@
> + *    on the attribute requested, as specified in the table. If the
> + *    attribute is not defined, <code>NULL</code> is returned.
> + */
> +GVariant *
> +poppler_structure_element_get_attribute (PopplerStructureElement  *poppler_structure_element,

Since we have convenient methods to get every property, I would not add this generic method for now.

::: glib/poppler-structure-element.h
@@ +85,5 @@
>  /**
> + * PopplerStructureAttribute:
> + */
> +typedef enum {
> +  POPPLER_STRUCTURE_ATTRIBUTE_UNKNOWN,

Do we use this?
Comment 59 Adrian Perez de Castro 2014-02-26 18:23:43 UTC
(In reply to comment #58)

> @@ +2885,5 @@
> > + *    on the attribute requested, as specified in the table. If the
> > + *    attribute is not defined, <code>NULL</code> is returned.
> > + */
> > +GVariant *
> > +poppler_structure_element_get_attribute (PopplerStructureElement  *poppler_structure_element,
> 
> Since we have convenient methods to get every property, I would not add this
> generic method for now.

Provided that there are methods for retrieving individual properties,
I think it would be okay to leave out this method for the moment. My
main motivation for having it is making it easier to work with the
Tagged-PDF structure from dynamic languages with GObject-Introspection
bindings.

Let's skip it for the moment and reconsider/rework it later on.

> ::: glib/poppler-structure-element.h
> @@ +85,5 @@
> >  /**
> > + * PopplerStructureAttribute:
> > + */
> > +typedef enum {
> > +  POPPLER_STRUCTURE_ATTRIBUTE_UNKNOWN,
> 
> Do we use this?

The poppler_structure_element_get_kind() method returns it when the
when StructElement::getType() for the underlying object returns
StructElement::Unknown, or when a NULL structure element is passed
(by means of g_return_val_if_fail).

That being said, in Poppler elements for which StructElement::isOk()
returns gFalse are never added to the structure tree, so it should
never happen that there would be an invalid element with type 
StructElement::Unknown that poppler-glib could access... So it seems
safe to remove POPPLER_STRUCTURE_ATTRIBUTE_UNKNOWN.

I have posted a patch to remove it in bug #75541
Comment 60 Adrian Perez de Castro 2014-02-27 20:14:42 UTC
(In reply to comment #58)
> Comment on attachment 94018 [details] [review] [review]
> [PATCH v16 8/9] glib: Implement accessors for element attributes
> 
> Review of attachment 94018 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> I have a lot of doubts about this API.
> 
> ::: glib/poppler-structure-element.cc
> @@ +212,5 @@
> > +  const gchar *name;
> > +  EnumType     value;
> > +
> > +  static const EnumNameValue<EnumType> values[];
> > +  static const EnumType null = static_cast<EnumType> (-1);
> 
> I don't understand this null value, all attributed have a default value, so
> we should always return the default value when the attribute is not present.

Only some attributes have default values (check the attribute maps in
poppler/StructElement.cc, line 302). The value EnumType<T>::null is
returned during attribute lookup if:
 
- The attribute does not have a default value, AND
- the attribute is not defined, AND
- the value for the attribute is not inherited from a parent element.

BTW, in the attributes map the items with a default value are those
that should have one according to the PDF spec.

> @@ +371,5 @@
> > +static EnumType
> > +name_to_enum (Object  *name_value,
> > +              EnumType default_value = EnumNameValue<EnumType>::null)
> > +{
> > +  if (!name_value)
> 
> Can this really be called with NULL name_value?

Yes, for example in line 501 there is:

  EnumType result = name_to_enum<EnumType> ((attr == NULL)
                         ? Attribute::getDefaultValue (attribute_type)
                         : attr->getValue ());

In this particular case, “NULL” is passed as “name_value” to
“name_to_enum()” when the attribute is not defined (“attr == NULL”)
for a structure element, and “Attribute::getDefault()” returns
“NULL” (that is, the attribute does not have a default value).

> @@ +378,5 @@
> > +  for (const EnumNameValue<EnumType> *item = EnumNameValue<EnumType>::values ; item->name; item++)
> > +    if (name_value->isName (item->name))
> > +      return item->value;
> > +
> > +  return default_value;
> 
> Can this happen? in the frontend we assume that invalid attributes are
> ignored by the core.

You are right with this one: if “name_value” is not “NULL”, then it
must be one of the values of the enum because the invalid attributes
are indeed discarded by the core.

I am adding “g_assert_not_reached()” before the “return” statement.
Unfortunately, I have to keep the “return” to make the compiler happy
and avoid it to complain about the function ending without returning
qa value.
 
> @@ +386,5 @@
> > +template <typename EnumType>
> > +static bool
> > +name_to_enum (Object *name_value, EnumType& result)
> > +{
> > +  return (result = name_to_enum<EnumType> (name_value)) != EnumNameValue<EnumType>::null;
> 
> name_to_enum should never return null I think

For the reason mentioned above: the “null” value is returned when
the attribute is not defined and it has no default value. Checking
for “null” is what makes it possible for the API methods to return
“FALSE” when the attribute lookup failed and there is no default.

Funny thing: I never got to use this overloaded “name_to_enum()”
version, so I am going to remove it — the actual check is done in
“attr_to_enum()”.

> @@ +410,5 @@
> > +              g_variant_builder_add (&builder, "d", item.getNum ());
> > +            }
> > +          else
> > +            {
> > +              /* Free the resources in-use by the builder */
> 
> Aren't these values already checked in the core?

Correct. This check is redundant, I will remove it.

> @@ +450,5 @@
> > +              g_free (utf8_string);
> > +            }
> > +          else
> > +            {
> > +              /* Free the resources in-use by the builder */
> 
> Ditto.

Re: Ditto :)

> @@ +468,5 @@
> > +{
> > +  Object *value = Attribute::getDefaultValue (attribute_type);
> > +  const Attribute *attr;
> > +
> > +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))
> 
> Are we using inherit with any attribute? Shouldn't findAttribute use the
> inherit internally only for attributes that are inheritable?

Yes, it makes sense to hide the fact that some attributes can be inherited in the frontend.
If some use-case arises in the future that needs to know exactly where an attribute is
defined, we can add something like poppler_structure_element_has_attribute() later on.

> @@ +469,5 @@
> > +  Object *value = Attribute::getDefaultValue (attribute_type);
> > +  const Attribute *attr;
> > +
> > +  if ((attr = poppler_structure_element->elem->findAttribute (attribute_type, inherit)))
> > +    value = attr->getValue ();
> 
> Here we could return attr->getValue (); directly, and call
> Attribute::getDefaultValue (attribute_type) after this when the attr was not
> found.

You are right, it is better to avoid doing the call to “Attribute::getDefault()”
when the attribute has a value. I think it will look nice like this:

  const Attribute *attr =
      poppler_structure_element->elem->findAttribute (attribute_type, inherit);
  return (attr != NULL) ? attr->getValue () : Attribute::getDefaultValue (attribute_type);


> @@ +485,5 @@
> > +
> > +  EnumType result = name_to_enum<EnumType> ((attr == NULL)
> > +                                            ? Attribute::getDefaultValue (attribute_type)
> > +                                            : attr->getValue ());
> > +  if (result_ptr != NULL)
> 
> Can this be called with a NULL result_ptr?

Nope. It is possible, but I remember that we talked at some point on changing
the API documentation and GI-annotations to disallow passing “NULL”. I am going
to remove this check here, and in methods that use the function I will be adding
non-NULL checks with “g_return_val_if_fail()”.

> @@ +488,5 @@
> > +                                            : attr->getValue ());
> > +  if (result_ptr != NULL)
> > +    *result_ptr = result;
> > +
> > +  return (result == EnumNameValue<EnumType>::null);
> 
> How can this be null if we are using the default value when the attribute is
> not present?
> 
> @@ +1192,5 @@
> > + */
> > +gboolean
> > +poppler_structure_element_get_placement (PopplerStructureElement    *poppler_structure_element,
> > +                                         PopplerStructurePlacement *value,
> > +                                         gboolean                   inherit)
> 
> hmm, I don't understand this generic way of getting the attributes. Since,
> all attributes have a default value, we should always return the default
> value when the attribute is not present. If we really need API to let the
> user know whether an attribute is explicitly present we could add has_foo()
> methods instead. Also, not all attributes are inheritable, why do we allow
> the user decide? I think that inheritable attributes should be always looked
> up  in their ancestors automatically, it should be transparent for the user.
> We could add in the documentation what the default value is in case of the
> attr is not found and which once are inheriable. So, I think this method
> should be something like:
> 
> PopplerStructurePlacement 
> poppler_structure_element_get_placement (PopplerStructureElement   
> *poppler_structure_element);

As discussed today in IRC, the API is going to be simplified so the frontend
returns reasonable default values for attributes which do not have a default,
or they will signal the non-existence of the attribute by other means (like
for example returning a NULL pointer and so on.)

> It's not inheritable, and when not present inline will be returned. And the
> same approach for all other methods.
> 
> @@ +1255,5 @@
> > +                                                       POPPLER_STRUCTURE_BORDER_STYLE_NONE);
> > +        }
> > +    }
> > +  else
> > +    return FALSE;
> 
> This should never fail, invalid objects should be ignored by the core and
> never passed to the forntends.

True.

> @@ +1286,5 @@
> > +      if (value != NULL)
> > +        value[0] = value[1] = value[2] = value[3] = object->getNum ();
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.

Yep.

> @@ +1304,5 @@
> > +        *value = object->getNum ();
> > +      return TRUE;
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1311,5 @@
> > +static inline gboolean
> > +convert_doubles_array (Object *object, gdouble **values, guint *n_values)
> > +{
> > +  if (!object->isArray ())
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1353,5 @@
> > +        *value = (guint) object->getNum ();
> > +      return TRUE;
> > +    }
> > +  else
> > +    return FALSE;
> 
> Ditto.
> 
> @@ +1360,5 @@
> > +static gboolean
> > +convert_color (Object *object, PopplerColor *color)
> > +{
> > +  if (!object->isArray () || object->arrayGetLength () != 3)
> > +    return FALSE;
> 
> Ditto, and same in all  other cases.

Certainly.
Comment 61 Adrian Perez de Castro 2014-02-28 17:25:23 UTC
Created attachment 94904 [details] [review]
[PATCH v17 1/4] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 62 Adrian Perez de Castro 2014-02-28 17:27:00 UTC
Created attachment 94905 [details] [review]
[PATCH v17 1/5] glib: Private function _poppler_link_mapping_new_from_annot_link()
Comment 63 Adrian Perez de Castro 2014-02-28 17:27:38 UTC
Created attachment 94906 [details] [review]
[PATCH v17 2/5] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 64 Adrian Perez de Castro 2014-02-28 17:28:25 UTC
Created attachment 94907 [details] [review]
[PATCH v17 3/5] glib: Accessors for form fields of structure elements
Comment 65 Adrian Perez de Castro 2014-02-28 17:29:21 UTC
Created attachment 94908 [details] [review]
[PATCH v17 4/5] glib: Accessors for document structure references
Comment 66 Adrian Perez de Castro 2014-02-28 17:30:01 UTC
Created attachment 94909 [details] [review]
[PATCH v17 5/5] glib: Implement accessors for element attributes
Comment 67 Carlos Garcia Campos 2014-03-01 08:39:54 UTC
Comment on attachment 94907 [details] [review]
[PATCH v17 3/5] glib: Accessors for form fields of structure elements

Review of attachment 94907 [details] [review]:
-----------------------------------------------------------------

You should also add the new methods to the sections.txt file.

::: glib/poppler-structure-element.cc
@@ +765,5 @@
>  
>  /**
> + * poppler_structure_element_get_form_field:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *

We need a brief explanation here, and comment that this should be called only for FORM elements

@@ +775,5 @@
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_return_val_if_fail (poppler_structure_element->elem != NULL, NULL);
> +
> +  if (poppler_structure_element->elem->getType () != StructElement::Form)

I think it's a programmer error to call this for a different element type, so I would make this a g_return

@@ +781,5 @@
> +
> +  // TODO Handle elements which have a Role attribute (used sometimes for
> +  //      non-editable widgets, to describe their appearance). Editable
> +  //      fields have only a single child, with the field identifier.
> +  if (poppler_structure_element->elem->getNumElements () != 1)

I think I'm going to rename the getElements api as getChildren, this confused me

@@ +795,5 @@
> +        }
> +      else
> +        {
> +          // Element contains the form field ID as the MCID attribute.
> +          field_id = child->getMCID ();

Where is this documented? According to the spec, if the role is not present the element should have a single child that is an object reference.

"(Form) A widget annotation representing an interactive form field (see 12.7,
“Interactive Forms”). If the element contains a Role attribute, it may contain content
items that represent the value of the (non-interactive) form field. If the element
omits a Role attribute (see Table 348), it shall have only one child: an object
reference (see 14.7.4.3, “PDF Objects as Content Items”) identifying the widget
annotation. The annotations’ appearance stream (see 12.5.5, “Appearance
Streams”) shall describe the appearance of the form element."

So, no MCID allowed here. I still think this should be done by the core.

@@ +814,5 @@
> + * Return value: (transfer full): A #PopplerFormFieldMapping, or %NULL if
> + *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM
> + */
> +PopplerFormFieldMapping *
> +poppler_structure_element_get_form_field_mapping (PopplerStructureElement *poppler_structure_element)

I insist I'm not sure about this API. The mapping is something associated to a page, it's the coordinates of the form field in a page. I think this could be moved to poppler page as something like:

poppler_page_get_form_field_mapping_for_element (page, element);

This would be more unconvenient to use because the user has to create a PopplerPage, but I think it's less confusing.
Comment 68 Carlos Garcia Campos 2014-03-01 08:55:09 UTC
Comment on attachment 94908 [details] [review]
[PATCH v17 4/5] glib: Accessors for document structure references

Review of attachment 94908 [details] [review]:
-----------------------------------------------------------------

::: glib/poppler-structure-element.cc
@@ +931,5 @@
> + *
> + * Return value: Whether the element is a reference to another object.
> + */
> +gboolean
> +poppler_structure_element_is_reference (PopplerStructureElement *poppler_structure_element)

I would avoid this generic api, since we don't have a way to get the referenced object when it's not link, annot or form field

@@ +947,5 @@
> + * Return value: The type of object pointed to by the reference, a value of
> + *    #PopplerStructureReference.
> + */
> +PopplerStructureReference
> +poppler_structure_element_get_reference_type (PopplerStructureElement *poppler_structure_element)

Same here, we should use the referenced object only for structure elements that can contain children that are object references, but knowing what objects we are looking for.

@@ +1002,5 @@
> +  return NULL;
> +}
> +
> +/**
> + * poppler_structure_element_get_reference_link:

The method name is link_action. I prefer to name this just link, though as this is what we do already in our api, poppler_get_link_mapping actually returns a list of LinkActions

@@ +1015,5 @@
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_return_val_if_fail (poppler_structure_element->elem != NULL, NULL);
> +
> +  AnnotLink *link = _poppler_structure_element_find_annot_link (poppler_structure_element);

Maybe I'm missing something from the spec, but why do we have to find the link? Shouldn't the link structure element reference a link annotation object?

@@ +1028,5 @@
> + *    object, or %NULL if the element is not a reference pointing to
> + *    a link.
> + */
> +PopplerLinkMapping *
> +poppler_structure_element_get_reference_link_mapping (PopplerStructureElement *poppler_structure_element)

I have the same concerns about this API as with the form field mapping

::: glib/poppler-structure-element.h
@@ +92,5 @@
> +typedef enum {
> +  POPPLER_STRUCTURE_REFERENCE_UNKNOWN,
> +  POPPLER_STRUCTURE_REFERENCE_ANNOT,
> +  POPPLER_STRUCTURE_REFERENCE_LINK,
> +} PopplerStructureReference;

Why are we handling this differently than form fields? Annot and Link are also element types, so the same way I can get a form field for a FORM element, I should be able to get the link or annot for a LINK or ANNOT element, without any special api for references.

@@ +95,5 @@
> +  POPPLER_STRUCTURE_REFERENCE_LINK,
> +} PopplerStructureReference;
> +
> +
> +typedef struct _PopplerTextSpan PopplerTextSpan;

This looks unrelated to this patch.
Comment 69 Carlos Garcia Campos 2014-03-01 09:49:11 UTC
Comment on attachment 94909 [details] [review]
[PATCH v17 5/5] glib: Implement accessors for element attributes

Review of attachment 94909 [details] [review]:
-----------------------------------------------------------------

This looks much better now, thanks. But still have some doubts and comments

::: glib/poppler-structure-element.cc
@@ +385,5 @@
> +  /*
> +   * Non-NULL names must always be valid because Poppler
> +   * discards the invalid attributes when parsing them.
> +   */
> +  g_assert (name_value != NULL);

This assert is duplicated.

@@ +401,5 @@
> +static EnumType
> +attr_to_enum (PopplerStructureElement *poppler_structure_element)
> +{
> +  const Attribute *attr =
> +      poppler_structure_element->elem->findAttribute (EnumNameValue<EnumType>::attribute_type, gTrue);

I wonder if we should do the same for the core api, and always inherit when the attr is inheritable.

@@ +967,5 @@
>   *
> + * Obtains the #PopplerFormField out of an element of type
> + * %POPPLER_STRUCTURE_ELEMENT_FORM. The returned value must be freed
> + * with g_object_unref(). Note that there is one structure element
> + * per form field.

This is unrelated to this patch.

@@ +973,4 @@
>   * Return value: (transfer full): A #PopplerFormField, or %NULL if
>   *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM.
> + *
> + * Since: 0.26

Ditto.

@@ +1018,5 @@
>   *
> + * Obtains the #PopplerFormField out of an element of type
> + * %POPPLER_STRUCTURE_ELEMENT_FORM. The returned value must be freed
> + * with g_object_unref(). Note that there is one structure element
> + * per form field.

Ditto.

@@ +1024,4 @@
>   * Return value: (transfer full): A #PopplerFormFieldMapping, or %NULL if
>   *    the element is not a %POPPLER_STRUCTURE_ELEMENT_FORM
> + *
> + * Since: 0.26

Ditto.

@@ +1154,5 @@
> + */
> +PopplerStructurePlacement
> +poppler_structure_element_get_placement (PopplerStructureElement *poppler_structure_element)
> +{
> +  ATTRIBUTE_GETTER_BODY (PopplerStructurePlacement);

You are going too far, this is just two lines of code

@@ +1174,5 @@
> +{
> +  ATTRIBUTE_GETTER_BODY (PopplerStructureWritingMode);
> +}
> +
> +

Nit: Extra empty line here.

@@ +1186,5 @@
> +    {
> +      g_assert (object->arrayGetLength () == 4);
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        values[i] = name_to_enum<PopplerStructureBorderStyle> (object->arrayGet (i, &item));

I think you should item.free() everytime.

@@ +1205,5 @@
> +    {
> +      g_assert (object->arrayGetLength () == 4);
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        value[i] = object->arrayGet (i, &item)->getNum ();

Ditto.

@@ +1227,5 @@
> +  gdouble* doubles = g_new (gdouble, *n_values);
> +  Object item;
> +
> +  for (guint i = 0; i < *n_values; i++)
> +    doubles[i] = object->arrayGet (i, &item)->getNum ();

Ditto.

@@ +1241,5 @@
> +  Object item;
> +
> +  color->red   = object->arrayGet (0, &item)->getNum () * 65535;
> +  color->green = object->arrayGet (1, &item)->getNum () * 65535;
> +  color->blue  = object->arrayGet (2, &item)->getNum () * 65535;

Ditto.

@@ +1262,5 @@
> + */
> +PopplerStructureBorderStyle *
> +poppler_structure_element_get_border_style (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);

Should we check also that this is called for an element than can have a border style attr? and the same for all other attributes.

@@ +1279,5 @@
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right). The returned
> + * value must be freed with g_free().
> + *
> + * Return value: (transfer full) (array fixed-size=4): An array with the

(element-type PopplerStructureBorderStyle)

@@ +1331,5 @@
> +/**
> + * poppler_structure_element_get_inline_align:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the block-alignment mode of the structure element.

next method is block_alignment

@@ +1473,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerStructureChecked
> +poppler_structure_element_get_checked (PopplerStructureElement *poppler_structure_element)

I also wonder if we should make more obvious that fact that some attributes are only availble for some kinds for elements. For example, this method could be:

poppler_structure_element_form_get_state and the previous one poppler_structure_element_form_get_role()

Also, I would rename PopplerStructureChecked as PopplerStructureFormState since it described better what it is. In the core API we should follow the spec and use the same names, to make it easier to follow the code reading the spec. But in the public API we can change the exposed names when we think are better. And this is definitely one of those cases.

@@ +1486,5 @@
> + * Obtains the thickness of the border of an element. The result values
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-length=4): Thickness of

(element-type gdouble)

@@ +1512,5 @@
> + *
> + * Obtains the thickness of the text decoration for the text contained
> + * in the element.
> + *
> + * Return value: Thickness of the text decoration, or %NaN if not defined.

Can the thickness be a negative number? I prefer to use something like -1 instead of NAN

@@ +1530,5 @@
> + * poppler_structure_element_get_padding:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical

s/typical//

@@ +1533,5 @@
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical
> + * Western left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-size=4): Padding for

(element-type gdouble)

@@ +1550,5 @@
> +  return result;
> +}
> +
> +/**
> + * poppler_structure_element_get_table_padding:

poppler_structure_element_table_get_padding maybe?

@@ +1557,5 @@
> + * Obtains the padding of an element (space around it). The result
> + * values are in before-after-start-end ordering (for the typical
> + * Western left-to-right writing, that is top-bottom-left-right).
> + * This has to be used instead of poppler_structure_element_get_padding()
> + * for table elements.

If the name is what I propose, table_get_padding, I think it's more obvious that this is used for table elements.

@@ +1559,5 @@
> + * Western left-to-right writing, that is top-bottom-left-right).
> + * This has to be used instead of poppler_structure_element_get_padding()
> + * for table elements.
> + *
> + * Return value: (transfer full) (array fixed-size=4): The paddings for

(element-type gdouble)

@@ +1580,5 @@
> + * poppler_structure_element_get_bounding_box:
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Obtains the size of the bounding box of the element. This has to be
> + * used instead of poppler_structure_element_get_padding() for table

Why is this related to the padding?

@@ +1589,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerRectangle *
> +poppler_structure_element_get_bounding_box (PopplerStructureElement *poppler_structure_element)

This is one of the case where we could use the approach of returning a boolean and pass a PopplerRectangle * as parameter, since that allows to use a stack allocated rectangle.

@@ +1622,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_space_before (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1639,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_space_after (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1656,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_start_indent (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

-1

@@ +1673,5 @@
> + */
> +gdouble
> +poppler_structure_element_get_end_indent (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NAN);

I prefer to use -1 instead of NAN everywhere.

@@ +1743,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_color (PopplerStructureElement *poppler_structure_element)

This is another case where we could return boolean when not present and allow to pass a pointer to a stack allocated color.

@@ +1768,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_background_color (PopplerStructureElement *poppler_structure_element)

Ditto.

@@ +1794,5 @@
> + *
> + * Since: 0.26
> + */
> +PopplerColor *
> +poppler_structure_element_get_text_decoration_color (PopplerStructureElement *poppler_structure_element)

Ditto.

@@ +1856,5 @@
> + *
> + * Since: 0.26
> + */
> +gdouble
> +poppler_structure_element_get_width (PopplerStructureElement *poppler_structure_element)

Why not merging both functions and have a special value for auto? For example when this returns -1 is because it's auto

@@ +1869,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Check whether the width of the element is to be calculated automatically
> + * depending on its contents. Note that when the width is automatic, using
> + * poppler_structure_element_get_width() will fail.

And we avoid this situation.

@@ +1908,5 @@
> + * @poppler_structure_element: A #PopplerStructureElement
> + *
> + * Check whether the height of the element is to be calculated automatically
> + * depending on its contents. Note that when the height is automatic, using
> + * poppler_structure_element_get_height() will fail.

Same for the height.

@@ +1949,5 @@
> + *
> + * Check whether the line height for the text contained in the element is
> + * to be calculated automatically depending on the font size. Note that
> + * when the line height is automatic,
> + * poppler_structure_element_get_line_height() will fail.

Same here.

@@ +1997,5 @@
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::ColumnGap);
> +  if (value == NULL)
> +    {
> +      *n_values = static_cast<guint> (-1);

The doc says this is set to 0, something I agree with.

@@ +2031,5 @@
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::ColumnWidths);
> +  if (value == NULL)
> +    {
> +      *n_values = static_cast<guint> (-1);

Return 0 here.

@@ +2063,5 @@
> +poppler_structure_element_get_description (PopplerStructureElement *poppler_structure_element)
> +{
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +
> +  Object *value = attr_value_or_default (poppler_structure_element, Attribute::Desc);

Who owns the returned Object? the element?

@@ +2069,5 @@
> +    return NULL;
> +  if (value->isString ())
> +    return _poppler_goo_string_to_utf8 (value->getString ());
> +  if (value->isName ())
> +    return g_strdup (value->getName ());

What encoding is this, we should ensure all strings returned by our api a valid utf8

@@ +2106,5 @@
> +  if (value->isString ())
> +    return _poppler_goo_string_to_utf8 (value->getString ());
> +  if (value->isName ())
> +    return g_strdup (value->getName ());
> +

Same comments here.

@@ +2119,5 @@
> + * Obtains the color of border around the element. The result values
> + * are in before-after-start-end ordering (for the typical Western
> + * left-to-right writing, that is top-bottom-left-right).
> + *
> + * Return value: (transfer full) (array fixed-size=4): Border colors,

(element-type PopplerColor)

@@ +2141,5 @@
> +    {
> +      // One color per side.
> +      Object item;
> +      for (guint i = 0; i < 4; i++)
> +        convert_color (value->arrayGet (i, &item), &result[i]);

item.free() everytime

@@ +2174,5 @@
> + * Since: 0.26
> + */
> +gchar **
> +poppler_structure_element_get_headers (PopplerStructureElement *poppler_structure_element,
> +                                       guint                   *n_values)

We don't need this, since it's a null terminated array. You can iterate ntil null or get the length with g_strv_length if needed

@@ +2196,5 @@
> +        result[i] = _poppler_goo_string_to_utf8 (item.getString ());
> +      else if (item.isName ())
> +        result[i] = g_strdup (item.getName ());
> +      else
> +        g_assert_not_reached ();

item.free() everytime
Comment 70 Adrian Perez de Castro 2014-03-03 00:19:15 UTC
Created attachment 94989 [details] [review]
[PATCH v18 1/5] glib: Private function _poppler_link_mapping_new_from_annot_link()
Comment 71 Adrian Perez de Castro 2014-03-03 00:20:30 UTC
Created attachment 94990 [details] [review]
[PATCH v18 2/5] glib: Private function _poppler_link_mapping_new_from_form_field()
Comment 72 Adrian Perez de Castro 2014-03-03 00:21:16 UTC
Created attachment 94991 [details] [review]
[PATCH v18 3/5] glib: Accessors for form fields of structure elements
Comment 73 Adrian Perez de Castro 2014-03-03 00:22:14 UTC
Created attachment 94992 [details] [review]
[PATCH v18 4/5] glib: Methods to obtain link information from structure elements
Comment 74 Adrian Perez de Castro 2014-03-03 00:22:46 UTC
Created attachment 94993 [details] [review]
[PATCH v18 5/5] glib: Implement accessors for element attributes
Comment 75 Carlos Garcia Campos 2014-03-03 11:52:44 UTC
Comment on attachment 94991 [details] [review]
[PATCH v18 3/5] glib: Accessors for form fields of structure elements

Review of attachment 94991 [details] [review]:
-----------------------------------------------------------------

This patch doesn't address or replies to my comments of the previous review.

::: glib/poppler-page.cc
@@ +1385,5 @@
> + * @structure_element: a #PopplerStructureElement.
> + *
> + * Obtains the #PopplerFormFieldMapping for a single form field associated
> + * with a structure element of type %POPPLER_STRUCTURE_ELEMENT_FORM. The
> + * returned value must be freed with g_object_unref().

PopplerFormFieldMapping is not a GObject, but a boxed type, that should be freed with poppler_form_field_mapping_free().

::: glib/poppler-page.h
@@ +89,4 @@
>  cairo_surface_t       *poppler_page_get_image            (PopplerPage        *page,
>  							  gint                image_id);
>  GList              *poppler_page_get_form_field_mapping  (PopplerPage        *page);
> +PopplerFormFieldMapping*

PopplerFormFieldMapping* -> PopplerFormFieldMapping *

::: glib/poppler-structure-element.cc
@@ +783,5 @@
> +  g_return_val_if_fail (POPPLER_IS_STRUCTURE_ELEMENT (poppler_structure_element), NULL);
> +  g_return_val_if_fail (poppler_structure_element->elem != NULL, NULL);
> +
> +  if (poppler_structure_element->elem->getType () != StructElement::Form)
> +    return NULL;

As I said in the previous review, I think this is a programmer error, it should use a g_return macro.

@@ +804,5 @@
> +    }
> +  else
> +    {
> +      // Element contains the form field ID as the MCID attribute.
> +      field_id = child->getMCID ();

Please, see my previous review.

::: glib/reference/poppler-sections.txt
@@ +611,4 @@
>  poppler_structure_element_get_alt_text
>  poppler_structure_element_get_actual_text
>  poppler_structure_element_get_text_spans
> +poppler_structure_element_form_get_field

You should add PopplerPage new api to the docs too.
Comment 76 Adrian Perez de Castro 2014-03-03 15:14:15 UTC
Created attachment 95036 [details] [review]
[PATCH v19 3/5] glib: Implement accessors for element attributes
Comment 77 Adrian Perez de Castro 2014-03-03 15:14:54 UTC
Created attachment 95037 [details] [review]
[PATCH v19 4/5] glib: Accessors for form fields of structure elements
Comment 78 Adrian Perez de Castro 2014-03-03 15:15:26 UTC
Created attachment 95038 [details] [review]
[PATCH v19 5/5] glib: Methods to obtain link information from structure elements
Comment 79 Adrian Perez de Castro 2014-03-05 09:26:33 UTC
Created attachment 95148 [details] [review]
[PATCH v20 4/5] glib: Accessors for form fields of structure elements

Removed spurious conflict marker
Comment 80 Adrian Perez de Castro 2014-03-05 09:27:25 UTC
Created attachment 95149 [details] [review]
[PATCH v20 5/5] glib: Methods to obtain link information from structure elements

Removed spurious conflict marker
Comment 81 Carlos Garcia Campos 2014-03-06 08:13:06 UTC
The attributes patch landed already in master, and the pending patches here are for the referenced objects in struct elements. So, I'm closing this, please use a new bug for new patches.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.