Bug 28780 - Add getLabel method to FormWidget
Summary: Add getLabel method to FormWidget
Status: RESOLVED FIXED
Alias: None
Product: poppler
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: Other All
: medium enhancement
Assignee: poppler-bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-27 12:14 UTC by Mark Riedesel
Modified: 2010-09-22 04:05 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Patch to add getLabel method to FormWidgets (2.82 KB, application/octet-stream)
2010-06-27 12:14 UTC, Mark Riedesel
Details
adds getPartialName and getMappingName (3.78 KB, patch)
2010-06-27 13:02 UTC, Mark Riedesel
Details | Splinter Review
adds getPartialName, getMappingName, and getFullyQualifiedName (5.82 KB, patch)
2010-07-17 14:19 UTC, Mark Riedesel
Details | Splinter Review
FormField name patch (5.73 KB, patch)
2010-07-31 11:36 UTC, Mark Riedesel
Details | Splinter Review

Description Mark Riedesel 2010-06-27 12:14:45 UTC
Created attachment 36551 [details]
Patch to add getLabel method to FormWidgets

Greetings,

I needed access to the form field label names so I wrote this patch. I was torn between having getLabel read directly from the object's Dict, but opted to follow the patterns used in AnnotMarkup.

Also exposed the accessor to glib and I intend to submit a small patch to python-poppler if this patch is accepted into the trunk.


Regards,

Mark Riedesel
klowner.om
Comment 1 Albert Astals Cid 2010-06-27 12:32:17 UTC
I think we shouldn rename it to getPartialName() instead of getLabel()?
Also it would probably would make sense adding a method that gives the full name too "getFullName()"

Though Carlos might want to add some input too since he has done a bit more of work on the forms code
Comment 2 Mark Riedesel 2010-06-27 13:02:42 UTC
Created attachment 36552 [details] [review]
adds getPartialName and getMappingName

Changed name to match the PDF spec more accurately. /T field accessed via getPartialName and /TM accessed via getMappingName.
Comment 3 Mark Riedesel 2010-07-17 14:19:08 UTC
Created attachment 37162 [details] [review]
adds getPartialName, getMappingName, and getFullyQualifiedName

Added getFullyQualifiedName accessor method which concatenates all parent names separated by periods. Also exposed the 3 new methods to glib.
Comment 4 Albert Astals Cid 2010-07-23 14:25:56 UTC
I am assuming you have actually tested getFullyQualifiedName, right?

I have three small gotchas, first, thought it is everywhere in poppler, you don't need to do 

+  if (partialName)
+    delete partialName;

just do

+  delete partialName;

Second is that the indentation in getFullyQualifiedName seems a bit fishy, some lines use 1 space only

I'm actually a bit more concerned about the third, you concatenate strings in getFullyQualifiedName but noone guarantees they all will be either unicode encoded or not, so we could end up concatenating some utf 8 encoded and some non utf8 encoded ones, right? Though i don't really see how we can fix this, maybe we can just hope the document encodes all T fields the same way?
Comment 5 Mark Riedesel 2010-07-31 11:36:17 UTC
Created attachment 37487 [details] [review]
FormField name patch

Thanks for the input Albert.

You are indeed correct that calling delete on a null value is safe, I was just following the code convention in the rest of the source. The new patch doesn't have the non-null checks.

You are also correct regarding the strange indentation. It looks like I introduced some indentation problems while moving the changes around for the patch. That should now be fixed.

The issue regarding UTF16 and ASCII T fields in a document is also valid, although I haven't seen any PDFs (yet, at least) that have mixed encodings in the T fields. One solution would be to convert everything to UTF16 (using pdfDocEncodingToUTF16 perhaps?), but it seems to me as though that might introduce problems.
Comment 6 Carlos Garcia Campos 2010-08-03 02:47:39 UTC
Comment on attachment 37487 [details] [review]
FormField name patch

Some comments below

>diff --git a/glib/poppler-form-field.cc b/glib/poppler-form-field.cc
>index 33c4b15..e774880 100644
>--- a/glib/poppler-form-field.cc
>+++ b/glib/poppler-form-field.cc
>@@ -220,6 +220,67 @@ poppler_form_field_button_set_state (PopplerFormField *field,
>   static_cast<FormWidgetButton*>(field->widget)->setState ((GBool)state);
> }
> 
>+/**
>+ * poppler_form_field_get_fully_qualified_name:
>+ * @field: a #PopplerFormField
>+ *
>+ * Gets the fully qualified name (concatenated PDF /T properties) for @field.
>+ *
>+ * Return value: a new allocated string. It must be freed with g_free() when done.
>+ **/
>+gchar*
>+poppler_form_field_get_fully_qualified_name (PopplerFormField *field)
>+{
>+  GooString *tmp;
>+
>+  g_return_val_if_fail (POPPLER_IS_FORM_FIELD (field), NULL);
>+  
>+  tmp = field->widget->getFullyQualifiedName();
>+
>+  return tmp ? _poppler_goo_string_to_utf8 (tmp) : NULL;
>+}

I think we can just call this poppler_form_field_get_name()


>+GooString* FormWidget::getFullyQualifiedName() {
>+  Object obj1, obj2;
>+  Object parent;
>+  GooString *parent_name;
>+  GooString *full_name;
>+
>+  if (fullyQualifiedName)
>+    return fullyQualifiedName;
>+
>+  if (!partialName)
>+    return NULL;

this is not correct, according to the spec: 

"It is possible for different field dictionaries to have the same fully qualified field name if they are descendants of
a common ancestor with that name and have no partial field names (T entries) of their own."

so, even if the field doesn't have a partial name, it might have a fully qualified name if its parent has a partial name. 

>+  full_name = new GooString(partialName);

why not building the full name first and then append the partial name if there is one?

>+  obj.copy(&obj1);
>+  while (obj1.getDict()->lookup("Parent", &parent)->isDict()) {
>+    if (parent.getDict()->lookup("T", &obj2)->isString()) {

Use Object::dictLookup() rather than getDict()->lookup() in both cases here.

>+      parent_name = obj2.getString()->copy();
>+      obj2.free();
>+
>+      if (full_name->hasUnicodeMarker()) {
>+        full_name->del(0, 2); // Remove the unicode BOM
>+        parent_name->append("\0.", 2); // 2-byte unicode period
>+      } else {
>+        parent_name->append('.'); // 1-byte ascii period
>+      }
>+
>+      parent_name->append(full_name);
>+
>+      delete full_name;
>+      full_name = parent_name;

We can avoid duplicating the string and deleting all the time, by using GooString::insert() instead of append. We insert the period first, and then the partial name directly into the full_name GooString.

>+    }
>+    obj1.free();
>+    parent.copy(&obj1);
>+    parent.free();
>+  }
>+  fullyQualifiedName = full_name;

we should make sure the fullyQualifiedName is not an empty string because in that case we should delete the string and return NULL.

>+  return fullyQualifiedName;
>+}
>+

Thank you very much for the patch.
Comment 7 Carlos Garcia Campos 2010-09-22 04:05:16 UTC
I've just pushed the patch with the modifications I proposed in comment 6 and some other minor changes. Thanks!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.