Bug 77790 - get_text_layout introspection mismatch
Summary: get_text_layout introspection mismatch
Status: RESOLVED MOVED
Alias: None
Product: poppler
Classification: Unclassified
Component: glib frontend
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: poppler-bugs
Reported: 2014-04-23 03:35 UTC by Marcus Brinkmann
Modified: 2018-08-21 11:18 UTC

Description Marcus Brinkmann 2014-04-23 03:35:51 UTC
Hi,

$ cat girtest.js
const pop = imports.gi.Poppler;
const doc = pop.Document.new_from_file("file:///path/to/test.pdf", '')
const page=doc.get_page(0)
log(page.get_text_layout())

$ gjs girtest.js
Segmentation fault (core dumped)

The backtrace follows below.  What actually happens is that gir expects get_text_layout to return an array of PopplerRectangle* (an array of pointers to individually allocated PopplerRectangle objects), while it actually returns an array of PopplerRectangle (one contiguous malloc'ed region with all rectangles side by side).

The confusion arises naturally in C, as the type "Foo*" can be a pointer to a single Foo object or a pointer to the first element of an array of Foo objects.

From a cursory glance at gir, it seems the actual data layout currently implemented is not supported by gir, and that the PopplerRectangles have to be allocated separately.  This would be an API change.
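
To make the two layouts concrete, here is a minimal ctypes sketch that needs neither Poppler nor gir. The Rect class is a hypothetical stand-in for PopplerRectangle (four doubles), and the coordinates are made up for illustration. It builds both layouts and then shows what a reader expecting an array of pointers would pick up from the first 8 bytes of the contiguous block.

```python
import ctypes


class Rect(ctypes.Structure):
    """Hypothetical stand-in for PopplerRectangle: four doubles, 32 bytes."""
    _fields_ = [("x1", ctypes.c_double), ("y1", ctypes.c_double),
                ("x2", ctypes.c_double), ("y2", ctypes.c_double)]


# Layout A (what the function is reported to return):
# one contiguous block holding the structs side by side.
flat = (Rect * 3)(Rect(70.5, 10.0, 80.5, 20.0),
                  Rect(80.5, 10.0, 90.5, 20.0),
                  Rect(90.5, 10.0, 100.5, 20.0))

# Layout B (what the binding is reported to expect):
# an array of pointers, each to a separately allocated struct.
separate = [Rect(70.5, 10.0, 80.5, 20.0),
            Rect(80.5, 10.0, 90.5, 20.0),
            Rect(90.5, 10.0, 100.5, 20.0)]
pointers = (ctypes.POINTER(Rect) * 3)(*[ctypes.pointer(r) for r in separate])

# Read correctly, both layouts carry the same data.
flat_x1 = [flat[i].x1 for i in range(3)]
pointer_x1 = [pointers[i].contents.x1 for i in range(3)]

# Misreading layout A as layout B: the first 8 bytes of the block are the
# bit pattern of the double x1, not an address.
misread_address = ctypes.c_uint64.from_buffer(flat).value
print(hex(misread_address))
```

Dereferencing a value like misread_address as a pointer is the kind of access that would end in a segmentation fault.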

If one tries to trick gir by changing the parameter type in the header file to PopplerRectangle* (instead of PopplerRectangle**), one gets:

$ gjs girtest.js

(gjs:20441): Gjs-WARNING **: JS ERROR: Error: Unsupported type array for (out caller-allocates)
@girtest.js:4

JS_EvaluateScript() failed

Here is the backtrace:

(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
#1  0x00007ffff69393a7 in g_slice_copy (mem_size=32, mem_block=0x40519322d0e56042) at gslice.c:1056
#2  0x00007ffff6c166d3 in g_boxed_copy (boxed_type=8036912, src_boxed=0x40519322d0e56042) at gboxed.c:352
#3  0x00007ffff7d9cf54 in gjs_boxed_from_c_struct (context=0x636ff0, info=<optimized out>, gboxed=0x40519322d0e56042, flags=<optimized out>) at gi/boxed.cpp:1236
#4  0x00007ffff7d99d43 in gjs_value_from_g_argument (context=context@entry=0x636ff0, value_p=value_p@entry=0x7fffffffc500, type_info=type_info@entry=0x72aa30,
    arg=arg@entry=0x7fffffffc510, copy_structs=copy_structs@entry=1) at gi/arg.cpp:2642
#5  0x00007ffff7d9a1fb in gjs_array_from_carray_internal (context=context@entry=0x636ff0, value_p=value_p@entry=0x7fffffffc5c8, param_info=param_info@entry=0x72aa30,
    length=length@entry=483, array=<optimized out>) at gi/arg.cpp:2143
#6  0x00007ffff7d9a695 in gjs_value_from_explicit_array (context=0x636ff0, value_p=0x7fffffffc5c8, type_info=<optimized out>, arg=0x7fffffffc618, length=483)
    at gi/arg.cpp:2195
#7  0x00007ffff7d9fe03 in gjs_invoke_c_function (context=context@entry=0x636ff0, function=function@entry=0x6d0de0, obj=obj@entry=0x7fffee735cd0, js_argc=js_argc@entry=0,
    js_argv=js_argv@entry=0x68e508, js_rval=js_rval@entry=0x7fffffffc970, r_value=r_value@entry=0x0) at gi/function.cpp:1140
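
The pointer value in frames #1 to #3 (0x40519322d0e56042) can be checked against the explanation above. The following sketch, using only the Python standard library, reinterprets those 64 bits as an IEEE 754 double. The helper name is made up for illustration; the only input taken from the report is the address itself.

```python
import struct


def address_as_double(address):
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", address))[0]


# src_boxed / mem_block value from frames #1 to #3 of the backtrace.
bogus_pointer = 0x40519322d0e56042
coordinate = address_as_double(bogus_pointer)
print(coordinate)
```

The result lies between 70 and 71, which looks like an ordinary page coordinate. That is consistent with gjs having taken the first double of the first rectangle as a pointer, although the backtrace alone does not prove it.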
Comment 1 Anselm Kruis 2014-06-11 08:24:32 UTC
I have exactly the same issue with Python. I was able to work around the problem using Python's ctypes foreign function interface.
The code is for Python 2.7. It should be possible to create a gi.overrides.Poppler module based on this code, which would fix the issue for Python.

from gi.repository import Poppler, GLib
import ctypes
lib_poppler = ctypes.cdll.LoadLibrary("libpoppler-glib-8")

ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer

class Poppler_Rectangle(ctypes.Structure):
    _fields_ = [ ("x1", ctypes.c_double), ("y1", ctypes.c_double), ("x2", ctypes.c_double), ("y2", ctypes.c_double) ]
LP_Poppler_Rectangle = ctypes.POINTER(Poppler_Rectangle)
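# Declare the C signature directly on the library function:
# gboolean poppler_page_get_text_layout(PopplerPage *, PopplerRectangle **, guint *)
poppler_page_get_text_layout = lib_poppler.poppler_page_get_text_layout
poppler_page_get_text_layout.restype = ctypes.c_int
poppler_page_get_text_layout.argtypes = [ctypes.c_void_p,
                                         ctypes.POINTER(LP_Poppler_Rectangle),
                                         ctypes.POINTER(ctypes.c_uint)]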

def get_page_layout(page):
    assert isinstance(page, Poppler.Page)
    capsule = page.__gpointer__
    page_addr = PyCapsule_GetPointer(capsule, None)
    rectangles = LP_Poppler_Rectangle()
    n_rectangles = ctypes.c_uint(0)
    has_text = poppler_page_get_text_layout(page_addr, ctypes.byref(rectangles), ctypes.byref(n_rectangles))
    try:
        result = []
        if has_text:
            assert n_rectangles.value > 0, "n_rectangles.value > 0: {}".format(n_rectangles.value)
            assert rectangles, "rectangles: {}".format(rectangles)
            for i in range(n_rectangles.value):
                r = rectangles[i]
                result.append((r.x1, r.y1, r.x2, r.y2))
        return result
    finally:
        if rectangles:
            GLib.free(ctypes.addressof(rectangles.contents))
Comment 2 ivan.zderadicka 2016-02-14 11:13:09 UTC
I still have this issue with poppler v0.40 in Python (or at least I assume it is caused by this issue). get_text_layout is returning nonsense numbers like 5.77367485245e-317 1.44295714099e-312 2.56761490707e-312 2.60372595358e-321
Any chance to get this fixed?
I.
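
One way to look at numbers like these is to inspect their raw bit patterns. The sketch below uses only the Python standard library; the helper name is hypothetical and the four values are copied from the comment above. It is a diagnostic aid under the assumption that the fields were read from the wrong memory, not a confirmed analysis of what the binding does.

```python
import struct


def double_to_bits(value):
    """Return the 64-bit IEEE 754 bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


# Values reported for one rectangle in the comment above.
reported = [5.77367485245e-317, 1.44295714099e-312,
            2.56761490707e-312, 2.60372595358e-321]

for value in reported:
    # Print each value next to its raw 64-bit pattern.
    print("%r -> 0x%016x" % (value, double_to_bits(value)))
```

All four values are subnormal doubles, so their exponent bits are zero and the patterns are small integers. Real page coordinates would not look like this, which fits the reading that non-floating-point data (for example pointers or counters) is being interpreted as rectangle fields.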
Comment 3 GitLab Migration User 2018-08-21 11:18:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/612.

