Bug 55540

Summary: Expose poppler-cairo
Product: poppler Reporter: Lu Wang <coolwanglu>
Component: cairo backend Assignee: poppler-bugs <poppler-bugs>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: Install CairoOutputDev.h
Patch
Expose library, install headers for poppler-cairo

Description Lu Wang 2012-10-02 13:29:48 UTC
Created attachment 67993 [details]
Install CairoOutputDev.h

I'm not sure why it has been hidden (it is not installed even with --enable-xpdf-headers).
I guess it does no harm to expose it for hacking.
Comment 1 Lu Wang 2012-10-02 13:53:35 UTC
I just realized that the entire poppler-cairo is hidden.
Why ?
Comment 2 Adrian Johnson 2013-01-27 05:56:35 UTC
CairoOutputDev.h is internal and not part of the public API. The glib API is the public interface to the cairo backend. Is there anything missing from the glib API that you need?
Comment 3 Lu Wang 2013-01-27 06:00:00 UTC
I'm writing a pdf->html converter (pdf2htmlEX) with font extraction etc.

There are a few things I need CairoOutputDev for:
- Convert Type 3 fonts to SVG fonts
- Convert Backgrounds to SVG images (without text)

So basically I need to examine every PDF element, rather than perform a general PDF rendering.

I don't know why CairoOutputDev is not exposed the way SplashOutputDev is.
Comment 4 Lu Wang 2013-03-22 12:20:40 UTC
Why is this one marked as Won't Fix?

Currently SplashOutputDev is exposed, which is very handy. 

I'm going to create a patch for CairoOutputDev, so that I don't have to sync my copy of those files all the time.

What do you think? Is there any reason you might be against this?
Comment 5 Carlos Garcia Campos 2013-03-31 12:19:16 UTC
Hmm, we already install other internal headers when building with xpdf headers enabled. We discourage their use, but since the option exists after all, I don't see why we shouldn't install the cairo internal headers when building with the cairo backend.
Comment 6 Carlos Garcia Campos 2013-03-31 12:21:05 UTC
Comment on attachment 67993 [details]
Install CairoOutputDev.h

>diff --git a/poppler/Makefile.am b/poppler/Makefile.am
>index 3a5f4ca..2ecc97b 100644
>--- a/poppler/Makefile.am
>+++ b/poppler/Makefile.am
>@@ -190,6 +190,7 @@ poppler_include_HEADERS =	\
> 	BuiltinFont.h		\
> 	BuiltinFontTables.h	\
> 	CachedFile.h		\
>+	CairoOutputDev.h	\
> 	Catalog.h		\
> 	CharCodeToUnicode.h	\
> 	CMap.h			\

CairoOutputDev.h should only be included when building with the cairo backend.
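
A minimal sketch of one way to express that condition in poppler/Makefile.am. The conditional name BUILD_CAIRO_OUTPUT and the helper variable cairo_headers are assumptions for illustration, not taken from the attached patch, and the header list is shortened to the entries visible in the diff above:

```makefile
# Sketch only: BUILD_CAIRO_OUTPUT and cairo_headers are assumed names.
# The variable is defined only when the cairo backend is being built.
if BUILD_CAIRO_OUTPUT
cairo_headers =			\
	CairoOutputDev.h
endif

# $(cairo_headers) expands to nothing when the conditional is false,
# so the header is installed only for cairo-enabled builds.
poppler_include_HEADERS =	\
	$(cairo_headers)	\
	BuiltinFont.h		\
	BuiltinFontTables.h	\
	CachedFile.h
```

Keeping the header in its own conditionally defined variable means the main header list does not need to be duplicated for the two build configurations.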
Comment 7 Carlos Garcia Campos 2013-03-31 12:22:43 UTC
Created attachment 77246 [details] [review]
Patch

If nobody objects I'll commit this patch.
Comment 8 Lu Wang 2013-03-31 12:39:47 UTC
(In reply to comment #7)
> Created attachment 77246 [details] [review]
> Patch
> 
> If nobody objects I'll commit this patch.

Hi, Thank you very much for your reply.

Unfortunately, the problem is not as simple as `installing CairoOutputDev.h`. In a previous comment I submitted a patch but then disabled it. I also need the CairoFontEngine.h header. What's more, `poppler-cairo` is not installed separately.

The reason I need those internal headers is that, I'm writing `pdf2htmlEX`, something like pdftohtml in poppler but more powerful. I need to dig deep into PDF elements so the general interfaces are far from enough. Currently I want to convert Type 3 fonts into SVG fonts, and CairoFontEngine is doing something very similar.

I'm aware that those headers are not officially supported, but I would really appreciate it if they could be exposed.
Comment 9 Carlos Garcia Campos 2013-03-31 14:23:33 UTC
(In reply to comment #8)
 
> The reason I need those internal headers is that, I'm writing `pdf2htmlEX`,
> something like pdftohtml in poppler but more powerful.

Isn't it possible to improve pdftohtml instead of writing a different tool?
Comment 10 Lu Wang 2013-03-31 15:02:48 UTC
(In reply to comment #9)
> (In reply to comment #8)
>  
> > The reason I need those internal headers is that, I'm writing `pdf2htmlEX`,
> > something like pdftohtml in poppler but more powerful.
> 
> Isn't it possible to improve pdftohtml instead of writing a different tool?

In fact, pdf2htmlEX was based on pdftohtml at a very early stage. I have thought about improving pdftohtml directly, so that I could always use all the internals, but there are a few problems:

1. pdftohtml seems to aim at producing an HTML document whose source is human-readable and which is compatible with slightly older browsers, whereas the goal of pdf2htmlEX is a pixel-accurate HTML document that is also optimized for publishers (e.g. split pages and assets). Therefore:
 - It relies on many HTML5/CSS3 features, so only the latest browsers are supported.
 - There are many ugly HTML elements for adjusting the layout, so the source can never be human-readable.

2. The crucial part is font manipulation. The most difficult and important work in pdf2htmlEX is converting fonts into web-friendly formats, together with proper re-encoding; without it, pixel-level accuracy can never be achieved. For example, annotation links (I mean the borders) produced by pdftohtml are unlikely to work, since the text is usually in the wrong position. Printing is also supported. Because of this, FontForge is heavily used, and I don't think it's appropriate for poppler to rely on it (or is it?)
 - FontForge has never been friendly to binary linking, although it has improved recently. There is no documentation for its header files, so basically everything I've been doing is hacking. I'm not sure whether this would meet the quality requirements of poppler.
 - Font conversion might be illegal in some regions? I remember reading old email archives about font handling in poppler, which had been rejected.

3. There are many tricks and hacks for HTML, which have made the codebase complicated enough to warrant being separate (IMHO). It may not fit in 1-2 files in the util/ folder.

In case you would like to take a look at pdf2htmlEX:
Here is a demo: http://coolwanglu.github.com/pdf2htmlEX/demo/demo.html
And here is the project page: https://github.com/coolwanglu/pdf2htmlEX

I would very much like to contribute some parts back to pdftohtml, but unfortunately they cannot work there, since almost everything depends on proper font conversion.
Comment 11 Lu Wang 2013-04-24 10:13:05 UTC
Created attachment 78420 [details] [review]
Expose library, install headers for poppler-cairo

This patch links poppler.so with poppler-cairo.so,
and the Cairo*.h headers are installed.

Please review.
Thanks!



- Lu Wang
Comment 12 Lu Wang 2013-04-24 10:13:54 UTC
Actually, I don't know why cairo was hidden in the first place.
So if the patch does anything stupid, please let me know.
Comment 13 Lu Wang 2013-09-17 08:43:47 UTC
Can anyone review the patch, please?
It is quite inconvenient to keep merging upstream patches into a local copy.
Comment 14 GitLab Migration User 2018-08-20 21:48:33 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/83.
