Created attachment 67993
Not sure why it has been hidden (it is not installed even with --enable-xpdf-headers).
I guess it's not harmful to expose it for hacking.
I just realized that the entire poppler-cairo library is hidden.
CairoOutputDev.h is internal and not part of the public API. The glib API is the public interface to the cairo backend. Is there anything missing from the glib API that you need?
I'm writing a PDF-to-HTML converter (pdf2htmlEX) with font extraction, etc.
There are a few things I need CairoOutputDev for:
- Convert Type 3 fonts to SVG fonts
- Convert Backgrounds to SVG images (without text)
So basically I need to examine every PDF element, rather than just render the PDF as a whole.
I don't know why CairoOutputDev is not exposed the way SplashOutputDev is.
Why is this one marked as Won't Fix?
Currently SplashOutputDev is exposed, which is very handy.
I'm going to create a patch for CairoOutputDev, so that I don't have to sync my copy of those files all the time.
What do you think? Is there any reason you might be against this?
Hmm, we already install other internal headers when building with xpdf headers enabled. We discourage their use, but since we have the option anyway, I don't see why we shouldn't install the cairo internal headers when building with the cairo backend.
Comment on attachment 67993
>diff --git a/poppler/Makefile.am b/poppler/Makefile.am
>index 3a5f4ca..2ecc97b 100644
>@@ -190,6 +190,7 @@ poppler_include_HEADERS = \
> BuiltinFont.h \
> BuiltinFontTables.h \
> CachedFile.h \
>+ CairoOutputDev.h \
> Catalog.h \
> CharCodeToUnicode.h \
> CMap.h \
CairoOutputDev.h should only be included when building with the cairo backend.
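A minimal sketch of one way to do that in Makefile.am, assuming configure.ac declares an automake conditional for the cairo backend (the name BUILD_CAIRO_OUTPUT is an assumption here, not taken from the patch). The cairo headers go into a conditionally defined helper variable (the name cairo_headers is hypothetical), which is then referenced from poppler_include_HEADERS:

```makefile
# Sketch only: BUILD_CAIRO_OUTPUT is assumed to be declared in configure.ac
# with AM_CONDITIONAL and to be true only when the cairo backend is enabled.
if BUILD_CAIRO_OUTPUT
cairo_headers = \
	CairoFontEngine.h \
	CairoOutputDev.h
endif

# When the conditional is false, $(cairo_headers) expands to nothing,
# so the cairo headers are installed only with the cairo backend.
# (Abridged: the real header list continues after CMap.h.)
poppler_include_HEADERS = \
	BuiltinFont.h \
	BuiltinFontTables.h \
	CachedFile.h \
	$(cairo_headers) \
	Catalog.h \
	CharCodeToUnicode.h \
	CMap.h
```

Keeping the headers in a separate variable avoids touching the unconditional list in more than one place.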
Created attachment 77246
If nobody objects I'll commit this patch.
(In reply to comment #7)
> Created attachment 77246
> If nobody objects I'll commit this patch.
Hi, thank you very much for your reply.
Unfortunately, the problem is not as simple as installing `CairoOutputDev.h`. I submitted a patch in an earlier comment but then withdrew it. I also need the header for CairoFontEngine.h. What's more, `poppler-cairo` is not installed separately.
The reason I need those internal headers is that I'm writing `pdf2htmlEX`, something like pdftohtml in poppler but more powerful. I need to dig deep into PDF elements, so the general interfaces are far from enough. Currently I want to convert Type 3 fonts into SVG fonts, and CairoFontEngine does something very similar.
I'm aware that those headers are not officially supported, but I'd really appreciate it if they could be exposed.
(In reply to comment #8)
> The reason I need those internal headers is that, I'm writing `pdf2htmlEX`,
> something like pdftohtml in poppler but more powerful.
Isn't it possible to improve pdftohtml instead of writing a different tool?
(In reply to comment #9)
> (In reply to comment #8)
> > The reason I need those internal headers is that, I'm writing `pdf2htmlEX`,
> > something like pdftohtml in poppler but more powerful.
> Isn't it possible to improve pdftohtml instead of writing a different tool?
In fact, pdf2htmlEX was based on pdftohtml at a very early stage. I have thought about improving pdftohtml directly, so that I could always use all the internal stuff, but there might be a few problems:
1. It seems that pdftohtml aims to produce an HTML document whose source is human-readable and which is compatible with slightly older browsers, whereas the goal of pdf2htmlEX is a pixel-accurate HTML document that is also optimized for publishers (e.g. split pages and assets). Therefore:
- It relies on lots of HTML5/CSS3 features, so only the latest browsers are supported.
- There are lots of ugly HTML elements for adjusting the layout, so the source can never be human-readable.
2. The crucial part is font manipulation. The most difficult and important work in pdf2htmlEX is converting fonts into web-friendly formats with proper re-encoding; without it, pixel-accurate output can never be achieved. For example, annotation links (I mean the borders) produced by pdftohtml are unlikely to line up, since the text is usually in the wrong position. Printing is also supported. For this, FontForge is heavily used, and I don't think it's appropriate for poppler to depend on it (or is it?):
- FontForge has never been friendly to binary linking, although it has improved recently. There is no documentation for its header files, so basically everything I've been doing is hacking. Not sure if this would meet poppler's quality requirements.
- Font conversion might be illegal in some regions? I remember reading old mailing-list archives about font handling in poppler, where it had been rejected.
3. There are lots of HTML tricks and hacks, which have made the codebase complex enough to justify a separate project (IMHO). It could not be contained in 1-2 files in the util/ folder.
In case you would like to take a glance at pdf2htmlEX:
Here is a demo: http://coolwanglu.github.com/pdf2htmlEX/demo/demo.html
And here is the project page: https://github.com/coolwanglu/pdf2htmlEX
I would very much like to contribute some parts back to pdftohtml, but unfortunately they cannot work there, since almost everything depends on proper font conversion.
Created attachment 78420
Expose library, install headers for poppler-cairo
This patch links poppler.so with poppler-cairo.so
and installs Cairo*.h.
- Lu Wang
Actually I don't know why cairo has been hidden in the first place.
So if the patch does anything stupid, please kindly let me know.
Can anyone review the patch, please?
It is not very convenient to keep merging upstream changes into a local copy.