Bug 23551

Summary: Wishlist: psfrag compatible postscript output
Product: cairo
Component: postscript backend
Status: RESOLVED FIXED
Severity: enhancement
Priority: medium
Version: 1.8.6
Hardware: All
OS: All
Reporter: Ferdinand Rau <rauferd>
Assignee: Adrian Johnson <ajohnson>
QA Contact: cairo-bugs mailing list <cairo-bugs>
CC: daniel.hornung, freedesktop, gtdev, heng, mike, rauferd, sv
Attachments: patch, svg2eps.py

Description Ferdinand Rau 2009-08-27 04:53:24 UTC
psfrag is a tool used by LaTeX users to substitute text in figures with
formatted LaTeX text, equations or anything else. psfrag relies on the
text being exported as one word, because it simply searches and replaces
»tags« in the ASCII file describing the figure.
Since cairo substitutes text by glyphs one character at a time, includes
subsets of the fonts used and uses Unicode notation even for purely ASCII
text, output produced by Cairo can't be parsed by psfrag.

A typical PostScript command, as psfrag expects it:
> (Text) show

Cairo instead does it like this:
> (...)
> Encoding 1 /uni0054 put
> Encoding 2 /uni0065 put
> Encoding 3 /uni0078 put
> Encoding 4 /uni0074 put
> (...) + 200 lines of included glyphs
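
To illustrate the difference, here is a minimal Python sketch that lists the literal (...) strings in a piece of PostScript and checks whether a tag survives as one of them. It is not psfrag's actual matching logic: it assumes the tag has to be one whole string, the function names are hypothetical, and the escape handling is simplified (octal escapes and nested unescaped parentheses are ignored).

```python
import re

# Matches a (...) literal without nested unescaped parentheses.
_LITERAL = re.compile(r"\(((?:\\.|[^\\()])*)\)")


def literal_strings(ps_source):
    """Return the unescaped contents of simple (...) string literals."""
    return [re.sub(r"\\(.)", r"\1", body) for body in _LITERAL.findall(ps_source)]


def tag_is_replaceable(ps_source, tag):
    """True if the tag appears as one complete string literal (assumption)."""
    return tag in literal_strings(ps_source)


print(tag_is_replaceable("(Text) show", "Text"))   # literal string
print(tag_is_replaceable("<01020304>Tj", "Text"))  # glyph-indexed hex string
```

With the literal form the tag is found; with a glyph-indexed hex string there is nothing left to match.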

I agree that your approach has many advantages, but perhaps it would be
possible to introduce some compatibility option to export text as-is.

psfrag is a very powerful tool and it would be a pity if the output of
all applications built on top of Cairo couldn't be used together with it.
For more information on psfrag, see:
http://www.ctan.org/tex-archive/help/Catalogue/entries/psfrag.html

I stumbled upon this because the upcoming version (0.47) of Inkscape
(the popular vector graphics editor, see: inkscape.org) will only support
Cairo for exporting to PS, EPS and PDF. When I tested this version,
its export no longer worked with my LaTeX documents.
Comment 1 Daniel Hornung 2009-10-27 07:09:05 UTC
This one just struck me hard when upgrading Inkscape to 0.47 (which seems to come with this cairo version on Mac). I think many more users will be affected by this "regression" (when relying on software that analyzes PS) once Inkscape 0.47 is released as stable: more or less everyone making figures for LaTeX with Inkscape.

See also the discussion on https://bugs.launchpad.net/inkscape/+bug/375323/
Comment 2 Sebastian Vorköper 2009-11-15 02:21:28 UTC
Hello,

I have exactly the same problem as described above.
For me, it would be perfect if cairo could support a "native text" option.
That way we could get the text as pure ASCII in the EPS/PDF output, for people using psfrag and other programs that rely on text as placeholders.

Thank you very much.

Comment 3 Adrian Johnson 2009-11-27 23:40:41 UTC
I did some testing with psfrag, using the tag "abc". The PS output of this text is:

<010203>Tj

If I change this to:

(abc)Tj

psfrag works.

So the solution here is for cairo to put ASCII text into a separate subset and use an identity mapping between the character code and glyph index.
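
A minimal Python sketch of the two encodings, for illustration only. The numbering of subset codes in order of first use is an assumption that happens to reproduce the <010203> example above; it is not taken from cairo's subsetting code, and `show_operand` is a hypothetical name.

```python
def show_operand(text, identity):
    """Return the string operand that would precede Tj for `text`."""
    if identity:
        # Identity mapping: the character code is the ASCII value, so the
        # text can be written as a readable (escaped) literal string.
        escaped = (text.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
        return "(" + escaped + ")"
    # Subset mapping (assumed): codes 1, 2, 3, ... in order of first use.
    codes = {}
    for ch in text:
        codes.setdefault(ch, len(codes) + 1)
    return "<" + "".join("%02x" % codes[ch] for ch in text) + ">"


print(show_operand("abc", identity=False) + "Tj")
print(show_operand("abc", identity=True) + "Tj")
```

Only the identity-mapped form leaves the tag visible as text in the file, which is what psfrag needs.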
Comment 4 Behdad Esfahbod 2009-11-30 16:36:03 UTC
Adrian: We should consider the effect on European languages.  Changing subsets for every diacritic mark sounds a bit undesirable.

Maybe we can analyze the whole page before deciding whether to use direct ASCII mapping?  It's true that the font is shared by the entire document, but the first page it's used should give us enough context to make a good enough decision.
Comment 5 Adrian Johnson 2009-11-30 23:51:52 UTC
I was thinking of using one of the PDF builtin encodings like WinAnsiEncoding for the 8-bit subset. I want to avoid custom 8-bit encodings in PDF due to buggy PDF readers. Before cairo switched to CID fonts we were getting bug reports as a result of PDF readers that didn't handle custom encodings well, particularly when printing.

I don't really like the idea of choosing the subsetting based on the first page. It is too unpredictable and likely to result in bug reports where the user complains that X stopped working after changing something in the document. It would be better to add API for controlling the subsetting. That way the application which should have visibility of all characters used in the document can make the decision.
Comment 6 Adrian Johnson 2010-01-26 04:01:39 UTC
Here is a branch with an experimental implementation of 8-bit latin subsets for PS/PDF:

  http://cgit.freedesktop.org/~ajohnson/cairo/log/?h=latin-subsets

PostScript output is working, but PDF has some issues: Latin TrueType
subsets do not work in acroread or gs, and Latin CFF subsets are not yet implemented.

You should be able to test this with Inkscape to see if it generates output suitable for psfrag.
Comment 7 Henry Gomersall 2010-02-01 05:35:11 UTC
(In reply to comment #6)
> You should be able to test this with Inkscape to see if it generates output
> suitable for psfrag.
> 

Thanks for the effort. I can get your code to partially work. It yields the following snippet:

BT
32 0 0 32 486.670098 234.193408 Tm
/f-0-0 1 Tf
(f\(t\))Tj
ET
BT
32 0 0 32 71.442998 609.309937 Tm
/f-0-0 1 Tf
[(Multiple sour)19(ces)]TJ
ET
BT
32 0 0 32 73.681695 349.094629 Tm
/f-0-0 1 Tf
[(Measur)18(ed )]TJ
/f-0-1 1 Tf
<01>Tj
/f-0-0 1 Tf
[(eld f\(t\))]TJ
ET

the f(t) (line 4) is correctly interpreted by psfrag, but the other 2 phrases seem to have been broken up. It should be "Measured sources" and "Measured field f(t)".
Comment 8 Henry Gomersall 2010-02-01 05:38:13 UTC
(In reply to comment #7)
> the f(t) (line 4) is correctly interpreted by psfrag, but the other 2 phrases
> seem to have been broken up. It should be "Measured sources" and "Measured
> field f(t)".
> 

Apologies, that should be "Multiple sources" and "Measured field f(t)". Incidentally, that is how it is stored in the SVG file.
Comment 9 Adrian Johnson 2010-02-01 12:25:51 UTC
Created attachment 32976: patch

It looks like a bug in Inkscape. Testing with pango-view:

$ echo "The quick brown fox jumps over the lazy dog." > test.txt
$ pango-view -o test.ps test.txt
$ tail -12 test.ps
12 0 0 12 10 12.861328 Tm
/f-0-0 1 Tf
(The quick brown fox)Tj
10.382812 0 Td
( jumps over the lazy)Tj
10.209473 0 Td
( dog.)Tj
ET
Q Q
showpage
%%Trailer
%%EOF

So the maximum PDF kerning value of 10 is too conservative. Apply the attached patch to increase it to 100. The output is now:

$ tail test.ps
0 g
BT
12 0 0 12 10 12.861328 Tm
/f-0-0 1 Tf
(The quick brown fox jumps over the lazy dog.)Tj
ET
Q Q
showpage
%%Trailer
%%EOF

I used Inkscape to create an SVG file with the same text:

$ tail drawing.svg
       style="font-size:40px;font-style:normal;font-weight:normal;fill:#000000;fill-opacity:1;stroke:none;font-family:DejaVu Sans;-inkscape-font-specification:DejaVu Sans"><flowRegion
         id="flowRegion2818"><rect
           id="rect2820"
           width="557.36444"
           height="166.96379"
           x="135.04425"
           y="254.37344" /></flowRegion><flowPara
         id="flowPara2822"
         style="font-size:20">The quick brown fox jumps over the lazy dog.</flowPara></flowRoot>  </g>
</svg>

$ inkscape -P drawing.ps drawing.svg
$ tail drawing.ps
0 g
BT
16 0 0 16 108.025 622.952038 Tm
/f-0-0 1 Tf
[(The qu)-3(ick br)19(own fo)30(x jumps over)-3( the lazy )-3(dog.)]TJ
ET
Q Q
showpage
%%Trailer
%%EOF


The problem here is that, for some reason, Inkscape is not positioning glyphs at their natural glyph advances. As a result, the text does not appear in the PS output as a contiguous string.
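
To see the effect on a tag, the TJ array from drawing.ps can be taken apart with a few lines of Python. This is only a sketch with a simplified tokenizer (no octal escapes, no hex strings, no nested parentheses), and `parse_tj_array` is a hypothetical helper, not part of cairo or psfrag.

```python
import re

# Either a (...) literal or a numeric position adjustment.
_TOKEN = re.compile(r"\(((?:\\.|[^\\()])*)\)|(-?\d+(?:\.\d+)?)")


def parse_tj_array(operand):
    """Split '[(ab)-3(cd)]TJ' into its string fragments and adjustments."""
    strings, adjustments = [], []
    for text, number in _TOKEN.findall(operand):
        if number:
            adjustments.append(float(number))
        else:
            strings.append(re.sub(r"\\(.)", r"\1", text))
    return strings, adjustments


line = ("[(The qu)-3(ick br)19(own fo)30(x jumps over)-3"
        "( the lazy )-3(dog.)]TJ")
fragments, shifts = parse_tj_array(line)
print(fragments)           # the pieces a search for a tag would see
print("".join(fragments))  # the sentence as the reader sees it
print(shifts)              # the adjustments that force the splits
```

The joined text is the full sentence, but a word such as "quick" is not contained in any single fragment, so a search for it in the file would fail.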
Comment 10 Henry Gomersall 2010-02-04 07:33:08 UTC
Created attachment 33068: svg2eps.py
Comment 11 Henry Gomersall 2010-02-04 07:34:15 UTC
(In reply to comment #9)
> The problem here is for some reason Inkscape is not positioning glyphs at their
> natural glyph advances. As a result text will not appear in the PS output as a
> contiguous string.
> 

So I've written a little python app that outputs an EPS file from an SVG file via a cairo surface (attached above). It doesn't seem to work as intended, even linking against your modified cairo libs. Is this an overly naive way to do things?
Comment 12 Adrian Johnson 2010-02-05 00:35:32 UTC
(In reply to comment #11)
> So I've written a little python app that outputs an EPS file from an SVG file
> via a cairo surface (attached above). It doesn't seem to work as intended, even
> linking against your modified cairo libs. Is this an overly naive way to do
> things?

Creating a "hello world" test case with Inkscape, then running your svg2eps, results in an EPS file that displays a black rectangle. There is no text in the file. Running rsvg-view also shows a black rectangle. Looks like a bug in rsvg.

psfrag should work with Inkscape and my patch if you keep the tag name short enough that it does not get split and avoid the "fi" ligature, i.e. use names like "tag1", "tag2", "tag3".
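
A small Python sketch of a check along these lines. The length limit of 8 is an arbitrary placeholder (no exact limit is known here), and the ligature list beyond "fi" is an assumption about what a font might substitute; `tag_problems` is a hypothetical helper.

```python
# Letter sequences that fonts commonly replace with a single ligature glyph.
# Only "fi" is confirmed as a problem above; the others are assumptions.
LIGATURES = ("ffi", "ffl", "ff", "fi", "fl")


def tag_problems(tag, max_length=8):
    """Return a list of reasons why `tag` may not survive export intact."""
    problems = []
    if len(tag) > max_length:
        problems.append("longer than %d characters, may be split" % max_length)
    for ligature in LIGATURES:
        if ligature in tag:
            problems.append("contains possible ligature '%s'" % ligature)
            break  # report the longest matching sequence only
    return problems


for candidate in ("tag1", "field", "Measured field f(t)"):
    print(candidate, tag_problems(candidate))
```

Tags like "tag1" pass both checks, while "field" is flagged because of the "fi" sequence.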
Comment 13 Adrian Johnson 2010-10-01 04:23:55 UTC
Fixed in master.
Comment 14 Adrian Johnson 2010-11-22 04:54:18 UTC
*** Bug 31834 has been marked as a duplicate of this bug. ***
