Created attachment 128605
The pdf document used in the example output.

When I separate a pdf into pages and then unite it again, fonts are duplicated.

The attached document a.pdf uses a regular font in page 1, and in page 2 it uses the same regular font as well as a bold font. After separating and uniting, the united file contains two copies of the regular font. It would be nice if the tools removed the identical duplicated fonts.

Details of an example are included below. Note that in the output to pdffonts, I removed the following columns which were common to all fonts:

type              encoding         emb sub uni
----------------- ---------------- --- --- ---
Type 1            Custom           yes yes no

File sizes and font information:

$ wc -c a.pdf; pdffonts a.pdf
14244 a.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0
EBGCWF+NimbusRomNo9L-Medi            10  0

$ pdfseparate a.pdf b%d.pdf

$ wc -c b1.pdf; pdffonts b1.pdf
8404 b1.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0

$ wc -c b2.pdf; pdffonts b2.pdf
15120 b2.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0
EBGCWF+NimbusRomNo9L-Medi            10  0

$ pdfunite b1.pdf b2.pdf c.pdf

$ wc -c c.pdf; pdffonts c.pdf
22916 c.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0
ULOTVD+NimbusRomNo9L-Regu            23  0
EBGCWF+NimbusRomNo9L-Medi            29  0

$ pdfseparate c.pdf d%d.pdf

$ wc -c d1.pdf; pdffonts d1.pdf
8061 d1.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0

$ wc -c d2.pdf; pdffonts d2.pdf
14778 d2.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu            23  0
EBGCWF+NimbusRomNo9L-Medi            29  0

$ pdfunite d1.pdf d2.pdf e.pdf

$ wc -c e.pdf; pdffonts e.pdf
23296 e.pdf
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0
ULOTVD+NimbusRomNo9L-Regu            42  0
EBGCWF+NimbusRomNo9L-Medi            48  0
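The duplication in the listings above can also be spotted mechanically. The following is a minimal Python sketch, assuming the reduced two-column listing used in this report (font name, then object number and generation); unmodified pdffonts output has more columns and would need a different parser. The function name find_duplicate_fonts is hypothetical, and a repeated name is only a hint that two font objects might be identical, not proof.

```python
from collections import defaultdict


def find_duplicate_fonts(listing: str) -> dict[str, list[str]]:
    """Map each font name that occurs more than once to its object IDs."""
    objects_by_name = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        # Expect "<name> <object number> <generation>".
        # The header line and the dashed rule do not match and are skipped.
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        name, obj, gen = parts
        objects_by_name[name].append(f"{obj} {gen}")
    return {name: ids for name, ids in objects_by_name.items() if len(ids) > 1}


# Listing for c.pdf, copied from the report (reduced columns).
C_PDF = """\
name                                 object ID
------------------------------------ ---------
ULOTVD+NimbusRomNo9L-Regu             4  0
ULOTVD+NimbusRomNo9L-Regu            23  0
EBGCWF+NimbusRomNo9L-Medi            29  0
"""

print(find_duplicate_fonts(C_PDF))
# {'ULOTVD+NimbusRomNo9L-Regu': ['4 0', '23 0']}
```

Run against the listing for a.pdf, the same function returns an empty dict, since each name appears only once there.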
You know that the fonts are identical! But if you provide me with a good algorithm which compares two embedded fonts and returns whether they are identical or not, i.e.

- the encoding is the same
- all glyphs needed for both usages of the font are included
- all glyphs have the same character width and height and look the same

and convince me that your algorithm works in all use cases, I will think about providing a patch for pdfunite.
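One conservative way to approach the criteria above is to merge two fonts only when their embedded data is exactly equal, which sidesteps glyph-by-glyph comparison: equal font programs contain the same glyphs with the same shapes and metrics. The Python sketch below illustrates that idea only. It is not poppler code; EmbeddedFont, merge_key and deduplicate are hypothetical names, and the fields stand in for data a PDF tool would have to extract itself (the decoded font file stream, the encoding differences, and the widths array). It never merges fonts that merely look alike or that are different subsets of the same face, and whether the two regular fonts in this report are byte-identical has not been checked here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedFont:
    """Hypothetical stand-in for the data extracted from one PDF font object."""
    program: bytes   # decoded embedded font file stream
    encoding: tuple  # (character code, glyph name) pairs
    widths: tuple    # (character code, width) pairs


def merge_key(font: EmbeddedFont) -> tuple:
    """Key that is equal only for fonts with identical program, encoding and widths."""
    return (
        hashlib.sha256(font.program).hexdigest(),
        tuple(sorted(font.encoding)),
        tuple(sorted(font.widths)),
    )


def deduplicate(fonts: list) -> list:
    """For each font, return the index of the first font with the same key."""
    first_seen = {}
    return [first_seen.setdefault(merge_key(f), i) for i, f in enumerate(fonts)]


# Made-up data: two exact copies of a regular font and one bold font.
regular_1 = EmbeddedFont(b"regular-bytes", ((65, "A"),), ((65, 722),))
regular_2 = EmbeddedFont(b"regular-bytes", ((65, "A"),), ((65, 722),))
bold = EmbeddedFont(b"bold-bytes", ((65, "A"),), ((65, 722),))

print(deduplicate([regular_1, regular_2, bold]))
# [0, 0, 2]
```

The second entry pointing back to index 0 means the second regular font could reuse the first font object. Any difference in the program bytes, the encoding, or the widths produces a different key, so the sketch errs on the side of keeping duplicates rather than merging fonts that differ.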
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/76.