Created attachment 39271 [details] [review] proposed patch pdftops sets the bounding box but does not crop anything outside the bounding box. I have a pdf with a block of descriptive text outside the crop box, and the text gets into the eps (i.e. is not clipped out), whether or not I specify -nocrop. With pdftops from xpdf-3.02pl4, the text gets in only if I specify -nocrop. The pdf has a crop box that is smaller than the media box, and the text is in the space above the crop box but inside the media box. It looks like Page::makeBox() sets *crop to gFALSE if useMediaBox is false. pdftops calls doc->displayPages() with noCrop for the useMediaBox (because cropping should use the crop box, so noCrop should use the media box). Since noCrop defaults to gFalse (which means that pdftops crops by default), useMediaBox is false, and makeBox() sets *crop = gFalse. xpdf-3.02 has the same assignment of *crop = gFalse in Page.cc, but it stores the crop flag in GlobalParams (poppler does not), and it has a separate pageCrop flag (poppler reuses noCrop for pageCrop). --- poppler-30sep10/poppler/Page.cc- 2010-09-30 22:48:46.463386325 +0200 +++ poppler-30sep10/poppler/Page.cc 2010-10-08 00:02:27.034792873 +0200 @@ -678,7 +678,7 @@ *box = *mediaBox; } else { *box = *cropBox; - *crop = gFalse; + /* *crop = gFalse; */ } }
Created attachment 39274 [details] pdf file with a block of text outside the crop box run pdftops -eps pdfad.pdf then view pdfad.eps with a larger bounding box, either with options in the ps viewer or by increasing the height in the bounding box comments. The block of text above the image shows with poppler pdftops, but it should be cropped out by default, and it should show only if you run pdftops -eps -nocrop pdfad.pdf This causes problems in applications like lout http://savannah.nongnu.org/projects/lout that embed eps files without clipping them.
I don't see how that patch can make any difference since Page::makeBox is used in PSOutputDev.cc:3006 and the crop variable there is not used anymore after the call to makeBox, so what is not changing it going to do?
The patch to Page::makeBox() works. You can test it with the PDF attachment that I sent along with the patch. PSOutputDev::checkPageSlice() is not the only place that calls Page::makeBox(). Page::createGfx() also calls Page::makeBox() and then calls new Gfx() with "crop ? cropBox : (PDFRectangle *)NULL". The alternate patch below to Page::createGfx() also works for that PDF. --- poppler-30sep10-alt/poppler/Page.cc- 2010-09-30 22:48:46.463386325 +0200 +++ poppler-30sep10-alt/poppler/Page.cc 2010-10-13 19:29:46.109048269 +0200 @@ -427,8 +427,9 @@ rotate += 360; } + GBool tempCrop = crop; makeBox(hDPI, vDPI, rotate, useMediaBox, out->upsideDown(), - sliceX, sliceY, sliceW, sliceH, &box, &crop); + sliceX, sliceY, sliceW, sliceH, &box, &tempCrop); cropBox = getCropBox(); mediaBox = getMediaBox();
Created attachment 39416 [details] [review] alternate patch
I know there are more places. The code in Page::createGfx pases crop by pointer on purpose so it can get upadted, your code simply breaks that, it might gix some issue with printing to PS but i'm more than certain that it will break splash/cairo backends. The code is logically correct, if Page::makeBox goes by the last if, you don't need to crop since you are already using the cropbox as box for your rendering.
Yes, I realize that the patch might break something. I was just experimenting to track down the location of the problem. >if Page::makeBox goes by the last if, you don't need to crop since you are already using the cropbox as box for your rendering. By inspection of the generated EPS, it is more complicated. pdftops always writes the correct BoundingBox comment, but it writes the postscript code to create a clipping region only if "crop" is true. If you view the EPS and the viewer clips the display window to the BoundingBox, everything is OK, but if you print the EPS, the postscript clipping code is necessary or else things might show that should have been clipped out. This issue is a regression from the behavior of pdftops in xpdf. William
Downloaded xpdf-3.02 and the ps it generates crashes gs. Looked at the code and didn't see any noticeable difference (yes they use two flags and a they store one of them in globalparams, nothing that should make the ps writing code different from what i can see)
Thanks for testing with xpdf. I'm not sure why you had problems, because it works for me. I am using xpdf-3.02pl4, which I think is the latest release. I think that earlier versions work the same as this one. I am using gs9.00. I am running Fedora 13, and the distributed gs8.71 can also view the eps. I can send the eps files or gv screen captures or other information if you want. Here is what happens if I run normally and with -nocrop and then compare the results. "re" makes a rectangular path from four numbers. "/W" clips and then starts a new path. $ /usr/local/bin/pdftops-3.02pl4 -v pdftops version 3.02 Copyright 1996-2007 Glyph & Cog, LLC $ /usr/local/bin/pdftops-3.02pl4 -eps 20185846-cropbug.pdf crop.eps $ /usr/local/bin/pdftops-3.02pl4 -eps -nocrop 20185846-cropbug.pdf nocrop.eps $ ls -l crop.eps nocrop.eps -rw-rw-rw- 1 william william 1254692 Oct 13 22:40 crop.eps -rw-rw-rw- 1 william william 1254653 Oct 13 22:40 nocrop.eps $ diff -U 10 crop.eps nocrop.eps --- crop.eps 2010-10-13 22:40:14.022258605 +0200 +++ nocrop.eps 2010-10-13 22:40:18.782008510 +0200 @@ -412,22 +412,20 @@ 0 J 10 M 1 w /DeviceGray {} cs [0] sc /DeviceGray {} CS [0] SC false op false OP {} settransfer -7.8891 16.1876 415.7905 1009.8124 re -W q q 1 i 9.072 17.001 413.856 1008 re W /DeviceCMYK {} cs [0 0 0 0.2] sc false OP false op {} settransfer $ If you have gs9.00, try the patch --- gs9.00/base/gxccman.c- 2010-08-10 12:20:19.000000000 -0400 +++ gs9.00/base/gxccman.c 2010-09-28 15:50:14.053145974 -0400 @@ -609,7 +609,7 @@ gs_make_mem_mono_device(pdev, pdev->memory, target); rc_decrement_only(target, "gx_alloc_char_bits"); /* can't go to 0 */ /* Decrement the ICC profile also. Same device is getting reinitialized */ - rc_decrement(target->device_icc_profile,"gx_alloc_char_bits(icc profile)"); + if (target != NULL) rc_decrement(target->device_icc_profile,"gx_alloc_char_bits(icc profile)"); pdev->rc = rc; pdev->retained = retained; pdev->width = iwidth; If you have gs8.71, try the patch --- gs8.71/lib/pdf2dsc.ps- 2008-02-25 06:48:45.000000000 +0100 +++ gs8.71/lib/pdf2dsc.ps 2010-02-19 22:14:55.000000000 +0100 @@ -116,7 +116,7 @@ DSCfile PDFname write==only ( \(r\) file { DELAYSAFER { .setsafe } if } stopped pop\n) puts ( pdfopen begin\n) puts - ( copy_trailer_attrs\n) puts + ( process_trailer_attrs\n) puts (%%EndSetup\n) puts /.hasPageLabels false def % see "Page Labels" in the PDF Reference If you have gs8.70, try the patch --- gs8.70/Resource/Init/pdf_draw.ps- 2009-07-08 20:28:02.000000000 -0400 +++ gs8.70/Resource/Init/pdf_draw.ps 2009-08-17 17:42:54.000000000 -0400 @@ -262,7 +262,7 @@ } bdef /resolvehalftone { % <dict> resolvehalftone <halftone> - dup /HalftoneType get + dup /HalftoneType oget dup //htrdict exch .knownget { exch pop exec } { If gs crashes with a floating point exception, try the patch below (I can give you versions for other versions of gs). --- gs9.00/base/gsmisc.c- 2010-05-06 12:04:27.000000000 -0400 +++ gs9.00/base/gsmisc.c 2010-09-28 14:17:42.677102939 -0400 @@ -883,17 +883,25 @@ */ static double const_90_degrees = 90.; +/* const_90_degrees no longer works for GCC 4.1.2. + * As old fortran programmers know, to avoid rounding problems, + * real numbers must be compared as "fabs(x-y) < eps" + * and not for equality like "floor(x) == y". + */ + +#define is_integer(x) (fabs(floor((x) + 0.5) - (x)) <= 1.0e-15) + double gs_sin_degrees(double ang) { double quot = ang / const_90_degrees; - if (floor(quot) == quot) { + if (is_integer(quot)) { /* * We need 4.0, rather than 4, here because of non-ANSI compilers. * The & 3 is because quot might be negative. */ - return isincos[(int)fmod(quot, 4.0) & 3]; + return isincos[(int)fmod(floor(quot+0.5), 4.0) & 3]; } return sin(ang * (M_PI / 180)); } @@ -903,9 +911,9 @@ { double quot = ang / const_90_degrees; - if (floor(quot) == quot) { + if (is_integer(quot)) { /* See above re the following line. */ - return isincos[((int)fmod(quot, 4.0) & 3) + 1]; + return isincos[((int)fmod(floor(quot+0.5), 4.0) & 3) + 1]; } return cos(ang * (M_PI / 180)); } @@ -915,9 +923,9 @@ { double quot = ang / const_90_degrees; - if (floor(quot) == quot) { + if (is_integer(quot)) { /* See above re the following line. */ - int quads = (int)fmod(quot, 4.0) & 3; + int quads = (int)fmod(floor(quot+0.5), 4.0) & 3; psincos->sin = isincos[quads]; psincos->cos = isincos[quads + 1];
The last post is from a couple of years ago, but the problem seems to persist, at least in Libre-Office (https://bugs.freedesktop.org/show_bug.cgi?id=40163). Some recent "playing around" and experiments for workarounds can be seen in this LibreOfficeForum thread: http://libreofficeforum.org/node/4183 Is there any work on solving this issue, or is planned for an incoming version? Thanks, dror
When I build poppler pdftops for my own use, I continue to apply the patch that I suggested in comment 3. Can you check if that fixes the problem in LibreOffice?
Is this patch still needed? Can you push a rebased (i guess this one no longer applies) version?
Created attachment 70289 [details] [review] patch against the 19 Nov 12 git source I made a version of the patch against 19 Nov 12 source. The patch still works for me with the PDF that I attached previously. createGfx() calls makeBox() and then Gfx(). makeBox() sometimes sets crop to FALSE, and then createGfx() does not pass a crop box to Gfx(). The patch uses the original value of crop in the call to Gfx(). Some files, like the one that I attached, draw outside the crop box, so even if the PDF sets a crop box of a given size, you still need to generate the postscript commands to crop to that size. William
(In reply to comment #12) > Created attachment 70289 [details] [review] [review] > patch against the 19 Nov 12 git source > > I made a version of the patch against 19 Nov 12 source. The patch still > works for me with the PDF that I attached previously. > createGfx() calls makeBox() and then Gfx(). > makeBox() sometimes sets crop to FALSE, and then createGfx() does not pass a > crop box to Gfx(). > The patch uses the original value of crop in the call to Gfx(). > Some files, like the one that I attached, draw outside the crop box, so even > if the PDF sets a crop box of a given size, you still need to generate the > postscript commands to crop to that size. Right, this still does not make sense, you are creating a tempvalue so that the returned value for makeBox is not used, if that is what you really want, why just not remove that parameter? On the other hand if this is about the PS conversion i do think that it is the PSOutputDev that should be modified and for example cropping itself on the box passed in the state of startPage ?
>if that is what you really want, why just not remove that parameter? There are several calls to makeBox(). One of the calls to makeBox() in PSOutputDev::checkPageSlice() uses the crop parameter. >i do think that it is the PSOutputDev that should be modified I think that Page::createGfx() is the right place because it calls makeBox() (which returns both the box and the crop flag) and it makes the crop decision when it sets arguments to new Gfx(). The further that you separate the place that has the information from the place that makes the decision, the messier it gets. Wouldn't doing it anywhere else require splitting Page::createGfx() into two parts or making more internals of Page or Gfx public? William
(In reply to comment #14) > >if that is what you really want, why just not remove that parameter? > > There are several calls to makeBox(). One of the calls to makeBox() in > PSOutputDev::checkPageSlice() uses the crop parameter. Ah, right, didn't see them > > >i do think that it is the PSOutputDev that should be modified > > I think that Page::createGfx() is the right place because it calls makeBox() > (which returns both the box and the crop flag) and it makes the crop > decision when it sets arguments to new Gfx(). The further that you separate > the place that has the information from the place that makes the decision, > the messier it gets. > > Wouldn't doing it anywhere else require splitting Page::createGfx() into two > parts or making more internals of Page or Gfx public? Why? The issue as I see it is that you give the PSOutputDev a box, and it allows commands outside that box to be rendered, that is wrong and shall be fixed at the PSOutputDev level, not somewhere else.
>The issue as I see it is that you give the PSOutputDev a box, and it allows commands outside that box to be rendered, that is wrong and shall be fixed at the PSOutputDev level, not somewhere else. With the patch, pdftops crops by generating a single crop box in postscript at the beginning of the page. I think that xpdf (which works) does that also. Checking the crop box inside poppler before rendering each object could be expensive, and it could be tricky with text that was part inside and part outside. In theory, a pdf should not attempt to draw much outside the crop box other than a small amount of descriptive information (as in the example pdf that I attached) or registration marks for printing. The example pdf has an image of an ad that was intended to be placed on a newspaper page. The crop box marks the part of the image that should appear on the page. The objects outside the crop box are for proofing the image as a stand-alone item. I have the same problem as you that I have very little time. If you can tell me exactly where poppler should make the crop test, I can try it. The patch has been working for me for years, and one of the libreoffice people said that it worked for them also, and it is only a very small change. Isn't a bigger change more likely risk side-effects that could take a while to discover?
(In reply to comment #16) > >The issue as I see it is that you give the PSOutputDev a box, and it allows commands outside that box to be rendered, that is wrong and shall be fixed at the PSOutputDev level, not somewhere else. > > With the patch, pdftops crops by generating a single crop box in postscript > at the beginning of the page. I think that xpdf (which works) does that > also. Checking the crop box inside poppler before rendering each object > could be expensive, and it could be tricky with text that was part inside > and part outside. I never suggested checking the for each object. I suggested that PSOutputDev::startPage generates the cropbox with the given box in the state > > In theory, a pdf should not attempt to draw much outside the crop box other > than a small amount of descriptive information (as in the example pdf that I > attached) or registration marks for printing. The example pdf has an image > of an ad that was intended to be placed on a newspaper page. The crop box > marks the part of the image that should appear on the page. The objects > outside the crop box are for proofing the image as a stand-alone item. Yes I know what a crop box is for. > I have the same problem as you that I have very little time. If you can > tell me exactly where poppler should make the crop test, I can try it. The > patch has been working for me for years, and one of the libreoffice people > said that it worked for them also, and it is only a very small change. > Isn't a bigger change more likely risk side-effects that could take a while > to discover? Because to me the patch looks like a hack, yes, hacks work, but they are not proper fixes.
>I suggested that PSOutputDev::startPage generates the cropbox with the given box in the state I think that with the patch, the postscript clip is generated by calls in Gfx::Gfx() when Page::createGfx() calls new Gfx(). I agree that my patch is hacky. So I understand, you want to keep Gfx::Gfx() and have PSOutputDev::startPage() generate the clip line? Can startPage() use page->getCropBox() ? That would save passing it. Gfx::Gfx() calls out->startPage(pageNum, state); out->setDefaultCTM(state->getCTM()); before calling out->clip(state); so that the CTM is initialized before generating the clip. If PSOutputDev::startPage() sets up the clip, it would be generated before the CTM is initialized. Is that a problem? Since Page::createGfx() is generic to all output, while PSOutputDev::startPage() is specific to postscript, would moving the patch from createGfx() to startPage() mean other devices that wanted to crop to the cropbox would need to implement it in their own startPage()> William
(In reply to comment #18) > >I suggested that PSOutputDev::startPage generates the cropbox with the given box in the state > > I think that with the patch, the postscript clip is generated by calls in > Gfx::Gfx() when Page::createGfx() calls new Gfx(). Yes > I agree that my patch is hacky. So I understand, you want to keep > Gfx::Gfx() and have PSOutputDev::startPage() generate the clip line? Yes, well not startPage specifically but something in PSOutputDev and startPage seemed like a good place with a quick look > Can startPage() use page->getCropBox() ? That would save passing it. No, but you can use the pageBox of the state (ok, now i see that the GfxState takes the pageBox but doesn't really store it, but i wouldn't mind it getting stored if we need to) > > Gfx::Gfx() calls > out->startPage(pageNum, state); > out->setDefaultCTM(state->getCTM()); > before calling > out->clip(state); > so that the CTM is initialized before generating the clip. > > If PSOutputDev::startPage() sets up the clip, it would be generated before > the CTM is initialized. Is that a problem? Hmmm, that might be a problem yes, it is weird that this is done in two steps :-/ > Since Page::createGfx() is generic to all output, while > PSOutputDev::startPage() is specific to postscript, would moving the patch > from createGfx() to startPage() mean other devices that wanted to crop to > the cropbox would need to implement it in their own startPage()> No, other devices that crop to the cropbox already work, note that the situation you are mentioning is not cropping to the cropbox but using the cropbox as box for your rendering (the only way for Page::makeBox to set crop to false is when you say you don't want to use the mediabox) so effectively you don't need to crop since your box is already the cropbox, i understand you say that drawing outside the box when using the cropbox as box creates wrong renderings, and i think this is a bug in the PSOutputDev and that's where I'd prefer the fix to be located there. This is the situation we want to fix, right? Now another solution would be adding a virtual doINeedAClipAlways to OutputDev and make PSOutputDev say true and hook this up in the clip calls of Gfx::Gfx
Created attachment 70901 [details] [review] Alternate patch against the 1 Dec 12 git source that passes a crop flag This patch adds virtual GBool needClipToCropBox() to OutputDev and PSOutputDev. It returns gFalse in OutputDev and the value of a class variable in PSOutputDev. pdftops sets the flag when it creates a new PSOutputDev. Page::createGfx() checks out->needClipToCropBox(). This changes the signature of PSOutputDev(). If that is bad, I can make a function to set the flag. William
This is not really what i had in mind, but before i propose what i had in mind, can you explain why the "doEPS && !noCrop"? I thought we needed this all the time (i.e the initial patch doesn't seem to do anything with doEPS)
When pdftops makes a PS instead of an EPS, it already crops. The problem with not cropping happens only when it is making an EPS, so I set the force crop flag with doEPS && !noCrop. Page::makeBox() sets *crop = gFalse (which is the core of the problem) only when using the cropbox instead of the mediabox. William
Interesting, so this is only about EPS? Because i see in PSOutputDev::writeHeader case psModeEPS: the cropbox is being used to set the BoundingBox already Also i've run pdftops -eps file_in_this_bug.pdf output.eps and the result looks fine (to me) when opening such eps in gimp for example Could you attach an image of gimp opening the output of the and pointing to where the problem lies?
Created attachment 71362 [details] screen capture to show the problem The initial PDF that I attached to this bug is sufficient to reproduce the problem. After running pdftops, you need to view the EPS with a program that is capable of showing all of the image and not only what is inside the area specified in the %%BoundingBox comment. In the screen capture, I used the unpatched pdftops 0.18.4 distributed with Fedora 17. I made both a PS and an EPS and then viewed them with "gv". gimp might not be a good test, because as far as I know, it does not natively support PDF, PS or EPS, and so it must rasterize them, and it probably rasterizes them to the size in the %%BoundingBox comment, which means that you will never see anything outside the crop box. gv allows you to override the %%BoundingBox comment. I selected a large size that I used for previewing broadsheet newspaper pages. pdftops crops PS. If you try this with my test PDF and then compare the PS and EPS files, the PS has the additional code 0.5 55 translate 7 16 417 1010 re W immediately before the %%EndPageSetup comment. My guess is that someone thought that it was not necessary to crop the EPS because application embedding the EPS on a page would always crop it to the size in the %%BoundingBox. William
Created attachment 71452 [details] [review] Reduced version of your patch Here's a reduced version of your patch, what do you think?
Thanks, I tested your reduced patch, and it works fine for me.
Nice, commited!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.