Summary: | EDITING : Impossible opening *.doc with OCR images inside. | ||
---|---|---|---|
Product: | LibreOffice | Reporter: | FMJ Vezelay <retraitesvezelay> |
Component: | Writer | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | NEW --- | QA Contact: | |
Severity: | major | ||
Priority: | high | CC: | cno, fdbugs, l.lunak, philipz85, qubit, retraitesvezelay |
Version: | 4.3.4.1 release | Keywords: | bisected, regression |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | bibisected | ||
Attachments: | Text file made with scan / OCR process, .doc. |
Comment on attachment 111394 [details]
Text file made with scan / OCR process, .doc.
fix mimetype
(In reply to FMJ Vezelay from comment #0)
> I try to open the .doc (attachment 111394 [details]), with OCR extracts;

Working from an OCR source can always be challenging...

> it may have been made
> with Microsoft Word. LibreOffice generates many pages, in thin columns, and
> then it blocks!

TESTING on Ubuntu 14.04:
In LO 4.4.0.1, I see a large number of pages (~180), and lots of content in thin columns.
In LO 3.5.7.2, I see ~109 pages, many with content in 2 columns in landscape mode.

In both, the document definitely makes LibreOffice run slowly. The layout in 3.5 looks A LOT better, so I'm going to tag this as a regression.

Keywords -> regression
Whiteboard -> bibisectRequest
Status -> NEW

The thin column problem happened in 4.3, as 4.2.6 is fine and it has 92 pages when you open page preview after repagination.

Version: 4.2.6.2
Build ID: 185f2ce4dcc34af9bd97dec29e6d42c39557298f

(In reply to Jay Philips from comment #3)
> The thin column problem happened in 4.3 as 4.2.6 is fine and it has 92 pages
> when you open page preview after repagination.

I just tested LO 4.3.2.2, and it's got the same 4 skinny columns problem. Narrowing down on the problem, but I'm cc'ing the (bi)bisect maestro here to figure out what went wrong. If we work quickly, we can take this bug from opening -> commit identified in about one day ;-)

Bibisect results from 43all and 44:

In the course of history this has been broken, fixed and then broken again, as summarised below.

43all: Broken at [8a2068ec09e531c6943ef0f090bd02a1cab565b7]
       source-hash-5218c0d6a8171400bee0d972ff05757849df4d19
43all: Fixed at [251dbe932a666e83c91816fcf755a4c3be51e078]
       source-hash-fff4d120866a0be3cd8185f2c67bb9f59b1a6a3f
44: Broken at [626531d9052fe067359170d41bd943b59766b551]
    source-hash-3d3401a6397e893808309ec374f5d8f890144906

The most recent breakage of the attached file seems to have appeared at the commit below.

Adding a Cc: to l.lunak@collabora.com. Could you shed any light on what's going on with this bug?

Thanks

commit c5ed52b1cd6f22787c94bec035ceecf9e1da3271
Author: Luboš Luňák <l.lunak@collabora.com>
Date:   Mon Jul 21 10:56:52 2014 +0200

    ww8import create a pagedesc if continuous section changes margins (bnc#875383)

    This is similar to what writerfilter does. MSWord can have one page
    with several different margins, which are saved using continuous
    sections, which causes all kinds of trouble, because either we treat
    them as Writer sections, which means we lose some of the data, or
    we treat them as Writer page styles, which causes spurious page breaks
    if in the wrong place. Either option has its problems, but here it
    seems slightly better to go for keeping the data and hoping the page
    break will be in a place where a break will be anyway.

    Change-Id: I8f52aa820750da6788ea04180a15ac334f6bf87b
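The trade-off described in the commit message above can be illustrated with a small sketch. This is not the ww8import code: the `Margins` and `Section` types and the `plan_import` function are hypothetical names invented for illustration, and the rule shown is only a simplified reading of the commit message (a continuous section whose margins differ from the previous section gets its own page style, at the risk of a spurious page break; otherwise it stays a Writer section).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Margins:
    # Hypothetical margin record; units are arbitrary for this sketch.
    left: int
    right: int


@dataclass
class Section:
    # True if the section break does NOT start a new page in the source file.
    continuous: bool
    margins: Margins


def plan_import(sections: List[Section]) -> List[str]:
    """Decide, per section, how it might be mapped on import.

    Returns "page_style" (keeps the margins, may force a page break)
    or "writer_section" (no page break, margin data is not kept).
    """
    plan: List[str] = []
    previous: Optional[Section] = None
    for sec in sections:
        if previous is None or not sec.continuous:
            # A first section or a section starting on a new page
            # can safely carry its own page style.
            plan.append("page_style")
        elif sec.margins != previous.margins:
            # Continuous section that changes margins: keep the data
            # and accept a possible spurious page break.
            plan.append("page_style")
        else:
            # Continuous section with unchanged margins: no need
            # for a new page style.
            plan.append("writer_section")
        previous = sec
    return plan
```

Under this reading, a document with many continuous sections that each change margins (as OCR output might plausibly produce) would receive many page styles and therefore many page breaks, which would be consistent with the page counts reported above. That link is an assumption, not something the thread confirms.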
Created attachment 111394 [details]
Text file made with scan / OCR process, .doc.

Hello,
I try to open the attached .doc file, with OCR extracts; it may have been made with Microsoft Word. LibreOffice generates many pages, in thin columns, and then it blocks!
Apache OOo manages to open it quite correctly; why can't LibreOffice?