Bug 87764 - EDITING : Impossible opening *.doc with OCR images inside.
Summary: EDITING : Impossible opening *.doc with OCR images inside.
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.4.1 release
Hardware: Other All
Importance: high major
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard: bibisected
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2014-12-27 09:55 UTC by FMJ Vezelay
Modified: 2014-12-28 03:26 UTC
CC List: 6 users



Attachments
Text file made with scan / OCR process, .doc. (567.00 KB, application/msword)
2014-12-27 09:55 UTC, FMJ Vezelay

Description FMJ Vezelay 2014-12-27 09:55:58 UTC
Created attachment 111394 [details]
Text file made with scan / OCR process, .doc.

Hello,
I am trying to open the attached .doc file, which contains OCR extracts; it may have been created with Microsoft Word. LibreOffice generates many pages with thin columns, and then it freezes!
Apache OpenOffice manages to open it quite correctly, so why can't LibreOffice?
Comment 1 Robinson Tryon (qubit) 2014-12-27 14:49:04 UTC
Comment on attachment 111394 [details]
Text file made with scan / OCR process, .doc.

fix mimetype
Comment 2 Robinson Tryon (qubit) 2014-12-27 15:00:49 UTC
(In reply to FMJ Vezelay from comment #0)
> I am trying to open the .doc (attachment 111394 [details]), which contains OCR extracts;

Working from an OCR source can always be challenging...

> it may have been created
> with Microsoft Word. LibreOffice generates many pages with thin columns, and
> then it freezes!

TESTING on Ubuntu 14.04:

In LO 4.4.0.1, I see a large number of pages (~180), and lots of content in thin columns.
In LO 3.5.7.2, I see ~109 pages, many with content in 2 columns in landscape mode.

In both, the document definitely makes LibreOffice run slowly. The layout in 3.5 looks A LOT better, so I'm going to tag this as a regression.

Keywords -> regression
Whiteboard -> bibisectRequest
Status -> NEW
Comment 3 Jay Philips 2014-12-27 18:52:28 UTC
The thin-column problem first appeared in 4.3; 4.2.6 is fine, and it has 92 pages when you open page preview after repagination.

Version: 4.2.6.2
Build ID: 185f2ce4dcc34af9bd97dec29e6d42c39557298f
Comment 4 Robinson Tryon (qubit) 2014-12-27 19:21:46 UTC
(In reply to Jay Philips from comment #3)
> The thin-column problem first appeared in 4.3; 4.2.6 is fine, and it has 92 pages
> when you open page preview after repagination.

I just tested LO 4.3.2.2, and it has the same problem with 4 skinny columns. We're narrowing down the problem, and I'm cc'ing the (bi)bisect maestro here to figure out what went wrong. If we work quickly, we can take this bug from opening -> commit identified in about one day ;-)
Comment 5 Matthew Francis 2014-12-28 02:26:37 UTC
Bibisect results from 43all and 44:
Over the course of its history this has been broken, fixed, and then broken again, as summarised below:


43all: Broken at
[8a2068ec09e531c6943ef0f090bd02a1cab565b7] source-hash-5218c0d6a8171400bee0d972ff05757849df4d19

43all: Fixed at
[251dbe932a666e83c91816fcf755a4c3be51e078] source-hash-fff4d120866a0be3cd8185f2c67bb9f59b1a6a3f

44: Broken at
[626531d9052fe067359170d41bd943b59766b551] source-hash-3d3401a6397e893808309ec374f5d8f890144906
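The broken/fixed/broken pattern matters when bisecting, because a binary search assumes a single good-to-bad transition inside the range it searches. Below is a minimal Python sketch of locating one such transition. The build names, the `bad` set, and the `first_bad` helper are all hypothetical and only illustrate the idea; they are not part of the bibisect repositories or of git.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and there is exactly
    one good-to-bad transition in between.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Hypothetical history: good, then broken, then fixed, then broken again.
history = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
bad = {"b2", "b3", "b6", "b7"}

# Each range contains a single transition, so each is searched separately.
first_break = first_bad(history[:4], bad.__contains__)   # searches b0..b3
second_break = first_bad(history[4:], bad.__contains__)  # searches b4..b7
```

Searching the whole history in one pass would find only one of the two breakages, which is presumably why the results above are reported per range (43all and 44).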
Comment 6 Matthew Francis 2014-12-28 03:26:48 UTC
The most recent breakage of the attached file seems to have appeared at the commit below.

Adding a Cc: to l.lunak@collabora.com. Could you shed any light on what's going on with this bug? Thanks


commit c5ed52b1cd6f22787c94bec035ceecf9e1da3271
Author: Luboš Luňák <l.lunak@collabora.com>
Date:   Mon Jul 21 10:56:52 2014 +0200

    ww8import create a pagedesc if continuous section changes margins (bnc#875383)
    
    This is similar to what writerfilter does. MSWord can have one page with several
    different margins, which are saved using continuous sections, which causes all
    kinds of trouble, because either we treat them as Writer sections, which means
    we lose some of the data, or we treat them as Writer page styles, which causes
    spurious page breaks if in the wrong place. Either option has its problems, but
    here it seems slightly better to go for keeping the data and hoping the page
    break will be in a place where a break will be anyway.
    
    Change-Id: I8f52aa820750da6788ea04180a15ac334f6bf87b
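
The trade-off described in the commit message can be sketched as a small decision function. This is a simplified Python model, not the actual ww8import code: the function name, the margin tuples, and the "no margin change means a Writer section" fallback are assumptions made for illustration only.

```python
from typing import Tuple

Margins = Tuple[int, int]  # hypothetical (left, right) margins, arbitrary units


def import_continuous_section(prev: Margins, cur: Margins) -> str:
    """Choose a Writer construct for a continuous MSWord section.

    Simplified model of the commit message: if the margins change, keep
    the data by creating a page style (page descriptor), accepting that
    this may cause a spurious page break. Otherwise fall back to a
    Writer section (an assumption of this sketch).
    """
    if cur != prev:
        return "page style"
    return "section"


# Hypothetical run of continuous sections within one MSWord page.
margins = [(20, 20), (20, 20), (40, 20), (40, 20), (20, 20)]
choices = [import_continuous_section(a, b) for a, b in zip(margins, margins[1:])]
possible_breaks = choices.count("page style")
```

In this model, every "page style" result is a place where Writer may start a new page even though MSWord kept the content on the same one.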

