Bug 31633

Summary: Split long paragraphs inside flat ODT files (`.fodt`)
Product: LibreOffice
Component: Libreoffice
Status: CLOSED FIXED
Severity: enhancement
Priority: medium
Version: unspecified
Hardware: All
OS: All
Whiteboard:
Reporter: Gioele Barabucci <gioele>
Assignee: Not Assigned <libreoffice-bugs>
QA Contact:
CC: gautier.sophie, thb
Attachments: XSLT templates to split lines inside `<text:p>` (based on 'odfflatxmlexport.xsl')
XSLT templates to split lines inside `<text:p>` and `<text:span>`

Description Gioele Barabucci 2010-11-15 04:42:04 UTC
Flat ODT files usually contain very long paragraphs. As each `<text:p>` is put on its own line, these `.fodt` files end up containing very long lines. Long lines in XML files cause problems when using flat ODT files in text-oriented SCM systems like git or when opening these files in XML editors.

A very simple yet XML-compliant way to shorten these lines would be to output, during XML serialization, a newline instead of a space every N spaces (only inside paragraphs, if you want to be conservative). Files created using this method are, from an XML point of view, equivalent to the currently created `.fodt` files. Their file size is also identical.
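
As a rough illustration of the idea above (not the attached XSLT), here is a minimal Python sketch that replaces every N-th space in a paragraph's text with a newline. The function name `split_every_n_spaces` and the default `n=10` are made up for this example. The equivalence claim assumes that ODF consumers treat a newline inside paragraph text the same way as a space.

```python
def split_every_n_spaces(text: str, n: int = 10) -> str:
    """Replace every n-th space in `text` with a newline.

    Hypothetical helper for illustration only; the real change was
    implemented as XSLT templates in the export filter.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out = []
    seen = 0  # spaces seen since the last inserted newline
    for ch in text:
        if ch == " ":
            seen += 1
            if seen == n:
                # Swap this space for a newline: one character in,
                # one character out, so the length does not change.
                out.append("\n")
                seen = 0
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    paragraph = "one two three four five six seven"
    print(split_every_n_spaces(paragraph, n=3))
```

Because each replacement is one character for one character, the output has the same length as the input, which matches the file-size remark above.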
Comment 1 Gioele Barabucci 2010-11-15 05:12:34 UTC
Created attachment 40283 [details]
XSLT templates to split lines inside `<text:p>` (based on 'odfflatxmlexport.xsl')

This XSLT stylesheet adds a template for `<text:p>` elements; it replaces long lines in the text nodes with shorter lines.

This is an XSLT 1.0 stylesheet. It could be converted into a much more concise XSLT 2.0 stylesheet, but I preferred to keep using XSLT 1.0 as that is the version that 'odfflatxmlexport.xsl' originally used.
Comment 2 Thorsten Behrens 2010-11-17 09:26:58 UTC
Wow, very nice idea! I replaced filter/source/odfflatxml/odfflatxmlexport.xsl with it, and it works great. Two things: we'd ask you to license this under LGPLv3+ / MPL 1.1 (http://www.freedesktop.org/wiki/Software/LibreOffice/LicenseHeader). And maybe we could have a maximum line length (the usual ~70 chars come to mind) instead of a maximum number of spaces per paragraph?
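
To make the line-length variant concrete, here is a small Python sketch that breaks at the last space at or before a target width instead of counting spaces. The name `wrap_at_spaces` and the exact break rule are assumptions for illustration, not what the attached XSLT does. A word longer than the width is left intact and the break happens at the next space.

```python
def wrap_at_spaces(text: str, width: int = 70) -> str:
    """Turn selected spaces into newlines so lines stay within `width`.

    Hypothetical helper for illustration only. Only spaces are replaced,
    one for one, so the text length never changes.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    chars = list(text)
    line_start = 0   # index of the first character on the current line
    last_space = -1  # index of the most recent space on the current line
    for i, ch in enumerate(chars):
        if ch == " ":
            last_space = i
        # The current line has grown past `width`: break at the last
        # space seen on this line, if there is one.
        if i - line_start >= width and last_space >= line_start:
            chars[last_space] = "\n"
            line_start = last_space + 1
            last_space = -1
    return "".join(chars)


if __name__ == "__main__":
    sample = "word " * 40
    print(wrap_at_spaces(sample.strip(), width=70))
```

This avoids a general-purpose wrapper on purpose: the sketch never adds, removes, or collapses characters, so a diff tool sees only spaces turned into newlines.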
Comment 3 Gioele Barabucci 2010-11-18 02:36:22 UTC
Created attachment 40365 [details] [review]
XSLT templates to split lines inside `<text:p>` and `<text:span>`

New patch in git-format-patch form.

The new separate XSLT template file contains an ISC license header.
Comment 4 Gioele Barabucci 2010-11-18 02:38:45 UTC
Happy to see you liked my modifications. I added a similar template that also splits the text inside `<text:span>` elements.

License: if possible, I would like to release it under the permissive ISC license (I modified the suggested header to fit the ISC license). If that is a problem, I do not mind releasing it under the usual LibreOffice license.

Splitting around the 70th character: it is doable, but I would prefer to see this file integrated in the git repo first, and then make additional modifications on top of that.
Comment 5 Michael Meeks 2010-11-25 04:49:05 UTC
Thanks Gioele - I updated the license block there to LGPLv3+/MPL (but I notice the original didn't even have a license block) - perhaps it's better just to have none ?

I've committed it anyhow - many thanks - it should make the output much prettier and more readable :-)

> Splitting around the 70th character: it is doable, but I would prefer to see
> this file integrated in the git repo first, and then make additional
> modifications on top of that.

Thanks again ! really good to have you working on this sort of polish :-)
Comment 6 sophie 2011-01-13 08:35:09 UTC
Closing - Sophie
Comment 7 Björn Michaelsen 2011-12-22 05:36:08 UTC
Remove infoprovider from closed and resolved bugs.
Comment 8 Björn Michaelsen 2011-12-22 05:52:33 UTC
RESOLVED, FIXED or CLOSED bugs can't be KEYWORD NEEDINFO.
