Created attachment 63817: the RTF file to load in Writer and then export.
Problem description: I'm exporting an RTF document to XHTML. I load the RTF, select export, and select XHTML. The resulting XHTML document contains two copies of the document text. I tried this in Word 2007, and it exported the document correctly.
Steps to reproduce:
1. Load the attached RTF file
2. Export to XHTML
3. See duplicate body sections in the HTML file
Current behavior:
Expected behavior: I expect it to export the text as it is in the RTF.
Platform (if different from the browser):
Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11
On a Debian x86-64 PC, with master sources updated today, I reproduced the problem. If I create an ODT file from the RTF, the text is not duplicated in the ODT. If I then export the ODT to XHTML, the whole text is present twice.
I created a simple RTF file with LO containing just one word without any formatting. I exported it to XHTML, and everything was OK. Something in the attached file seems to trigger the problem. Scott Derik: can you reproduce this problem with other RTF files?
Yes, I have. As I said in the initial posting, Word 2007 doesn't seem to have a problem exporting the file to HTML. I don't know if it's significant or not, but the RTFs I'm converting were created 10-20 years ago. They are part of an archive we are converting to XML (TEI), using XHTML as an interim format in the conversion process. Neither LibreOffice nor MS Office complains when loading them. I have more RTFs that exhibit the problem if needed. Scott
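For checking a batch of exported files for the symptom described above, here is a rough sketch in Python. It is not part of LibreOffice; the class and function names are made up for illustration, and it assumes the exported text ends up in `<p>` elements. It only flags paragraphs whose text appears more than once, so a document that legitimately repeats a paragraph will also be reported.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._depth = 0   # > 0 while inside a <p>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            if self._depth == 1:
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._buf).split())
                if text:  # ignore empty paragraphs
                    self.paragraphs.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def repeated_paragraphs(xhtml_text):
    """Return {paragraph text: count} for paragraphs seen more than once."""
    parser = ParagraphCollector()
    parser.feed(xhtml_text)
    parser.close()
    counts = Counter(parser.paragraphs)
    return {text: n for text, n in counts.items() if n > 1}
```

To use it, read an exported file into a string (for example with `open(path, encoding="utf-8").read()`, where `path` is whatever file you exported) and pass it to `repeated_paragraphs`. A result in which nearly every paragraph has a count of 2 would match the behavior reported here.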
I installed unrtf (included in the Debian repo) to test your file, and it was OK. So I am putting this back to New status.
For anyone interested: the code is in filter/source/xslt/odf2xhtml/export/xhtml/body.xsl, and I think the problem is in the template matching draw:frame at line 889. It calls the template createDrawFrame, which for some reason prints all the following siblings of the frame, wrapped in a div. But after the draw:frame template finishes, processing continues, so the following siblings are processed (and printed) again. IMHO the best course of action is to abandon the crazy idea that XSLT is a suitable tool for processing ODF and rewrite the filter in C++.
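The double processing described above can be illustrated with a toy model in Python. This is not the stylesheet's code and not the actual fix; the element names `frame` and `p`, the `render` function, and the `skip_consumed` flag are all hypothetical. It only shows the pattern: a frame handler that emits its following siblings, followed by a traversal that visits those siblings again.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document content: a frame followed by two paragraphs.
SAMPLE = "<body><frame>F</frame><p>one</p><p>two</p></body>"


def render(parent, skip_consumed):
    """Render the children of `parent` as a list of output fragments.

    The frame branch mimics the behavior described in the comment above:
    it emits the frame plus all of its following siblings inside a div.
    If `skip_consumed` is False, the loop then carries on and emits those
    siblings a second time.
    """
    out = []
    children = list(parent)
    for i, child in enumerate(children):
        if child.tag == "frame":
            rest = [c.text for c in children[i + 1:]]
            out.append("<div>" + " ".join([child.text] + rest) + "</div>")
            if skip_consumed:
                break  # the siblings were already emitted by the frame branch
        else:
            out.append(child.text)
    return out


root = ET.fromstring(SAMPLE)
buggy = render(root, skip_consumed=False)  # siblings appear twice
fixed = render(root, skip_consumed=True)   # siblings appear once
```

In this model, `buggy` contains "one" and "two" both inside the div and again after it, while `fixed` contains them only inside the div. Stopping the traversal is just one way to avoid the repetition; having the frame branch emit only the frame itself would be another.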
XSLT is a great tool for processing XML-formatted content, as long as the content is well formed. If it's not, it can get very confused. I have worked around the problem by using the "save as" option instead of "export" and selecting HTML as the target format. Interestingly, the problem doesn't occur when using "save as"...
(In reply to comment #6)
> XSLT is a great tool for processing xml formatted content, as long as the
> content is well formed.
If it is not well-formed, it is not XML .-) Anyway, that is not what I meant. The problem is the complexity of the ODF format and its impedance mismatch with HTML. XSLT is just not suitable for the heavy processing that is necessary to do the transformation (and no, XSLT 2.0 is not a solution. XSLT 2.0 is another problem.) Any attempt to do it anyway just leads to the WORN (Write Once, Read Never) type of code we have today, where any fix creates two new bugs.
> I have worked around the problem by using the "save as" instead of "export"
> option and selecting html as the target format.
>
> Interesting the problem doesn't occur when using "save as"...
AFAIK "save as" uses the old HTML export code from sw/source/filter/html. There is also the writer2latex extension, which (despite its name) contains a filter for XHTML export too. It is written in Java and hopefully is in better shape than the XSLT filter.
Hello. I can reproduce the bug with LibreOffice 3.5.4 from Debian Wheezy. I cannot reproduce it with LibreOffice 4.2.5 from Debian Wheezy backports or with 4.3.0.2. I do not know which patch solved the problem, hence I am setting the bug status to RESOLVED WORKSFORME. Feel free to reopen it if you can reproduce the issue with LibreOffice 4.2.5 or later.