Bug 59704 - Odt to html conversion with characters Iranian / Arabic
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.0.0.1 rc
Hardware: All, OS: All
Importance: medium normal
Assignee: Not Assigned
Reported: 2013-01-22 10:10 UTC by isaric
Modified: 2013-11-17 17:15 UTC

Attachments
odt original (20.11 KB, application/vnd.oasis.opendocument.text)
2013-01-22 10:10 UTC, isaric
Screenshot: original and converted doc (18.77 KB, image/png)
2013-01-24 14:38 UTC, pierre-yves samyn
Rendering of master's XHTML export (13.72 KB, image/png)
2013-01-26 12:39 UTC, Urmas

Description isaric 2013-01-22 10:10:03 UTC
Created attachment 73438 [details]
odt original

My English is limited, so I hope this is the right place.

When I convert an ODT file to HTML, I get problems with line breaks and with inversion of the text layout.

odt original
http://isaric.cof.free.fr/LibO/Test_Second%20part%20in%20Persian.odt
html
http://isaric.cof.free.fr/LibO/Test_Second%20part%20in%20Persian.html
Comment 1 Urmas 2013-01-22 13:44:29 UTC
As a work-around, remove all "padding:100%;" occurrences in the result file.

It has been fixed in master.
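
The work-around above can be applied by hand in a text editor, or scripted. The following is only a minimal sketch, assuming the exported file contains the declaration as quoted ("padding:100%;", possibly with extra whitespace); the function name `strip_full_padding` is hypothetical and not part of LibreOffice.

```python
import re

# Matches the declaration named in the work-around. Tolerating optional
# whitespace is an assumption; the comment quotes it without spaces.
_PADDING_RE = re.compile(r"padding\s*:\s*100%\s*;")


def strip_full_padding(html: str) -> str:
    """Return the HTML text with every 'padding:100%;' declaration removed."""
    return _PADDING_RE.sub("", html)


if __name__ == "__main__":
    sample = '<p style="padding:100%;text-align:right">text</p>'
    print(strip_full_padding(sample))
```

Other padding declarations (for example `padding-left`) are left untouched, because the pattern requires a colon directly after the word `padding`.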
Comment 2 pierre-yves samyn 2013-01-24 14:37:20 UTC
Hello

I can reproduce this on Windows XP Pro and Windows 7 64-bit
with Version 4.0.0.1 (Build ID: 527dba6f6e0cfbbc71bd6e7b88a52699bb48799)

(In reply to comment #1)
> As a work-around, remove all "padding:100%;" occurrences in the result file.

Another work-around is File> Export> XHTML 

However, neither is perfect: as shown in the attached screenshot, the last line of the paragraph preceding "o friend" is wrong. It is right-aligned in the original document but left-aligned in the converted document.


> It has been fixed in master.

Can you confirm that the fix for the problem above is included in that version?
If not, shouldn't this issue be reopened?

Regards
Pierre-Yves
Comment 3 pierre-yves samyn 2013-01-24 14:38:21 UTC
Created attachment 73586 [details]
Screenshot: original and converted doc
Comment 4 Urmas 2013-01-26 12:30:10 UTC
Looks like a Firefox bug to me.
Comment 5 Urmas 2013-01-26 12:39:14 UTC
Created attachment 73682 [details]
Rendering of master's XHTML export
Comment 6 isaric 2013-02-26 09:40:10 UTC
Here is a comment by Jonathan Kew on
https://bugzilla.mozilla.org/show_bug.cgi?id=842121 :
"AFAICT, I don't think this is a Firefox bug.

The "problem" is that the paragraph in question is being laid out with left-to-right directionality, even though it happens to contain Persian text. Hence, the paragraph indent appears at the left-hand end of the first line; and the last (partial) line is aligned to the left. This is correct layout for a left-to-right paragraph, and is not changed by the fact that some of the words - even as many as 100% of them! - in the paragraph happen to be in a right-to-left script.

Note that there's nothing in the HTML suggesting that this paragraph should in fact be laid out in right-to-left mode. The <body> element is explicitly tagged with dir="ltr"; this is inherited by all the paragraphs within it, as they do not override it.

The layout would be "fixed" if that Persian paragraph were tagged with dir="rtl" in the source, or if it included direction:rtl in its CSS styling. But apparently the LibreOffice export failed to do either of those things, even though the paragraph was presumably presented in RTL layout within LO.

Another possibility would be to tag the paragraphs with dir="auto", which would make them infer directionality from the text content. This would have provided the desired result here, though it's possible that it might fail in more complex cases where the paragraph contains a -mixture- of LTR and RTL text.

It looks as though IE must be doing something like dir="auto" here, even though the document is explicitly marked (on the <body> element) as dir="ltr". While this is giving the desired result in this particular case, I don't think it is spec-conformant; the real problem is the deficiency in LO's export.

(cc-ing smontagu for confirmation of my understanding here, as he knows more about these properties.)"
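
To illustrate the two fixes described in the quoted comment (an explicit dir="rtl" on the paragraph, or direction inferred from the content as with dir="auto"), here is a rough sketch. It is not how LibreOffice or any browser implements this; it assumes a simplified "first strong character" rule based on Unicode bidirectional classes, and the function names are hypothetical.

```python
import unicodedata
from html import escape


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess paragraph direction from the first strongly directional character.

    Simplified version of the idea behind dir="auto": class L means
    left-to-right, classes R and AL (Hebrew, Arabic, Persian) mean
    right-to-left. Digits, spaces and punctuation are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return default


def paragraph_with_dir(text: str) -> str:
    """Wrap text in a <p> element carrying an explicit dir attribute."""
    return f'<p dir="{first_strong_direction(text)}">{escape(text)}</p>'


if __name__ == "__main__":
    print(paragraph_with_dir("سلام دوست"))         # Persian text
    print(paragraph_with_dir("LibreOffice سلام"))  # mixed text, starts with Latin
```

The second call shows the weakness mentioned in the comment: a mostly Persian paragraph that happens to start with a Latin word is classified as left-to-right by this rule.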
Comment 7 Urmas 2013-02-26 09:56:39 UTC
It has "writing-mode:rl-tb" CSS attribute which supersedes "direction:rtl". Firefox not supporting is a problem with Firefox.

If you need the latter attribute explicitly, turn this into an enhancement request.
Comment 8 isaric 2013-02-26 19:37:31 UTC
New comment (comment 5) from Simon Montagu on:
https://bugzilla.mozilla.org/show_bug.cgi?id=842121

"I confirm what Jonathan says in comment 4.

Meanwhile there is an interesting response in https://bugs.freedesktop.org/show_bug.cgi?id=59704#c7:
 'It has "writing-mode:rl-tb" CSS attribute which supersedes "direction:rtl". Firefox not supporting is a problem with Firefox.'

I don't think it's accurate to say that writing-mode:rl-tb supersedes direction:rtl.
It's true that writing-mode:lr-tb and rl-tb appeared in an earlier version of CSS3 Text (http://www.w3.org/TR/2003/CR-css3-text-20030514/), but they are no longer in the current version (http://www.w3.org/TR/css3-writing-modes/), where writing-mode only specifies the block flow direction and not the inline progression direction. As far as I know, the old syntax of writing-mode has only ever been supported by IE.

I would be happy if someone could copy this comment into the LO bug report, since I don't have a bugzilla account at freedesktop.org"
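
Since the old writing-mode values are reportedly honoured only by IE, one possible post-processing step is to add an explicit direction declaration next to each legacy value in the exported CSS. The sketch below is only an assumption about how that could be done on the CSS text; the mapping table and the function name `add_direction` are hypothetical, and it is not the fix applied in LibreOffice.

```python
import re

# Legacy horizontal writing-mode values from the 2003 CSS3 Text draft,
# mapped to the inline direction they imply.
_LEGACY_TO_DIRECTION = {"rl-tb": "rtl", "lr-tb": "ltr"}

_WRITING_MODE_RE = re.compile(r"writing-mode\s*:\s*(rl-tb|lr-tb)\s*;?")


def add_direction(css: str) -> str:
    """Append a matching 'direction' declaration after each legacy writing-mode."""
    def repl(match: re.Match) -> str:
        value = match.group(1)
        return f"writing-mode:{value}; direction:{_LEGACY_TO_DIRECTION[value]};"

    return _WRITING_MODE_RE.sub(repl, css)


if __name__ == "__main__":
    print(add_direction("writing-mode:rl-tb; text-align:right"))
```

The function is not idempotent: running it twice on the same text adds the direction declaration twice, so it should be applied once to the freshly exported file.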