Bug 48887 - FILESAVE: Save as HTML in Writer should not embed images
Summary: FILESAVE: Save as HTML in Writer should not embed images
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.2.0.4 release
Hardware: All Linux (All)
Importance: medium major
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard: bibisected
Keywords: bisected, regression
Duplicates: 79730 80973
Depends on:
Blocks:
 
Reported: 2012-04-18 10:21 UTC by Antonio
Modified: 2015-01-16 14:48 UTC
9 users

See Also:


Attachments
Example files (613 bytes, application/zip)
2012-04-18 10:21 UTC, Antonio
resulting html file and original file (68.53 KB, application/x-zip)
2012-08-31 09:26 UTC, cristi falcas
ODT and example X/HTML output under v3304 and v4132. (43.97 KB, application/zip)
2013-12-06 11:50 UTC, Owen Genat

Description Antonio 2012-04-18 10:21:27 UTC
Created attachment 60266
Example files

When you export a document that contains images to an HTML file, Writer embeds the images in the HTML. That is, it embeds the binary image data.

I believe the images should instead be saved in a folder with the same name as the document, and the HTML should link to those files. Embedding the images in the HTML file is a bad idea for two reasons:

* It results in the browser having to load a much larger file
* It actually results in a larger overall size. In the attachment, the 4x4px GIF (39 bytes), when embedded, results in an HTML file of 86 bytes. When the image is linked instead, the combined size of the HTML file and the GIF is 61 bytes. That is 25 wasted bytes, and the waste will only grow as the images get larger.
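
A rough sketch of where the extra bytes come from, assuming the embedded form is a base64 `data:` URI (which is how later comments in this report describe the output). Base64 turns every 3 input bytes into 4 output characters, so the payload alone grows by about a third before the URI prefix is counted. The all-zero buffer below is only a stand-in for the 39-byte GIF, since only its length matters here; the sketch does not attempt to reproduce the exact 86-byte and 61-byte figures above.

```python
import base64

# Stand-in for the 39-byte GIF from the attachment; only the length matters.
image_bytes = bytes(39)

encoded = base64.b64encode(image_bytes)
prefix = "data:image/gif;base64,"  # assumed form of the embedded URI prefix

raw_size = len(image_bytes)
encoded_size = len(encoded)
uri_size = len(prefix) + encoded_size

print(raw_size, encoded_size, uri_size)  # 39 52 74
```

The ratio stays at roughly 4:3 for any image size, so the absolute overhead scales with the image.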
Comment 1 a.l.e 2012-04-18 10:51:52 UTC
Embedding the images in the HTML should probably be optional.
Comment 2 cristi falcas 2012-08-31 09:26:10 UTC
Created attachment 66396
resulting html file and original file

Please also see the attached HTML file, generated with:

/opt/libreoffice3.6/program/soffice --display :1020 --convert-to html --outdir ./ Untitled\ 2.odt

The embedded image is not shown by any browser.
Comment 3 cristi falcas 2012-08-31 09:26:43 UTC
I forgot to mention that this was tested on Fedora with 3.6.1.
Comment 4 Owen Genat 2013-12-06 11:50:40 UTC
Created attachment 90351
ODT and example X/HTML output under v3304 and v4132.

I think this bug can be RESOLVED as NOTABUG. At the very least this bug would need to be changed to an enhancement request as it is asking for a new option that has never existed. There are separate filters for converting a file opened in Writer to HTML in such a way as to embed or link any included graphics:

- File > Export > select File Type of "XHTML" will embed the graphics.
- File > Save As... > select File Type of "HTML Document" will link the graphics.

These two filters can be used via the command line as:

$ soffice --headless --convert-to html:"HTML (StarWriter)" file_to_convert.odt
$ soffice --headless --convert-to html:"XHTML Writer File" file_to_convert.odt
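
For scripted use, the same two invocations can be built as argument lists, which avoids shell quoting of the filter names. This is a minimal sketch: the helper names `convert_command` and `run_conversion` are made up for illustration, `soffice` is assumed to be on the PATH, and only the flags quoted above are used.

```python
import subprocess

def convert_command(source, filter_name, out_format="html"):
    """Build the soffice argument list for one conversion (hypothetical helper)."""
    # In list form, the filter name needs no extra quoting.
    return ["soffice", "--headless", "--convert-to",
            f"{out_format}:{filter_name}", source]

def run_conversion(command):
    """Run the conversion; requires a LibreOffice install, so it is not called here."""
    return subprocess.run(command, check=True)

linked_cmd = convert_command("file_to_convert.odt", "HTML (StarWriter)")
embedded_cmd = convert_command("file_to_convert.odt", "XHTML Writer File")
print(linked_cmd)
print(embedded_cmd)
```

Calling `run_conversion(linked_cmd)` would then perform the first conversion on a machine with LibreOffice installed.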

I have tested this functionality under Ubuntu 10.04 x86_64 running:

- v3.3.0.4 OOO330m19 Build: 6
- v4.1.3.2 Build ID: 70feb7d99726f064edab4605a8ab840c50ec57a

... indicating it has always been this way. The problem indicated by comment #2 is a completely separate issue that is raised (using the same attachment) in bug 54315.
Comment 5 Owen Genat 2013-12-06 11:52:39 UTC
Changed Version to Inherited From OOo as a result of comment #4.
Comment 6 Owen Genat 2014-03-21 00:41:56 UTC
(In reply to comment #4)
> - File > Export > select File Type of "XHTML" will embed the graphics.
> - File > Save As... > select File Type of "HTML Document" will link the
> graphics.

It would appear that since LO v4.2 the fix to bug 63211 has made this situation worse, as it is now impossible to create an HTML file from any of these sources:

- ODT with embedded graphic.
- ODT with linked graphic.
- HTML with embedded graphic.
- HTML with linked graphic.

... and produce HTML output (via either quoted method) with a linked graphic. All graphics are written out as base64 embedded. Unsure whether to mark this now as a regression, since the v4.2 change.
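
One way to check a given output file for this behaviour is to scan it for `src` attributes that carry a base64 `data:` URI. The sketch below assumes double-quoted attributes and is not a full HTML parser. The function name `embedded_images` and the sample markup are invented for illustration and are not taken from the attachments.

```python
import base64
import re

# src attributes holding a base64 data URI; case-insensitive because
# attribute casing in the exported HTML is not known here.
EMBEDDED = re.compile(r'src="data:(image/[^;"]+);base64,([^"]*)"', re.IGNORECASE)

def embedded_images(html):
    """Return (mime_type, decoded_size_in_bytes) for each embedded image."""
    return [(m.group(1), len(base64.b64decode(m.group(2))))
            for m in EMBEDDED.finditer(html)]

sample = (
    '<p><img src="photo.png" alt="linked"></p>'
    '<p><img src="data:image/gif;base64,R0lGODlh" alt="embedded"></p>'
)
found = embedded_images(sample)
print(found)  # [('image/gif', 6)]
```

An empty result would suggest the images were written as links. A non-empty one lists what was embedded and how large each decoded image is.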
Comment 7 Otto 2014-03-21 14:08:51 UTC
I agree with Antonio and Owen Genat. In 4.2 it is no longer possible to create simple HTML with linked pictures.

Furthermore, and more importantly, HTML files with images saved in LO 4.2 Writer no longer open in LO. LO crashes if you try.

That means you can open them in a browser (slowly), but if you want to change the code of the HTML file you must use another text editor.

In my opinion, the behaviour should be restored to what it was until 4.1, as Owen Genat describes:

- File > Export > select File Type of "XHTML" will embed the graphics.
- File > Save As... > select File Type of "HTML Document" will link the graphics.
Comment 8 Regina Henschel 2014-04-28 11:53:19 UTC
It is a regression and users complain about it.
Comment 9 Bryan 2014-05-02 14:34:01 UTC
This one is a killer. It makes using LOweb useless as a simple wysiwyg html editor. Any web page with images is corrupted with filesize expanded by nearly an order of magnitude. That often results in web pages that cause the infamous 'read error' on the image showing pages of 'code' that text and html code editors often complain is a line too long for editing.

The fact that there is no obvious workaround leaves me high and dry with no options. I finally got a client to upgrade - and boy did I get blown out of the water on this bug.
Comment 10 Owen Genat 2014-05-04 04:17:43 UTC
Please disregard my comment 4 and comment 5. I am setting the version back to the earliest 4.2 release due to comment 6 through comment 8. Summary amended for clarity. I am also setting the Severity to major, although it would appear, according to the Bug Triage flowchart (https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg), that it may be a blocker. I will leave it to QA to confirm this.
Comment 11 Jay Philips 2014-06-07 13:40:25 UTC
Pulled from my bug 79730 comment 3:

Yes, since 4.2, images added to files are saved as base64-encoded images embedded in the HTML file. Checking the 'insert as link' checkbox in the file insert dialog should prevent this, but for some reason that isn't happening.

Confirmed in Linux Mint in 4.2.4, 4.2.6 and 4.3 beta. It does work correctly with referenced images in .odt files but not .html files.
Comment 12 dE 2014-09-05 16:42:23 UTC
*** Bug 79730 has been marked as a duplicate of this bug. ***
Comment 13 dE 2014-09-05 16:44:01 UTC
Confirmed 4.2.5.2
Comment 14 Björn Michaelsen 2014-10-11 00:07:05 UTC
(In reply to Regina Henschel from comment #8)
> It is a regression and users complain about it.

FWIW, a quick note for next time: if this is about a behaviour change in 4.2, it would likely have been better to open a clear new bug rather than reuse an old one from 2012.
Comment 15 Björn Michaelsen 2014-10-16 14:59:01 UTC
(This is an automated message.)

It seems that the commit that caused this regression was identified. (Or at least a commit is suspected as the offending one.)

Thus setting keyword "bisected".
Comment 16 David N. Welton 2014-11-07 08:55:12 UTC
I'm not sure if this is the correct place to comment, but if this is added as an option, it would be wonderful if it were possible to control it from the command line.  I use libreoffice with --headless --convert-to to automate conversion of some files, so a fix that only involves the UI would not solve this problem.

Thank you!
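
Until such an option exists, an automated pipeline could post-process the converted HTML itself. The following is a sketch of that idea, not LibreOffice functionality: it decodes each base64 `data:` URI, writes it next to the HTML file, and rewrites the `src` attribute to point at the new file. The function name `externalize_images`, the `<stem>_<n>.<subtype>` naming scheme, and the double-quoted attribute form are all assumptions.

```python
import base64
import re
import tempfile
from pathlib import Path

# Assumed attribute form: src="data:image/<subtype>;base64,<payload>"
DATA_URI = re.compile(r'src="data:image/([a-zA-Z0-9.+-]+);base64,([^"]+)"',
                      re.IGNORECASE)

def externalize_images(html, out_dir, stem):
    """Write each embedded image into out_dir and return HTML linking to it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        subtype, payload = match.group(1), match.group(2)
        # Hypothetical naming scheme; subtypes such as "svg+xml" would need
        # a proper mapping to a file extension.
        name = f"{stem}_{counter}.{subtype}"
        (out_dir / name).write_bytes(base64.b64decode(payload))
        # Relative link: assumes the HTML file is saved in out_dir as well.
        return f'src="{name}"'

    return DATA_URI.sub(replace, html)

sample = '<img src="data:image/gif;base64,R0lGODlh">'
with tempfile.TemporaryDirectory() as tmp:
    linked = externalize_images(sample, Path(tmp), "page")
    saved = (Path(tmp) / "page_1.gif").read_bytes()
print(linked)  # <img src="page_1.gif">
```

A regular expression is used here only to keep the sketch short. A real pipeline would probably want an HTML parser so that unusual quoting or line-wrapped payloads are handled reliably.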
Comment 17 Joel Madero 2015-01-05 17:16:06 UTC
Adding bibisected to whiteboard as bisected keyword is a subset of bibisected whiteboard.

Thanks!
Comment 18 Matthew Francis 2015-01-16 14:48:26 UTC
*** Bug 80973 has been marked as a duplicate of this bug. ***

