Bug 74670 - Saving a pptx (created with MSO 2010) in Impress as .odp or pptx more than quadruples file-size (due to copies of images)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Presentation
Version: 4.2.0.4 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-07 12:34 UTC by Liam Smit
Modified: 2015-01-24 13:09 UTC
0 users

See Also:


Attachments
Internal file listing of MSO.pptx, LO.pptx and LO.odp (5.12 KB, application/gzip)
2014-02-07 12:34 UTC, Liam Smit
md5sum hashes of files in ppt/media (923 bytes, application/gzip)
2014-02-07 13:03 UTC, Liam Smit
LibreOffice save as odp (315.42 KB, application/gzip)
2014-02-27 13:46 UTC, Liam Smit
LibreOffice save as pptx (789.28 KB, application/gzip)
2014-02-27 13:47 UTC, Liam Smit
Original MS Office Power Point presentation file saved in MS PowerPoint 2010. (1.42 MB, application/gzip)
2014-09-11 14:08 UTC, Liam Smit
MS Office Power Point presentation format file saved in LibreOffice Impress 4.2.6. (2.15 MB, application/gzip)
2014-09-11 14:10 UTC, Liam Smit
MS Office Power Point presentation file saved in LibreOffice Impress 4.2.6 in ODP format. (843.45 KB, application/gzip)
2014-09-11 14:12 UTC, Liam Smit

Description Liam Smit 2014-02-07 12:34:53 UTC
Created attachment 93600 [details]
Internal file listing of MSO.pptx, LO.pptx and LO.odp

I received a presentation saved in .pptx format by MS Office 2010:
1,4M Feb  7 11:28 mo_save_as_pptx.pptx

I opened it in Impress and then saved it as a .pptx file without making any changes. The resulting file is more than four times as large:
6,5M Feb  7 12:11 lo_save_as_pptx.pptx

Saving the same file as an .odp file without making any changes results in a file roughly a quarter the size of the original:
368K Feb  7 14:15 lo_save_as_odp.odp

So it seems that LibreOffice imports the .pptx file correctly but does not write it out correctly as a .pptx, whereas writing to .odp seems fine.

Examining the resulting files reveals that when Impress writes out a .pptx file, many of the image files in it are duplicate copies of images used in the presentation (logos, backgrounds, etc).

Please see the attached archive for the file-listing of each of the three files i.e. MS Office .pptx, LibreOffice .pptx and LibreOffice .odp.
Comment 1 Liam Smit 2014-02-07 13:03:28 UTC
Created attachment 93601 [details]
md5sum hashes of files in ppt/media

Attached is a file containing the md5sum hashes of the files in the /ppt/media sub-directory. It clearly shows that the same file is saved many times under different names.
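
For anyone wanting to reproduce the check described above without unpacking the file by hand, here is a minimal Python sketch. It assumes the images live under ppt/media/ inside the .pptx (a zip archive), as in the attached listing; the function name find_duplicate_media and the media_prefix parameter are made up for illustration and are not part of LibreOffice or any existing tool.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_media(pptx_path, media_prefix="ppt/media/"):
    """Group archive members under media_prefix by the MD5 of their contents.

    Hypothetical helper: returns {md5_hex: [member names]} for every digest
    that is shared by more than one member, i.e. byte-identical copies.
    """
    by_digest = defaultdict(list)
    with zipfile.ZipFile(pptx_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the media folder.
            if info.is_dir() or not info.filename.startswith(media_prefix):
                continue
            digest = hashlib.md5(archive.read(info.filename)).hexdigest()
            by_digest[digest].append(info.filename)
    # Keep only digests that occur more than once.
    return {d: sorted(names) for d, names in by_digest.items() if len(names) > 1}
```

Calling find_duplicate_media("lo_save_as_pptx.pptx") should return one entry per image that was written out more than once, with the list of names it was saved under. MD5 is used here only to match the md5sum listing in the attachment; any content hash would do.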
Comment 2 Liam Smit 2014-02-24 10:30:32 UTC
Bug is also present in 4.2.1 (rc1).
Comment 3 foss 2014-02-27 12:43:56 UTC
Those test files seem to have no suffix?

Could you please attach the pptx file in question so this can be confirmed.
Comment 4 Liam Smit 2014-02-27 13:00:14 UTC
Those text files contain file listings with md5 hashes to show that many of the files are exact duplicates of the same file.

I'll try and sanitise the presentation and upload it.
Comment 5 Liam Smit 2014-02-27 13:46:09 UTC
Created attachment 94817 [details]
LibreOffice save as odp

Note I've stripped out some of the images and fudged the numbers to make it less sensitive.
Comment 6 Liam Smit 2014-02-27 13:47:27 UTC
Created attachment 94818 [details]
LibreOffice save as pptx

Note I've removed some of the images and messed with the numbers to make it less sensitive.
Comment 7 QA Administrators 2014-09-03 21:32:37 UTC
Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team
Comment 8 Liam Smit 2014-09-11 14:08:40 UTC
Created attachment 106135 [details]
Original MS Office Power Point presentation file saved in MS PowerPoint 2010.

Original MS Office Power Point presentation file saved in MS PowerPoint 2010.

Note the text and numbers were scrambled.
Comment 9 Liam Smit 2014-09-11 14:10:45 UTC
Created attachment 106136 [details]
MS Office Power Point presentation format file saved in LibreOffice Impress 4.2.6.

Note text and numbers intentionally scrambled.
Comment 10 Liam Smit 2014-09-11 14:12:56 UTC
Created attachment 106137 [details]
MS Office Power Point presentation file saved in LibreOffice Impress 4.2.6 in ODP format.

Note text and numbers were intentionally scrambled.
Comment 11 Liam Smit 2014-09-11 14:28:05 UTC
OK let's start again. Please ignore everything before Comment 8.

I've uploaded three files:

1.) PowerPoint format file saved in MS Office PowerPoint 2010, which is 1.6MB in size:
1.6M Sep 11 15:26 Staff_Update_September_2014_mso.pptx

2.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .pptx more than triples its size to 5.8MB:
5.8M Sep 11 15:31 Staff_Update_September_2014_lo.pptx

3.) Opening the file from 1.) in LibreOffice Impress 4.2.6 and saving it as .odp roughly halves its size to 0.8MB:
883K Sep 11 15:50 Staff_Update_September_2014_lo.odp


From what I can determine, when Impress writes out a .pptx file (i.e. file 2 above), many of the images used in the presentation (logos, backgrounds, etc) get saved multiple times as separate copies of the same image.


Possibly a problem in the export filter to .pptx?
Comment 12 ign_christian 2014-09-12 15:29:16 UTC
Confirmed under Ubuntu 12.04 x86 with:
- LO 4.0.6.2 : size becomes 3.1 MB
- LO 4.1.6.2 : size becomes 5.9 MB
- LO 4.2.6.3 : size becomes 6.0 MB
- LO 4.3.1.2 : can't be saved again as pptx, strange..

After unzipping the pptx, we can see many duplicate images in the 'media' folder. In the original pptx, the 'media' folder contains only 21 image files. After saving in 4.2.6.3 or 4.1.6.2, it grows to 269 files; in 4.0.6.2, to 110 files.

Not all images have duplicates; most of the duplicates are the logo in the footer (possibly because of master slides).
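
The file counts above can be gathered with a short Python sketch like the one below, which also estimates how many bytes the duplicates add. It is only an illustration: media_summary is a hypothetical name, the ppt/media/ prefix is taken from this report, and the compressed size of each repeated entry is used as a rough stand-in for its contribution to the file size.

```python
import hashlib
import zipfile


def media_summary(pptx_path, media_prefix="ppt/media/"):
    """Return (entry_count, unique_count, redundant_bytes) for media files.

    Hypothetical helper: redundant_bytes sums the compressed size of every
    entry whose content was already seen earlier in the archive.
    """
    seen = set()
    entries = 0
    redundant = 0
    with zipfile.ZipFile(pptx_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the media folder.
            if info.is_dir() or not info.filename.startswith(media_prefix):
                continue
            entries += 1
            digest = hashlib.sha256(archive.read(info.filename)).hexdigest()
            if digest in seen:
                # A byte-identical copy of an earlier entry.
                redundant += info.compress_size
            else:
                seen.add(digest)
    return entries, len(seen), redundant
```

Running it on the original and on the re-saved .pptx gives two tuples that can be compared directly. If the duplication is the cause of the growth, the entry count should rise while the unique count stays close to the original.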
Comment 13 ign_christian 2014-09-12 15:35:36 UTC
(In reply to comment #12)
Sorry, I forgot to mention that I only tried saving as pptx.

