Bug 61703 - Writer does not ask the character set of a .txt file at opening
Summary: Writer does not ask the character set of a .txt file at opening
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.5.2 release
Hardware: Other All
Importance: medium major
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-02 18:11 UTC by Zoltán Hegedüs
Modified: 2013-11-10 12:20 UTC

See Also:


Attachments
Writer uses a huge amount of memory with this file. This is a little-endian Unicode file without a BOM. 3.5.5.3 asks for the code page; 3.6.5.2 does not, and opens the file incorrectly. (351.91 KB, text/plain)
2013-03-02 18:11 UTC, Zoltán Hegedüs
All 65536 Unicode code values in order, little-endian, without a BOM. 3.5.5.3 asks for the code page; 3.6.5.2 does not, and opens the file incorrectly. (128.00 KB, text/plain)
2013-03-02 18:14 UTC, Zoltán Hegedüs
All 256 codes in order. 3.5.5.3 opens this in Calc and asks for the code page; 3.6.5.2 opens it in Writer, incorrectly. (256 bytes, text/plain)
2013-03-02 18:17 UTC, Zoltán Hegedüs
Codes 27-255 in order. Both versions open this in Writer and do not ask for the code page. (229 bytes, text/plain)
2013-03-02 18:18 UTC, Zoltán Hegedüs
An ASCII table. 3.5.5.3 asks for the code page, but opens this in Calc. 3.6.5.2 does not ask for the code page and opens it in Writer, but always in code page 1250. (697 bytes, text/plain)
2013-03-02 18:21 UTC, Zoltán Hegedüs

Description Zoltán Hegedüs 2013-03-02 18:11:39 UTC
Created attachment 75793
Writer uses a huge amount of memory with this file. This is a little-endian Unicode file without a BOM. 3.5.5.3 asks for the code page; 3.6.5.2 does not, and opens the file incorrectly.

When I open a .txt file, Writer never asks for the character set. So I tried it with the portable 3.5.5.3. That version sometimes asked, but it opened the file in Writer only some of the time and usually in Calc.
Another example: a 1 byte/character .txt file may happen to start with FF FE or FE FF (the BOM, or byte order mark, which marks Unicode text as little-endian or big-endian respectively).
I tried to open a .bmp file in Writer, because I wanted to see the same thing I see when I open the .bmp file in Notepad. I used the Open menu item in Writer, but the file always opened in Draw. If a text file has the wrong extension, it is impossible to open it as text without renaming the file. Another example: a text file that has the extension .ods.
ISO-8859-16 is missing from the code page list (3.5.5.3 portable, Writer/Calc).
When I opened the attached utable.txt (a little-endian Unicode file without a BOM), Writer used more than 512 MB of memory. While I was editing it, Writer used more than 1 GB of memory (I have 1 GB of physical memory plus 1 GB of virtual memory). This error is reported separately.
Writer cannot save in big-endian Unicode, and this encoding is not in the list offered when opening: if the file has no BOM, Writer opens it incorrectly.
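
For anyone triaging this, the BOM ambiguity described above can be illustrated with a minimal Python sketch. It is not LibreOffice's detection code; the function name sniff_bom is hypothetical, and UTF-32 BOMs are deliberately left out to keep it short. It shows why a BOM check alone cannot replace asking the user: a BOM-less Unicode file yields no answer, and a 1 byte/character file that happens to start with FF FE is misread as UTF-16.

```python
import codecs


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration; UTF-32 BOMs are not handled.
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None


# Little-endian Unicode text without a BOM: nothing to detect.
no_bom = "abc".encode("utf-16-le")
print(sniff_bom(no_bom))

# A 1 byte/character file whose first two bytes happen to be FF FE:
# the check reports UTF-16 even though the file is not Unicode.
one_byte_file = b"\xff\xfe" + b"plain 8-bit text"
print(sniff_bom(one_byte_file))
```

In both cases the only reliable fix is to let the user choose the character set, which is what this report asks for.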
Comment 1 Zoltán Hegedüs 2013-03-02 18:14:02 UTC
Created attachment 75794
All 65536 Unicode code values in order, little-endian, without a BOM. 3.5.5.3 asks for the code page; 3.6.5.2 does not, and opens the file incorrectly.
Comment 2 Zoltán Hegedüs 2013-03-02 18:17:29 UTC
Created attachment 75795
All 256 codes in order. 3.5.5.3 opens this in Calc and asks for the code page; 3.6.5.2 opens it in Writer, incorrectly.
Comment 3 Zoltán Hegedüs 2013-03-02 18:18:38 UTC
Created attachment 75796
Codes 27-255 in order. Both versions open this in Writer and do not ask for the code page.
Comment 4 Zoltán Hegedüs 2013-03-02 18:21:05 UTC
Created attachment 75797
An ASCII table. 3.5.5.3 asks for the code page, but opens this in Calc. 3.6.5.2 does not ask for the code page and opens it in Writer, but always in code page 1250.
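
If the attachments are unavailable, comparable test data can probably be regenerated with a short Python sketch like the one below. It assumes the files contain exactly what their descriptions say (byte values or 16-bit code values in ascending order); the variable names are hypothetical and the real attachments may differ in detail.

```python
import struct

# All 256 byte values in order (256 bytes).
all_bytes = bytes(range(256))

# Codes 27-255 in order (229 bytes).
codes_27_255 = bytes(range(27, 256))

# All 65536 16-bit code values in order, little-endian, no BOM.
# 65536 values * 2 bytes = 131072 bytes = 128 KB.
all_units_le = struct.pack("<65536H", *range(65536))

print(len(all_bytes), len(codes_27_255), len(all_units_le))
```

Note that a file built this way includes the surrogate range D800-DFFF as unpaired code values, so a strict UTF-16 decoder is expected to reject part of it.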
Comment 5 Zoltán Hegedüs 2013-03-03 12:58:46 UTC
There is an "encoded text" type when opening, but some problems remain:

There is no ISO-8859-16 on the list.
There is no big-endian Unicode on the list: if the file has no BOM, the program cannot recognize it.
Writer cannot open these files correctly and cannot save files in these formats.
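
To show what choosing these two encodings explicitly would look like, here is a small sketch using Python's standard codecs ("utf-16-be" and "iso8859_16"). It only illustrates the expected behavior and says nothing about how Writer's import filter works; the sample strings are made up for the example.

```python
sample = "tükörfúrógép"

# Big-endian Unicode without a BOM: the decoder has to be told the byte order.
raw_be = sample.encode("utf-16-be")
right = raw_be.decode("utf-16-be")
wrong = raw_be.decode("utf-16-le")   # same bytes, wrong byte order

print(right == sample)   # the explicit big-endian choice round-trips
print(wrong == sample)   # the little-endian reading does not

# ISO-8859-16 text: round-trips with the right codec, but reads
# differently when the bytes are interpreted as code page 1250.
iso_sample = "Șț€"
raw_iso = iso_sample.encode("iso8859_16")
print(raw_iso.decode("iso8859_16") == iso_sample)
print(raw_iso.decode("cp1250") == iso_sample)
```

Without a BOM or other metadata the bytes alone do not say which reading is intended, which is why both entries are needed in the character set list.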
Comment 6 Thomas van der Meulen 2013-06-21 08:37:05 UTC
Thank you for your bug report. I can reproduce this bug running LibreOffice Version: 4.1.0.1
Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 on Mac OS X 10.6.8.

I can see that the files aren't getting imported correctly. There are a lot of '#' characters.

