Bug 74056

Summary: EDITING: Index Quirks
Product: LibreOffice
Reporter: Frank <foberle>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW
Severity: normal
Priority: medium
CC: foberle, quikee
Version: 4.3.0.0.alpha0+ Master
Hardware: Other
OS: Linux (All)
Whiteboard: BSA

Description Frank 2014-01-25 19:21:31 UTC
I'm using Writer Version: 4.3.0.0.alpha0+ Build ID: be4035d00f37c492494fa7860955b6d0868c7f77 on Ubuntu 12.04 LTS 64 bit Linux, and this is regarding the Index feature:

I know the table of contents feature seems to work well, as I've used it quite a bit in the past.
But creating an alphabetical index (a first for me, at least using Writer) seems to have a number of rather annoying features. I've provided an experiment you can do at the end, but first let me discuss the issues.

1) When creating an alphabetical index, the Columns feature works as expected, BUT ONLY FOR THE STANDARD (DEFAULT) PAPER SIZE! If I attempt to use columns on a 6" x 9" page, Writer still seems to assume a "standard" paper size and doesn't seem to know that I'm using a 6" x 9" page. Thus no matter how I try to tweak the setup, the right column is almost completely off the page on the right. So I can't use columns for the Index - Bummer, but not the end of the world.

2) If I go through and highlight each entry that I want indexed, everything works great (so far as I can tell), and I have the options for "match case" and "whole words only." But in a long document, using a concordance file certainly seems to make more sense, and it SEEMS TO WORK, but it actually doesn't.

  Issue a) No matter what boxes are checked, Writer goes and marks ALL instances of whatever the concordance entry has: regardless of whether or not it is a whole word or whether the capitalization matches.

  Issue b) If you add another entry to the concordance file and "Update Index/Table," all instances (even incorrect ones) of the new entry are indeed marked within the file and added to the index, but every earlier entry GETS AN ADDITIONAL marker. As I modify the concordance file to add new items and update the index, I find that the earliest entries have as many markers as the number of updates I've done. 

Here's an experiment you can do:

Open a new document using whatever standard size is in effect.
At the top of the document, type dt and press F3. This generates the dummy text.
In a separate text editor create a concordance file with the following entries:

Breeze;Breeze;;;0;0
Long;Long;;;0;0
Self;Self;;;0;0
Wrist;Wrist;;;0;0

Back in the document, create a new index at the bottom of the page, and mark "Case Sensitive." Then choose the concordance file created above.

You'll see that the words "himself" and "along" are marked in the document and included in the index as "Self" and "Long", even though these words never appear in the text. Now add the following entry to the concordance file:

Eat;Eat;;;0;0

Update the index again and look for the new entry. Find it? Writer actually marked the last three letters of the word "sweat", which is neither a whole word nor a case match.
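
For comparison, the Python sketch below shows what matching driven by the two trailing fields of each concordance line would look like. It assumes the field order given in the LibreOffice Help (search term;alternative entry;1st key;2nd key;match case;word only, with 1 meaning on and 0 meaning off). parse_entry and find_matches are hypothetical helpers for illustration, not Writer's actual code, and the sample sentence is made up.

```python
import re
from typing import List, NamedTuple


class ConcordanceEntry(NamedTuple):
    search_term: str
    alternative_entry: str
    key1: str
    key2: str
    match_case: bool
    word_only: bool


def parse_entry(line: str) -> ConcordanceEntry:
    # Assumed field order:
    # search term;alternative entry;1st key;2nd key;match case;word only
    term, alt, key1, key2, case, word = line.strip().split(";")
    return ConcordanceEntry(term, alt, key1, key2, case == "1", word == "1")


def find_matches(entry: ConcordanceEntry, text: str) -> List[str]:
    """Return every substring of text that the entry would mark."""
    pattern = re.escape(entry.search_term)
    if entry.word_only:
        pattern = r"\b" + pattern + r"\b"   # whole words only
    flags = 0 if entry.match_case else re.IGNORECASE
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


sample = "He hurt himself walking along the road; sweat ran down his wrist."
loose = parse_entry("Self;Self;;;0;0")    # both options off, as in the experiment
strict = parse_entry("Self;Self;;;1;1")   # both options on
print(find_matches(loose, sample))    # ['self']  (inside "himself")
print(find_matches(strict, sample))   # []
```

Under that assumption, the entries in the experiment above end in 0;0 and so ask for neither option. It may be worth repeating the test with 1;1 to see whether Writer then skips "himself", "along" and "sweat".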

In a three hundred plus page document with a surname index this tends to make the index pretty useless, since it's cluttered with erroneous entries which will drive you nuts looking on a referenced page to find something that just isn't there.

Sadly, I've obviously done that, which brings up another issue:

How do I get rid of the index markers? Going through one by one is far too tedious. I recall that the alternate search and replace had a selection for "index" under properties, but I downloaded that, and the latest version seems to have a serious bug that doesn't let you get to a specific entry in the drop-down list for "properties", so that's out. And I don't know if that would be what I need anyway.

Any help would be appreciated.

Operating System: Ubuntu
Version: 4.3.0.0.alpha0+ Master
Comment 1 Tomaz Vajngerl 2014-03-12 19:09:30 UTC
oh wow.. I actually learned something trying to reproduce this :) 

Confirming..
Comment 2 Frank 2014-03-12 20:16:43 UTC
Hi Tomaz:

Always glad to aid in someone's education !!!

Seriously, if I can contribute to testing any "fixes" to these things (I believe there may be several contributing but not necessarily related bugs involved), let me know and I'll try to help. Obviously, since I reported the issue, I've become familiar with its use.

Also be aware that I provided the following instructions to someone on a forum (I can't remember which one now) so they could remove all the index markers. This might help whoever is tasked with working on the index issues:

===== QUOTE =====
So, here's how to remove all of the index markers from a Writer document so you can start with a clean slate. To do this, you will need to be running LibreOffice on some flavor of Linux/Unix, or at least on a system that has a command line or some text editor with "sed" capabilities.

1: Make a backup of your Writer document. You know the consequences if something goes amiss.
2: Open the document in Writer, and choose Save As "OpenDocument Text (Flat XML) (fodt)"
   This creates an uncompressed XML version of the document.
   On my system (Ubuntu), I was unable to decompress the odt version, as the OS complained it was malformed.
3: Close the document and exit Writer.
4: Open a command line shell, preferably in the directory containing the fodt file.
5: Run the following command (the trailing backslashes let it span several lines):
   sed 's/<text:alphabetical-index-mark text:string-value="\([A-Za-z]*\)"\/>//g' \
   < Old_File_Name_and_Path.fodt \
   > New_File_Name_and_Path.fodt
   Depending on the file size and processor speed, this may take a bit.
   If this gives errors, you're on your own.
6: Close the command line shell.
7: Open the new "cleansed" fodt file with Writer.
8: The file should look the same but without any alphabetical index markers. (The formatting is still there, though)
9: Go to where your alphabetical index is located, right click and select "Update Index/Table"
A: All of the index entries should disappear; if any remain, go find them and manually delete them.
   Apparently, some of the indexes are somehow embedded in others and aren't found by the sed command above.
   I didn't bother to try figuring out how or why that happened. I had several hundred markers, of which only five weren't removed.
B: Now, go back to the index and select Edit Index/Table, then File | Open.
C: Select the original file (assuming you have it where you want it), and let Writer go to it.
D: You now have a "clean" document with no duplicate index entries.
E: LOOK AT IT CAREFULLY, of course, before replacing your original. The document I tried this on was over four hundred pages with lots of tables, graphics and so forth, and I found no problems, but it's up to you to determine if everything is ok.

I hope this helps any others who might be using alphabetic indexes.
===== END QUOTE =====
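
The same clean-up can be sketched in Python for systems without sed. This version also matches markers whose text contains spaces, digits or punctuation, markers with extra attributes, and the -start/-end pair that ODF uses when a range of text is marked, all of which the [A-Za-z]* pattern in the quote skips. Whether those account for the five leftover markers is only a guess. strip_index_marks and clean_file are hypothetical names, and the sample string is made up.

```python
import re
from typing import Tuple

# Matches point markers and the -start/-end range markers, with any attributes.
# Assumption: attribute values never contain a literal ">" character.
INDEX_MARK = re.compile(
    r"<text:alphabetical-index-mark(?:-start|-end)?(?=[\s/])[^>]*/>"
)


def strip_index_marks(xml_text: str) -> Tuple[str, int]:
    """Return the text without index markers and the number removed."""
    return INDEX_MARK.subn("", xml_text)


def clean_file(src_path: str, dst_path: str) -> int:
    """Read a .fodt file, write a cleaned copy, return how many markers went."""
    with open(src_path, encoding="utf-8") as src:
        cleaned, removed = strip_index_marks(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
    return removed


# Made-up fragment with one point marker and one start/end pair.
sample = (
    'him<text:alphabetical-index-mark text:string-value="Self"/>self and '
    '<text:alphabetical-index-mark-start text:id="IMark1"/>New York'
    '<text:alphabetical-index-mark-end text:id="IMark1"/>'
)
cleaned, removed = strip_index_marks(sample)
print(removed)   # 3
print(cleaned)   # himself and New York
```

To run it on a real document, call clean_file with the old and new .fodt paths, working on a backup as in step 1.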

Best of Luck,
Frank
