Bug 62603 - Regular expression replacements affect formatting in undesired ways
Summary: Regular expression replacements affect formatting in undesired ways
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Libreoffice
Version: 4.0.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-21 18:19 UTC by Christian Gagné
Modified: 2013-06-21 15:46 UTC

See Also:



Description Christian Gagné 2013-03-21 18:19:37 UTC
This bug corresponds to the Apache OpenOffice bug 121482, which was previously marked as RESOLVED FIXED but has now been reopened.

Since LibreOffice 4.0 now uses the same ICU-based regexp engine as AOO 3.4, it also suffers from the same formatting-related problems. Regexp-based search and replace operations now affect a text portion’s formatting, even though no style-related operation was specified.

For example, searching for the regexp "([:alnum:]) and replacing it with “$1, in order to turn straight opening quotes into curly ones, alters the formatting: in this case, italics are removed from part of the text portion.
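
To make the intended text-level effect concrete, here is a minimal sketch in Python. It is not LibreOffice's ICU engine: Python's `re` module has no POSIX classes, so `[^\W_]` is assumed as a stand-in for [:alnum:], and the back-reference is written `\1` instead of $1. It only shows what the replacement should do to the characters themselves.

```python
import re

# Assumption: [^\W_] approximates [:alnum:] (Python's re has no POSIX classes).
SEARCH = r'"([^\W_])'
# Python writes the back-reference as \1 where the dialog uses $1.
REPLACE = r'“\1'

# A straight quote followed by an alphanumeric character becomes a curly quote;
# the captured character is put back unchanged.
result = re.sub(SEARCH, REPLACE, '"test "test')
print(result)  # “test “test
```

At the text level only the quotation marks change, so nothing in the operation itself calls for a change of formatting.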

This suggests that search and replace operations using regular expressions no longer operate only on the underlying text content, but also interfere with the text’s *representation*. That is of great concern, since one of the most important principles of both AOO and LibO is the clean separation of the “model” (the content structure) from the “frame” (the visible representation of the content).

This bug raises an underlying question: does the ICU regexp engine really allow clean separation between content and presentation? Is the problem solely related to AOO and LibO’s implementation of ICU or is there an inherent problem in ICU?

It appears necessary for the LibO project to fix this bug by themselves and independently of AOO, since it is assumed that LibO will not re-base their code on AOO’s in the future. The fact that AOO once thought that the bug was fixed but then changed their mind and realized that they were not sure is troubling.

Eventually, fixing this bug might require cleaning up and improving the API’s *search descriptors*, especially with regard to the way text portions are treated by search descriptors. The very old and as yet unfixed enhancement bug (OOo/AOO bug 2997) asking for the addition of character style searches through the search and replace dialog comes to mind. It is troubling that this particular issue was never fixed in more than ten years. The search descriptors’ use of the `awt` module for locating character formatting in paragraphs might be a hint to understanding this issue.
Comment 1 Christian Gagné 2013-03-22 15:06:52 UTC
Note: In the previous comment, the reference to the AOO issue number 2997 inadvertently links to an unrelated issue that pertains to another software module; please disregard the link.
Comment 2 manj_k 2013-03-22 15:25:37 UTC
Added AOO bug URLs to "See Also".
Comment 3 Thomas Hackert 2013-06-21 15:32:55 UTC
Hello Christian, *,
thank you for reporting this bug :) I am only one of the QA guys and did not understand everything in your description, but I can reproduce the bug as follows:

1. Open a new Writer document
2. Enter <quote>"test "test</quote>, where the second "test" is set to italic
3. <Ctrl>+<H>
4. Enter <quote>"([:alnum:])</quote> in "Search for"
5. Enter <quote>“$1</quote> in "Replace with"
6. Click on "Other Options" and mark "Regular expressions"
7. Now click on "Replace All"

Result: Quotation marks are replaced, but the "t" of the second "test" is not italic any more ...
Expected result: LO should replace the quotation marks, but not touch the formatting
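
The difference between the observed and the expected result can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the per-character italic flag, the helper names (`make_runs`, `replace_naive`, `replace_preserving`, `italic_text`) and the `[^\W_]` stand-in for [:alnum:] are hypothetical and say nothing about how LibreOffice actually stores or replaces text portions. The "naive" variant is only one possible way to arrive at the symptom described above.

```python
import re

# Stand-in for the search "([:alnum:]) ; Python's re has no POSIX classes.
PATTERN = re.compile(r'"([^\W_])')

def make_runs(*runs):
    """Flatten (text, italic) runs into a list of (char, italic) pairs."""
    return [(ch, italic) for text, italic in runs for ch in text]

def replace_naive(chars):
    """Insert each replacement as one block formatted like the match's first character."""
    text = "".join(ch for ch, _ in chars)
    out, pos = [], 0
    for m in PATTERN.finditer(text):
        out.extend(chars[pos:m.start()])
        italic = chars[m.start()][1]  # format of the straight quote
        out.extend((ch, italic) for ch in "“" + m.group(1))
        pos = m.end()
    out.extend(chars[pos:])
    return out

def replace_preserving(chars):
    """Swap only the quote; the captured character keeps its own format."""
    text = "".join(ch for ch, _ in chars)
    out, pos = [], 0
    for m in PATTERN.finditer(text):
        out.extend(chars[pos:m.start()])
        out.append(("“", chars[m.start()][1]))
        out.extend(chars[m.start(1):m.end(1)])  # original pairs, flags intact
        pos = m.end()
    out.extend(chars[pos:])
    return out

def italic_text(chars):
    """Return only the characters that are flagged italic."""
    return "".join(ch for ch, italic in chars if italic)

# Step 2 of the reproduction: the second "test" is italic, the quotes are not.
doc = make_runs(('"test "', False), ("test", True))
print(italic_text(replace_naive(doc)))       # est
print(italic_text(replace_preserving(doc)))  # test
```

Both variants produce the same characters; only the second leaves the italic "t" alone, which is the expected result stated above.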

I should add that I had to paste the straight quotation marks into Writer, as Writer replaced the ones I typed with curly quotation marks ... :(

LO: Version: 4.1.0.1 Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 with German language pack and help pack
OS: Debian Testing AMD64
HTH
Thomas.
Comment 4 Thomas van der Meulen 2013-06-21 15:46:39 UTC
Thank you for your bug report. I can reproduce this bug running LibreOffice Version: 4.1.0.1
Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 on Mac OS X 10.8.4.

I followed the steps that Thomas gave and the "t" was no longer italic.

