Bug 39937 - Missing dc:language metadata element for default language
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 3.3.1 release (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 39795 Languages
Reported: 2011-08-08 10:31 UTC by Christophe Strobbe
Modified: 2024-03-08 16:23 UTC

See Also:
Crash report or crash signature:


Description Christophe Strobbe 2011-08-08 10:31:39 UTC
The meta.xml file inside ODF files can be used to identify a document's default language (from the ODF 1.2 specification, section 4.3.2.15: "The <dc:language> element specifies the default language of a document.").
LibreOffice 3.3.1 Writer does not output this element.

Please output dc:language for the default language, i.e. the language that accounts for the largest share of the document. Note that ODF knows three types of languages: Western language, Asian language and CTL language. In meta.xml, these languages are represented by the attributes 
* fo:language and fo:country for Western languages, 
* style:language-asian and style:country-asian for Asian languages, and
* style:language-complex and style:country-complex for CTL (complex text layout) languages. 
(See Tools > Options > Language Settings > Languages in LibreOffice.)
When any of these languages is "zxx" (i.e. "[None]" in the Options dialog), that language should not be output. If more than one language is really in use (i.e. its use can be detected with the language guesser function, as opposed to the language merely being enabled in the Options dialog), it seems best to output one dc:language element per language. The data type for dc:language is the same as for xml:lang (see the ODF 1.2 specification, section 18.3.16: "The language datatype is the same as the [xmlschema-2] language datatype, except that its value range is not restricted to values of [RFC3066], but follows the syntax of the xml:lang attribute. See §2.12 of [XML1.0].").
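
A minimal sketch of the requested output rule, in Python for illustration only: one dc:language element per language that is really in use, with "zxx" skipped. The function name dc_language_elements and its input (a list of language tags such as "en-GB") are assumptions made for this example; they are not LibreOffice code, and detecting which languages are actually in use is left out.

```python
from xml.sax.saxutils import escape


def dc_language_elements(languages_in_use):
    """Return one <dc:language> element string per distinct language tag.

    Hypothetical helper: `languages_in_use` is assumed to be a list of
    xml:lang-style tags (for example "en-GB"), in order of preference.
    """
    kept = []
    for tag in languages_in_use:
        # "zxx" means "[None]" in the Options dialog, so it is not output.
        if not tag or tag == "zxx":
            continue
        # Emit each language only once, keeping the first occurrence.
        if tag not in kept:
            kept.append(tag)
    return [f"<dc:language>{escape(tag)}</dc:language>" for tag in kept]


if __name__ == "__main__":
    for element in dc_language_elements(["en-GB", "zxx", "ar-EG", "en-GB"]):
        print(element)
```

With the sample input, only the en-GB and ar-EG elements are produced; the "zxx" entry and the repeated en-GB entry are dropped.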

The XSLT for the XHTML export filter assumes that a dc:language element is present, so Bug 39795 - Writer XHTML export loses language information [accessibility] - depends on this issue.
Comment 1 Björn Michaelsen 2011-12-23 12:21:28 UTC
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1

more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Comment 2 Christophe Strobbe 2012-01-27 04:16:53 UTC
Version info was previously LibO 3.3.1. I confirm that this bug still applies to LibreOffice 3.5.0 RC1: in newly created files (with Writer or Impress) meta.xml does not contain a dc:language element. Changing the status from NEEDINFO to NEW.
Comment 3 Julien Nabet 2014-02-23 21:59:16 UTC
Put it back to 3.3.1 since version must contain the oldest version of LO when the bug appeared.

For the record, I can still reproduce this with master sources (future 4.3.0) updated today.
Comment 4 Stéphane Guillou (stragu) 2021-05-29 13:47:26 UTC
Steps to reproduce:

1. Create a new Writer document, write some text in it
2. Save it in the default ODT format
3. Extract the contents of the ODT file (for example with "unzip file.odt" in a Bash shell)
4. Open "meta.xml"
5. Search for the string "dc:language"
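
The same check can be scripted instead of unzipping by hand. The following is a rough sketch in Python, not part of the original report: document_languages and sample_package are hypothetical helper names, and sample_package only builds a stripped-down in-memory package for demonstration. In practice you would pass the path of a file saved by LibreOffice to document_languages.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs bound to the "dc" and "office" prefixes in ODF meta.xml.
DC_NS = "http://purl.org/dc/elements/1.1/"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def document_languages(odf_file):
    """Return the text of every dc:language element in the package's meta.xml.

    `odf_file` may be a path or a file-like object; an empty list means the
    element is missing, which is the behaviour described in this bug.
    """
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    return [element.text for element in root.iter(f"{{{DC_NS}}}language")]


def sample_package(meta_body):
    """Build a minimal in-memory zip holding only meta.xml (demo helper)."""
    meta_xml = (
        f'<office:document-meta xmlns:office="{OFFICE_NS}" xmlns:dc="{DC_NS}">'
        f"<office:meta>{meta_body}</office:meta></office:document-meta>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("meta.xml", meta_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Without the element, as in the files described above: prints []
    print(document_languages(sample_package("<dc:title>Test</dc:title>")))
    # With the element present: prints ['en-AU']
    print(document_languages(sample_package("<dc:language>en-AU</dc:language>")))
```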

Reproducible with the following versions, which save the file in ODF 1.3:

Version: 7.2.0.0.alpha1+ / LibreOffice Community
Build ID: e2970060121824650f95421d8d2411840a40311f
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-28_08:38:25
Calc: threaded

Version: 7.1.3.2 / LibreOffice Community
Build ID: 47f78053abe362b9384784d31a6e56f8511eb1c1
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Version: 7.0.4.2
Build ID: 00(Build:2)
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Ubuntu package version: 1:7.0.4_rc2-0ubuntu0.18.04.2
Calc: threaded
Comment 5 Eyal Rozenberg 2022-11-25 11:32:21 UTC
The ODF 1.2 spec says that 

> The <dc:language> element specifies the default language of a document.

But where is it defined what the "default language" of a document means? 

Also, what does "dc" stand for? The ODF spec doesn't seem to offer an expansion of this acronym.
Comment 6 Christophe Strobbe 2022-11-25 14:30:26 UTC
(In reply to Eyal Rozenberg from comment #5)
> The ODF 1.2 spec says that 
> 
> > The <dc:language> element specifies the default language of a document.
> 
> But where is it defined what the "default language" of a document means? 

That is understood. It refers to the document's main language. Most documents have just one main language, possibly with passages or quotes in other languages. "Default language" corresponds with the language you define in the html element in web pages (e.g. <html lang="en">).
This language is used by speech synthesisers (including those in screen readers used by blind people and in reading software used by dyslexics) and for transformation into Braille (i.e. for selecting the correct Braille table), for example.

> 
> Also, what does "dc" stand for? The ODF spec doesn't seem to offer an
> expansion of this acronym.

DC is a namespace that is typically used for metadata defined by the Dublin Core Metadata Initiative or DCMI. See for example the DCMI Metadata Terms: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/