Bug 20275

Summary: Some msword documents not detected without .doc extension
Product: shared-mime-info Reporter: Milan Bouchet-Valat <nalimilan>
Component: freedesktop.org.xml Assignee: Jonathan Blandford <jrb>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments: Good testcase
Bad testcase

Description Milan Bouchet-Valat 2009-02-23 12:31:53 UTC
If you remove the .doc extension from the filename of some application/msword documents, they are recognized as application/x-msi files. Attached are two testcases: "good" is detected as expected, "bad" is not. I've noticed that files edited with OpenOffice.org are correctly recognized.

Both files begin with the same byte sequence, so the difference must lie in the other magic patterns. Word documents have a fairly rich set of patterns, which should be sufficient to detect them. Note that MSI files are also a subclass of application/x-ole-storage, which may be a hint.

I have not tested Excel and PowerPoint files, but they may suffer from the same issue. By the way, OpenXML documents without an extension are only detected as ZIP archives, but I'm not sure we can do anything about it...


For reference, the relevant parts of freedesktop.org.xml:

  <mime-type type="application/x-msi">
    <comment>Windows Installer package</comment>
    <sub-class-of type="application/x-ole-storage"/>
    <magic priority="50">
      <match value="\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x3E\x00\x03\x00\xFE\xFF\x09\x00\x06" type="string" offset="0"/>
    </magic>
    <glob pattern="*.msi"/>
  </mime-type>

  <mime-type type="application/msword">
    <comment>Word document</comment>
    <sub-class-of type="application/x-ole-storage"/>
    <generic-icon name="x-office-document"/>
    <magic priority="50">
      <match value="\x31\xbe\x00\x00" type="string" offset="0"/>
      <match value="PO^Q`" type="string" offset="0"/>
      <match value="\376\067\0\043" type="string" offset="0"/>
      <match value="\333\245-\0\0\0" type="string" offset="0"/>
      <match value="MSWordDoc" type="string" offset="2112"/>
      <match value="MSWordDoc" type="string" offset="2108"/>
      <match value="Microsoft Word document data" type="string" offset="2112"/>
    </magic>
    <glob pattern="*.doc"/>
    <alias type="application/vnd.ms-word"/>
    <alias type="application/x-msword"/>
  </mime-type>
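
To illustrate the suspected cause, here is a minimal Python sketch of "any rule matches" magic detection using the patterns quoted above. It is a simplified model, not the actual shared-mime-info matching code: it ignores priorities and subclass resolution, and the helper names (`matches`, `detect`, `synthetic`) and the 4096-byte synthetic buffers are assumptions made for illustration, not the attached testcases.

```python
# Simplified model of magic matching: a type matches if ANY of its
# (offset, pattern) rules matches. Priorities and sub-class-of handling
# are deliberately left out of this sketch.

# The application/x-msi rule quoted above: OLE signature, 16 zero bytes,
# then the version/byte-order fields.
MSI_RULES = [
    (0, bytes.fromhex("d0cf11e0a1b11ae1") + b"\x00" * 16
        + b"\x3e\x00\x03\x00\xfe\xff\x09\x00\x06"),
]

# The application/msword rules quoted above.
MSWORD_RULES = [
    (0, b"\x31\xbe\x00\x00"),
    (0, b"PO^Q`"),
    (0, b"\376\067\0\043"),
    (0, b"\333\245-\0\0\0"),
    (2112, b"MSWordDoc"),
    (2108, b"MSWordDoc"),
    (2112, b"Microsoft Word document data"),
]

TYPES = {
    "application/x-msi": MSI_RULES,
    "application/msword": MSWORD_RULES,
}


def matches(data, rules):
    """Return True if any (offset, pattern) rule matches the data."""
    return any(data[off:off + len(pat)] == pat for off, pat in rules)


def detect(data):
    """Return every type whose magic matches, in declaration order."""
    return [name for name, rules in TYPES.items() if matches(data, rules)]


def synthetic(extra=None):
    """Build a hypothetical 4096-byte buffer starting with the OLE header.

    `extra` is an optional (offset, pattern) pair written into the buffer.
    """
    buf = bytearray(4096)
    header = MSI_RULES[0][1]
    buf[:len(header)] = header
    if extra is not None:
        off, pat = extra
        buf[off:off + len(pat)] = pat
    return bytes(buf)


# A buffer with only the shared OLE header matches x-msi and nothing else.
print(detect(synthetic()))
# A buffer that also carries "MSWordDoc" at offset 2112 matches both types.
print(detect(synthetic((2112, b"MSWordDoc"))))
```

Under this model, a Word document that shares the OLE header but carries none of the Word patterns at the listed offsets can only match application/x-msi, which is consistent with the behaviour reported for the bad testcase.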
Comment 1 Milan Bouchet-Valat 2009-02-23 12:33:28 UTC
Created attachment 23230 [details]
Good testcase
Comment 2 Milan Bouchet-Valat 2009-02-23 12:34:13 UTC
Created attachment 23231 [details]
Bad testcase
Comment 3 Bastien Nocera 2009-02-23 16:25:36 UTC
Which version of shared-mime-info? That should be the first thing to mention.
Comment 4 Milan Bouchet-Valat 2009-02-24 00:35:55 UTC
Sorry, it is 0.51-0ubuntu1.
Comment 5 Bastien Nocera 2009-04-20 09:28:27 UTC
* freedesktop.org.xml.in:
* tests/list: Add another magic for Word documents, along with a test
case update (Closes: #20275)
