Bug 20275 - Some msword documents not detected without .doc extension
Summary: Some msword documents not detected without .doc extension
Status: RESOLVED FIXED
Alias: None
Product: shared-mime-info
Classification: Unclassified
Component: freedesktop.org.xml
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Jonathan Blandford
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-23 12:31 UTC by Milan Bouchet-Valat
Modified: 2009-04-20 09:28 UTC
0 users

See Also:


Attachments
Good testcase (8.00 KB, application/msword)
2009-02-23 12:33 UTC, Milan Bouchet-Valat
Bad testcase (36.00 KB, application/msword)
2009-02-23 12:34 UTC, Milan Bouchet-Valat

Description Milan Bouchet-Valat 2009-02-23 12:31:53 UTC
If you remove the .doc extension from the filename of some application/msword documents, they are recognized as application/x-msi files. Attached are two testcases: the "good" one is detected as expected, the "bad" one is not. I've noticed that files edited with OpenOffice.org are correctly recognized.

Both files have the same start sequence, so the difference must lie in the other magic numbers. Word documents have a fairly rich set of patterns, which should be sufficient to detect them. Note that MSI files are also a subclass of application/x-ole-storage, which may be a hint.

I have not tested Excel and PowerPoint files, but they may suffer from the same issue. By the way, OpenXML documents are detected only as ZIP archives when they have no extension, but I'm not sure we can do anything about that.


For reference, the relevant parts of freedesktop.org.xml:

  <mime-type type="application/x-msi">
    <comment>Windows Installer package</comment>
    <sub-class-of type="application/x-ole-storage"/>
    <magic priority="50">
      <match value="\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x3E\x00\x03\x00\xFE\xFF\x09\x00\x06" type="string" offset="0"/>
    </magic>
    <glob pattern="*.msi"/>
  </mime-type>

  <mime-type type="application/msword">
    <comment>Word document</comment>
    <sub-class-of type="application/x-ole-storage"/>
    <generic-icon name="x-office-document"/>
    <magic priority="50">
      <match value="\x31\xbe\x00\x00" type="string" offset="0"/>
      <match value="PO^Q`" type="string" offset="0"/>
      <match value="\376\067\0\043" type="string" offset="0"/>
      <match value="\333\245-\0\0\0" type="string" offset="0"/>
      <match value="MSWordDoc" type="string" offset="2112"/>
      <match value="MSWordDoc" type="string" offset="2108"/>
      <match value="Microsoft Word document data" type="string" offset="2112"/>
    </magic>
    <glob pattern="*.doc"/>
    <alias type="application/vnd.ms-word"/>
    <alias type="application/x-msword"/>
  </mime-type>
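
To see which of the rules quoted above a given file satisfies, the patterns can be checked by hand. The following is a minimal Python sketch, not the matching code used by shared-mime-info: the names `RULES`, `matching_types` and `matching_types_for_file` are made up for illustration, the octal escapes from the XML are rewritten as hex, and priorities are ignored since both blocks use priority 50.

```python
from typing import Dict, List, Tuple

# (offset, expected bytes), transcribed from the <match> elements quoted above.
Rule = Tuple[int, bytes]

OLE_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

RULES: Dict[str, List[Rule]] = {
    "application/x-msi": [
        # 8-byte signature, 16 zero bytes, then the 9 trailing bytes of the rule.
        (0, OLE_SIGNATURE + b"\x00" * 16 + bytes.fromhex("3e000300feff090006")),
    ],
    "application/msword": [
        (0, b"\x31\xbe\x00\x00"),
        (0, b"PO^Q`"),
        (0, b"\xfe\x37\x00\x23"),       # \376\067\0\043 in the XML (octal)
        (0, b"\xdb\xa5-\x00\x00\x00"),  # \333\245-\0\0\0 in the XML (octal)
        (2112, b"MSWordDoc"),
        (2108, b"MSWordDoc"),
        (2112, b"Microsoft Word document data"),
    ],
}


def matching_types(data: bytes) -> List[str]:
    """Return every MIME type with at least one rule matching `data`."""
    return [
        mime
        for mime, rules in RULES.items()
        if any(data[off:off + len(pat)] == pat for off, pat in rules)
    ]


def matching_types_for_file(path: str) -> List[str]:
    """Check the first 4 KiB of a file, enough to cover offset 2112."""
    with open(path, "rb") as fh:
        return matching_types(fh.read(4096))
```

Running `matching_types_for_file` on an extension-less copy of each attachment would show whether a file hits only the application/x-msi rule or also one of the application/msword rules. A file that satisfies none of the Word patterns but starts with the bytes of the x-msi rule would be reported as application/x-msi alone, which is consistent with the behaviour described here.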
Comment 1 Milan Bouchet-Valat 2009-02-23 12:33:28 UTC
Created attachment 23230
Good testcase
Comment 2 Milan Bouchet-Valat 2009-02-23 12:34:13 UTC
Created attachment 23231
Bad testcase
Comment 3 Bastien Nocera 2009-02-23 16:25:36 UTC
Which version of shared-mime-info? That should be the first thing to mention.
Comment 4 Milan Bouchet-Valat 2009-02-24 00:35:55 UTC
Sorry, it is 0.51-0ubuntu1.
Comment 5 Bastien Nocera 2009-04-20 09:28:27 UTC
* freedesktop.org.xml.in:
* tests/list: Add another magic for Word documents, along with a test
case update (Closes: #20275)

