Bug 82782

Summary: Two types that only differ in case cause the data from one to be overwritten
Product: shared-mime-info
Reporter: Jann Horn <jann+freedesktop_bugzilla>
Component: general
Assignee: Shared Mime Info group <shared_mime_info>
Status: RESOLVED NOTABUG
Severity: normal
Priority: medium
CC: iplaw67
Version: unspecified
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=88004
Bug Blocks: 82711

Description Jann Horn 2014-08-18 18:38:47 UTC
On my system (Debian jessie), the two MIME types 'application/vnd.ms-powerpoint.presentation.macroenabled.12' and 'application/vnd.ms-powerpoint.presentation.macroEnabled.12' that only differ in case exist, along with a bunch of other types like that:

$ grep -iRF application/vnd.ms-powerpoint.presentation.macroenabled.12 /usr/share/mime
/usr/share/mime/generic-icons:application/vnd.ms-powerpoint.presentation.macroEnabled.12:x-office-presentation
/usr/share/mime/packages/libreoffice.xml:  <mime-type type="application/vnd.ms-powerpoint.presentation.macroenabled.12">
/usr/share/mime/packages/freedesktop.org.xml:  <mime-type type="application/vnd.ms-powerpoint.presentation.macroEnabled.12">
/usr/share/mime/types:application/vnd.ms-powerpoint.presentation.macroEnabled.12
/usr/share/mime/types:application/vnd.ms-powerpoint.presentation.macroenabled.12
/usr/share/mime/subclasses:application/vnd.ms-powerpoint.presentation.macroEnabled.12 application/vnd.openxmlformats-officedocument.presentationml.presentation
/usr/share/mime/globs2:50:application/vnd.ms-powerpoint.presentation.macroEnabled.12:*.pptm
/usr/share/mime/globs2:50:application/vnd.ms-powerpoint.presentation.macroenabled.12:*.pptm
Binary file /usr/share/mime/mime.cache matches
/usr/share/mime/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml:<mime-type xmlns="http://www.freedesktop.org/standards/shared-mime-info" type="application/vnd.ms-powerpoint.presentation.macroEnabled.12">
/usr/share/mime/globs:application/vnd.ms-powerpoint.presentation.macroEnabled.12:*.pptm
/usr/share/mime/globs:application/vnd.ms-powerpoint.presentation.macroenabled.12:*.pptm

update-mime-database stores type names in the case-sensitive hashtable "types". However, before using them as filenames, it forces them to lowercase. The result is this:

$ strace -f update-mime-database ./mime_copy/ 2>&1 | grep -Fi -A4 application/vnd.ms-powerpoint.presentation.macroenabled.12
open("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd94ec49000
write(3, "<?xml version=\"1.0\" encoding=\"ut"..., 4096) = 4096
write(3, "ment>\n  <comment xml:lang=\"nb\">M"..., 3090) = 3090
--
open("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", O_RDWR) = 3
fdatasync(3)                            = 0
close(3)                                = 0
rename("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", "./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml") = 0
mkdir("./mime_copy/application", 0755)  = -1 EEXIST (File exists)
open("./mime_copy/application/x-krita.xml.new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd94ec49000
--
open("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd94ec49000
write(3, "<?xml version=\"1.0\" encoding=\"ut"..., 2917) = 2917
close(3)                                = 0
--
open("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", O_RDWR) = 3
fdatasync(3)                            = 0
close(3)                                = 0
rename("./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml.new", "./mime_copy/application/vnd.ms-powerpoint.presentation.macroenabled.12.xml") = 0
mkdir("./mime_copy/application", 0755)  = -1 EEXIST (File exists)
open("./mime_copy/application/x-applix-word.xml.new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd94ec49000

As you can see, the same file is created, written, synced and renamed twice. This means that the first version (7186 bytes, the 4096 + 3090 written above) is immediately overwritten by the second one (2917 bytes). Here are the two files: https://gist.github.com/thejh/46f8e6621a7f51b0a484
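
The collision described above can be modelled in a few lines of C. This is only a sketch, not the code from update-mime-database: the helper name type_to_filename and its buffer handling are hypothetical, and it assumes ASCII-only type names.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "<type>.xml" from a type name, lowercasing
 * the name first, as the report says happens before filenames are used. */
void type_to_filename(const char *type, char *out, size_t out_size)
{
    size_t i = 0;

    /* Leave room for ".xml" plus the terminating NUL (5 bytes). */
    for (; type[i] != '\0' && i + 5 < out_size; i++)
        out[i] = (char)tolower((unsigned char)type[i]);
    snprintf(out + i, out_size - i, ".xml");
}

/* Returns 1 when two names that are distinct as case-sensitive keys
 * would nevertheless be written to the same output file. */
int same_output_file(const char *a, const char *b)
{
    char fa[256], fb[256];

    type_to_filename(a, fa, sizeof fa);
    type_to_filename(b, fb, sizeof fb);
    return strcmp(a, b) != 0 && strcmp(fa, fb) == 0;
}
```

Calling same_output_file with the two spellings from the grep output returns 1: they are separate keys in a case-sensitive table, yet both map to the single path seen in the strace log.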

I'm not very familiar with the whole MIME system, but that's a bug, isn't it?
Comment 1 Jann Horn 2014-08-18 19:01:32 UTC
If types that only differ in case should be treated the same, what would be the best way to fix it? Create functions g_str_equal_nocase and g_str_hash_nocase or so that call g_str_equal and g_str_hash with the argument lowercased, then pass pointers to those functions to g_hash_table_new?
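
One way to read the suggestion above, sketched in plain C without GLib so that it compiles on its own. The names str_hash_nocase and str_equal_nocase are hypothetical, the djb2-style hash is chosen only for illustration, and the signatures merely mirror the shape of the hash and equality callbacks that g_hash_table_new accepts (a key pointer in, an unsigned hash or a boolean out). Only ASCII case is assumed to matter.

```c
#include <ctype.h>

/* Hypothetical hash callback: hashes a NUL-terminated key while
 * ignoring case, so "macroEnabled" and "macroenabled" hash alike. */
unsigned int str_hash_nocase(const void *key)
{
    const unsigned char *p = key;
    unsigned int h = 5381;

    for (; *p != '\0'; p++)
        h = h * 33u + (unsigned int)tolower(*p);
    return h;
}

/* Hypothetical equality callback: returns 1 when both keys are the
 * same string after lowercasing, 0 otherwise. */
int str_equal_nocase(const void *a, const void *b)
{
    const unsigned char *x = a, *y = b;

    while (*x != '\0' && tolower(*x) == tolower(*y)) {
        x++;
        y++;
    }
    /* Equal only if both strings ended at the same position. */
    return tolower(*x) == tolower(*y);
}
```

In a GLib program, an alternative that should have the same effect is to lowercase each key with g_ascii_strdown before inserting it and keep the stock g_str_hash and g_str_equal callbacks. Which of the two fits update-mime-database better is not something this sketch settles.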
Comment 2 Jann Horn 2014-09-02 12:55:33 UTC
I would write a patch, but I'm not sure what the expected behavior is here. Can some developer please comment on that?
Comment 3 Alex Thurgood 2015-01-03 17:38:06 UTC
Adding self to CC if not already on
Comment 4 Bastien Nocera 2015-01-28 11:19:47 UTC
That's not a bug. Mime-types are case-insensitive. See:
https://bugs.freedesktop.org/show_bug.cgi?id=62473
for details.

> On my system (Debian jessie), the two MIME types
> 'application/vnd.ms-powerpoint.presentation.macroenabled.12' and
> 'application/vnd.ms-powerpoint.presentation.macroEnabled.12' that
> only differ in case exist

They're not two different mime-types; it's a duplicated mime-type.
Comment 5 Jann Horn 2015-01-28 12:54:16 UTC
So what exactly is your position on duplicate mimetypes?
"They must not exist and if they do, that invokes undefined behavior"?
"They must not exist and if they do, one copy wins and the others are discarded silently?"

In my opinion, if a tool can't cope with its input, it should throw an error message, or at least a warning, instead of blindly soldiering on.

> That's not a bug. Mime-types are case-insensitive.

And what I complained about here specifically is that the hashtable "types" is case-sensitive (while filenames are lowercased). If MIME types are case-insensitive, shouldn't the hashtable be case-insensitive, too?
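
The kind of diagnostic asked for above could look roughly like this. It is a sketch under assumptions: differs_only_in_case and warn_case_duplicates are hypothetical names, the warning text is invented, and the quadratic scan is only meant to show the check, not how update-mime-database would implement it.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 when a and b are equal ignoring case but not byte-identical. */
int differs_only_in_case(const char *a, const char *b)
{
    int identical = 1;

    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (*a != *b) {
            if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
                return 0;   /* a real difference, not just case */
            identical = 0;
        }
    }
    /* Both strings must end together, and at least one byte differed. */
    return *a == *b && !identical;
}

/* Prints one warning per pair of names that differ only in case and
 * returns the number of such pairs. */
size_t warn_case_duplicates(const char *const *types, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (differs_only_in_case(types[i], types[j])) {
                fprintf(stderr, "warning: '%s' and '%s' differ only in case\n",
                        types[i], types[j]);
                count++;
            }
        }
    }
    return count;
}
```

Run over the names in the types file, a check like this would flag the macroEnabled/macroenabled pair before either copy is written out.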
Comment 6 Jann Horn 2015-01-28 12:57:52 UTC
I opened this bug because it blocks the patch in bug #82711. The problem is that update-mime-database writes a new file and then immediately overwrites that same file. Surely that's not desirable behavior?
Comment 7 Bastien Nocera 2015-01-28 16:05:11 UTC
(In reply to Jann Horn from comment #5)
> So what exactly is your position on duplicate mimetypes?
> "They must not exist and if they do, that invokes undefined behavior"?
> "They must not exist and if they do, one copy wins and the others are
> discarded silently?"
> 
> In my opinion, if a tool can't cope with its input, it should throw an error
> message, or at least a warning, instead of blindly soldiering on.

My "position" is the status quo. It's not mentioned in the spec, and the behaviour is undefined. In our case, we don't merge, we override. So, yes, the others would be discarded silently.

> > That's not a bug. Mime-types are case-insensitive.
> 
> And what I complained about here specifically is that the hashtable "types"
> is case-sensitive (while filenames are lowercased). If MIME types are
> case-insensitive, shouldn't the hashtable be case-insensitive, too?

Feel free to submit a test case that would fail before that change, and wouldn't afterwards.

If you want a change of behaviour, then it must first be defined, and for that, discussions happen on the xdg mailing-list.
