1) It's not clear what "host32" is supposed to mean. I thought it meant "this integer dumped as binary in host format", i.e. you read four bytes from the file into an int32 and you get the integer written in the XML file; but that's not correct. If you have type="host16" value="0xABCD", it means the file should contain the two consecutive bytes 0xAB and 0xCD, i.e. 0xABCD is actually what you have in the file as an integer in BE format. For example, a jar file matches 0xCAFE, and 0xCAFE is what you see in a hex editor. If you read the first two bytes of a jar file as an int16, you get 0xFECA (on an LE machine).

2) update-mime-database ignores the prefix of "foo16" and "foo32", i.e. it treats host16 and big16 in the same way.

All in all, update-mime-database and xdgmime work together correctly on i386 (except that xdgmime doesn't *if* you do define LITTLE_ENDIAN in xdgmimemagic.c on a little-endian machine; xdgmimecache works right). Maybe they don't on BE machines, or maybe they do. Given that all the world is Intel, the spec should be adapted to what update-mime-database does, and then everything will be fine.

Also note that with the current freedesktop.org.xml you can only reproduce the problem if you turn off the cache, #define LITTLE_ENDIAN in xdgmimemagic.c, and take a java class file without an extension. I am also attaching a few files which have magic in BE and LE format.
Created attachment 8314: test

Attached are a few files with magic inside, and test.xml with their mime types. test16-BE and test32-BE are the files which should be detected as text/x-test-mime-N; testXX-native should not. If it's the other way around, there's a bug.
I am only someone trying to implement the spec (in kde, so I don't know the gnome implementation you are referring to), but I think I can bring my own interpretation here. host32 means "this integer is to be interpreted in this host's byte order". host means native. test16-BE should _not_ match on little-endian hosts, and test16-native _should_ match, since host means native.

If big endian should match and little endian shouldn't match, then the xml snippet should say big32, not host32.

All this being said... there might indeed be a problem with the jar file magic. These files start with \xCA \xFE, i.e. 0xFECA as little-endian. This means type="host16" value="0xcafe" is indeed wrong... It was a bug in an old version of the file(1) magic file, which said "short 0xcafe". It has been fixed in more recent versions to say "belong 0xcafebabe", which looks much more correct to me. It means \xCA \xFE \xBA \xBE on any host, which is correct.
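To make the "host means native" reading concrete, here is a minimal Python sketch (not taken from xdgmime or update-mime-database; the variable names are made up for illustration). It reads the two bytes \xCA \xFE as a 16-bit integer under each byte order, using the standard `struct` module.

```python
import struct
import sys

# First two bytes of the file, exactly as a hex editor shows them.
data = bytes([0xCA, 0xFE])

as_big = struct.unpack(">H", data)[0]     # interpreted as big-endian
as_little = struct.unpack("<H", data)[0]  # interpreted as little-endian
as_host = struct.unpack("=H", data)[0]    # interpreted in this host's byte order

print("big16   :", hex(as_big))
print("little16:", hex(as_little))
print("host16  :", hex(as_host), "on a", sys.byteorder, "endian host")
```

Under this reading, a rule with type="host16" value="0xcafe" would only match these bytes on a big-endian host, whereas a big16 (or "belong 0xcafebabe" style) rule matches on any host.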
(In reply to comment #2)
> I am only someone trying to implement the spec (in kde, so I don't know the
> gnome implementation you are referring to),

It's xdgmime, the implementation used in GTK and Gnome. It lives in CVS here, mime/xdgmime/. But to see a version with bug fixes, use http://svn.gnome.org/viewcvs/gtk%2B/trunk/gtk/xdgmime/ .

> but I think I can bring my own
> interpretation here. host32 means "this integer are to be interpreted in this
> host's order". host means native. test16-BE should _not_ match, on
> little-endian hosts, and test16-native _should_ match, since host means native.
>
> If big endian should match and little endian shouldn't match, then the xml
> snippet should say big32, not host32.

This is also how I understand "big", "host", and "little", and I'd think it's the only sensible interpretation. The problem is that it's not quite what xdgmime and update-mime-database do. I guess that should just be fixed, and the xml file should be fixed too. I said the spec should be fixed because of problems with xdgmime (it's not released as a library, so everybody uses a kind-of-private branch), so I simply said whatever junk I had in mind, not because there are any sound reasons for it.
I based my observations, among other things, on the data I saw in the magic file generated by update-mime-database. From my tests (on a little-endian machine only) I don't see a bug in update-mime-database related to endianness. I do see a bug in the jar magic though, which needs to be fixed. Are you sure there's a bug in update-mime-database? Can you explain which one exactly?
(In reply to comment #4)
> Are you sure there's a bug in update-mime-database? Can you explain which one
> exactly?

The relevant places are match_word_size() and parse_value() in http://webcvs.freedesktop.org/mime/shared-mime-info/update-mime-database.c?revision=1.41&view=markup

parse_value() treats hostXX and bigXX in the same way.

By the way, what I said in comment #1 is wrong, it's all backwards. The text/x-test-mime-2 type has <match value="0x1234" type="host16" offset="0"/>, i.e. it should match the bytes 0x34 0x12 on a little-endian machine. But xdgmime matches 0x12 0x34 (on a little-endian machine, no Sun here). I am not sure whose bug it is: maybe xdgmime, maybe update-mime-database, maybe both. It looks like it's update-mime-database's fault.
> parse_value() treats hostXX and bigXX in the same way.

Yes, but match_word_size() doesn't, and this is why there is no bug in update-mime-database, which overall treats those two differently: host16 gives ">0=\0\x12\x34~2" in the generated magic file, while big16 gives ">0=\0\x12\x34" (and little16 gives ">0=\0\x34\x12").

When no word size (~2) is given, the data in the generated magic file is matched byte by byte against the data in the file, so big16 and little16 generate correct output. When a word size is given (as happens when using host16), the data from the generated magic file is byte-swapped on little-endian hosts, so we end up with \x34\x12, which is correct as well.

I added all the above cases to my unit tests, and I can say that update-mime-database behaves just as I expect, now that I understand how it's supposed to work [the spec could certainly be much more verbose about this].

You said:
> xdgmime matches 0x12 0x34

Then this is a bug in xdgmime, if you're sure. Its code looks correct though, so I guess the bug is simply that LITTLE_ENDIAN is not being defined?
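The matching rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the xdgmime code: `swap_words` and `magic_matches` are hypothetical names, the `byteorder` parameter exists only so both host types can be simulated, and masks, ranges and nested rules are ignored.

```python
import sys


def swap_words(value: bytes, word_size: int) -> bytes:
    """Reverse the bytes inside each word_size-byte word of value."""
    return b"".join(
        value[i:i + word_size][::-1] for i in range(0, len(value), word_size)
    )


def magic_matches(file_data: bytes, offset: int, value: bytes,
                  word_size: int = 1, byteorder: str = sys.byteorder) -> bool:
    """Compare a magic value against file_data at offset.

    The value is stored big-endian in the magic file. If a word size
    greater than 1 is given (the "~2" of a host16 rule), the value is
    swapped on little-endian hosts before the byte-by-byte comparison.
    """
    if word_size > 1 and byteorder == "little":
        value = swap_words(value, word_size)
    return file_data[offset:offset + len(value)] == value


# host16 value="0x1234" is written as \x12\x34 with word size 2.
print(magic_matches(b"\x34\x12", 0, b"\x12\x34", word_size=2, byteorder="little"))
print(magic_matches(b"\x12\x34", 0, b"\x12\x34", word_size=2, byteorder="big"))
# big16 value="0x1234" is written as \x12\x34 with no word size.
print(magic_matches(b"\x12\x34", 0, b"\x12\x34", byteorder="little"))
```

All three calls print True. A host16 rule compared without the swap on a little-endian host would behave like big16, which is the symptom described earlier in this thread.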
Created attachment 9206: Extract from the unit tests I wrote for kde
(In reply to comment #6)
> > parse_value() treats hostXX and bigXX in the same way.
> Yes, but match_word_size doesn't, and this is why this is no bug in
> update-mime-database, which overall treats those two differently:
> host16 gives ">0=\0\x12\x34~2" in the generated magic file, while big16 gives
> ">0=\0\x12\x34" (and little16 gives ">0=\0\x34\x12").

Thanks for the explanation!

> You said:
> > xdgmime matches 0x12 0x34
> Then this is a bug in xdg mime, if you're sure. Its code looks correct though,
> so I guess the bug is simply that LITTLE_ENDIAN is not being defined?

In the case of the magic file, it looks so, yes. But the bug I see here is with the cache, and the code looks like it compares whatever is written in the cache file byte by byte with the data from the file, i.e. it treats hostXX entries as bigXX. I may be wrong again though (or my copy of xdgmime may not be in sync with what you see). In any case, it looks like there is indeed no problem with update-mime-database, so this one is NOTABUG.
Ah, when I was talking about the xdgmime code, I wasn't talking about the code that deals with the cache. If you see a bug there, better report it indeed. Anyway. I'll write a patch for the bug in the jar magic and I'll create a bug report for it.
Sorry, I meant application/x-java, not jar. jar is correct. https://bugs.freedesktop.org/show_bug.cgi?id=10334