Created attachment 122122 [details] example which produces wrong literal count It seems like update-mime-database writes the count of different literals to mime.cache, while it should write its number in conjunction with mime types (LiteralEntry in spec) E.g. if many mime types have the same literal pattern, they will be counted as one. See example in the attachment. It has two mime types. text/x-cmake has glob-deleteall and CMakeLists.txt, hence two literal patterns. text/x-dsrc has glob-deletall, hence one literal pattern. So 2 different literal patterns in total, but these must be 3 literal entries: text/x-cmake __NOGLOBS__ text/x-dsrc __NOGLOBS__ text/x-cmake CMakeLists.txt literal count must be 3 upon updating. Start update-mime-database with -V flag (verbose). It will show in which position literal globs were written. Open mime.cache in hex editor, go to this position, it will have 00 00 00 02 value.
Can anyone look at it? This is serious bug for me.
Can you describe what misbehavior this causes ? E.g. do you have a program that crashes due to this ?
(In reply to Matthias Clasen from comment #2) > Can you describe what misbehavior this causes ? E.g. do you have a program > that crashes due to this ? I have a library (my own implementation of shared mime database spec) that detects MIME types via mime.cache files. For now it can't read literal patterns properly because update-mime-database writes wrong number of literals into generated mime.cache, hence it's unable to detect MIME type properly in some cases. I found this bug by writing tests for my implementation of spec and using generated mime.cache as test input. I hope you understood the problem. If not let's look at how literal entries are defined in spec: LiteralList: 4 CARD32 N_LITERALS 12*N_LITERALS LiteralEntry LiteralEntry: 4 CARD32 LITERAL_OFFSET 4 CARD32 MIME_TYPE_OFFSET 4 CARD32 WEIGHT in lower 8 bits FLAGS in rest: 0x100 = case-sensitive Implementations should read N_LITERALS and then this number of LiteralEntry. update-mime-database writes all existing pairs of literal/mimetype, but since it writes the wrong N_LITERALS, it's impossible to find out the real count of literal and read them all. It's hard to find other affected program, because most don't respect glob-deleteall at all or detect types using other files (e.g. globs2 or MEDIA/TYPE.xml).
(In reply to Matthias Clasen from comment #2) > Can you describe what misbehavior this causes ? E.g. do you have a program > that crashes due to this ? Ok, I found at least one affected programs. It's dolphin for KDE4 and pcmanfm. Having source file: <?xml version="1.0" encoding="UTF-8"?> <mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info"> <mime-type type="application/x-mymakefile"> <comment>My make file</comment> <glob pattern="myfile"/> <glob pattern="mymakefile"/> <icon name="image-png"/> <sub-class-of type="text/plain"/> </mime-type> <mime-type type="application/x-mycfile"> <comment>My C file</comment> <glob pattern="myfile"/> <glob pattern="mycfile"/> <icon name="image-png"/> <sub-class-of type="text/plain"/> </mime-type> </mime-info> After update-mime-database generated mime.cache, it has 00 00 00 03 value in literal offset, but actually lists all 4 combinations: mycfile:application/x-mycfile myfile:application/x-mymakefile myfile:application/x-mycfile mymakefile:application/x-mymakefile But pcmanfm and dolphin reads only the first three because of wrong literal count and fails to detect type for mymakefile.
Nemo and Nautilous are affected too.
Not saying that there is no bug, but I will say that it is likely intentional that we don't store identical strings multiple times. I'll have to reread the spec and the implementation more carefully.
(In reply to Matthias Clasen from comment #6) > Not saying that there is no bug, but I will say that it is likely > intentional that we don't store identical strings multiple times. I'll have > to reread the spec and the implementation more carefully. Strings are not stored multiple times. Offsets are. As they should. So nothing wrong with that. In provided example I just followed offsets in hex editor to reproduce literal/mimetype relations and wrote them in human-friendly way. The only problem is N_LITERALS which is written wrong by update-mime-database. It looks like it writes the number of different literal patterns instead of the number of combinations.
Created attachment 125012 [details] [review] Write the correct length for literal and glob lists to the cache As pointed out in bugzilla, we were not counting 'duplicates' properly here.
Here is an untested patch; please let me know if it fixes the problem for you
(In reply to Matthias Clasen from comment #9) > Here is an untested patch; please let me know if it fixes the problem for you I applied patch to shared-mime-info-1.6/update-mime-database.c, recompiled, then tried to run it on files from my first attachment, it gave me error: $ export XDG_DATA_HOME="$(pwd)" $ ./update-mime-database $XDG_DATA_HOME/mime ** GLib:ERROR:/build/glib2.0-dCKQ11/glib2.0-2.44.1/./glib/ghash.c:373:g_hash_table_lookup_node: assertion failed: (hash_table->ref_count > 0) Aborted
(In reply to Matthias Clasen from comment #9) > Here is an untested patch; please let me know if it fixes the problem for you Looks like you forgot assign map to count_data as data. After adding this line count_data.data = map; patch seems to work.
(In reply to Matthias Clasen from comment #9) > Here is an untested patch; please let me know if it fixes the problem for you This breaks other things though. I suppose it's because you assumed that every entry type consists of 3 items, while e.g. alias entries have only 2.
Created attachment 125115 [details] [review] Fix for literal/glob patterns counting with no breaking other code I made it work by ugly hack. Specifically I check if get_value is equal to get_glob_list_value and only then count items using CoundData.
Created attachment 125238 [details] [review] upated patch We can use the 'weighted' parameter to determine whether individual list entries are 3 words or 2 words each.
(In reply to Matthias Clasen from comment #14) > Created attachment 125238 [details] [review] [review] > upated patch > > We can use the 'weighted' parameter to determine whether individual list > entries are 3 words or 2 words each. Works for me. All my tests are passed with this patch.
(In reply to Matthias Clasen from comment #14) > Created attachment 125238 [details] [review] [review] > upated patch > > We can use the 'weighted' parameter to determine whether individual list > entries are 3 words or 2 words each. Does shared-mime-info 1.7 include this fix? I don't see it. Will it be included in the next release?
(In reply to Roman from comment #16) > (In reply to Matthias Clasen from comment #14) > > Created attachment 125238 [details] [review] [review] [review] > > upated patch > > > > We can use the 'weighted' parameter to determine whether individual list > > entries are 3 words or 2 words each. > > Does shared-mime-info 1.7 include this fix? I don't see it. It's not, otherwise the bug would have been marked as fixed. > Will it be included in the next release? It will, I'm committing it now.
Attachment 125238 [details] pushed as 0907448 - Write the correct length for literal and glob lists to the cache
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.