Bug 97372 - Tar archive detected by content as text/plain
Status: RESOLVED FIXED
Alias: None
Product: xdgmime
Classification: Unclassified
Component: xdgmime
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Jonathan Blandford
Reported: 2016-08-16 20:00 UTC by Elvis Angelaccio
Modified: 2018-05-30 09:56 UTC (History)

Attachments
Tar archive misdetect as text/plain (244.14 KB, application/octet-stream)
2016-08-16 20:00 UTC, Elvis Angelaccio

Description Elvis Angelaccio 2016-08-16 20:00:36 UTC
Created attachment 125826
Tar archive misdetect as text/plain

Not sure if this is a duplicate of #93549 or #96660

Steps to reproduce:

1. Download the attached file
2. xdg-mime query filetype MP190_debian_printer

Actual result:
text/plain

Expected result:
application/x-tar

shared-mime-info version: 1.6

Downstream bug report: https://bugs.kde.org/show_bug.cgi?id=366899
Comment 1 Bastien Nocera 2017-09-05 11:59:09 UTC
MP190_debian_printer.bin:
	name: application/octet-stream
	data: text/plain
	file: application/octet-stream

xdgmime only considers the first 32 bytes of data from the file:
  for (i = 0; i < 32 && i < len; ++i)
    {
      if (chardata[i] < 32 && chardata[i] != 9 && chardata[i] != 10 && chardata[i] != 13)
        return XDG_MIME_TYPE_UNKNOWN; /* binary data */
    }

  return XDG_MIME_TYPE_TEXTPLAIN;

If none of those bytes is a control character (other than tab, LF or CR), then it's text/plain.
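
As a rough illustration of why the 32-byte window matters, here is a minimal sketch of the same check with the window size passed in as a parameter. The names sniff_result and sniff_text_or_binary and the limit argument are hypothetical and are not part of xdgmime. It assumes the input is a tar header whose 100-byte name field holds a file name longer than 32 characters followed by NUL padding.

```c
#include <stddef.h>

/* Hypothetical result type; xdgmime itself returns
 * XDG_MIME_TYPE_UNKNOWN or XDG_MIME_TYPE_TEXTPLAIN. */
enum sniff_result { SNIFF_BINARY, SNIFF_TEXT };

/* Same heuristic as the loop quoted above, but with the number of
 * inspected bytes given by the caller ("limit" is an assumption made
 * for illustration, not an xdgmime parameter). */
static enum sniff_result
sniff_text_or_binary (const unsigned char *data, size_t len, size_t limit)
{
  size_t i;

  for (i = 0; i < limit && i < len; ++i)
    {
      /* Control characters other than tab (9), LF (10) and CR (13)
       * are taken as a sign of binary data. */
      if (data[i] < 32 && data[i] != 9 && data[i] != 10 && data[i] != 13)
        return SNIFF_BINARY;
    }

  return SNIFF_TEXT;
}
```

With a 40-character file name at the start of the buffer, a limit of 32 never reaches the NUL padding and reports text, while a larger limit such as 128 reaches it and reports binary.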

"file" detects it as a tar archive, but its magic file says:
# pre-POSIX "tar" archives are handled in the C code.

So we'd need to go look into the file code, and see if it can be adapted.
Comment 2 Bastien Nocera 2017-09-05 12:19:09 UTC
The file code checks for whether something is a tarball by calculating the checksum and checking whether it matches:
https://github.com/file/file/blob/master/src/is_tar.c#L85
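
For readers unfamiliar with that check, the following is a simplified sketch of a tar header checksum test. It is not the code from is_tar.c. It assumes the standard 512-byte header layout, where the checksum is stored as octal ASCII at offset 148 in an 8-byte field and is computed over the whole header with that field counted as spaces. The function and macro names are hypothetical, and only the unsigned byte sum is handled.

```c
#include <stddef.h>

#define TAR_BLOCK_SIZE 512
#define TAR_CHKSUM_OFF 148
#define TAR_CHKSUM_LEN 8

/* Parse an octal ASCII field, skipping leading spaces.
 * Returns -1 if the field holds no octal digits. */
static long
parse_octal (const unsigned char *field, size_t len)
{
  long value = 0;
  size_t i = 0;
  int digits = 0;

  while (i < len && field[i] == ' ')
    ++i;
  for (; i < len && field[i] >= '0' && field[i] <= '7'; ++i)
    {
      value = value * 8 + (field[i] - '0');
      ++digits;
    }

  return digits > 0 ? value : -1;
}

/* Return 1 if the first 512 bytes carry a matching header checksum. */
static int
looks_like_tar_header (const unsigned char *block, size_t len)
{
  long stored, sum = 0;
  size_t i;

  if (len < TAR_BLOCK_SIZE)
    return 0;

  stored = parse_octal (block + TAR_CHKSUM_OFF, TAR_CHKSUM_LEN);
  if (stored < 0)
    return 0;

  for (i = 0; i < TAR_BLOCK_SIZE; ++i)
    {
      /* The checksum field itself is summed as if it were all spaces. */
      if (i >= TAR_CHKSUM_OFF && i < TAR_CHKSUM_OFF + TAR_CHKSUM_LEN)
        sum += ' ';
      else
        sum += block[i];
    }

  return sum == stored;
}
```

A check like this needs arithmetic over the header bytes, which is more than a byte-pattern match at fixed offsets can express.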

We can't do that in shared-mime-info. I've bumped the limit for checking for printable characters in xdgmime:

commit 9c5802b8da56187c5c6abaf70042d14b12d832a9
Author: Bastien Nocera <hadess@hadess.net>
Date:   Tue Sep 5 14:14:53 2017 +0200

    Check further into the file whether it is text or binary
    
    We were only checking 32 bytes into the file, which might not be enough
    for some tar archives with long filenames.
    
    https://bugs.freedesktop.org/show_bug.cgi?id=97372

Anything further would require changes in the implementation of xdgmime that you're using (I'm guessing the one in Qt).
Comment 3 Elvis Angelaccio 2018-02-24 16:17:08 UTC
Hi Bastien, are you sure you actually pushed this commit?

$ git show 9c5802b8da56187c5c6abaf70042d14b12d832a9
fatal: bad object 9c5802b8da56187c5c6abaf70042d14b12d832a9

and also cgit doesn't find it: https://cgit.freedesktop.org/xdg/shared-mime-info/commit/?id=9c5802b8da56187c5c6abaf70042d14b12d832a9
Comment 4 Elvis Angelaccio 2018-04-22 16:30:37 UTC
@Bastien: ping?
Comment 5 Bastien Nocera 2018-04-22 18:07:51 UTC
(In reply to Elvis Angelaccio from comment #3)
> Hi Bastien, are you sure you actually pushed this commit?
> 
> $ git show 9c5802b8da56187c5c6abaf70042d14b12d832a9
> fatal: bad object 9c5802b8da56187c5c6abaf70042d14b12d832a9
> 
> and also cgit doesn't find it:
> https://cgit.freedesktop.org/xdg/shared-mime-info/commit/?id=9c5802b8da56187c5c6abaf70042d14b12d832a9

Comment 2 says:
"I've bumped the limit for checking for printable characters in xdgmime"

So the URL would be:
https://cgit.freedesktop.org/xdg/xdgmime/commit/?id=9c5802b8da56187c5c6abaf70042d14b12d832a9

If this is still a problem, then it will be in the individual shared MIME spec implementations. The above fixes the problem for the code used in shared-mime-info's test cases.
Comment 6 Elvis Angelaccio 2018-04-25 12:19:05 UTC
@Bastien: thanks for the reply. This is still a problem, so we will probably need to fix it in the Qt implementation.

@David: can you please confirm that in qtbase we would need something similar to Bastien's patch?
Comment 7 Bastien Nocera 2018-04-25 13:58:34 UTC
(In reply to Elvis Angelaccio from comment #6)
> @Bastien: thanks for the reply. This is still a problem, so we will probably
> need to fix it in the Qt implementation.

FWIW, similar code doesn't exist in glib's GIO implementation, I filed this bug about adding it:
https://bugzilla.gnome.org/show_bug.cgi?id=795544
Comment 8 David Faure 2018-04-26 17:19:07 UTC
OK, I just changed the MIME spec to say 128 rather than 32.
(commit ce12f18 in xdg/shared-mime-info)

And I updated Qt accordingly.
https://codereview.qt-project.org/227674

Thanks for the heads up.
Comment 9 Elvis Angelaccio 2018-04-29 09:58:21 UTC
@David: thanks for the change, I can confirm that it does fix this issue. :)
Comment 10 Bastien Nocera 2018-05-29 11:23:29 UTC
(In reply to David Faure from comment #8)
> OK, I just changed the MIME spec to say 128 rather than 32.
> (commit ce12f18 in xdg/shared-mime-info)

Can you at least prefix the commit message in the future? It's absolutely impossible to know what the commit message refers to. It would be good to also add the reasoning behind the change so that the commit log is somewhat self sufficient.
Comment 11 David Faure 2018-05-30 09:56:05 UTC
You're right, and I'm usually very careful about that, in fact. Not sure what happened on that day, must have been in a hurry. Sorry about that.

