The implementation notes of the PDF Reference state that the %PDF- magic header need not be strictly at the start of the file [1]:

3.4.1, "File Header"
13. Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file.
14. Acrobat viewers also accept a header of the form %!PS-Adobe-N.n PDF-M.m

If I understand correctly, this could be represented the following way:

<match value="%PDF-" type="string" offset="0:1024"/>

It should be easy to create a test case file (the file I have with this problem is my bank account slip). If such a file does not have a .pdf extension, Evince refuses to open the document because it is detected as application/octet-stream.

[1] http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf, Appendix H, section 3.4.1, page 1102.
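For illustration, the rule in implementation note 13 can be sketched in a few lines of Python. This is only a rough model of the check, not the shared-mime-info matcher: the names PDF_MAGIC, SEARCH_WINDOW, looks_like_pdf and file_looks_like_pdf are made up for this example, and the alternative %!PS-Adobe header from note 14 is not handled. It assumes the whole header must fit inside the first 1024 bytes, which may differ slightly from how an offset range in a match rule is interpreted.

```python
# Hypothetical helper names; a sketch of implementation note 13 only.
PDF_MAGIC = b"%PDF-"
SEARCH_WINDOW = 1024  # bytes in which Acrobat viewers look for the header


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF magic lies entirely within the first 1024 bytes."""
    return PDF_MAGIC in data[:SEARCH_WINDOW]


def file_looks_like_pdf(path: str) -> bool:
    """Read only the search window from the file and apply the same check."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(SEARCH_WINDOW))
```

With this check, a file that starts with some binary junk followed by %PDF-1.4 is accepted, while a file whose header only begins near byte 1024 or later is not.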
Any chance you could provide such a test file then?
Created attachment 37245
Test PDF file with prepended binary data

Before:

$ gvfs-info testcase.is-really-a-pdf | grep content-type
  standard::content-type: application/octet-stream
  standard::fast-content-type: application/octet-stream

After:

$ gvfs-info testcase.is-really-a-pdf | grep content-type
  standard::content-type: application/pdf
  standard::fast-content-type: application/octet-stream
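For anyone who wants to build a similar test case from a PDF of their own, here is a minimal Python sketch. It is not how the attachment above was produced; the function name prepend_junk and the default junk length are assumptions for this example. It copies an existing file and puts a fixed run of non-PDF bytes in front of it, so the %PDF- header no longer sits at offset 0.

```python
def prepend_junk(src_path: str, dst_path: str, junk_len: int = 64) -> None:
    """Write dst_path as junk_len filler bytes followed by the contents of src_path.

    Hypothetical helper: the filler is a repeating 0x00..0xff byte pattern,
    which never contains the sequence "%PDF-".
    """
    pattern = bytes(range(256))
    junk = (pattern * (junk_len // 256 + 1))[:junk_len]
    with open(src_path, "rb") as src:
        original = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(junk)
        dst.write(original)
```

Calling prepend_junk("some.pdf", "testcase.is-really-a-pdf") would give a file without a .pdf extension whose header starts at byte 64. Keep junk_len well below 1024 so the header stays inside the window that the spec allows.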
commit 7d42fc0da8068df8892842cc4005395471f4d2b0
Author: Bastien Nocera <hadess@hadess.net>
Date:   Tue Jul 20 18:33:05 2010 +0100

    Fix PDF magic detection

    As per spec, the first 1024 bytes can contain binary garbage,
    before the actual PDF magic header.

    With help from Philippe Gauthier <philippe.gauthier@deuxpi.ca>

    https://bugs.freedesktop.org/show_bug.cgi?id=29083