Bug 88596 - x-octave and x-matlab mime types should be different
Status: RESOLVED MOVED
Alias: None
Product: shared-mime-info
Classification: Unclassified
Component: freedesktop.org.xml
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Shared Mime Info group
Reported: 2015-01-19 18:20 UTC by Carnë Draug
Modified: 2018-10-13 10:39 UTC


Attachments
separate x-octave and x-matlab mimetypes (4.48 KB, patch)
2015-01-19 18:22 UTC, Carnë Draug

Description Carnë Draug 2015-01-19 18:20:23 UTC
At the moment, Octave and Matlab share the same mime type (one is an alias for the other), but these are different languages. Indeed, one of the magic patterns used to identify a file as x-matlab is not even valid Matlab syntax (the string "##" at offset 0).

The attached commit treats them as separate mime types, adds new tests for this, and adjusts the current ones.

Here's the rationale for the changes:

* Since it is possible to write Octave executables, I have added magic for a shebang line with typical paths for octave.
* Octave and Matlab can be distinguished by the character used for comments (Octave uses #, Matlab uses %). A file using % is also a valid Octave program, but if the author used % then they probably want Matlab compatibility, in which case it makes sense to treat it as Matlab.
* An Octave script can contain function definitions. In that case, to distinguish a script from a function file, the first statement must not be a function declaration. The statement "1;" is the most common choice, so I added it as magic too.
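
To make the three rules above concrete, here is a minimal Python sketch of the same decisions. It is only an illustration: the function name guess_by_prefix is hypothetical, and this is not how shared-mime-info evaluates magic (the real rules live in freedesktop.org.xml.in and carry priorities, which this sketch ignores).

```python
from typing import Optional


def guess_by_prefix(data: bytes) -> Optional[str]:
    """Return 'x-octave', 'x-matlab', or None when the prefix is inconclusive.

    Hypothetical helper that mirrors the rationale in this report; it is not
    part of shared-mime-info.
    """
    first_line = data.split(b"\n", 1)[0]

    # Shebang rule: an executable whose interpreter line names octave.
    if first_line.startswith(b"#!") and b"octave" in first_line:
        return "x-octave"

    # Script rule: "1;" as the first statement marks an Octave script file.
    if data.startswith(b"1;"):
        return "x-octave"

    # Comment rule: "#" is Octave only, "%" is treated as Matlab.
    if data.startswith(b"#"):
        return "x-octave"
    if data.startswith(b"%"):
        return "x-matlab"

    # Anything else (including files starting with "function") stays undecided.
    return None
```

A file that starts with "function" falls through to None here, which is the remaining ambiguity discussed next.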


Despite these changes, a file that starts with "function" still cannot be attributed to either of the two. The easiest ways to make the distinction are:

1. Check whether the file ends with endfunction. Matlab uses the general statement "end", while Octave also allows the specific "endfunction". Is it possible to do this? Something like 'value="endfunction" offset="-11"'?

2. Check whether there is any # or % comment line (the first is Octave only). However, checking for such a character near the start of every file would be too broad: plain text files would be identified as x-matlab just because they contain a "%". Is there a way to apply a specific magic only if another one has matched? I.e., only search for "#" or "%" comments if we have already checked that the file starts with "function"?
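
The two methods above can be sketched in Python as follows. This is only a model of the intended logic, under the assumption that the comment check is gated on the "function" prefix; the name refine_function_file is hypothetical, and the sketch says nothing about whether the magic format can express either check.

```python
from typing import Optional


def refine_function_file(data: bytes) -> Optional[str]:
    """Disambiguate a file that starts with 'function'; None if still unknown.

    Hypothetical helper illustrating the two proposed methods.
    """
    # Gate: the weaker checks below only run once "function" has matched,
    # so a plain text file containing "%" is never touched.
    if not data.startswith(b"function"):
        return None

    # Method 1: Octave allows "endfunction"; Matlab only has the generic "end".
    if data.rstrip().endswith(b"endfunction"):
        return "x-octave"

    # Method 2: the first comment line decides ("#" is Octave only).
    for line in data.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(b"#"):
            return "x-octave"
        if stripped.startswith(b"%"):
            return "x-matlab"

    return None
```

Because of the gate, input such as "100% plain text" returns None instead of being classified as x-matlab.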
Comment 1 Carnë Draug 2015-01-19 18:22:27 UTC
Created attachment 112492
separate x-octave and x-matlab mimetypes
Comment 2 Bastien Nocera 2015-01-28 11:35:18 UTC
Comment on attachment 112492
separate x-octave and x-matlab mimetypes

Review of attachment 112492:
-----------------------------------------------------------------

::: freedesktop.org.xml.in
@@ +5497,5 @@
> +      <match type="string" value='#! /usr/bin/env octave' offset="0"/>
> +      <match type="string" value="function" offset="0"/>
> +    </magic>
> +    <magic priority="30">
> +      <match type="string" value="1;" offset="0"/>

This and...

@@ +5502,3 @@
>      </magic>
>      <magic priority="10">
> +      <match type="string" value="#" offset="0"/>

... this are far too small as magic patterns, even at low priority.

Remove them.

@@ +5511,5 @@
>      <magic priority="50">
>        <match type="string" value="function" offset="0"/>
>      </magic>
> +    <magic priority="10">
> +      <match type="string" value="%" offset="0"/>

Ditto, that's far too small.
Comment 3 Carnë Draug 2015-01-28 12:37:33 UTC
(In reply to Bastien Nocera from comment #2)

If I remove all of these, then this patch does little more than split x-octave and x-matlab and leave them with the same magic. The shebang line I added for the x-octave mime type only applies to Octave programs, which are a very small minority of Octave files.

The current magic recognizes "##" at offset=0 as valid magic for x-matlab. That is just as small and common as what this patch proposes. In addition, what this patch proposes is at least correct, whereas "##" is not even valid Matlab syntax. Will it be accepted if I replace the "#" and "%" with "##" and "%%" then? If not, what magic would you propose instead?
Comment 4 Bastien Nocera 2015-01-28 16:08:21 UTC
(In reply to Carnë Draug from comment #3)
> The current magic recognizes "##" at offset=0 as valid magic for x-matlab.
> That is just as small and common as what this patch proposes. In addition,
> what this patch proposes is at least correct, whereas "##" is not even
> valid Matlab syntax. Will it be accepted if I replace the "#" and "%" with
> "##" and "%%" then? If not, what magic would you propose instead?

2 characters is still too small...

I would propose that no new mime-type be used, and discuss a new suffix specifically for Octave files with the Octave maintainers.
Comment 5 Carnë Draug 2015-01-28 16:42:52 UTC
(In reply to Bastien Nocera from comment #4)
> (In reply to Carnë Draug from comment #3)
> > 
> > The current magic recognizes "##" at offset=0 as valid magic for x-matlab.
> > That is just as small and common as what this patch proposes. In addition,
> > what this patch proposes is at least correct, whereas "##" is not even
> > valid Matlab syntax. Will it be accepted if I replace the "#" and "%" with
> > "##" and "%%" then? If not, what magic would you propose instead?
> 
> 2 characters is still too small...

How come? A 2-character magic is already there. What I am proposing now is even more specific than what currently exists, i.e., pick x-octave or x-matlab based on whether the characters are "##" or "%%", instead of incorrectly assigning a Matlab mimetype to what is certainly not a Matlab file. See

    http://cgit.freedesktop.org/xdg/shared-mime-info/commit/?id=a1f1b88f

This commit adds "##" as magic for x-matlab. This is not even valid Matlab. Can this magic be moved from the x-matlab mimetype to x-octave? The former is completely incorrect. The latter, while the pattern is also valid in other mimetypes, is at least correct (and that's what low weight is for: the weight of a ## comment is small, so for a file with a different extension any other glob match would be used instead of x-octave).

> I would propose that no new mime-type be used, [...]

There is no new mimetype being added.  One was just incorrectly marked as an alias of the other.

> and discuss a new suffix specifically for Octave files with the Octave maintainers.

Why? Octave, Matlab, and Objective-C all share the same *.m file extension. This should not be a problem, as a file extension does not define a mimetype. It is unfortunate that this makes mimetype identification harder, but there are enough differences in file content between these 3 to pick the right one. The key part is reducing the identification to those 3 types. How can this be done other than by giving file content a much lower weight than glob?
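
As a rough illustration of that idea, the sketch below first narrows the candidates by glob and only then lets a short content prefix choose among them. Everything in it is an assumption made for the example: the glob table, the x-objcsrc name, the "#import" hint for Objective-C, and the fallback order. It does not describe how shared-mime-info actually resolves conflicts.

```python
import fnmatch
from typing import List, Optional

# Illustrative glob table; type names and their order are assumptions.
GLOBS = {"*.m": ["x-matlab", "x-octave", "x-objcsrc"]}

# Short content hints, checked in order. "#import" comes before "##"
# so that an Objective-C preprocessor line is never read as an Octave comment.
HINTS = [
    (b"#import", "x-objcsrc"),
    (b"##", "x-octave"),
    (b"%%", "x-matlab"),
]


def candidates_for(filename: str) -> List[str]:
    """Collect every type whose glob matches the file name."""
    found: List[str] = []
    for pattern, types in GLOBS.items():
        if fnmatch.fnmatch(filename, pattern):
            found.extend(types)
    return found


def pick_type(filename: str, data: bytes) -> Optional[str]:
    """Glob decides the candidate set; content only breaks the tie."""
    candidates = candidates_for(filename)
    if not candidates:
        # The short hints are never consulted without a glob match,
        # so a "##" line in an unrelated file cannot trigger x-octave.
        return None
    for prefix, mime in HINTS:
        if mime in candidates and data.startswith(prefix):
            return mime
    # No hint matched: fall back to the first glob candidate.
    return candidates[0]
```

With this ordering, a 2-character pattern such as "##" can only ever select among the types that already claim *.m, which is the sense in which its weight stays below the glob.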

Also, Octave has been around since 1998; it should not have to change its file extension because of this, any more than Objective-C or Matlab should.
Comment 6 GitLab Migration User 2018-10-13 10:39:46 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/xdg/shared-mime-info/issues/63.

