Incorrect identification for JS content starting with comments #86

@tzabaman

I am opening this issue for discussion because of some inconsistent behaviour that surfaces through ActiveStorage's #identify method...

Marcel::MimeType.for behaves inconsistently for (at least) JavaScript files that start with a comment when certain declared types are given (text/javascript or application/x-javascript). For example:

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/javascript')
# => application/javascript --> Correct

Marcel::MimeType.for("console.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/javascript --> Correct

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/plain --> Inconsistent (Incorrect?)

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/x-javascript')
# => text/plain --> Inconsistent (Incorrect?)

Of course, the explanation is that the (generated) file lib/marcel/tables.rb contains definitions only for the type application/javascript, so the above behaviour is somewhat expected.
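To make the effect of a missing table entry concrete, here is a toy model of a parent lookup. It is only a sketch, not Marcel's actual code: `TYPE_PARENTS_SAMPLE` and `descendant_of?` are hypothetical names, and the table holds just the single canonical entry discussed above.

```ruby
# Toy table containing only the canonical type, mirroring the situation described above.
TYPE_PARENTS_SAMPLE = {
  'application/javascript' => %w(text/plain)
}.freeze

# Hypothetical helper: true if +type+ has +ancestor+ somewhere in its parent chain.
def descendant_of?(type, ancestor, table = TYPE_PARENTS_SAMPLE)
  parents = table.fetch(type, [])
  parents.include?(ancestor) || parents.any? { |parent| descendant_of?(parent, ancestor, table) }
end

puts descendant_of?('application/javascript', 'text/plain') # true
puts descendant_of?('text/javascript', 'text/plain')        # false: the alias has no entry
```

Under this assumption, any logic that prefers the declared type only when it is related to text/plain would treat the alias as unrelated, which would be consistent with the text/plain results shown earlier.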

However, data/tika.xml does reference the types text/javascript and application/x-javascript: they are declared as aliases of application/javascript.

marcel/data/tika.xml

Lines 335 to 340 in 8e28563

<mime-type type="application/javascript">
<alias type="application/x-javascript"/>
<alias type="text/javascript"/>
<sub-class-of type="text/plain"/>
<_comment>JavaScript Source Code</_comment>
<glob pattern="*.js"/>

So my question is: should we change script/generate_tables.rb so that it also takes aliases into account when adding values to the constants TYPE_EXTS and TYPE_PARENTS? Something like:

TYPE_EXTS = {
  ...
  'application/javascript' => %w(js),
  'application/x-javascript' => %w(js),
  'text/javascript' => %w(js),
  ...
}

TYPE_PARENTS = {
  ...
  'application/javascript' => %w(text/plain),
  'application/x-javascript' => %w(text/plain),
  'text/javascript' => %w(text/plain),
  ...
}
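As a rough sketch of what alias expansion could look like, the snippet below builds both tables from an already-parsed definition. Everything here is an assumption for illustration: `DEFINITIONS` is a hypothetical in-memory form of the tika.xml entry quoted above, and `build_tables` is a made-up helper, not the existing logic of script/generate_tables.rb.

```ruby
# Hypothetical parsed form of the tika.xml entry quoted above.
DEFINITIONS = [
  {
    type: 'application/javascript',
    aliases: %w(application/x-javascript text/javascript),
    parents: %w(text/plain),
    exts: %w(js)
  }
].freeze

# Build both tables, registering every alias with the same values as its canonical type.
def build_tables(definitions)
  type_exts = {}
  type_parents = {}

  definitions.each do |definition|
    names = [definition[:type], *definition[:aliases]]
    names.each do |name|
      type_exts[name] = definition[:exts] unless definition[:exts].empty?
      type_parents[name] = definition[:parents] unless definition[:parents].empty?
    end
  end

  [type_exts, type_parents]
end

type_exts, type_parents = build_tables(DEFINITIONS)
p type_exts
p type_parents
```

With the single sample definition, both hashes end up with three keys (the canonical type plus its two aliases), matching the shape proposed above.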

Thanks!
