Following up on #102, which was closed last week — the reported behaviour still reproduces on 1.2.1. PR #130 tightened HTML magic to position 0 only, which doesn't help when there's a leading HTML comment before the opening tag.
Repro
require "marcel"
require "stringio"
html = StringIO.new(<<~HTML)
<!--/* Throwaway comment but it has to be over 128 characters to fail AND have a lowercase HTML tag, we can pad this one out a bit to get it longer */-->
<html>
<head></head>
<body><h1>Magic!</h1></body>
</html>
HTML
Marcel::VERSION # => "1.2.1"
Marcel::MimeType.for(html, name: "index.html") # => "application/xml"
Marcel::MimeType.for(html) # => "application/xml"
Expected: text/html in both cases.
Suggested fix
Marcel already supports regex magic (see the <(html|head|body|title|div) entry in tables.rb), so allowing leading whitespace and any number of HTML comments before the opening tag should be a small addition to definitions.rb:
Marcel::MimeType.extend "text/html",
extensions: %w( html htm ),
magic: [
# existing position-0 entries...
[0, /\A(?:\xEF\xBB\xBF)?\s*(?:<!--.*?-->\s*)*(?:<!DOCTYPE\s+html\b|<\s*html\b)/mi]
]
That covers:
- optional UTF-8 BOM
- arbitrary leading whitespace
- any number of HTML comments before the tag
- both
<!DOCTYPE html ...> and <html ...> (case-insensitive)
We've been carrying a local monkey patch doing roughly this plus a filename-extension fallback. Happy to open a PR if there's appetite.
Following up on #102, which was closed last week — the reported behaviour still reproduces on 1.2.1. PR #130 tightened HTML magic to position 0 only, which doesn't help when there's a leading HTML comment before the opening tag.
Repro
Expected:
text/htmlin both cases.Suggested fix
Marcel already supports regex magic (see the
<(html|head|body|title|div)entry intables.rb), so allowing leading whitespace and any number of HTML comments before the opening tag should be a small addition todefinitions.rb:That covers:
<!DOCTYPE html ...>and<html ...>(case-insensitive)We've been carrying a local monkey patch doing roughly this plus a filename-extension fallback. Happy to open a PR if there's appetite.