Skip to content

fix: prevent trailing backslash from "eating" triple quotes#307

Merged
thalesmg merged 2 commits into
emqx:masterfrom
thalesmg:20250422-fix-trailing-slash
Apr 22, 2025
Merged

fix: prevent trailing backslash from "eating" triple quotes#307
thalesmg merged 2 commits into
emqx:masterfrom
thalesmg:20250422-fix-trailing-slash

Conversation

@thalesmg
Copy link
Copy Markdown
Contributor

@thalesmg thalesmg force-pushed the 20250422-fix-trailing-slash branch from 31c714e to bf1a20b Compare April 22, 2025 14:52
@thalesmg thalesmg force-pushed the 20250422-fix-trailing-slash branch from bf1a20b to d4c8c83 Compare April 22, 2025 14:53
@thalesmg thalesmg marked this pull request as ready for review April 22, 2025 14:57
@thalesmg thalesmg requested a review from zmstone April 22, 2025 17:28
Comment thread src/hocon_scanner.xrl
String = "{Char}*"
MultilineChar = ([^"]|"[^"]|""[^"]|\\{Escape}|\\{UnicodeEscape})
%% Special handling for trailing quote: if we don't assert it's not followed by two other
%% quotes, `{Escape}` would "eat" one of the quotes in the triple quote...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you help to explain why this "eat" happens?

This should work?

([^"]|"[^"]|""[^"]|\\{EscapeNoQuote}|\\{UnicodeEscape})

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eat happens because the dangling backslash is the last thing the triple quoted string. ...\\""" essentially eats the last quote and breaks out of the triple quote.

"""aaa\"""\nx = 10\ny = """~...

Since \ was not being escaped when "unparsing" the config, it would eat the triple quote, and everything after that gets interpreted as the string until it hits another triple quote.

It's easy to see if you try the test case out without the code changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record (with added print debug to hocon_scanner:unindent's 3rd clause):

======================== EUnit ========================
hocon_pp_tests: triple_quote_string_ending_in_backslash_test...
----------------------------------------------------
2025-04-22 15:10:25.849
{nonode@nohost,hocon_scanner,132,<0.388.0>}>>>>>>>>>
  #{chars => "\t\"\\\"\\t\\\"\"\"\n}\nroot2 {\n  x = "}

*failed*
in function erlymatch:run/5 (/home/thales/dev/emqx/hocon/_build/test/lib/erlymatch/src/erlymatch.erl, line 36)
in call from hocon_pp_tests:triple_quote_string_ending_in_backslash_test/0 (/home/thales/dev/emqx/hocon/test/hocon_pp_tests.erl, line 359)
in call from eunit_test:'-mf_wrapper/2-fun-0-'/2 (eunit_test.erl, line 274)
in call from eunit_test:run_testfun/1 (eunit_test.erl, line 72)
in call from eunit_proc:run_test/1 (eunit_proc.erl, line 544)
in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 369)
in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 527)
in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 469)
**throw:{mismatch,{hocon_pp_tests,359,
                          {details,tuple,mismatch,
                                   [{1,value,mismatch,{ok,error}},
                                    {2,value,mismatch,
                                     {#{<<"root1">> =>
                                            #{<<"x">> => <<"\t\"\\\""...>>},
                                        <<"root2">> =>
                                            #{<<"x">> => <<"sele"...>>}},
                                      {scan_error,#{...}}}}]}}}
  output:<<"Warning: ct_logs not started
{nonode@nohost,hocon_scanner,132,<0.388.0>}>>>>>>>>>
  #{chars => "\t\"\\\"\\t\\\"\"\"\n}\nroot2 {\n  x = "}Match failed in module 'hocon_pp_tests' at line 359:
  details: {...}
            1: EXPECT = ok
                  GOT = error
            2: EXPECT = #{<<"root1">> => #{<<"x">> => <<"\t\"\\\"\\t\\">>},<<"root2">> => #{<<"x">> => <<"select \n from\n \"hello\" ">>}}
                  GOT = {scan_error,#{line => 5,reason => "illegal characters \"~\""}}
">>

=======================================================
  Failed: 1.  Skipped: 0.  Passed: 0.
===> Error running tests

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had tried your suggestion before, but paired with other changes (escaping or not escaping the string, various combinations of changes), but didn't work before, but now the added test case works with that regex. 🤔

Comment thread src/hocon_scanner.xrl Outdated
MultilineChar = ([^"]|"[^"]|""[^"]|\\{Escape}|\\{UnicodeEscape})
%% Special handling for trailing quote: if we don't assert it's not followed by two other
%% quotes, `{Escape}` would "eat" one of the quotes in the triple quote...
MultilineChar = (\\"[^"][^"]|[^"]|"[^"]|""[^"]|\\{EscapeNoQuote}|\\{UnicodeEscape})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\\{Escap} is changed to \\{EscapenoQuote}
does it mean \\ is now parsed as \\, but not \ ?

Copy link
Copy Markdown
Member

@zmstone zmstone Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, \\ is in EscapNoQuote, it's "no quote", not "no backslash".

I wonder if \\ should have been added in MultilineChar at all.
it's a breaking change if we remove it though.

Copy link
Copy Markdown
Contributor Author

@thalesmg thalesmg Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the question.

the regex for EscapeNoQuote is simply the old Escape without ".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tried your suggestion, and with the fix I added to escape contents of triple quotes that are not multiline, it seems to work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S.: somehow, GH showed only your 1st comment when I wrote my 1st reply 🙈 🙉 🙊

@thalesmg thalesmg merged commit 4c69e26 into emqx:master Apr 22, 2025
4 checks passed
@thalesmg thalesmg deleted the 20250422-fix-trailing-slash branch April 22, 2025 19:46
@thalesmg
Copy link
Copy Markdown
Contributor Author

tagged 0.45.3

thalesmg added a commit to thalesmg/emqx that referenced this pull request Apr 22, 2025
thalesmg added a commit to thalesmg/emqx that referenced this pull request Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants