Don't double-include line and column information in SyntaxError (fixes #5357)#5409
Don't double-include line and column information in SyntaxError (fixes #5357)#5409MatrixFrog wants to merge 1 commit into
Conversation
Test262 conformance changes
Tested main commit: |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5409 +/- ##
===========================================
+ Coverage 47.24% 60.28% +13.04%
===========================================
Files 476 567 +91
Lines 46892 63147 +16255
===========================================
+ Hits 22154 38068 +15914
- Misses 24738 25079 +341 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
nekevss
left a comment
There was a problem hiding this comment.
Thanks for working on this! Just had a quick question 😄
| } else { | ||
| Err(Error::syntax( | ||
| format!( | ||
| "unexpected utf-8 char '\\u{next_ch}' at line {}, column {}", |
There was a problem hiding this comment.
Does this potentially cause the same issue that is mentioned in the issue? Can we test this branch as well?
There was a problem hiding this comment.
Probably. I'm not sure how to reach this branch though -- from what I can tell, even though next_ch is a u32 not a char, it will always be a valid char because the lexer ultimately uses a UTF8Input which will only yield valid UTF8 chars. But I may have missed something.
There was a problem hiding this comment.
Ok, I dug in a little more, what seems to be happening is that ReadChar's next_char method is documented as "Returns an error if the next input in the input is not a valid unicode code point." but the implementations (both UTF8Input and UTF16Input) from what I can tell don't follow that and instead if the input is invalid they could yield u32s which are not valid chars. It seems to me that it would be nice to switch the implementations to match the docs. Then, next_char could yield chars instead of u32s, and a lot of the lexer/parser code could become much more readable. For instance instead of code like (0x30..=0x39/* 0..=9 */).contains(c) needing a comment to explain what char 0x30 corresponds to, it could just look like ('0'..='9').contains(c).
I started down this path a little bit but I don't know if that's the direction the project wants to go. It would also work to keep it as is but then it would be good to assemble some tests that feed in byte sequences which are not valid UTF8/UTF16, to make sure those code paths do what's expected.
In any case that would end up being a bigger change, happy to add it to this PR or start a separate one, whatever is preferred.
This Pull Request fixes #5357
The SyntaxError includes the location information already so it doesn't need to add the line and column number in the error message as well.