Skip to content

Conversation

@eregon
Copy link
Member

@eregon eregon commented Jan 18, 2026

Ref: #3838
On top of #3858 should be merged first.

@eregon
Copy link
Member Author

eregon commented Jan 18, 2026

Failures that were added to excludes:

==============================================================================================================================
Failure: test_dos_endings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 9], :on_sp, "\\\r\n", nil]> expected but was
<[[1, 9], :on_sp, "\\\r\n" + "     ", nil]>

diff:
? [[1, 9], :on_sp, "\\\r\n" + "     ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/str_lit_concat_bad_encodings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 62], :on_sp, " ", nil]> expected but was
<[[1, 62], :on_sp, " \\\n" + "        ", nil]>

diff:
? [[1, 62], :on_sp, " \\\n" + "        ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/utf8_bom.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[2, 0], :on_ident, "p", CMDARG]> expected but was
<[[1, 23], :on_sp, "-w\n", BEG]>

diff:
? [[   2, 0], :on_ide     nt, "p", CMDARG]
?   1,  3         sp, "-w\         BE     
?   +++ ???         ??? -----   ?????     
==============================================================================================================================
F
==============================================================================================================================
Failure: test_unparser/corpus/semantic/dstr.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[119, 3], :on_sp, " ", nil]> expected but was
<[[119, 3], :on_sp, " \\\n", nil]>

diff:
? [[119, 3], :on_sp, "     ", nil]
?                      \\\n       
?                     ?       
==============================================================================================================================
Finished in 1.197953795 seconds.
------------------------------------------------------------------------------------------------------------------------------
1892 tests, 36583 assertions, 4 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
99.7886% passed
------------------------------------------------------------------------------------------------------------------------------

@eregon
Copy link
Member Author

eregon commented Jan 18, 2026

:on_sp events also have state on Ripper and it's not always the last token state, example:

$ ruby -rripper -e 'pp Ripper.lex("def m(a) nil end")'  
[[[1, 0], :on_kw, "def", FNAME],
 [[1, 3], :on_sp, " ", FNAME],
 [[1, 4], :on_ident, "m", ENDFN],
 [[1, 5], :on_lparen, "(", BEG|LABEL],
 [[1, 6], :on_ident, "a", ARG],
 [[1, 7], :on_rparen, ")", ENDFN],
 [[1, 8], :on_sp, " ", BEG], <=
 [[1, 9], :on_kw, "nil", END],
 [[1, 12], :on_sp, " ", END],
 [[1, 13], :on_kw, "end", END]]

And more complex cases like whitequark/space_args_block.txt:

EXPECTED:
[[[1, 0], :on_kw, "undef", FNAME|FITEM],
 [[1, 5], :on_sp, " ", FNAME|FITEM], # OK
 [[1, 6], :on_ident, "foo", END],
 [[1, 9], :on_comma, ",", BEG|LABEL],
 [[1, 10], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 11], :on_symbeg, ":", FNAME],
 [[1, 12], :on_ident, "bar", ENDFN],
 [[1, 15], :on_comma, ",", BEG|LABEL],
 [[1, 16], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 17], :on_symbeg, ":\"", FNAME],
 [[1, 19], :on_tstring_content, "foo", FNAME],
 [[1, 22], :on_embexpr_beg, "\#{", FNAME],
 [[1, 24], :on_int, "1", END],
 [[1, 25], :on_embexpr_end, "}", END],
 [[1, 26], :on_tstring_end, "\"", END],
 [[1, 27], :on_nl, "\n", BEG]]

Would there be any way to get the correct state with Prism?

@Earlopain
Copy link
Collaborator

I would not bother with comparing state for this. I'm not particularly motivated to understand what is happening in detail here.

I doubt anyone is relying on this information being correct.

@eregon
Copy link
Member Author

eregon commented Jan 19, 2026

Yeah it's what I'm doing in test/prism/ruby/ripper_test.rb, I'm ignoring the state of :on_sp events for now.
Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states? It seems wasteful to create new instances every time when there are only a few possible states and they are frozen.

@Earlopain
Copy link
Collaborator

Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Just putting in the state from the previous token seems fine. When it lines up, great. If not, oh well. Doesn't really matter in what way it's wrong. nil doesn't sound great for compatibility.

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states?

I believe that is what ripper itself does. I haven't benchmarked this in detail yet but there are definitly some hotspots. Much time is spend sorting tokens by location for example.

If changing this makes a noticable difference, I wouldn't be opposed to it. Here's what I use to quickly check:

require "ripper"
require "prism"
require "benchmark/ips"

codes = Dir["**/*.rb"].map { File.read(it) }

Benchmark.ips do |x|
  x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
  x.report("ripper") { codes.each { Ripper.lex(it) } }

  x.compare!
end

I was also a bit concerned about the performance of this implementation but it is not so bad. Compared to ripper it goes from 2.3x times slower to 2.6x times slower which seems fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants