perf: use FxHashSet for O(1) name lookups in declaration instantiation#5024
perf: use FxHashSet for O(1) name lookups in declaration instantiation#5024tkshsbcue wants to merge 4 commits intoboa-dev:mainfrom
Conversation
Replace Vec with FxHashSet for name-tracking collections in bytecode compiler declaration instantiation. 22 .contains() calls go from O(n) to O(1), eliminating O(n^2) behavior when compiling scripts with many declarations. Affected: global/eval/function declaration instantiation functions.
|
@jedel1043 i think you might fight this interesting haha! |
Test262 conformance changes
Fixed tests (2):Broken tests (157):Tested main commit: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5024 +/- ##
===========================================
+ Coverage 47.24% 59.76% +12.51%
===========================================
Files 476 589 +113
Lines 46892 63632 +16740
===========================================
+ Hits 22154 38027 +15873
- Misses 24738 25605 +867 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
do you have benchmark results? 👀 |
|
hey hi @zhuzhu81998 these are the benchmarks i compiled
|
|
Need to fix formatting first |
Test262 conformance changes
Tested main commit: |
|
hey @jedel1043 fixed formatting! |
|
The names that are hashed here, they are not internal right? They come from JS? |
|
hey @zhuzhu81998 The keys here are Sym, which is just a NonZeroUsize it's an integer index into the string interner, not the raw JS string itself. FxHash on sequential/near-sequential |
| @@ -718,6 +710,7 @@ impl ByteCompiler<'_> { | |||
|
|
|||
| // 12. Let declaredVarNames be a new empty List. | |||
| let mut declared_var_names = Vec::new(); | |||
There was a problem hiding this comment.
do we still need this separately?
There was a problem hiding this comment.
Are you suggesting we drop the Vec and use only the FxHashSet here?
There was a problem hiding this comment.
yea unless there is a reason to keep this that i am not seeing? you dropped the Vec elsewhere already no?
There was a problem hiding this comment.
We could try to drop it, but IIRC the spec requires vars to be initialized in a certain order, so I don't know if using a hash set will break assumptions about this
There was a problem hiding this comment.
Hmm, looking at it the iteration at step 18 just calls CreateGlobalVarBinding for each name, and those bindings are independent of each other. Wouldn't the end result be the same regardless of order?
There was a problem hiding this comment.
uhhh what should i do then lmao?
There was a problem hiding this comment.
Are you suggesting we drop the Vec and use only the FxHashSet here?
This, but use a separate commit just in case it breaks tests. Makes reversing easier
There was a problem hiding this comment.
alright i will implement this and push another follow up commit !
Remove the redundant Vec alongside the FxHashSet for declared_var_names in global_declaration_instantiation and eval_declaration_instantiation. The Vec was only iterated at the end, and iteration order is not required by the spec for these bindings.
Description
This PR replaces
VecwithFxHashSetfor name-tracking collections in the bytecode compiler's declaration instantiation functions, reducing.contains()lookups from O(n) to O(1).The problem
During script/module/function compilation, the bytecode compiler tracks declared function names, variable names, and parameter bindings using
Vec<Sym>. These vectors are repeatedly checked with.contains()inside loops, creating O(n^2) behavior when a script has many declarations.There are 22
.contains()calls indeclarations.rs, many inside nested loops:The fix
Replace
Vec<Sym>withFxHashSet<Sym>for all name-tracking collections used only for membership checks. Where a collection is also iterated (e.g.,declared_var_namesfor emitting bindings), a parallelFxHashSetis maintained alongside theVec.FxHashSet(fromrustc_hash) is already used elsewhere in the codebase and provides faster hashing thanstd::collections::HashSetfor small integer keys likeSym.Affected functions
global_declaration_instantiation_contextdeclared_function_names,declared_var_namesprepare_eval_declaration_instantiationdeclared_function_names,lexically_declared_namesglobal_declaration_instantiationdeclared_function_names,declared_var_nameseval_declaration_instantiationdeclared_function_names,declared_var_namesfunction_declaration_instantiationfunction_names,parameter_bindings,instantiated_var_namesWhy this matters
Declaration instantiation runs on every script, eval, and function compilation. Real-world JavaScript bundles routinely have hundreds or thousands of declarations. The O(n^2) behavior is particularly visible when:
eval()with significant codeTest plan
cargo checkpassescargo clippy-- no warningscargo test -p boa_engine --lib-- all 958 tests pass