perf(size): devirtualize native-module dispatch tables (−20% hello-world __text)#5256
Conversation
…rt phase 1) Split the monolithic dispatch_native_module_method (592 arms) into 37 per-module nm_dispatch_<b> bucket fns reached through a per-module dispatch registry, so the linker can dead-strip handlers a program never imports. NmCtx + nm_general_closures! macro carry the old prologue marshalling. Thin router does name extract+normalize then registry lookup — names no bucket fn. Each js_nm_install_<b>() is the sole static ref to its bucket; codegen will emit one per static import (next commit). Runtime compiles green. Codegen install emission + correctness/measurement pending. See NM_DEVIRT_PLAN.md.
…e sites Wire the devirt registry: codegen nm_install_symbol() maps each native-module name to its dispatch-install symbol; all 5 js_create_native_module_namespace sites now emit it before creating the namespace, so the module's bucket is registered before any method call. Externs declared in runtime_decls/objects.rs. Verified byte-identical to node: hello-world, import os (platform/EOL/arch), import path (join/basename/extname), global process (cwd/pid/argv/platform). hello-world emits 0 installs (imports nothing) → method-dispatch handlers for unimported modules drop (child_process syms 136→45, cluster 41→31; residual pinned by the still-monolithic constructor dispatcher = phase 2).
…size regression)
Runtime-resolved builtins (process.getBuiltinModule(spec)) can't get a codegen
per-module install since the module name isn't known at compile time. Add an
indirect install-all hook: native_module_get_builtin_module_value runs
nm_run_install_all_hook() (loads an opaque fn-ptr, never names js_nm_install_all),
armed by js_nm_enable_install_all() which only the getBuiltinModule codegen
wrapper (js_process_get_builtin_module_devirt) emits. black_box hides the stored
pointer so whole-program opt can't speculatively devirtualize the indirect call
back into a direct js_nm_install_all reference (that re-pinned every bucket).
Verified: hello-world __text 4,058,968 (full stripping preserved, install_all
absent); dynamic getBuiltinModule("node:"+x).platform() byte-identical to node.
js_new_function_construct dispatched 'new ns.Ctor()' for tty/fs/vm/tls/wasi/ readline/repl/stream with a direct call to each subsystem's *_new — statically pinning that code in every binary. Extract the 8 into per-module nm_ctor_<m> fns routed through NM_CTOR_REGISTRY, registered by the SAME js_nm_install_<module>() codegen emits at import (no new codegen). Global builtins (URL/WeakSet/Error/ TypedArray) stay inline; http/events/zlib/sqlite already use dynamic dispatch ptrs. Measured: hello-world __text 4,058,968 -> 3,971,252 (repl ctors fully stripped; total from origin/main baseline 4,667,824 -> 3,971,252 = -696,572 / -14.9%). Correct byte-identical to node: new stream.Readable/Writable/Transform, global new URL/TextEncoder/WeakSet/Error/Uint8Array, + all 6 phase-1 cases (os/path/util/ process/getBuiltinModule/require). Residual node_stream/tls/child_process pinned by intra-subsystem refs + method-dispatch internals = phase 3 (diminishing returns).
console.trace() called std::backtrace::Backtrace::force_capture(), printing Rust
frames that (a) symbolicate to __mh_execute_header on stripped release builds and
(b) are inconsistent with Error().stack, which is intentionally coarse
('Real stack traces are not implemented'). Emit the same 'at <anonymous>' frame
Error.stack uses. Also a prerequisite for dropping the std DWARF symbolizer
(gimli/addr2line/dwarf) — force_capture pulls it regardless of panic mode, so it
must go before panic_immediate_abort can strip the ~143KB.
console.trace output verified: 'Trace: <msg>\n at <anonymous>'.
…virt phase 3
The static SUBMODULES array named every submodule's thunks (fs/promises, stream/web,
timers/promises, readline/promises, ...), and find_submodule iterated it — so the
always-linked native-module property / getBuiltinModule paths pinned all 373 thunks
(fs implementation = ~250KB) into every binary.
Split into per-submodule statics + SUBMOD_REGISTRY (mirrors the native-module
registry): find_submodule does a registry lookup; js_node_submod_install_<key>() is
the sole static ref to each spec; codegen emits it at all 6 submodule-resolution
sites (namespace import, await-import, namespace.member, named export-as-function ×2,
multi-path). Dynamic require/getBuiltinModule arm a black_box'd install-all hook run
at the top of the namespace fns.
Measured: hello-world __text 3,971,424 -> 3,716,980 (-254KB). Cumulative from
origin/main baseline 4,667,824 -> 3,716,980 = -950,844 (-20.4%); ~5.4MB -> ~4.3MB.
Correct: import {readFile} from 'node:fs/promises', fs.promises.readFile via native
fs, + 9/9 regression sweep (os/path/util/process/getBuiltinModule/require/stream/
globals/fs-promises) byte-identical to node.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughThis PR implements native-module method dispatch devirtualization across three phases: a new per-module atomic dispatch registry with bucketed install entrypoints and an install-all fallback hook; a submodule devirt registry replacing the static ChangesNM Dispatch Devirtualization
Sequence Diagram(s)sequenceDiagram
participant Codegen as Codegen (lowering)
participant InstallFn as js_nm_install_X / js_node_submod_install_X
participant Registry as NM_DISPATCH_REGISTRY / SUBMOD_REGISTRY
participant DevirtShim as js_process_get_builtin_module_devirt
participant Hook as NM_INSTALL_ALL_HOOK
participant Runtime as nm_dispatch_X / find_submodule
Note over Codegen: Static import path (analyzed module name)
Codegen->>InstallFn: emit call_void before namespace creation
InstallFn->>Registry: atomic store dispatch function ptr
Note over Codegen: Dynamic require path (unanalyzable name)
Codegen->>DevirtShim: process.getBuiltinModule(id)
DevirtShim->>Hook: js_nm_enable_install_all() – arm hook
DevirtShim->>Hook: js_node_submod_enable_install_all() – arm hook
DevirtShim->>Runtime: js_process_get_builtin_module(id)
Runtime->>Hook: nm_run_install_all_hook() → js_nm_install_all
Hook->>Registry: all installers populate all buckets
Note over Codegen: Dispatch at call site
Codegen->>Registry: nm_dispatch_lookup(module_name)
Registry-->>Codegen: NmDispatchFn or None
Codegen->>Runtime: invoke nm_dispatch_X(ctx, method, args)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/runtime_decls/objects.rs`:
- Around line 179-217: The `js_nm_install_v8` function declaration is missing
from the module declaration block, which breaks the devirt symbol contract since
codegen can emit calls to this function. Add the missing
`module.declare_function("js_nm_install_v8", VOID, &[]);` call to the list of
declarations in alphabetical order (it should be positioned between the
`js_nm_install_url` and `js_nm_install_wasi` declarations, following the
existing alphabetical ordering pattern).
In `@crates/perry-runtime/src/node_submodules/mod.rs`:
- Line 789: The installer functions js_node_submod_install_fs_promises at line
789 and the corresponding function at line 803 (likely
js_node_submod_install_sys) only populate SUBMOD_REGISTRY but do not install the
native backing buckets for delegated modules. Since fs_promises depends on
fs.constants and sys delegates to util, these native buckets must also be
installed when their submodule installers are called. Modify both installer
functions to additionally call the appropriate native module bucket install
helpers: for fs_promises, also install the fs native bucket; for sys, also
install the util native bucket. This requires either exposing narrow
crate-visible install helpers from object::native_module_registry or emitting
the paired native installs from codegen at these submodule installation sites.
In `@crates/perry-runtime/src/object/class_registry.rs`:
- Around line 1634-1645: The #[no_mangle] attribute on line 1634 is positioned
before documentation comments, causing it to incorrectly attach to the
nm_ctor_arg function instead of js_new_function_construct due to the intervening
#[inline] attribute. Move the #[no_mangle] attribute from its current position
to directly precede the js_new_function_construct function definition to ensure
that function's symbol is exported without Rust name mangling, which is required
for FFI calls to work correctly with the unmangled symbol name.
In `@crates/perry-runtime/src/object/native_module_registry.rs`:
- Around line 21-24: The `nm_module_index` function does not currently handle
the "assert/strict" module name, causing `nm_dispatch_lookup("assert/strict")`
to return None instead of routing to the Assert bucket. Add a new match arm in
the `nm_module_index` function to map "assert/strict" to
`Some(NmBucket::Assert)`, similar to how "assert" is handled, so that
node:assert/strict methods can be properly dispatched to the Assert bucket.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9f54ead3-3855-4b3e-bc56-d1c0afc00d50
📒 Files selected for processing (20)
NM_DEVIRT_PLAN.mdcrates/perry-codegen/src/expr/dyn_extern_i18n.rscrates/perry-codegen/src/expr/instance_misc1.rscrates/perry-codegen/src/expr/new_dynamic.rscrates/perry-codegen/src/expr/property_get.rscrates/perry-codegen/src/expr/static_field_meta.rscrates/perry-codegen/src/lib.rscrates/perry-codegen/src/lower_call/native_table/node_core_process.rscrates/perry-codegen/src/nm_install.rscrates/perry-codegen/src/runtime_decls/objects.rscrates/perry-codegen/src/runtime_decls/strings.rscrates/perry-codegen/src/runtime_decls/strings_part2.rscrates/perry-runtime/src/builtins/console.rscrates/perry-runtime/src/node_submodules/mod.rscrates/perry-runtime/src/object/class_registry.rscrates/perry-runtime/src/object/mod.rscrates/perry-runtime/src/object/native_module.rscrates/perry-runtime/src/object/native_module_dispatch.rscrates/perry-runtime/src/object/native_module_registry.rscrates/perry-runtime/src/process.rs
- rustfmt the generated devirt code (nm_install.rs, native_module_dispatch/registry, class_registry, node_submodules) — CI lint (fmt) was red. - Unit tests call dispatch/ctor/submodule helpers directly, without the codegen js_nm_install_<module>() that precedes use in real programs, so the registries were empty (SUBMODULES array also removed). Add a #[cfg(test)]-only lazy install-all fallback in nm_dispatch_lookup/nm_ctor_lookup/find_submodule so tests exercise the real registry path; replace the removed SUBMODULES iteration with a test-only ALL_SUBMODULE_SPECS list. Production builds are unchanged (#[cfg(test)] excluded).
The devirt split (37 per-module nm_dispatch_<bucket> fns + closure preludes) grew the file from ~1975 to 3473 LOC. Same rationale as the other allowlisted dispatch tables (class_registry, native_module, native_call_method): one logical generated dispatch surface, kept together.
- objects.rs: declare js_nm_install_v8 (the decl-gen regex [a-z_]+ skipped the digit in 'v8' → codegen could emit an undeclared call on node:v8 paths). - nm_module_index + codegen nm_install_symbol: map tags that appear only as the *second* literal of a multi-pattern arm (assert/strict, http2, https, punycode.default, v8.DefaultSerializer/Deserializer) — they bucketed by first literal so the index missed them and those modules dispatched undefined. Verified: import assert from 'node:assert/strict' now works. - class_registry.rs: move #[no_mangle] back onto js_new_function_construct (the phase-2 ctor block was inserted between the attribute and the fn, so it bound to nm_ctor_arg and left the FFI entrypoint Rust-mangled). - node_submodules: js_node_submod_install_fs_promises/_sys also install the native fs/util buckets they delegate to (fs.constants, sys→util). 8/8 correctness sweep byte-identical to node; hello-world __text 3,710,432 (win kept).
|
Thanks @coderabbitai — all four addressed in 9eb12be:
8/8 correctness sweep byte-identical to node; full perry-runtime suite 1044/1044. |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
Summary
Devirtualizes Perry's monolithic native-module dispatch tables so
-dead_stripcanremove handler code a program never imports. hello-world
__text: 4,667,824 →3,716,980 = −950,844 B (−20.4%); binary ~5.4MB → ~4.3MB. Zero perf cost (pure
linker reachability), behaviour byte-identical to Node.
The problem
Several monolithic functions/tables statically named every native handler from one
always-reachable site, so any program creating one native namespace pinned all of
them,
-dead_stripnotwithstanding:dispatch_native_module_method— 592-arm method dispatcherjs_new_function_construct— node-namespaced constructors (new stream.Readable(), …)SUBMODULEStable — everyfs/promises,stream/web, … thunk (the fs impl)The fix (3 phases, same pattern)
Per-module registries populated by
js_nm_install_<module>()/js_node_submod_install_<key>(),which codegen emits only when the module is statically imported — so each handler is
referenced solely through its install symbol and unimported ones dead-strip:
nm_dispatch_<b>buckets +NM_DISPATCH_REGISTRY(−609KB)nm_ctor_<m>viaNM_CTOR_REGISTRY, reusing the same installs (−87KB)SUBMOD_REGISTRY(−254KB)Dynamic
require/getBuiltinModule(module unknown at compile time) is handled by ablack_box'd install-all hook — armed only by programs that use it, so static imports keepprecise stripping. (
black_boxis required: whole-program opt otherwise speculativelydevirtualizes the single-pointer indirect call and re-pins everything.)
Also:
console.tracenow emits the same coarseat <anonymous>frame asError().stackinstead of
std::backtrace::force_capture()(consistent, and avoids pulling the DWARF symbolizer).Correctness
Byte-identical to
node --experimental-strip-types: hello-world,os,path,util,querystring,assert, globalprocess,getBuiltinModule(dynamic + literal),require,new stream.Readable/Writable/Transform, globalnew URL/TextEncoder/WeakSet/ Error/Uint8Array,import {readFile} from 'node:fs/promises',fs.promisesvia native fs.Notes
for maintainer fold-in at merge.
NM_DEVIRT_PLAN.mddocuments the design, the per-phase measurements, and thediminishing-returns landscape (node_stream/fs residual, panic-symbolizer strip).
feat/size-optimize-npm(this strips intra-crate reachability; thatstrips whole dependency crates via feature-gating).
Summary by CodeRabbit
Release Notes
new <namespace>.<Ctor>()and agetBuiltinModuledevirt entry point.console.trace()to use a coarser,Error().stack-style stack output.