
perf(size): devirtualize native-module dispatch tables (−20% hello-world __text)#5256

Merged
proggeramlug merged 16 commits into main from feat/nm-method-devirt
Jun 16, 2026

proggeramlug (Contributor) commented Jun 16, 2026

Summary

Devirtualizes Perry's monolithic native-module dispatch tables so -dead_strip can
remove handler code a program never imports. hello-world __text: 4,667,824 →
3,716,980 = −950,844 B (−20.4%); binary ~5.4MB → ~4.3MB.
Zero perf cost (pure linker reachability), behaviour byte-identical to Node.

The problem

Several monolithic functions/tables statically named every native handler from one
always-reachable site, so any program creating one native namespace pinned all of
them, -dead_strip notwithstanding:

  • dispatch_native_module_method — 592-arm method dispatcher
  • js_new_function_construct — node-namespaced constructors (new stream.Readable(), …)
  • the SUBMODULES table — every fs/promises, stream/web, … thunk (the fs impl)

The fix (3 phases, same pattern)

Per-module registries populated by js_nm_install_<module>() / js_node_submod_install_<key>(),
which codegen emits only when the module is statically imported — so each handler is
referenced solely through its install symbol and unimported ones dead-strip:

  1. Method dispatch → 37 per-module nm_dispatch_<b> buckets + NM_DISPATCH_REGISTRY (−609KB)
  2. Constructors → per-module nm_ctor_<m> via NM_CTOR_REGISTRY, reusing the same installs (−87KB)
  3. Submodule thunks → per-submodule statics + SUBMOD_REGISTRY (−254KB)
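
The registry pattern shared by the three phases can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the two-module bucket enum, the simplified handler signature, and the string return values are assumptions made for the example; only the names NM_DISPATCH_REGISTRY, nm_module_index, nm_dispatch_lookup, nm_dispatch_<b> and js_nm_install_<module> come from the description above.

```rust
use std::mem::transmute;
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed, simplified handler signature (the real one takes a context and args).
type NmDispatchFn = fn(&str) -> Option<&'static str>;

// Assumed two-module subset of the 37 buckets.
#[derive(Clone, Copy)]
enum NmBucket {
    Os = 0,
    Path = 1,
}

// One slot per module; 0 means "no install call has run for this module".
static NM_DISPATCH_REGISTRY: [AtomicUsize; 2] = [AtomicUsize::new(0), AtomicUsize::new(0)];

fn nm_dispatch_os(method: &str) -> Option<&'static str> {
    match method {
        "platform" => Some("os.platform"),
        "arch" => Some("os.arch"),
        _ => None,
    }
}

fn nm_dispatch_path(method: &str) -> Option<&'static str> {
    match method {
        "join" => Some("path.join"),
        _ => None,
    }
}

// Each install fn is the only place that names its bucket fn, so a program
// that never calls it leaves the bucket unreferenced for the linker.
pub fn js_nm_install_os() {
    let f: NmDispatchFn = nm_dispatch_os;
    NM_DISPATCH_REGISTRY[NmBucket::Os as usize].store(f as usize, Ordering::Release);
}

pub fn js_nm_install_path() {
    let f: NmDispatchFn = nm_dispatch_path;
    NM_DISPATCH_REGISTRY[NmBucket::Path as usize].store(f as usize, Ordering::Release);
}

fn nm_module_index(name: &str) -> Option<NmBucket> {
    match name {
        "os" => Some(NmBucket::Os),
        "path" => Some(NmBucket::Path),
        _ => None,
    }
}

// The router names no bucket fn; it only reads whatever was installed.
fn nm_dispatch_lookup(name: &str) -> Option<NmDispatchFn> {
    let idx = nm_module_index(name)? as usize;
    let raw = NM_DISPATCH_REGISTRY[idx].load(Ordering::Acquire);
    if raw == 0 {
        return None;
    }
    // SAFETY: only the install fns above store non-zero values, and they
    // always store a valid NmDispatchFn pointer.
    Some(unsafe { transmute::<usize, NmDispatchFn>(raw) })
}
```

In this sketch, calling js_nm_install_os() and then nm_dispatch_lookup("os") yields the os bucket, while nm_dispatch_lookup("path") stays None until js_nm_install_path() runs.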

Dynamic require/getBuiltinModule (module unknown at compile time) is handled by a
black_box'd install-all hook — armed only by programs that use it, so static imports keep
precise stripping. (black_box is required: whole-program opt otherwise speculatively
devirtualizes the single-pointer indirect call and re-pins everything.)
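
A rough sketch of an install-all hook of this kind is shown below. It assumes a single atomic slot holding the function pointer and uses a counter in place of the real per-module installs. The counter and the bool return value are hypothetical additions for illustration; js_nm_install_all, js_nm_enable_install_all, nm_run_install_all_hook and NM_INSTALL_ALL_HOOK are the names used in this PR.

```rust
use std::hint::black_box;
use std::mem::transmute;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter standing in for "every bucket got installed".
static INSTALL_ALL_RUNS: AtomicUsize = AtomicUsize::new(0);

// 0 means "not armed"; otherwise the address of the install-all fn.
static NM_INSTALL_ALL_HOOK: AtomicUsize = AtomicUsize::new(0);

fn js_nm_install_all() {
    // The real function would call every js_nm_install_<module>().
    INSTALL_ALL_RUNS.fetch_add(1, Ordering::Relaxed);
}

// Emitted only by programs that resolve modules dynamically. black_box hides
// the stored value from the optimizer so the indirect call below is less
// likely to be turned back into a direct reference to js_nm_install_all.
pub fn js_nm_enable_install_all() {
    let f: fn() = js_nm_install_all;
    NM_INSTALL_ALL_HOOK.store(black_box(f as usize), Ordering::Release);
}

// Always-reachable runtime path: loads an opaque pointer and never names
// js_nm_install_all itself. Returns whether the hook was armed.
pub fn nm_run_install_all_hook() -> bool {
    let raw = black_box(NM_INSTALL_ALL_HOOK.load(Ordering::Acquire));
    if raw == 0 {
        return false;
    }
    // SAFETY: the only non-zero value ever stored is a valid `fn()` pointer.
    let hook = unsafe { transmute::<usize, fn()>(raw) };
    hook();
    true
}

// Helper for inspecting the hypothetical counter.
pub fn install_all_runs() -> usize {
    INSTALL_ALL_RUNS.load(Ordering::Relaxed)
}
```

Whether black_box alone is enough to keep the optimizer from re-pinning the buckets depends on the toolchain; the PR reports that it was sufficient for its builds.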

Also: console.trace now emits the same coarse at <anonymous> frame as Error().stack
instead of std::backtrace::force_capture() (consistent, and avoids pulling the DWARF symbolizer).
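
As an illustration of the resulting output shape, a formatter along these lines would produce the trace text quoted in the commit message ('Trace: <msg>' followed by an indented 'at <anonymous>' line). The function name format_console_trace is hypothetical; the PR's own helper is emit_console_trace_stack(), whose exact signature is not shown here.

```rust
/// Hypothetical helper: builds the coarse trace text for a given message.
/// No backtrace is captured, so nothing here depends on a DWARF symbolizer.
fn format_console_trace(message: &str) -> String {
    format!("Trace: {}\n    at <anonymous>", message)
}
```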

Correctness

Byte-identical to node --experimental-strip-types: hello-world, os, path, util,
querystring, assert, global process, getBuiltinModule (dynamic + literal),
require, new stream.Readable/Writable/Transform, global new URL/TextEncoder/WeakSet/Error/Uint8Array,
import {readFile} from 'node:fs/promises', fs.promises via native fs.

Notes

  • Off-main contributor metadata (version bump / CHANGELOG) intentionally not touched —
    for maintainer fold-in at merge.
  • NM_DEVIRT_PLAN.md documents the design, the per-phase measurements, and the
    diminishing-returns landscape (node_stream/fs residual, panic-symbolizer strip).
  • Complementary to feat/size-optimize-npm (this strips intra-crate reachability; that
    strips whole dependency crates via feature-gating).

Summary by CodeRabbit

Release Notes

  • New Features
    • Added devirtualized native-module dispatch with per-module install hooks to ensure dispatch targets are available during dynamic import/extern handling.
    • Introduced a process-wide registry for Node submodule resolution with “install all” enablement.
    • Added devirtualized constructor routing for new <namespace>.<Ctor>() and a getBuiltinModule devirt entry point.
  • Bug Fixes
    • Updated console.trace() to use a coarser, Error().stack-style stack output.
  • Documentation
    • Added a development plan covering native-module devirtualization, validation, and binary-size tracking.

Ralph Küpper added 10 commits June 16, 2026 07:59
…rt phase 1)

Split the monolithic dispatch_native_module_method (592 arms) into 37 per-module
nm_dispatch_<b> bucket fns reached through a per-module dispatch registry, so the
linker can dead-strip handlers a program never imports. NmCtx + nm_general_closures!
macro carry the old prologue marshalling. Thin router does name extract+normalize
then registry lookup — names no bucket fn. Each js_nm_install_<b>() is the sole
static ref to its bucket; codegen will emit one per static import (next commit).

Runtime compiles green. Codegen install emission + correctness/measurement pending.
See NM_DEVIRT_PLAN.md.
…e sites

Wire the devirt registry: codegen nm_install_symbol() maps each native-module name
to its dispatch-install symbol; all 5 js_create_native_module_namespace sites now
emit it before creating the namespace, so the module's bucket is registered before
any method call. Externs declared in runtime_decls/objects.rs.

Verified byte-identical to node: hello-world, import os (platform/EOL/arch),
import path (join/basename/extname), global process (cwd/pid/argv/platform).
hello-world emits 0 installs (imports nothing) → method-dispatch handlers for
unimported modules drop (child_process syms 136→45, cluster 41→31; residual pinned
by the still-monolithic constructor dispatcher = phase 2).
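
The codegen side of this wiring might look roughly like the sketch below. It is an assumption-laden illustration: the three-module list, the textual "IR" strings, and the emit_namespace_creation helper are invented for the example; only nm_install_symbol, js_nm_install_<module> and js_create_native_module_namespace are names from the commit message.

```rust
/// Assumed subset of modules that have a dispatch bucket.
const NM_INSTALL_MODULES: &[&str] = &["os", "path", "util"];

/// Maps a native-module name to its install symbol, if it has one.
fn nm_install_symbol(module: &str) -> Option<String> {
    if NM_INSTALL_MODULES.contains(&module) {
        Some(format!("js_nm_install_{}", module))
    } else {
        None
    }
}

/// Hypothetical lowering step: emit the install call (when one exists)
/// before the namespace is created, so the bucket is registered before any
/// method call can reach the router.
fn emit_namespace_creation(module: &str, out: &mut Vec<String>) {
    if let Some(sym) = nm_install_symbol(module) {
        out.push(format!("call_void {}", sym));
    }
    out.push(format!("call js_create_native_module_namespace \"{}\"", module));
}
```

A program that imports nothing goes through no such site, which matches the "hello-world emits 0 installs" observation above.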
…size regression)

Runtime-resolved builtins (process.getBuiltinModule(spec)) can't get a codegen
per-module install since the module name isn't known at compile time. Add an
indirect install-all hook: native_module_get_builtin_module_value runs
nm_run_install_all_hook() (loads an opaque fn-ptr, never names js_nm_install_all),
armed by js_nm_enable_install_all() which only the getBuiltinModule codegen
wrapper (js_process_get_builtin_module_devirt) emits. black_box hides the stored
pointer so whole-program opt can't speculatively devirtualize the indirect call
back into a direct js_nm_install_all reference (that re-pinned every bucket).

Verified: hello-world __text 4,058,968 (full stripping preserved, install_all
absent); dynamic getBuiltinModule("node:"+x).platform() byte-identical to node.

js_new_function_construct dispatched 'new ns.Ctor()' for tty/fs/vm/tls/wasi/
readline/repl/stream with a direct call to each subsystem's *_new — statically
pinning that code in every binary. Extract the 8 into per-module nm_ctor_<m> fns
routed through NM_CTOR_REGISTRY, registered by the SAME js_nm_install_<module>()
codegen emits at import (no new codegen). Global builtins (URL/WeakSet/Error/
TypedArray) stay inline; http/events/zlib/sqlite already use dynamic dispatch ptrs.

Measured: hello-world __text 4,058,968 -> 3,971,252 (repl ctors fully stripped;
total from origin/main baseline 4,667,824 -> 3,971,252 = -696,572 / -14.9%).
Correct byte-identical to node: new stream.Readable/Writable/Transform, global
new URL/TextEncoder/WeakSet/Error/Uint8Array, + all 6 phase-1 cases (os/path/util/
process/getBuiltinModule/require). Residual node_stream/tls/child_process pinned by
intra-subsystem refs + method-dispatch internals = phase 3 (diminishing returns).

console.trace() called std::backtrace::Backtrace::force_capture(), printing Rust
frames that (a) symbolicate to __mh_execute_header on stripped release builds and
(b) are inconsistent with Error().stack, which is intentionally coarse
('Real stack traces are not implemented'). Emit the same 'at <anonymous>' frame
Error.stack uses. Also a prerequisite for dropping the std DWARF symbolizer
(gimli/addr2line/dwarf) — force_capture pulls it regardless of panic mode, so it
must go before panic_immediate_abort can strip the ~143KB.

console.trace output verified: 'Trace: <msg>\n    at <anonymous>'.
…virt phase 3

The static SUBMODULES array named every submodule's thunks (fs/promises, stream/web,
timers/promises, readline/promises, ...), and find_submodule iterated it — so the
always-linked native-module property / getBuiltinModule paths pinned all 373 thunks
(fs implementation = ~250KB) into every binary.

Split into per-submodule statics + SUBMOD_REGISTRY (mirrors the native-module
registry): find_submodule does a registry lookup; js_node_submod_install_<key>() is
the sole static ref to each spec; codegen emits it at all 6 submodule-resolution
sites (namespace import, await-import, namespace.member, named export-as-function ×2,
multi-path). Dynamic require/getBuiltinModule arm a black_box'd install-all hook run
at the top of the namespace fns.

Measured: hello-world __text 3,971,424 -> 3,716,980 (-254KB). Cumulative from
origin/main baseline 4,667,824 -> 3,716,980 = -950,844 (-20.4%); ~5.4MB -> ~4.3MB.
Correct: import {readFile} from 'node:fs/promises', fs.promises.readFile via native
fs, + 9/9 regression sweep (os/path/util/process/getBuiltinModule/require/stream/
globals/fs-promises) byte-identical to node.
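
The cumulative figures quoted in the commit messages can be cross-checked with a few lines of arithmetic. The sketch below only re-derives numbers already stated above; the function and constant names are chosen for the example.

```rust
/// Bytes saved and percentage reduction going from `before` to `after`.
fn text_delta(before: u64, after: u64) -> (u64, f64) {
    let saved = before - after;
    (saved, saved as f64 * 100.0 / before as f64)
}

// hello-world __text sizes in bytes, as quoted in the commit messages.
const BASELINE: u64 = 4_667_824; // origin/main
const AFTER_PHASE_1: u64 = 4_058_968;
const AFTER_PHASE_2: u64 = 3_971_252;
const AFTER_PHASE_3: u64 = 3_716_980;
```

With these inputs, text_delta(BASELINE, AFTER_PHASE_3) gives 950,844 bytes saved, about 20.4% of the baseline, and text_delta(BASELINE, AFTER_PHASE_2) gives 696,572 bytes, about 14.9%.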

coderabbitai (Bot) commented Jun 16, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ed4fe164-16ad-485c-9fc8-51f38b6ed8a1

📥 Commits

Reviewing files that changed from the base of the PR and between aa6774a and aa9150e.

📒 Files selected for processing (9)
  • crates/perry-codegen/src/nm_install.rs
  • crates/perry-codegen/src/runtime_decls/objects.rs
  • crates/perry-runtime/src/node_submodules/mod.rs
  • crates/perry-runtime/src/node_submodules/tests.rs
  • crates/perry-runtime/src/object/class_registry.rs
  • crates/perry-runtime/src/object/mod.rs
  • crates/perry-runtime/src/object/native_module_dispatch.rs
  • crates/perry-runtime/src/object/native_module_registry.rs
  • scripts/check_file_size.sh

📝 Walkthrough

Walkthrough

This PR implements native-module method dispatch devirtualization across three phases: a new per-module atomic dispatch registry with bucketed install entrypoints and an install-all fallback hook; a submodule devirt registry replacing the static SUBMODULES table; per-module constructor dispatch handlers replacing inline match chains; codegen-side nm_install.rs utilities that emit install calls at every namespace creation site; a process.getBuiltinModule devirt shim enabling dynamic-require paths to arm install hooks; and a simplified console.trace that drops DWARF symbolization.

Changes

NM Dispatch Devirtualization

Layer: Codegen install-symbol lookup utilities and FFI declarations
File(s): crates/perry-codegen/src/nm_install.rs, crates/perry-codegen/src/lib.rs, crates/perry-codegen/src/runtime_decls/objects.rs, crates/perry-codegen/src/runtime_decls/strings.rs, crates/perry-codegen/src/runtime_decls/strings_part2.rs
Summary: Adds nm_install_symbol, nm_submod_install_symbol, and their symbol-list constants; registers the nm_install module; extends runtime_decls to declare all per-module install symbols and the devirt process symbol.

Layer: Native-module dispatch registry
File(s): crates/perry-runtime/src/object/native_module_registry.rs, crates/perry-runtime/src/object/mod.rs
Summary: New generated file defining bucket enum, global atomic dispatch-function pointer array, nm_module_index, nm_dispatch_lookup, per-module js_nm_install_* entrypoints, the NM_INSTALL_ALL_HOOK atomic plus js_nm_enable_install_all/nm_run_install_all_hook, and a parallel per-module constructor registry with nm_ctor_lookup/nm_register_ctor.

Layer: Submodule devirt registry
File(s): crates/perry-runtime/src/node_submodules/mod.rs, crates/perry-runtime/src/node_submodules/tests.rs
Summary: Replaces the static SUBMODULES slice with 15 discrete SUBMOD_* spec constants, defines SubmodBucket and SUBMOD_REGISTRY as an atomic array, adds all js_node_submod_install_*/install_all/enable_install_all FFI entrypoints, wires run_submod_install_all_hook into both namespace resolution paths, and updates the test to check the new spec constants.

Layer: Constructor devirt per-module handlers
File(s): crates/perry-runtime/src/object/class_registry.rs
Summary: Adds per-module constructor handler functions (nm_ctor_tty, nm_ctor_fs, nm_ctor_vm, nm_ctor_tls, nm_ctor_wasi, nm_ctor_readline, nm_ctor_repl, nm_ctor_stream) and replaces the inline match chain in js_new_function_construct with a nm_ctor_lookup registry call.

Layer: process.getBuiltinModule devirt shim and dynamic-require hook
File(s): crates/perry-runtime/src/process.rs, crates/perry-runtime/src/object/native_module.rs, crates/perry-codegen/src/lower_call/native_table/node_core_process.rs
Summary: Adds js_process_get_builtin_module_devirt shim enabling both install-all hooks before delegating; retargets codegen dispatch to the devirt symbol; inserts nm_run_install_all_hook into native_module_get_builtin_module_value for the dynamic-require path.

Layer: Codegen expr lowering: install calls at namespace creation sites
File(s): crates/perry-codegen/src/expr/dyn_extern_i18n.rs, crates/perry-codegen/src/expr/static_field_meta.rs, crates/perry-codegen/src/expr/property_get.rs, crates/perry-codegen/src/expr/new_dynamic.rs, crates/perry-codegen/src/expr/instance_misc1.rs
Summary: Adds conditional nm_install_symbol/nm_submod_install_symbol lookups and call_void emissions before every native-module namespace or submodule resolution call across all relevant expression lowering paths.

Layer: console.trace coarse-stack simplification
File(s): crates/perry-runtime/src/builtins/console.rs
Summary: Removes std::backtrace::Backtrace::force_capture() and DWARF symbolization; replaces with emit_console_trace_stack() emitting a single at <anonymous> line.

Layer: NM devirtualization plan document and build script updates
File(s): NM_DEVIRT_PLAN.md, scripts/check_file_size.sh
Summary: Adds architecture plan recording 37-bucket design, registry mechanism, codegen expectations, validation results, binary-size deltas, and status notes; allowlists the oversized native dispatch file.

Sequence Diagram(s)

sequenceDiagram
    participant Codegen as Codegen (lowering)
    participant InstallFn as js_nm_install_X / js_node_submod_install_X
    participant Registry as NM_DISPATCH_REGISTRY / SUBMOD_REGISTRY
    participant DevirtShim as js_process_get_builtin_module_devirt
    participant Hook as NM_INSTALL_ALL_HOOK
    participant Runtime as nm_dispatch_X / find_submodule

    Note over Codegen: Static import path (analyzed module name)
    Codegen->>InstallFn: emit call_void before namespace creation
    InstallFn->>Registry: atomic store dispatch function ptr

    Note over Codegen: Dynamic require path (unanalyzable name)
    Codegen->>DevirtShim: process.getBuiltinModule(id)
    DevirtShim->>Hook: js_nm_enable_install_all() – arm hook
    DevirtShim->>Hook: js_node_submod_enable_install_all() – arm hook
    DevirtShim->>Runtime: js_process_get_builtin_module(id)
    Runtime->>Hook: nm_run_install_all_hook() → js_nm_install_all
    Hook->>Registry: all installers populate all buckets

    Note over Codegen: Dispatch at call site
    Codegen->>Registry: nm_dispatch_lookup(module_name)
    Registry-->>Codegen: NmDispatchFn or None
    Codegen->>Runtime: invoke nm_dispatch_X(ctx, method, args)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PerryTS/perry#5220: Modifies the HIR to lower require("<native>") to Expr::NativeModuleRef (namespace-member dispatch), which directly targets the main PR's added Expr::NativeModuleRef lowering logic that installs the devirt dispatch-bucket symbol before creating the native module namespace.

Poem

🐇 Hoppity-hop through the registry lane,
Each module's installed, no dispatch in vain!
Atomic buckets lined up in a row,
Dead-strip the extras and watch the size go!
One <anonymous> trace keeps it clean and bright —
A rabbit devirts code from morning to night. 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: devirtualizing native-module dispatch tables with a specific binary size improvement metric (−20% hello-world __text).
  • Description check: ✅ Passed. The PR description comprehensively covers the problem, the fix across three phases, correctness validation, and implementation notes. It follows the template structure with Summary, Changes, and Testing sections.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/runtime_decls/objects.rs`:
- Around line 179-217: The `js_nm_install_v8` function declaration is missing
from the module declaration block, which breaks the devirt symbol contract since
codegen can emit calls to this function. Add the missing
`module.declare_function("js_nm_install_v8", VOID, &[]);` call to the list of
declarations in alphabetical order (it should be positioned between the
`js_nm_install_url` and `js_nm_install_wasi` declarations, following the
existing alphabetical ordering pattern).

In `@crates/perry-runtime/src/node_submodules/mod.rs`:
- Line 789: The installer functions js_node_submod_install_fs_promises at line
789 and the corresponding function at line 803 (likely
js_node_submod_install_sys) only populate SUBMOD_REGISTRY but do not install the
native backing buckets for delegated modules. Since fs_promises depends on
fs.constants and sys delegates to util, these native buckets must also be
installed when their submodule installers are called. Modify both installer
functions to additionally call the appropriate native module bucket install
helpers: for fs_promises, also install the fs native bucket; for sys, also
install the util native bucket. This requires either exposing narrow
crate-visible install helpers from object::native_module_registry or emitting
the paired native installs from codegen at these submodule installation sites.

In `@crates/perry-runtime/src/object/class_registry.rs`:
- Around line 1634-1645: The #[no_mangle] attribute on line 1634 is positioned
before documentation comments, causing it to incorrectly attach to the
nm_ctor_arg function instead of js_new_function_construct due to the intervening
#[inline] attribute. Move the #[no_mangle] attribute from its current position
to directly precede the js_new_function_construct function definition to ensure
that function's symbol is exported without Rust name mangling, which is required
for FFI calls to work correctly with the unmangled symbol name.

In `@crates/perry-runtime/src/object/native_module_registry.rs`:
- Around line 21-24: The `nm_module_index` function does not currently handle
the "assert/strict" module name, causing `nm_dispatch_lookup("assert/strict")`
to return None instead of routing to the Assert bucket. Add a new match arm in
the `nm_module_index` function to map "assert/strict" to
`Some(NmBucket::Assert)`, similar to how "assert" is handled, so that
node:assert/strict methods can be properly dispatched to the Assert bucket.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9f54ead3-3855-4b3e-bc56-d1c0afc00d50

📥 Commits

Reviewing files that changed from the base of the PR and between 7f0c2fe and eae4723.

📒 Files selected for processing (20)
  • NM_DEVIRT_PLAN.md
  • crates/perry-codegen/src/expr/dyn_extern_i18n.rs
  • crates/perry-codegen/src/expr/instance_misc1.rs
  • crates/perry-codegen/src/expr/new_dynamic.rs
  • crates/perry-codegen/src/expr/property_get.rs
  • crates/perry-codegen/src/expr/static_field_meta.rs
  • crates/perry-codegen/src/lib.rs
  • crates/perry-codegen/src/lower_call/native_table/node_core_process.rs
  • crates/perry-codegen/src/nm_install.rs
  • crates/perry-codegen/src/runtime_decls/objects.rs
  • crates/perry-codegen/src/runtime_decls/strings.rs
  • crates/perry-codegen/src/runtime_decls/strings_part2.rs
  • crates/perry-runtime/src/builtins/console.rs
  • crates/perry-runtime/src/node_submodules/mod.rs
  • crates/perry-runtime/src/object/class_registry.rs
  • crates/perry-runtime/src/object/mod.rs
  • crates/perry-runtime/src/object/native_module.rs
  • crates/perry-runtime/src/object/native_module_dispatch.rs
  • crates/perry-runtime/src/object/native_module_registry.rs
  • crates/perry-runtime/src/process.rs

proggeramlug and others added 5 commits June 16, 2026 04:09
- rustfmt the generated devirt code (nm_install.rs, native_module_dispatch/registry,
  class_registry, node_submodules) — CI lint (fmt) was red.
- Unit tests call dispatch/ctor/submodule helpers directly, without the codegen
  js_nm_install_<module>() that precedes use in real programs, so the registries were
  empty (SUBMODULES array also removed). Add a #[cfg(test)]-only lazy install-all
  fallback in nm_dispatch_lookup/nm_ctor_lookup/find_submodule so tests exercise the
  real registry path; replace the removed SUBMODULES iteration with a test-only
  ALL_SUBMODULE_SPECS list. Production builds are unchanged (#[cfg(test)] excluded).

The devirt split (37 per-module nm_dispatch_<bucket> fns + closure preludes) grew the
file from ~1975 to 3473 LOC. Same rationale as the other allowlisted dispatch tables
(class_registry, native_module, native_call_method): one logical generated dispatch
surface, kept together.

- objects.rs: declare js_nm_install_v8 (the decl-gen regex [a-z_]+ skipped the
  digit in 'v8' → codegen could emit an undeclared call on node:v8 paths).
- nm_module_index + codegen nm_install_symbol: map tags that appear only as the
  *second* literal of a multi-pattern arm (assert/strict, http2, https,
  punycode.default, v8.DefaultSerializer/Deserializer) — they bucketed by first
  literal so the index missed them and those modules dispatched undefined.
  Verified: import assert from 'node:assert/strict' now works.
- class_registry.rs: move #[no_mangle] back onto js_new_function_construct (the
  phase-2 ctor block was inserted between the attribute and the fn, so it bound
  to nm_ctor_arg and left the FFI entrypoint Rust-mangled).
- node_submodules: js_node_submod_install_fs_promises/_sys also install the
  native fs/util buckets they delegate to (fs.constants, sys→util).

8/8 correctness sweep byte-identical to node; hello-world __text 3,710,432 (win kept).
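
One plausible way a character class like [a-z_]+ drops js_nm_install_v8 is sketched below with a hand-written scanner (standard library only, standing in for the regex). The helper names are hypothetical, and the commit message does not say how the declaration generator itself was changed, only that the missing declaration was added.

```rust
/// Longest prefix of `s` made only of characters in [a-z_],
/// i.e. what a pattern like `[a-z_]+` would capture from the start of `s`.
fn scan_lower_ident(s: &str) -> &str {
    let end = s
        .find(|c: char| !(c.is_ascii_lowercase() || c == '_'))
        .unwrap_or(s.len());
    &s[..end]
}

/// Same scan with digits admitted, i.e. `[a-z0-9_]+`.
fn scan_lower_alnum_ident(s: &str) -> &str {
    let end = s
        .find(|c: char| !(c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_'))
        .unwrap_or(s.len());
    &s[..end]
}
```

Under the first scan, "js_nm_install_v8" is cut off at the digit and no longer names a real install symbol, while names such as "js_nm_install_url" pass through unchanged.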

@proggeramlug (Contributor, Author)

Thanks @coderabbitai — all four addressed in 9eb12be:

  1. js_nm_install_v8 missing decl — the decl-gen regex [a-z_]+ skipped the digit in v8; declared it (and the v8.Default* tags below route to it).
  2. #[no_mangle] on wrong fn — moved back onto js_new_function_construct (the phase-2 ctor block had been inserted between the attribute and the fn). Verified the unmangled entrypoint is restored.
  3. nm_module_index missing assert/strict — this was systematic: tags appearing only as the second literal of a multi-pattern arm were missed by the first-literal bucketer. Added all six (assert/strict, http2, https, punycode.default, v8.DefaultSerializer/Deserializer) to both nm_module_index and codegen nm_install_symbol. import assert from 'node:assert/strict' now dispatches correctly.
  4. Submodule→native delegation: js_node_submod_install_fs_promises/_sys now also install the fs/util native buckets they delegate to.

8/8 correctness sweep byte-identical to node; full perry-runtime suite 1044/1044.
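
The first-literal bucketing problem from item 3 can be illustrated with a small index builder. Everything here is a sketch under assumptions: the arm table, the two-variant bucket enum and both builder functions are invented for the example, using only the assert/strict and v8.DefaultSerializer tags mentioned above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NmBucket {
    Assert,
    V8,
}

// Each entry models one multi-pattern match arm: all tags it accepts,
// plus the bucket that arm belongs to (assumed subset).
const ARMS: &[(&[&str], NmBucket)] = &[
    (&["assert", "assert/strict"], NmBucket::Assert),
    (&["v8", "v8.DefaultSerializer"], NmBucket::V8),
];

/// Buggy variant: indexes only the first literal of each arm.
fn index_first_literal_only(
    arms: &[(&'static [&'static str], NmBucket)],
) -> HashMap<&'static str, NmBucket> {
    arms.iter().map(|(tags, bucket)| (tags[0], *bucket)).collect()
}

/// Fixed variant: indexes every literal of each arm.
fn index_all_literals(
    arms: &[(&'static [&'static str], NmBucket)],
) -> HashMap<&'static str, NmBucket> {
    arms.iter()
        .flat_map(|(tags, bucket)| tags.iter().map(move |tag| (*tag, *bucket)))
        .collect()
}
```

With the first builder, a lookup for "assert/strict" misses; with the second it resolves to the Assert bucket, which corresponds to the behaviour change described for node:assert/strict.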

coderabbitai (Bot) commented Jun 16, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

@proggeramlug proggeramlug merged commit 9efd247 into main Jun 16, 2026
1 of 2 checks passed
@proggeramlug proggeramlug deleted the feat/nm-method-devirt branch June 16, 2026 12:36