Merged
16 changes: 16 additions & 0 deletions .github/workflows/ci.yml
@@ -45,6 +45,22 @@ jobs:
      - uses: Swatinem/rust-cache@v2
      - run: cargo bench --bench fork_for_serving -- --warm-up-time 3 --measurement-time 5 --sample-size 50

  startup-benchmark:
    name: Startup benchmark
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - name: Build release forge and static runtime
        run: cargo build --release --lib --bin forge
      - name: Build startup measurement harness
        run: rustc tools/startup_time.rs -O -o target/startup_time
      - name: Measure startup modes
        env:
          FORGE_LIB_DIR: target/release
        run: ./target/startup_time --forge ./target/release/forge --warmups 2 --reps 20

  fmt:
    name: Format
    runs-on: ubuntu-latest
131 changes: 131 additions & 0 deletions .planning/startup-time-under-10ms.plan.md
@@ -0,0 +1,131 @@
# Startup Time Under 10ms Measurement Plan

## Roadmap Item

- `ROADMAP.md`: `Startup time: < 10ms (vs ~100ms for interpreter)`

## Scope Decision

This PR does **not** claim the `<10ms` target is achieved. It establishes repeatable startup measurement and report-only CI visibility so the next optimization PR can be judged against real data. The roadmap checkbox must stay unchecked until the measured target is met.

The target should apply to the standalone/native execution path, not ordinary `forge run app.fg`. A source file run still has to start the Rust CLI, parse CLI args, read source, lex, parse, typecheck, initialize runtime state, and execute. The native/standalone path is the only realistic place for `<10ms`.

## Current State

- `forge run app.fg` goes through the full CLI/frontend/interpreter path.
- `forge run app.fgc` skips lex/parse but still starts the CLI and VM.
- `forge build --native` can now produce a standalone source-runtime binary when `libforge_lang.a` is present.
- Existing `benches/fork_for_serving.rs` measures per-request fork cost, not process startup.
- There is no repeatable startup benchmark, no CI trend signal, and no agreed measurement definition.

## Measurement Definition

Measure cold-ish process startup wall time from parent process spawn to child process exit for short-lived programs.

Initial modes:

1. `source-run`: `forge run hello.fg`
2. `bytecode-run`: `forge run hello.fgc`
3. `native-source-runtime`: generated `forge build --native hello.fg` binary when `libforge_lang.a` is available
4. `aot-bytecode`: generated `forge build --aot hello.fg` binary when `libforge_lang.a` is available

Short-lived fixture:

```forge
println("ok")
```

The harness must assert correctness on every run. A child process that exits nonzero, segfaults, times out, or prints unexpected output must fail the measurement instead of looking like a fast startup.

Use a small `println("ok")` fixture for every mode so the harness can assert stdout-based correctness. Avoid server startup, networking, shell builtins, or filesystem writes in the measured child program.
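The spawn-to-exit definition and the correctness rule can be sketched with the standard library alone. This is a minimal illustration, not the harness's actual code: `measure_once` and its signature are hypothetical names introduced here, and the timed span includes draining the child's stdout pipe.

```rust
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Spawn `program` with `args`, wait for it to exit, and return the elapsed
/// wall time. Fails if the child cannot be spawned, exits unsuccessfully
/// (including death by signal), or prints anything other than `expected`.
/// Hypothetical helper: the name and signature are assumptions.
fn measure_once(program: &str, args: &[&str], expected: &str) -> Result<Duration, String> {
    let mut cmd = Command::new(program);
    cmd.args(args).stdin(Stdio::null());

    // The clock covers spawn, execution, and exit of the child.
    let start = Instant::now();
    let output = cmd
        .output()
        .map_err(|e| format!("failed to spawn {}: {}", program, e))?;
    let elapsed = start.elapsed();

    // A failing child must never be reported as a fast startup.
    if !output.status.success() {
        return Err(format!("{} exited with {}", program, output.status));
    }
    let stdout = String::from_utf8_lossy(&output.stdout);
    if stdout.trim_end() != expected {
        return Err(format!("{} printed {:?}, expected {:?}", program, stdout, expected));
    }
    Ok(elapsed)
}
```

Under these assumptions, a call such as `measure_once("./target/release/forge", &["run", "hello.fg"], "ok")` would return a duration only when the child both succeeds and prints `ok`.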

## Implementation Units

### U1. Startup Measurement Harness

Files:
- Create: `tools/startup_time.rs` or `tests/startup_time.rs` as a small Rust harness binary/test helper
- Modify: `Cargo.toml` only if using a cargo bench/bin target is necessary

Do **not** use Criterion for process startup measurement. Criterion is optimized for in-process function benchmarking and its warmup/statistical model is a poor fit for fork/exec wall time.

Add a custom wall-time harness (or a thin wrapper around `hyperfine` only if introducing that dependency/tool is cleaner) that:
- Locates the `forge` binary under test.
- Creates an isolated temp fixture directory.
- Writes `hello.fg`.
- Builds `hello.fgc`.
- Requires the caller/CI job to provide `FORGE_LIB_DIR` pointing at an existing `libforge_lang.a`.
- Builds native artifacts with `FORGE_LIB_DIR` set so standalone modes are actually measured.
- Measures process spawn-to-exit wall time for each mode using `std::process::Command` and `Instant`.
- Runs enough repetitions to report min/median/p95 or min/mean/p95.
- Asserts every child exits successfully and emits expected output where applicable.
- Times out child processes so hangs fail fast.
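The timeout requirement in the list above could be met without extra dependencies. A minimal sketch, assuming a polling loop around `Child::try_wait`; the name `wait_with_timeout` and the 200µs poll interval are illustrative choices, not taken from the harness.

```rust
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Wait for `child` to exit, polling until `timeout` elapses. On timeout the
/// child is killed and reaped, and an error is returned so a hang fails the
/// measurement instead of stalling the job.
/// Hypothetical helper: the name and poll interval are assumptions.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> Result<ExitStatus, String> {
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(status)) => return Ok(status),
            Ok(None) if Instant::now() >= deadline => {
                // Best effort: the child may exit between the poll and the kill.
                let _ = child.kill();
                let _ = child.wait();
                return Err(format!("child timed out after {:?}", timeout));
            }
            // Still running: sleep briefly before polling again.
            Ok(None) => thread::sleep(Duration::from_micros(200)),
            Err(e) => return Err(format!("failed to poll child: {}", e)),
        }
    }
}
```

The trade-off of polling is that each sample can be overstated by up to one poll interval, which matters when the target is in the single-digit millisecond range. A blocking `wait` on a helper thread would avoid that at the cost of more code.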

Harness output should be simple, line-oriented, and easy to paste into PRs, for example:

```text
startup.source_run median=...
startup.bytecode_run median=...
startup.native_source_runtime median=...
startup.aot_bytecode median=...
```
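One way to turn raw samples into such a line is sketched here. The plan's example elides the values, so the millisecond unit, the three-decimal precision, and the extra `min=` and `p95=` fields are assumptions, as are the names `Summary`, `summarize`, and `format_line`. Percentiles use the nearest-rank method, so the median of an even-sized sample is the lower of the two middle values.

```rust
use std::time::Duration;

/// Min, median, and 95th percentile of a set of samples.
#[derive(Debug, PartialEq)]
struct Summary {
    min: Duration,
    median: Duration,
    p95: Duration,
}

/// Summarize samples with the nearest-rank method.
/// Returns `None` when there are no samples.
fn summarize(samples: &[Duration]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: the smallest sample with at least p% of samples at or below it.
    let rank = |p: usize| {
        let idx = (p * sorted.len() + 99) / 100; // ceil(p * n / 100), 1-based
        sorted[idx.max(1) - 1]
    };
    Some(Summary {
        min: sorted[0],
        median: rank(50),
        p95: rank(95),
    })
}

/// Convert a duration to fractional milliseconds.
fn ms(d: Duration) -> f64 {
    d.as_secs_f64() * 1000.0
}

/// Render one line-oriented report entry (the exact field set is an assumption).
fn format_line(mode: &str, s: &Summary) -> String {
    format!(
        "startup.{} min={:.3}ms median={:.3}ms p95={:.3}ms",
        mode,
        ms(s.min),
        ms(s.median),
        ms(s.p95)
    )
}
```

With 20 repetitions, as in the CI invocation, the nearest-rank p95 is the 19th smallest sample, so a single outlier does not set the reported p95.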

### U2. Report-Only CI Job

Files:
- Modify: `.github/workflows/ci.yml`

Add a startup benchmark job that:
- Builds the Forge binary in release mode.
- Builds `libforge_lang.a` explicitly.
- Sets `FORGE_LIB_DIR` to the directory containing `libforge_lang.a`.
- Runs the startup measurement harness.

Keep this report-only for now:
- The job should fail if the harness does not compile/run or any measured child fails/times out.
- It should not fail because the measured value is above 10ms yet.

Rationale: shared CI runners are noisy; the first step is a trend signal.
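The report-only policy can be expressed as a small decision function. This is a sketch under stated assumptions: `ModeResult`, `report`, and the message wording are hypothetical, and the budget is passed in rather than hard-coded.

```rust
use std::time::Duration;

/// Outcome of measuring one startup mode (hypothetical type).
struct ModeResult {
    name: &'static str,
    /// `Ok(median)` when every run passed, `Err(reason)` otherwise.
    median: Result<Duration, String>,
}

/// Report-only policy: a failed mode fails the job, a slow mode only
/// produces a note. Returns the process exit code and the lines to print.
fn report(results: &[ModeResult], budget: Duration) -> (i32, Vec<String>) {
    let mut code = 0;
    let mut lines = Vec::new();
    for r in results {
        match &r.median {
            // Over budget: visible in the log, but not a failure yet.
            Ok(m) if *m > budget => lines.push(format!(
                "note: {} median {:?} exceeds {:?} budget (not enforced)",
                r.name, m, budget
            )),
            Ok(m) => lines.push(format!("ok: {} median {:?}", r.name, m)),
            // Harness or child failure: always fatal.
            Err(e) => {
                code = 1;
                lines.push(format!("error: {} failed: {}", r.name, e));
            }
        }
    }
    (code, lines)
}
```

Turning this into a hard gate later would only require changing the over-budget arm to set a nonzero exit code.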

### U3. Budget Documentation

Files:
- Create: `docs/performance/startup.md` or update an existing performance doc if one exists
- Modify: `CHANGELOG.md`

Document:
- Measurement modes and what each means.
- Why `<10ms` applies to standalone/native startup, not `forge run`.
- Current status: report-only startup harness exists; hard gate follows after optimization.
- Future hard-gate proposal: native startup p50/p95 budget once stable baseline is known.
- CI explicitly builds and measures the standalone native path; native modes must not be silently skipped.

### U4. Local Developer Command

Files:
- Optional create: `scripts/measure_startup.sh`

Add a script only if it materially improves developer ergonomics by wrapping the Rust harness with the right release-build and `FORGE_LIB_DIR` setup. Avoid duplicating measurement logic between shell and Rust.

## Risks

- Process startup benchmarks are noisy on GitHub-hosted runners.
- Harness setup must not accidentally measure build time.
- Native source-runtime binaries embed the interpreter and may not get close to `<10ms`; if so, the next item may require a bytecode/native runner fast path rather than optimizing the source-runtime path.
- Launcher-mode native binaries must be labeled separately from standalone source-runtime binaries; the roadmap target cares about standalone.
- Without storing historical baselines, CI output is visibility-only; this PR should not pretend to provide trend analysis yet.
- The native measurements require a working C compiler (`cc`) and static library; CI must install/use the available platform toolchain explicitly.

## Verification

- `cargo fmt -- --check`
- `cargo test`
- `cargo clippy --all-targets -- -A clippy::approx_constant -A clippy::result_large_err -A clippy::only_used_in_recursion -A clippy::len_zero`
- The new startup measurement command/harness
- Existing Forge integration tests remain green.

## Success Criteria

- Developers can run one command to see startup timings for source, bytecode, and available native modes.
- CI prints startup timings as benchmark output, so regressions are visible in job logs even though no historical baseline is stored yet.
- The roadmap item remains unchecked, with a clear next optimization target based on measured data.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- **Standalone source-runtime native binaries for Forge servers** — `forge build --native` now links against `libforge_lang.a` when available and emits a single executable that embeds Forge source and starts interpreter-only runtime features like `@server` without shelling out to the `forge` CLI. `--aot` remains bytecode/VM-only and continues to reject decorator-driven servers with guidance to use `--native`.
- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the PR reference link to this changelog entry.

This entry is missing the required ([#PR](link)) suffix.

Suggested fix
-- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate.
+- **Startup time measurement harness** — `tools/startup_time.rs` measures source, bytecode, native source-runtime, and bytecode AOT process startup with correctness checks. CI runs it as a report-only signal before the `<10ms` native startup target becomes a hard gate. ([`#149`](https://github.com/humancto/forge-lang/pull/149))
As per coding guidelines, `CHANGELOG.md` entries must use the format `- Description of change ([`#PR`](link))` under the `[Unreleased]` section.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 13, Update the CHANGELOG entry for "Startup time
measurement harness" by appending the required PR reference suffix in the format
" ([`#PR`](link))" so it reads "- **Startup time measurement harness** —
`tools/startup_time.rs` … correctness checks. CI runs it as a report-only signal
before the `<10ms` native startup target becomes a hard gate. ([`#PR`](link))";
replace "PR" and "link" with the actual pull request number and its GitHub URL
and ensure the entry remains under the [Unreleased] section.

- **Structured concurrency with `squad` blocks** — `squad { spawn { } spawn { } }` runs tasks concurrently with automatic join, cooperative cancellation on failure, and error propagation. Returns an array of results in spawn order. Works in both interpreter and VM engines.
- **First-class `Set` type** — `set([1, 2, 3])` or `set((1, 2, 3))` builds a deduplicated set. Methods: `.has(x)`, `.add(x)`, `.remove(x)`, `.union(other)`, `.intersect(other)`, `.diff(other)`, `.to_array()`. Supports `len()`, `contains()`, iteration, order-independent equality, and is truthy when non-empty. Works across interpreter, VM, bytecode round-trip, and JIT.
- **First-class `Map` type** — `map([("a", 1), ("b", 2)])` or `map()` builds an ordered key/value map with any-type keys. Methods: `.get(k)`, `.set(k, v)`, `.has(k)`, `.remove(k)`, `.keys()`, `.values()`, `.len()`, `.to_array()`. Insertion order is preserved on overwrite. Key equality uses container semantics (int/float collision, NaN self-match). Supports `for k, v in m` iteration (which also unlocks `for k, v in obj` parity for plain objects under the VM), `len()`, `contains()`, order-independent equality, and is truthy when non-empty. `json.stringify` emits JSON objects for maps with string keys and errors on non-string keys. Works across interpreter, VM, bytecode round-trip, and JIT.
32 changes: 32 additions & 0 deletions docs/performance/startup.md
@@ -0,0 +1,32 @@
# Startup Time Measurement

Forge's roadmap target of `<10ms` startup applies to standalone/native execution paths, not to `forge run app.fg`.

`forge run app.fg` intentionally does more work: it starts the CLI, reads the source, lexes, parses, typechecks, initializes the runtime, and executes. Native and bytecode paths can skip parts of that work and are the realistic target for sub-10ms startup.

## Harness

Startup timing is measured by `tools/startup_time.rs`, a small Rust process-level harness. It measures wall time from parent process spawn to child process exit and verifies each child prints `ok`.

The harness measures:

- `startup.source_run`: `forge run hello.fg`
- `startup.bytecode_run`: `forge run hello.fgc`
- `startup.native_source_runtime`: standalone source-runtime binary from `forge build --native`
- `startup.aot_bytecode`: standalone bytecode binary from `forge build --aot`

The native modes require `FORGE_LIB_DIR` to point at a directory containing `libforge_lang.a`.
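A check along these lines would make a missing library fail loudly instead of silently skipping the native modes. It is a sketch, not the harness's code: `find_static_lib` and `static_lib_from_env` are hypothetical names, and only the `FORGE_LIB_DIR` variable and the `libforge_lang.a` file name come from this document.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the path to `libforge_lang.a` inside `lib_dir`, or an error
/// explaining what is missing. Hypothetical helper.
fn find_static_lib(lib_dir: Option<&Path>) -> Result<PathBuf, String> {
    let dir = lib_dir.ok_or_else(|| "FORGE_LIB_DIR is not set".to_string())?;
    let lib = dir.join("libforge_lang.a");
    if lib.is_file() {
        Ok(lib)
    } else {
        Err(format!("{} does not exist", lib.display()))
    }
}

/// Read `FORGE_LIB_DIR` from the environment and validate it.
fn static_lib_from_env() -> Result<PathBuf, String> {
    let dir = env::var_os("FORGE_LIB_DIR").map(PathBuf::from);
    find_static_lib(dir.as_deref())
}
```

Keeping the environment lookup separate from the path check makes the check testable without mutating process-wide state.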

## Local Run

```bash
cargo build --release --lib --bin forge
rustc tools/startup_time.rs -O -o target/startup_time
FORGE_LIB_DIR=target/release ./target/startup_time --forge ./target/release/forge --warmups 2 --reps 20
```
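The `--warmups` and `--reps` flags suggest a loop that discards the first few runs and records the rest. A minimal sketch of that shape, assuming the measurement itself is supplied as a closure; `run_mode` is a hypothetical name and this is not necessarily how `tools/startup_time.rs` is structured.

```rust
use std::time::Duration;

/// Run `measure` `warmups` times and discard the timings, then run it
/// `reps` times and keep them. Any failure, including one during warm-up,
/// aborts the mode. Hypothetical helper.
fn run_mode<F>(warmups: usize, reps: usize, mut measure: F) -> Result<Vec<Duration>, String>
where
    F: FnMut() -> Result<Duration, String>,
{
    for _ in 0..warmups {
        // Timing is thrown away, but correctness is still enforced.
        measure()?;
    }
    let mut samples = Vec::with_capacity(reps);
    for _ in 0..reps {
        samples.push(measure()?);
    }
    Ok(samples)
}
```

With `--warmups 2 --reps 20`, this shape spawns each mode 22 times and reports statistics over the last 20 runs.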

## CI Status

CI runs this harness as report-only. The job fails if the harness fails to compile, if fixture builds fail, if any child process exits unsuccessfully, or if output is wrong. It does not yet fail because startup is above 10ms.

The hard `<10ms` gate should be added after we have stable baseline data and an optimization PR that actually reaches the native startup target.