4 changes: 4 additions & 0 deletions docs/src/SUMMARY.md
@@ -20,4 +20,8 @@

# Reference

- [Base ABI Specification](reference/abi.md)
- [Extension ABI JSON & Lowering](reference/extension-abi.md)
- [Async Wire Protocol](reference/async-protocol.md)
- [hostgen DSL Reference](reference/hostgen-dsl.md)
- [Glossary](reference/glossary.md)
73 changes: 73 additions & 0 deletions docs/src/reference/abi.md
@@ -0,0 +1,73 @@
# Base ABI Specification

This page specifies the contract between a host and `boomslang.wasm`: the functions the guest exports and the conventions for calling them. It is the contract `PythonInstance` implements on the Java side, and what a non-Java embedder must implement directly. (Host functions the guest *imports* are covered by the [extension ABI](extension-abi.md).)

Source of truth: `python-host-core/src/export.rs` (guest) and `core/src/main/java/com/hubspot/boomslang/PythonInstance.java` (Java host).

> There is currently no ABI version export; compatibility between a host and a wasm artifact is by construction (build them from the same commit). A version handshake is tracked in [issue #43](https://github.com/HubSpot/boomslang/issues/43).

## Conventions

- The guest exports a single linear memory. All pointers are `i32` offsets into it.
- **The host owns buffer lifecycles.** Allocate guest buffers with `alloc`, write through the exported memory, pass `(ptr, len)` pairs, and free with `dealloc` after the call. The guest never frees host-allocated buffers, and the guest's internal allocations are not the host's concern.
- All strings are UTF-8. Passing invalid UTF-8 where a string is expected returns `-1`.
- Every execution-family export (`compile_source`, `load_bytecode`, `execute`, `execute_function`, `install_module`, `uninstall_module`) **clears the captured stdout/stderr buffers on entry**. Read outputs after each call, before the next one.
- Error reporting is two-channel: a coarse return code, plus the Python traceback captured in the stderr buffer. Detailed error strings exist only in stderr.

## Exports

| Export | Signature | Semantics |
| --- | --- | --- |
| `alloc` | `(size: i32) -> i32` | Allocate `size` bytes in guest memory (mimalloc); returns pointer. |
| `dealloc` | `(ptr: i32, size: i32)` | Free an `alloc`'d buffer. `size` is currently ignored, but pass the allocated size anyway. |
| `compile_source` | `(source_ptr: i32, source_len: i32, output_ptr: i32, output_max_len: i32) -> i32` | Compile Python source to marshal bytecode, written to the caller-provided output buffer. Returns the bytecode length, `-1` on invalid UTF-8 or compile error (traceback in stderr), `-3` if the bytecode exceeds `output_max_len`. |
| `load_bytecode` | `(ptr: i32, len: i32) -> i32` | Unmarshal and execute bytecode from `compile_source`. `0` ok; `1` Python error (traceback in stderr). |
| `execute` | `(script_ptr: i32, script_len: i32) -> i32` | Execute Python source in `__main__`. `0` ok; `1` Python error; `-1` invalid UTF-8. |
| `execute_function` | `(name_ptr: i32, name_len: i32, args_ptr: i32, args_len: i32) -> i32` | Call a named function from previously loaded code with one string argument (`args_len` 0 → empty string). `0` / `1` / `-1` as above. |
| `get_stdout_len` / `get_stderr_len` | `() -> i32` | Byte length of the captured stream. |
| `get_stdout` / `get_stderr` | `(ptr: i32, max_len: i32) -> i32` | Copy up to `max_len` bytes of the captured stream into the caller's buffer; returns bytes written. |
| `install_module` | `(name_ptr: i32, name_len: i32, source_ptr: i32, source_len: i32) -> i32` | Install a pure-Python module under `name` (dotted names allowed). `0` / `1` / `-1`. |
| `uninstall_module` | `(name_ptr: i32, name_len: i32) -> i32` | Remove an installed module. `0` / `1` / `-1`. |
| `reset_state` | `()` | Clear capture buffers and reset the `__main__` namespace. Note: the Java host does not call this — it resets by restoring the copy-on-write memory snapshot, which is stricter. |
| `get_heap_pages` | `() -> i32` | Current guest memory size in 64 KiB pages. Used by hosts to size snapshots. |

## Imports

A complete embedder must provide, on the same linker/instance:

1. **WASI preview1** — filesystem, clock, random, stdio.
2. **Extension imports** — the bundled runtime imports `boomslang.call` and `boomslang.log` ([extension ABI](extension-abi.md)); custom builds import whatever their extensions declare.

Instantiation fails on any missing import.

## Call sequences

**Execute a script and read output** (what `PythonInstance.execute` does):

```text
ptr = alloc(len(script)) # write script bytes at ptr
rc = execute(ptr, len(script)) # 0 ok, 1 python error, -1 bad utf-8
dealloc(ptr, len(script))
n = get_stdout_len()
buf = alloc(n); get_stdout(buf, n) # read n bytes from memory at buf
dealloc(buf, n) # same dance for stderr
```
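
The sequence above can be sketched in Python against a stand-in guest. `FakeGuest` and `run_script` are hypothetical names used only for illustration: a real host would call the same exports through its wasm runtime, and the fake `execute` simply echoes the script instead of running Python. The point is the host-owned buffer lifecycle, not the guest's behavior.

```python
class FakeGuest:
    """Hypothetical stand-in for the wasm instance: flat memory plus a few exports."""

    def __init__(self, size=1 << 16):
        self.memory = bytearray(size)
        self._next = 8           # bump allocator; the real guest uses mimalloc
        self._stdout = b""

    def alloc(self, size):
        ptr = self._next
        self._next += max(size, 1)
        return ptr

    def dealloc(self, ptr, size):
        pass                     # this sketch never reuses memory

    def execute(self, ptr, length):
        self._stdout = b""       # capture buffers are cleared on entry
        try:
            script = bytes(self.memory[ptr:ptr + length]).decode("utf-8")
        except UnicodeDecodeError:
            return -1
        # Pretend to run Python: echo the script so there is output to read.
        self._stdout = ("ran: " + script).encode("utf-8")
        return 0

    def get_stdout_len(self):
        return len(self._stdout)

    def get_stdout(self, ptr, max_len):
        data = self._stdout[:max_len]
        self.memory[ptr:ptr + len(data)] = data
        return len(data)


def run_script(guest, script):
    """Host side: alloc, write, execute, dealloc, then read stdout."""
    data = script.encode("utf-8")
    ptr = guest.alloc(len(data))
    guest.memory[ptr:ptr + len(data)] = data
    try:
        rc = guest.execute(ptr, len(data))
    finally:
        guest.dealloc(ptr, len(data))    # the host frees what it allocated

    n = guest.get_stdout_len()
    buf = guest.alloc(n)
    try:
        written = guest.get_stdout(buf, n)
        out = bytes(guest.memory[buf:buf + written])
    finally:
        guest.dealloc(buf, n)
    return rc, out.decode("utf-8")
```

Output is read immediately after `execute` returns because the next execution-family call would clear it.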

**Compile once, run many** (`compile` / `loadCode`):

```text
out = alloc(MAX) # Java uses MAX = 10 MiB
n = compile_source(src, len, out, MAX) # n = bytecode length, or -1 / -3
bytecode = memory[out .. out+n]; dealloc(out, MAX)
...
ptr = alloc(len(bytecode)) # later, possibly many times; write bytecode at ptr
rc = load_bytecode(ptr, len(bytecode)) # 0 / 1
dealloc(ptr, len(bytecode))
```

The bytecode is CPython marshal data — valid only for the exact runtime build that produced it.

## Known sharp edges

- For `compile_source`, `-1` is overloaded: it signals both invalid UTF-8 input and a compile error. Disambiguate via stderr.
- There is no structured error channel; hosts surface failures by pairing the return code with the captured stderr.
- Output larger than the host's configured cap (Java default 10 MB) is rejected host-side, not guest-side.
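
One way a host might pair the return code with captured stderr is sketched below. `GuestError`, `check_status`, and `check_compile` are hypothetical helpers, not part of the Java host; the code meanings follow the exports table above.

```python
class GuestError(Exception):
    """Hypothetical host-side error carrying both channels."""

    def __init__(self, export, rc, stderr):
        super().__init__(f"{export} returned {rc}: {stderr.strip() or '<no stderr>'}")
        self.export = export
        self.rc = rc
        self.stderr = stderr


def check_status(export, rc, stderr):
    """For exports that return 0 / 1 / -1 (execute, install_module, ...)."""
    if rc == 0:
        return
    if rc == -1:
        raise GuestError(export, rc, "invalid UTF-8 input")
    raise GuestError(export, rc, stderr)    # 1: Python error, traceback in stderr


def check_compile(rc, stderr):
    """For compile_source: returns the bytecode length or raises."""
    if rc >= 0:
        return rc
    if rc == -3:
        raise GuestError("compile_source", rc, "bytecode exceeds output_max_len")
    # -1 covers both invalid UTF-8 and a compile error; only stderr tells them apart.
    raise GuestError("compile_source", rc, stderr)
```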
26 changes: 26 additions & 0 deletions docs/src/reference/async-protocol.md
@@ -0,0 +1,26 @@
# Async Wire Protocol (v1)

`boomslang_host.asyncio` (the Python client) and the host-side `AsyncHostRegistry` talk over a small, versioned protocol invoked through the stock `boomslang_host.call(name, args)` bridge. This page is the wire-level specification; usage is in the [async guide](../guide/async.md).

The `__async_*` names are a **reserved control namespace** — extension host functions may not use them (hostgen validation rejects them).

| Control call | Args | Returns |
|---|---|---|
| `__async_protocol__` | — | integer protocol version (currently `1`) |
| `__async_start__` | `name\npayload` | decimal token for a registered named async handler |
| `__async_poll__` | timeout ms (`<0` blocks, `0` polls) | one header line per ready completion: `token\t{1\|0}\t<valueByteLength>` |
| `__async_result__` | token | base64 of that completion's value bytes (consumes it) |
| `__async_cancel__` | token | cancels the in-flight future |

Typed async extension functions bypass `__async_start__`: their WASM import returns the `i64` token directly from the shared registry. Polling, result retrieval, and cancellation still flow through the control calls above.
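
The framing in the table can be illustrated with a few pure helpers. This is a sketch, not the code in `boomslang_host/asyncio.py`; the names `Completion`, `encode_start`, `parse_poll`, and `decode_result` are made up here, and tolerating blank lines in the poll response is an assumption.

```python
import base64
from typing import List, NamedTuple


class Completion(NamedTuple):
    token: int
    ok: bool
    value_len: int


def encode_start(name: str, payload: str) -> str:
    """Args for __async_start__: handler name, newline, payload."""
    if "\n" in name:
        raise ValueError("handler name must not contain a newline")
    return f"{name}\n{payload}"


def parse_poll(response: str) -> List[Completion]:
    """Parse __async_poll__ output: one 'token<TAB>ok<TAB>length' line per completion."""
    completions = []
    for line in response.split("\n"):
        if not line:
            continue                     # assumed: empty response or trailing newline
        token, ok, length = line.split("\t")
        completions.append(Completion(int(token), ok == "1", int(length)))
    return completions


def decode_result(response: str) -> bytes:
    """Decode __async_result__ output (base64 of the raw value bytes)."""
    return base64.b64decode(response)
```

Only the first newline separates the name from the payload, so the payload itself may contain newlines.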

## Design rationale

- **Versioned.** The Python client is frozen into each consumer's WASM Wizer snapshot, so the host must stay compatible with already-shipped clients. `__async_protocol__` lets a client refuse a host older than the protocol it was built against; bump `AsyncHostRegistry.PROTOCOL_VERSION` only for breaking wire changes.
- **Poll and result are decoupled.** `__async_poll__` returns only headers (token, ok flag, length); values are fetched one at a time via `__async_result__`. A batch of completions therefore never exceeds the single host-call result buffer. (A single value larger than that buffer is still a limitation — chunked retrieval is a future protocol addition.)
- **Failures never hang.** Synchronous handler errors are recorded via `AsyncHostRegistry.startFailed` and surface as a failed completion (the coroutine raises `HostAsyncError`); the client also rejects any non-positive token immediately.
- **Binary-safe value channel.** Completion values are carried as base64 of raw bytes, so extending async returns to `bytes` later needs no wire change.
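
A rough client-side sketch of the version check and the poll-then-fetch pattern follows. The `call` argument stands in for `boomslang_host.call(name, args)`; `check_protocol` and `drain` are hypothetical names, passing an empty string as the args of `__async_protocol__` is an assumption, and so is fetching the value of a failed completion the same way as a successful one.

```python
import base64

CLIENT_PROTOCOL = 1    # protocol version this client was built against


def check_protocol(call):
    """Refuse a host that speaks an older protocol than the client."""
    host_version = int(call("__async_protocol__", ""))
    if host_version < CLIENT_PROTOCOL:
        raise RuntimeError(
            f"host speaks protocol {host_version}, client needs {CLIENT_PROTOCOL}"
        )
    return host_version


def drain(call, timeout_ms=0):
    """Poll once, then fetch each ready value with its own host call.

    Returns {token: (ok, value_bytes)}.
    """
    results = {}
    for line in call("__async_poll__", str(timeout_ms)).split("\n"):
        if not line:
            continue
        token, ok, _length = line.split("\t")
        # One __async_result__ call per completion keeps every response
        # within the single host-call result buffer.
        value = base64.b64decode(call("__async_result__", token))
        results[int(token)] = (ok == "1", value)
    return results
```

Because the poll response carries only headers, its size grows with the number of completions rather than with the size of their values.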

## Implementations

The protocol is implemented by the Java `AsyncHostRegistry` (`core/`), the generated Rust host registry (hostgen's `rust_host.rs` template), and the Python client (`boomslang_host/asyncio.py`). They must agree byte-for-byte; consolidation of the duplicated implementations is tracked in [issue #45](https://github.com/HubSpot/boomslang/issues/45).
84 changes: 84 additions & 0 deletions docs/src/reference/extension-abi.md
@@ -0,0 +1,84 @@
# Extension ABI JSON & Lowering

An extension declares its host functions once, in `build.rs`, with the [hostgen DSL](hostgen-dsl.md). The build emits an **ABI JSON** file — the language-neutral contract from which host adapters (Java, Rust, or hand-written for any runtime) are generated.

## Schema

```json
{
  "abi_version": 1,
  "extension": {
    "name": "boomslang_host",
    "wasm_module": "boomslang",
    "prewarm": ["_boomslang_host", "boomslang_host", "boomslang_host.asyncio"]
  },
  "functions": [
    {
      "name": "call",
      "params": [
        { "name": "name", "type": "string" },
        { "name": "args", "type": "string" }
      ],
      "returns": "string",
      "async": false
    },
    {
      "name": "log",
      "params": [
        { "name": "level", "type": "int" },
        { "name": "message", "type": "string" }
      ],
      "returns": null,
      "async": false
    }
  ]
}
```

| Field | Meaning |
| --- | --- |
| `abi_version` | Schema version. Generators require an **exact match** (currently `1`) and fail with a clear error otherwise. If omitted, defaults to `1`. |
| `extension.name` | Extension identifier. Drives generated names: Python module `<name>`, guest file `ext_<name>.rs`, Java class `<Name>HostFunctions`, Rust host file `host_<name>.rs`. |
| `extension.wasm_module` | The WASM **import module** the functions live under (e.g. import `boomslang.call`). Defaults to the extension name when omitted. |
| `extension.prewarm` | Python modules imported during Wizer initialization, frozen into the golden snapshot. |
| `functions[].name` | Function name; becomes the import name and the Python-visible function. |
| `functions[].params` | Ordered typed parameters. |
| `functions[].returns` | Return type or `null` for none. Async functions must return `string`. |
| `functions[].async` | Whether the function is an async host call (see below). |

Types are a closed enum: `string`, `int`, `float`, `bytes`. Unknown type values fail parsing.
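
For a hand-written adapter in another runtime, reading the file with the documented defaults might look like the sketch below. `load_abi` is a hypothetical helper, not hostgen's `read_abi`; it covers only the version check, the `wasm_module` default, and the closed type enum.

```python
import json

ABI_VERSION = 1
TYPES = {"string", "int", "float", "bytes"}


def load_abi(text):
    """Parse ABI JSON text, apply the documented defaults, and check types."""
    doc = json.loads(text)
    version = doc.get("abi_version", ABI_VERSION)       # omitted -> 1
    if version != ABI_VERSION:
        raise ValueError(f"unsupported abi_version {version}, expected {ABI_VERSION}")
    doc["abi_version"] = version

    ext = doc["extension"]
    ext.setdefault("wasm_module", ext["name"])           # defaults to the extension name

    for fn in doc.get("functions", []):
        declared = [p["type"] for p in fn.get("params", [])]
        if fn.get("returns") is not None:
            declared.append(fn["returns"])
        for t in declared:
            if t not in TYPES:
                raise ValueError(f"{fn['name']}: unknown type {t!r}")
    return doc
```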

## Lowering to WASM signatures

The ABI JSON decides the import signatures and memory protocol. For a function with declared params and return:

| Declared | Lowered |
| --- | --- |
| `string` / `bytes` param | `i32 ptr, i32 len` (UTF-8 bytes for strings) |
| `int` param | `i32` |
| `float` param | `f64` |
| `string` / `bytes` return | caller appends `i32 result_ptr, i32 result_max_len`; host writes the value into that buffer and returns the written byte length as `i32` |
| no return | `i32` status return |
| async function | returns an `i64` host token instead of a value (see the [async wire protocol](async-protocol.md)) |

So declared `call(name: string, args: string) -> string` becomes the import:

```text
boomslang.call(name_ptr: i32, name_len: i32,
               args_ptr: i32, args_len: i32,
               result_ptr: i32, result_max_len: i32) -> i32
```
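
The lowering table can be expressed as a small function over one `functions[]` entry. This is an illustrative sketch, not hostgen's generator: `lower` is a hypothetical name, the `<param>_ptr` / `<param>_len` naming mirrors the example above, and treating async functions as taking no result-buffer parameters is an assumption drawn from "returns a token instead of a value". `int` and `float` returns are not covered by the table, so the sketch refuses them rather than guess.

```python
def lower(fn):
    """Return (wasm_params, wasm_result) for one ABI JSON function entry."""
    params = []
    for p in fn["params"]:
        if p["type"] in ("string", "bytes"):
            params += [(p["name"] + "_ptr", "i32"), (p["name"] + "_len", "i32")]
        elif p["type"] == "int":
            params.append((p["name"], "i32"))
        elif p["type"] == "float":
            params.append((p["name"], "f64"))
        else:
            raise ValueError(f"unknown type {p['type']!r}")

    if fn.get("async"):
        return params, "i64"             # host token (assumed: no result buffer params)
    ret = fn.get("returns")
    if ret is None:
        return params, "i32"             # status code
    if ret in ("string", "bytes"):
        params += [("result_ptr", "i32"), ("result_max_len", "i32")]
        return params, "i32"             # written byte length
    raise NotImplementedError(f"lowering of {ret!r} returns is not covered by the table")
```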

**Result buffer protocol:** the guest allocates the result buffer (currently capped at 1 MiB per call) and passes it to the host. A negative return signals failure: `-1` for a handler error, `-2` when the value did not fit in `result_max_len`. The guest surfaces any negative return as a Python exception.
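
Both sides of the result buffer protocol can be sketched over a plain `bytearray` standing in for guest memory. `write_result` and `read_result` are hypothetical names; the generated adapters are the authoritative implementation.

```python
HANDLER_ERROR = -1
TOO_LARGE = -2


def write_result(memory, result_ptr, result_max_len, handler):
    """Host side: run the handler and copy its value into the guest's buffer."""
    try:
        value = handler()
    except Exception:
        return HANDLER_ERROR
    data = value.encode("utf-8") if isinstance(value, str) else bytes(value)
    if len(data) > result_max_len:
        return TOO_LARGE                 # nothing is written when the value does not fit
    memory[result_ptr:result_ptr + len(data)] = data
    return len(data)


def read_result(memory, result_ptr, rc):
    """Guest side: turn the return code into a value or an exception."""
    if rc < 0:
        raise RuntimeError(f"host call failed with code {rc}")
    return bytes(memory[result_ptr:result_ptr + rc])
```

The length check happens before any write, so a too-large value leaves the guest buffer untouched in this sketch.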

Behavioral note: on malformed pointers the generated Java host traps the instance, while the generated Rust host returns `-1`; aligning these is tracked in [issue #44](https://github.com/HubSpot/boomslang/issues/44).

## Generated artifacts

From one declaration, hostgen produces:

- **Rust guest** (`ext_<name>.rs`, included via `include!` into your extension crate): the `extern` imports, a Python module exposing typed functions, and `register()` / `prewarm()` hooks for `boomslang_host_core::init`.
- **Java host adapter** (`<Name>HostFunctions.java`): typed functional interfaces + a builder producing a `BoomslangExtension` for `PythonExecutorFactory.addExtension`.
- **Rust host adapter** (`host_<name>.rs`): a typed builder with `register(&mut wasmtime::Linker<_>)`.

Function names prefixed `__async_` are reserved for the async control namespace and rejected by validation.
88 changes: 88 additions & 0 deletions docs/src/reference/hostgen-dsl.md
@@ -0,0 +1,88 @@
# hostgen DSL Reference

`boomslang-hostgen` is both a Rust library (used from an extension crate's `build.rs`) and a CLI. The library declares an extension and emits generated code + [ABI JSON](extension-abi.md); the CLI consumes ABI JSON and generates host adapters.

## Declaring an extension (`build.rs`)

```rust
use boomslang_hostgen::{Build, ExtensionSpec, Type};

fn main() {
    let ext = ExtensionSpec::new("myext")
        .wasm_module("myext")
        .prewarm(["_myext"])
        .function("do_thing", |f| {
            f.param("input", Type::String).returns(Type::String)
        })
        .function("lookup", |f| {
            f.r#async()
                .param("request", Type::String)
                .param("shard", Type::Int)
                .returns(Type::String)
        });

    Build::new(ext).emit().generate().expect("generate myext");

    println!("cargo:rerun-if-changed=build.rs");
}
```

### `ExtensionSpec`

| Method | Effect |
| --- | --- |
| `ExtensionSpec::new(name)` | Start a spec; `name` is the extension/Python module name. |
| `.wasm_module(module)` | WASM import module for the functions (defaults to the extension name). |
| `.prewarm([modules])` | Python modules to import during Wizer init (frozen into the snapshot). |
| `.function(name, \|f\| ...)` | Declare a host function via the closure. |

### `FunctionSpec` (inside the closure)

| Method | Effect |
| --- | --- |
| `.param(name, Type)` | Append a typed parameter (order matters). |
| `.returns(Type)` | Declare the return type (omit for none). |
| `.r#async()` | Mark as an async host call — Python awaits it; the host handler is asynchronous. Async functions must return `Type::String`. |

`Type` is `String`, `Int`, `Float`, or `Bytes`. See [lowering rules](extension-abi.md#lowering-to-wasm-signatures) for the WASM signatures these produce.

### `Build`

| Method | Output |
| --- | --- |
| `Build::new(spec)` | Start from a spec. |
| `.emit()` | Shorthand for `.emit_rust_guest().emit_abi_json()` — the standard build.rs setup. |
| `.emit_rust_guest()` | `$OUT_DIR/ext_<name>.rs` — guest code, consumed by `include!(concat!(env!("OUT_DIR"), "/ext_<name>.rs"))`. |
| `.emit_abi_json()` | `$OUT_DIR/<name>.abi.json`. |
| `.emit_abi_json_to(path)` | ABI JSON at a stable path of your choosing (recommended when other builds consume it — `$OUT_DIR` paths contain build fingerprints). |
| `.emit_java_host(out_dir, package)` | `<Name>HostFunctions.java` under `out_dir/<package path>/`. Prefer running the CLI after the build instead of writing into a source tree from `build.rs`. |
| `.emit_rust_host(out_dir)` | `host_<name>.rs` Wasmtime adapter. |
| `.generate()` | Validate the manifest and write everything requested. |

Validation enforces: exact `abi_version` match, identifier-safe names (no Java/Rust keywords), no reserved `__async_*` function names, and string returns for async functions.
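
The naming and async rules can be approximated outside hostgen, for example when linting a hand-edited ABI JSON file. The sketch below is in Python and is not hostgen's validator: `validate_function` is a hypothetical helper, and `KEYWORDS` is a small illustrative subset rather than the real Java/Rust keyword list.

```python
import re

IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Illustrative subset only; the real keyword list lives in hostgen.
KEYWORDS = {"class", "fn", "impl", "match", "new", "return", "static", "void"}


def validate_function(fn):
    """Return a list of problems found in one ABI JSON function entry."""
    problems = []
    name = fn["name"]
    if name.startswith("__async_"):
        problems.append(f"{name}: reserved async control name")
    for ident in [name] + [p["name"] for p in fn.get("params", [])]:
        if not IDENT.match(ident) or ident in KEYWORDS:
            problems.append(f"{ident}: not an identifier-safe name")
    if fn.get("async") and fn.get("returns") != "string":
        problems.append(f"{name}: async functions must return string")
    return problems
```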

## The CLI

```text
boomslang-hostgen <abi.json> [--java-out DIR [--java-package PKG]] [--rust-host-out DIR]
```

| Flag | Effect |
| --- | --- |
| `--java-out DIR` | Generate the Java host adapter into `DIR` (package subdirectories created). |
| `--java-package PKG` | Java package for generated code (default `com.hubspot.boomslang.extensions`). |
| `--rust-host-out DIR` | Generate the Rust Wasmtime host adapter into `DIR`. |

With no output flag the CLI validates the ABI JSON, then exits nonzero with `no output requested`.

From source: `cargo run --manifest-path boomslang-hostgen/Cargo.toml -- <args>`.

## Library entry points

For build tooling that wants codegen without the CLI:

- `read_abi(path) -> Manifest` — parse + validate an ABI JSON file.
- `generate_java(abi_path, out_dir, package)` — Java adapter from a file.
- `generate_rust_host(abi_path, out_dir)` — Rust host adapter from a file.

The serde-serializable `Manifest` / `Extension` / `Function` / `Param` / `Type` structs are public; the [ABI JSON schema](extension-abi.md#schema) is their serialized form.
31 changes: 2 additions & 29 deletions examples/custom-python-build/README.md
@@ -138,32 +138,5 @@ fn main() {

Generated async functions preserve the normal typed argument handling: the Java handler receives typed params and returns a `CompletionStage<String>`, registered alongside a shared `AsyncHostRegistry`. The full Java and Python usage is documented in the [async host calls guide](https://github.hubspot.com/boomslang/guide/async.html).

### Async wire protocol (v1)

`boomslang_host.asyncio` and `AsyncHostRegistry` talk over a small, versioned protocol invoked
through the stock `boomslang_host.call(name, args)` function. The `__async_*` names are a
**reserved control namespace** — do not define extension host functions with these names:

| Control call | Args | Returns |
|---|---|---|
| `__async_protocol__` | — | integer protocol version (currently `1`) |
| `__async_start__` | `name\npayload` | decimal token for a registered named async handler |
| `__async_poll__` | timeout ms (`<0` blocks, `0` polls) | one header line per ready completion: `token\t{1\|0}\t<valueByteLength>` |
| `__async_result__` | token | base64 of that completion's value bytes (consumes it) |
| `__async_cancel__` | token | cancels the in-flight future |

Why this shape matters:

- **Versioned.** The Python client is frozen into each consumer's WASM Wizer snapshot, so the Java
host must stay compatible with already-shipped clients. `__async_protocol__` lets a client refuse
a host older than the protocol it was built against; bump `AsyncHostRegistry.PROTOCOL_VERSION`
only for breaking wire changes.
- **Poll and result are decoupled.** `__async_poll__` returns only headers (token, ok, length);
values are fetched one at a time via `__async_result__`. A batch of completions therefore never
exceeds the single host-call result buffer. (A single value larger than that buffer is still a
limitation — chunked retrieval is a future protocol addition.)
- **Failures never hang.** Synchronous handler errors are recorded via `AsyncHostRegistry.startFailed`
and surface as a failed completion (the coroutine raises `HostAsyncError`); the client also rejects
any non-positive token immediately.
- **Binary-safe value channel.** Completion values are carried as base64 of raw bytes, so extending
async returns to `bytes` later needs no wire change.
The `__async_*` control calls and their framing are specified in the
[async wire protocol reference](https://github.hubspot.com/boomslang/reference/async-protocol.html).
8 changes: 1 addition & 7 deletions examples/rust-host/README.md
@@ -25,13 +25,7 @@ cargo run --manifest-path boomslang-hostgen/Cargo.toml -- \

## What The ABI Drives

The ABI JSON decides the Wasmtime import signatures and memory lowering:

- `string` and `bytes` params lower to `i32 ptr, i32 len`.
- `int` params lower to `i32`.
- `float` params lower to `f64`.
- `string` and `bytes` returns use caller-provided `i32 result_ptr, i32 result_max_len` params and return the written byte length as `i32`.
- async functions return an `i64` host token.
The ABI JSON decides the Wasmtime import signatures and memory lowering — pointer/length pairs for strings and bytes, caller-provided result buffers for returns, `i64` tokens for async. The full rules are specified in the [extension ABI reference](https://github.hubspot.com/boomslang/reference/extension-abi.html).

The generated host binding is typed:
