70 changes: 69 additions & 1 deletion .github/workflows/build.yml
@@ -129,9 +129,71 @@ jobs:
df -h
docker system df || true

python-wheel:
name: Python wheel
needs: build
runs-on: ubuntu-24.04
timeout-minutes: 30
permissions:
contents: read

steps:
- name: Checkout
uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"

- name: Download runtime artifacts
uses: actions/download-artifact@v8
with:
name: boomslang-runtime-${{ github.sha }}
path: dist

- name: Install runtime resources
run: |
set -euo pipefail
mkdir -p stage
tar -xzf "dist/boomslang-runtime-${GITHUB_SHA}.tar.gz" -C stage
mkdir -p core/src/main/resources/python
cp -R stage/python/bin stage/python/usr core/src/main/resources/python/

- name: Stage runtime into Python package
run: ./scripts/stage-python-runtime.sh

- name: Resolve wheel version
run: |
if [[ "$GITHUB_REF_TYPE" == "tag" && "$GITHUB_REF_NAME" == v* ]]; then
echo "BOOMSLANG_WHEEL_VERSION=${GITHUB_REF_NAME#v}" >> "$GITHUB_ENV"
else
echo "BOOMSLANG_WHEEL_VERSION=0.0.0+g${GITHUB_SHA::12}" >> "$GITHUB_ENV"
fi

- name: Build wheel
run: |
pip install build
./scripts/build-python-wheel.sh

- name: Test installed wheel
run: |
python -m venv /tmp/wheel-venv
/tmp/wheel-venv/bin/pip install boomslang-py/dist/*.whl pytest
cd /tmp
/tmp/wheel-venv/bin/pytest "$GITHUB_WORKSPACE/boomslang-py/tests"

- name: Upload wheel artifact
uses: actions/upload-artifact@v7
with:
name: boomslang-wheel-${{ github.sha }}
path: boomslang-py/dist/*.whl
if-no-files-found: error
retention-days: 90
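
Review note: the "Resolve wheel version" step above can be mirrored in Python to check the version strings it produces. This is a sketch of the same branching, not code from this PR; `resolve_wheel_version` is a hypothetical helper name, and its three arguments stand in for `GITHUB_REF_TYPE`, `GITHUB_REF_NAME`, and `GITHUB_SHA`.

```python
def resolve_wheel_version(ref_type: str, ref_name: str, sha: str) -> str:
    """Mirror the shell logic of the 'Resolve wheel version' workflow step."""
    if ref_type == "tag" and ref_name.startswith("v"):
        # Equivalent of ${GITHUB_REF_NAME#v}: strip exactly one leading "v".
        return ref_name[1:]
    # Equivalent of ${GITHUB_SHA::12}: the first 12 characters of the commit SHA,
    # attached as a local version segment.
    return f"0.0.0+g{sha[:12]}"


if __name__ == "__main__":
    print(resolve_wheel_version("tag", "v1.2.3", "0123456789abcdef0123"))
    print(resolve_wheel_version("branch", "main", "0123456789abcdef0123"))
```

Untagged builds therefore all share the `0.0.0` base version and differ only in the `+g<sha>` local segment.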

release:
name: Publish runtime release
needs: build
needs: [build, python-wheel]
if: |
github.event_name == 'push' &&
(
@@ -151,6 +213,12 @@ jobs:
name: boomslang-runtime-${{ github.sha }}
path: dist

- name: Download Python wheel
uses: actions/download-artifact@v8
with:
name: boomslang-wheel-${{ github.sha }}
path: dist

- name: Publish GitHub release
env:
GH_TOKEN: ${{ github.token }}
23 changes: 23 additions & 0 deletions CLAUDE.md
@@ -78,9 +78,32 @@ mvn compile -pl core
mvn test -pl tests
```

### Python package (boomslang-py)

`boomslang-py/` is a Python host: a wheel bundling the WASM runtime, executed
with wasmtime-py. Published as a GitHub release asset by CI (not PyPI).

```bash
just python-stage # copy runtime resources + overlay into the package (needs fetch-main-wasm or resources first)
just python-test # staged resources + venv + pytest
just python-wheel # build dist/boomslang-<version>-py3-none-any.whl
```

Key constraint: the guest libc's preopen table is baked into the Wizer
snapshot and binds host preopens **positionally** — the guest-path strings
passed to the WASI config are ignored, and mount points beyond the baked
table are unreachable. The baked table differs across runtime builds
(wasi-libc version dependent): current builds bake a single `/` entry (the
host provides one root dir shaped like the guest fs — same contract as the
Java host's rootPath), while older builds baked one entry per wizer-fs
subdir (`/usr`, `/lib`, `/work`, `/tmp`) in image-specific order. The Python
host probes the layout at runtime (`boomslang-py/src/boomslang/_layout.py`)
instead of assuming either.
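
The positional binding described above can be pictured with a toy model. This is only an illustration of the rule, not the code in `_layout.py`: `bind_preopens` is a hypothetical function, and both the baked table and the host preopens are modelled as plain Python lists.

```python
def bind_preopens(baked_table, host_preopens):
    """Toy model: pair baked guest paths with host dirs by position only.

    baked_table   -- guest paths frozen into the snapshot, in baked order
    host_preopens -- (host_dir, requested_guest_path) tuples, in the order the
                     host registers them; the requested path is ignored
    Returns (bound, unreachable).
    """
    bound = {}
    for index, guest_path in enumerate(baked_table):
        if index < len(host_preopens):
            host_dir, _requested = host_preopens[index]
            bound[guest_path] = host_dir
    # Preopens past the end of the baked table have no guest path to bind to.
    unreachable = [host_dir for host_dir, _ in host_preopens[len(baked_table):]]
    return bound, unreachable


if __name__ == "__main__":
    # A single-entry table: only the first host preopen is visible, as "/".
    print(bind_preopens(["/"], [("/srv/root", "/work"), ("/srv/extra", "/data")]))
```

In the model, the string the host asked for (`/work`, `/data`) never influences the result, which is why the host has to know the baked order before it registers directories.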

## Project Structure

- `core/` — Java runtime (PythonExecutorFactory, PythonInstance, CopyOnWriteMemory)
- `boomslang-py/` — Python host package (Sandbox API, wheel bundling the WASM runtime)
- `python-host/` — Rust WASM host (PyO3 wrapper around CPython)
- `cpython/` — All native WASM build infrastructure:
- `cpython-wasi/` — CPython → WASM build pipeline
18 changes: 18 additions & 0 deletions README.md
@@ -22,11 +22,26 @@ The extension ABI is not tied to Java. An extension crate declares its contract
| Host language | Status | Runtime | Host adapter support |
| --- | --- | --- | --- |
| Java | Primary host | Chicory | Stock runtime API, `HostBridge`, generated Java adapters with `--java-out` or `emit_java_host(...)` |
| Python | Supported host package | Wasmtime (wasmtime-py) | `boomslang-py/` wheel bundling the runtime; `Sandbox` API with host functions; see `boomslang-py/README.md` |
| Rust | Supported example host | Wasmtime | Generated Rust adapters with `--rust-host-out` or `emit_rust_host(...)`; see `examples/rust-host/` |
| Other languages | ABI target only | Any WASM runtime with compatible imports | Use the ABI JSON to implement the same pointer/length lowering and return-buffer protocol |

The Maven artifact is still Java-first and includes the bundled runtime. Rust hosting is there for embedders that want to run the same Boomslang WASM from a Rust process.

## Python host usage

The `boomslang-py/` package lets regular Python programs run sandboxed Python: it bundles the same WASM runtime and executes it with wasmtime. Wheels are attached to GitHub releases (not PyPI).

```python
from boomslang import Sandbox

with Sandbox() as sandbox:
result = sandbox.execute("print('hello from the sandbox')")
print(result.stdout)
```

See `boomslang-py/README.md` for resource limits, host functions, and the guest filesystem layout. Local build: `just fetch-main-wasm && just python-test`; wheel: `just python-wheel`.

## Java host usage

Use the default artifact for the bundled Python runtime:
@@ -274,6 +289,8 @@ Common local loops:
just fetch-main-wasm # download latest main runtime resources from GitHub release assets
just build # package with AOT, skips tests
just test # tests module
just python-test # Python package test suite (stages runtime resources first)
just python-wheel # build the Python wheel
mvn compile -pl core
mvn test -pl tests
```
@@ -325,6 +342,7 @@ just test
## Repo map

- `core/`: Java runtime API and bundled Python resources
- `boomslang-py/`: Python host package (wheel bundling the WASM runtime)
- `tests/`: integration tests
- `benchmarks/`: JMH benchmarks
- `python-host/`: stock Rust WASM host
4 changes: 4 additions & 0 deletions boomslang-py/.blazar.yaml
@@ -0,0 +1,4 @@
# The Python wheel is built and published by GitHub Actions (see
# .github/workflows/build.yml), not Blazar. Without this, Blazar auto-detects
# pyproject.toml as a Python module and fails to build it.
disabled: true
5 changes: 5 additions & 0 deletions boomslang-py/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
src/boomslang/_runtime/
dist/
.venv/
*.egg-info/
__pycache__/
167 changes: 167 additions & 0 deletions boomslang-py/README.md
@@ -0,0 +1,167 @@
# boomslang (Python)

Run sandboxed Python code from Python. This package bundles boomslang's
CPython 3.14 runtime compiled to WebAssembly (with numpy, pandas, pydantic,
matplotlib, Pillow, and ijson preloaded) and executes it with
[wasmtime](https://pypi.org/project/wasmtime/). Guest code has no network
access and can only touch the directories you mount.

## Install

Wheels are published as GitHub release assets (not PyPI):

```bash
pip install https://github.com/HubSpot/boomslang/releases/download/<tag>/boomslang-<version>-py3-none-any.whl
```

From a source checkout: `just fetch-main-wasm && just python-stage`, then
`pip install -e boomslang-py`.

## Quickstart

```python
from boomslang import Sandbox

with Sandbox() as sandbox:
result = sandbox.execute("print('hello from the sandbox')")
print(result.stdout) # hello from the sandbox
print(result.exit_code) # 0
```

Interpreter state persists across `execute()` calls on the same sandbox;
`sandbox.reset()` restores the pristine interpreter image (files under the
work dir persist). Python errors in guest code don't raise on the host — they
surface as `exit_code != 0` with the traceback in `result.stderr`.
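
Because guest failures come back as data rather than exceptions, callers that prefer exceptions can wrap the result check themselves. The sketch below is one way to do that. `GuestError` and `require_success` are hypothetical names, not part of the package, and `FakeResult` is a stand-in carrying only the three attributes mentioned above (`stdout`, `stderr`, `exit_code`).

```python
from dataclasses import dataclass


class GuestError(RuntimeError):
    """Hypothetical host-side exception; not provided by boomslang."""


@dataclass
class FakeResult:
    """Stand-in for an execute() result, so the sketch runs without the wheel."""
    stdout: str
    stderr: str
    exit_code: int


def require_success(result):
    """Return stdout, or raise GuestError carrying the guest traceback."""
    if result.exit_code != 0:
        raise GuestError(f"guest exited with {result.exit_code}:\n{result.stderr}")
    return result.stdout


if __name__ == "__main__":
    print(require_success(FakeResult("hello\n", "", 0)), end="")
```

With a real sandbox the call would look like `require_success(sandbox.execute(code))`.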

## Resource limits

```python
from boomslang import ResourceLimits, Sandbox

sandbox = Sandbox(limits=ResourceLimits(
timeout=10.0, # seconds, default 120
max_memory_bytes=512 * 1024 * 1024, # default: wasm32 4 GiB cap
max_output_bytes=1024 * 1024, # per stream, default 10 MiB
))
```

A timeout raises `PythonTimeoutError` and poisons the sandbox; call
`reset()` to revive it. `max_memory_bytes` must exceed the baseline runtime
image (~150 MB) or instantiation fails.
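
A minimal sketch of the timeout-then-reset pattern, assuming only the `execute()` and `reset()` methods described in this README. `execute_or_revive` is a hypothetical helper; the exception class is passed in as a parameter because the import path of `PythonTimeoutError` is not shown here.

```python
def execute_or_revive(sandbox, code, timeout_error):
    """Run code; on timeout, reset the sandbox so it is usable again.

    Returns (result, timed_out). After a timeout the result is None and the
    interpreter is back to its pristine image, so earlier guest state is gone.
    """
    try:
        return sandbox.execute(code), False
    except timeout_error:
        sandbox.reset()
        return None, True
```

Callers that depend on state built up by earlier `execute()` calls need to replay that setup after a timeout, since `reset()` discards it.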

## Filesystem

The guest filesystem layout is fixed by the runtime image (the guest libc's
preopen table is baked in at build time):

| Guest path | Host side | Access |
|------------|------------------------------------|------------|
| `/usr` | bundled runtime + stdlib | read-only |
| `/lib` | `lib_dir=` (on the guest sys.path) | read-write |
| `/work` | `work_dir=` | read-write |
| `/tmp` | managed per-sandbox temp dir | read-write |

`work_dir` and `lib_dir` default to managed temporary directories
(`sandbox.work_dir` / `sandbox.lib_dir` expose the host paths). Arbitrary
additional mount points are not supported — share files through `/work`, and
make extra pure-Python libraries importable by placing them in `lib_dir`.

The guest's mount table is frozen into the runtime image at build time, so
the sandbox probes it once per process and adapts. Depending on the image,
user-supplied `work_dir`/`lib_dir` are either mounted directly or emulated
by syncing files (hardlinks where possible) into and out of the guest around
each execution — semantics are the same either way: files present before an
execution are visible to the guest, and guest-created files appear on the
host after it.
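
The "hardlinks where possible" idea can be sketched with the standard library alone. This is an illustration of the general technique, not the package's sync code: `sync_tree` is a hypothetical function, it assumes `dst` is not nested inside `src`, and it skips empty directories.

```python
import os
import shutil
from pathlib import Path


def sync_tree(src: Path, dst: Path) -> None:
    """Mirror regular files from src into dst, hardlinking where possible."""
    for path in list(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Drop any stale copy so the link (or copy) always reflects src.
        target.unlink(missing_ok=True)
        try:
            os.link(path, target)  # cheap: both names share one inode
        except OSError:
            # Different filesystem or links unsupported: fall back to a copy.
            shutil.copy2(path, target)
```

A hardlinked file shares its contents with the original, so an in-place write on either side is visible on the other; the copy fallback does not have that property.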

## Stdin

```python
sandbox.set_stdin("Ada\n")
sandbox.execute("print('hello', input())")
```

Mirroring the Java host, stdin is consumed by the next execution and then
cleared — call `set_stdin()` before each execution that needs it. Without it,
`input()` raises `EOFError`.

## Host functions

Guest code can call back into your process through the bundled
`boomslang_host` bridge. Arguments and results cross the boundary as JSON.
Results larger than the bridge's native 1 MiB buffer are transparently
fetched back in chunks, so there is no practical size cap.

```python
sandbox = Sandbox()

@sandbox.host_function("lookup_user")
def lookup_user(args):
return {"id": args["id"], "name": "Ada"}

result = sandbox.execute("""
import json
from boomslang_host import call
user = json.loads(call("lookup_user", json.dumps({"id": 7})))
print(user["name"])
""")
```

For full control pass `call_handler=lambda name, args_json: ...` (raw JSON
strings in and out), and `on_log=lambda level, message: ...` to receive
`boomslang_host.log()` output (default: forwarded to the `boomslang.guest`
logger).
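
As a sketch of what a raw `call_handler` might look like, the dispatcher below takes a name and a JSON string and returns a JSON string, matching the `(name, args_json)` shape mentioned above. `make_call_handler` is a hypothetical helper, and the `{"error": ...}` reply for unknown names is an assumption: this README does not say how errors should be reported to the guest.

```python
import json


def make_call_handler(functions):
    """Build a (name, args_json) -> result_json dispatcher from a dict."""
    def call_handler(name, args_json):
        func = functions.get(name)
        if func is None:
            # Assumption: the error shape is made up for this sketch.
            return json.dumps({"error": f"unknown host function: {name}"})
        # Decode the guest's arguments, call the function, re-encode the result.
        return json.dumps(func(json.loads(args_json)))
    return call_handler


if __name__ == "__main__":
    handler = make_call_handler({"lookup_user": lambda a: {"id": a["id"], "name": "Ada"}})
    print(handler("lookup_user", '{"id": 7}'))
```

The resulting callable would then be supplied as the `call_handler=` argument in place of the lambda.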

### Async host functions

Async handlers run on a host thread pool, so guest coroutines can overlap
slow host work (I/O, RPCs) via the bundled `boomslang_host.asyncio` event
loop (the same wire protocol as the Java `AsyncHostRegistry`):

```python
sandbox = Sandbox()

@sandbox.async_host_function("fetch")
def fetch(args): # runs on a host worker thread
return {"id": args["id"], "name": "Ada"}

result = sandbox.execute("""
import asyncio, json
from boomslang_host.asyncio import async_call

async def main():
a, b = await asyncio.gather(
async_call("fetch", json.dumps({"id": 1})),
async_call("fetch", json.dumps({"id": 2})),
)
print(json.loads(a)["name"], json.loads(b)["name"])

asyncio.run(main())
""")
```

The execute timeout still applies while the guest is awaiting.

## Bytecode and function calls

`compile()` produces bytecode you can cache and re-run (also in other
sandboxes), skipping repeated parsing; `execute_function()` calls a function
defined in the guest's `__main__` with a JSON array of positional arguments:

```python
bytecode = sandbox.compile("def add(a, b):\n print(a + b)")
sandbox.load_bytecode(bytecode)
sandbox.execute_function("add", "[2, 40]") # prints 42
```
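
One way to cache compiled bytecode on disk is to key it by a hash of the source. The sketch below assumes `compile()` returns a `bytes` object, which this README does not state; `cached_compile` and the `.bin` file layout are hypothetical.

```python
import hashlib
from pathlib import Path


def cached_compile(compile_fn, source: str, cache_dir: Path) -> bytes:
    """Return bytecode for source, invoking compile_fn only on a cache miss."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.bin"
    if cache_file.exists():
        return cache_file.read_bytes()
    bytecode = compile_fn(source)  # assumed to return bytes
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(bytecode)
    return bytecode
```

Usage would be along the lines of `sandbox.load_bytecode(cached_compile(sandbox.compile, source, cache_dir))`. Bytecode is usually specific to the interpreter build that produced it, so it may be worth mixing the package or runtime version into the cache key as well.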

## Performance notes

- The first `Sandbox()` ever created on a machine compiles the ~100 MB WASM
module (seconds to a couple of minutes depending on hardware). The compiled
module is cached on disk by wasmtime, so subsequent processes start in
under a second.
- Each sandbox materializes its own copy of the runtime's linear memory
(hundreds of MB). Reuse sandboxes (with `reset()`) where isolation
requirements allow.
- `pip install --no-compile` skips byte-compiling the bundled stdlib tree,
which the guest never reads anyway.
28 changes: 28 additions & 0 deletions boomslang-py/pyproject.toml
@@ -0,0 +1,28 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "boomslang"
dynamic = ["version"]
description = "Sandboxed CPython 3.14 execution via WebAssembly (wasmtime)"
readme = "README.md"
requires-python = ">=3.10"
dependencies = ["wasmtime>=36"]

[project.optional-dependencies]
dev = ["pytest>=8"]

[tool.hatch.version]
path = "src/boomslang/_version.py"

[tool.hatch.build.targets.wheel]
packages = ["src/boomslang"]
# The staged runtime assets are gitignored; force their inclusion in the wheel.
artifacts = ["src/boomslang/_runtime/**"]

[tool.hatch.build.targets.sdist]
exclude = ["src/boomslang/_runtime"]

[tool.pytest.ini_options]
testpaths = ["tests"]