A Python 3 interpreter written in Elixir, designed as an execution substrate for agent loops. The interpreter is a pure function on the BEAM. Capabilities are values you pass in.

    Pyex.run!("sorted([3, 1, 2])")
    # => [1, 2, 3]

Pyex exists for one shape of problem: running Python written by, or on behalf of, a language model, including the loop logic itself. Tool calls, planners, controllers, retries, evaluation harnesses. The kind of code an agent emits to act on the world, and the kind of code that decides what the agent does next.
The design constraint is that running this code should be a function call. Not a request to a sandbox service, not a cold container, not a serialized round-trip to a worker pool. Capabilities the program can use — files, network, database, app-specific tools — should be values the host hands in, not endpoints behind an RPC.
That constraint matters most when the agent is non-interactive. When no human is reviewing each step, the trust boundary is the only thing between the model and the host system. Pyex is built so the boundary is small, statically checkable, and the same shape as the ordinary BEAM process boundary you already operate.
The shape rules a lot of things in and out:
- It rules out a microVM-class isolation boundary. Firecracker or gVisor isolates more strongly than a tree-walking interpreter in the same address space. Pyex is not a replacement for that layer when adversarial isolation is the requirement.
- It rules in latency, statefulness, and orchestration ergonomics. A step is a function call; an agent loop keeps its filesystem and in-memory state across calls; a generator is a continuation, not a process. Tools are Elixir functions you reference from Python.
- It rules out being a CPython replacement. Pyex implements the subset of Python that an agent or LLM tends to produce — not scipy, not C extensions, not the long tail of CPython internals.
Compute speed is roughly 10-100× slower than CPython for pure CPU work. For agent loops dominated by tool I/O, JSON shaping, prompt assembly, and routing, the interpreter is not the bottleneck.
A loop where the controller is Python, the tools are Elixir, and the sandbox is a function call:

    tools = %{
      "search" => {:builtin, fn [q] -> MyApp.Search.run(q) end},
      "fetch" => {:builtin, fn [url] -> MyApp.HTTP.get(url) end},
      "remember" => {:builtin, fn [k, v] -> MyApp.KV.put(k, v) end}
    }

    agent_loop = """
    import json
    from agent import call_model, tools

    state = {"steps": []}
    for _ in range(10):
        decision = call_model(state)
        if decision["action"] == "stop":
            break
        result = tools[decision["tool"]](*decision["args"])
        state["steps"].append({"tool": decision["tool"], "result": result})
    print(json.dumps(state))
    """

    {:ok, _value, ctx} =
      Pyex.run(agent_loop,
        modules: %{
          "agent" => %{"call_model" => {:builtin, &call_model/1}, "tools" => tools}
        },
        limits: [timeout: 30_000, max_memory_bytes: 50_000_000]
      )

    Pyex.output(ctx)

The Python program never reaches a Python runtime. It reaches the
Elixir interpreter, which dispatches `tools["fetch"](url)` to the
Elixir function you registered. There is no IPC, no marshalling
across a process boundary, and no path from the Python source to an
OS process.
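
To exercise the controller logic on its own, the same loop can also be run under plain CPython with stand-ins for the host-provided pieces. In this sketch `call_model` and the `tools` entries are hypothetical stubs, not part of Pyex; in the setup above they would be the Elixir functions registered through `modules:`.

```python
import json

# Hypothetical stand-ins for the host-provided "agent" module.
def call_model(state):
    """Scripted 'model': ask for one search, then stop."""
    if not state["steps"]:
        return {"action": "call", "tool": "search", "args": ["beam vm"]}
    return {"action": "stop"}

tools = {
    "search": lambda q: f"results for {q}",
}

def run_loop(max_steps=10):
    """Same control flow as the agent_loop source above."""
    state = {"steps": []}
    for _ in range(max_steps):
        decision = call_model(state)
        if decision["action"] == "stop":
            break
        result = tools[decision["tool"]](*decision["args"])
        state["steps"].append({"tool": decision["tool"], "result": result})
    return state

print(json.dumps(run_loop()))
```

Because the controller only touches `call_model` and `tools`, swapping the stubs for host functions changes what the loop can do without changing the loop.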

    {:ok, ast} = Pyex.compile(source)
    {:ok, value, ctx} = Pyex.run(source_or_ast, opts)
    value = Pyex.run!(source_or_ast, opts)
    output = Pyex.output(ctx)

Everything the program can see flows through `opts`. Files, env vars,
network, database, custom modules: explicit, capability-shaped, deny
by default.

    Pyex.run(source,
      filesystem: Pyex.Filesystem.Memory.new(%{"data.json" => json}),
      env: %{"API_KEY" => key},
      modules: %{"db" => %{"query" => {:builtin, &my_query/1}}},
      network: [%{allowed_url_prefix: "https://api.example.com/"}],
      limits: [timeout: 5_000, max_memory_bytes: 50_000_000])

Pyex is a tree-walking interpreter, not an eval. Python source
never reaches a Python runtime; it reaches a function written in
Elixir that decides what each AST node means. That makes the threat
model unusually simple to reason about.
- No host filesystem. `open()` reads and writes the `Pyex.Filesystem` backend you pass in (`Memory`, `S3`, or your own). Without a backend, file I/O fails closed.
- No subprocess, shell, or `os.exec`. Not implemented. There is no path from Python source to a host process.
- No native code. No `ctypes`, no C extension loading, no `compile()` of source-to-bytecode. Python's `exec()` and `eval()` re-enter the Pyex interpreter; they cannot escape it.
- Network is allowlisted. Denied by default. When configured, matched by URL prefix and HTTP method, with optional header injection so credentials never appear in the Python source.
- I/O capabilities are explicit. SQL, S3, and other I/O are guarded by named capabilities (`sql: true`, `boto3: true`). A program that imports a gated module without the capability fails closed.
- Compute time excludes I/O latency. The compute budget is the Python interpreter's own work. Time spent inside an HTTP call or a SQL query doesn't drain it. An agent waiting on a slow tool doesn't get killed for it; an agent running an infinite loop does.
- Resource ceilings are enforced. Step count, estimated memory, output bytes, and call depth are checked at every step boundary.
- Errors are structured. `%Pyex.Error{kind: :timeout | :python | :syntax | :limit | ...}` lets callers route on failure mode without string matching. The Python-side exception hierarchy mirrors CPython's tree, so `except OSError` catches `FileNotFoundError` exactly the way an agent author would expect.
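
The allowlist rule in the list above (URL prefix plus HTTP method, with header injection) can be sketched in a few lines. This is an illustration of the matching idea only, not Pyex's implementation: `allowed_url_prefix` is the key used in the options above, while `Rule`, `methods`, `inject_headers`, and `authorize` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    allowed_url_prefix: str
    methods: frozenset = frozenset({"GET"})             # hypothetical field
    inject_headers: dict = field(default_factory=dict)  # hypothetical field

def authorize(rules, method, url, headers=None):
    """Return the headers to send if some rule matches; otherwise deny."""
    for rule in rules:
        if method.upper() in rule.methods and url.startswith(rule.allowed_url_prefix):
            # Credentials are merged in by the host, so the sandboxed
            # program never has to carry them in its own source.
            return {**(headers or {}), **rule.inject_headers}
    raise PermissionError(f"{method} {url} is not allowlisted")

rules = [Rule("https://api.example.com/", frozenset({"GET"}),
              {"Authorization": "Bearer <token>"})]

print(authorize(rules, "GET", "https://api.example.com/v1/items"))
```

With plain prefix matching, the trailing slash in the prefix matters: `https://api.example.com` without it would also match `https://api.example.com.evil.test/`.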
The above guarantees rest on the library not calling host primitives
it shouldn't. Pyex enforces this with a custom static analyzer
(Pyex.BannedCallTracer) that walks compiled BEAM abstract code on
every CI run and fails the build if any module under lib/pyex
references:

- `File`, `:file`, `Port`, `Node` (filesystem, ports, remote nodes)
- `Process`, `Agent`, `GenServer`, `Supervisor`, `Task` (process creation and supervised state)
- `System.cmd`, `System.shell`, `:os.cmd`, `:erlang.open_port` (OS process spawning)
- `System.get_env`, `System.put_env` (host env leakage)
- `:erlang.spawn`, `:erlang.spawn_link`, `:erlang.spawn_monitor`
- `:erlang.get`, `:erlang.put` (process dictionary)

A short, justified allowlist exists for Process.sleep/1 (so
time.sleep actually blocks), Task.async/2/yield/shutdown (regex
timeout), GenServer.stop/1 (sql connection teardown), and
:os.system_time/1 (wall clock). The analyzer also resolves
apply(File, :read, [path]) when the args are literal atoms.
This means the sandbox guarantees aren't a code-review promise. They're a CI gate on the compiled artifact.
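
As an analogy for how such a gate works, here is the same idea applied to Python source with the standard `ast` module. It is not `Pyex.BannedCallTracer`, which operates on compiled BEAM abstract code; the banned set and the `banned_calls` function are hypothetical. Like the analyzer described above, it resolves an indirect call only when the name is a literal.

```python
import ast

# Hypothetical banned set for the illustration.
BANNED = {("os", "system"), ("os", "popen"),
          ("subprocess", "run"), ("subprocess", "Popen")}

def banned_calls(source):
    """Return sorted (line, 'module.name') pairs for banned references."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Direct reference, e.g. os.system(...)
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if (node.value.id, node.attr) in BANNED:
                hits.append((node.lineno, f"{node.value.id}.{node.attr}"))
        # Indirect reference with a literal name, e.g. getattr(os, "system")
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "getattr"
                and len(node.args) >= 2
                and isinstance(node.args[0], ast.Name)
                and isinstance(node.args[1], ast.Constant)
                and isinstance(node.args[1].value, str)):
            pair = (node.args[0].id, node.args[1].value)
            if pair in BANNED:
                hits.append((node.lineno, f"{pair[0]}.{pair[1]}"))
    return sorted(hits)

sample = '''
import os
os.system("ls")
f = getattr(os, "popen")
name = "system"
g = getattr(os, name)
'''

print(banned_calls(sample))  # [(3, 'os.system'), (4, 'os.popen')]
```

The last line of `sample` is not flagged because the name is only known at run time, and the sketch does not follow import aliases either. A CI gate built this way would fail the build whenever the returned list is non-empty.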
Pyex has not been through a third-party security audit. Treat it as a hardened library, not as adversarial isolation equivalent to a container or microVM. If your threat model is a sophisticated attacker actively trying to escape, Pyex belongs inside a stronger isolation layer, not in place of one.
The hard parts of Python, implemented to match CPython semantics:
- Faithful object model. Heap-based references with aliasing (`b = a; b.val = 99` ⇒ `a.val == 99`). Intrusive linked lists work. C3 linearization for MRO with cached lookups. Data descriptors with `__get__`/`__set__`. `__slots__` enforcement. Subclassing built-in types (`list`, `dict`, `str`, `int`) via a `__wrapped__` pattern so `class MyList(list)` round-trips through iteration, `len`, `isinstance`, and method dispatch. `super()` in multi-inheritance trees with the correct MRO.
- Generators as continuations. `yield`, `yield from`, generator `send()`, two-way communication, lazy iteration. Generators suspend through tagged continuation frames so an agent step can be paused and resumed without owning a process.
- `async`/`await` as cooperative coroutines. `async def` produces a coroutine; `await` is yield-from over the inner iterator, so yields propagate up to the surrounding trampoline (`asyncio.run`, `asyncio.gather`, or another `await`). Observable interleaving matches CPython: `gather(step("A"), step("B"))` over coroutines that `await asyncio.sleep(0)` between mutations produces ABABAB. `asyncio.create_task` is lazy: the body runs when the Task is awaited, with `Task.result()` / `.done()` / `.cancel()` / `.exception()`. Nested `asyncio.run` raises `RuntimeError`. Async list comprehensions (`[x async for x in g()]`) parse and run. `await` on a non-awaitable raises a CPython-shaped `TypeError`. Async generators ride the same lazy-iterator machinery sync generators use, so FastAPI streaming patterns work unchanged.
- Exception fidelity. The full CPython exception hierarchy (`BaseException` → `Exception` → `OSError` → `FileNotFoundError`, etc.). `try`/`except`/`finally`/`else`, exception groups, `raise from`, traceback chaining. `isinstance(e, OSError)` resolves through the tree exactly as CPython does.
- Modern syntax. `match`/`case` with class, sequence, and mapping patterns. Walrus operator. Type annotations (parsed, ignored at runtime, like CPython). F-strings with format specs. `*args`/`**kwargs`, keyword-only parameters, decorators, comprehensions, context managers.
- Dict and set semantics. Custom `__eq__`/`__hash__` resolves correctly as a dict key. Insertion order is preserved as in CPython 3.7+.
- Decimal arithmetic that passes 5,073 of the IBM `dectest` conformance vectors. Skipped vectors are subnormal, payload, and non-modelled signal cases, documented at the test site.
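
Several of the behaviors listed above are easy to state as a short script. The assertions below describe CPython's behavior, which is the target this section says Pyex matches; the script uses only the standard library and is offered as a quick check, not as part of the project's test suite.

```python
import asyncio
from decimal import Decimal

# Aliasing: two names, one heap object.
class Node:
    def __init__(self, val):
        self.val = val

a = Node(1)
b = a
b.val = 99
assert a.val == 99

# C3 linearization in a diamond hierarchy.
class Base: ...
class Left(Base): ...
class Right(Base): ...
class Leaf(Left, Right): ...
assert [c.__name__ for c in Leaf.__mro__] == ["Leaf", "Left", "Right", "Base", "object"]

# Two-way generator communication with send().
def running_total():
    total = 0
    while True:
        total += yield total

gen = running_total()
next(gen)                       # prime: run to the first yield
assert gen.send(5) == 5
assert gen.send(7) == 12

# Cooperative interleaving under asyncio.gather.
log = []

async def step(tag):
    for _ in range(3):
        log.append(tag)
        await asyncio.sleep(0)  # hand control back to the event loop

async def main():
    await asyncio.gather(step("A"), step("B"))

asyncio.run(main())
assert "".join(log) == "ABABAB"

# Exception hierarchy: FileNotFoundError is caught as OSError.
try:
    raise FileNotFoundError("missing.txt")
except OSError as exc:
    caught = type(exc).__name__
assert caught == "FileNotFoundError"

# Custom __eq__/__hash__ as a dict key.
class Key:
    def __init__(self, k):
        self.k = k
    def __eq__(self, other):
        return isinstance(other, Key) and self.k == other.k
    def __hash__(self):
        return hash(self.k)

assert {Key("x"): 1}[Key("x")] == 1

# Exact decimal arithmetic.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

print("all checks passed")
```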
Standard library, implemented in Elixir to match CPython semantics:

    abc           datetime    html        pathlib     sql
    asyncio       decimal     hmac        pydantic    statistics
    base64        enum        io          pygments    string
    bisect        fastapi     itertools   random      sys
    boto3         fnmatch     jinja2      re          textwrap
    collections   functools   json        requests    time
    contextlib    glob        markdown    secrets     typing
    copy          hashlib     math        shutil      unittest
    crypto        heapq       operator    urllib      uuid
    csv                                               yaml
    dataclasses                                       zipfile / zoneinfo

pandas is partial. pydantic does BaseModel, Field, and
type coercion. fastapi is a list-based implementation of the
route-decorator subset, with streaming generators.
Pyex also serves FastAPI directly, without a server process. This is useful for agent-emitted handlers and for traditional LLM-generated webapps:

    import fastapi

    app = fastapi.FastAPI()

    @app.get("/hello/{name}")
    def hello(name):
        return {"message": f"hello {name}"}

On the Elixir side:

    {:ok, app} = Pyex.Lambda.boot(source)
    {:ok, resp, app} = Pyex.Lambda.handle(app, %{method: "GET", path: "/hello/world"})

Boot once, handle many requests. State threads through:
filesystem mutations persist across calls, exactly as they would on
a long-lived server. Streaming responses use generator
continuations driven by `Stream.resource`, so chunks are produced
lazily without spawning processes.
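
The boot-once, handle-many shape can be pictured as a pure step function that takes an app value and returns a response plus the next app value. The sketch below is a toy model of that pattern in Python, not the `Pyex.Lambda` implementation; `App`, `handle`, and the two handlers are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class App:
    routes: dict   # path -> handler(files, request) -> (body, files)
    files: dict    # in-memory filesystem: path -> contents

def handle(app, request):
    """Pure step: returns (response, next_app) and never mutates `app`."""
    handler = app.routes.get(request["path"])
    if handler is None:
        return {"status": 404, "body": ""}, app
    body, files = handler(dict(app.files), request)  # handler gets a copy
    return {"status": 200, "body": body}, replace(app, files=files)

def count_hit(files, request):
    hits = int(files.get("hits.txt", "0")) + 1
    files["hits.txt"] = str(hits)
    return f"hit {hits}", files

def stream_numbers(files, request):
    # The body is a generator: no chunk is computed until the caller iterates.
    return (f"chunk {i}\n" for i in range(3)), files

app = App(routes={"/hit": count_hit, "/stream": stream_numbers}, files={})

resp1, app = handle(app, {"method": "GET", "path": "/hit"})
resp2, app = handle(app, {"method": "GET", "path": "/hit"})
print(resp1["body"], "|", resp2["body"], "|", app.files)

resp3, app = handle(app, {"method": "GET", "path": "/stream"})
chunks = list(resp3["body"])   # chunks are produced here, on demand
print(chunks)
```

The second `/hit` call sees the file written by the first because the caller threads the returned `app` into the next call, not because anything lives in a server process.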
Measured on Apple Silicon (M-series, OTP 28, 1000-iteration samples), wall-clock end-to-end including lex + parse + interpret:
| Workload | p50 | p99 |
|---|---|---|
| FizzBuzz (100 iterations) | 182 µs | 238 µs |
| Algorithms (~150 LOC: sieve + sort + fib + stats) | 1.67 ms | 2.04 ms |
| FastAPI cold boot | 221 µs | 302 µs |
| FastAPI route — list + Jinja2 render | 108 µs | 166 µs |
| FastAPI route — markdown + Jinja2 render | 140 µs | 202 µs |
| FastAPI route — 404 | 9 µs | 19 µs |
Pre-compiled AST execution skips lex + parse and saves 59 µs on
FizzBuzz, 236 µs on the algorithms suite. Reproduce with
mix run bench/readme_bench.exs.
For comparison, a CPython container cold start is on the order of seconds. A Pyex tenant boot is on the order of microseconds.
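
For readers who want to produce comparable p50/p99 figures for their own workloads, a nearest-rank percentile over repeated wall-clock samples is enough. The sketch below is not the repository's benchmark script; it times a CPython function, so its numbers say nothing about Pyex, and `measure`, `percentile`, and `fizzbuzz` are hypothetical helpers.

```python
import math
import time

def percentile(sorted_samples, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    rank = max(1, math.ceil(len(sorted_samples) * p / 100))
    return sorted_samples[rank - 1]

def measure(fn, iterations=1000):
    """Time `fn` end to end `iterations` times; report p50/p99 in microseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {
        "p50_us": percentile(samples, 50) / 1000,
        "p99_us": percentile(samples, 99) / 1000,
    }

def fizzbuzz():
    return [
        "FizzBuzz" if i % 15 == 0 else "Fizz" if i % 3 == 0
        else "Buzz" if i % 5 == 0 else str(i)
        for i in range(1, 101)
    ]

stats = measure(fizzbuzz)
print(stats)
```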
Pyex is the runtime behind production webapps and is the substrate the author is using for non-interactive agent research. It is not a general drop-in for CPython, and it has not been independently audited.
What gives the project confidence:
- Differential fuzzing against CPython. Hundreds of properties generate random Python programs across arithmetic, strings, collections, control flow, classes, generators, comprehensions, `match`/`case`, exceptions, and context managers. Each program is run through Pyex and CPython; outputs and exception types must match exactly. This is the suite that catches the bugs no human would write.
- CPython conformance suite. Hundreds of hand-written snippets executed through both interpreters; canonical `repr` output is compared byte-for-byte. A separate exception-conformance file verifies that when Pyex raises `TypeError`, CPython does too.
- Whole-program fixtures. A growing set of complete programs recorded against CPython and replayed in CI, including programs that combine generators, file I/O, regex, classes, and stdlib.
- IBM `dectest` vectors. The `decimal` module passes 5,073 IBM standard-arithmetic test vectors. Skipped vectors are subnormal, payload, and non-modelled signal cases.
- Property-based invariants. Properties assert Pyex never crashes on random input: valid Python programs, malformed bytes, random tokens. Bad input must produce a structured error, never an Elixir exception.
- Statically-proven escape boundary. `Pyex.BannedCallTracer` walks the compiled BEAM artifact every CI run and fails the build if any banned host primitive is referenced. See the sandbox section above.
- Real workloads as tests. End-to-end tests run a portfolio rebalancer, a DCF model, a Stripe-shaped webhook handler, an SSR blog, a Tsiolkovsky rocket-equation simulator, and a multi-tenant scaling benchmark for 100K hypothetical tenants: programs sized and shaped like the actual distribution.
- Static analysis. Dialyzer is clean. Every public function has `@spec`. CI runs Elixir 1.19 / OTP 27+28 with warnings as errors.
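
The differential checks in the list above share one core: run the same program through two interpreters and compare what can be observed. A minimal sketch of that comparison follows. It is not the project's harness; `candidate_runner` is a hypothetical stand-in for the interpreter under test, with a deliberately seeded bug so the comparison has something to catch.

```python
import contextlib
import io

def observe(runner, source):
    """Run `source` through `runner`; record stdout and the exception type."""
    out = io.StringIO()
    error = None
    try:
        with contextlib.redirect_stdout(out):
            runner(source)
    except Exception as exc:
        error = type(exc).__name__  # the exception type is part of the result
    return out.getvalue(), error

def reference_runner(source):
    """Reference behavior: CPython itself."""
    exec(compile(source, "<program>", "exec"), {})

def candidate_runner(source):
    """Hypothetical interpreter under test; misreports ZeroDivisionError."""
    try:
        reference_runner(source)
    except ZeroDivisionError as exc:
        raise ValueError(str(exc))

def mismatches(programs):
    """Programs whose observable behavior differs between the two runners."""
    return [p for p in programs
            if observe(reference_runner, p) != observe(candidate_runner, p)]

programs = ["print(sorted([3, 1, 2]))", "print(1 // 0)", "print({}['k'])"]
print(mismatches(programs))  # ['print(1 // 0)']
```

In a fuzzing setup the fixed `programs` list would be replaced by a generator of random programs, and any non-empty result would fail the run.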

    mix test        # full suite
    mix dialyzer    # static types

Pyex emits `:telemetry` events at the lifecycle boundaries that
matter:

- `[:pyex, :run, :start | :stop | :exception]` for every program
- `[:pyex, :request, :start | :stop]` for every HTTP request issued by sandboxed code (after the network policy approves it)
- `[:pyex, :query, :start | :stop]` for every SQL query issued by sandboxed code

Pyex.Trace.attach() collects these into a span tree for
debugging. Pyex.Lambda.handle/2 returns per-request telemetry
(compute time, total time, file ops, event count) inline on the
response.
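
The distinction between compute time and total time is the same one the sandbox rules draw when they exclude I/O latency from the compute budget. One way to picture the accounting is a meter that measures every I/O call and subtracts it from the elapsed time. This is a sketch of the idea only, with hypothetical names (`Meter`, `FakeClock`) and a fake clock so the numbers are deterministic.

```python
class FakeClock:
    """Deterministic clock in microseconds, for illustration."""
    def __init__(self):
        self.now = 0
    def __call__(self):
        return self.now
    def advance(self, us):
        self.now += us

class Meter:
    """Tracks total elapsed time and the share spent inside I/O calls."""
    def __init__(self, clock):
        self.clock = clock
        self.started = clock()
        self.io_us = 0

    def io(self, fn, *args):
        """Run an I/O-bound call; its duration is excluded from compute time."""
        t0 = self.clock()
        try:
            return fn(*args)
        finally:
            self.io_us += self.clock() - t0

    def report(self):
        total = self.clock() - self.started
        return {"total_us": total, "compute_us": total - self.io_us}

clock = FakeClock()
meter = Meter(clock)
clock.advance(2_000)                       # 2 ms of interpreter work
meter.io(lambda: clock.advance(500_000))   # 500 ms waiting on a slow tool
clock.advance(1_000)                       # 1 ms more interpreter work
report = meter.report()
print(report)  # {'total_us': 503000, 'compute_us': 3000}
```

A timeout applied to `compute_us` would not fire for the slow tool call here, but would fire for a loop that keeps advancing compute time without ever entering `io`.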
Multi-tenant operation is a design property, not an extension. A
booted FastAPI app is a struct (%{routes, env, ctx}); a tenant is
a value. There are no per-tenant processes or pools to size,
because the runtime doesn't own state on the tenant's behalf — the
caller does. Tenants serialize, migrate, and run concurrently
under the BEAM scheduler the same way any other value does.
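
To make "a tenant is a value" concrete, the sketch below keeps each tenant as plain data, round-trips one through JSON, and runs two of them concurrently. It is a Python analogy with hypothetical names (`handle`, the tenant dict layout); threads stand in for the BEAM scheduler, and nothing here reflects Pyex's actual struct.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def handle(tenant, request):
    """Pure step: the caller keeps the returned tenant; nothing is global."""
    count = tenant["files"].get("count", 0) + 1
    files = {**tenant["files"], "count": count}
    return {"body": f"{tenant['name']}:{count}"}, {**tenant, "files": files}

tenants = {name: {"name": name, "files": {}} for name in ("acme", "globex")}

# A tenant is only data, so it can be serialized, moved, and restored.
wire = json.dumps(tenants["acme"])
tenants["acme"] = json.loads(wire)

# Each call touches only the value it was handed, so tenants run side by side.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda t: handle(t, {"path": "/"}), tenants.values()))

tenants = {t["name"]: t for _, t in results}
bodies = [resp["body"] for resp, _ in results]
print(bodies)
```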

    Source ──► Pyex.Lexer ──► Pyex.Parser ──► Pyex.Interpreter
                                                      │
                                                  Pyex.Ctx
                                         (filesystem, env, modules,
                                          limits, network, capabilities,
                                          heap, iterators)

The interpreter is (ast, env, ctx) -> (value, env, ctx). No
processes, no message passing, no global state, no throw/catch
for control flow. Generators yield through tagged continuation
frames so a generator can be suspended, serialized in principle,
and resumed lazily.
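
The `(ast, env, ctx) -> (value, env, ctx)` shape can be shown with a toy evaluator over Python's own `ast` module. This sketch handles only assignments, names, constants, and three binary operators, and it charges one step per node against a limit carried in `ctx`. It is an illustration of the signature, not Pyex's interpreter; unlike the design described above, it reports the limit with a Python exception for brevity.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class StepLimit(Exception):
    pass

def tick(ctx):
    """Charge one step against the budget; returns a new ctx."""
    if ctx["steps"] >= ctx["max_steps"]:
        raise StepLimit(f"exceeded {ctx['max_steps']} steps")
    return {**ctx, "steps": ctx["steps"] + 1}

def evaluate(node, env, ctx):
    """(ast, env, ctx) -> (value, env, ctx); inputs are never mutated."""
    ctx = tick(ctx)
    if isinstance(node, ast.Module):
        value = None
        for stmt in node.body:
            value, env, ctx = evaluate(stmt, env, ctx)
        return value, env, ctx
    if isinstance(node, ast.Expr):
        return evaluate(node.value, env, ctx)
    if isinstance(node, ast.Assign):
        # Only single-name targets (x = ...) are handled in this sketch.
        value, env, ctx = evaluate(node.value, env, ctx)
        return value, {**env, node.targets[0].id: value}, ctx
    if isinstance(node, ast.Constant):
        return node.value, env, ctx
    if isinstance(node, ast.Name):
        return env[node.id], env, ctx
    if isinstance(node, ast.BinOp):
        left, env, ctx = evaluate(node.left, env, ctx)
        right, env, ctx = evaluate(node.right, env, ctx)
        return OPS[type(node.op)](left, right), env, ctx
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("x = 2 + 3\nx * 7")
value, env, ctx = evaluate(tree, {}, {"steps": 0, "max_steps": 100})
print(value, env, ctx["steps"])  # 35 {'x': 5} 9
```

Because every call returns fresh `env` and `ctx` values, the caller can keep, discard, or store any intermediate state, which is the property the surrounding text relies on.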
The interpreter itself is decomposed into 22 submodules under
lib/pyex/interpreter/ — assignments, binary ops, calls, class
lookup, control flow, dunder protocols, exceptions, format,
imports, iteration, match, statements — each a small file with one
responsibility. The pure-functional core is what makes the static
analyzer's job tractable.
This shape is deliberate. The library does not own a runtime; the host application does. Pyex is a value you compute with.
MIT