Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
c5dc997
PymoniK rewrite
AncientPatata Apr 26, 2026
1d24103
fix: improvements to otel
AncientPatata May 28, 2026
77fbe30
docs: added multi-result example, removed stale comments
AncientPatata May 28, 2026
2a0a38b
fix: swapped worker logger to json for seq
AncientPatata May 28, 2026
4022b46
feat: MVP of replay in the CLI
AncientPatata May 28, 2026
428950b
added task name to options
AncientPatata Jun 4, 2026
0b110d7
docs: fixed various documentation pages
AncientPatata Jun 4, 2026
296f1a1
feat: result-reuse cache + lazy future materialization
AncientPatata Jun 5, 2026
8ad25fb
docs: updated documentation based on recent changes
AncientPatata Jun 5, 2026
7d0ee7e
feat: added hooks for observability-needing extensions
AncientPatata Jun 5, 2026
f0bd538
feat(hooks): added label support for multiresults in pymonik hooks
AncientPatata Jun 15, 2026
d09b29a
chore: remove CLI from ad/rewrite (moved to ad/rewrite-extras)
AncientPatata Jun 15, 2026
a664472
fix(data): auto-spilled params result in unresolved references to fut…
AncientPatata Jun 15, 2026
3904ebc
feat(dx): implicit context detection in task parameters
AncientPatata Jun 16, 2026
38e900e
fix(data): result timestamp preservation optional
AncientPatata Jun 16, 2026
e5832e1
fix(local): task failing on local cluster only failed one result for …
AncientPatata Jun 16, 2026
f4565a8
feat(worker): improvement to subprocess isolation
AncientPatata Jun 16, 2026
ac39531
fix(client): fixed filters for result queries
AncientPatata Jun 16, 2026
91ca2cf
chore: added tests for recent features and changes
AncientPatata Jun 16, 2026
91fbc81
feat(client): major improvements to querying
AncientPatata Jun 16, 2026
7363647
feat(dx): refactored interface for submitting, settling and downloadi…
AncientPatata Jun 18, 2026
0962734
fix(image): added workaround for uv dynamic versioning in image
AncientPatata Jun 19, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions .docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,9 @@

# -- Options to show "Edit on GitHub" button ---------------------------------
html_context = {
"display_github": True, # Integrate GitHub
"github_user": "aneoconsulting", # Username
"github_repo": "ArmoniK.Api", # Repo name
"github_version": "main", # Version
"conf_py_path": "/.docs/", # Path in the checkout to the docs root
"display_github": True,
"github_user": "aneoconsulting",
"github_repo": "PymoniK",
"github_version": "main",
"conf_py_path": "/.docs/",
}
25 changes: 9 additions & 16 deletions .docs/development/contribution.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,14 @@
# Contributing

This doesn't differ from our other projects ([Read ArmoniK.CLI's CONTRIBUTING.md](https://github.com/aneoconsulting/ArmoniK.CLI/blob/main/CONTRIBUTING.md)).
Thank you for considering a contribution. PymoniK is a small,
opinionated SDK and we'd like to keep it that way — but there's
plenty to do, and outside perspectives are welcome.

## Open Issues
For repo-wide conventions, see ANEO's
[contribution guidelines on ArmoniK.CLI](https://github.com/aneoconsulting/ArmoniK.CLI/blob/main/CONTRIBUTING.md);
PymoniK follows the same shape.

Here's a non-exhaustive list of things that are outright/partially missing from PymoniK that we'd like to see added in:
## Before you start

- **Unit tests, end-to-end tests** : This project doesn't have any testing associated to it, 0% code coverage. We'd like to change this.
- **More sophisticated examples** : We'd like to add even more examples and tutorials of common use cases that use ArmoniK under the hood.
- **PymoniK logger** : The session created/closed/cancelled prints and the different errors should be logged instead of being printed.
- **Local PymoniK** : Once of the big advantages of the way things have been coded is being able to switch from/to a remote/local context by just removing the `invoke` methods. This could be done even better, by adding a `local=True` flag to PymoniK that makes it so invokes are executed as regular function calls that run locally. The challenge is mainly handling `Pymonik.put`s and the map_invoke.
- **Rename `ResultHandle` to `ObjectHandle`** : The naming of `ResultHandle` was choosen because invocations return results, but as it turns out, these results are also then served as inputs. It'd be better for naming (especially since you can put things onto ArmoniK) to rename `ResultHandle` to the more generic `ObjectHandle`.
- **Cleaner sub-tasking** : Subtasking requires the user to pass in a `delegate=True` flag to invokes, this isn't particularly clean or nice. There must be a better way of doing it.
- **Tasks returning multiple results**: As of right now, a task can only return a single result object (even when you return a tuple). There should be support for cases where you'd like to return multiple results from a task and not have them be grouped up into one (multiple smaller objects being passed onto multiple tasks). We think this change would be easier to implement once sub-tasking is in place, since it also involves analyzing what the user will return. There should be pre-execution tests to check if the user is returning a different number of results in different branches of the task and that should result in a failure (invalid task).
- **`results.as_completed`** : For more sophisticated and better time-to-execute, we'd like to implement a method for `MultiResultHandle` that allows the user to for instance loop through a `MultiResultHandle` and have the code execute as the result is done/retrieved. Moreover, as a side-effect, having this feature would allow for the usage of `tqdm` to create progress bars which is also really nice.
- **Intermediate Objects** : Support being able to create/download ArmoniK Objects (Intermediate results) within tasks. The download part would require `GetDirectData` in the Python API.
- **Remote to local error propagation** : Supply an additional "error name" to created tasks, when a result creation fails, we create this result, locally we can retrieve the remote stack trace using `my_result.error()` if we try-except, either that are we enrich the local result failure grpc exception with the remote one.
- **Test JIT-ing** : jitting tasks using namba/taichi if they're pure for additional performance. (Should just test if it'd work as intended..)
- **More sophisticated result deserialization**: Right now it's just first-level depickling, it'd be nice to be able to pass in a dict that has ResultHandle values for example and be able to dynamically fetch those but that'd add a lot of complexity to data dependencies that'd need to be handled. There has to be a nice way of doing this and it's worth exploring
- **PymoniK Visualizer**: With the current implementation of invoke/map_invoke, we can make it so you can surround your PymoniK context with a Grapher context that dynamically builds up a visualization of your task graph that you can then save and look at/analyze/vizualize later.
- For non-trivial changes, open an issue to discuss the approach
before writing code. Saves rounds.
115 changes: 101 additions & 14 deletions .docs/development/development.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,118 @@
# Developing PymoniK

We'll be covering some basic information to help you in working on and developing PymoniK
This page covers what you need to know to work *on* PymoniK itself —
not on top of it.

## Requirements
## Prerequisites

We're using `uv` throughout the project, so please make sure that you have it installed. You can refer to their [official `uv` installation guide](https://docs.astral.sh/uv/getting-started/installation/)
- Python 3.11. PymoniK pins to 3.11 because cloudpickle isn't
cross-minor-compatible with the worker image — tests and
`LocalCluster` need to match what the worker runs.
- [`uv`](https://docs.astral.sh/uv/) for project management.
- Docker, if you'll touch worker images or run integration tests
against a real cluster.

## Test client
## Layout

The test client contains some basic examples of working with PymoniK, PymoniK is installed in editable mode `uv add ../pymonik --editable`, it's useful to just create a python file there for testing and then `uv run`ning it to quickly iterate on PymoniK. Keep in mind that if you make changes that affect how the worker functions (obviously like making a change to the `worker.py` file), you'll have to reload the worker image. You can do this by running the following command:
```
src/pymonik/ # the package
__init__.py # public API re-exports
client.py # PymonikClient
session.py # Session + completion loops
task.py # @task decorator, Task wrapper
future.py # Future, FutureList
options.py # TaskOpts, merge semantics
envelope.py # wire format (msgspec)
blob.py # Blob, Materialize
worker.py # pymonik-worker entrypoint
worker_session.py # from-inside-a-worker submission
context.py # pymonik.current() / WorkerContext
errors.py # PymonikError hierarchy
composition.py # gather, as_completed
testing/ # LocalCluster
cli/ # pymonik CLI (click)
_internal/ # not part of the public API
submit.py # shared submission pipeline
refs.py # FutureRef / BlobRef / MaterializeRef
env_builder.py # uv venv + flock for runtime deps
subprocess_dispatch.py # deps + isolate=True path
task_runner.py # subprocess child entrypoint
exec_cache.py # local result cache
query.py # fluent introspection
info.py # TaskInfo / ResultInfo / ...
channel.py # gRPC channel helpers
_otel.py # OpenTelemetry helper
_logging.py # opt-in structlog setup
examples/ # runnable examples (also CI-gated)
tests/ # pytest suite
.docs/ # this documentation (Sphinx + MyST)
worker-image/ # Dockerfile for the harmonic_snake worker
```

The `_internal/` prefix marks code that may change without notice.
Anything re-exported from `pymonik/__init__.py` is part of the public
API and follows semver-ish rules within the alpha.

```bash
kubectl rollout restart deployment/compute-plane-pymonik #(1) -n armonik #(2)
## Running tests

```sh
uv sync # one-time install
uv run pytest # everything
uv run pytest -m "not slow" # skip slow integration tests (no `uv` install)
uv run pytest tests/test_otel.py -v # one file
```

1. This should be compute-plane-(NAME OF YOUR PYMONIK PARTITION).
2. If you're deploying locally the namespace is typically armonik, otherwise use the namespace of your kubernetes cluster
The test suite is divided:

- **Fast tests** (~110) — pure unit tests, no network, no `uv venv`
builds. Run in seconds. These are what CI runs on every push.
- **Slow tests** marked `@pytest.mark.slow` — exercise the runtime
deps path with a real `uv` install. Need `uv` on `PATH`. Skip on
Windows (the subprocess wire is POSIX-only for now).

The `LocalCluster` exercises the same submission pipeline as the
real client, so most behavioural tests don't need a cluster. Only
tests that depend on cluster-side behaviour (partition scheduling,
events stream over the network) need a live ArmoniK; mark those
`@pytest.mark.e2e` and run them separately when you have a deploy.

## Type checking

## Automation Script (`automation.py`)
```sh
uv run ty check src/pymonik
```

The `automation.py` script at the root of the project should help you realize most of your development tasks, it also auto-installs development dependencies.
New code should be fully annotated; private helpers may skip
annotations when obvious.

For example, if you want to access the documentation offline of if you're working on it (thank you!) then you can use the `serve-docs` command.
`ty` currently reports a number of diagnostics, most of them from
upstream typing (anyio's threading helpers, the `armonik` client's
signatures) rather than PymoniK bugs. Typing here is gradual — start
permissive and ratchet up as modules stabilise; tighten rule
severities under `[tool.ty.rules]` in `pyproject.toml` when you want
to enforce more.

To see a list of all available commands and their general descriptions, you can run:
## Linting and formatting

```sh
uv run ruff check
uv run ruff format
```
uv run automation.py --help

Ruff replaces black + flake8 + isort. Configuration is in
`pyproject.toml` under `[tool.ruff]`.

## Working against a cluster

If you're touching code that affects the worker (anything in
`worker.py` or `_internal/`), you'll need to rebuild the image and
restart the partition:

```sh
docker build -t my-org/harmonic_snake:dev worker-image/
docker push my-org/harmonic_snake:dev # or load directly into your kind cluster
kubectl rollout restart deployment/compute-plane-pymonik -n armonik
```

For client-only changes, just `uv sync` (or run with the editable
install) and re-run your client script — no image rebuild needed.
105 changes: 103 additions & 2 deletions .docs/examples/monte_carlo.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,104 @@
# Distributed Monte Carlo for PI Estimation
# Monte Carlo: estimating π

This page hasn't been written yet, but the example code for it is in `test_client/estimate_pi.py`.
A short example that hits every basic primitive: a worker function,
`map` for fan-out, `spawn` for fan-in.

The full source lives at `examples/estimate_pi.py`.

## The idea

Monte Carlo estimation of π: throw N random points in the unit square;
the fraction inside the unit quarter-circle is approximately π/4. The
more points, the better the estimate.

It's embarrassingly parallel — every chunk of points is independent,
and the only fan-in is a sum. Perfect for ArmoniK.

## The code

```python
import random
from pymonik import PymonikClient, task


@task
def count_inside(n: int, seed: int) -> int:
"""How many of `n` random points fall inside the unit quarter circle?"""
rng = random.Random(seed)
inside = 0
for _ in range(n):
x = rng.random()
y = rng.random()
if x * x + y * y <= 1.0:
inside += 1
return inside


@task
def estimate(total_inside: int, total_points: int) -> float:
"""Combine the per-shard counts into a single π estimate."""
return 4.0 * total_inside / total_points


@task
def add_all(xs: list[int]) -> int:
return sum(xs)


def run(total_points: int = 10_000_000, num_tasks: int = 32) -> float:
points_per_task = total_points // num_tasks

with PymonikClient() as client:
with client.session(partition="pymonik") as s:
shards = count_inside.starmap(
(points_per_task, i) for i in range(num_tasks)
)
total_inside = add_all.spawn(shards)
pi = estimate.spawn(total_inside, num_tasks * points_per_task)
return pi.result(timeout=120)


if __name__ == "__main__":
print(f"π ≈ {run()}")
```

## What's happening

- `count_inside.starmap(...)` submits 32 tasks in one gRPC call. Each
one runs in parallel on whatever workers ArmoniK schedules. Returns
a `FutureList[int]`. We use `starmap` because we already have arg
tuples; if the per-task args were single values, `count_inside.map(iter)`
would be cleaner.
- `add_all.spawn(shards)` passes the `FutureList` directly. PymoniK
rewrites each upstream future as a data dependency — `add_all` won't
run until every `count_inside` has finished. The client doesn't
block.
- `estimate.spawn(total_inside, ...)` chains again: `estimate` waits
for `add_all` via the same mechanism.
- Only `pi.result(timeout=120)` blocks. By the time the client wakes
up, the entire DAG (32 + 1 + 1 = 34 tasks) has run.

## Tweaking

- **More accuracy?** Bigger `total_points`. Each shard is independent,
so increasing the count just makes the leaves heavier.
- **More parallelism?** Bigger `num_tasks`. Each shard is small enough
that the overhead of submission per task starts to matter at a few
hundred — you'll see diminishing returns.
- **Reproducible?** Drop the seed indirection and pass a fixed seed.
The worker uses Python's stdlib `random`, which is deterministic
given a seed.

## Variations worth trying

1. Replace `count_inside` with one that uses `numpy.random` for
speed — declare `client.session(deps=["numpy"])` to make numpy
available without rebuilding the image. See
[Runtime environment](../guides/runtime-environment.md).
2. Use the local exec cache to skip re-running shards:
`PymonikClient(cache=True)` + `@task(cache=True)` on
`count_inside`. Re-running the script with the same args returns
instantly on the second invocation.
3. Use `LocalCluster` to run the whole thing in-process for a unit
test — no cluster needed. See
[Local testing](../guides/local-testing.md).
Loading