ftvision/simple-orchestrator

simple-orchestrator

A from-scratch re-implementation of Docker's core architecture using plain OS processes instead of containers. The goal is to understand how Docker actually works by building the same pieces — daemon, CLI, Unix socket, state machine, shim process — without any of the namespace or cgroup machinery.

No isolation. No images. Just processes that behave like containers.


What it implements

Docker concept                                                 This project
dockerd HTTP API over Unix socket                              orchestratord FastAPI daemon
docker CLI                                                     orchestrate Typer CLI
Container state machine                                        State enum with transition guards
docker attach / WebSocket stdio                                WebSocket bidirectional stream
docker exec                                                    new subprocess in the container's cwd
docker logs --follow                                           async condition-variable fan-out
containerd-shim keeping process alive across daemon restarts   per-container shim.py process
PR_SET_CHILD_SUBREAPER                                         shim adopts orphaned grandchildren
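
As an illustration of the "State enum with transition guards" row, here is a minimal sketch. The state names PAUSED and EXITED, the ALLOWED table, and the transition() helper are assumptions made for this example; the project's own enum in shared.py may differ.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"   # assumed name
    EXITED = "exited"   # assumed name


# Hypothetical transition table: which target states each state may move to.
ALLOWED = {
    State.CREATED: {State.RUNNING},
    State.RUNNING: {State.PAUSED, State.EXITED},
    State.PAUSED: {State.RUNNING, State.EXITED},
    State.EXITED: {State.RUNNING},  # restart
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the ALLOWED table."""


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the guard rejects the change."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

A daemon built this way could map InvalidTransition to an HTTP error response, which is the kind of behaviour test_errors checks.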

Architecture

orchestrate (CLI)
      │  HTTP + WebSocket  over  /tmp/orchestrator.sock
      ▼
orchestratord (daemon)
      │  spawns one shim per container
      ▼
shim.py  (one per container, outlives daemon restarts)
      │  spawns and waits on the managed process
      ▼
container process (your "docker run" target)

The daemon is the API server. It creates containers, routes requests, and tails log files, but it is intentionally stateless with respect to running processes — all durable state lives in shim directories on disk.
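
The log-follow path can be pictured as a condition-variable fan-out: one producer appends lines, and any number of followers wake up and read from their own offset. The sketch below is an assumption about the general shape, not the project's code; LogFanout and its methods are hypothetical names, and it keeps lines in memory rather than tailing a file.

```python
import asyncio


class LogFanout:
    """One producer, many followers, coordinated by a single asyncio.Condition."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self.closed = False
        self._cond = asyncio.Condition()

    async def publish(self, line: str) -> None:
        async with self._cond:
            self.lines.append(line)
            self._cond.notify_all()  # wake every follower

    async def close(self) -> None:
        async with self._cond:
            self.closed = True
            self._cond.notify_all()

    async def follow(self):
        """Yield every line from the start, then new lines until close()."""
        offset = 0
        while True:
            async with self._cond:
                await self._cond.wait_for(
                    lambda: offset < len(self.lines) or self.closed
                )
                batch = self.lines[offset:]
                offset = len(self.lines)
                done = self.closed
            for line in batch:
                yield line
            if done:
                return
```

Each follower tracks its own offset, so a slow client never blocks the producer or other clients.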

The shim is the durable layer. It owns the container's stdio, writes an append-only log file, listens on a Unix socket for stdin forwarding, and writes an exit-code file when the process terminates. Because the shim is a separate process, a daemon crash or restart does not kill or disconnect the running container. The daemon re-reads shim directories on startup and reconnects.
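
A stripped-down sketch of the shim's core loop follows. It covers only the log file, the pid files, and the exit-code file; stdin forwarding over shim.sock is left out. The function names run_shim and write_atomic are hypothetical, and the real shim.py may be structured differently.

```python
import os
import subprocess
from pathlib import Path


def write_atomic(path: Path, text: str) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(text)
    os.replace(tmp, path)  # atomic on POSIX within one filesystem


def run_shim(state_dir: Path, cmd: list[str]) -> int:
    """Run cmd, append its output to state_dir/log, and record its exit code."""
    state_dir.mkdir(parents=True, exist_ok=True)
    write_atomic(state_dir / "shim.pid", str(os.getpid()))
    with open(state_dir / "log", "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,  # the real shim forwards stdin; omitted here
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        write_atomic(state_dir / "pid", str(proc.pid))
        code = proc.wait()
    write_atomic(state_dir / "exit", str(code))
    return code
```

Because the child writes straight to the log file, nothing in this path depends on the daemon being alive.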


Repository layout

simple-orchestrator/
├── orchestrator/               # async rewrite (FastAPI + Typer + websockets)
│   ├── orchestrate.py          # CLI entry point
│   ├── shared.py               # socket path, State enum, ID generator
│   ├── orchestratord/
│   │   ├── __main__.py         # daemon entry point (uvicorn)
│   │   ├── routes.py           # FastAPI app, all HTTP + WebSocket handlers
│   │   ├── models.py           # Container dataclass, out(), resolve()
│   │   ├── process.py          # launch(), persist_meta(), tail_log()
│   │   ├── lifecycle.py        # lifespan, recovery on startup, reaper
│   │   └── shim.py             # per-container shim process
│   ├── docs/
│   │   └── shim-process.md     # detailed write-up of the shim design
│   └── tests/                  # bash integration test suite
│
└── orchestrator-stdlib/        # earlier version using only Python stdlib
    ├── orchestrate.py          # CLI (argparse + http.client over Unix socket)
    ├── orchestratord.py        # daemon (http.server + threading)
    └── docs/
        ├── daemon.md           # how the threading daemon works
        └── unix_socket.md      # how Unix socket HTTP is wired up

The orchestrator-stdlib/ version intentionally uses zero external dependencies to make the Unix socket and HTTP plumbing visible. The orchestrator/ version replaces the hand-rolled pieces with FastAPI, Typer, httpx, and websockets.


Shim state directory

Each container gets its own directory at /tmp/orchestrator-shims/<id>/:

shim.pid    PID of the shim process (liveness sentinel)
pid         PID of the container process
meta.json   {id, name, cmd, cwd, created_at, started_at}
log         append-only stdout + stderr of the container
exit        exit code, written atomically on container termination
shim.sock   Unix socket for stdin forwarding

On restart, the daemon scans /tmp/orchestrator-shims/, checks whether each shim is alive with kill(shim_pid, 0), and reconstructs in-memory state from meta.json and the presence or absence of exit.
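
The recovery scan might look roughly like this. It is a sketch under assumptions: recover() and pid_alive() are hypothetical names, the state strings are invented for the example, and how the project treats a dead shim with no exit file is not stated, so that case is marked explicitly.

```python
import json
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Signal 0 checks for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recover(root: Path) -> dict[str, dict]:
    """Rebuild an in-memory container table from the shim state directories."""
    containers = {}
    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        meta_file = d / "meta.json"
        if not meta_file.exists():
            continue
        meta = json.loads(meta_file.read_text())
        exit_file, shim_pid_file = d / "exit", d / "shim.pid"
        code = None
        if exit_file.exists():
            state, code = "exited", int(exit_file.read_text())
        elif not shim_pid_file.exists():
            state = "created"  # never started, so no shim
        elif pid_alive(int(shim_pid_file.read_text())):
            state = "running"
        else:
            state = "exited"  # assumption: shim died before writing exit
        containers[meta["id"]] = {**meta, "state": state, "exit_code": code}
    return containers
```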


Running

cd orchestrator
pip install -e .          # or: uv sync

# terminal 1 — start the daemon
orchestratord --socket /tmp/orch.sock

# terminal 2 — use the CLI
export ORCHESTRATOR_SOCKET=/tmp/orch.sock
orchestrate run --name hello -- echo "hello world"
orchestrate ps -a
orchestrate logs hello
orchestrate run --name shell -- bash
orchestrate attach shell          # interactive stdin/stdout
orchestrate exec shell -- ps aux  # run a command inside

Tests

cd orchestrator
bash tests/run_all.sh

The suite consists of bash integration tests that start a real daemon, exercise the CLI, and assert on state, log output, and process liveness. It covers:

  • test_lifecycle — create / run / start / stop / rm / restart
  • test_logs — ring buffer cap, log follow (streaming)
  • test_exec — stdin EOF protocol, cwd inheritance
  • test_pause_unpause — SIGSTOP / SIGCONT, output freezes
  • test_process_groups — killpg reaches grandchildren
  • test_grandchild_lifecycle — escaped-PGID grandchildren are cleaned up via subreaper
  • test_recovery — CREATED and RUNNING containers survive a daemon SIGKILL
  • test_errors — correct HTTP error codes for every invalid transition

Key design decisions

Unix socket, not TCP. The socket file acts as a capability — only processes with filesystem access to /tmp/orchestrator.sock can talk to the daemon. No port, no binding to a network interface.
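
Speaking HTTP over a Unix socket needs very little code with the standard library. The sketch below shows one common way to do it by overriding connect() on http.client.HTTPConnection; the class name UnixHTTPConnection and the /containers path in the usage note are assumptions, not taken from the project.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix socket path instead of host:port."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        # The host is only used for the Host header; no DNS or TCP is involved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock
```

With a daemon listening, a request would look like conn = UnixHTTPConnection("/tmp/orchestrator.sock"), then conn.request("GET", "/containers") and conn.getresponse(), where /containers is a hypothetical endpoint.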

Shim process, not pipes. Holding a subprocess.PIPE to a container ties the container's lifetime to the daemon's. A shim process decouples them: the container keeps running when the daemon is restarted, and the daemon can reconnect to an already-running container by re-reading the state directory.

PR_SET_CHILD_SUBREAPER on the shim. If a container forks children that move to their own process group (escaping killpg), those children would normally be re-parented to init when the container exits. With the shim as subreaper, they re-parent to the shim instead, which kills and reaps them before exiting.
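
Python has no built-in wrapper for prctl, so one way to set the subreaper flag is through ctypes. This is a Linux-only sketch; become_subreaper() and reap_orphans() are hypothetical names, and the step that kills adopted children before reaping them is omitted.

```python
import ctypes
import ctypes.util
import os
import sys

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper() -> bool:
    """Ask the kernel to re-parent orphaned descendants to this process."""
    if not sys.platform.startswith("linux"):
        return False  # prctl is Linux-specific
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
    rc = libc.prctl(
        PR_SET_CHILD_SUBREAPER,
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    return rc == 0


def reap_orphans() -> list[int]:
    """Reap every child that has already exited and return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped
```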

WebSocket for attach/exec, not a raw HTTP upgrade. The stdlib version uses a raw HTTP 101 Switching Protocols upgrade. The async version uses FastAPI's native WebSocket support, which is cleaner. WebSocket has no equivalent of SHUT_WR, so the stdin half-close problem is handled by sending a {"stdin_eof": true} sentinel instead.
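
On the receiving side, the sentinel can be handled with a small dispatch function. Only the {"stdin_eof": true} message comes from this README; the {"data": "..."} frame shape and the forward_frame() name are assumptions for illustration.

```python
import json
from typing import BinaryIO


def forward_frame(frame: str, stdin: BinaryIO) -> bool:
    """Apply one client text frame to the process's stdin.

    Returns False once stdin has been closed, True otherwise.
    """
    msg = json.loads(frame)
    if msg.get("stdin_eof"):
        stdin.close()  # the process now sees EOF on its stdin
        return False
    stdin.write(msg["data"].encode())  # "data" is an assumed field name
    stdin.flush()
    return True
```

Closing only stdin leaves the WebSocket open, so the client keeps receiving output after it has finished sending input.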

File-based state, not SQLite. The shim's state directory IS the persistent state. Writing a secondary index in SQLite would create a sync problem. The only gap is CREATED-state containers (never started, so no shim), which are handled by writing meta.json at create time — the same approach Docker uses with config.v2.json.
