intentee · mcharytoniuk · Jun 4, 2026 · May 27, 2026 · May 28, 2026 · May 28, 2026
diff --git a/.claude/.gitignore b/.claude/.gitignore
@@ -1,2 +1,3 @@
 *.lock
+/plans/
 /worktrees/
@@ -0,0 +1,11 @@
+---
+paths:
+  - "paddler_agent/**"
+---
+
+# Paddler Agent Context
+
+- `paddler_agent` is the only crate that can rely on `llama-cpp-bindings`
+- agent is the only crate responsible for instantiating llama.cpp back-end, and communicating with it
+- no crate can depend directly on `paddler_agent` (besides `paddler_bootstrap`, and other test related crates), they need to use `paddler_messaging` instead
+
@@ -0,0 +1,10 @@
+---
+paths:
+  - "paddler_balancer/**"
+---
+
+# Paddler Balancer Context
+
+- `paddler_balancer` crate is responsible for starting inference, and management servers
+- Paddler Agents connect to the balancer in order to handle the requests that the balancer dispatches
+- `paddler_balancer` provides compatibility services that expose vendor-compatible APIs (for example OpenAI compatibility server)
@@ -0,0 +1,11 @@
+---
+paths:
+  - "paddler_bootstrap/**"
+---
+
+# Paddler Bootstrap Context
+
+- `paddler_bootstrap` is used both by `paddler_cli`, `paddler_tests`, and `paddler_gui`
+- `paddler_bootstrap` combines both `paddler_agent`, and `paddler_balancer`, and provides a unified entry point
+- `paddler_bootstrap` is the canonical way to start both core paddler services (balancer, and agent)
+- `paddler_bootstrap` is the source of truth on how to start Paddler services
@@ -0,0 +1,11 @@
+---
+paths:
+  - "paddler_cache_dir/**"
+---
+
+# Paddler Cache Directory Context
+
+- `paddler_cache_dir` is a root Paddler crate, it must not depend on any other Paddler crate
+- `paddler_cache_dir` manages Paddler's global cache directory, and all its nuances
+- `paddler_cache_dir` resolves OS-related differences internally
+- `paddler_cache_dir` must use cache directory patterns idiomatic to the specific operating system
@@ -0,0 +1,10 @@
+---
+paths:
+  - "paddler_cli/**"
+---
+
+# Paddler CLI Context
+
+- `paddler_cli` is intended to combine both `paddler_agent`, and `paddler_balancer` through `paddler_bootstrap`
+- `paddler_cli` is intended to provide a command line Paddler binary, to be used in server infrastructure
+- `paddler_cli` is intended to encapsulate both balancer, and agent in a single binary, for the sake of ease of deployment
@@ -0,0 +1,12 @@
+---
+paths:
+  - "paddler_client_javascript/**"
+---
+
+# Paddler Client Context
+
+- `paddler_client_javascript` provides JavaScript client that connects to `paddler_balancer`
+- It must provide a way to connect to only, specifically balancer's inference address (without the need to connect to management service at the same time)
+- It must provide a way to connect to only, specifically balancer's management address (without the need to connect to inference service at the same time)
+- Paddler client must support all Paddler's native endpoints
+- It must not implement OpenAI compatibility client
@@ -0,0 +1,12 @@
+---
+paths:
+  - "paddler_client_python/**"
+---
+
+# Paddler Client Context
+
+- `paddler_client_python` provides JavaScript client that connects to `paddler_balancer`
+- It must provide a way to connect to only, specifically balancer's inference address (without the need to connect to management service at the same time)
+- It must provide a way to connect to only, specifically balancer's management address (without the need to connect to inference service at the same time)
+- Paddler client must support all Paddler's native endpoints
+- It must not implement OpenAI compatibility client
@@ -0,0 +1,12 @@
+---
+paths:
+  - "paddler_client/**"
+---
+
+# Paddler Client Context
+
+- `paddler_client` provides Rust client that connects to `paddler_balancer`
+- It must provide a way to connect to only, specifically balancer's inference address (without the need to connect to management service at the same time)
+- It must provide a way to connect to only, specifically balancer's management address (without the need to connect to inference service at the same time)
+- Paddler client must support all Paddler's native endpoints
+- It must not implement OpenAI compatibility client
@@ -0,0 +1,12 @@
+---
+paths:
+  - "paddler_download_manager/**"
+---
+
+# Paddler Download Manager Context
+
+- `paddler_download_manager` is a root Paddler crate, it must not depend on any other Paddler crate
+- `paddler_download_manager` is responsible for downloading GGUF models from HTTP URLs
+- `paddler_download_manager` must be resilient, it must support resumes, handle cache corruptions
+- `paddler_download_manager` must not do retries, because it is intended to be used by `paddler_agent`, and `paddler_agent` already has a built-in retry mechanism
+- `paddler_download_manager` must focus only on Paddler internal use-cases related to downloading models
@@ -0,0 +1,10 @@
+---
+paths:
+  - "paddler_messaging/**"
+---
+
+# Paddler Messaging Context
+
+- `paddler_messaging` must contain only messaing protocol between `paddler_agent`, and `paddler_balancer`
+- `paddler_messaging` must only be a thin messaging, and validation layer between `paddler_agent`, and `paddler_balancer`
+- `paddler_messaging` is intended to be used in `paddler_client`, and must not pull heavy dependencies like `llama-cpp-bindings`
@@ -0,0 +1,13 @@
+---
+paths:
+  - "paddler_openai_response_format_validator/**"
+---
+
+# Paddler OpenAI Response Format Validator Context
+
+- `paddler_openai_response_format_validator` is intended to ONLY be used in test crates, unit tests, and such
+- `paddler_openai_response_format_validator` is only intended to be used to validate Paddler's OpenAI compatibility endpoints
+- `paddler_openai_response_format_validator` must NOT be used on runtime; it must ONLY be used in tests, unit tests, integration tests
+- `paddler_openai_response_format_validator` must directly use vendored, official OpenAI schema to build its validation setup
+- `paddler_openai_response_format_validator` must make the official OpenAI schema stricture, to make sure Paddler does not introduce extra fields to the requests
+- `paddler_openai_response_format_validator` must make the official OpenAI schema stricture, to make sure Paddler does not accept unsupported fields
@@ -0,0 +1,9 @@
+---
+paths:
+  - "paddler_state_conversion/**"
+---
+
+# Paddler State Conversion Context
+
+- `paddler_state_conversion` provides traits and utilities responsible for converting `paddler_agent`'s desired state to applicable state
+- `paddler_state_conversion` provides traits and utilities responsible for converting `paddler_agent`'s applicable state to desired state
@@ -0,0 +1,12 @@
+---
+paths:
+  - "paddler_test_cluster_harness/**"
+---
+
+# Paddler Test Cluster Harness Context
+
+- `paddler_test_cluster_harness` provides common test harness to be used with `paddler_cli_tests`, and `paddler_tests`
+
+# OpenAI Compatibility Testing
+
+- To stay objective, we must not implement our own OpenAI client, instead we need to use a vetted 3rd party (preferably official OpenAI API client)
@@ -0,0 +1,9 @@
+---
+paths:
+  - "paddler_tests/**"
+---
+
+# Paddler Tests Context
+
+- `paddler_tests` contains Paddler integration tests
+- `paddler_tests` contain utilities related to the integration tests
@@ -24,7 +24,7 @@ echo "Device: $DEVICE"
 
 `$DEVICE` selects the Rust integration suite variant in Step 2. The other four suites don't take a device feature.
 
-## Step 2: run the five suites
+## Step 2: run the suites
 
 Copy this checklist and tick each item as the suite completes:
 
@@ -33,16 +33,14 @@ Copy this checklist and tick each item as the suite completes:
 - [ ] Python client
 - [ ] Rust unit
 - [ ] Rust integration
-- [ ] paddler_gui
 ```
 
 | # | Suite            | Inner command                                                                                                                     | Working dir              |
 |---|------------------|-----------------------------------------------------------------------------------------------------------------------------------|--------------------------|
 | 1 | JS client        | `make test.client.js`                                                                                                             | repo root                |
 | 2 | Python client    | NixOS: `poetry run pytest`, `ruff`, `poetry run mypy"`. Every other OS: `poetry run pytest`, `poetry run ruff`, `poetry run mypy` | `paddler_client_python/` |
-| 3 | Rust unit        | `make test.unit`                                                                                                                  | repo root                |
-| 4 | Rust integration | `make test.integration` (cpu) / `make test.integration.cuda` / `make test.integration.metal` — pick by `$DEVICE`                  | repo root                |
-| 5 | paddler_gui      | `cargo test -p paddler_gui --features web_admin_panel`                                                                            | `paddler_gui/`           |
+| 3 | Rust unit        | `TEST_DEVICE=$DEVICE make test.unit`                                                                                              | repo root                |
+| 4 | Rust integration | `TEST_DEVICE=$DEVICE make test.integration`                                                                                       | repo root                |
 
 Run them in this order. Cheap suites (1, 3, 4) surface bugs quickly; the heavy GPU-bound suites (2, 5) come last.
 

@@ -0,0 +1,40 @@
+---
+name: running-coverage
+description: Runs every test suite in the paddler workspace on the fastest available device, and produces code coverage report. Use when the user asks to run the code coverage, or to check the coverage.
+---
+
+# Running the code coverage
+
+Run every test suite in the workspace, picking the fastest compiled device backend for the host, then report the workspace code coverage. 
+
+## Step 1: detect the device
+
+Run this once at the start and echo the chosen device:
+
+```bash
+if [[ "$OSTYPE" == "darwin"* ]]; then
+  DEVICE=metal
+elif command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
+  DEVICE=cuda
+else
+  DEVICE=cpu
+fi
+echo "Device: $DEVICE"
+```
+
+`$DEVICE` selects the Rust integration suite variant in Step 2. The other four suites don't take a device feature.
+
+## Step 2: run the code coverage
+
+Copy this checklist and tick each item as the suite completes:
+
+`TEST_DEVICE=$DEVICE make test.coverage`
+
+## Step 3: rules during the run
+
+- **Serialize GPU suites.** When `$DEVICE` is `cuda` or `metal`, run test suites sequentially to avoid device contention.
+- **Per-test 30 s budget.** Flag any individual test that exceeds 30 s wall-clock. That is a real bug — production or test — not flakiness.
+
+## Step 4: report
+
+After the coverage suite finishes, sum up the results in an actionable report.
diff --git a/.github/workflows/clippy.yml b/.github/workflows/clippy.yml
@@ -24,6 +24,8 @@ jobs:
     steps:
       - name: checkout code
         uses: actions/checkout@v4
+        with:
+          submodules: true
 
       - uses: Swatinem/rust-cache@v2
 

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -24,6 +24,8 @@ jobs:
     steps:
       - name: checkout code
         uses: actions/checkout@v4
+        with:
+          submodules: true
 
       - uses: Swatinem/rust-cache@v2
 

diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "vendor/openai/openai-openapi"]
+	path = vendor/openai/openai-openapi
+	url = https://github.com/openai/openai-openapi.git