Toygres is a Rust-based control plane for hosting PostgreSQL containers as a service on Azure Kubernetes Service (AKS). It uses the Duroxide framework for durable workflow orchestration and PostgreSQL for metadata storage.
The project is organized as a Cargo workspace with the following crates:
- `toygres-models`: Shared data structures (instance metadata, deployment config, health status)
- `toygres-activities`: Duroxide activities wrapping Azure/K8s operations
- `toygres-orchestrations`: Duroxide orchestrations coordinating the activities
- `toygres-server`: Main control plane server exposing APIs and running the Duroxide worker
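The four crates above suggest a top-level manifest along these lines (a sketch; the `resolver` setting is an assumption, not taken from the actual repo):

```toml
[workspace]
members = [
    "toygres-models",
    "toygres-activities",
    "toygres-orchestrations",
    "toygres-server",
]
resolver = "2"
```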
- Azure Kubernetes Service (AKS) cluster already provisioned
- PostgreSQL database for metadata storage
- Azure credentials configured (via environment variables or Azure CLI)
Create the following scripts to help with infrastructure setup:
- `scripts/setup-infra.sh`: Terraform/Azure CLI script to provision AKS cluster, networking, storage classes
- `scripts/db-init.sh`: Applies the initial CMS migration and prepares the Duroxide schema
- `scripts/db-migrate.sh`: Runs incremental CMS migrations (none yet, but keeps the pattern consistent with `duroxide-pg`)
The control plane uses environment variables for configuration (see .env.example):
- `DATABASE_URL`: Connection string for metadata PostgreSQL database
- `AKS_CLUSTER_NAME`: Name of the AKS cluster
- `AKS_RESOURCE_GROUP`: Azure resource group containing the AKS cluster
- `AKS_NAMESPACE`: Kubernetes namespace for PostgreSQL deployments (default: `toygres`)
- Azure authentication via `DefaultAzureCredential`
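Config loading can be kept testable by parameterizing the environment lookup. A minimal sketch (struct and function names are illustrative, not from the Toygres codebase) that applies the documented `toygres` default for `AKS_NAMESPACE`:

```rust
use std::collections::HashMap;

/// Illustrative config struct; fields mirror the documented variables.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub database_url: String,
    pub aks_cluster_name: String,
    pub aks_resource_group: String,
    pub aks_namespace: String,
}

/// Build a Config from any key -> value lookup (std::env::var in
/// production, a HashMap in tests). Missing required keys are errors;
/// AKS_NAMESPACE falls back to the documented default "toygres".
pub fn config_from(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let req = |key: &str| get(key).ok_or_else(|| format!("missing required env var {key}"));
    Ok(Config {
        database_url: req("DATABASE_URL")?,
        aks_cluster_name: req("AKS_CLUSTER_NAME")?,
        aks_resource_group: req("AKS_RESOURCE_GROUP")?,
        aks_namespace: get("AKS_NAMESPACE").unwrap_or_else(|| "toygres".to_string()),
    })
}

fn main() {
    let env: HashMap<&str, &str> = HashMap::from([
        ("DATABASE_URL", "postgres://meta@localhost/toygres"),
        ("AKS_CLUSTER_NAME", "toygres-aks"),
        ("AKS_RESOURCE_GROUP", "toygres-rg"),
        // AKS_NAMESPACE deliberately absent: the default should apply.
    ]);
    let cfg = config_from(|k| env.get(k).map(|v| v.to_string())).expect("config");
    println!("namespace = {}", cfg.aks_namespace);
}
```

In the real server the closure would wrap `std::env::var(key).ok()` after `dotenvy` has loaded `.env`.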
Implement Duroxide activities for atomic operations:
- `DeployPostgresActivity`: Creates K8s resources (StatefulSet, PVC, Service) for a PostgreSQL pod
- `DeletePostgresActivity`: Removes K8s resources for a PostgreSQL instance
- `GetInstanceStatusActivity`: Queries K8s API for pod status
- `HealthCheckActivity`: Connects to PostgreSQL instance and verifies it's responsive
- `UpdateMetadataActivity`: Updates instance state in metadata database
- `GenerateConnectionStringActivity`: Builds connection string from K8s service endpoint
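The core of `GenerateConnectionStringActivity` is plain string assembly once the Service endpoint is known. A sketch (the function name and parameters are assumptions, not the activity's real signature):

```rust
/// Build a libpq-style connection URL from the resolved Service endpoint.
/// Real code should percent-encode user/password; omitted for brevity.
pub fn build_connection_string(user: &str, password: &str, host: &str, port: u16, db: &str) -> String {
    format!("postgresql://{user}:{password}@{host}:{port}/{db}")
}

fn main() {
    // Hypothetical in-cluster Service DNS name for an instance named "demo".
    let cs = build_connection_string("app", "s3cret", "demo.toygres.svc.cluster.local", 5432, "postgres");
    println!("{cs}");
}
```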
Technologies:
- `kube-rs` for Kubernetes operations
- `sqlx` for database operations
- Azure SDK for Rust for Azure-specific operations (if needed)
Implement durable orchestrations using the Duroxide framework:
Purpose: Create a new PostgreSQL instance
Flow:
- Call `DeployPostgresActivity` with name, credentials
- Poll `GetInstanceStatusActivity` until ready
- Call `GenerateConnectionStringActivity`
- Call `UpdateMetadataActivity` with "running" state
- Start detached `HealthCheckOrchestration` for this instance
- Store health check orchestration ID in metadata
- Return connection string
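The "poll until ready" step reduces to pure control flow if the status call is injected as a closure. A sketch (names are assumptions; in the real orchestration each retry would wait on a durable Duroxide timer rather than loop hot):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PodStatus { Pending, Ready, Failed }

/// Poll the status closure until Ready, failing fast on Failed and
/// giving up after max_attempts.
pub fn poll_until_ready(mut check: impl FnMut() -> PodStatus, max_attempts: u32) -> Result<(), String> {
    for attempt in 1..=max_attempts {
        match check() {
            PodStatus::Ready => return Ok(()),
            PodStatus::Failed => return Err(format!("pod failed on attempt {attempt}")),
            PodStatus::Pending => { /* durable timer wait would go here */ }
        }
    }
    Err(format!("pod not ready after {max_attempts} attempts"))
}

fn main() {
    // Simulate a pod that becomes Ready on the third status query.
    let mut calls = 0;
    let result = poll_until_ready(
        || {
            calls += 1;
            if calls < 3 { PodStatus::Pending } else { PodStatus::Ready }
        },
        10,
    );
    println!("{result:?} after {calls} calls");
}
```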
Input: `DeploymentConfig` (name, username, password, storage size, version)
Output: `CreateInstanceResponse` (instance_id, connection_string, orchestration_id)
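The input/output types might look roughly like this in `toygres-models` (field names follow the descriptions above, but the exact types, units, and serde derives are assumptions):

```rust
/// Illustrative sketch of the orchestration's input type.
#[derive(Debug, Clone, PartialEq)]
pub struct DeploymentConfig {
    pub name: String,
    pub username: String,
    pub password: String,
    pub storage_size_gb: u32, // "storage size"; the GB unit is an assumption
    pub version: String,      // PostgreSQL major version, e.g. "16"
}

/// Illustrative sketch of the orchestration's output type.
#[derive(Debug, Clone, PartialEq)]
pub struct CreateInstanceResponse {
    pub instance_id: String, // UUID in the real schema
    pub connection_string: String,
    pub orchestration_id: String,
}

fn main() {
    let cfg = DeploymentConfig {
        name: "demo".into(),
        username: "app".into(),
        password: "s3cret".into(),
        storage_size_gb: 10,
        version: "16".into(),
    };
    println!("{cfg:?}");
}
```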
Purpose: Delete an existing PostgreSQL instance
Flow:
- Call `UpdateMetadataActivity` with "deleting" state
- Retrieve the health check orchestration ID from metadata and cancel that orchestration
- Call `DeletePostgresActivity`
- Call `UpdateMetadataActivity` with "deleted" state
Input: Instance ID
Output: Success/failure status
Purpose: Continuous health monitoring for a single PostgreSQL instance
Flow:
- Input: `instance_id`
- Loop forever:
  - Call `HealthCheckActivity` for the instance
  - Call `UpdateMetadataActivity` with health status
  - Wait 30 seconds
Lifecycle: Started by `CreateInstanceOrchestration`, cancelled by `DeleteInstanceOrchestration`
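The loop body is simple enough to sketch as pure logic. Here the closures stand in for `HealthCheckActivity`, `UpdateMetadataActivity`, and the 30-second durable timer, and a bounded `iterations` replaces "loop forever" so the sketch terminates (all names are assumptions):

```rust
/// Run the health-check loop body `iterations` times. In the real
/// orchestration, `wait_30s` would be a durable Duroxide timer rather
/// than a thread sleep, so the loop survives worker restarts.
pub fn run_health_loop(
    mut check: impl FnMut() -> bool,
    mut record: impl FnMut(bool),
    mut wait_30s: impl FnMut(),
    iterations: u32,
) {
    for _ in 0..iterations {
        let healthy = check(); // HealthCheckActivity
        record(healthy);       // UpdateMetadataActivity
        wait_30s();            // durable 30s timer
    }
}

fn main() {
    let mut history = Vec::new();
    let mut waits = 0;
    run_health_loop(|| true, |h| history.push(h), || waits += 1, 3);
    println!("recorded {history:?}, waited {waits} times");
}
```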
Purpose: Query the status of any orchestration
Flow:
- Query orchestration status from Duroxide
- Return current state (pending, running, completed, failed)
Input: Orchestration ID
Output: OperationStatus
Use the Duroxide cross-crate registry pattern to register orchestrations and activities across the workspace.
Build the main server binary with the following components:
- Load `.env` file using `dotenvy`
- Parse database connection string
- Validate Azure and AKS configuration
- Initialize `sqlx` PostgreSQL pool for metadata DB
- Run migrations on startup
- Initialize `duroxide-pg` worker connecting to metadata DB
- Register all orchestrations and activities
- Start worker loop
Expose the following endpoints using axum:
- `POST /instances` → Start `CreateInstanceOrchestration`
  - Body: `CreateInstanceRequest`
  - Response: `CreateInstanceResponse`
- `DELETE /instances/{id}` → Start `DeleteInstanceOrchestration`
  - Response: Operation status
- `GET /instances` → List all from metadata DB
  - Response: `ListInstancesResponse`
- `GET /instances/{id}` → Get single instance details
  - Response: `InstanceMetadata`
- `GET /operations/{id}` → Monitor operation status
  - Response: `OperationStatus`
- `GET /health` → Health check endpoint for the control plane itself
Background service that ensures all running instances have active health check orchestrations. On startup:
- Query metadata DB for all instances in "running" state
- Verify they have health check orchestration IDs
- Start new health check orchestrations if missing
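The decision logic of this reconciler is a pure filter over the instances table, which keeps it easy to unit test separately from the DB and Duroxide calls. A sketch (the row shape is illustrative):

```rust
/// Minimal projection of an instances-table row (illustrative, not the
/// real toygres-models type).
#[derive(Debug, Clone)]
pub struct InstanceRow {
    pub id: String,
    pub running: bool,
    pub health_check_orchestration_id: Option<String>,
}

/// Pure core of the startup reconciler: the running instances that are
/// missing a health-check orchestration and need one started.
pub fn needs_health_check(rows: &[InstanceRow]) -> Vec<String> {
    rows.iter()
        .filter(|r| r.running && r.health_check_orchestration_id.is_none())
        .map(|r| r.id.clone())
        .collect()
}

fn main() {
    let rows = vec![
        InstanceRow { id: "a".into(), running: true, health_check_orchestration_id: None },
        InstanceRow { id: "b".into(), running: true, health_check_orchestration_id: Some("hc-b".into()) },
        InstanceRow { id: "c".into(), running: false, health_check_orchestration_id: None },
    ];
    println!("{:?}", needs_health_check(&rows));
}
```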
Key Rust crates to include:
- `duroxide` - Core durable workflow framework
- `duroxide-pg` - PostgreSQL provider for Duroxide
- `kube` - Kubernetes client
- `k8s-openapi` - Kubernetes API types
- `azure_core` - Azure SDK core
- `azure_identity` - Azure authentication
- `sqlx` - Async SQL with compile-time query checking
  - Features: `runtime-tokio`, `postgres`, `macros`, `migrate`
- `axum` - HTTP server framework
- `tower` - Middleware
- `tower-http` - HTTP middleware (tracing, CORS)
- `tokio` - Async runtime
- `serde`/`serde_json` - Serialization
- `dotenvy` - Environment variable loading
- `tracing`/`tracing-subscriber` - Logging
- `anyhow`/`thiserror` - Error handling
- `chrono` - Date/time
- `uuid` - Unique identifiers
CREATE TYPE instance_state AS ENUM ('creating', 'running', 'deleting', 'deleted', 'failed');
CREATE TYPE health_status AS ENUM ('healthy', 'unhealthy', 'unknown');
CREATE TABLE instances (
id UUID PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
state instance_state NOT NULL,
health_status health_status NOT NULL DEFAULT 'unknown',
connection_string TEXT,
health_check_orchestration_id TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_instances_state ON instances(state);
CREATE INDEX idx_instances_health_status ON instances(health_status);
- Test individual activities with mocked K8s/Azure clients
- Test data models and serialization
- Test configuration loading
- Test orchestrations using Duroxide in-memory provider
- Test activity coordination and error handling
- Test API endpoints with test server
- Test against local Kubernetes cluster (kind/minikube)
- Test full instance lifecycle (create → health checks → delete)
- Test failure scenarios and recovery
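One natural target for the unit tests above is the lifecycle implied by the `instance_state` enum in the schema. The schema itself doesn't enforce transitions, so the set below is an assumption about the intended state machine:

```rust
/// Mirror of the instance_state SQL enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InstanceState { Creating, Running, Deleting, Deleted, Failed }

/// Legal transitions implied by the create/delete/health-check flows
/// described in this document (assumed, not schema-enforced).
pub fn transition_allowed(from: InstanceState, to: InstanceState) -> bool {
    use InstanceState::*;
    matches!(
        (from, to),
        (Creating, Running)
            | (Creating, Failed)
            | (Running, Deleting)
            | (Running, Failed)
            | (Deleting, Deleted)
            | (Deleting, Failed)
    )
}

fn main() {
    println!("{}", transition_allowed(InstanceState::Creating, InstanceState::Running));
}
```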
Philosophy: Build working code first, then add Duroxide complexity. Validate each layer before adding the next.
Goal: Get basic Kubernetes/PostgreSQL deployment working without Duroxide
Tasks:
- ✅ Initialize Cargo workspace with four crates
- ✅ Define data models in `toygres-models`
- ✅ Create database schema and migration scripts
- ✅ Create infrastructure bootstrap scripts
- Create `toygres-server/examples/manual_deploy.rs` that:
  - Connects to AKS cluster using kube-rs
  - Creates PostgreSQL StatefulSet, PVC, and Service
  - Waits for pod to be ready
  - Extracts connection string from Service
  - Tests connection to PostgreSQL
  - Cleans up resources
Success Criteria: Can deploy and connect to a PostgreSQL instance in AKS using a simple Rust binary.
Why first? Proves core functionality works without framework complexity. Fast iteration on K8s configurations.
Goal: Refactor POC into reusable, testable functions
Tasks:
- Create `toygres-server/src/k8s.rs` module:
  - `create_postgres_resources(config) -> Result<()>`
  - `delete_postgres_resources(name) -> Result<()>`
  - `get_pod_status(name) -> Result<PodStatus>`
  - `get_service_endpoint(name) -> Result<String>`
- Create `toygres-server/src/postgres.rs` module:
  - `test_connection(connection_string) -> Result<bool>`
  - `check_health(connection_string) -> Result<HealthStatus>`
- Write unit tests with mocked K8s clients
- Write integration tests against a real cluster
Success Criteria: Clean, testable modules that can be called independently.
Why next? Separates concerns, enables testing, and creates reusable code for activities.
Goal: Store instance state before adding workflows
Tasks:
- Run `scripts/db-init.sh` (and `scripts/db-migrate.sh`) to create schema
- Create `toygres-server/src/db.rs` module:
  - `insert_instance(metadata) -> Result<Uuid>`
  - `update_instance_state(id, state) -> Result<()>`
  - `update_health_status(id, status) -> Result<()>`
  - `get_instance(id) -> Result<InstanceMetadata>`
  - `list_instances() -> Result<Vec<InstanceMetadata>>`
- Create test binary that:
  - Creates instance in K8s
  - Stores metadata in database
  - Queries and updates state
  - Cleans up
Success Criteria: Can track instance lifecycle in database while managing K8s resources.
Why next? Database logic separated from workflow logic. Foundation for Duroxide state tracking.
Goal: Working API for basic operations
Tasks:
- Implement synchronous API endpoints in `toygres-server/src/api.rs`:
  - `POST /instances` - Blocks until instance ready, returns connection string
  - `DELETE /instances/{id}` - Blocks until deletion complete
  - `GET /instances` - Lists all from database
  - `GET /instances/{id}` - Gets instance details
- Wire up modules: API → K8s module → Database module
- Add full error handling and logging
- Manual testing with curl
Success Criteria: Can create/delete instances via REST API. Everything stored in database.
Why next? Validates entire flow works end-to-end. Understand timing/latency requirements.
Goal: Convert modules to Duroxide activities
Tasks:
- Implement activities one by one in `toygres-activities/`:
  - `DeployPostgresActivity` (wraps `k8s::create_postgres_resources`)
  - `GetInstanceStatusActivity` (wraps `k8s::get_pod_status`)
  - `DeletePostgresActivity` (wraps `k8s::delete_postgres_resources`)
  - `HealthCheckActivity` (wraps `postgres::check_health`)
  - `UpdateMetadataActivity` (wraps `db::update_*`)
  - `GenerateConnectionStringActivity` (wraps `k8s::get_service_endpoint`)
- Test each activity independently
- Verify serialization/deserialization
- Confirm error handling and retries
Success Criteria: All activities work independently with Duroxide.
Why next? We know underlying code works. Just adding Duroxide wrapper. Can test in isolation.
Goal: Build durable workflows
Tasks:
- Implement `CreateInstanceOrchestration` in `toygres-orchestrations/`:
  - Call activities in sequence
  - Poll for readiness
  - Return connection string
  - DON'T start health check yet (that's Phase 7)
- Test with Duroxide in-memory provider
- Verify orchestration completes, test retry behavior
- Implement `DeleteInstanceOrchestration`:
  - Update state, call delete activity
  - DON'T worry about canceling health checks yet
Success Criteria: Can create/delete instances using durable workflows.
Why next? Start simple with linear workflows. Learn Duroxide patterns. Validate durability/retry.
Goal: Make API asynchronous with durable workflows
Tasks:
- Initialize Duroxide worker in `toygres-server/src/worker.rs`:
  - Connect to duroxide-pg
  - Register activities and orchestrations
  - Start worker loop
- Update API to start orchestrations:
  - `POST /instances` → Start `CreateInstanceOrchestration`, return orchestration ID
  - `GET /operations/{id}` → Query orchestration status
  - Keep synchronous endpoints for comparison/testing
- Test async operations and resumption after worker restart
Success Criteria: API starts durable workflows. Can query status. Workflows survive restarts.
Why next? Everything else working. Just changing API semantics. Can compare with sync version.
Goal: Add continuous monitoring
Tasks:
- Implement `HealthCheckOrchestration` in `toygres-orchestrations/`:
  - Infinite loop with Duroxide timer
  - Call `HealthCheckActivity`
  - Update database with health status
- Update `CreateInstanceOrchestration`:
  - Start detached health check orchestration
  - Store orchestration ID in metadata
- Update `DeleteInstanceOrchestration`:
  - Retrieve the health check orchestration ID and cancel that orchestration
  - Then proceed with deletion
- Test full lifecycle:
  - Create instance → Health checks start
  - Monitor database updates every 30s
  - Delete instance → Health checks stop
Success Criteria: Continuous health monitoring with automatic start/stop on create/delete.
Why last? Most complex feature. Depends on everything else. Involves orchestration cancellation.
Goal: Make it production-grade
Tasks:
- Comprehensive error handling and recovery
- Add metrics and monitoring (Prometheus?)
- Security hardening (RBAC, secrets management)
- Performance optimization
- Complete documentation
- Deployment guides and Helm charts
- End-to-end tests against real AKS cluster
Success Criteria: Production-ready control plane with monitoring, docs, and deployment automation.
- ✅ Phase 0: Scaffolding complete (workspace, models, scripts, docs)
- 🔄 Phase 0: Need to implement `manual_deploy.rs` POC
- ⏳ Phases 1-8: Not started
- Implement `toygres-server/examples/manual_deploy.rs`
- Test against real AKS cluster
- Document learnings and K8s resource configurations
- Move to Phase 1 when POC works reliably