feat: Self-hosted deployment — production-grade setup aligned with #22 scalability model #23

@FrodeHus

Goal

Close the gap between the current development-only docker-compose and a production-grade self-hosted deployment that shares the same application architecture as the Azure path defined in #22. This means:

  1. Replace the in-process Channel<T> queue with RabbitMQ so self-hosted gets the same durability guarantees (persistence, dead-letter, retries) as Azure Service Bus
  2. Add an optional Grafana observability stack to docker-compose so self-hosted operators get traces, metrics, and log aggregation out of the box
  3. Automate the OpenBao bootstrap so docker compose up (or a single make setup) results in a fully operational PatchHound instance without manual unseal / token steps

Everything in this issue targets the infra/stacks/selfhosted/ path. The Azure path is covered by #22. Application code changes (streaming ingestion, IIngestionJobQueue abstraction, OpenTelemetry wiring) are shared between both issues and tracked in #22 — this issue focuses on the self-hosted infrastructure layer on top of that shared code.


Current state

Area | Current behaviour | Gap
--- | --- | ---
Message queue | In-process System.Threading.Channels.Channel<T> (SentinelAuditQueue). Ingestion dispatch has no external queue at all. | Messages lost on worker restart. No dead-letter. No retry on failure. Not equivalent to Azure Service Bus.
Observability | Zero telemetry. No OTLP export configured. | Self-hosted operators have no visibility into ingestion performance, errors, or SLA metrics.
OpenBao bootstrap | 6 manual steps: init → unseal (×3) → login → enable-kv → policy → token. Must be repeated after every volume reset. | First-run experience is brittle and undocumented at the compose level. A single missed step silently breaks all secret reads.
docker-compose profiles | No profiles. All services always start. | No way to opt in/out of observability services or run a minimal dev stack.
infra/stacks/selfhosted/ | Empty directory. | No canonical self-hosted IaC or runbook.

1. RabbitMQ as the self-hosted message broker

Why RabbitMQ over in-process Channel

The IIngestionJobQueue abstraction introduced by #22 hides the queue implementation. The self-hosted path should bind it to RabbitMQ rather than an in-process Channel so that:

  • Ingestion jobs survive a worker container restart
  • Failed jobs land in a dead-letter queue rather than being silently dropped
  • Multiple worker replicas can be run safely (competing consumers on the same queue)
  • The operational model mirrors the Azure Service Bus path — operators get the same guarantees and the same retry/DLQ behaviour

docker-compose addition

rabbitmq:
  image: rabbitmq:4-management-alpine
  environment:
    RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER:-patchhound}
    RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD:?set RABBITMQ_PASSWORD in .env}
  volumes:
    - rabbitmq_data:/var/lib/rabbitmq
  ports:
    - "5672:5672"    # AMQP
    - "15672:15672"  # Management UI
  healthcheck:
    test: ["CMD", "rabbitmq-diagnostics", "ping"]
    interval: 10s
    timeout: 5s
    retries: 12
    start_period: 20s

Both the api and worker services gain a depends_on entry for rabbitmq with condition: service_healthy, plus a new environment variable:

ConnectionStrings__RabbitMq: amqp://${RABBITMQ_USER:-patchhound}:${RABBITMQ_PASSWORD}@rabbitmq:5672/

Application wiring

A new RabbitMqIngestionJobQueue implements IIngestionJobQueue (defined in #22):

// src/PatchHound.Infrastructure/Queues/RabbitMqIngestionJobQueue.cs
public class RabbitMqIngestionJobQueue : IIngestionJobQueue
{
    // Uses RabbitMQ.Client or MassTransit backing
    // Queue: "ingestion-jobs"
    // Dead-letter exchange: "ingestion-jobs-dlx" → queue "ingestion-jobs-dead"
    // Message TTL and max-delivery-count configurable via appsettings
}

DI registration in DependencyInjection.cs — select implementation based on configuration:

if (!string.IsNullOrEmpty(config["ConnectionStrings:ServiceBus"]))
    services.AddSingleton<IIngestionJobQueue, ServiceBusIngestionJobQueue>();   // Azure
else if (!string.IsNullOrEmpty(config["ConnectionStrings:RabbitMq"]))
    services.AddSingleton<IIngestionJobQueue, RabbitMqIngestionJobQueue>();     // self-hosted
else
    services.AddSingleton<IIngestionJobQueue, InProcessIngestionJobQueue>();    // dev fallback

The SentinelAuditQueue (currently a Channel<SentinelAuditEvent>) should be evaluated separately — it is an internal in-process audit buffer and may remain as a Channel since it is not a job queue.

Queue topology

Queue | Purpose | DLX
--- | --- | ---
ingestion-jobs | Pending ingestion job messages | ingestion-jobs-dlx
ingestion-jobs-dead | Failed jobs after max retries | none (monitor manually)

Queue and exchange declarations should be idempotent on startup (declare-if-not-exists pattern).
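
The topology above can be tried by hand against a running broker before the application code exists. The following is a minimal sketch that uses the RabbitMQ management HTTP API through curl. It assumes the management port from the compose service is reachable on localhost:15672, the default vhost "/", a fanout dead-letter exchange, and a quorum queue with a delivery limit of 5; the RABBITMQ_API variable and the function names are hypothetical. None of these choices are fixed by this issue, and the application itself would make the equivalent declarations through its client library at startup.

```shell
#!/bin/sh
# Sketch only: declares the ingestion topology through the management HTTP API.
# Assumed, not taken from the issue: default vhost "/", fanout DLX, quorum queue,
# delivery limit of 5, and the RABBITMQ_API variable name.
set -eu

RABBITMQ_API="${RABBITMQ_API:-http://localhost:15672/api}"

# api_call <method> <path> <json-body>
api_call() {
    curl -fsS -u "${RABBITMQ_USER:-patchhound}:${RABBITMQ_PASSWORD}" \
        -H 'content-type: application/json' \
        -X "$1" -d "$3" "${RABBITMQ_API}$2"
}

declare_ingestion_topology() {
    # %2F is the URL-encoded default vhost "/".
    # PUT acts as declare-if-not-exists: repeating it with identical
    # properties leaves the existing exchange or queue untouched.
    api_call PUT /exchanges/%2F/ingestion-jobs-dlx \
        '{"type":"fanout","durable":true}'
    api_call PUT /queues/%2F/ingestion-jobs-dead \
        '{"durable":true}'
    api_call POST /bindings/%2F/e/ingestion-jobs-dlx/q/ingestion-jobs-dead \
        '{"routing_key":"","arguments":{}}'
    # Messages rejected or redelivered past the limit are routed to the DLX.
    api_call PUT /queues/%2F/ingestion-jobs \
        '{"durable":true,"arguments":{"x-queue-type":"quorum","x-dead-letter-exchange":"ingestion-jobs-dlx","x-delivery-limit":5}}'
}
```

Calling declare_ingestion_topology twice should leave the broker unchanged. If a queue already exists with different arguments, the broker should reject the request rather than silently change it, which is the behaviour the declare-if-not-exists pattern relies on.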

Affected files:

  • docker-compose.yml — add rabbitmq service, update api and worker dependencies
  • src/PatchHound.Infrastructure/Queues/RabbitMqIngestionJobQueue.cs — new
  • src/PatchHound.Infrastructure/DependencyInjection.cs — queue implementation selection
  • src/PatchHound.Infrastructure.csproj — add RabbitMQ.Client or MassTransit.RabbitMQ package
  • .env.example — add RABBITMQ_USER, RABBITMQ_PASSWORD

2. Grafana observability stack (optional docker-compose profile)

Profile structure

Add an observability docker-compose profile containing:

Service | Image | Purpose
--- | --- | ---
prometheus | prom/prometheus:v3 | Scrapes /metrics from API and Worker
tempo | grafana/tempo:latest | OTLP trace receiver + trace storage
loki | grafana/loki:3 | Log aggregation (JSON log sink from API/Worker)
grafana | grafana/grafana-oss:11 | Dashboards, explorer, alerting

Services are only started when the profile is active:

docker compose --profile observability up

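api and worker gain an optional environment variable that should be set only when the observability profile is used. Profiles only control which services start, so the value would have to come from the shell environment or .env rather than from the profile itself: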

Telemetry__OtlpEndpoint: http://tempo:4317

The OpenTelemetry SDK (wired in #22) will check for this endpoint at startup. If absent, telemetry is a no-op.

Provisioning

Grafana should start pre-configured with:

  • Prometheus, Tempo, and Loki as provisioned data sources (deploy/grafana/provisioning/datasources/)
  • A PatchHound dashboard provisioned from JSON (deploy/grafana/provisioning/dashboards/patchhound.json) covering:
    • Ingestion jobs enqueued / processed / dead-lettered per hour
    • Worker lease acquisition success/failure
    • p99 DB query latency
    • Active workflow and SLA breach counts

Log shipping

API and Worker emit structured JSON logs to stdout (default .NET behaviour). Loki collects these via the Docker log driver or a Promtail sidecar:

promtail:
  image: grafana/promtail:3
  profiles: [observability]
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - /var/run/docker.sock:/var/run/docker.sock
    - ./deploy/promtail/config.yml:/etc/promtail/config.yml:ro

New files:

  • deploy/grafana/provisioning/datasources/datasources.yml
  • deploy/grafana/provisioning/dashboards/dashboards.yml
  • deploy/grafana/provisioning/dashboards/patchhound.json
  • deploy/promtail/config.yml
  • deploy/prometheus/prometheus.yml
  • deploy/tempo/tempo.yml

Affected files:

  • docker-compose.yml — add observability profile services
  • .env.example — add GF_SECURITY_ADMIN_PASSWORD

3. Automated OpenBao bootstrap

Problem

The current setup requires six manual steps before PatchHound can read any secrets. A first-time operator who follows docker compose up without reading deploy/openbao/README.md will see silent secret read failures with no indication of why.

Solution — init container + auto-unseal via transit seal (or file-based auto-unseal for dev)

Two modes:

Mode A — Dev / self-hosted single-node (default): Use OpenBao's file-based auto-unseal by configuring a static unseal key in the openbao.hcl via an environment variable. This is not suitable for production secret management but eliminates the manual unseal step for self-hosted deployments where the host machine is the trust boundary.

# deploy/openbao/config/openbao-dev.hcl
seal "shamir" {} # default, but override with auto-unseal config

# Alternative: use transit seal with a local key — no KMS dependency

A simpler approach: run OpenBao in dev mode (-dev flag) with VAULT_DEV_ROOT_TOKEN_ID set, which starts already initialised and unsealed. Dev mode keeps all data in memory, so every secret is lost when the container restarts; this is appropriate only for single-host self-hosted deployments where secrets can be re-seeded after a restart.

Mode B — Production self-hosted: Keep the current manual init/unseal flow (with the existing Makefile) but document it clearly as the production path.

Automated bootstrap init container:

Add an openbao-init one-shot service that runs after OpenBao is healthy, checks whether it is already initialised, and if not:

  1. Initialises (bao operator init)
  2. Unseals with the generated keys
  3. Creates the KV mount
  4. Writes the policy
  5. Creates the application token
  6. Writes the token to a shared Docker volume so the API and Worker containers can read it

openbao-init:
  image: openbao/openbao
  depends_on:
    openbao:
      condition: service_healthy
  volumes:
    - openbao_init:/init-output
    - ./deploy/openbao/scripts/bootstrap.sh:/bootstrap.sh:ro
  entrypoint: ["/bin/sh", "/bootstrap.sh"]
  environment:
    BAO_ADDR: http://openbao:8200
    KV_MOUNT: ${OPENBAO_KV_MOUNT:-patchhound}

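The api and worker services read OPENBAO_TOKEN from the shared volume via an entrypoint wrapper. Both would also need the openbao_init volume mounted at /init-output and a depends_on entry for openbao-init with condition: service_completed_successfully, so that they start only after the token file exists: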

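# entrypoint.sh
set -eu
# Assign before export so a missing token file aborts startup instead of exporting an empty value.
OPENBAO_TOKEN="$(cat /init-output/app-token.txt)"
export OPENBAO_TOKEN
exec "$@"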

Alternatively: extend the existing Makefile all target to be the canonical first-run command and document it as the single entry point (make -C deploy/openbao setup).

deploy/openbao/scripts/bootstrap.sh — idempotent init + unseal + provision script (extracted from the Makefile targets and made container-friendly).
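
A rough sketch of what bootstrap.sh could look like, following the six steps listed above. It is not extracted from the existing Makefile. The assumptions are: a single unseal key share (acceptable only for the dev/single-node mode), the init output stored in a hypothetical init.txt on the shared volume, a KV version 2 mount, a hypothetical policy named patchhound-app that grants read-only access to the mount, and the POLICY_NAME and OUT_DIR variable names. It relies on the CLI's default text output ("Unseal Key 1: ..." and "Initial Root Token: ...") so that no JSON tooling is needed inside the container.

```shell
#!/bin/sh
# Sketch of deploy/openbao/scripts/bootstrap.sh. Assumptions are marked ASSUMED.
set -eu

bootstrap_openbao() {
    out_dir="${OUT_DIR:-/init-output}"        # shared volume from the compose snippet
    kv_mount="${KV_MOUNT:-patchhound}"
    policy="${POLICY_NAME:-patchhound-app}"   # ASSUMED policy name
    init_file="$out_dir/init.txt"             # ASSUMED: holds unseal key + root token
    token_file="$out_dir/app-token.txt"
    umask 077                                 # generated secrets are owner-only

    # Step 1: initialise once. "operator init -status" exits 0 when
    # initialised and 2 when not; anything else is treated as an error.
    rc=0
    bao operator init -status >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 2 ]; then
        bao operator init -key-shares=1 -key-threshold=1 > "$init_file"
    elif [ "$rc" -ne 0 ]; then
        echo "cannot query OpenBao at ${BAO_ADDR:-unset}" >&2
        return 1
    fi

    # Step 2: unseal if needed. "bao status" exits 2 while sealed.
    rc=0
    bao status >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 2 ]; then
        unseal_key=$(sed -n 's/^Unseal Key 1: //p' "$init_file")
        bao operator unseal "$unseal_key" >/dev/null
    fi

    # Steps 3-6: provision until an application token has been written.
    if [ ! -s "$token_file" ]; then
        BAO_TOKEN=$(sed -n 's/^Initial Root Token: //p' "$init_file")
        export BAO_TOKEN
        bao secrets list | grep -q "^${kv_mount}/" \
            || bao secrets enable -path="$kv_mount" kv-v2   # ASSUMED: KV v2
        # ASSUMED policy body: read-only on every secret under the mount.
        printf 'path "%s/data/*" {\n  capabilities = ["read"]\n}\n' "$kv_mount" \
            | bao policy write "$policy" - >/dev/null
        bao token create -policy="$policy" -field=token > "$token_file"
    fi
}
```

The script would end with a call to bootstrap_openbao. Re-running it against an already bootstrapped instance skips every step, and running it after a restart only unseals. Two caveats for this sketch: it leaves the root token and unseal key on the shared volume, which fits the "host is the trust boundary" mode only, and the application token inherits the default token TTL, so a long-running deployment would need renewal or a periodic token (for example via the -period flag of bao token create).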

New/affected files:

  • deploy/openbao/config/openbao-dev.hcl — dev mode config (pre-init, auto-unseal)
  • deploy/openbao/scripts/bootstrap.sh — idempotent bootstrap script
  • docker-compose.yml — openbao-init one-shot service, entrypoint changes for api/worker
  • deploy/openbao/README.md — update to describe both dev and production paths

4. infra/stacks/selfhosted/ documentation

Fill the currently empty directory with:

  • README.md — canonical self-hosted runbook: prerequisites (Docker ≥ 27, compose ≥ 2.24), first-run steps, profile options, upgrade path, backup guidance (pg_dump, OpenBao snapshot, RabbitMQ definitions export)
  • docker-compose.override.yml.example — template for local overrides (custom ports, volume paths, SSL termination)

The infra/stacks/selfhosted/ directory should be the single reference for anyone deploying PatchHound on their own infrastructure.


Acceptance criteria

RabbitMQ

  • docker compose up starts a healthy RabbitMQ container alongside all existing services
  • Ingestion jobs are enqueued to RabbitMQ when their cron schedule is due (once the #22 ingestion work is complete)
  • A worker container restart does not lose queued ingestion jobs
  • Failed ingestion jobs (after N retries) appear in the ingestion-jobs-dead queue and do not block other jobs
  • The RabbitMQ management UI is accessible at http://localhost:15672
  • InProcessIngestionJobQueue remains as a dev fallback when no ConnectionStrings:RabbitMq is configured

Observability

  • docker compose --profile observability up starts Prometheus, Tempo, Loki, Promtail, and Grafana
  • Grafana is accessible at http://localhost:3100 (or configured port) with pre-provisioned data sources
  • The PatchHound dashboard shows ingestion job throughput, p99 DB latency, and active workflow counts after at least one ingestion run
  • When the observability profile is not active, the API and Worker start normally with no telemetry errors
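
The "accessible at" criteria in the RabbitMQ and Observability lists could be checked with a small polling helper like the sketch below. The function name wait_for_http and the SLEEP_SECONDS variable are hypothetical; the only tool assumed is curl on the host.

```shell
#!/bin/sh
# Sketch of a smoke-check helper for the "accessible at" acceptance criteria.
set -eu

# wait_for_http <url> [attempts]
# Polls the URL until it answers without an HTTP error, or gives up.
wait_for_http() {
    url="$1"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP status codes >= 400.
        if curl -fsS -o /dev/null "$url"; then
            echo "ok: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "${SLEEP_SECONDS:-2}"
    done
    echo "not reachable: $url" >&2
    return 1
}
```

For example, after docker compose --profile observability up -d, running wait_for_http http://localhost:15672 followed by wait_for_http http://localhost:3100 would cover the management UI and Grafana URLs named in these criteria.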

OpenBao bootstrap

  • docker compose up on a fresh volume initialises, unseals, and configures OpenBao automatically — no manual steps required
  • The application token is available to API and Worker containers without being hardcoded in docker-compose.yml
  • A volume reset followed by docker compose up re-bootstraps correctly
  • The existing Makefile production flow (make -C deploy/openbao all) remains functional for operators who want explicit control

Documentation

  • infra/stacks/selfhosted/README.md covers first-run, profiles, upgrade, and backup
  • deploy/openbao/README.md is updated to reflect the automated bootstrap path and notes when manual setup is preferred

Dependencies

Related
