Goal
Close the gap between the current development-only docker-compose and a production-grade self-hosted deployment that shares the same application architecture as the Azure path defined in #22. This means:
- Replace the in-process Channel<T> queue with RabbitMQ so self-hosted gets the same durability guarantees (persistence, dead-letter, retries) as Azure Service Bus
- Add an optional Grafana observability stack to docker-compose so self-hosted operators get traces, metrics, and log aggregation out of the box
- Automate the OpenBao bootstrap so docker compose up (or a single make setup) results in a fully operational PatchHound instance without manual unseal / token steps
Everything in this issue targets the infra/stacks/selfhosted/ path. The Azure path is covered by #22. Application code changes (streaming ingestion, IIngestionJobQueue abstraction, OpenTelemetry wiring) are shared between both issues and tracked in #22 — this issue focuses on the self-hosted infrastructure layer on top of that shared code.
Current state
| Area | Current behaviour | Gap |
|---|---|---|
| Message queue | In-process System.Threading.Channels.Channel<T> (SentinelAuditQueue). Ingestion dispatch has no external queue at all. | Messages lost on worker restart. No dead-letter. No retry on failure. Not equivalent to Azure Service Bus. |
| Observability | Zero telemetry. No OTLP export configured. | Self-hosted operators have no visibility into ingestion performance, errors, or SLA metrics. |
| OpenBao bootstrap | 6 manual steps: init → unseal (×3) → login → enable-kv → policy → token. Must be repeated after every volume reset. | First-run experience is brittle and undocumented at the compose level. A single missed step silently breaks all secret reads. |
| docker-compose profiles | No profiles. All services always start. | No way to opt in/out of observability services or run a minimal dev stack. |
| infra/stacks/selfhosted/ | Empty directory. | No canonical self-hosted IaC or runbook. |
1. RabbitMQ as the self-hosted message broker
Why RabbitMQ over in-process Channel
The IIngestionJobQueue abstraction introduced by #22 hides the queue implementation. The self-hosted path should bind it to RabbitMQ rather than an in-process Channel so that:
- Ingestion jobs survive a worker container restart
- Failed jobs land in a dead-letter queue rather than being silently dropped
- Multiple worker replicas can be run safely (competing consumers on the same queue)
- The operational model mirrors the Azure Service Bus path — operators get the same guarantees and the same retry/DLQ behaviour
docker-compose addition
    rabbitmq:
      image: rabbitmq:4-management-alpine
      environment:
        RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER:-patchhound}
        RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
      volumes:
        - rabbitmq_data:/var/lib/rabbitmq
      ports:
        - "5672:5672"   # AMQP
        - "15672:15672" # Management UI
      healthcheck:
        test: ["CMD", "rabbitmq-diagnostics", "ping"]
        interval: 10s
        timeout: 5s
        retries: 12
        start_period: 20s
Both api and worker services gain a dependency on rabbitmq: condition: service_healthy and a new environment variable:
ConnectionStrings__RabbitMq: amqp://${RABBITMQ_USER:-patchhound}:${RABBITMQ_PASSWORD}@rabbitmq:5672/
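RABBITMQ_PASSWORD has no default, so when it is unset Compose substitutes an empty string (with a warning) into both the broker settings and the connection string. A minimal pre-flight sketch, assuming a POSIX shell wrapper is acceptable (the function name amqp_url is hypothetical and not part of the issue), fails fast instead:

```shell
#!/bin/sh
# Hypothetical pre-flight helper: print the AMQP URI the api/worker services will use.
amqp_url() {
  # Abort with a clear message when the password is unset or empty.
  : "${RABBITMQ_PASSWORD:?RABBITMQ_PASSWORD must be set (see .env.example)}"
  # Same default user as the compose snippet above.
  printf 'amqp://%s:%s@rabbitmq:5672/\n' "${RABBITMQ_USER:-patchhound}" "$RABBITMQ_PASSWORD"
}
```

Compose interpolation accepts the same ${RABBITMQ_PASSWORD:?message} form, which would give equivalent fail-fast behaviour directly in docker-compose.yml. Passwords containing characters such as @ or / would need URL-encoding before being placed in the URI.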
Application wiring
A new RabbitMqIngestionJobQueue implements IIngestionJobQueue (defined in #22):
    // src/PatchHound.Infrastructure/Queues/RabbitMqIngestionJobQueue.cs
    public class RabbitMqIngestionJobQueue : IIngestionJobQueue
    {
        // Uses RabbitMQ.Client or MassTransit backing
        // Queue: "ingestion-jobs"
        // Dead-letter exchange: "ingestion-jobs-dlx" → queue "ingestion-jobs-dead"
        // Message TTL and max-delivery-count configurable via appsettings
    }
DI registration in DependencyInjection.cs — select implementation based on configuration:
    if (!string.IsNullOrEmpty(config["ConnectionStrings:ServiceBus"]))
        services.AddSingleton<IIngestionJobQueue, ServiceBusIngestionJobQueue>(); // Azure
    else if (!string.IsNullOrEmpty(config["ConnectionStrings:RabbitMq"]))
        services.AddSingleton<IIngestionJobQueue, RabbitMqIngestionJobQueue>(); // self-hosted
    else
        services.AddSingleton<IIngestionJobQueue, InProcessIngestionJobQueue>(); // dev fallback
The SentinelAuditQueue (currently a Channel<SentinelAuditEvent>) should be evaluated separately — it is an internal in-process audit buffer and may remain as a Channel since it is not a job queue.
Queue topology
| Queue | Purpose | DLX |
|---|---|---|
| ingestion-jobs | Pending ingestion job messages | ingestion-jobs-dlx |
| ingestion-jobs-dead | Failed jobs after max retries | none (monitor manually) |
Queue and exchange declarations should be idempotent on startup (declare-if-not-exists pattern).
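The topology above could be smoke-tested from the host through the RabbitMQ management HTTP API, where a PUT that repeats an identical declaration succeeds, so the script is safe to re-run. This is only a sketch under assumptions: the fanout exchange type, the default vhost, and the helper names api_call and declare_topology are illustrative, and the application itself would declare the same objects from RabbitMqIngestionJobQueue at startup.

```shell
#!/bin/sh
# Hypothetical smoke-test helper; not the application's own declaration code.
RABBITMQ_API="${RABBITMQ_API:-http://localhost:15672/api}"
RABBITMQ_USER="${RABBITMQ_USER:-patchhound}"

api_call() {  # usage: api_call METHOD PATH JSON_BODY
  curl -fsS -u "$RABBITMQ_USER:$RABBITMQ_PASSWORD" \
    -H 'content-type: application/json' \
    -X "$1" "$RABBITMQ_API$2" -d "$3"
}

declare_topology() {
  # %2F is the URL-encoded default vhost "/".
  # Dead-letter exchange (fanout is an assumption) and its queue.
  api_call PUT /exchanges/%2F/ingestion-jobs-dlx '{"type":"fanout","durable":true}'
  api_call PUT /queues/%2F/ingestion-jobs-dead '{"durable":true}'
  api_call POST /bindings/%2F/e/ingestion-jobs-dlx/q/ingestion-jobs-dead \
    '{"routing_key":"","arguments":{}}'
  # Main queue: rejected or expired messages are routed to the DLX.
  api_call PUT /queues/%2F/ingestion-jobs \
    '{"durable":true,"arguments":{"x-dead-letter-exchange":"ingestion-jobs-dlx"}}'
}
```

Calling declare_topology with RABBITMQ_PASSWORD exported issues four requests. Re-declaring a queue with different arguments is rejected by the broker, which is a useful signal that the code and the running topology have drifted apart.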
Affected files:
- docker-compose.yml: add rabbitmq service, update api and worker dependencies
- src/PatchHound.Infrastructure/Queues/RabbitMqIngestionJobQueue.cs: new
- src/PatchHound.Infrastructure/DependencyInjection.cs: queue implementation selection
- src/PatchHound.Infrastructure.csproj: add RabbitMQ.Client or MassTransit.RabbitMQ package
- .env.example: add RABBITMQ_USER, RABBITMQ_PASSWORD
2. Grafana observability stack (optional docker-compose profile)
Profile structure
Add an observability docker-compose profile containing:
| Service | Image | Purpose |
|---|---|---|
| prometheus | prom/prometheus:v3 | Scrapes /metrics from API and Worker |
| tempo | grafana/tempo:latest | OTLP trace receiver + trace storage |
| loki | grafana/loki:3 | Log aggregation (JSON log sink from API/Worker) |
| grafana | grafana/grafana-oss:11 | Dashboards, explorer, alerting |
Services are only started when the profile is active:
docker compose --profile observability up
api and worker gain optional environment variables that are set only when the observability profile is used:
Telemetry__OtlpEndpoint: http://tempo:4317
The OpenTelemetry SDK (wired in #22) will check for this endpoint at startup. If absent, telemetry is a no-op.
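One way to set the endpoint only when the profile is active is a thin wrapper around docker compose. The sketch below assumes docker-compose.yml maps the variable as Telemetry__OtlpEndpoint: ${TELEMETRY_OTLP_ENDPOINT:-} and that the application treats an empty value the same as an absent one. TELEMETRY_OTLP_ENDPOINT and compose_up are hypothetical names; COMPOSE_PROFILES is the standard Compose variable for activating profiles.

```shell
#!/bin/sh
# Hypothetical wrapper: start the stack with or without the observability profile.
compose_up() {  # usage: compose_up [observability]
  if [ "${1:-}" = "observability" ]; then
    # Activate the profile and point the OTLP exporter at Tempo.
    COMPOSE_PROFILES=observability \
    TELEMETRY_OTLP_ENDPOINT=http://tempo:4317 \
      docker compose up -d
  else
    # Endpoint stays empty, so telemetry remains a no-op.
    docker compose up -d
  fi
}
```

The same pair of variables could instead be placed in .env, which keeps plain docker compose up as the only command operators need to remember.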
Provisioning
Grafana should start pre-configured with:
- Prometheus, Tempo, and Loki as provisioned data sources (deploy/grafana/provisioning/datasources/)
- A PatchHound dashboard provisioned from JSON (deploy/grafana/provisioning/dashboards/patchhound.json) covering:
  - Ingestion jobs enqueued / processed / dead-lettered per hour
  - Worker lease acquisition success/failure
  - p99 DB query latency
  - Active workflow and SLA breach counts
Log shipping
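API and Worker emit structured JSON logs to stdout. The default .NET console formatter writes plain text, so this assumes the JSON console formatter (for example AddJsonConsole()) is enabled. Loki collects these via the Docker log driver or a Promtail sidecar: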
    promtail:
      image: grafana/promtail:3
      profiles: [observability]
      volumes:
        - /var/lib/docker/containers:/var/lib/docker/containers:ro
        - /var/run/docker.sock:/var/run/docker.sock
        - ./deploy/promtail/config.yml:/etc/promtail/config.yml:ro
New files:
- deploy/grafana/provisioning/datasources/datasources.yml
- deploy/grafana/provisioning/dashboards/dashboards.yml
- deploy/grafana/provisioning/dashboards/patchhound.json
- deploy/promtail/config.yml
- deploy/prometheus/prometheus.yml
- deploy/tempo/tempo.yml

Affected files:
- docker-compose.yml: add observability profile services
- .env.example: add GF_SECURITY_ADMIN_PASSWORD
3. Automated OpenBao bootstrap
Problem
The current setup requires six manual steps before PatchHound can read any secrets. A first-time operator who follows docker compose up without reading deploy/openbao/README.md will see silent secret read failures with no indication of why.
Solution — init container + auto-unseal via transit seal (or file-based auto-unseal for dev)
Two modes:
Mode A — Dev / self-hosted single-node (default): Use OpenBao's file-based auto-unseal by configuring a static unseal key in the openbao.hcl via an environment variable. This is not suitable for production secret management but eliminates the manual unseal step for self-hosted deployments where the host machine is the trust boundary.
    # deploy/openbao/config/openbao-dev.hcl
    seal "shamir" {} # default, but override with auto-unseal config
    # Alternative: use transit seal with a local key (no KMS dependency)
A simpler approach: run OpenBao in dev mode (-dev flag) with VAULT_DEV_ROOT_TOKEN_ID set, which starts already initialised and unsealed. Dev mode keeps all data in memory, so every secret is lost when the container restarts. This is only appropriate for single-host self-hosted deployments where secrets can be re-seeded after a restart.
Mode B — Production self-hosted: Keep the current manual init/unseal flow (with the existing Makefile) but document it clearly as the production path.
Automated bootstrap init container:
Add an openbao-init one-shot service that runs after OpenBao is healthy, checks whether it is already initialised, and if not:
- Initialises (bao operator init)
- Unseals with the generated keys
- Creates the KV mount
- Writes the policy
- Creates the application token
- Writes the token to a shared Docker volume so the API and Worker containers can read it
    openbao-init:
      image: openbao/openbao
      depends_on:
        openbao:
          condition: service_healthy
      volumes:
        - openbao_init:/init-output
        - ./deploy/openbao/scripts/bootstrap.sh:/bootstrap.sh:ro
      entrypoint: ["/bin/sh", "/bootstrap.sh"]
      environment:
        BAO_ADDR: http://openbao:8200
        KV_MOUNT: ${OPENBAO_KV_MOUNT:-patchhound}
The api and worker services read OPENBAO_TOKEN from the shared volume via an entrypoint wrapper:
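    #!/bin/sh
    # entrypoint.sh: load the app token written by openbao-init, then run the real command
    set -eu
    OPENBAO_TOKEN="$(cat /init-output/app-token.txt)"
    export OPENBAO_TOKEN
    exec "$@"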
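The wrapper above fails immediately if the token file does not exist yet, which can happen when api or worker start before openbao-init has finished. A minimal sketch of a guard, assuming a polling loop is acceptable (wait_for_token is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical hardening of entrypoint.sh: wait for the token file instead of
# failing immediately when openbao-init has not finished yet.
wait_for_token() {  # usage: wait_for_token FILE TIMEOUT_SECONDS
  file="$1"
  remaining="$2"
  # -s is true only when the file exists and is non-empty.
  while [ ! -s "$file" ]; do
    if [ "$remaining" -le 0 ]; then
      echo "timed out waiting for $file" >&2
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
}
```

In entrypoint.sh this would be called as wait_for_token /init-output/app-token.txt 60 before the token is read. Compose can enforce the same ordering declaratively with depends_on and condition: service_completed_successfully on openbao-init, in which case the loop is only a safety net.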
Alternatively: extend the existing Makefile all target to be the canonical first-run command and document it as the single entry point (make -C deploy/openbao setup).
deploy/openbao/scripts/bootstrap.sh — idempotent init + unseal + provision script (extracted from the Makefile targets and made container-friendly).
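A rough sketch of what such a script might look like follows. It is not the extracted Makefile logic: the single key share, the plain-text init.txt kept on the shared volume, and the POLICY_NAME and POLICY_FILE values are all assumptions made for illustration, and storing the unseal key next to the server only fits the dev trust model described in Mode A.

```shell
#!/bin/sh
# Sketch of deploy/openbao/scripts/bootstrap.sh (assumptions flagged inline).
set -eu

OUT="${OUT:-/init-output}"                    # shared volume from the compose snippet
KV_MOUNT="${KV_MOUNT:-patchhound}"
POLICY_NAME="${POLICY_NAME:-patchhound-app}"  # hypothetical policy name
POLICY_FILE="${POLICY_FILE:-/policy.hcl}"     # hypothetical policy path

bootstrap() {
  umask 077  # init.txt holds the unseal key and root token

  # `bao operator init -status` exits 0 if initialised, 2 if not, 1 on error.
  rc=0
  bao operator init -status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) ;;
    2) bao operator init -key-shares=1 -key-threshold=1 > "$OUT/init.txt" ;;
    *) echo "cannot query OpenBao at ${BAO_ADDR:-}" >&2; return 1 ;;
  esac
  [ -s "$OUT/init.txt" ] || { echo "initialised, but $OUT/init.txt is missing" >&2; return 1; }

  # Unseal on every run so a restarted (sealed) server comes back up.
  unseal_key=$(awk '/^Unseal Key 1:/ {print $4}' "$OUT/init.txt")
  bao operator unseal "$unseal_key" >/dev/null

  # Provision only once; a non-empty token file marks completion.
  if [ ! -s "$OUT/app-token.txt" ]; then
    BAO_TOKEN=$(awk '/^Initial Root Token:/ {print $4}' "$OUT/init.txt")
    export BAO_TOKEN
    bao secrets list | grep -q "^$KV_MOUNT/" ||
      bao secrets enable -path="$KV_MOUNT" -version=2 kv
    bao policy write "$POLICY_NAME" "$POLICY_FILE"
    bao token create -policy="$POLICY_NAME" -field=token > "$OUT/app-token.txt"
  fi
}
```

The real script would end with a call to bootstrap. Running it a second time skips initialisation and provisioning and only repeats the unseal step, which is what makes it suitable as a one-shot compose service that runs on every docker compose up.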
New/affected files:
- deploy/openbao/config/openbao-dev.hcl: dev mode config (pre-init, auto-unseal)
- deploy/openbao/scripts/bootstrap.sh: idempotent bootstrap script
- docker-compose.yml: openbao-init one-shot service, entrypoint changes for api/worker
- deploy/openbao/README.md: update to describe both dev and production paths
4. infra/stacks/selfhosted/ documentation
Fill the currently empty directory with:
- README.md: canonical self-hosted runbook covering prerequisites (Docker ≥ 27, compose ≥ 2.24), first-run steps, profile options, upgrade path, and backup guidance (pg_dump, OpenBao snapshot, RabbitMQ definitions export)
- docker-compose.override.yml.example: template for local overrides (custom ports, volume paths, SSL termination)
The infra/stacks/selfhosted/ directory should be the single reference for anyone deploying PatchHound on their own infrastructure.
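The runbook prerequisites (Docker ≥ 27, compose ≥ 2.24) could be checked by a small script shipped alongside the README. The following is a sketch under that assumption; version_ge and check_prereqs are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical prerequisite check for the self-hosted runbook.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Components are compared numerically, so 2.9 is older than 2.24.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    for (i = 1; i <= (n > m ? n : m); i++) {
      xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

check_prereqs() {
  docker_v=$(docker version --format '{{.Server.Version}}')
  compose_v=$(docker compose version --short)
  compose_v="${compose_v#v}"  # tolerate a leading "v"
  version_ge "$docker_v" 27 ||
    { echo "Docker >= 27 required, found $docker_v" >&2; return 1; }
  version_ge "$compose_v" 2.24 ||
    { echo "Compose >= 2.24 required, found $compose_v" >&2; return 1; }
}
```

Running check_prereqs before the first docker compose up turns an obscure compose-file parse error on older installations into an explicit version message.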
Acceptance criteria
RabbitMQ
- docker compose up starts a healthy RabbitMQ container alongside all existing services
- [Failed jobs land in the] ingestion-jobs-dead queue and do not block other jobs
- [The management UI is reachable at] http://localhost:15672
- InProcessIngestionJobQueue remains as a dev fallback when no ConnectionStrings:RabbitMq is configured

Observability
- docker compose --profile observability up starts Prometheus, Tempo, Loki, Promtail, and Grafana
- [Grafana is reachable at] http://localhost:3100 (or configured port) with pre-provisioned data sources
- [When the] observability profile is not active, the API and Worker start normally with no telemetry errors

OpenBao bootstrap
- docker compose up on a fresh volume initialises, unseals, and configures OpenBao automatically, with no manual steps required
- […] docker-compose.yml
- […] docker compose up re-bootstraps correctly
- […] (make -C deploy/openbao all) remains functional for operators who want explicit control

Documentation
- infra/stacks/selfhosted/README.md covers first-run, profiles, upgrade, and backup
- deploy/openbao/README.md is updated to reflect the automated bootstrap path and notes when manual setup is preferred

Dependencies
- [#22 must deliver the] IIngestionJobQueue interface and the OpenTelemetry SDK wiring before the RabbitMQ implementation and Grafana stack can be built on top of it

Related