63 changes: 55 additions & 8 deletions CLAUDE.md
@@ -19,7 +19,7 @@ Nopea is a deployment tool that builds a knowledge graph from every deployment.
mix format && mix compile --warnings-as-errors && mix test

# Individual commands
mix test # 280 tests, 0 failures
mix test # 305 tests, 0 failures
mix test test/nopea/deploy_test.exs # Single file
mix test test/nopea/deploy_test.exs:106 # Single test by line number
mix test --exclude integration --exclude cluster # Skip slow tests
@@ -35,19 +35,39 @@ Tests exclude `:integration` and `:cluster` tags by default (configured in `test
## DEPLOY PIPELINE

```
CLI/MCP/API → Deploy.deploy(spec)
CLI/MCP/API → Surface.*() → Deploy.deploy(spec)
→ ServiceAgent.deploy() # queue/serialize per-service
→ Deploy.run(spec) # orchestration
→ Memory.get_deploy_context() # graph query
→ select_strategy() # direct/canary/blue_green (memory-aware)
→ Strategy.Direct.execute() # K8s server-side apply
→ Drift.verify_manifest() # post-deploy 3-way diff
→ Memory.record_deploy() # graph update (EWMA, async cast)
├─ :direct →
│ → Strategy.Direct.execute() # K8s server-side apply
│ → Drift.verify_manifest() # post-deploy 3-way diff
│ → Memory.record_deploy() # graph update (EWMA, async cast)
└─ :canary/:blue_green →
→ Kulta.RolloutBuilder.build() # Rollout CRD
→ Progressive.Monitor.start() # polls CRD → records outcome on terminal
→ Occurrence.build() + persist() # FALSE Protocol
```

**Entry points**: `Deploy.deploy/1` routes through ServiceAgent if the supervisor is running; falls back to `Deploy.run/1` otherwise. Always use `deploy/1` — never call `run/1` directly from external callers.

**Progressive delivery**: Canary/blue_green strategies return `status: :progressing` and start a `Progressive.Monitor` GenServer. The Monitor polls the Kulta Rollout CRD and records the final outcome. Direct deploys return `:completed` or `:failed` immediately.
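
A caller can branch on the returned status roughly as sketched here. This is a minimal illustration, assuming the result is a map with a `:status` key; the module `DeployResultSketch` and its `describe/1` function are hypothetical and not part of Nopea, and the real result shape may differ.

```elixir
defmodule DeployResultSketch do
  # Hypothetical helper: assumes a result map carrying one of the
  # `:status` values named in the text above.
  def describe(%{status: :completed}), do: "direct deploy finished"
  def describe(%{status: :failed}), do: "deploy failed"

  # Canary/blue_green: the final outcome arrives later via the Monitor.
  def describe(%{status: :progressing}), do: "rollout in progress"
end
```

The point of the sketch is that `:progressing` is not a final answer: callers that need the outcome have to look it up again once the Monitor has recorded it.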

---

## SURFACE — UNIFIED INTERFACE LAYER

`Nopea.Surface` is the facade backing CLI, MCP, and HTTP. All user-facing interfaces delegate here.

```
CLI (cli.ex) ──→ Surface.*() ──→ Memory / Cache / ServiceAgent / Progressive.Monitor
MCP (mcp.ex) ──→ Surface.*()
HTTP (router.ex) → Surface.*()
```

Key design: Surface handles graceful degradation when optional subsystems aren't running (e.g., returns `{:error, :unavailable}` if Cache is down rather than crashing).
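
One way to get this kind of degradation is to check for the subsystem before calling into it. The sketch below is an assumption about the pattern, not the actual `Nopea.Surface` code: `SurfaceSketch.with_subsystem/2` is a hypothetical helper, and it assumes the subsystem is a locally registered process.

```elixir
defmodule SurfaceSketch do
  # Hypothetical helper. `name` stands in for a registered process name
  # (for example a cache server); the real Surface may check differently.
  def with_subsystem(name, fun) when is_atom(name) and is_function(fun, 0) do
    case Process.whereis(name) do
      # Subsystem not started: report it instead of crashing the caller.
      nil -> {:error, :unavailable}
      _pid -> {:ok, fun.()}
    end
  end
end
```

A check like this is inherently racy (the process can exit between the lookup and the call), so a real facade would likely also guard the call itself.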

---

## OTP SUPERVISION TREE
@@ -62,6 +82,7 @@ Nopea.Application
├── Nopea.Cluster # libcluster (optional, cluster mode)
├── Nopea.Registry / DistributedRegistry # Process registry
├── Nopea.ServiceAgent.Supervisor # DynamicSupervisor for per-service agents
├── Nopea.Progressive.Supervisor # DynamicSupervisor for rollout monitors
└── Nopea.API.Router # Plug/Cowboy HTTP (optional)
```

@@ -74,11 +95,19 @@ Most children are optional, controlled by `Application.get_env(:nopea, key)`:
| `:enable_metrics` | `true` | TelemetryMetricsPrometheus |
| `:enable_cache` | `true` | Nopea.Cache (ETS) |
| `:enable_memory` | `true` | Nopea.Memory (knowledge graph) |
| `:enable_deploy_supervisor` | `true` | Registry + ServiceAgent.Supervisor |
| `:enable_deploy_supervisor` | `true` | Registry + ServiceAgent.Supervisor + Progressive.Supervisor |
| `:enable_router` | `false` | Nopea.API.Router (HTTP) |
| `:cluster_enabled` | `false` | Cluster + DistributedRegistry |
| `:cdevents_endpoint` | `nil` | Events.Emitter (started only if set) |
| `:canary_threshold` | `0.15` | Failure confidence for auto-canary |
| `:cluster_strategy` | `:kubernetes_dns` | libcluster strategy (`:kubernetes_dns`, `:gossip`, `:epmd`) |
| `:cluster_service` | `"nopea-headless"` | K8s headless service for DNS discovery |
| `:cluster_app_name` | `"nopea"` | Erlang application name for DNS discovery |
| `:pod_namespace` | `"default"` | Kubernetes namespace for DNS discovery |
| `:cluster_polling_interval` | `5_000` | DNS polling interval in ms |
| `:cluster_gossip_port` | `45_892` | UDP port for gossip strategy |
| `:cluster_gossip_secret` | `nil` | Shared secret for gossip authentication |
| `:cluster_hosts` | `[]` | Node list for EPMD strategy (atom list) |

---

@@ -118,7 +147,7 @@ Per-service GenServer that queues and serializes deploys:

## MEMORY SYSTEM

Knowledge graph stored in `Nopea.Memory` GenServer state.
Knowledge graph stored in `Nopea.Memory` GenServer state, persisted to `.nopea/graph.etf`.

**Graph nodes**: services, namespaces, errors (kinds: `:concept`, `:error`)
**Graph relationships**: `:deployed_to`, `:breaks`, `:deployed_together`
@@ -129,6 +158,12 @@ Key API:
- `Memory.record_deploy(result)` → ingest into graph (**async cast**)
- `Memory.node_count()` / `Memory.relationship_count()` → graph stats (**sync call**)

### Persistence

Graph persists to `.nopea/graph.etf` as versioned binary (`<<1, rest::binary>>`).
Restore order on startup: ETS snapshot → disk → fresh `Graph.new()`.
Written on every `record_deploy`, hourly decay, and `terminate/2`.
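
The versioned layout can be sketched as a one-byte prefix in front of an Erlang external term format (ETF) payload. This is an illustration under assumptions: `GraphFileSketch` and its functions are hypothetical names, and the actual encoding in `Nopea.Memory` may differ in detail.

```elixir
defmodule GraphFileSketch do
  @version 1

  # Prefix the serialized term with a one-byte format version.
  def encode(graph), do: <<@version, :erlang.term_to_binary(graph)::binary>>

  # Accept only the known version byte. `:safe` refuses to create new atoms;
  # a corrupt payload still raises ArgumentError in binary_to_term/2.
  def decode(<<@version, rest::binary>>), do: {:ok, :erlang.binary_to_term(rest, [:safe])}
  def decode(_other), do: {:error, :unknown_format}
end
```

Matching on the version byte lets a later format change add a second `decode/1` clause instead of guessing at the contents of an old file.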

---

## K8S MOCK PATTERN
@@ -224,9 +259,21 @@ Occurrences are structured events generated after every deployment.

---

## PROGRESSIVE DELIVERY

Canary and blue_green strategies create Kulta Rollout CRDs and return `:progressing`. A `Progressive.Monitor` GenServer per rollout polls the CRD status.

- **Phases**: `:progressing` → `:promoted` / `:completed` / `:degraded` / `:paused` / `:failed`
- **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`
Comment on lines +267 to +269
Copilot AI Mar 12, 2026

This section says terminal phases mean the Monitor “stops and records outcome”. In Nopea.Progressive.Monitor, record_outcome/1 runs on the polling path and timeout, but not on the manual promote/rollback call paths (they stop without recording). Either adjust the docs to reflect the current behavior or update the implementation so manual promote/rollback also records the outcome before stopping.

Suggested change
- **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`
- **Terminal**: `:completed`, `:promoted`, `:failed`; on poll/timeout paths, Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)` — stop Monitor without recording; outcome is handled by the caller

Collaborator Author

Good catch — fixed in 0d0961f. Added record_outcome(rollout) to both the promote and rollback handle_call paths before stopping. Memory and Cache now get updated on manual intervention, same as the poll/timeout paths.

- **Registry**: Monitors register as `{:rollout, deploy_id}` in `Nopea.Registry`
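
The terminal-phase check and the registry key from the list above can be sketched as follows. `RolloutPhaseSketch` is a hypothetical module, and the via tuple assumes `Nopea.Registry` is a standard Elixir `Registry` with unique keys (in cluster mode a distributed registry may be used instead).

```elixir
defmodule RolloutPhaseSketch do
  @terminal [:completed, :promoted, :failed]

  # True for the phases listed above as terminal.
  def terminal?(phase), do: phase in @terminal

  # Via tuple matching the {:rollout, deploy_id} key shape; assumes a
  # standard unique-key Registry named Nopea.Registry.
  def via(deploy_id), do: {:via, Registry, {Nopea.Registry, {:rollout, deploy_id}}}
end
```

With a name like this, a call such as `GenServer.call(RolloutPhaseSketch.via(id), msg)` reaches the Monitor for that deploy without the caller tracking pids; the message format itself is not specified here.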

---

## MCP SERVER

JSON-RPC 2.0 over stdin/stdout. Tools: `nopea_deploy`, `nopea_context`, `nopea_history`, `nopea_health`, `nopea_explain`.
JSON-RPC 2.0 over stdin/stdout. Tools: `nopea_deploy`, `nopea_context`, `nopea_history`, `nopea_health`, `nopea_explain`, `nopea_services`, `nopea_promote`, `nopea_rollback`.

---

3 changes: 2 additions & 1 deletion config/config.exs
@@ -30,7 +30,8 @@ config :logger, :default_formatter,
:type,
:endpoint,
:path,
:manifest_count
:manifest_count,
:request_id
]

import_config "#{config_env()}.exs"
40 changes: 40 additions & 0 deletions config/runtime.exs
@@ -3,8 +3,48 @@ import Config
if config_env() == :prod do
  cluster_enabled = System.get_env("NOPEA_CLUSTER_ENABLED", "false") == "true"

  cluster_strategy =
    case System.get_env("NOPEA_CLUSTER_STRATEGY") do
      "kubernetes_dns" -> :kubernetes_dns
      "gossip" -> :gossip
      "epmd" -> :epmd
      _ -> :kubernetes_dns
    end

  cluster_hosts =
    case System.get_env("NOPEA_CLUSTER_HOSTS") do
      nil ->
        []

      hosts ->
        hosts
        |> String.split(",", trim: true)
        |> Enum.map(&String.to_atom(String.trim(&1)))
    end

  cluster_polling_interval =
    case System.get_env("NOPEA_CLUSTER_POLLING_INTERVAL") do
      nil -> 5_000
      val -> String.to_integer(val)
    end

  cluster_gossip_port =
    case System.get_env("NOPEA_CLUSTER_GOSSIP_PORT") do
      nil -> 45_892
      val -> String.to_integer(val)
    end

  config :nopea,
    cluster_enabled: cluster_enabled,
    cluster_strategy: cluster_strategy,
    cluster_service: System.get_env("NOPEA_CLUSTER_SERVICE", "nopea-headless"),
    cluster_app_name: System.get_env("NOPEA_CLUSTER_APP_NAME", "nopea"),
    pod_namespace: System.get_env("NOPEA_POD_NAMESPACE", "default"),
    cluster_polling_interval: cluster_polling_interval,
    cluster_gossip_port: cluster_gossip_port,
    cluster_gossip_secret: System.get_env("NOPEA_CLUSTER_GOSSIP_SECRET"),
    cluster_hosts: cluster_hosts,
    enable_router: System.get_env("NOPEA_ENABLE_ROUTER", "true") == "true",
    api_key: System.get_env("NOPEA_API_KEY"),
    api_port: String.to_integer(System.get_env("NOPEA_API_PORT", "4000"))
end
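
Note on the integer settings above: `String.to_integer/1` raises on a non-numeric value, so a malformed environment variable stops the release at boot. If a fallback is preferred, a tolerant parser could look like the sketch below. `EnvIntSketch.parse/2` is a hypothetical helper, not part of this change, and silently falling back is a design choice that can hide configuration mistakes.

```elixir
defmodule EnvIntSketch do
  # Hypothetical helper: returns `default` when the variable is unset
  # or does not hold a clean integer.
  def parse(nil, default), do: default

  def parse(val, default) when is_binary(val) do
    case Integer.parse(String.trim(val)) do
      # Only accept input that is an integer with nothing trailing.
      {int, ""} -> int
      _ -> default
    end
  end
end
```

For example, `EnvIntSketch.parse(System.get_env("NOPEA_CLUSTER_POLLING_INTERVAL"), 5_000)` would keep the documented default when the variable is missing or malformed.
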
59 changes: 59 additions & 0 deletions lib/nopea/api/auth_plug.ex
@@ -0,0 +1,59 @@
defmodule Nopea.API.AuthPlug do
  @moduledoc """
  Plug that authenticates API requests using an `x-api-key` header.

  ## Behaviour

  - If no API key is configured (nil), all requests pass through (dev mode).
  - If a key is configured, requests must include a matching `x-api-key` header.
  - `/health` and `/ready` paths are always allowed without authentication.
  - Returns 401 Unauthorized for missing or invalid keys.
  """

  @behaviour Plug

  import Plug.Conn

  @skip_paths ["/health", "/ready"]

  @impl true
  @spec init(keyword()) :: keyword()
  def init(opts), do: opts

  @impl true
  @spec call(Plug.Conn.t(), keyword()) :: Plug.Conn.t()
  def call(conn, _opts) do
    if skip_auth?(conn) do
      conn
    else
      case configured_key() do
        nil ->
          conn

        expected_key ->
          verify_key(conn, expected_key)
      end
    end
  end

  defp skip_auth?(conn) do
    conn.request_path in @skip_paths
  end

  defp configured_key do
    Application.get_env(:nopea, :api_key)
  end

  defp verify_key(conn, expected_key) do
    case get_req_header(conn, "x-api-key") do
      [^expected_key] ->
        conn

      _ ->
        conn
        |> put_resp_content_type("application/json")
        |> send_resp(401, Jason.encode!(%{error: "unauthorized"}))
        |> halt()
    end
  end
end
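
Note on `verify_key/2`: the pinned pattern match compares the header to the key with ordinary equality, which is not guaranteed to run in constant time. If timing side channels are a concern, a constant-time comparison could replace it. The sketch below is a hypothetical stand-alone version (`ApiKeySketch` is not part of this change); `Plug.Crypto.secure_compare/2` from the `plug_crypto` package provides the same kind of comparison.

```elixir
defmodule ApiKeySketch do
  # Hypothetical replacement for the header check: exactly one header
  # value is required, and it is compared without early exit.
  def valid_key?([provided], expected) when is_binary(provided) and is_binary(expected) do
    byte_size(provided) == byte_size(expected) and compare(provided, expected, 0)
  end

  def valid_key?(_headers, _expected), do: false

  # XOR each byte pair and OR the result into the accumulator, so every
  # byte is visited regardless of where the first mismatch occurs.
  defp compare(<<x, rest_a::binary>>, <<y, rest_b::binary>>, acc) do
    compare(rest_a, rest_b, :erlang.bor(acc, :erlang.bxor(x, y)))
  end

  defp compare(<<>>, <<>>, acc), do: acc == 0
end
```

The length check still reveals whether the lengths match, which is usually acceptable for fixed-format API keys.
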
1 change: 1 addition & 0 deletions lib/nopea/api/router.ex
@@ -14,6 +14,7 @@ defmodule Nopea.API.Router do
json_decoder: Jason
)

plug(Nopea.API.AuthPlug)
plug(:match)
plug(:dispatch)

6 changes: 6 additions & 0 deletions lib/nopea/application.ex
@@ -28,6 +28,7 @@ defmodule Nopea.Application do
|> add_memory_child()
|> add_cluster_child(cluster_enabled)
|> add_registry_child(cluster_enabled)
|> add_distributed_supervisor_child(cluster_enabled)
|> add_service_agent_child()
|> add_progressive_child()
|> add_router_child()
@@ -82,6 +83,11 @@ defmodule Nopea.Application do
end
end

defp add_distributed_supervisor_child(children, false), do: children

defp add_distributed_supervisor_child(children, true),
do: children ++ [Nopea.DistributedSupervisor]

defp add_service_agent_child(children) do
if Application.get_env(:nopea, :enable_deploy_supervisor, true),
do: children ++ [Nopea.ServiceAgent.Supervisor],
1 change: 1 addition & 0 deletions lib/nopea/cli.ex
@@ -168,6 +168,7 @@ defmodule Nopea.CLI do
{:ok, _apps} ->
port = Application.get_env(:nopea, :api_port, 4000)
Logger.info("Nopea API listening on port #{port}")
System.trap_signal(:sigterm, fn -> System.stop(0) end)
Process.sleep(:infinity)

{:error, reason} ->