Surface alignment, memory persistence, progressive delivery #36
…ence Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Updates the repository’s operational/architecture guide (CLAUDE.md) to reflect recent system changes (Surface facade alignment, memory persistence behavior, and progressive delivery/monitoring).
Changes:
- Refreshes the documented deploy pipeline to route through `Nopea.Surface` and include progressive delivery components.
- Documents Memory graph persistence/restore behavior and adds a dedicated progressive delivery section.
- Updates supervision tree, feature flags, MCP tool list, and test count.
    → Memory.get_deploy_context()      # graph query
    → select_strategy()                # direct/canary/blue_green (memory-aware)
-   → Strategy.Direct.execute()        # K8s server-side apply
-   → Drift.verify_manifest()          # post-deploy 3-way diff
+   → Strategy.Direct.execute()        # K8s server-side apply (direct)
+   → Kulta.RolloutBuilder.build()     # Rollout CRD (canary/blue_green)
+   → Drift.verify_manifest()          # post-deploy 3-way diff (direct only)
The deploy pipeline diagram implies Strategy.Direct.execute() and Kulta.RolloutBuilder.build() both run in sequence after select_strategy(), but in code only one executes depending on the selected strategy (:direct vs :canary/:blue_green). Consider updating this diagram to show the branching paths (direct → apply + drift; progressive → build/apply Rollout CRD → monitor).
Fixed in 0d0961f — updated the diagram to show the branching: :direct path goes through apply + drift verify, :canary/:blue_green path goes through RolloutBuilder + Progressive.Monitor.
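The branching described in this thread can be sketched as a single `case` on the selected strategy. This is a minimal illustration, not the Nopea implementation: `PipelineSketch`, the `steps` map, and its keys are hypothetical names, and each step is injected as a function so the sketch runs without a cluster.

```elixir
defmodule PipelineSketch do
  @moduledoc "Hypothetical sketch of strategy branching; not the Nopea code."

  # `steps` is a map of injected functions (all key names are assumptions).
  def run(strategy, manifests, steps) do
    case strategy do
      :direct ->
        # Direct path: apply the manifests, then verify drift.
        applied = steps.apply_manifests.(manifests)
        {:ok, %{status: :completed, applied: applied, drift: steps.verify_drift.(applied)}}

      s when s in [:canary, :blue_green] ->
        # Progressive path: build the Rollout, then hand off to a monitor.
        rollout = steps.build_rollout.(s, manifests)
        {:ok, %{status: :progressing, rollout: rollout, monitor: steps.start_monitor.(rollout)}}
    end
  end
end
```

Only one branch runs per deploy, which is the property the original diagram failed to convey.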
- **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`
This section says terminal phases mean the Monitor “stops and records outcome”. In Nopea.Progressive.Monitor, record_outcome/1 runs on the polling path and timeout, but not on the manual promote/rollback call paths (they stop without recording). Either adjust the docs to reflect the current behavior or update the implementation so manual promote/rollback also records the outcome before stopping.
Suggested change:
- - **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- - **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- - **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`
+ - **Terminal**: `:completed`, `:promoted`, `:failed` — on poll/timeout paths, Monitor stops and records outcome
+ - **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
+ - **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)` — stop Monitor without recording; outcome is handled by the caller
Good catch — fixed in 0d0961f. Added record_outcome(rollout) to both the promote and rollback handle_call paths before stopping. Memory and Cache now get updated on manual intervention, same as the poll/timeout paths.
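A rough sketch of the fix described above: the manual `handle_call` paths record the outcome before stopping the GenServer. `MonitorSketch`, the injected `recorder` callback (standing in for `record_outcome/1`), and the choice of `:failed` as the phase after a rollback are assumptions made for illustration only.

```elixir
defmodule MonitorSketch do
  use GenServer

  # `recorder` is a hypothetical 1-arity callback standing in for record_outcome/1.
  def start_link(rollout, recorder), do: GenServer.start_link(__MODULE__, {rollout, recorder})
  def promote(pid), do: GenServer.call(pid, :promote)
  def rollback(pid), do: GenServer.call(pid, :rollback)

  @impl true
  def init({rollout, recorder}), do: {:ok, %{rollout: rollout, recorder: recorder}}

  @impl true
  def handle_call(:promote, _from, state), do: finish(:promoted, state)
  # Assumption: a manual rollback ends in :failed; the real phase may differ.
  def handle_call(:rollback, _from, state), do: finish(:failed, state)

  # Record the terminal phase first, then stop, mirroring the poll/timeout paths.
  defp finish(phase, state) do
    rollout = Map.put(state.rollout, :phase, phase)
    state.recorder.(rollout)
    {:stop, :normal, {:ok, phase}, %{state | rollout: rollout}}
  end
end
```

Because the recorder runs before the `{:stop, ...}` tuple is returned, the caller can rely on the outcome being recorded by the time `promote/1` or `rollback/1` returns.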
- Add record_outcome call to promote and rollback handle_call paths so Memory and Cache are updated on manual intervention
- Update CLAUDE.md pipeline diagram to show branching between direct and progressive strategy paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change Occurrence.build/2 to return {:ok, occ} | {:error, term()} instead
of raising on failure. Update deploy.ex caller and test files to handle
the new tuple return.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
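The tuple-returning style this commit describes might look like the following. The real arguments and fields of `Occurrence.build/2` are not shown in this PR, so `OccurrenceSketch`, `DeploySketch`, and every field name here are hypothetical.

```elixir
defmodule OccurrenceSketch do
  # Hypothetical stand-in for Occurrence.build/2; arguments and fields are illustrative.
  def build(spec, result) when is_map(spec) and is_map(result) do
    case Map.fetch(spec, :service) do
      {:ok, service} -> {:ok, %{service: service, status: Map.get(result, :status, :unknown)}}
      :error -> {:error, :missing_service}
    end
  end

  def build(_spec, _result), do: {:error, :invalid_input}
end

defmodule DeploySketch do
  # The caller matches on both tuple shapes instead of rescuing an exception.
  def record(spec, result) do
    with {:ok, occ} <- OccurrenceSketch.build(spec, result) do
      {:recorded, occ}
    else
      {:error, reason} -> {:skipped, reason}
    end
  end
end
```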
Implement proper shutdown/exit/cancelled notification handling in MCP server. Add System.trap_signal(:sigterm) in CLI serve command for graceful container shutdown.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Read all cluster-related config keys from environment variables in the prod block of runtime.exs: NOPEA_CLUSTER_STRATEGY, NOPEA_CLUSTER_SERVICE, NOPEA_CLUSTER_APP_NAME, NOPEA_POD_NAMESPACE, NOPEA_CLUSTER_POLLING_INTERVAL, NOPEA_CLUSTER_GOSSIP_PORT, NOPEA_CLUSTER_GOSSIP_SECRET, NOPEA_CLUSTER_HOSTS. Update the configuration table in CLAUDE.md with all cluster keys. Also fix corrupted literal \n in config/config.exs logger metadata.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
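As a rough illustration of reading such keys, the helper below parses a few of the variables named in this commit. `ClusterEnvSketch` is hypothetical, the actual runtime.exs is not shown here, and the comma-separated format for `NOPEA_CLUSTER_HOSTS` is an assumption.

```elixir
defmodule ClusterEnvSketch do
  # Takes the environment as a map so it can be exercised without touching System.
  def read(env \\ System.get_env()) do
    [
      strategy: env["NOPEA_CLUSTER_STRATEGY"],
      service: env["NOPEA_CLUSTER_SERVICE"],
      polling_interval: parse_int(env["NOPEA_CLUSTER_POLLING_INTERVAL"]),
      gossip_port: parse_int(env["NOPEA_CLUSTER_GOSSIP_PORT"]),
      hosts: parse_hosts(env["NOPEA_CLUSTER_HOSTS"])
    ]
  end

  # Unset variables stay nil; set ones must be valid integers.
  defp parse_int(nil), do: nil
  defp parse_int(value), do: String.to_integer(value)

  # Assumption: hosts are given as a comma-separated list.
  defp parse_hosts(nil), do: []
  defp parse_hosts(value), do: value |> String.split(",", trim: true) |> Enum.map(&String.trim/1)
end
```

Passing the environment in as a map keeps the parsing logic testable in isolation from the process environment.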
Create Nopea.API.AuthPlug that checks x-api-key header against the configured NOPEA_API_KEY. Skip auth for /health and /ready paths. When no key is configured (nil), all requests pass through (dev mode). Wire plug into router between Plug.Parsers and :match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
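The decision rules in this commit can be sketched as a pure function, leaving out the Plug wiring so the example has no dependencies. `AuthSketch` and its return values are hypothetical. A production plug would likely prefer a constant-time comparison such as `Plug.Crypto.secure_compare/2` over plain pattern equality.

```elixir
defmodule AuthSketch do
  @public_paths ["/health", "/ready"]

  # Health and readiness probes never require a key.
  def authorize(path, _provided, _configured) when path in @public_paths, do: :ok
  # No configured key (nil) means dev mode: everything passes.
  def authorize(_path, _provided, nil), do: :ok
  # The provided x-api-key value must equal the configured key.
  def authorize(_path, key, key) when is_binary(key), do: :ok
  def authorize(_path, _provided, _configured), do: {:error, :unauthorized}
end
```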
Add add_distributed_supervisor_child/2 helper to the application startup pipeline, placed after the registry child and before the service_agent child. When cluster_enabled is true, Nopea.DistributedSupervisor (Horde) is added to the supervision tree for cross-node process distribution. Also fix async test races in auth_plug_test and router_test by making them non-async since they modify Application config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
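A conditional child helper of this kind might be shaped as follows. The child specs are reduced to plain atoms for illustration, and `ChildrenSketch` is a hypothetical module. Only the ordering (registry, optional distributed supervisor, service agent) comes from the commit message.

```elixir
defmodule ChildrenSketch do
  # Child specs are plain atoms here; a real tree would use module specs.
  def children(cluster_enabled?) do
    [:registry]
    |> add_distributed_supervisor_child(cluster_enabled?)
    |> Kernel.++([:service_agent])
  end

  # Only add the distributed supervisor when clustering is enabled.
  def add_distributed_supervisor_child(children, true),
    do: children ++ [Nopea.DistributedSupervisor]

  def add_distributed_supervisor_child(children, _), do: children
end
```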
5 of 6 metrics defined in metrics.ex never fired because no code emitted the corresponding telemetry events. Add emissions for:
- [:nopea, :deploy, :stop] via new emit_deploy_complete/2 in Metrics
- [:nopea, :memory, :query, :stop] with duration in Memory.get_deploy_context
- [:nopea, :verify, :drift] with count in Drift.verify_manifest
- [:nopea, :deploys, :active] with count in ServiceAgent.health
Also capture and thread the metrics start_time through deploy.ex so emit_deploy_complete fires on both success and progressive delivery paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… state
Cache.put_last_applied/3 existed but was never called after applying manifests, so the three-way diff in drift detection could never find last-applied state and always returned :needs_apply or :new_resource.
- Add cache_applied_manifests/2 in deploy.ex that iterates applied manifests and calls Cache.put_last_applied/3 for each one
- Call it in both the direct strategy success path (before verify) and the canary/blue_green progressing path
- Add public Drift.resource_key/1 delegating to Applier.resource_key/1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
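To see why a missing last-applied entry matters, consider this simplified three-way check. A plain map stands in for the cache, and `DriftSketch`, the key format, and the `:drifted` / `:no_drift` results are assumptions. Only `:new_resource` and `:needs_apply` are named in the commit.

```elixir
defmodule DriftSketch do
  # Hypothetical key format: kind/namespace/name.
  def resource_key(%{"kind" => kind, "metadata" => %{"name" => name} = meta}) do
    "#{kind}/#{Map.get(meta, "namespace", "default")}/#{name}"
  end

  # Store each applied manifest under its resource key (a map stands in for Cache).
  def cache_applied(cache, manifests) do
    Enum.reduce(manifests, cache, fn m, acc -> Map.put(acc, resource_key(m), m) end)
  end

  # Three-way comparison of last-applied, desired, and live state.
  def check(cache, desired, live) do
    case Map.fetch(cache, resource_key(desired)) do
      :error -> :new_resource
      {:ok, last} when last != desired -> :needs_apply
      {:ok, last} when last != live -> :drifted
      {:ok, _} -> :no_drift
    end
  end
end
```

Without a call to `cache_applied/2` after each apply, `check/3` can never get past the `:error` branch, which matches the symptom the commit describes.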
…deploy logic
The canary and blue_green strategy logic was inlined in Deploy.execute_strategy/2.
Extract into dedicated modules implementing the Strategy behaviour, matching the
existing Strategy.Direct pattern. Update the behaviour callback to include the
{:ok, {applied, :progressing}} return type for progressive strategies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
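A behaviour with the widened return type could be declared roughly like this. The real callback name and arguments in Nopea's Strategy behaviour are not shown in this PR, so `StrategySketch`, `execute/2`, and its parameters are hypothetical.

```elixir
defmodule StrategySketch do
  @type applied :: [map()]

  # Progressive strategies return {:ok, {applied, :progressing}}.
  @callback execute(manifests :: [map()], opts :: keyword()) ::
              {:ok, applied()} | {:ok, {applied(), :progressing}} | {:error, term()}
end

defmodule StrategySketch.Canary do
  @behaviour StrategySketch

  @impl true
  def execute(manifests, _opts) do
    # A real strategy would build and apply a Rollout here; this sketch echoes its input.
    {:ok, {manifests, :progressing}}
  end
end
```

Callers can then distinguish a finished direct deploy from an in-flight progressive one by matching on the shape of the `:ok` tuple.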
Add assert_eventually polling macro in test/support/test_helpers.ex and replace all Process.sleep calls with deterministic alternatives:
- Memory barrier (node_count/0 sync call) for async cast flushes
- assert_eventually for supervisor restart and state convergence checks
- assert_receive with message signaling for deploy start coordination
Leaves distributed_registry_test.exs unchanged (intentional polling sleeps).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
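A polling assertion of this kind can be sketched as below. This is not the helper from test/support/test_helpers.ex: `EventuallySketch`, the default timeout and interval, and the error message are assumptions. The short sleep is confined to the helper's retry loop, so tests wait only as long as the condition takes to become true.

```elixir
defmodule EventuallySketch do
  # Re-evaluates `expr` until it is truthy or `timeout` milliseconds elapse.
  defmacro assert_eventually(expr, timeout \\ 1_000, interval \\ 10) do
    quote do
      EventuallySketch.poll(fn -> unquote(expr) end, unquote(timeout), unquote(interval))
    end
  end

  def poll(fun, timeout, interval) do
    deadline = System.monotonic_time(:millisecond) + timeout
    do_poll(fun, deadline, interval)
  end

  defp do_poll(fun, deadline, interval) do
    cond do
      fun.() ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        raise "condition not met before timeout"

      true ->
        Process.sleep(interval)
        do_poll(fun, deadline, interval)
    end
  end
end
```

Usage requires `require EventuallySketch` (or an import) before calling the macro, for example `assert_eventually(Agent.get(agent, & &1) == 1)`.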
Summary
- `Nopea.Surface` as unified facade for CLI, MCP, and HTTP: all three now delegate to the same domain logic with graceful degradation
- … `.nopea/graph.etf` (versioned binary), restores on startup via ETS → disk → fresh fallback chain
- … `:progressing` and start a `Progressive.Monitor` GenServer that polls Kulta Rollout CRDs. Supports manual promote/rollback via all surfaces
- … `explain`, `health`, `services`, `promote`, `rollback`, `mcp`
- … `nopea_services`, `nopea_promote`, `nopea_rollback` (8 total)
- … `GET /api/status/:service`, `GET /api/explain/:service`, `GET /api/services`, `POST /api/promote/:deploy_id`, `POST /api/rollback/:deploy_id`

Test plan
- … `.nopea/graph.etf`
- `strategy: :canary` → `result.status == :progressing` → Monitor polls → terminal phase
- `nopea promote <deploy_id>` patches Kulta Rollout → status transitions
- `nopea explain <svc>`, `nopea health`, `nopea services` all work
- … `inspect` leaks)

🤖 Generated with Claude Code
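The versioned-binary persistence and the restore fallback chain mentioned in the summary can be sketched with the standard External Term Format functions. `GraphStoreSketch`, the `{version, graph}` envelope, and passing the in-memory copy as an argument (instead of reading ETS) are assumptions made to keep the example self-contained.

```elixir
defmodule GraphStoreSketch do
  @version 1

  # Serialize the graph with a version tag so incompatible files can be rejected.
  def save(path, graph) do
    File.mkdir_p!(Path.dirname(path))
    File.write!(path, :erlang.term_to_binary({@version, graph}))
  end

  # Fallback chain: in-memory copy (ETS in the PR), then disk, then a fresh graph.
  def restore(path, in_memory \\ nil)
  def restore(_path, graph) when not is_nil(graph), do: {:memory, graph}

  def restore(path, nil) do
    case load(path) do
      {:ok, graph} -> {:disk, graph}
      :error -> {:fresh, %{}}
    end
  end

  # A missing file, corrupt data, or a version mismatch all fall through to :error.
  defp load(path) do
    with {:ok, bin} <- File.read(path),
         {@version, graph} <- safe_decode(bin) do
      {:ok, graph}
    else
      _ -> :error
    end
  end

  defp safe_decode(bin) do
    :erlang.binary_to_term(bin, [:safe])
  rescue
    ArgumentError -> :error
  end
end
```

Treating every load failure as "start fresh" keeps startup from crashing on a corrupt or outdated file, at the cost of silently discarding history. Whether Nopea makes the same trade-off is not stated in this PR.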