
Surface alignment, memory persistence, progressive delivery#36

Merged: yairfalse merged 12 commits into main from feat/surface-persistence-progressive on Mar 18, 2026

@yairfalse
Collaborator

Summary

  • Surface module: Extract Nopea.Surface as unified facade for CLI, MCP, and HTTP — all three now delegate to the same domain logic with graceful degradation
  • Memory persistence: Graph persists to .nopea/graph.etf (versioned binary), restores on startup via ETS → disk → fresh fallback chain
  • Progressive delivery: Canary/blue_green strategies return :progressing and start a Progressive.Monitor GenServer that polls Kulta Rollout CRDs. Supports manual promote/rollback via all surfaces
  • New CLI commands: explain, health, services, promote, rollback, mcp
  • New MCP tools: nopea_services, nopea_promote, nopea_rollback (8 total)
  • New HTTP routes: GET /api/status/:service, GET /api/explain/:service, GET /api/services, POST /api/promote/:deploy_id, POST /api/rollback/:deploy_id
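
The restore order named in the memory persistence bullet (ETS, then disk, then a fresh graph) could be sketched roughly as follows. This is an illustration under stated assumptions, not Nopea's implementation: the module `PersistSketch`, the `{version, graph}` envelope, the `:graph` ETS key, and the plain-map graph are all hypothetical. Only the fallback order and the idea of a versioned binary snapshot come from the summary.

```elixir
defmodule PersistSketch do
  @moduledoc "Hypothetical sketch of a versioned ETF snapshot with a fallback chain."

  # Assumed format version; a mismatch is treated like a missing snapshot.
  @version 1

  # Write the graph as a versioned Erlang External Term Format binary.
  def save(graph, path) do
    File.mkdir_p!(Path.dirname(path))
    File.write!(path, :erlang.term_to_binary({@version, graph}))
  end

  # Restore order: live ETS table -> snapshot on disk -> fresh empty graph.
  def restore(table, path) do
    with :miss <- from_ets(table),
         :miss <- from_disk(path) do
      {:fresh, %{}}
    end
  end

  defp from_ets(table) do
    case :ets.whereis(table) do
      :undefined ->
        :miss

      tid ->
        case :ets.lookup(tid, :graph) do
          [{:graph, graph}] -> {:ets, graph}
          _ -> :miss
        end
    end
  end

  defp from_disk(path) do
    with {:ok, bin} <- File.read(path),
         {@version, graph} <- safe_decode(bin) do
      {:disk, graph}
    else
      _ -> :miss
    end
  end

  # A corrupt or unsafe binary falls through to the next step of the chain.
  defp safe_decode(bin) do
    :erlang.binary_to_term(bin, [:safe])
  rescue
    ArgumentError -> :corrupt
  end
end
```

Tagging the result with its origin (`:ets`, `:disk`, `:fresh`) is a choice made for this sketch so a caller can log which step of the chain produced the graph.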

Test plan

  • 305 tests pass, 0 failures, 0 Credo issues
  • Memory persistence: stop Memory GenServer, restart → graph restores from .nopea/graph.etf
  • Progressive: deploy with strategy: :canary → result.status == :progressing → Monitor polls → terminal phase
  • Promote/rollback: nopea promote <deploy_id> patches Kulta Rollout → status transitions
  • Surface alignment: nopea explain <svc>, nopea health, nopea services all work
  • HTTP error responses return generic messages (no inspect leaks)

🤖 Generated with Claude Code

…ence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Updates the repository’s operational/architecture guide (CLAUDE.md) to reflect recent system changes (Surface facade alignment, memory persistence behavior, and progressive delivery/monitoring).

Changes:

  • Refreshes the documented deploy pipeline to route through Nopea.Surface and include progressive delivery components.
  • Documents Memory graph persistence/restore behavior and adds a dedicated progressive delivery section.
  • Updates supervision tree, feature flags, MCP tool list, and test count.


Comment thread CLAUDE.md Outdated
Comment on lines +41 to +45
  → Memory.get_deploy_context() # graph query
  → select_strategy() # direct/canary/blue_green (memory-aware)
- → Strategy.Direct.execute() # K8s server-side apply
- → Drift.verify_manifest() # post-deploy 3-way diff
+ → Strategy.Direct.execute() # K8s server-side apply (direct)
+ → Kulta.RolloutBuilder.build() # Rollout CRD (canary/blue_green)
+ → Drift.verify_manifest() # post-deploy 3-way diff (direct only)

Copilot AI Mar 12, 2026


The deploy pipeline diagram implies Strategy.Direct.execute() and Kulta.RolloutBuilder.build() both run in sequence after select_strategy(), but in code only one executes depending on the selected strategy (:direct vs :canary/:blue_green). Consider updating this diagram to show the branching paths (direct → apply + drift; progressive → build/apply Rollout CRD → monitor).

Collaborator Author


Fixed in 0d0961f — updated the diagram to show the branching: :direct path goes through apply + drift verify, :canary/:blue_green path goes through RolloutBuilder + Progressive.Monitor.
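
For readers who do not have the updated diagram at hand, the branching described in this reply might look like the sketch below. It is a simplified stand-in, not the code in deploy.ex: `PipelineSketch`, the step atoms, and the result maps are hypothetical, and no Kubernetes calls are made.

```elixir
defmodule PipelineSketch do
  @moduledoc "Hypothetical sketch of the strategy branch; step names are placeholders."

  # :direct applies the manifests and then verifies drift.
  def run(:direct, manifests) do
    applied = Enum.map(manifests, &{:applied, &1})
    {:ok, %{status: :completed, steps: [:apply, :drift_verify], applied: applied}}
  end

  # Progressive strategies build a Rollout and hand off to a monitor process.
  def run(strategy, manifests) when strategy in [:canary, :blue_green] do
    rollout = %{strategy: strategy, manifests: manifests}
    {:ok, %{status: :progressing, steps: [:rollout_build, :monitor], rollout: rollout}}
  end

  # Anything else is rejected instead of silently falling back to :direct.
  def run(other, _manifests), do: {:error, {:unknown_strategy, other}}
end
```

Exactly one clause matches per deploy, which is the point the review comment raised: the two paths are alternatives, not consecutive steps.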

Comment thread CLAUDE.md
Comment on lines +256 to +258
- **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`

Copilot AI Mar 12, 2026


This section says terminal phases mean the Monitor “stops and records outcome”. In Nopea.Progressive.Monitor, record_outcome/1 runs on the polling path and timeout, but not on the manual promote/rollback call paths (they stop without recording). Either adjust the docs to reflect the current behavior or update the implementation so manual promote/rollback also records the outcome before stopping.

Suggested change
- **Terminal**: `:completed`, `:promoted`, `:failed` — Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)`
- **Terminal**: `:completed`, `:promoted`, `:failed` - on poll/timeout paths, Monitor stops and records outcome
- **Poll interval**: 10s, **Max duration**: 1 hour (timeout → `:failed`)
- **Manual control**: `Surface.promote(deploy_id)` / `Surface.rollback(deploy_id)` — stop Monitor without recording; outcome is handled by the caller

Collaborator Author


Good catch — fixed in 0d0961f. Added record_outcome(rollout) to both the promote and rollback handle_call paths before stopping. Memory and Cache now get updated on manual intervention, same as the poll/timeout paths.
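
The shape of that fix, recording the outcome before the GenServer stops on a manual call, can be illustrated with a small self-contained sketch. Everything here is assumed for illustration: `MonitorSketch`, the injected `recorder` function standing in for the Memory and Cache updates, and the mapping of a rollback to `:failed` are not taken from Nopea.Progressive.Monitor.

```elixir
defmodule MonitorSketch do
  @moduledoc "Hypothetical monitor: manual terminal paths record before stopping."
  use GenServer

  # `recorder` is a one-arity function standing in for the Memory/Cache update.
  def start_link(recorder), do: GenServer.start_link(__MODULE__, recorder)
  def promote(pid), do: GenServer.call(pid, :promote)
  def rollback(pid), do: GenServer.call(pid, :rollback)

  @impl true
  def init(recorder), do: {:ok, %{recorder: recorder, phase: :progressing}}

  @impl true
  def handle_call(:promote, _from, state), do: finish(:promoted, state)
  # Assumption: a manual rollback is recorded as :failed.
  def handle_call(:rollback, _from, state), do: finish(:failed, state)

  # Record the outcome first, then reply to the caller and stop normally.
  defp finish(phase, state) do
    state.recorder.(phase)
    {:stop, :normal, {:ok, phase}, %{state | phase: phase}}
  end
end
```

Routing both manual calls through one `finish/2` helper keeps the recording step from being skipped on any single path, which is the class of gap the review comment pointed out.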

yairfalse and others added 11 commits March 12, 2026 23:52
- Add record_outcome call to promote and rollback handle_call paths
  so Memory and Cache are updated on manual intervention
- Update CLAUDE.md pipeline diagram to show branching between direct
  and progressive strategy paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change Occurrence.build/2 to return {:ok, occ} | {:error, term()} instead
of raising on failure. Update deploy.ex caller and test files to handle
the new tuple return.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
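
As a rough picture of the raise-to-tuple change described in this commit, a minimal sketch follows. The module `OccurrenceSketch`, its argument shapes, and the `:invalid_result` reason are hypothetical; the real Occurrence.build/2 signature is not shown in this conversation.

```elixir
defmodule OccurrenceSketch do
  @moduledoc "Hypothetical stand-in for a build/2 that returns tagged tuples."

  # Valid input yields {:ok, occurrence}; anything else yields {:error, reason}.
  def build(%{service: service} = result, opts) when is_binary(service) and is_list(opts) do
    {:ok, %{service: service, status: Map.get(result, :status, :unknown), opts: opts}}
  end

  def build(_result, _opts), do: {:error, :invalid_result}

  # Caller side: match on the tuple instead of rescuing an exception.
  def describe(result) do
    case build(result, []) do
      {:ok, occ} -> "occurrence for #{occ.service}"
      {:error, reason} -> "skipped: #{inspect(reason)}"
    end
  end
end
```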
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement proper shutdown/exit/cancelled notification handling in MCP server.
Add System.trap_signal(:sigterm) in CLI serve command for graceful container
shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Read all cluster-related config keys from environment variables in the
prod block of runtime.exs: NOPEA_CLUSTER_STRATEGY, NOPEA_CLUSTER_SERVICE,
NOPEA_CLUSTER_APP_NAME, NOPEA_POD_NAMESPACE, NOPEA_CLUSTER_POLLING_INTERVAL,
NOPEA_CLUSTER_GOSSIP_PORT, NOPEA_CLUSTER_GOSSIP_SECRET, NOPEA_CLUSTER_HOSTS.
Update the configuration table in CLAUDE.md with all cluster keys.
Also fix corrupted literal \n in config/config.exs logger metadata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create Nopea.API.AuthPlug that checks x-api-key header against the
configured NOPEA_API_KEY. Skip auth for /health and /ready paths.
When no key is configured (nil), all requests pass through (dev mode).
Wire plug into router between Plug.Parsers and :match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
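
The decision rules in this commit message (open paths, dev mode when no key is set, otherwise compare the header) can be written as a small pure function. This sketch deliberately avoids the Plug dependency so it runs on its own; `AuthSketch.check/3` and its return values are hypothetical and do not reflect the actual Nopea.API.AuthPlug code.

```elixir
defmodule AuthSketch do
  @moduledoc "Hypothetical, Plug-free sketch of the API key decision."

  # Paths that skip authentication, as listed in the commit message.
  @open_paths ["/health", "/ready"]

  # Health and readiness probes always pass.
  def check(path, _provided, _configured) when path in @open_paths, do: :ok

  # No key configured (nil): dev mode, every request passes.
  def check(_path, _provided, nil), do: :ok

  # Key configured: the x-api-key header value must match it.
  def check(_path, provided, configured) when provided == configured, do: :ok

  def check(_path, _provided, _configured), do: {:error, :unauthorized}
end
```

Inside a real plug, a constant-time comparison such as `Plug.Crypto.secure_compare/2` would usually be preferable to `==`; the plain comparison here only keeps the sketch dependency-free.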
Add add_distributed_supervisor_child/2 helper to the application startup
pipeline, placed after the registry child and before the service_agent
child. When cluster_enabled is true, Nopea.DistributedSupervisor (Horde)
is added to the supervision tree for cross-node process distribution.

Also fix async test races in auth_plug_test and router_test by making
them non-async since they modify Application config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5 of 6 metrics defined in metrics.ex never fired because no code
emitted the corresponding telemetry events. Add emissions for:

- [:nopea, :deploy, :stop] via new emit_deploy_complete/2 in Metrics
- [:nopea, :memory, :query, :stop] with duration in Memory.get_deploy_context
- [:nopea, :verify, :drift] with count in Drift.verify_manifest
- [:nopea, :deploys, :active] with count in ServiceAgent.health

Also capture and thread the metrics start_time through deploy.ex so
emit_deploy_complete fires on both success and progressive delivery paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… state

Cache.put_last_applied/3 existed but was never called after applying
manifests, so the three-way diff in drift detection could never find
last-applied state and always returned :needs_apply or :new_resource.

- Add cache_applied_manifests/2 in deploy.ex that iterates applied
  manifests and calls Cache.put_last_applied/3 for each one
- Call it in both the direct strategy success path (before verify) and
  the canary/blue_green progressing path
- Add public Drift.resource_key/1 delegating to Applier.resource_key/1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…deploy logic

The canary and blue_green strategy logic was inlined in Deploy.execute_strategy/2.
Extract into dedicated modules implementing the Strategy behaviour, matching the
existing Strategy.Direct pattern. Update the behaviour callback to include the
{:ok, {applied, :progressing}} return type for progressive strategies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add assert_eventually polling macro in test/support/test_helpers.ex and
replace all Process.sleep calls with deterministic alternatives:
- Memory barrier (node_count/0 sync call) for async cast flushes
- assert_eventually for supervisor restart and state convergence checks
- assert_receive with message signaling for deploy start coordination

Leaves distributed_registry_test.exs unchanged (intentional polling sleeps).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
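
A polling assertion of the kind this commit describes might look like the sketch below. It is an assumption-laden illustration, not the macro added in test/support/test_helpers.ex: the module name, the default timeout and interval, and the error message are all hypothetical.

```elixir
defmodule EventuallySketch do
  @moduledoc "Hypothetical polling helper in the spirit of assert_eventually."

  # Re-evaluates `expr` every `interval` ms until it is truthy or `timeout` ms pass.
  defmacro assert_eventually(expr, timeout \\ 1_000, interval \\ 10) do
    quote do
      EventuallySketch.poll(fn -> unquote(expr) end, unquote(timeout), unquote(interval))
    end
  end

  # Function form, usable without `require`.
  def poll(fun, timeout, interval) when is_function(fun, 0) do
    deadline = System.monotonic_time(:millisecond) + timeout
    do_poll(fun, deadline, interval)
  end

  defp do_poll(fun, deadline, interval) do
    cond do
      fun.() ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        raise "condition not met before timeout"

      true ->
        # Short bounded wait between attempts instead of one long fixed sleep.
        Process.sleep(interval)
        do_poll(fun, deadline, interval)
    end
  end
end
```

In a test this would be used as `require EventuallySketch` followed by `EventuallySketch.assert_eventually(Process.alive?(pid))`, so the test finishes as soon as the condition holds and fails with a clear error if it never does.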
@yairfalse yairfalse merged commit 42f0c6a into main Mar 18, 2026
0 of 2 checks passed
@yairfalse yairfalse deleted the feat/surface-persistence-progressive branch March 18, 2026 23:38

2 participants