fix: address all review findings — wire memory strategy, fix dead code, add tests#34
Conversation
The core "deployment tool with memory" feature was dead code — select_strategy/2 always returned :direct when strategy was nil, ignoring the memory context entirely. Add a new clause that checks known services for failure patterns above the configurable canary_threshold (default 0.15). Services with recurring failures now auto-select canary strategy for safer rollouts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the overbroad rescue in verify_deploy/2 that silently converted programming errors into "unverified" status. Pass k8s_module through to Drift.verify_manifest so verification uses the configured K8s module. Tests now stub get_resource properly instead of relying on the rescue to mask connection failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SYKLI target called Deploy.run() directly, bypassing the ServiceAgent's queue/serialization/backpressure. Change to Deploy.deploy() which routes through ServiceAgent when available. Add test verifying ServiceAgent state is updated after SYKLI deploy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tests Replace flaky Process.sleep barriers in distributed_registry_test with assert_eventually polling helper. Horde.Registry operations are async and can't be reliably synced with fixed sleep durations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove rollback from moduledoc (not implemented) - Add require Logger, replace IO.puts with Logger - Fix serve command to use Application.ensure_all_started - Route deploy through Deploy.deploy instead of Deploy.run Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- nopea_deploy: missing service, empty service errors - nopea_history: cache unavailable, known/unknown service - nopea_explain: memory unavailable, unknown service, failure patterns - nopea_context: known service with memory - tools/list: verify all 5 tools listed Add mcp_with_memory_test.exs (async: false) for tests needing Memory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add structured logging metadata to default formatter config. Fixes credo warnings about metadata keys not in Logger config and ensures service/deploy context appears in log output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Query Nopea.Registry for currently-deploying services instead of hardcoding empty list in record_outcome - Fix ingestor to create :deployed_together edges (not just orphan nodes) so the graph captures temporal deployment correlations - Add :deployed_together to valid relation types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OTP tree: Replace Deploy.Supervisor with ServiceAgent.Supervisor, add Cluster, fix Registry description - KEY MODULES: Remove deleted Worker/Supervisor/Canary/BlueGreen, add ServiceAgent, Kulta.RolloutBuilder, Graph modules - FILE LOCATIONS: Remove deleted files, add new ones - DEPENDENCIES: Replace kerto (inlined) with false_protocol - MCP SERVER: Add nopea_health tool - STRATEGY AUTO-SELECTION: Update to actual memory-aware implementation - VERIFICATION: Update test count to 280 - MEMORY: Fix graph stats API, update relationship types - K8s Mock Pattern: Document get_resource stub requirement Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes and completes the “deploy tool with memory” integration by routing deploys through per-service agents, using memory context to auto-select strategies, improving logging, and expanding/strengthening test coverage.
Changes:
- Route deploy execution through
ServiceAgent(Deploy.deploy/1) and update SYKLI/CLI call sites accordingly. - Add memory-aware strategy auto-selection and record concurrent deploy relationships into the memory graph.
- Add/extend tests for MCP tools, strategy selection, SYKLI routing, and reduce flakiness in distributed registry tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
lib/nopea/sykli/target.ex |
Switch SYKLI deploy execution to Deploy.deploy/1 to route through ServiceAgent. |
lib/nopea/memory/ingestor.ex |
Ingest concurrent_deploys into the graph via :deployed_together relationships. |
lib/nopea/graph/relation_type.ex |
Register the new :deployed_together relationship type. |
lib/nopea/deploy.ex |
Add memory-based strategy selection, pass k8s_module into drift verification, record concurrent deploys. |
lib/nopea/cli.ex |
Update CLI deploy entry point and rework serve behavior/logging. |
config/config.exs |
Configure Logger default formatter/metadata for structured deploy context. |
CLAUDE.md |
Update repo documentation to reflect current architecture and conventions. |
test/nopea/sykli/target_test.exs |
Add coverage ensuring SYKLI routing goes through ServiceAgent. |
test/nopea/memory/ingestor_test.exs |
Add test coverage for :deployed_together edges creation. |
test/nopea/mcp_with_memory_test.exs |
Add MCP tool tests covering memory-backed explain/context/history behavior. |
test/nopea/mcp_test.exs |
Tighten tools/list assertions and add error-path coverage for MCP tools. |
test/nopea/distributed_registry_test.exs |
Replace sleep-based barriers with polling assertions to reduce flakiness. |
test/nopea/deploy_test.exs |
Add strategy-selection tests (memory-aware) and crash-propagation test for verification. |
test/nopea/deploy_integration_test.exs |
Stub K8s get_resource for cluster-less test environment. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| defp get_concurrent_services(current_service) do | ||
| if Process.whereis(Nopea.Registry) do | ||
| Registry.select(Nopea.Registry, [ | ||
| {{:"$1", :"$2", :_}, [], [{{:"$1", :"$2"}}]} | ||
| ]) | ||
| |> Enum.filter(fn | ||
| {{:service, name}, _pid} when name != current_service -> true | ||
| _ -> false | ||
| end) | ||
| |> Enum.map(fn {{:service, name}, _pid} -> name end) | ||
| else |
There was a problem hiding this comment.
get_concurrent_services/1 currently returns all services registered in Nopea.Registry (i.e., all ServiceAgents), not the services that are actually deploying concurrently. This will over-report concurrent_deploys and can create misleading :deployed_together edges. Consider filtering to agents with status == :deploying (e.g., via Nopea.ServiceAgent.status/1/a registry value) or tracking deploying services separately.
| # Give casts time to process | ||
| Process.sleep(50) | ||
|
|
There was a problem hiding this comment.
Using Process.sleep/1 as a synchronization barrier for the async Nopea.Memory.record_deploy/1 cast can make this test flaky under load/CI. Prefer a deterministic barrier (e.g., issue any GenServer.call to Nopea.Memory after the casts, since mailbox order guarantees the prior casts have been processed) instead of sleeping.
| # Give casts time to process | |
| Process.sleep(50) |
| Process.sleep(50) | ||
|
|
There was a problem hiding this comment.
This Process.sleep/1 is acting as a timing-based sync barrier for memory ingestion. To avoid flakiness, replace it with a deterministic barrier (e.g., a GenServer.call to Nopea.Memory after the cast(s), or an assert_eventually helper) so the test doesn't depend on scheduler timing.
| Process.sleep(50) | ||
|
|
There was a problem hiding this comment.
Timing-based Process.sleep/1 here can introduce intermittent failures on slower machines. Since record_deploy/1 is a cast, consider using a deterministic synchronization method (e.g., follow the cast with a GenServer.call to Nopea.Memory) instead of sleeping.
| }) | ||
| end | ||
|
|
||
| Process.sleep(50) |
There was a problem hiding this comment.
This test relies on Process.sleep/1 to wait for async Nopea.Memory.record_deploy/1 processing, which can be flaky. Prefer a deterministic barrier (for example, call Nopea.Memory.node_count/0 or another GenServer.call after the casts) to ensure ingestion has completed before asserting.
| Process.sleep(50) | |
| # Use a deterministic GenServer.call barrier instead of sleeping | |
| _ = Nopea.Memory.node_count() |
| concurrent_deploys: [] | ||
| }) | ||
|
|
||
| Process.sleep(50) |
There was a problem hiding this comment.
Same concern as above: Process.sleep/1 is used as a sync barrier for an async cast. Replacing it with a deterministic GenServer.call barrier (or a polling helper) will make this test more reliable.
| Process.sleep(50) | |
| Nopea.Memory | |
| |> Process.whereis() | |
| |> :sys.get_state() |
| Logger.info("Starting Nopea daemon...") | ||
| Application.put_env(:nopea, :enable_api, true) | ||
|
|
||
| case Supervisor.start_child(Nopea.AppSupervisor, Nopea.API.Router) do | ||
| {:ok, _pid} -> | ||
| case Application.ensure_all_started(:nopea) do | ||
| {:ok, _apps} -> |
There was a problem hiding this comment.
serve/1 sets :enable_api, but the application supervision tree gates the HTTP server on :enable_router (see Nopea.Application.add_router_child/1). As written, nopea serve may never start Nopea.API.Router unless enable_router is already true in config. Use the same config key here (or update the application to read :enable_api consistently).
…t barriers, correct config key - get_concurrent_services: filter to agents with status == :deploying instead of returning all registered services (avoids misleading :deployed_together edges) - Replace Process.sleep(50) sync barriers with Memory.node_count() call — BEAM mailbox ordering guarantees prior casts complete first - CLI serve: use :enable_router (matching Application) not :enable_api Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses Credo warning about inefficient filter chain in get_concurrent_services/1 — single pass instead of three. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove large module/file tables (discoverable via ls). Add config feature flags, ServiceAgent constraints, mailbox flush test pattern, single-test-by-line command, test factory paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
:canaryverify_deployrescue — let OTP handle crashes instead of masking programming errors as "unverified"ServiceAgentinstead of bypassing queue/serializationservecommand, useLoggerinstead ofIO.putsProcess.sleepsync barriers withassert_eventuallypollingconcurrent_deploys— queryNopea.Registryfor currently-deploying services, create:deployed_togethergraph edgesCommits
feat: wire memory-based strategy auto-selectionfix: remove verify_deploy rescue, pass k8s_module to Driftfix: route SYKLI target through ServiceAgentfix: clean up CLI — remove rollback vaporware, fix serve commandtest: add missing MCP tool testsfix: replace Process.sleep sync barriers with polling in distributed testschore: configure Logger metadata keysfeat: populate concurrent_deploys from ServiceAgent registrydocs: update CLAUDE.md to match current codebaseTest plan
mix compile --warnings-as-errorspassesmix testpasses (280 tests, 0 failures)mix format --check-formattedpassesmix credopassesmix escript.buildsucceedsstrategy: nil→ auto-selects:canaryverify_deploycrash propagation on malformed manifestsnopea servedoesn't crash🤖 Generated with Claude Code