Skip to content

fix: address all review findings — wire memory strategy, fix dead code, add tests#34

Merged
yairfalse merged 12 commits into
mainfrom
fix/review-findings-v2
Mar 5, 2026
Merged

fix: address all review findings — wire memory strategy, fix dead code, add tests#34
yairfalse merged 12 commits into
mainfrom
fix/review-findings-v2

Conversation

@yairfalse
Copy link
Copy Markdown
Collaborator

Summary

  • Wire memory-based strategy auto-selection — the core "deployment tool with memory" feature now actually uses memory for decisions. Services with failure patterns above threshold auto-select :canary
  • Remove verify_deploy rescue — let OTP handle crashes instead of masking programming errors as "unverified"
  • Fix SYKLI target routing — route through ServiceAgent instead of bypassing queue/serialization
  • Fix CLI — remove rollback vaporware, fix serve command, use Logger instead of IO.puts
  • Add missing MCP tool tests — cover all 5 tools including error paths
  • Fix flaky distributed registry tests — replace Process.sleep sync barriers with assert_eventually polling
  • Configure Logger metadata — structured logging with deploy context (service, deploy_id, namespace, etc.)
  • Populate concurrent_deploys — query Nopea.Registry for currently-deploying services, create :deployed_together graph edges
  • Update CLAUDE.md — match current codebase reality (modules, deps, supervision tree, API)

Commits

  1. feat: wire memory-based strategy auto-selection
  2. fix: remove verify_deploy rescue, pass k8s_module to Drift
  3. fix: route SYKLI target through ServiceAgent
  4. fix: clean up CLI — remove rollback vaporware, fix serve command
  5. test: add missing MCP tool tests
  6. fix: replace Process.sleep sync barriers with polling in distributed tests
  7. chore: configure Logger metadata keys
  8. feat: populate concurrent_deploys from ServiceAgent registry
  9. docs: update CLAUDE.md to match current codebase

Test plan

  • mix compile --warnings-as-errors passes
  • mix test passes (280 tests, 0 failures)
  • mix format --check-formatted passes
  • mix credo passes
  • mix escript.build succeeds
  • Verify memory-based strategy: service with 2+ failures and strategy: nil → auto-selects :canary
  • Verify verify_deploy crash propagation on malformed manifests
  • Verify SYKLI deploys update ServiceAgent state
  • Verify nopea serve doesn't crash

🤖 Generated with Claude Code

yairfalse and others added 9 commits March 3, 2026 23:11
The core "deployment tool with memory" feature was dead code — select_strategy/2
always returned :direct when strategy was nil, ignoring the memory context entirely.

Add a new clause that checks known services for failure patterns above the
configurable canary_threshold (default 0.15). Services with recurring failures
now auto-select canary strategy for safer rollouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the overbroad rescue in verify_deploy/2 that silently converted
programming errors into "unverified" status. Pass k8s_module through to
Drift.verify_manifest so verification uses the configured K8s module.

Tests now stub get_resource properly instead of relying on the rescue to
mask connection failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SYKLI target called Deploy.run() directly, bypassing the ServiceAgent's
queue/serialization/backpressure. Change to Deploy.deploy() which routes
through ServiceAgent when available.

Add test verifying ServiceAgent state is updated after SYKLI deploy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tests

Replace flaky Process.sleep barriers in distributed_registry_test with
assert_eventually polling helper. Horde.Registry operations are async
and can't be reliably synced with fixed sleep durations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove rollback from moduledoc (not implemented)
- Add require Logger, replace IO.puts with Logger
- Fix serve command to use Application.ensure_all_started
- Route deploy through Deploy.deploy instead of Deploy.run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- nopea_deploy: missing service, empty service errors
- nopea_history: cache unavailable, known/unknown service
- nopea_explain: memory unavailable, unknown service, failure patterns
- nopea_context: known service with memory
- tools/list: verify all 5 tools listed

Add mcp_with_memory_test.exs (async: false) for tests needing Memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add structured logging metadata to default formatter config. Fixes
credo warnings about metadata keys not in Logger config and ensures
service/deploy context appears in log output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Query Nopea.Registry for currently-deploying services instead of
  hardcoding empty list in record_outcome
- Fix ingestor to create :deployed_together edges (not just orphan nodes)
  so the graph captures temporal deployment correlations
- Add :deployed_together to valid relation types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- OTP tree: Replace Deploy.Supervisor with ServiceAgent.Supervisor, add
  Cluster, fix Registry description
- KEY MODULES: Remove deleted Worker/Supervisor/Canary/BlueGreen, add
  ServiceAgent, Kulta.RolloutBuilder, Graph modules
- FILE LOCATIONS: Remove deleted files, add new ones
- DEPENDENCIES: Replace kerto (inlined) with false_protocol
- MCP SERVER: Add nopea_health tool
- STRATEGY AUTO-SELECTION: Update to actual memory-aware implementation
- VERIFICATION: Update test count to 280
- MEMORY: Fix graph stats API, update relationship types
- K8s Mock Pattern: Document get_resource stub requirement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes and completes the “deploy tool with memory” integration by routing deploys through per-service agents, using memory context to auto-select strategies, improving logging, and expanding/strengthening test coverage.

Changes:

  • Route deploy execution through ServiceAgent (Deploy.deploy/1) and update SYKLI/CLI call sites accordingly.
  • Add memory-aware strategy auto-selection and record concurrent deploy relationships into the memory graph.
  • Add/extend tests for MCP tools, strategy selection, SYKLI routing, and reduce flakiness in distributed registry tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
lib/nopea/sykli/target.ex Switch SYKLI deploy execution to Deploy.deploy/1 to route through ServiceAgent.
lib/nopea/memory/ingestor.ex Ingest concurrent_deploys into the graph via :deployed_together relationships.
lib/nopea/graph/relation_type.ex Register the new :deployed_together relationship type.
lib/nopea/deploy.ex Add memory-based strategy selection, pass k8s_module into drift verification, record concurrent deploys.
lib/nopea/cli.ex Update CLI deploy entry point and rework serve behavior/logging.
config/config.exs Configure Logger default formatter/metadata for structured deploy context.
CLAUDE.md Update repo documentation to reflect current architecture and conventions.
test/nopea/sykli/target_test.exs Add coverage ensuring SYKLI routing goes through ServiceAgent.
test/nopea/memory/ingestor_test.exs Add test coverage for :deployed_together edges creation.
test/nopea/mcp_with_memory_test.exs Add MCP tool tests covering memory-backed explain/context/history behavior.
test/nopea/mcp_test.exs Tighten tools/list assertions and add error-path coverage for MCP tools.
test/nopea/distributed_registry_test.exs Replace sleep-based barriers with polling assertions to reduce flakiness.
test/nopea/deploy_test.exs Add strategy-selection tests (memory-aware) and crash-propagation test for verification.
test/nopea/deploy_integration_test.exs Stub K8s get_resource for cluster-less test environment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/nopea/deploy.ex
Comment on lines +411 to +421
defp get_concurrent_services(current_service) do
if Process.whereis(Nopea.Registry) do
Registry.select(Nopea.Registry, [
{{:"$1", :"$2", :_}, [], [{{:"$1", :"$2"}}]}
])
|> Enum.filter(fn
{{:service, name}, _pid} when name != current_service -> true
_ -> false
end)
|> Enum.map(fn {{:service, name}, _pid} -> name end)
else
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_concurrent_services/1 currently returns all services registered in Nopea.Registry (i.e., all ServiceAgents), not the services that are actually deploying concurrently. This will over-report concurrent_deploys and can create misleading :deployed_together edges. Consider filtering to agents with status == :deploying (e.g., via Nopea.ServiceAgent.status/1/a registry value) or tracking deploying services separately.

Copilot uses AI. Check for mistakes.
Comment thread test/nopea/deploy_test.exs Outdated
Comment on lines +127 to +129
# Give casts time to process
Process.sleep(50)

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Process.sleep/1 as a synchronization barrier for the async Nopea.Memory.record_deploy/1 cast can make this test flaky under load/CI. Prefer a deterministic barrier (e.g., issue any GenServer.call to Nopea.Memory after the casts, since mailbox order guarantees the prior casts have been processed) instead of sleeping.

Suggested change
# Give casts time to process
Process.sleep(50)

Copilot uses AI. Check for mistakes.
Comment thread test/nopea/deploy_test.exs Outdated
Comment on lines +165 to +166
Process.sleep(50)

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Process.sleep/1 is acting as a timing-based sync barrier for memory ingestion. To avoid flakiness, replace it with a deterministic barrier (e.g., a GenServer.call to Nopea.Memory after the cast(s), or an assert_eventually helper) so the test doesn't depend on scheduler timing.

Copilot uses AI. Check for mistakes.
Comment thread test/nopea/deploy_test.exs Outdated
Comment on lines +194 to +195
Process.sleep(50)

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timing-based Process.sleep/1 here can introduce intermittent failures on slower machines. Since record_deploy/1 is a cast, consider using a deterministic synchronization method (e.g., follow the cast with a GenServer.call to Nopea.Memory) instead of sleeping.

Copilot uses AI. Check for mistakes.
Comment thread test/nopea/mcp_with_memory_test.exs Outdated
})
end

Process.sleep(50)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test relies on Process.sleep/1 to wait for async Nopea.Memory.record_deploy/1 processing, which can be flaky. Prefer a deterministic barrier (for example, call Nopea.Memory.node_count/0 or another GenServer.call after the casts) to ensure ingestion has completed before asserting.

Suggested change
Process.sleep(50)
# Use a deterministic GenServer.call barrier instead of sleeping
_ = Nopea.Memory.node_count()

Copilot uses AI. Check for mistakes.
Comment thread test/nopea/mcp_with_memory_test.exs Outdated
concurrent_deploys: []
})

Process.sleep(50)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as above: Process.sleep/1 is used as a sync barrier for an async cast. Replacing it with a deterministic GenServer.call barrier (or a polling helper) will make this test more reliable.

Suggested change
Process.sleep(50)
Nopea.Memory
|> Process.whereis()
|> :sys.get_state()

Copilot uses AI. Check for mistakes.
Comment thread lib/nopea/cli.ex
Comment on lines +102 to +106
Logger.info("Starting Nopea daemon...")
Application.put_env(:nopea, :enable_api, true)

case Supervisor.start_child(Nopea.AppSupervisor, Nopea.API.Router) do
{:ok, _pid} ->
case Application.ensure_all_started(:nopea) do
{:ok, _apps} ->
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serve/1 sets :enable_api, but the application supervision tree gates the HTTP server on :enable_router (see Nopea.Application.add_router_child/1). As written, nopea serve may never start Nopea.API.Router unless enable_router is already true in config. Use the same config key here (or update the application to read :enable_api consistently).

Copilot uses AI. Check for mistakes.
yairfalse and others added 3 commits March 5, 2026 00:06
…t barriers, correct config key

- get_concurrent_services: filter to agents with status == :deploying
  instead of returning all registered services (avoids misleading
  :deployed_together edges)
- Replace Process.sleep(50) sync barriers with Memory.node_count()
  call — BEAM mailbox ordering guarantees prior casts complete first
- CLI serve: use :enable_router (matching Application) not :enable_api

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses Credo warning about inefficient filter chain in
get_concurrent_services/1 — single pass instead of three.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove large module/file tables (discoverable via ls). Add config
feature flags, ServiceAgent constraints, mailbox flush test pattern,
single-test-by-line command, test factory paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yairfalse yairfalse merged commit 03eb41f into main Mar 5, 2026
0 of 2 checks passed
@yairfalse yairfalse deleted the fix/review-findings-v2 branch March 5, 2026 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants