Fix expert status not resetting to pending on exit+relaunch flows #64
Conversation
Four tower methods (reset_expert, return_expert_from_worktree,
launch_expert_in_worktree, change_expert_role) were missing
set_marker("pending") calls, causing stale "processing" status
during expert restarts.
Extract shared helpers (exit_expert_and_set_pending,
prepare_expert_files_with_role, PreparedExpertFiles) into
commands/common.rs and refactor all call sites to use them,
eliminating ~120 lines of duplicated prepare-files logic.
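The fix described above can be illustrated with a minimal sketch. The types and names below (`StatusDetector`, `ExpertStatus`) are assumptions for illustration, not the repo's actual API; the real helper also sends an exit command to the pane and waits before resetting the marker.

```rust
// Minimal sketch of the shared exit helper pattern (types assumed,
// not the repo's actual API): every exit path ends in Pending, so a
// relaunch can never observe a stale Processing marker.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ExpertStatus {
    Pending,
    Processing,
}

struct StatusDetector {
    markers: Vec<ExpertStatus>,
}

impl StatusDetector {
    fn set_marker(&mut self, expert_id: usize, status: ExpertStatus) {
        self.markers[expert_id] = status;
    }
}

// Shared helper used by reset, relaunch, and role-change flows, so the
// marker update cannot be forgotten at any individual call site.
fn exit_expert_and_set_pending(detector: &mut StatusDetector, expert_id: usize) {
    // (the real code sends /exit to the pane and sleeps ~3s here)
    detector.set_marker(expert_id, ExpertStatus::Pending);
}

fn main() {
    let mut detector = StatusDetector {
        markers: vec![ExpertStatus::Processing; 4],
    };
    exit_expert_and_set_pending(&mut detector, 2);
    assert_eq!(detector.markers[2], ExpertStatus::Pending);
    println!("expert 2 -> Pending");
}
```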
Pull request overview
Fixes stale expert “processing” state during exit+relaunch flows by centralizing the exit→pending marker update and by refactoring duplicated instruction/agents/settings file preparation into shared helpers.
Changes:
- Adds shared helpers in `commands/common.rs` for (1) exiting an expert and setting status to `"pending"` and (2) preparing expert instruction/agents/settings files with an explicit role/worktree.
- Refactors Tower UI flows and CLI `reset` to use the new shared helpers, ensuring `"pending"` is set during restart transitions.
- Adds `.macot/specs/*` design/task docs for an "expert row execution status" UI change.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/tower/app.rs` | Uses shared helpers to exit experts + set `"pending"` and to prepare instruction/agents/settings files across Tower flows. |
| `src/commands/reset.rs` | Refactors CLI reset to call the shared exit+pending and prepare-files helpers. |
| `src/commands/common.rs` | Introduces `PreparedExpertFiles`, `prepare_expert_files_with_role`, and `exit_expert_and_set_pending`; adds a unit test for role override behavior. |
| `.macot/specs/expert-row-execution-status-tasks.md` | Adds implementation plan doc for a separate UI execution-status feature. |
| `.macot/specs/expert-row-execution-status-design.md` | Adds design doc for the same separate UI execution-status feature. |
```rust
let expert_name = config.get_expert_name(expert_id);
let manifest_path = config.queue_path.join("experts_manifest.json");
let manifest_path_str = manifest_path.to_string_lossy();
let status_dir = config.queue_path.join("status");
let status_dir_str = status_dir.to_string_lossy();
```
`prepare_expert_files_with_role` no longer validates that `expert_id` exists in the config. Previously `prepare_expert_files` returned an error when the expert was missing; now this will silently fall back to `expert{expert_id}` and may write instruction/settings files for a non-existent expert. Consider restoring the bounds check (e.g., `config.get_expert(expert_id).context(...)` / `ensure!(expert_id < config.num_experts())`) so invalid IDs fail fast.
Suggested change:

```rust
// Ensure the requested expert exists in the configuration so we don't
// silently generate files for a non-existent expert ID.
let _ = config
    .get_expert(expert_id)
    .with_context(|| format!("No expert configured with id {}", expert_id))?;
let expert_name = config.get_expert_name(expert_id);
let manifest_path = config.queue_path.join("experts_manifest.json");
let manifest_path_str = manifest_path.to_string_lossy();
let status_dir = config.queue_path.join("status");
let status_dir_str = status_dir.to_string_lossy();
```
```rust
tokio::time::sleep(std::time::Duration::from_secs(3)).await;
detector
    .set_marker(expert_id, "pending")
    .context("Failed to reset expert status")?;
```
The error context message here is specific to "reset", but `exit_expert_and_set_pending` is used for several flows (role change, return from worktree, etc.). Consider making the context string generic (e.g., "Failed to set expert status to pending") so failures are clearer when surfaced from non-reset code paths.
Suggested change:

```rust
.context("Failed to set expert status to pending")?;
```
```markdown
# Implementation Plan: Expert Row Execution Status

## Overview

Move the execution indicator from the Experts panel title badge to an inline two-line row for the executing expert. The plan builds bottom-up: data model first, then producer (FeatureExecutor), then consumer (StatusDisplay), then integration (UI::render). Each implementation step is paired with tests and old badge code is removed incrementally.
```
These new spec documents describe an “Expert Row Execution Status” UI change, but the PR title/description and code changes are about expert pending/processing status resets and refactoring instruction-prep helpers. If these specs are intentional in this PR, please update the PR description to reflect that additional scope; otherwise consider moving them to a separate PR to keep the change focused.
```rust
#[test]
fn prepare_expert_files_with_role_uses_provided_role() {
    let config =
        Config::default().with_project_path(PathBuf::from("/tmp/macot-test-common-role"));

    // Create required directories
    std::fs::create_dir_all(config.queue_path.join("system_prompt")).ok();
    std::fs::create_dir_all(config.queue_path.join("status")).ok();

    // Use "general" role explicitly instead of the default "architect"
    let prepared = prepare_expert_files_with_role(&config, 0, "general", None).unwrap();

    let content = std::fs::read_to_string(prepared.instruction_file.unwrap()).unwrap();
    // "general" role loads Quality Principles, not "Expert Instructions: Architect"
    assert!(
        !content.contains("Expert Instructions: Architect"),
        "prepare_expert_files_with_role: should use 'general' role, not default 'architect'"
    );
    assert!(
        content.contains("Quality Principles"),
        "prepare_expert_files_with_role: should contain general role content"
    );

    // Clean up
    std::fs::remove_dir_all(&config.queue_path).ok();
}
```
The new unit test uses a fixed `/tmp/macot-test-common-role` path. This can be flaky under parallel test runs or on systems without `/tmp` (and it can leave residue if the test fails before cleanup). Consider using `tempfile::TempDir` (as many other tests in the repo do) and pointing `with_project_path` at that temp dir instead.
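The isolation idea behind that suggestion can be sketched without the external crate. This std-only variant (the `unique_test_dir` helper is hypothetical, not the repo's code) shows how a per-process unique directory avoids collisions between parallel test runs; `tempfile::TempDir` additionally removes the directory automatically on drop.

```rust
use std::{env, fs, path::PathBuf, process};

// Build a per-process unique directory under the system temp root, so
// parallel test runs cannot collide on a shared hardcoded path.
// (hypothetical helper for illustration; tempfile::TempDir is preferable
// in practice because it also cleans up on drop, even on panic)
fn unique_test_dir(prefix: &str) -> PathBuf {
    env::temp_dir().join(format!("{}-{}", prefix, process::id()))
}

fn main() {
    let dir = unique_test_dir("macot-test-common-role");
    fs::create_dir_all(dir.join("status")).unwrap();
    assert!(dir.join("status").is_dir());
    // TempDir would do this automatically on drop; clean up by hand here.
    fs::remove_dir_all(&dir).unwrap();
    assert!(!dir.exists());
    println!("isolated test dir ok");
}
```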
Sub-agents were specifying `expert_name`, which caused permanent delivery failures due to name mismatches. Restrict recipient targeting to `expert_id` and role only.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/queue/router.rs:163
`process_queue` now increments `Message.delivery_attempts`, but it can drift from `QueuedMessage.attempts` (e.g., when a message is enqueued with non-zero `delivery_attempts`). This makes the removal log message inaccurate and can leave the two counters inconsistent. Consider making one counter the source of truth (or explicitly syncing them after `mark_delivery_attempt`, and logging the same value you use for max-attempt checks).
```rust
// Update delivery attempts and status
let mut updated_message = queued_message.clone();
updated_message.mark_delivery_attempt();
updated_message.message.increment_delivery_attempts();
if let Some(error) = &result.error {
    updated_message.mark_failed(error.clone());
}
// Check if message should be removed due to max attempts
if updated_message.message.has_exceeded_max_attempts() {
    warn!(
        "Removing message {} after {} failed delivery attempts",
        result.message_id, updated_message.attempts
    );
```
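One way to address the drift the reviewer describes is to make the outer counter authoritative and derive the inner one from it. The sketch below uses simplified stand-in types (not the repo's actual `Message`/`QueuedMessage` definitions) to show the syncing pattern.

```rust
// Sketch (types assumed, not the repo's API): QueuedMessage.attempts is the
// single source of truth; Message.delivery_attempts is synced from it rather
// than incremented independently, so the two can never drift.
struct Message {
    delivery_attempts: u32,
}

struct QueuedMessage {
    attempts: u32,
    message: Message,
}

const MAX_ATTEMPTS: u32 = 3;

impl QueuedMessage {
    fn mark_delivery_attempt(&mut self) {
        self.attempts += 1;
        // Sync the inner counter from the authoritative one.
        self.message.delivery_attempts = self.attempts;
    }

    // The max-attempt check and the removal log both read `attempts`,
    // so they always report the same value.
    fn has_exceeded_max_attempts(&self) -> bool {
        self.attempts >= MAX_ATTEMPTS
    }
}

fn main() {
    // Simulate a message enqueued with a non-zero inner counter.
    let mut qm = QueuedMessage {
        attempts: 0,
        message: Message { delivery_attempts: 2 },
    };
    qm.mark_delivery_attempt();
    // Both counters now agree after the first attempt.
    assert_eq!(qm.attempts, qm.message.delivery_attempts);
    assert!(!qm.has_exceeded_max_attempts());
    println!("attempts = {}", qm.attempts);
}
```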
```rust
/// Target for message delivery
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(untagged)]
pub enum MessageRecipient {
    /// Send to specific expert by ID
    ExpertId { expert_id: u32 },
    /// Send to specific expert by name
    ExpertName { expert_name: String },
    /// Send to any idle expert with this role
    Role { role: String },
}
```
This PR removes MessageRecipient::ExpertName support (model + router + UI + templates) and adds tests that explicitly reject expert_name YAML, but the PR title/description focus on expert status reset flows and do not mention this API/behavior change. Please either (a) document/justify the breaking change in the PR description and ensure it’s intended, or (b) keep backward compatibility (e.g., continue accepting expert_name in YAML or provide a migration/cleanup path for existing queued/outbox messages).
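If option (b) is chosen, one backward-compatibility approach is to keep accepting a legacy `expert_name` field at ingestion and resolve it to an ID there. The sketch below uses simplified stand-in types and a free-standing `resolve_recipient` helper (both hypothetical, not the repo's API) to show the idea.

```rust
// Sketch of a backward-compatibility shim (names assumed from the diff):
// accept a legacy `expert_name` recipient and resolve it to an ExpertId
// at ingestion time, so old queued/outbox messages keep working.
#[derive(Debug, PartialEq)]
enum MessageRecipient {
    ExpertId { expert_id: u32 },
    Role { role: String },
}

// Case-insensitive name lookup, standing in for Config::find_by_name.
fn find_by_name(experts: &[(u32, &str)], name: &str) -> Option<u32> {
    experts
        .iter()
        .find(|(_, n)| n.eq_ignore_ascii_case(name))
        .map(|(id, _)| *id)
}

// Resolve the raw recipient fields as they might appear in legacy YAML.
fn resolve_recipient(
    experts: &[(u32, &str)],
    expert_name: Option<&str>,
    expert_id: Option<u32>,
    role: Option<&str>,
) -> Option<MessageRecipient> {
    match (expert_id, expert_name, role) {
        (Some(id), _, _) => Some(MessageRecipient::ExpertId { expert_id: id }),
        // Legacy path: translate a name into an id instead of rejecting it.
        (None, Some(name), _) => {
            find_by_name(experts, name).map(|id| MessageRecipient::ExpertId { expert_id: id })
        }
        (None, None, Some(r)) => Some(MessageRecipient::Role { role: r.to_string() }),
        _ => None,
    }
}

fn main() {
    let experts = [(0, "Alice"), (1, "Bob")];
    // A legacy message targeting "bob" by name still reaches expert 1.
    let resolved = resolve_recipient(&experts, Some("bob"), None, None);
    assert_eq!(resolved, Some(MessageRecipient::ExpertId { expert_id: 1 }));
    println!("resolved legacy name to id 1");
}
```

An unknown name resolves to `None`, which a caller could surface as a hard error instead of retrying delivery forever, which is the failure mode the PR is fixing.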
```rust
/// Find expert ID by name (case-insensitive)
#[allow(dead_code)]
pub fn find_by_name(&self, name: &str) -> Option<ExpertId> {
    // First try exact match
```
find_by_name is now annotated with #[allow(dead_code)], which suggests it’s unused in non-test builds after removing name-based message targeting. If it’s only needed for tests/internal invariants, consider making it pub(crate) and/or gating it under #[cfg(test)] instead of keeping an unused public API surface (and suppressing the lint).
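The visibility narrowing the reviewer suggests can be sketched as follows; the `registry` module and `Registry` type are stand-ins for illustration, not the repo's structure.

```rust
// Sketch: narrow the helper to crate-internal visibility instead of
// suppressing the dead_code lint on a public API surface.
mod registry {
    pub struct ExpertId(pub u32);

    pub struct Registry {
        names: Vec<String>,
    }

    impl Registry {
        pub fn new(names: Vec<String>) -> Self {
            Self { names }
        }

        // pub(crate) keeps this callable from tests and internal code
        // while removing it from the public API; if it were test-only,
        // #[cfg(test)] would drop it from release builds entirely.
        pub(crate) fn find_by_name(&self, name: &str) -> Option<ExpertId> {
            self.names
                .iter()
                .position(|n| n.eq_ignore_ascii_case(name))
                .map(|i| ExpertId(i as u32))
        }
    }
}

fn main() {
    let r = registry::Registry::new(vec!["Alice".into(), "Bob".into()]);
    assert_eq!(r.find_by_name("BOB").map(|id| id.0), Some(1));
    println!("ok");
}
```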
```rust
    role: &str,
    worktree_path: Option<&str>,
) -> Result<PreparedExpertFiles> {
```
`prepare_expert_files_with_role` no longer validates that `expert_id` exists in the config (it uses the `get_expert_name` fallback instead of failing). This silently changes the previous contract of `prepare_expert_files` (which errored on unknown experts) and can lead to writing instruction/settings files for invalid IDs. Consider restoring an explicit `config.get_expert(expert_id)` check (or otherwise validating the ID range) at the top of this helper.
Suggested change:

```rust
) -> Result<PreparedExpertFiles> {
    // Validate that the requested expert exists, preserving the previous contract
    // of failing on unknown expert IDs before writing any files.
    let _ = config
        .get_expert(expert_id)
        .with_context(|| format!("Unknown expert id: {}", expert_id))?;
```
- Add `expert_id` validation in `prepare_expert_files_with_role` to fail fast on invalid IDs instead of silently falling back
- Fix error context in `exit_expert_and_set_pending` to be generic ("Failed to set expert status to pending") since it's used across multiple flows, not just reset
- Replace hardcoded `/tmp` paths in tests with `tempfile::TempDir` for parallel-safe, auto-cleanup test isolation
Summary
- Fix four Tower methods (`reset_expert`, `return_expert_from_worktree`, `launch_expert_in_worktree`, `change_expert_role`) that were missing `set_marker("pending")` calls, causing stale "processing" status during expert restarts
- Extract shared helpers (`exit_expert_and_set_pending`, `prepare_expert_files_with_role`, `PreparedExpertFiles`) into `commands/common.rs`
- Refactor call sites (Tower `start_feature_execution` + CLI `reset_expert`) to use the shared helpers, eliminating ~120 lines of duplicated prepare-files logic

Test plan
- `make ci` passes (build, clippy, fmt, 811 tests)

🤖 Generated with Claude Code