Skip to content

[TOW-1342] Timeout auto cleanup for subprocess#195

Open
sammuti wants to merge 3 commits intodevelopfrom
bugfix/TOW-1342-cleanup-on-timeout
Open

[TOW-1342] Timeout auto cleanup for subprocess#195
sammuti wants to merge 3 commits intodevelopfrom
bugfix/TOW-1342-cleanup-on-timeout

Conversation

@sammuti
Copy link
Contributor

@sammuti sammuti commented Feb 4, 2026

Timeout auto cleanup for subprocess

@sammuti sammuti changed the title Bugfix/tow 1342 cleanup on timeout [TOW-1342] Timeout auto cleanup for subprocess Feb 4, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements automatic cleanup for subprocess executions after a timeout period to handle cases where the control plane disconnects and fails to send cleanup calls.

Changes:

  • Added a 5-minute automatic cleanup timer that triggers after subprocess runs reach a terminal state
  • Introduced a cleanup monitor that polls subprocess status and performs cleanup if explicit cleanup hasn't been called
  • Refactored SubprocessHandle to use Arc<Mutex<>> wrappers for temporary directories and track cleanup state with an atomic flag

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
crates/tower-runtime/src/auto_cleanup.rs New module implementing the automatic cleanup monitoring logic with comprehensive tests
crates/tower-runtime/src/subprocess.rs Updated to spawn cleanup monitor, wrap temp directories in Arc<Mutex<>>, and track cleanup state
crates/tower-runtime/src/lib.rs Added is_terminal() helper method to Status enum and exported new auto_cleanup module

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// Wait for terminal state
loop {
tokio::time::sleep(Duration::from_secs(5)).await;
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded 5-second polling interval should be extracted as a named constant (e.g., POLL_INTERVAL) to improve maintainability and make the polling frequency configurable for testing.

Copilot uses AI. Check for mistakes.
// - Polling to detect terminal state (up to 1000ms)
// - Cleanup timeout (500ms)
// - Buffer (500ms)
tokio::time::sleep(Duration::from_millis(2200)).await;
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test sleep duration is calculated as a sum of multiple timeouts plus buffer. Consider computing this value from the constituent durations (100ms transition + 1000ms polling + 500ms timeout + 500ms buffer) to make the relationship explicit and reduce the risk of calculation errors if any timeout values change.

Suggested change
tokio::time::sleep(Duration::from_millis(2200)).await;
let transition_duration = Duration::from_millis(100);
let polling_duration = Duration::from_millis(1000);
let cleanup_timeout = Duration::from_millis(500);
let buffer_duration = Duration::from_millis(500);
let total_sleep_duration =
transition_duration + polling_duration + cleanup_timeout + buffer_duration;
tokio::time::sleep(total_sleep_duration).await;

Copilot uses AI. Check for mistakes.
}

// Wait past the cleanup timeout (already waited 1200ms, need 500ms more + buffer)
tokio::time::sleep(Duration::from_millis(700)).await;
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test sleep duration calculation (700ms to reach total of 1900ms after initial 1200ms wait) should be computed from constituent durations to make the test logic clearer and more maintainable.

Copilot uses AI. Check for mistakes.
// - Polling to detect terminal state (up to 1000ms)
// - Cleanup timeout (200ms)
// - Buffer (300ms)
tokio::time::sleep(Duration::from_millis(2600)).await;
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test sleep duration should be computed from the constituent durations (1000ms remaining transition + 1000ms polling + 200ms timeout + 300ms buffer + margin for 500ms already waited) to improve test maintainability and clarity.

Suggested change
tokio::time::sleep(Duration::from_millis(2600)).await;
let remaining_transition_ms = 1000u64;
let polling_ms = 1000u64;
let cleanup_timeout_ms = 200u64;
let buffer_ms = 300u64;
let already_waited_ms = 500u64;
let margin_ms = 100u64; // Extra margin for the already-waited 500ms
let additional_wait_ms =
remaining_transition_ms + polling_ms + cleanup_timeout_ms + buffer_ms + margin_ms;
assert_eq!(
additional_wait_ms + already_waited_ms,
2600u64 + already_waited_ms,
"Total wait should remain consistent with previous behavior"
);
tokio::time::sleep(Duration::from_millis(additional_wait_ms)).await;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant