[TOW-1342] Timeout auto cleanup for subprocess#195
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements automatic cleanup for subprocess executions after a timeout period to handle cases where the control plane disconnects and fails to send cleanup calls.
Changes:
- Added a 5-minute automatic cleanup timer that triggers after subprocess runs reach a terminal state
- Introduced a cleanup monitor that polls subprocess status and performs cleanup if explicit cleanup hasn't been called
- Refactored
SubprocessHandleto useArc<Mutex<>>wrappers for temporary directories and track cleanup state with an atomic flag
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/tower-runtime/src/auto_cleanup.rs | New module implementing the automatic cleanup monitoring logic with comprehensive tests |
| crates/tower-runtime/src/subprocess.rs | Updated to spawn cleanup monitor, wrap temp directories in Arc<Mutex<>>, and track cleanup state |
| crates/tower-runtime/src/lib.rs | Added is_terminal() helper method to Status enum and exported new auto_cleanup module |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| // Wait for terminal state | ||
| loop { | ||
| tokio::time::sleep(Duration::from_secs(5)).await; |
There was a problem hiding this comment.
The hardcoded 5-second polling interval should be extracted as a named constant (e.g., POLL_INTERVAL) to improve maintainability and make the polling frequency configurable for testing.
| // - Polling to detect terminal state (up to 1000ms) | ||
| // - Cleanup timeout (500ms) | ||
| // - Buffer (500ms) | ||
| tokio::time::sleep(Duration::from_millis(2200)).await; |
There was a problem hiding this comment.
The test sleep duration is calculated as a sum of multiple timeouts plus buffer. Consider computing this value from the constituent durations (100ms transition + 1000ms polling + 500ms timeout + 500ms buffer) to make the relationship explicit and reduce the risk of calculation errors if any timeout values change.
| tokio::time::sleep(Duration::from_millis(2200)).await; | |
| let transition_duration = Duration::from_millis(100); | |
| let polling_duration = Duration::from_millis(1000); | |
| let cleanup_timeout = Duration::from_millis(500); | |
| let buffer_duration = Duration::from_millis(500); | |
| let total_sleep_duration = | |
| transition_duration + polling_duration + cleanup_timeout + buffer_duration; | |
| tokio::time::sleep(total_sleep_duration).await; |
| } | ||
|
|
||
| // Wait past the cleanup timeout (already waited 1200ms, need 500ms more + buffer) | ||
| tokio::time::sleep(Duration::from_millis(700)).await; |
There was a problem hiding this comment.
The test sleep duration calculation (700ms to reach total of 1900ms after initial 1200ms wait) should be computed from constituent durations to make the test logic clearer and more maintainable.
| // - Polling to detect terminal state (up to 1000ms) | ||
| // - Cleanup timeout (200ms) | ||
| // - Buffer (300ms) | ||
| tokio::time::sleep(Duration::from_millis(2600)).await; |
There was a problem hiding this comment.
The test sleep duration should be computed from the constituent durations (1000ms remaining transition + 1000ms polling + 200ms timeout + 300ms buffer + margin for 500ms already waited) to improve test maintainability and clarity.
| tokio::time::sleep(Duration::from_millis(2600)).await; | |
| let remaining_transition_ms = 1000u64; | |
| let polling_ms = 1000u64; | |
| let cleanup_timeout_ms = 200u64; | |
| let buffer_ms = 300u64; | |
| let already_waited_ms = 500u64; | |
| let margin_ms = 100u64; // Extra margin for the already-waited 500ms | |
| let additional_wait_ms = | |
| remaining_transition_ms + polling_ms + cleanup_timeout_ms + buffer_ms + margin_ms; | |
| assert_eq!( | |
| additional_wait_ms + already_waited_ms, | |
| 2600u64 + already_waited_ms, | |
| "Total wait should remain consistent with previous behavior" | |
| ); | |
| tokio::time::sleep(Duration::from_millis(additional_wait_ms)).await; |
Timeout auto cleanup for subprocess