[v0.2.1] File watcher infrastructure for agent code (MAH-80)#39
Conversation
Promote watchfiles from a transitive dependency (via livekit-agents) to an explicit one. Used by the file watcher landing in src/openrtc/execution/file_watcher.py for MAH-80. Pin >=0.21,<2 to accept the originally-targeted 0.21 line and the current 1.x (1.1.1 ships with livekit-agents 1.5).
Introduce src/openrtc/execution/file_watcher.py with the FileChange frozen dataclass (path + change_type). Frozen=True gives equality and hash for free, so FileChange instances can be deduplicated through a set during the debounce coalesce step (MAH-80 Step 7). This is the public event-payload contract for the watcher; subsequent commits add the discovery helper and the FileWatcher class itself.
Add the private _discover_user_modules() helper. Walks sys.modules, keeps modules with a real __file__, drops anything under site.getsitepackages() / site.getusersitepackages() / sys.prefix / sys.base_prefix, returns distinct absolute paths. Tests cover: synthetic user module under tempdir is included; synthetic site-packages module is excluded; modules with __file__ = None or missing don't raise; aliased modules pointing at the same file collapse to one entry.
Add the FileWatcher class with the locked public API (design.md §3.5): construction with on_change + debounce_ms + paths, async start()/stop(), sync refresh_paths(), and the new -> running -> stopped state machine. The skeleton lands without watchfiles wiring or debounce logic — those are MAH-80 Steps 6 and 7. start() and stop() are idempotent; start() after stop() raises (no restart). refresh_paths() rebuilds the path list only for auto-discover watchers; it is a no-op when explicit paths were supplied. Eleven new tests cover construction, debounce-validation, path-copy isolation, lifecycle idempotency, the no-restart guard, and refresh behavior in both auto and explicit modes.
start() now spawns a background asyncio task that consumes watchfiles.awatch(*paths, stop_event=...) and translates each (Change, str) into a FileChange. The watchfiles Change enum is mapped explicitly: added -> created, modified -> modified, deleted -> deleted; unrecognized variants are dropped. Events land in self._pending. The trailing-edge debounce that flushes the buffer through on_change ships in the next commit (Step 7). stop() sets the stop event and cancels the watch task, awaiting it under contextlib.suppress(CancelledError) so shutdown is clean. Three new tests cover the live wiring (write -> buffered FileChange), clean cancellation, and the empty-path-list edge case.
…Watcher Add the trailing-edge debounce on top of the watchfiles event loop. The pipeline is now: watch loop -> _handle_change_batch (extends _pending, reschedules _flush_task) -> _flush_after sleeps debounce_ms -> _collapse_changes -> on_change. _collapse_changes follows design.md §3.4: deleted dominates, created beats modified, modified is the fall-through. First-seen path order is preserved. User callback exceptions are caught and logged; the watcher keeps running. stop() cancels both the watch task and any in-flight flush task, drops _pending, and shuts down without leaking tasks or firing the cancelled flush. Ten new tests cover the design-doc collapse rules and the four AC-3 scenarios: 1 save -> 1 callback at ~200ms, 5 rapid saves -> 1 callback ~200ms after the last, raising callback isolated, stop during pending flush cancels cleanly with no leaked tasks.
Add the locked public API surface (design.md §3.5) to the openrtc package: from openrtc import FileWatcher, FileChange works in a fresh interpreter. Both names land in __all__ alongside AgentPool, AgentConfig, etc., so star-imports and IDEs see them. Three tests guard the contract: a fresh subprocess imports both names without going through the runtime, the names appear in __all__, and the package-level re-exports are object-identity with the originals.
End-to-end smoke covering watchfiles -> _handle_change_batch -> trailing debounce -> on_change against a real file. Writes initial contents, starts the watcher, edits the file, waits up to 3s for the debounced callback, then asserts exactly one callback arrived with the expected path and a non-deleted change_type. Finally checks the clean shutdown contract: state -> stopped, _watch_task and _flush_task both cleared.
Expand the module-level docstring with a Contract summary covering discovery, debounce, callback isolation, and stop() semantics. Each public method now documents Args / Side effects / Raises explicitly: - __init__: each parameter described, ValueError raise contract - start: side effect (task + event creation, state transition), RuntimeError on restart, idempotency note - stop: side effects (state, stop_event, task cancellation, dropped buffer), idempotency, no-restart invariant - refresh_paths: synchronous side effect on auto-discover watchers, no-op for explicit-path watchers No behavior change. README + user guide land separately in MAH-87.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openrtc/execution/file_watcher.py`:
- Around line 64-66: The FileChange.path doc says paths are absolute but the
code constructs FileChange using Path(raw_path) which may be relative; update
all places that create FileChange (e.g., in the FileChange dataclass/factory and
the event-emission sites around the code referenced and also lines ~314-317) to
convert raw_path to an absolute Path by using
Path(raw_path).resolve(strict=False) (or Path(raw_path).absolute equivalent)
before passing into FileChange so emitted events always contain absolute paths.
- Around line 215-218: refresh_paths() currently mutates self._paths but does
not reconfigure the running watch iterator created by _run_watch_loop() because
watchfiles.awatch(*self._paths, ...) captures paths at creation; update
refresh_paths() to signal the running watcher to restart so the awatch iterator
is recreated with the new paths (e.g., add an asyncio.Event or asyncio.Condition
like _restart_event set by refresh_paths(), have _run_watch_loop() check the
event inside the async for loop, break out and recreate the awatch iterator when
the event is set, then clear the event), referencing refresh_paths,
_run_watch_loop, self._paths, awatch, and _restart_event.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ccd330bf-d612-4c02-97d3-705ebd062a11
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
pyproject.tomlsrc/openrtc/__init__.pysrc/openrtc/execution/file_watcher.pytests/execution/__init__.pytests/execution/test_file_watcher.pytests/execution/test_file_watcher_smoke.py
| Frozen so instances are hashable and can be deduplicated in sets. | ||
| Paths are absolute. ``change_type`` is one of ``"created"``, | ||
| ``"modified"``, or ``"deleted"`` (mapped from watchfiles' ``Change`` |
There was a problem hiding this comment.
FileChange.path absolute-path contract is not consistently enforced.
The docs say emitted paths are absolute, but events currently use Path(raw_path) directly. If explicit paths are relative (or backend emits relative), downstream receives relative paths.
Localized fix
- batch.append(
- FileChange(
- path=Path(raw_path),
- change_type=change_type,
- )
- )
+ batch.append(
+ FileChange(
+ path=Path(raw_path).resolve(),
+ change_type=change_type,
+ )
+ )Also applies to: 314-317
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openrtc/execution/file_watcher.py` around lines 64 - 66, The
FileChange.path doc says paths are absolute but the code constructs FileChange
using Path(raw_path) which may be relative; update all places that create
FileChange (e.g., in the FileChange dataclass/factory and the event-emission
sites around the code referenced and also lines ~314-317) to convert raw_path to
an absolute Path by using Path(raw_path).resolve(strict=False) (or
Path(raw_path).absolute equivalent) before passing into FileChange so emitted
events always contain absolute paths.
| Synchronous because rebuilding the path set is a fast | ||
| in-process snapshot. The live watch loop picks up the | ||
| change on its next iteration boundary; this method does | ||
| not restart the watcher. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the FileWatcher implementation
find . -name "file_watcher.py" -type f | head -5Repository: mahimailabs/openrtc-runtime
Length of output: 111
🏁 Script executed:
# Read the file to understand the structure
wc -l src/openrtc/execution/file_watcher.pyRepository: mahimailabs/openrtc-runtime
Length of output: 113
🏁 Script executed:
# Read the relevant sections of the file
cat -n src/openrtc/execution/file_watcher.py | head -250 | tail -100Repository: mahimailabs/openrtc-runtime
Length of output: 4704
🏁 Script executed:
# Get more context around the specific lines mentioned
cat -n src/openrtc/execution/file_watcher.py | sed -n '200,330p'Repository: mahimailabs/openrtc-runtime
Length of output: 6157
🏁 Script executed:
# Check if watchfiles is available and examine its behavior
python3 << 'EOF'
import inspect
try:
import watchfiles
# Check the signature and docstring of awatch
print("=== watchfiles.awatch signature ===")
print(inspect.signature(watchfiles.awatch))
print("\n=== watchfiles.awatch docstring ===")
print(watchfiles.awatch.__doc__)
except ImportError:
print("watchfiles not installed, checking via web search needed")
EOFRepository: mahimailabs/openrtc-runtime
Length of output: 128
🌐 Web query:
watchfiles awatch path arguments dynamic monitoring behavior
💡 Result:
watchfiles.awatch accepts one or more paths as positional arguments; it yields (as an async generator) a set of (Change, path) tuples whenever watched paths change. It can watch files or directories, and directories are watched recursively by default. The “dynamic monitoring” question (changing the set of watched paths while the async for loop is already running) is not supported by the API design; adding/removing watches would require starting a new awatch instance (there is an open feature request for this behavior).
Citations:
- 1: https://watchfiles.helpmanual.io/api/watch/
- 2: Feature request: add or remove watched directories while
async for changes in awatch('/path/to/dir')is iterating samuelcolvin/watchfiles#298 - 3: https://watchfiles.helpmanual.io/
- 4: https://github.com/samuelcolvin/watchfiles/
- 5: Watcher timing issue: debounce never triggers and step fires once based on first modification samuelcolvin/watchfiles#360
refresh_paths() does not actually reconfigure a running watcher.
_run_watch_loop() creates the watchfiles.awatch(*self._paths, ...) iterator once at startup, so the paths list is captured at that moment. Mutations to self._paths afterward have no effect—watchfiles does not support dynamic path changes. This contradicts the docstring claim that "the live watch loop picks up the change on its next iteration boundary," and causes the method to silently fail to watch newly discovered modules.
To fix this, restart the awatch iterator when refresh_paths() is called during a running state (e.g., via an asyncio.Event signal that breaks and restarts the async for loop).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openrtc/execution/file_watcher.py` around lines 215 - 218,
refresh_paths() currently mutates self._paths but does not reconfigure the
running watch iterator created by _run_watch_loop() because
watchfiles.awatch(*self._paths, ...) captures paths at creation; update
refresh_paths() to signal the running watcher to restart so the awatch iterator
is recreated with the new paths (e.g., add an asyncio.Event or asyncio.Condition
like _restart_event set by refresh_paths(), have _run_watch_loop() check the
event inside the async for loop, break out and recreate the awatch iterator when
the event is set, then clear the event), referencing refresh_paths,
_run_watch_loop, self._paths, awatch, and _restart_event.
…e running Two issues from review: 1. FileChange paths emitted by the live watcher are now resolved via Path.resolve(strict=False) at the boundary, matching the dataclass docstring's "Paths emitted by FileWatcher are absolute" claim. The dataclass itself does not normalize in __post_init__: doing so would break literal '/tmp/agent.py' assertions on macOS where /tmp resolves to /private/tmp. Manual constructions are documented as the caller's responsibility. 2. refresh_paths() previously mutated self._paths but watchfiles.awatch captures paths at construction, so a running watcher never picked up the new set. Add _restart_event: refresh_paths() sets it, _run_watch_loop wraps awatch in an outer loop that tears down the current iterator (via a per-iteration mirror task that signals awatch's stop_event on either stop or restart) and recreates awatch with the latest self._paths. stop() also sets _restart_event so the mirror task wakes promptly. Two regression tests: - test_emitted_paths_are_absolute: every FileChange landing in the pending buffer has an absolute path. - test_refresh_paths_during_run_swaps_watched_set: with two files in separate dirs, edits to file_b are silent before refresh_paths(), and fire the callback after.
Summary
Phase 3, issue 1 of 8 — the foundation for v0.2 hot reload. Adds an
asyncio-native file watcher that monitors user-edited Python modules
and emits debounced change events. No existing code path changes
behavior; reload, re-import, and session re-binding ship in MAH-81+.
Public API (locked at design.md §3.5)
What ships
Acceptance criteria (design.md §6)
Plus the two non-AC bars:
Test plan
Discoveries logged for follow-ups (out of scope here)
Summary by CodeRabbit
FileWatcherandFileChangepublic APIs now available.