Skip to content

MTP/SMB: reuse watcher-backed listings in pre-flight scans#19

Merged
vdavid merged 9 commits into
mainfrom
feature/fresh-listing-reuse
May 18, 2026
Merged

MTP/SMB: reuse watcher-backed listings in pre-flight scans#19
vdavid merged 9 commits into
mainfrom
feature/fresh-listing-reuse

Conversation

@vdavid
Copy link
Copy Markdown
Owner

@vdavid vdavid commented May 18, 2026

  • Pre-flight scans (copy/move/delete) reuse cached listings when the source volume reports an active watcher, instead of re-listing.
  • Selecting 135 photos in a 15k-photo MTP folder no longer triggers a multi-second parent re-list before F5 / F8 (the "Verifying before copy..." dialog with the climbing 5,080 counter is gone).
  • Delete on MTP stops double-scanning: delete_volume_files_with_progress_inner now consumes the scan-preview result. Previously it ignored preview_id and re-walked silently for top-level files, which made the UI look frozen.
  • Per-file is_directory(source) probes drop away when the parent listing is watcher-backed.
  • One oracle, one rule: try_get_watched_listing(volume_id, path) in file_system/listing/caching.rs. Returns cached FileEntrys iff Volume::listing_is_watched(path) returns true. Picks the most-recently-updated matching listing (sequence + created_at tiebreaker).
  • New Volume trait method listing_is_watched. Per-backend gates:
    • LocalPosixVolume: matching WatchedDirectory in WATCHER_MANAGER.
    • MtpVolume: connection_manager().is_connected(device_id) (volume-level — MTP event loop watches the whole device).
    • SmbVolume: watcher_cancel.try_lock().is_some() AND connection_state() == Direct.
    • InMemoryVolume: default false.
  • Shared oracle-aware recursive walker scan_subtree_with_oracle in write_operations/scan.rs. Used at every level by scan-preview, MTP/SMB scan_for_copy_batch overrides, local FS walk_dir_recursive, and delete.
  • Drive index is not trusted for write-op decisions. Freshness contract is bright-line: watcher-backed cache, or real read. The per-backend debounce windows (MTP 500 ms, SMB 200 ms, local FS ~10 ms) are documented honestly in volume/CLAUDE.md.
  • MTP cold-cache parent-grouping optimization preserved (one list_directory per unique parent). Pinned by mtp_scan_cold_cache_still_uses_parent_grouping.
  • No FE protocol change. startScanPreview / SCAN_PREVIEW_RESULTS unchanged.
  • Plan + per-milestone scope: docs/specs/fresh-listing-reuse-plan.md.

Test plan

  • 7 unit tests for try_get_watched_listing (hit, miss-watcher-false, miss-no-listing, miss-no-volume, sequence determinism, race-window during start-streaming → watcher-attach gap, flip-to-unwatched).
  • Per-backend listing_is_watched lifecycle tests (local FS, MTP virtual-device gated, SMB Docker gated).
  • 5 scan-preview integration tests (oracle hit / fall-through / cached subfolder reuse / symlink semantics / mid-walk listing close).
  • 3 MTP/SMB scan-override tests (oracle hit, cold-cache parent-grouping intact, SMB Docker-gated oracle hit).
  • 4 delete-reuse integration tests (preview_id consumed / no-preview correctness / no is_directory probes / mid-scan listing close).
  • 2 Playwright specs (mtp-copy-preflight-uses-cache, mtp-delete-no-double-scan).
  • Full check suite green (46/46). Slow lane green except 2 pre-existing FE Escape-key flakes (settings.spec.ts:312, viewer.spec.ts:186) unrelated to this branch.
  • Manual QA on real Android (SM-S918B, /DCIM/Camera 996 entries): F8 deleted 118 files (8.4 GB) in ~12 s, no frozen UI, single Scanning → Deleting transition.
  • Manual QA: F5 copy + other-pane subfolder reuse on real device — skipped during run; copy uses the same take_cached_scan_result path that already shipped pre-this-branch, exercised by the unit + Playwright tests.

vdavid added 9 commits May 16, 2026 14:16
Bumped clippy 1.95 flags three pre-existing sort_by(|a,b| ...
cmp(...)) calls in volumes_linux/mod.rs as unnecessary_sort_by.
Switched to sort_by_key(|v| v.name.to_lowercase()).

Cfg-gated for Linux so verified by CI on Linux, not locally.
`mtp.spec.ts: copies file from MTP to local` was the lone outlier among the MTP→local / local→MTP copy tests that skipped the established belt-and-suspenders pattern. Failed once under the slow-suite load profile (rust-tests-linux + Docker SMB containers + everything else running concurrently): BE copy completed in 26 ms (50-byte `report.txt` landed at `/right/`), FE got the `Copy complete` log 1.2 s later, but the destination pane's PaneStateStore stayed at `files: []` for the full 15 s `mcpAwaitItem` budget — both safety nets raced.

What happened:

- The FilePane's per-pane notify-rs watcher (200 ms debounce) apparently missed the FSEvents add for `/right/report.txt`. The drive indexer's separate reconciler did see it (`live_heartbeat events=1` at 13:09:56), so the file genuinely was on disk; the per-pane debouncer just didn't deliver. macOS FSEvents under disk-and-CPU pressure occasionally drops events between debouncer batches.
- The `refreshPanesAfterTransfer` → `refreshPaneListing(rightPaneRef)` IPC fired from `handleTransferComplete` either didn't fire or didn't reach the BE (no `stall_probe::listing_task` for `/right` for 14 s after BE complete). Tauri event-loop saturation under the slow-suite load profile is the likely cause.

The TransferProgressDialog also visibly stayed in "Scanning 0/0/0" at the last captured recording frame even though the `Copy complete` log fired — same root cause (Svelte reactive update starved by the event loop), but cosmetic for the user (next frame would have closed it; in practice the user never sees the timeout the test hits).

This is not a prod bug — a user always has at least one rescuer (watcher or post-op refresh) on the happy path, and can press F5 if both race. It's a test robustness issue: the 15 s budget plus zero fallback path can't survive the worst-case load profile that `--include-slow` produces.

Fix: bring this test in line with tests 11 (`copies file from local to MTP`, mtp.spec.ts:360) and 27 (`copies 50 MB file from MTP to local`, mtp.spec.ts:1073), which already use the defensive pattern:

1. `pollFs` for the destination file on disk (the authoritative truth, independent of UI plumbing).
2. Explicit `mcpCall('refresh', {})` to force the pane re-list deterministically.
3. `mcpAwaitItem` against the fresh PaneStateStore.

The test now passes regardless of whether the watcher or the auto-refresh raced. Verified locally in isolation (4.3 s).
- Viewer and Settings windows open with `focus: false` when running under `CMDR_E2E_MODE=1`. The plugin drives the DOM via the Unix socket and doesn't need OS focus.
- Settings re-open path skips the `focus-self` event in E2E too, so reopening an already-open Settings window doesn't grab focus either.
- macOS `show_main_window` uses `[NSWindow orderFront:]` instead of `makeKeyAndOrderFront:` in E2E, so the initial main-window appearance doesn't steal focus from whatever the user is working in.
- Prod/dev behavior unchanged.
- New typed variants `VolumeError::DeletePending` and `WriteOperationError::DeletePending` so the listing and write-error paths can carry the state end-to-end instead of falling through to the generic `IoError`.
- `map_smb_error` dispatches on `err.status() == Some(NtStatus::DELETE_PENDING)` before the kind match (smb2 currently classifies it as `ErrorKind::Other`).
- `kinds::delete_pending` gives users a transient "File is being removed" message that explains an open handle is keeping the file alive, plus a wait/restart suggestion. Replaces the misleading "disk needs attention" copy.
- Tests in `smb.rs`, `volume_copy_tests.rs`, and `friendly_error/mod.rs` pin the mapping, the friendly copy shape, and the no-"error"/"failed" rule for the new variant.
- New `Volume::listing_is_watched(path)` trait method (default `false`) so write-op pre-flight can ask each backend whether a cached listing is being kept fresh by a live watcher.
- Per-backend wiring:
  - `LocalPosixVolume`: matches against `WATCHER_MANAGER` entries via `find_listings_for_path_on_volume`.
  - `MtpVolume`: new sync `MtpConnectionManager::is_connected` via `try_lock`; lock contention falls through to "not watched" (safe direction).
  - `SmbVolume`: requires `watcher_cancel.try_lock().is_some()` AND `connection_state() == Direct`; `WouldBlock` returns `false` without retry.
  - `InMemoryVolume`: keeps the default `false`.
- New `caching::try_get_watched_listing(volume_id, path)` oracle: picks the highest-`sequence` matching listing (ties broken by latest `created_at`), drops the cache lock, then asks the volume. Returns cloned entries on hit, `None` otherwise.
- Tests: 7 oracle unit tests (hit, miss-on-dead-watcher, no-listing, no-volume, sequence determinism, start-streaming race, flip-to-unwatched), one local FS test exercising the full watcher lifecycle, MTP virtual-device test (gated on `virtual-mtp`), and 4 SMB unit tests plus a `#[ignore]`-gated Docker test. All green.
- Docs: `volume/CLAUDE.md` gains a capability-matrix row and a freshness-contract paragraph with per-backend debounce windows; `listing/CLAUDE.md` documents the new oracle next to `find_listings_for_path_on_volume`.
- Adds `docs/specs/fresh-listing-reuse-plan.md` (the M1-M4 plan this commit implements only M1 of).
- Refreshes `apps/desktop/src/lib/ipc/bindings.ts` to keep `bindings-fresh` happy (picked up a pre-existing `delete_pending` variant on `WriteOperationError`).

M2-M4 (oracle-aware scan walker, MTP/SMB overrides, delete preview-reuse, E2E) land in follow-up commits.
- Add `scan_subtree_with_oracle` in `scan.rs`: oracle-first recursion that falls through to `volume.list_directory` on miss, skips recursion into symlinked dirs, respects cancellation.
- Add `walk_cached_entries` helper inside `walk_dir_recursive`: when `volume_id` is passed and the oracle reports a watcher-backed listing, hydrate that level from cache instead of `fs::read_dir`. Local-FS scan paths now thread `Some("root")` through so an open pane's FSEvents-watched listing short-circuits the disk read.
- Rewrite `run_volume_scan_preview` around `run_oracle_aware_batch_scan`: group input sources by parent dir, check the oracle per parent. On hit, build the per-path `BatchScanResult` slice from cached entries (size + `is_directory` for top-level files; recurse via `scan_subtree_with_oracle` for top-level dirs). On miss, fall through to the existing `volume.scan_for_copy_batch_with_progress` so MTP's parent-grouping and SMB's pipelined-stat optimizations still run for cold-cache parents.
- Thread `source_volume_id: String` through `start_scan_preview` and the Tauri command wrapper.
- Add 5 integration tests in a sibling `scan_preview_oracle_tests.rs` (volume_copy_tests.rs was already past its file-length allowlist): oracle hit skips `list_directory`, watcher-dead falls through, subfolder cached in another pane is reused, cached symlinked dirs aren't recursed into, listing closed mid-walk falls through to a real list.
- MTP `scan_for_copy_batch_with_progress` and SMB scan override left unchanged: M2a only adds an oracle short-circuit before the volume call.
- Docs: Decision entry "Scan preview reuses watched listings" in `write_operations/CLAUDE.md`, plus two Gotchas (hardlink dedup across the cache/walk boundary, volume-disconnect mid-walk race). `volume/CLAUDE.md` Tier 3 now mentions `listing_is_watched` and what it gates.
- `MtpVolume::scan_for_copy_batch_with_progress`: per-parent oracle check before the existing parent-grouping listing. On hit, child sizes come from the cached `FileEntry`; on miss, today's single-listing-per-parent path runs. Decision is per-parent; one call can mix watched and cold parents.
- `SmbVolume::scan_for_copy_batch`: per-parent oracle short-circuit before the pipelined-stat block. Oracle-served paths skip `tree.stat` entirely; remaining paths go through the existing FuturesUnordered pipeline.
- New `mtp_scan_oracle_tests.rs`: oracle-hit asserts zero `MtpVolume::list_directory` calls (via a new `#[cfg(test)]` `test_hooks` counter); cold-cache asserts exactly 2 list calls for 2 unique parents (proves the parent-grouping optimization survives).
- New `smb_scan_oracle_tests.rs` (Docker-gated `#[ignore]`): drops the smb2 session before the scan and asserts the scan still succeeds (proves the oracle path doesn't touch SMB).
- Test infra: moved `nextest.toml` to workspace root and added a `virtual-mtp` test-group with `max-threads = 1` so the three virtual-device tests don't race on the shared `/tmp/cmdr-mtp-e2e-fixtures` backing dir.
- Docs: `volume/CLAUDE.md` Decision entries for both MTP parent-grouping and SMB pipelined-stat now mention the oracle layering. `MtpVolume::scan_for_copy_batch_with_progress` doc comment + per-group comment updated to describe the per-parent oracle check.
- `delete_volume_files_with_progress_inner` now calls `take_cached_scan_result(preview_id)` at the top. On hit, top-level files are recorded straight from `CopyScanResult` (no `is_directory` probe, no `list_directory` round-trip); top-level dirs recurse via the walker with `is_dir_hint = Some(true)` so the recursion never re-probes.
- `scan_volume_recursive` consults `try_get_watched_listing(volume_id, path)` before every `list_directory`. Any subtree open in another pane and watcher-fresh is cache-fed; falls through to the volume call on miss, preserving the per-entry MTP progress callback.
- No-preview path keeps the top-level `is_directory(source)` probe only when the source's parent isn't watcher-fresh. When the parent oracle hits, the `is_directory` flag comes from the cached `FileEntry`.
- Cache-hit path emits a throttled scan-progress event per `progress_interval` while building the entry list, so the FE dialog stops looking frozen during the fast path (the original user-reported symptom).
- New `delete_volume_reuse_tests.rs`: 4 integration tests covering preview consumption, no-preview walker correctness, parent-oracle replacing the `is_directory` probe, and the listing-closed-mid-walk fallthrough.
- Docs: `write_operations/CLAUDE.md` data-flow diagram updated; new "Volume delete reuses the scan preview" Decision entry with the data-safety contract for stale cached entries.
- New Playwright spec `mtp-copy-preflight-uses-cache.spec.ts`: navigates to MTP `/DCIM`, selects three files, presses F5, asserts the transfer-confirmation dialog's pre-flight `.scan-checkmark` lands within 1500 ms (~3× safety margin over the actual cache-hit cost; cold-cache MTP would blow past it) and `filesFound` matches the selection. Cancels the dialog to keep the assertion focused on the pre-flight contract.
- New Playwright spec `mtp-delete-no-double-scan.spec.ts`: subscribes to `write-progress` events, confirms an F8 delete of three files in `/DCIM`, and asserts the captured phase sequence never regresses from a later phase back to `scanning` and that `filesDone` is monotonic within each phase. Pre-M3 the silent re-scan would surface as a second `scanning` run after `deleting`; the test rejects that explicitly.
- `playwright.config.ts`: widen the MTP shard regex to `/mtp(-[a-z-]+)?\.spec\.ts$/` so both new specs land on the dedicated MTP shard (the virtual-MTP backing dir is shared, two MTP specs in parallel would corrupt it).
- `write_operations/CLAUDE.md`: top-of-file note that pre-flight scans reuse watcher-backed listings, with pointers to `volume/CLAUDE.md` and `listing/caching.rs::try_get_watched_listing` for the freshness contract.
- TODO: manual QA on a real Android device with `/DCIM/Camera` + 5k photos to confirm the climbing-counter behaviour is gone end-to-end. Couldn't run from this environment.
- `./scripts/check.sh --include-slow`: all checks green except two pre-existing flakes in `settings.spec.ts:312` and `viewer.spec.ts:186` (Escape closes settings / viewer window). Both unrelated to this work — neither file is touched by M1-M4, the failures reproduce identically on the macOS and Linux lanes, and the surrounding commit history shows a long Linux-flake tail on the same assertions. Tracked separately.
@vdavid vdavid merged commit b90b900 into main May 18, 2026
7 of 8 checks passed
@vdavid vdavid deleted the feature/fresh-listing-reuse branch May 19, 2026 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant