Skip to content

test: harden two load-sensitive tests against CPU oversubscription#85

Closed
leszek3737 wants to merge 1 commit into
mainfrom
fa9
Closed

test: harden two load-sensitive tests against CPU oversubscription#85
leszek3737 wants to merge 1 commit into
mainfrom
fa9

Conversation

@leszek3737

Copy link
Copy Markdown
Owner

Both are test-only changes; production code is untouched. An investigation
(1220 suite runs under CPU saturation, zero failures) showed the suite is
deterministic under a normal single-cargo run — the "flake" seen during a
parallel-agent wave was shared-worktree cargo contention, not test logic.
These two latent fragilities are hardened defensively:

  • viewer: test_viewer_loader_drop_cancels_worker waited on a detached
    busy-wait (yield_now) worker with recv_timeout(1s).unwrap(); under heavy
    load the starved worker could miss the 1s deadline. Drop sets the cancel
    flag, so the worker is guaranteed to send — switch to a blocking recv()
    and drop the now-dead TEST_CHANNEL_TIMEOUT_SECS constant.

  • debug_log: log_truncates_oversized_file shares process-global
    LOG_FILE/CHECK_COUNTER/TEST_CACHE_HOME with concurrent production log()
    calls from other parallel tests. Widen the retry budget (20 -> 50) and

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @leszek3737, your pull request is larger than the review limit of 150000 diff characters

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the input event dispatching layer by introducing an EventContext struct to consolidate multiple mutable state arguments, cleans up the public API surface in src/lib.rs, and optimizes various UI and helper components, such as caching function bar cells in a thread-local storage. Feedback on the changes suggests using recv_timeout with a generous timeout instead of an indefinite blocking recv() in the viewer tests to prevent potential hangs in CI/CD pipelines.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/ui/viewer/tests.rs
Comment on lines 94 to +96
done_rx
.recv_timeout(std::time::Duration::from_secs(TEST_CHANNEL_TIMEOUT_SECS))
.unwrap();
.recv()
.expect("worker should send done after observing cancel");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using an indefinite blocking recv() in tests can cause the entire test suite to hang indefinitely in CI/CD pipelines if the worker thread panics or fails to send the cancellation signal. To prevent potential test hangs while remaining robust against CPU starvation, consider using recv_timeout with a generous timeout (e.g., 5 seconds).

Suggested change
done_rx
.recv_timeout(std::time::Duration::from_secs(TEST_CHANNEL_TIMEOUT_SECS))
.unwrap();
.recv()
.expect("worker should send done after observing cancel");
done_rx
.recv_timeout(std::time::Duration::from_secs(5))
.expect("worker should send done after observing cancel");

@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown

Greptile Summary

This PR hardens two load-sensitive tests against CPU oversubscription — switching test_viewer_loader_drop_cancels_worker from recv_timeout(1s) to a blocking recv(), and widening the log_truncates_oversized_file retry budget from 20 to 50 — but also bundles a substantial, unannounced production refactor and Cargo profile additions across 25 non-test files.

  • Test fixes (described): Viewer test drops the arbitrary 1 s deadline in favour of a blocking recv() whose correctness is guaranteed by cancel being set before the call; the debug-log test gains a wider retry window and a thorough comment explaining the process-global race with concurrent test threads.
  • Production refactor (undescribed): src/main.rs gains AppLoop, init_app_state(), poll_async(), and render_frame(); src/input/ gains EventContext replacing the 7-argument dispatch_event signature; dialogs.rs introduces InputOutcome and extracted per-action handlers that now consistently trim filesystem-mutating inputs; lib.rs drops crate-root re-exports; submodule visibility is narrowed to pub(crate).
  • Cargo profiles (undescribed): [profile.release] adds lto = true / codegen-units = 1; [profile.dev] adds opt-level = 1, which raises optimization for every cargo test run.

Confidence Score: 4/5

The two described test fixes are correct and safe; the undescribed production refactor is broadly logic-preserving but its scale warrants a careful read before merging.

The test hardening changes are sound: blocking recv() is safe because a thread-panic drops done_tx making recv() return Err rather than hang, and the debug-log retry count is a reasonable defensive increase. The production refactor is clean but the undisclosed scope — opt-level=1 touching every cargo test run, removal of crate-root re-exports, and the new EventContext/AppLoop/InputOutcome abstractions — introduces surface area that a reviewer relying on the description would miss entirely.

Cargo.toml (profile changes affect all contributors and CI), src/input/dialogs.rs (input trimming behavior change), src/main.rs (AppLoop/init_app_state refactor).

Important Files Changed

Filename Overview
Cargo.toml Adds [profile.release] (lto=true, codegen-units=1) and [profile.dev] (opt-level=1); not mentioned in the PR description. The dev-profile change affects all cargo test runs.
src/ui/viewer/tests.rs Removes TEST_CHANNEL_TIMEOUT_SECS and switches from recv_timeout(1s).unwrap() to recv().expect(). Correct: after Drop sets cancel=true the worker is guaranteed to observe it and send; if the thread panics, done_tx is dropped and recv() returns Err, failing the test cleanly rather than hanging.
src/app/debug_log.rs Widens retry budget from 20 to 50 attempts; adds extensive comment explaining process-global TEST_CACHE_HOME/CHECK_COUNTER/LOG_FILE races with concurrent tests. Sound improvement.
src/main.rs Major refactor: extracts init_app_state(), init_watcher(), poll_async(), render_frame() and introduces AppLoop struct. Logic-preserving but not mentioned in the PR description.
src/input/dialogs.rs Introduces InputOutcome enum and per-action handler functions; now consistently passes trimmed input to filesystem-mutating actions and raw input to search actions.
src/input/mod.rs Introduces EventContext struct aggregating AppState + viewer/loader/job handles; converts all submodule visibilities to pub(crate).
src/input/normal.rs Extracts F-key branches into named helpers; eliminates terminal_state_file_path wrapper. Pure refactor.
src/lib.rs Removes crate-root re-exports that were never consumed via lc:: by any consumer.

Sequence Diagram

sequenceDiagram
    participant EL as Event Loop (main.rs)
    participant EC as EventContext (input/mod.rs)
    participant DH as dispatch_event
    participant DG as handle_dialog (dialogs.rs)

    Note over EL,EC: Before this PR
    EL->>DH: dispatch_event(state, viewer_state, viewer_loader, image_preview_loader, running_job, terminal, size, event)
    DH->>DG: handle_dialog(state, viewer_state, running_job, input, terminal_height)

    Note over EL,EC: After this PR
    EL->>EC: Build EventContext grouping all mutable handles
    EC->>DH: "dispatch_event(&mut ctx, terminal, event)"
    DH->>DG: handle_dialog(ctx, key) returns InputOutcome
Loading

Reviews (1): Last reviewed commit: "test: harden two load-sensitive tests ag..." | Re-trigger Greptile

Comment thread Cargo.toml
Comment on lines +43 to +50
# LTO + a single codegen unit let the optimizer inline across crate
# boundaries (slower link, faster/smaller binary).
[profile.release]
lto = true
codegen-units = 1

# Dev: light optimization keeps TUI interaction responsive while
# preserving full debug info and fast incremental rebuilds.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 PR scope far exceeds description

The PR title uses the test: conventional-commit prefix and the description states "production code is untouched," but 25 of 28 changed files are production code with 2000+ non-test net insertions. The actual scope includes: a new EventContext struct replacing the 7-argument dispatch_event signature, a new AppLoop struct grouping watcher/viewer state in main.rs, refactored InputOutcome enum and extracted handler functions in dialogs.rs/normal.rs, visibility narrowing of input/ submodules to pub(crate), and removal of crate-root re-exports in lib.rs. These are real behavioral and API-surface changes whose review is obscured by the description.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread Cargo.toml
Comment on lines +49 to +52
# Dev: light optimization keeps TUI interaction responsive while
# preserving full debug info and fast incremental rebuilds.
[profile.dev]
opt-level = 1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 opt-level = 1 changes test-run semantics and build ergonomics for all contributors

cargo test uses the dev profile by default, so this immediately raises optimization for every CI and local test run. The trade-off is intentional (faster execution → fewer load-sensitive races), but it also slows incremental test compilation and can mask unoptimized-only memory-ordering bugs. AGENTS.md lists cargo test --locked in the CI gate; now every CI compilation is heavier. If the goal is faster test-binary execution, consider instead targeting the bespoke test profile ([profile.test]) so debug and test-compilation are not coupled.

Suggested change
# Dev: light optimization keeps TUI interaction responsive while
# preserving full debug info and fast incremental rebuilds.
[profile.dev]
opt-level = 1
# Dev: light optimization keeps TUI interaction responsive while
# preserving full debug info and fast incremental rebuilds.
# NOTE: `cargo test` uses this profile; raising opt-level here slows incremental
# test-compilation. Consider `[profile.test] opt-level = 1` to scope the
# trade-off to test binaries only, leaving plain debug builds unaffected.
[profile.dev]
opt-level = 1

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@opencode-agent

Copy link
Copy Markdown

Code Review: test: harden two load-sensitive tests against CPU oversubscription

Summary

The actual delta vs main is 2 files, +20/-5 lines — both test-only, as the PR description claims. The inflated stats (28 files, +2411/-1435) come from 3 stale commits that duplicate content already merged via #84. The PR needs a rebase.


Issue 1 (Blocking): Stale commits inflate the PR / misleading diff

The branch contains 4 commits, but the first 3 (822a9a5, 16914e5, 81e5437) are equivalent to changes already in main (merged as #84). GitHub computes the diff against the merge-base (ec3ec73), so the PR appears 140x larger than its real delta. This:

  • Exceeds Sourcery's 150K-char review limit (no AI review possible)
  • Makes human review impractical
  • Risks merge confusion

Recommendation: Rebase onto current main to drop the 3 redundant commits, leaving only 17e3775.


Issue 2 (Minor concern): Blocking recv() removes fail-fast safety net

src/ui/viewer/tests.rs:91 — switching recv_timeout(1s) → blocking recv():

The claim that the worker is "guaranteed to send" is correct: the Drop impl sets cancel with Ordering::Release, the worker polls with Ordering::Relaxed in a yield_now() loop (atomics can't be optimized away, relaxed stores become visible in finite time), and done_tx is alive until the worker exits. So recv() will return.

However, blocking recv() means if something does go wrong (future refactor breaks the Drop/cancel invariant, OS scheduler pathology), the test hangs indefinitely instead of failing in 1s. In CI this wastes the entire job timeout rather than producing a clean failure.

Suggestion: Use a generous timeout instead of removing it entirely — preserves robustness under load while keeping fail-fast:

.recv_timeout(std::time::Duration::from_secs(30))
.expect("worker should send done after observing cancel")

Issue 3 (Minor): Retry widening 20→50 may not address the actual race

src/app/debug_log.rs:251 — the oversized file is created before the loop, so only iteration 1 can actually exercise truncation. If a concurrent log() call wins the "fresh open" on iteration 1 and truncates first, the test still passes (file is small, contains "attempt") — but iterations 2+ can never re-test truncation since the file is no longer oversized.

The increase from 20→50 is harmless but likely ineffective — the retry only helps if iteration 1's metadata read returns a transient error (unwrap_or(0)), in which case 20 was already sufficient.

Suggestion (stronger fix): Re-create the oversized file at the top of each retry iteration so every attempt genuinely exercises truncation:

for attempt in 0..50 {
    reset_for_test();
    std::fs::write(&path, &vec![b'X'; (MAX_LOG_SIZE_BYTES + 1) as usize]).unwrap();
    log(format_args!("attempt {attempt}"));
    // ...
}

Issue 4 (Observation): Root cause documented but not fixed

The 15-line comment in debug_log.rs accurately documents that LOG_FILE, CHECK_COUNTER, and TEST_CACHE_HOME are process-global statics that break test isolation. This is the real fragility — any parallel test calling debug_log! contaminates this test's state. The retry loop is a workaround, not a fix. A proper fix (dependency-injected log path) is a larger refactor and appropriately out of scope here, but worth tracking as tech debt.


Verification

Check Result
cargo fmt --check Pass
cargo clippy --locked --all-targets -- -D warnings Pass
cargo test --locked (294 tests) All pass
Both modified tests individually Pass

Verdict

The actual code changes are sound and well-reasoned. The two concerns are minor (fail-fast safety net, retry effectiveness). Recommend rebasing to drop the 3 stale commits before merge — the misleading diff size is the only real blocker.

New%20session%20-%202026-06-15T02%3A19%3A03.641Z
opencode session  |  github run

Both are test-only changes; production code is untouched. An investigation
(1220 suite runs under CPU saturation, zero failures) showed the suite is
deterministic under a normal single-cargo run — the "flake" seen during a
parallel-agent wave was shared-worktree cargo contention, not test logic.
These two latent fragilities are hardened defensively:

- viewer: test_viewer_loader_drop_cancels_worker waited on a detached
  busy-wait (yield_now) worker with recv_timeout(1s).unwrap(); under heavy
  load the starved worker could miss the 1s deadline. Drop sets the cancel
  flag, so the worker is guaranteed to send — switch to a blocking recv()
  and drop the now-dead TEST_CHANNEL_TIMEOUT_SECS constant.

- debug_log: log_truncates_oversized_file shares process-global
  LOG_FILE/CHECK_COUNTER/TEST_CACHE_HOME with concurrent production log()
  calls from other parallel tests. Widen the retry budget (20 -> 50) and
  document why the loop must not be collapsed into a single attempt.
@opencode-agent

Copy link
Copy Markdown

Both modified tests pass. Here is my review.


Code Review: test: harden two load-sensitive tests against CPU oversubscription

Scope verification

The actual delta vs main is exactly 2 files, +20/-5, test-only — matching the PR description precisely. The earlier review comments (Greptile, Sourcery, Gemini) describing a 25-28 file production refactor (EventContext, AppLoop, Cargo.toml profiles, lib.rs re-export removal) are stale — they describe un-rebased commits that are no longer present. The branch now contains a single commit (368f118) on top of main. Those concerns do not apply to the code under review.

Change 1: src/ui/viewer/tests.rs — blocking recv()

Correct. The invariant holds:

  • Drop for CancellableLoader stores cancel = true with Ordering::Release (loader.rs:76).
  • The test worker polls with Ordering::Relaxed in a yield_now() loop (tests.rs:75-77). For a single atomic, a relaxed load is guaranteed to observe a release store in finite time (no non-atomic memory needs ordering here), so the loop terminates and done_tx.send(()) runs.
  • done_rx is alive (the test is blocked on it), so send succeeds and recv() returns Ok(()).
  • If the worker thread ever panicked, done_tx would be dropped and recv() would return Err, failing cleanly via .expect(...) rather than hanging.

So the blocking recv() cannot hang given the current code. Removing the now-unused TEST_CHANNEL_TIMEOUT_SECS constant is correct (verified: no other references).

Minor nit (non-blocking): A future refactor that breaks the cancel-then-send invariant would cause an indefinite CI hang instead of a 1s failure. A generous timeout (e.g. recv_timeout(Duration::from_secs(30))) would preserve fail-fast behavior while remaining robust under load — this is the same suggestion Gemini/opencode-agent made. It's a reasonable defense-in-depth preference, not a correctness issue. The current choice is acceptable.

Change 2: src/app/debug_log.rs — retry budget 20→50 + comment

Correct and well-documented. The 15-line comment accurately describes the process-global statics (LOG_FILE mutex, CHECK_COUNTER, TEST_CACHE_HOME) and why TEST_MUTEX only serializes the two debug_log tests against each other, not against the hundreds of other parallel tests that invoke debug_log!. This is valuable documentation of a real isolation weakness.

The retry increase is harmless and defensively reasonable. Note one subtlety worth acknowledging (raised by opencode-agent): the oversized file is created once before the loop, so genuine truncation can only be observed on the first log() call that sees the file as oversized; subsequent iterations can't re-exercise truncation without re-oversizing the file. The retry primarily helps tolerate transient metadata-read races (unwrap_or(0)len == 0 → condition fails → retry). The widening is still a net positive, and re-oversizing inside the loop would be a stronger (but more invasive) fix — appropriately out of scope here. The root cause (process-global statics breaking test isolation) is correctly documented as tech debt rather than "fixed" with a band-aid.

Verification

Check Result
cargo fmt --check Pass
cargo clippy --locked --all-targets -- -D warnings Pass
test_viewer_loader_drop_cancels_worker Pass
log_truncates_oversized_file Pass

Verdict

Approve. The changes are sound, minimal, correctly scoped, and well-reasoned. The commit message's diagnostic methodology (1220 runs under saturation) is thorough. Project conventions are respected (inline #[cfg(test)], explanatory comments, no production code touched). The single optional improvement is a generous recv_timeout instead of an indefinite recv(), but this is a stylistic preference that does not block merge.

New%20session%20-%202026-06-15T02%3A36%3A22.832Z
opencode session  |  github run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant