Fix conversation test isolation across async threads and file roots#1220
Conversation
📝 WalkthroughWalkthroughThis PR fixes test infrastructure to eliminate non-hermetic behavior in conversation runtime tests. It replaces thread-local environment overrides with process-wide mutex-protected storage, explicitly configures audit modes and file root paths in tests, and adds concurrency validation to prevent state leakage across worker threads. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/app/src/context.rs (1)
236-240: LGTM! Test isolation improvement is correct.Explicitly setting
AuditMode::InMemoryensures this test avoids filesystem I/O and HOME-derived paths, making it hermetic under parallel execution. The change correctly achieves the stated PR objective without affecting the test's purpose (verifying network egress capability grants).Optional clarity suggestion: The error message on line 240 still says "default config should succeed" but the config is now explicitly modified to use
InMemoryaudit mode. Consider updating it to something like "bootstrap with in-memory audit should succeed" for accuracy.📝 Proposed message clarification
- .expect("bootstrap with default config should succeed"); + .expect("bootstrap with in-memory audit should succeed");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/src/context.rs` around lines 236 - 240, Update the assertion message to reflect that the test now uses an explicit in-memory audit config: change the error string passed to expect on the bootstrap_kernel_context_with_config call (after configuring LoongClawConfig and setting audit.mode = AuditMode::InMemory) from "bootstrap with default config should succeed" to a message like "bootstrap with in-memory audit should succeed" so it accurately describes the setup involving LoongClawConfig and AuditMode::InMemory.crates/app/src/conversation/context_engine_registry.rs (1)
242-253: Harden override cleanup against panic paths.
clear_context_engine_env_override()is manual here; if the test panics before that line, state can leak to subsequent tests.Proposed hardening
let _env_lock = conversation_selector_env_lock() .lock() .unwrap_or_else(|poisoned| poisoned.into_inner()); + struct ClearOverrideOnDrop; + impl Drop for ClearOverrideOnDrop { + fn drop(&mut self) { + clear_context_engine_env_override(); + } + } + let _clear_override = ClearOverrideOnDrop; set_context_engine_env_override(Some("registry-custom")); let observed = std::thread::spawn(context_engine_id_from_env) .join() .expect("join thread"); - - clear_context_engine_env_override(); assert_eq!(observed.as_deref(), Some("registry-custom"));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/src/conversation/context_engine_registry.rs` around lines 242 - 253, The test context_engine_env_override_is_visible_across_threads manually calls set_context_engine_env_override(...) and clear_context_engine_env_override(), which can leak state if the test panics; wrap the override in a RAII-style guard or use panic-safe cleanup (e.g., create a ContextEngineEnvOverrideGuard that calls set_context_engine_env_override in its constructor and clear_context_engine_env_override in Drop, or use std::panic::catch_unwind to ensure clear_context_engine_env_override runs), keep the existing conversation_selector_env_lock() usage and run context_engine_id_from_env in the spawned thread as before so the override is always cleared even on panic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/app/src/context.rs`:
- Around line 236-240: Update the assertion message to reflect that the test now
uses an explicit in-memory audit config: change the error string passed to
expect on the bootstrap_kernel_context_with_config call (after configuring
LoongClawConfig and setting audit.mode = AuditMode::InMemory) from "bootstrap
with default config should succeed" to a message like "bootstrap with in-memory
audit should succeed" so it accurately describes the setup involving
LoongClawConfig and AuditMode::InMemory.
In `@crates/app/src/conversation/context_engine_registry.rs`:
- Around line 242-253: The test
context_engine_env_override_is_visible_across_threads manually calls
set_context_engine_env_override(...) and clear_context_engine_env_override(),
which can leak state if the test panics; wrap the override in a RAII-style guard
or use panic-safe cleanup (e.g., create a ContextEngineEnvOverrideGuard that
calls set_context_engine_env_override in its constructor and
clear_context_engine_env_override in Drop, or use std::panic::catch_unwind to
ensure clear_context_engine_env_override runs), keep the existing
conversation_selector_env_lock() usage and run context_engine_id_from_env in the
spawned thread as before so the override is always cleared even on panic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1cf04380-7e6c-4a9b-bde9-65e4420c3cfd
📒 Files selected for processing (4)
crates/app/src/context.rscrates/app/src/conversation/context_engine_registry.rscrates/app/src/conversation/tests.rscrates/app/src/conversation/turn_middleware_registry.rs
|
Follow-up after the first CI rerun exposed one more default-suite race on Ubuntu:
Local verification after the follow-up commit (
Everything passed locally. The new GitHub Actions rerun is pending. |
|
Second follow-up pushed in This addresses the remaining Windows default-suite failure:
Local verification for this follow-up:
Note: the previous |
|
Also refreshed The rerun surfaced the usual governance freshness gate again (
That change is only the generated timestamp refresh needed to keep the tracked monthly drift report current. |
|
LoongClaw AI scientist has claimed this case and is running review execution. |
LoongClaw QA Review — PR #1220Reviewed commit: Findings
Coverage
Open Questions
VerdictNeeds a panic-safe cleanup guard in the new context-engine override test, and the branch also needs a rebase before it is merge-ready. |
2ec3dc5 to
eefbd47
Compare
The flaky conversation tests relied on thread-local env overrides and ambient filesystem defaults. Tokio multi-threaded tests and default audit or file roots allowed state to disappear or leak across unrelated tests. Use process-wide synchronized test overrides, bind file tool roots to harness temp directories, and keep the bootstrap audit sink in-memory for isolated tests. Constraint: Tests must stay compatible with existing runtime selection helpers and harness setup Rejected: Serialize the entire suite | hides the isolation bug and slows CI Rejected: Add retries or sleeps | masks shared-state defects instead of fixing them Confidence: high Scope-risk: narrow Directive: Test-only runtime overrides must remain visible across spawned threads and must not depend on ambient cwd or HOME-derived defaults Tested: cargo fmt --all -- --check Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: cargo test --workspace --locked Tested: cargo test --workspace --all-features --locked Tested: ./scripts/check_architecture_boundaries.sh Not-tested: GitHub Actions rerun for this branch
The remaining Ubuntu default-suite failure came from tool lease secret initialization exposing an empty file between create and write. Feishu callback tests were the first reliable CI witness, but the defect lived in shared runtime secret creation. Stage the secret in a temporary file, persist it without clobbering an existing secret, and add a parallel-first-use regression test so the lease authority stays readable under concurrent startup. Constraint: Existing secret files must remain authoritative and invalid on-disk state must still fail closed Rejected: Serialize callers with a process-wide lock | would not protect multi-process first use and hides the file publication race Rejected: Regenerate over empty or malformed secret files | weakens fail-closed behavior for potentially tampered state Confidence: high Scope-risk: narrow Directive: Runtime secrets that may be initialized concurrently must be published atomically instead of becoming visible before writes complete Tested: cargo fmt --all -- --check Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: cargo test -p loongclaw-app --lib issue_tool_lease_parallel_first_use_keeps_secret_readable -- --nocapture Tested: cargo test -p loongclaw-app --lib feishu_webhook_card_callback_delayed_update_waits_for_response_body_consumption -- --nocapture Tested: cargo test -p loongclaw-app --lib Tested: cargo test --workspace --locked Tested: cargo test --workspace --all-features --locked Tested: ./scripts/check_architecture_boundaries.sh Related: eastreams#1219 Not-tested: GitHub Actions rerun after this follow-up commit
The remaining Windows failure came from the loser path after concurrent secret creation. On Windows, a competing publish could report AlreadyExists before the winning file became readable to a follow-up load, which left unrelated tool-search tests failing under another test's temporary LOONG_HOME. Keep the atomic publish path, but treat the loser branch as a bounded publication-wait path instead of a single immediate read, and add a regression that exercises delayed secret visibility. Constraint: Secret files must still fail closed when contents are empty, malformed, or tampered Rejected: Fall back to generating a fresh secret on loser reads | would desynchronize leases across concurrent publishers Rejected: Broaden environment locking around all tool tests | hides the publication race instead of fixing the shared runtime primitive Confidence: high Scope-risk: narrow Directive: Shared runtime secret stores must separate atomic publication from bounded post-publish visibility handling on Windows-class filesystems Tested: cargo fmt --all Tested: cargo test -p loongclaw-app --lib read_tool_lease_secret_after_competitor_publish_waits_for_visible_secret -- --nocapture Tested: cargo test -p loongclaw-app --lib issue_tool_lease_parallel_first_use_keeps_secret_readable -- --nocapture Tested: cargo test -p loongclaw-app --lib tool_search_matches_multilingual_queries_across_languages -- --nocapture Tested: cargo test -p loongclaw-app --lib Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: ./scripts/check_architecture_boundaries.sh Related: eastreams#1219 Not-tested: GitHub Actions rerun after this Windows-focused follow-up
Rebased the PR branch onto current dev, replaced the manual cleanup in the process-wide context-engine override test with a scoped drop guard, and refreshed the tracked April architecture drift snapshot on the rebased tree. Constraint: The override path is process-global under test, so panic cleanup must restore prior state Rejected: Keep manual clear_context_engine_env_override cleanup | leaks global override state if the test panics before teardown Confidence: high Scope-risk: narrow Reversibility: clean Directive: Any future process-wide test override must use scoped cleanup rather than trailing manual reset calls Tested: cargo fmt --all -- --check Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: cargo test -p loongclaw-app --lib scoped_context_engine_env_override_clears_on_panic -- --nocapture Tested: cargo test -p loongclaw-app --lib context_engine_env_override_is_visible_across_threads -- --nocapture Tested: cargo test -p loongclaw-app --lib turn_middleware_env_override_is_visible_across_threads -- --nocapture Tested: cargo test -p loongclaw-app --lib issue_tool_lease_parallel_first_use_keeps_secret_readable -- --nocapture Tested: cargo test -p loongclaw-app --lib read_tool_lease_secret_after_competitor_publish_waits_for_visible_secret -- --nocapture Tested: cargo test -p loongclaw-app --lib feishu_webhook_card_callback_delayed_update_waits_for_response_body_consumption -- --nocapture Tested: cargo test -p loongclaw-app --lib tool_search_matches_multilingual_queries_across_languages -- --nocapture Tested: cargo test -p loongclaw-app --lib Tested: cargo test --workspace --locked Tested: cargo test --workspace --all-features --locked Tested: scripts/generate_architecture_drift_report.sh docs/releases/architecture-drift-2026-04.md Tested: bash scripts/check_architecture_drift_freshness.sh docs/releases/architecture-drift-2026-04.md Tested: ./scripts/check_architecture_boundaries.sh Not-tested: GitHub Actions rerun after push
eefbd47 to
98f6284
Compare
gh-xj
left a comment
There was a problem hiding this comment.
Reviewed the rebased head 98f6284 after conflict resolution and the panic-safe cleanup fix. Local verification is green (cargo fmt --all -- --check, cargo clippy --workspace --all-targets --all-features -- -D warnings, cargo test -p loongclaw-app --lib, cargo test --workspace --locked, cargo test --workspace --all-features --locked, ./scripts/check_architecture_boundaries.sh), and the required GitHub checks are passing.
LoongClaw QA Review — PR #1220Reviewed commit: Findings
Coverage
Open Questions
VerdictThe branch is conflict-free, required checks are green, and the code is ready to merge once a non-pusher approving review is recorded. |
Summary
Full-suite Rust tests could fail nondeterministically because conversation-runtime tests relied on thread-local selector overrides and ambient filesystem or audit defaults.
The failures kept CI red, obscured real regressions, and made conversation-runtime work look unstable even when the product code was unchanged.
tools.file_rootto each harness temp directory for the multi-step file tool testsLinked Issues
Change Type
Touched Areas
Risk Track
Validation
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-features -- -D warningscargo test --workspace --lockedcargo test --workspace --all-features --lockedCommands and evidence:
User-visible / Operator-visible Changes
Failure Recovery
Revert
dddf25d0e.New tests that depend on ambient cwd, HOME-derived defaults, or per-thread selector overrides instead of harness-scoped state.
Reviewer Focus
crates/app/src/conversation/context_engine_registry.rscrates/app/src/conversation/turn_middleware_registry.rscrates/app/src/conversation/tests.rscrates/app/src/context.rsPlease focus on whether the new synchronized test overrides stay strictly test-only and whether the harness file-root helper covers the multi-step file-tool chain without leaking production semantics.
Summary by CodeRabbit