Conversation
…scaffolding

Two issues observed in the 2026-04-27 transcript on dispute 096fb2e4-…:

1. Duplicate session reopened after summary delivery. `deliver_summary` transitioned the session `summary_pending → summary_delivered → closed` synchronously. The eligibility predicate in `mediation::eligibility` treats a session in state `closed` as "session ended, a fresh one may open" (re-dispute semantics), so the next engine tick saw the dispute as eligible again and opened a second session, sending a brand-new English greeting and re-summarising to the solver.

   Fix: keep the session in `summary_delivered`. The eligibility EXISTS clause already blocks re-mediation while the session is in any non-(`closed`, `superseded_by_human`) state, and `latest_open_session_for` / `list_live_sessions` already excluded `summary_delivered`. The legal `summary_delivered → closed` transition is taken later by the `dispute_resolved` handler when Mostro closes the dispute.

2. Visible "Buyer:" / "Seller:" / "Round N." prefixes leaked transcript scaffolding into the user-facing chat. The drafter used those prefixes to keep the inner event id unique between simultaneous buyer/seller sends.

   Fix: drop the visible prefixes and ride the audience out-of-band on a custom `m-aud` tag on the inner event (see `chat::outbound::build_wrap_with_audience`). The tag changes the inner event id without changing user-visible content, preserving the `(session_id, inner_event_id)` uniqueness invariant on `mediation_messages`.

Tests updated:

- `tests/phase3_cooperative_summary.rs` and `tests/phase3_followup_summary.rs` now expect the post-summary session state to be `summary_delivered` rather than `closed`.
- `tests/phase3_followup_round.rs` no longer asserts the `"Round N. "` prefix on the user-visible body and explicitly forbids any `Round-…` prefix from leaking.
- `tests/phase3_session_open.rs` no longer asserts the party label (`buyer` / `seller`) inside the user-visible body; the `party` column on `mediation_messages` remains the routing source of truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
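The id-uniqueness argument behind the `m-aud` tag can be sketched with a self-contained toy. Everything below is an illustrative stand-in — real Nostr inner-event ids are a SHA-256 over the canonical event serialization, and the real builder is `chat::outbound::build_wrap_with_audience`; a std hasher and a minimal struct are used here only to show why a non-rendered tag changes the id while the visible content stays identical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal stand-in for an unsigned inner event: just content + tags.
struct InnerEvent {
    content: String,
    tags: Vec<(String, String)>,
}

impl InnerEvent {
    /// Toy id: hash over content and tags (the real id is SHA-256
    /// over the canonical JSON array, which also covers tags).
    fn id(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.content.hash(&mut h);
        for (k, v) in &self.tags {
            k.hash(&mut h);
            v.hash(&mut h);
        }
        h.finish()
    }
}

/// Hypothetical analogue of `build_wrap_with_audience`: same visible
/// content, audience carried on a non-rendered `m-aud` tag.
fn build_inner(content: &str, audience: Option<&str>) -> InnerEvent {
    let mut tags = Vec::new();
    if let Some(aud) = audience {
        tags.push(("m-aud".to_string(), aud.to_string()));
    }
    InnerEvent { content: content.to_string(), tags }
}

fn main() {
    let msg = "Please confirm the payment reference.";
    let buyer = build_inner(msg, Some("buyer"));
    let seller = build_inner(msg, Some("seller"));
    // Identical user-visible content…
    assert_eq!(buyer.content, seller.content);
    // …but distinct ids, so (session_id, inner_event_id) stays unique.
    assert_ne!(buyer.id(), seller.id());
}
```

With `audience = None` no tag is added, which mirrors why the legacy single-recipient path is unaffected.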
Walkthrough

The PR implements multi-party mediation messaging with language-aware policies and audience-tagged inner events, including a new language-matching system prompt policy.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 5 passed
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30c0fbde32
```diff
@@ -1294,7 +1340,6 @@ pub async fn deliver_summary(
     now,
```
Close summary-delivered sessions on dispute resolution
This change now leaves sessions at `summary_delivered`, but the only close path later (`dispute_resolved`) looks up sessions via `latest_open_session_for`, which explicitly excludes `summary_delivered` (`src/db/mediation.rs`, `state NOT IN ('closed','summary_delivered',...)`). That means a summarized session will never reach `closed`, and the eligibility predicate (`s.state NOT IN ('closed','superseded_by_human')`) will keep treating it as an active blocker, preventing future mediation opens/re-dispute flows for that `dispute_id`.
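The deadlock Codex describes can be sketched as a pair of state predicates. This is a toy model under stated assumptions — the enum and helper names are illustrative, not the actual queries in `src/db/mediation.rs`:

```rust
/// Session lifecycle states (subset relevant to the finding).
#[derive(Clone, Copy, PartialEq, Debug)]
enum SessionState {
    SummaryPending,
    SummaryDelivered,
    Closed,
    SupersededByHuman,
}
use SessionState::*;

/// Mirrors the eligibility predicate: anything not terminal blocks
/// opening a fresh session for the same dispute.
fn blocks_new_session(s: SessionState) -> bool {
    !matches!(s, Closed | SupersededByHuman)
}

/// Mirrors `latest_open_session_for`, which excludes
/// `summary_delivered` — so `dispute_resolved` never sees it.
fn visible_to_open_session_lookup(s: SessionState) -> bool {
    !matches!(s, Closed | SummaryDelivered | SupersededByHuman)
}

fn main() {
    let s = SummaryDelivered;
    // The wedge: the session blocks re-mediation…
    assert!(blocks_new_session(s));
    // …but the only close path cannot find it to close it, so
    // `dispute_resolved` needs a lookup that includes
    // `summary_delivered` (e.g. a dedicated closable-states query).
    assert!(!visible_to_open_session_lookup(s));
}
```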
```rust
body_len = body_preview.map(|s| s.len()).unwrap_or(0),
body_preview = body_preview.map(|s| truncate(s, 800)).unwrap_or(""),
note = note,
```
Remove raw response previews from info-level diagnostics
The new diagnostic logger emits up to 800 chars of raw provider response body at info level on failures. In malformed/non-200 cases, OpenAI-compatible gateways can include generated content or echoed request fragments, which may contain party transcript text and rationale; this breaks the existing metadata-only logging discipline and leaks sensitive dispute data into default production logs (RUST_LOG=info).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
src/reasoning/openai.rs (1)
379-433: ⚠️ Potential issue | 🟠 Major — Body preview at `info!` may leak party statements / PII.

`log_response_failure` is invoked on non-success HTTP status, envelope JSON parse failure, and "empty choices", and in those branches it includes `body_preview` (up to 800 bytes) at `info!` level. The existing invariant comment at lines 434–440 (FR-120 / TC-103) explicitly states the full model output may contain party statements and free-text rationale that the spec forbids in general logs — which is exactly what may sit in the body when JSON envelope parsing fails (e.g., a gateway returns prose, a partially echoed request, or a model that ignored `response_format` and dumped the transcript-aware response). A truncated 800-byte preview at `info!` will be picked up by the default `RUST_LOG=info` config and persisted by any log aggregator.

Suggested mitigations (any one is fine):
- Drop `body_preview` to `debug!` so default deployments don't capture it; keep status/headers/length at `info!`.
- Reduce the truncation budget (e.g., 200 bytes is enough to identify gateway HTML error pages).
- Hash + length the body (mirroring the success path at lines 442–448) instead of including the raw bytes.
🛡️ Example: keep operator-actionable metadata at `info!`, push raw body to `debug!`
```diff
 fn log_response_failure(
     attempt: u32,
     status: reqwest::StatusCode,
     headers: &HeaderMap,
     body_preview: Option<&str>,
     note: &str,
 ) {
     let header = |name: &str| -> String {
         headers
             .get(name)
             .and_then(|v| v.to_str().ok())
             .unwrap_or("(absent)")
             .to_string()
     };
     info!(
         attempt,
         %status,
         content_type = %header("content-type"),
         content_encoding = %header("content-encoding"),
         transfer_encoding = %header("transfer-encoding"),
         content_length = %header("content-length"),
         body_len = body_preview.map(|s| s.len()).unwrap_or(0),
-        body_preview = body_preview.map(|s| truncate(s, 800)).unwrap_or(""),
         note = note,
         "openai-compatible response failure (diagnostic dump)"
     );
+    if let Some(b) = body_preview {
+        debug!(attempt, %status, body_preview = %truncate(b, 800), "openai-compatible response body preview");
+    }
 }
```

Also applies to: 785-811
tests/phase3_followup_summary.rs (2)
393-410: ⚠️ Potential issue | 🟡 Minor — Stale "closed session" wording in the idempotency block.

The comment says "calling advance_session_round again on a closed session must skip (state gate blocks)" and the `expect` message echoes "second call on a closed session must be a no-op". The session is now in `summary_delivered`, not `closed`; update both so future readers don't chase a state that no longer exists at this point.

📝 Suggested fix
```diff
-    // (e) Idempotency for extra safety — calling advance_session_round
-    // again on a closed session must skip (state gate blocks) and
-    // NOT add more rows.
+    // (e) Idempotency for extra safety — calling advance_session_round
+    // again on a session sitting at `summary_delivered` must skip
+    // (state gate blocks anything other than `awaiting_response`)
+    // and NOT add more rows.
@@
-        .expect("second call on a closed session must be a no-op");
+        .expect("second call on a summary_delivered session must be a no-op");
```
1-19: ⚠️ Potential issue | 🟡 Minor — Stale top-of-file doc — still mentions the `summary_delivered → closed` walk.

The module doc lists the lifecycle as `classified → summary_pending → summary_delivered → closed`, but the test now asserts the session stops at `summary_delivered`. Update so the spec the tests pin and the doc comment agree.

📝 Suggested doc fix
```diff
-//! - invokes `deliver_summary` exactly once, which walks the
-//!   session through `classified → summary_pending →
-//!   summary_delivered → closed`;
+//! - invokes `deliver_summary` exactly once, which walks the
+//!   session through `classified → summary_pending →
+//!   summary_delivered`. The legal `summary_delivered → closed`
+//!   transition is intentionally deferred to the
+//!   `dispute_resolved` handler (Mostro closing the dispute).
```

src/mediation/mod.rs (3)
1053-1058: ⚠️ Potential issue | 🟡 Minor — Stale `deliver_summary` doc — state machine no longer ends at `closed`.

The function-level doc says `State machine: classified → summary_pending → summary_delivered → closed`, but the implementation now intentionally stops at `summary_delivered` (the inline comment block at Line 1297-1318 explains why). The two doc passages should agree — the top one is what shows up in rustdoc and is the first thing a maintainer reads.

📝 Suggested fix
```diff
-/// State machine: `classified → summary_pending → summary_delivered
-/// → closed`. Each transition is written with
-/// [`db::mediation::set_session_state`] (which `debug_assert!`s
-/// legality in debug builds).
+/// State machine: `classified → summary_pending →
+/// summary_delivered`. Each transition is written with
+/// [`db::mediation::set_session_state`] (which `debug_assert!`s
+/// legality in debug builds). The legal `summary_delivered →
+/// closed` step is intentionally NOT taken here — see the inline
+/// comment in step (6) for the eligibility-predicate rationale —
+/// and is performed later by the `dispute_resolved` handler when
+/// Mostro closes the dispute.
```
186-204: ⚠️ Potential issue | 🟡 Minor — Stale `draft_and_send_initial_message` doc — still describes role prefixes.

The contract bullet says "Builds per-party gift-wraps with role prefixes so the inner event ids cannot collide on identical content", but the implementation now uses the `m-aud` audience tag for that. Future maintainers reading rustdoc will look for code that doesn't exist.

📝 Suggested fix
```diff
 /// Contract:
-/// - Builds per-party gift-wraps with role prefixes so the inner
-///   event ids cannot collide on identical content.
+/// - Builds per-party gift-wraps using
+///   `outbound::build_wrap_with_audience` with `Some("buyer")` /
+///   `Some("seller")` so the inner event ids cannot collide on
+///   identical content (the audience rides on a non-rendered
+///   `m-aud` tag rather than mixed into the user-visible body).
```
352-394: ⚠️ Potential issue | 🟡 Minor — Stale `draft_and_send_followup_message` doc — still describes the "Round N. role:" body prefix.

Bullet (1) says the body is prefixed with `"Round {N}. {role}: "` and the closing sentence says "The round-number and role prefixes are cosmetic and do not affect `policy_hash`". The implementation now drops both — `round_number` only feeds the trace span and `m-aud` carries the audience. Update so the rustdoc matches the code; otherwise this is the first place a future contributor will look when reasoning about the visible body.

📝 Suggested fix
```diff
-/// 1. **Round-number marker in the body.** Each party-facing content
-///    is prefixed with `"Round {N}. {role}: "` so parties can
-///    distinguish a follow-up question from the opening one.
-///    `round_number` is the 1-based round counter (round 1 is the
-///    first follow-up, *after* the opening round).
+/// 1. **Round-number is metadata-only.** `round_number` is the
+///    1-based mid-session counter (round 1 is the first follow-up,
+///    *after* the opening round). It is recorded on the tracing
+///    span and audit logs but is NOT prefixed into the user-visible
+///    body — that scaffolding was leaking into chat (observed
+///    2026-04-27). Inner-event-id distinctness for identical
+///    buyer/seller text is preserved by the per-party `m-aud` tag
+///    on the inner event (see `build_wrap_with_audience`).
@@
-/// The prompt bundle is NOT modified by this path: the `content`
-/// pushed through the gift-wrap is the per-party text (`buyer_text`
-/// / `seller_text`) as returned by `policy::evaluate` from the same
-/// pinned bundle the session was opened with. The round-number and
-/// role prefixes are cosmetic and do not affect `policy_hash`.
+/// The prompt bundle is NOT modified by this path: the `content`
+/// pushed through the gift-wrap is the per-party text (`buyer_text`
+/// / `seller_text`) as returned by `policy::evaluate` from the same
+/// pinned bundle the session was opened with.
```

tests/phase3_followup_round.rs (2)
154-160: ⚠️ Potential issue | 🟡 Minor — Seed-rationale comment partially stale.

The seed is still required so `count_classification_events` returns the correct `followup_number` for the policy bypass-window, but the rationale "or the Round-N label renders as `Round 0. ...`" no longer applies — the Round-N label is no longer mixed into user-visible content. Reword so future maintainers don't hunt for a label that does not exist.

📝 Suggested fix
```diff
-    // One `classification_produced` event to stand in for the
-    // round-0 initial classification. In production the initial
-    // flow writes one such row; `advance_session_round` derives
-    // `followup_number` from the count of these events (the first
-    // mid-session eval is N=1 = after 1 prior classification),
-    // so the seed must include it or the Round-N label renders
-    // as "Round 0. ...".
+    // One `classification_produced` event to stand in for the
+    // round-0 initial classification. In production the initial
+    // flow writes one such row; `advance_session_round` derives
+    // `followup_number` from the count of these events (the first
+    // mid-session eval is N=1 = after 1 prior classification).
+    // The seed must include it so the policy bypass-window logic
+    // sees the correct ordinal — the visible "Round N." prefix is
+    // gone, but the trace span / audit logs still want a sane
+    // counter.
```
295-297: ⚠️ Potential issue | 🟡 Minor — Stale comment directly contradicts the new assertions.

This block-comment claims "the new two carry the `Round 1. ` prefix", but the assertion at Line 335-338 explicitly forbids any `Round`-prefixed body. Update so the inline guidance matches what the test actually verifies.

📝 Suggested fix
```diff
-    // Assert: 4 outbound rows (round 0 × 2 + round 1 × 2), the new
-    // two carry the `"Round 1. "` prefix, and the session stays in
-    // `awaiting_response` with the marker advanced.
+    // Assert: 4 outbound rows (round 0 × 2 + round 1 × 2), the new
+    // two carry the model-supplied follow-up text verbatim (no
+    // `"Round N. "` body prefix any more — the audience now rides
+    // out-of-band on the inner event's `m-aud` tag), and the
+    // session stays in `awaiting_response` with the marker advanced.
```

src/mediation/follow_up.rs (1)
380-383: ⚠️ Potential issue | 🟡 Minor — Stale comment — session is no longer "closed" at this point.

After `deliver_summary` returns `Ok`, the session is now at `summary_delivered`, not `closed` (the `summary_delivered → closed` step is deferred to `dispute_resolved`). The marker-advance reasoning is unchanged, but the parenthetical is misleading.

📝 Suggested fix
```diff
-    // Mark the round evaluated. Even though the session is
-    // now terminal (closed), keeping the marker current is
-    // a cheap invariant — a future tick never mistakes an
-    // evaluated round for an unevaluated one.
+    // Mark the round evaluated. Even though the session is
+    // now sitting at `summary_delivered` (and the state
+    // gate at the top of this function will skip it on the
+    // next tick), keeping the marker current is a cheap
+    // invariant — a future tick never mistakes an evaluated
+    // round for an unevaluated one.
```

src/chat/outbound.rs (1)
79-148: ⚠️ Potential issue | 🟠 Major — Add tests covering the audience-tagging path.

The new behavior whose correctness is load-bearing for the `(session_id, inner_event_id)` uniqueness invariant — namely that two builds with identical `message` but different `audience` (e.g. `Some("buyer")` vs `Some("seller")`) produce different `inner_event_id`s, and that `audience = None` matches the legacy `build_wrap` output — is not exercised by the existing tests. A regression in the conditional `m-aud` tag application (e.g. accidentally building the inner event before applying the tag) would silently cause inner-event-id collisions and trigger the dedup-invariant guard at runtime in `mediation::draft_and_send_initial_message` / `draft_and_send_followup_message`, rather than failing in CI.

🧪 Suggested unit test additions
```rust
#[tokio::test]
async fn audience_tag_changes_inner_event_id_for_identical_message() {
    let sender = Keys::generate();
    let shared = Keys::generate();
    let msg = "same content for both parties";

    let buyer = build_wrap_with_audience(&sender, &shared.public_key(), msg, Some("buyer"))
        .await
        .unwrap();
    let seller = build_wrap_with_audience(&sender, &shared.public_key(), msg, Some("seller"))
        .await
        .unwrap();

    assert_ne!(
        buyer.inner_event_id, seller.inner_event_id,
        "m-aud tag must change inner_event_id so the (session_id, inner_event_id) \
         uniqueness invariant is preserved across parties"
    );
}

#[tokio::test]
async fn inner_event_carries_m_aud_tag_when_audience_set() {
    let sender = Keys::generate();
    let buyer = Keys::generate();
    let shared = derive_shared_keys(&sender, &buyer.public_key()).unwrap();

    let built = build_wrap_with_audience(&sender, &shared.public_key(), "hi", Some("buyer"))
        .await
        .unwrap();

    let inner = unwrap_with_shared_key(&shared, &built.outer).unwrap();
    // Verify the m-aud tag rides on the inner event so ingest-side
    // tooling can rely on it.
    assert_eq!(inner.content, "hi");
}
```
🧹 Nitpick comments (6)
src/reasoning/openai.rs (2)
349-374: Body read failure classified as `MalformedResponse`.

A failure inside `resp.bytes().await` is a transport-layer issue (truncated stream, broken connection, decompression error) rather than a malformed response shape. Mapping it to `ReasoningError::MalformedResponse` collapses two distinct failure modes into one variant — operators reading the final error after retries are exhausted will see "MalformedResponse" for what is really a network/connectivity problem.

The retry behavior is already correct (treated as transient); only the terminal classification is misleading. Consider `Unreachable` (or a dedicated transport variant) so failure modes stay distinguishable in logs and metrics.

```diff
-                last_err = Some(ReasoningError::MalformedResponse(format!(
-                    "body bytes read failed (status={status}): {e}"
-                )));
+                last_err = Some(ReasoningError::Unreachable(format!(
+                    "body bytes read failed (status={status}): {e}"
+                )));
```
1041-1083: Consider a regression test for envelope JSON failure.

`tests/reasoning_anthropic.rs` has `malformed_json_classification_surfaces_as_error` (context snippet 1), which asserts non-JSON prose surfaces as `ReasoningError::MalformedResponse`. The new `post_chat` envelope-parse error path (lines 403–418) is now the openai equivalent and is currently uncovered — a small mock-server test mirroring the anthropic one would pin the new diagnostic behavior so future tweaks to `log_response_failure` / `text_str` can't silently regress the public error contract.

Want me to draft the openai-side test mirroring the anthropic pattern?
tests/phase3_cooperative_summary.rs (1)
130-131: Test function name is now misleading.

`cooperative_summary_closes_session_and_notifies_assigned_solver` no longer reflects the asserted behavior — the session intentionally stops at `summary_delivered` rather than `closed`. Rename to something like `cooperative_summary_delivers_to_assigned_solver` to keep the suite self-documenting.
tests/phase3_followup_summary.rs (1)
209-209: Test function name is now misleading.

`summarize_branch_delivers_summary_once_and_closes_session` asserts a session that stops at `summary_delivered`, not `closed`. Consider renaming to `summarize_branch_delivers_summary_once_and_halts_at_summary_delivered` (or similar) to keep the suite self-documenting.
src/mediation/mod.rs (1)
418-423: Defensive `let _ = round_number;` is redundant.

`round_number` is already referenced by `#[instrument(..., fields(round = round_number))]` on Line 395, so the compiler will not warn about an unused binding. The `let _ =` line is harmless but adds noise. Optional cleanup — keep the explanatory comment if you want, but the binding itself is not load-bearing.
Verify each finding against the current code and only fix it if needed. In `@src/mediation/mod.rs` around lines 418 - 423, remove the redundant defensive binding "let _ = round_number;" (and optionally keep or move the explanatory comment) because the #[instrument(..., fields(round = round_number))] on the surrounding function already references round_number; edit the block containing round_number, buyer_content, and seller_content to delete that pointless binding so the code is cleaner while leaving the trim-and-to_string calls for buyer_text and seller_text intact.
prompts/phase3-system.md (1)
48-65: Consider clarifying ambiguous-language fallback.
The policy reads cleanly for the primary case, but two edge cases are worth pinning so the model has explicit guidance instead of having to choose:
- Very short / language-ambiguous first replies (`"ok"`, `"sí"`, `"thanks"`) where confident detection is impossible — does the model keep the existing default (English) or switch on weak signal?
- Mixed-language replies (e.g. Spanglish) — pick the dominant language, mirror the writer's mix, or fall back to English?
Tightening these in the prompt avoids language-flapping across rounds, which would itself be a user-visible regression. Optional; safe to defer if you want to see real traffic first.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prompts/phase3-system.md` around lines 48 - 65, The Language Matching rules lack explicit fallback for ambiguous or mixed-language first replies; update the Language Matching section to add two concrete tie-breakers: (1) for very short or low-confidence detections (e.g., "ok", "sí", "thanks") keep the existing default English until a subsequent high-confidence message switches the buyer_clarification or seller_clarification, and (2) for mixed-language messages choose the dominant language by token proportion (if one language >60% use it) otherwise keep English; ensure these rules are applied independently to buyer_clarification and seller_clarification and note that solver-facing outputs remain English.
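The two tie-breakers can be sketched as a tiny classifier. Everything below — the function name, the word lists, the stand-in for "confidence" — is illustrative only, not the project's actual detector; a real implementation would sit behind the prompt/model, and only the 60%-dominance and short-reply rules come from the suggestion above.

```rust
/// Illustrative tie-breaker sketch, not the project's detector:
/// keep English on very short replies, switch only when one
/// language dominates by token proportion (> 60%).
fn pick_reply_language(message: &str) -> &'static str {
    // Tiny stopword lists standing in for a real language detector.
    const ES: &[&str] = &["sí", "si", "gracias", "hola", "el", "la", "que", "no", "pero", "ya"];
    const EN: &[&str] = &["yes", "thanks", "hello", "the", "that", "no", "but", "ok", "already"];
    let tokens: Vec<String> = message
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();
    let total = tokens.len();
    if total < 3 {
        return "en"; // very short reply: too weak a signal to switch
    }
    let es_hits = tokens.iter().filter(|t| ES.contains(&t.as_str())).count();
    let en_hits = tokens.iter().filter(|t| EN.contains(&t.as_str())).count();
    if es_hits * 10 > total * 6 {
        "es"
    } else if en_hits * 10 > total * 6 {
        "en"
    } else {
        "en" // mixed or low-confidence: fall back to the English default
    }
}

fn main() {
    assert_eq!(pick_reply_language("sí"), "en"); // too short to switch
    assert_eq!(pick_reply_language("gracias ya no hola el"), "es");
    assert_eq!(pick_reply_language("ok gracias the hola yes"), "en"); // mixed
}
```

Applied independently to buyer_clarification and seller_clarification, this keeps the language stable across rounds instead of flapping on weak signals.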
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/reasoning/openai.rs`:
- Around line 558-595: The extract_json_object function can let depth go
negative if stray '}' appears before the first '{', so update the '}' handling
in extract_json_object to clamp depth (e.g., only decrement if depth > 0 or set
depth = (depth - 1).max(0)) so a leading unmatched '}' doesn't prevent setting
start on the first real '{'; keep the rest of the logic (setting start when b'{'
and depth == 0, returning &raw[s..=i]) unchanged.
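A minimal sketch of the clamped scanner the prompt describes. The function name matches the review, but the body is a simplification: the real extractor would also have to cope with braces inside JSON string literals, which is omitted here.

```rust
/// Return the first balanced `{...}` object in `raw`, tolerating
/// stray `}` bytes before the first `{`. Simplified sketch: does not
/// handle braces inside JSON string literals.
fn extract_json_object(raw: &[u8]) -> Option<&[u8]> {
    let mut depth: i32 = 0;
    let mut start: Option<usize> = None;
    for (i, &b) in raw.iter().enumerate() {
        match b {
            b'{' => {
                if depth == 0 {
                    start = Some(i);
                }
                depth += 1;
            }
            b'}' => {
                // Clamp at zero so a leading unmatched `}` cannot push
                // depth negative and shift the depth == 0 check off by one.
                depth = (depth - 1).max(0);
                if depth == 0 {
                    if let Some(s) = start {
                        return Some(&raw[s..=i]);
                    }
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // A stray `}` before the payload must not break extraction.
    let raw = b"} noise {\"ok\": true} trailing";
    assert_eq!(extract_json_object(raw), Some(&b"{\"ok\": true}"[..]));
    assert_eq!(extract_json_object(b"no braces"), None);
}
```

Without the clamp, the leading `}` drives depth to -1, so the first real `{` is seen at depth -1 → 0 and never recorded as a candidate start.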
---
Outside diff comments:
In `@src/chat/outbound.rs`:
- Around line 79-148: Add unit tests for build_wrap_with_audience: (1) generate
sender and shared keys and assert that calling build_wrap_with_audience with the
same message but different audience strings (e.g., Some("buyer") vs
Some("seller")) produces different inner_event_id values so the (session_id,
inner_event_id) uniqueness invariant holds; (2) derive shared keys (e.g.,
derive_shared_keys or equivalent), call build_wrap_with_audience with
Some("buyer"), unwrap the outer with the shared key (e.g.,
unwrap_with_shared_key) and assert the inner event content equals the original
message and contains the m-aud tag when audience is set; and (3) assert that
calling build_wrap_with_audience with audience = None matches the legacy
build_wrap output (inner_event_id and inner content) to ensure backward
compatibility. Ensure tests use Keys::generate and reference
build_wrap_with_audience, inner_event_id, and the m-aud tag logic.
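The invariant those tests pin down can be shown in miniature. This is not the nostr implementation — per NIP-01 the real inner event id is a SHA-256 over the serialized event, tags included — and `DefaultHasher` stands in here only to keep the sketch dependency-free; the property demonstrated is the same.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the NIP-01 inner event id: any id computed over the
/// serialized event including its tags changes when a tag changes,
/// even if the content is identical.
fn inner_event_id(content: &str, tags: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    tags.hash(&mut h);
    h.finish()
}

fn main() {
    let body = "Please describe what happened with the payment.";
    // Same user-visible content, different `m-aud` audiences →
    // different inner ids, so (session_id, inner_event_id) stays unique.
    let buyer = inner_event_id(body, &[("m-aud", "buyer")]);
    let seller = inner_event_id(body, &[("m-aud", "seller")]);
    assert_ne!(buyer, seller);
    // audience = None attaches no tag: the legacy id is reproduced.
    assert_eq!(inner_event_id(body, &[]), inner_event_id(body, &[]));
}
```

This is exactly why the `m-aud` tag can replace the visible "Buyer:" / "Seller:" prefixes without weakening the uniqueness invariant on `mediation_messages`.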
In `@src/mediation/follow_up.rs`:
- Around line 380-383: Update the stale inline comment in follow_up.rs that
currently states the session is "closed" after deliver_summary returns; change
it to reflect that after deliver_summary returns Ok the session is in the
summary_delivered state (not closed) and that the parenthetical about deferring
summary_delivered → closed to dispute_resolved remains accurate; locate the
comment near the marker-advance logic in the function handling deliver_summary
(the block that marks the round evaluated) and revise the parenthetical wording
to mention summary_delivered instead of closed while keeping the existing
rationale about preventing future ticks from mistaking evaluated rounds for
unevaluated ones.
In `@src/mediation/mod.rs`:
- Around line 1053-1058: Update the top-level doc for deliver_summary to match
the implementation: change the state machine description from "classified →
summary_pending → summary_delivered → closed" to "classified → summary_pending →
summary_delivered" (or otherwise indicate it intentionally stops at
summary_delivered), and ensure the doc mentions that transitions are performed
via db::mediation::set_session_state; keep the existing inline comment
explanation intact so docs and code agree about why the session does not
progress to closed.
- Around line 186-204: The doc for draft_and_send_initial_message is stale: it
still claims the code "Builds per-party gift-wraps with role prefixes" but the
implementation uses the m-aud audience tag instead; update the rustdoc on
draft_and_send_initial_message to reflect the current behavior (mention that
inner event id uniqueness is ensured via the m-aud audience tag, not
role-prefixed wraps), keep the rest of the bullets about transactional outbox,
bounded retry, emitting outbound_sent after successful publish
(mediation_events), and the session state transition (reference
session::open_session) intact, and remove or reword any references to role
prefixes so docs match the code.
- Around line 352-394: The doc comment for draft_and_send_followup_message is
outdated: it still claims the body is prefixed with "Round {N}. {role}:" and
that round-number/role prefixes are cosmetic, but the implementation no longer
adds those prefixes (round_number is only used for spans and m-aud carries
audience); update the rustdoc for draft_and_send_followup_message to remove the
bullet about body prefixes, clarify that round_number is used only for
tracing/span purposes and that audience is conveyed via the m-aud claim, and
keep the note about policy_hash unchanged (or explicitly state prefixes are not
used so policy_hash is unaffected); reference the function name
draft_and_send_followup_message and the symbols round_number, m-aud, and
policy_hash when editing the commentary.
In `@src/reasoning/openai.rs`:
- Around line 379-433: log_response_failure currently logs a raw/truncated
body_preview at info!, which can leak PII; change its behavior so that
info-level logs only include safe metadata (status, headers, body length) and
push any raw body content to debug! (or replace the preview with a hash +
length), and update all call sites (the non-success HTTP branch that constructs
ReasoningError::Unreachable, the serde_json::from_str parse-error branch that
returns ReasoningError::MalformedResponse, and the empty-choices branch) to stop
passing the raw body for info-level logging; you can implement this by modifying
log_response_failure to accept an enum/flag (e.g., BodyLog::DebugOrHash) or to
always compute a short SHA-256 + length for info and emit the full truncated
body only at debug, and reuse the same change for the other occurrences of
log_response_failure elsewhere in the file.
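The hash-plus-length idea reads like this in miniature. `safe_body_fields` is a hypothetical helper name, and `DefaultHasher` stands in for the SHA-256 the suggestion actually calls for (the real code would use something like the `sha2` crate).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// PII-safe fields for info-level logging: body length plus a short
/// digest prefix. A real implementation would use SHA-256; std's
/// DefaultHasher keeps this sketch dependency-free.
fn safe_body_fields(body: &str) -> (usize, String) {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    (body.len(), format!("{:016x}", h.finish()))
}

fn main() {
    let body = r#"{"error": "rationale text that must not leak"}"#;
    let (len, digest) = safe_body_fields(body);
    // info! carries only (status, headers, len, digest); the truncated
    // raw preview is gated behind debug!.
    assert_eq!(len, body.len());
    assert_eq!(digest.len(), 16);
}
```

The digest still lets operators correlate repeated failures (identical bodies hash identically) without ever putting response text into general logs.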
In `@tests/phase3_followup_round.rs`:
- Around line 154-160: Update the seed-rationale comment near the
`classification_produced` test seed so it no longer mentions a now-removed
user-facing "Round-N" label; instead state concisely that the seed is required
so `count_classification_events` (and therefore `advance_session_round`) returns
the correct `followup_number` used by the policy bypass-window logic. Keep the
reference to `classification_produced`, `count_classification_events`, and
`advance_session_round` so maintainers can find the relevant code paths.
- Around line 295-297: The block comment above the assertions in
tests/phase3_followup_round.rs is stale and claims "the new two carry the `Round
1. ` prefix" which contradicts the subsequent assertions that explicitly forbid
any `Round `-prefixed body (the assertions around the outbound rows check).
Update that comment to state that the new two do NOT carry the `Round 1. `
prefix (or remove the prefix claim entirely), and keep the rest of the guidance
about expecting 4 outbound rows and the session remaining in `awaiting_response`
with the marker advanced so the comment matches the actual assertions.
In `@tests/phase3_followup_summary.rs`:
- Around line 393-410: Update the stale wording that mentions "closed session"
to reflect the current session state `summary_delivered`: change the comment
above the idempotency test and the `.expect("second call on a closed session
must be a no-op")` message to say something like "calling advance_session_round
again on a session in `summary_delivered` must skip (state gate blocks)" and
".expect(\"second call on a session in summary_delivered must be a no-op\")" so
the comment and the `advance_session_round` expectation match the actual session
state and avoid confusion.
- Around line 1-19: Update the stale module doc at the top of
tests/phase3_followup_summary.rs so the lifecycle string matches the test
assertions (the session stops at summary_delivered); replace the current
"classified → summary_pending → summary_delivered → closed" text with the
corrected lifecycle (e.g., "classified → summary_pending → summary_delivered")
so the doc and the assertions about advance_session_round/deliver_summary align.
---
Nitpick comments:
In `@prompts/phase3-system.md`:
- Around line 48-65: The Language Matching rules lack explicit fallback for
ambiguous or mixed-language first replies; update the Language Matching section
to add two concrete tie-breakers: (1) for very short or low-confidence
detections (e.g., "ok", "sí", "thanks") keep the existing default English until
a subsequent high-confidence message switches the buyer_clarification or
seller_clarification, and (2) for mixed-language messages choose the dominant
language by token proportion (if one language >60% use it) otherwise keep
English; ensure these rules are applied independently to buyer_clarification and
seller_clarification and note that solver-facing outputs remain English.
In `@src/mediation/mod.rs`:
- Around line 418-423: Remove the redundant defensive binding "let _ =
round_number;" (and optionally keep or move the explanatory comment) because the
#[instrument(..., fields(round = round_number))] on the surrounding function
already references round_number; edit the block containing round_number,
buyer_content, and seller_content to delete that pointless binding so the code
is cleaner while leaving the trim-and-to_string calls for buyer_text and
seller_text intact.
In `@src/reasoning/openai.rs`:
- Around line 349-374: The code currently treats failures from
resp.bytes().await as ReasoningError::MalformedResponse, which misclassifies
transport-layer issues; update the error handling around resp.bytes().await to
set last_err to a transport-oriented variant (e.g., ReasoningError::Unreachable
or a new ReasoningError::TransportError) instead of
ReasoningError::MalformedResponse, keep the existing log_response_failure call
and retry continue logic unchanged, and ensure the error message includes the
original error string and error_chain for diagnostics (refer to
resp.bytes().await, log_response_failure, last_err, and ReasoningError::* to
locate and change the assignment).
- Around line 1041-1083: Add a regression test that mirrors
tests/reasoning_anthropic.rs::malformed_json_classification_surfaces_as_error
but targets the OpenAI path that uses post_chat so the envelope JSON parse
failure returns ReasoningError::MalformedResponse; specifically, create a
mock-server test (e.g., in tests/reasoning_openai.rs) that makes
OpenAiProvider::new(...) and calls the code path that triggers post_chat, then
assert that log_response_failure/text_str handling yields a MalformedResponse
error rather than a generic parse or success; ensure the test exercises the
envelope-parse branch that currently uses post_chat and fails parsing so future
changes to log_response_failure or text_str cannot regress the public error
contract.
In `@tests/phase3_cooperative_summary.rs`:
- Around line 130-131: Rename the test function
cooperative_summary_closes_session_and_notifies_assigned_solver to reflect its
actual behavior (e.g., cooperative_summary_delivers_to_assigned_solver) by
updating the async test fn name in tests/phase3_cooperative_summary.rs (the
#[tokio::test] async fn declaration) and update any references to that function
name in the test suite (imports, mod references, or CI/test filters) so the test
still runs under its new descriptive name.
In `@tests/phase3_followup_summary.rs`:
- Line 209: The test function name
summarize_branch_delivers_summary_once_and_closes_session is misleading; rename
the async test function to
summarize_branch_delivers_summary_once_and_halts_at_summary_delivered and update
any references (attributes like #[tokio::test], mod uses, or callers) in the
file to match the new identifier so the test suite stays self-documenting and
continues to run.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 29fd69cf-11bc-40ce-a419-c829edefe4c5
📒 Files selected for processing (9)
prompts/phase3-system.md, src/chat/outbound.rs, src/mediation/follow_up.rs, src/mediation/mod.rs, src/reasoning/openai.rs, tests/phase3_cooperative_summary.rs, tests/phase3_followup_round.rs, tests/phase3_followup_summary.rs, tests/phase3_session_open.rs
```rust
fn is_minimal_param_model(model: &str) -> bool {
    // Curated list of models that publish `supported_parameters: []`
    // (or a near-empty set) on the PPQ.ai catalog. Hardcoded
    // because querying `/v1/models` per request would add latency
    // and a failure mode for a near-static fact. Extend as new
    // router/auto SKUs appear.
    matches!(model, "autoclaw" | "auto" | "switchpoint/router")
        || model.starts_with("gpt-5")
}
```
🧩 Analysis chain
🌐 Web query:
Does OpenAI's gpt-5 family reject the response_format=json_object parameter on the chat completions endpoint?
💡 Result:
No. OpenAI’s GPT-5 models (including the GPT-5 chat snapshot) accept the chat completions endpoint parameter response_format={"type":"json_object"} (i.e., “JSON mode”).
What the docs say:
1) The Chat Completions API reference defines response_format with a json_object option, describing it as “older JSON mode” and as ensuring the message is valid JSON.
2) OpenAI’s structured-outputs/json-mode guide explains that JSON mode uses response_format of type json_object.
So if you are seeing “rejections,” they’re likely model/feature mismatches or different response_format types (e.g., json_schema vs json_object) rather than the GPT-5 family rejecting json_object in general. For example, one community report shows an error for json_schema being “not supported” on gpt-5-chat-latest, which is consistent with the distinction between json_schema (Structured Outputs) support vs json_object (JSON mode).
Practical guidance: Prefer response_format type json_schema (Structured Outputs) when supported; otherwise use json_object (JSON mode) with the required prompt instruction to “produce JSON”.
Citations:
- 1: https://developers.openai.com/api/reference/resources/chat/subresources/completions
- 2: https://platform.openai.com/docs/guides/json-mode
- 3: https://developers.openai.com/docs/guides/structured-outputs?api-mode=chat
- 4: https://community.openai.com/t/gpt-5-chat-latest-text-format-of-type-json-schema-is-not-supported/1340112
🏁 Script executed:
# First, examine the function and surrounding lines (104-142) to understand full context
sed -n '104,142p' src/reasoning/openai.rs
Repository: MostroP2P/serbero
Length of output: 1831
🏁 Script executed:
# Find all usages of is_minimal_param_model
rg 'is_minimal_param_model' src/reasoning/openai.rs -B 2 -A 2
Repository: MostroP2P/serbero
Length of output: 706
🏁 Script executed:
# Look for related logic around JSON response format
rg 'response_format|json_object' src/reasoning/openai.rs -B 2 -A 2
Repository: MostroP2P/serbero
Length of output: 5217
🏁 Script executed:
web_search
Does OpenAI's GPT-5 models reject or have constraints on the temperature parameter in chat completions?
Repository: MostroP2P/serbero
Length of output: 181
🌐 Web query:
Does OpenAI's GPT-5 models reject or have constraints on the temperature parameter in chat completions?
💡 Result:
Yes. For OpenAI’s GPT-5 models used via chat completions, the API imposes constraints on the temperature parameter: temperature is effectively not supported except for the default value (1). If you send an explicit temperature other than 1 (e.g., 0.3 or 0), the API can reject the request with an “Unsupported value”/“Only the default (1) value is supported” error.
How to interpret “constraints” vs “reject”:
- In practice, providing temperature=0.7 (or any value other than 1) to GPT-5 models results in a 400 invalid_request_error.
- Supplying temperature=1 (or omitting temperature so the default is used) is accepted.
OpenAI’s current official docs for chat completions still describe temperature as a parameter in general, but they don’t describe GPT-5’s model-specific restriction on temperature in the snippets found here. The GPT-5 restriction is evidenced by API errors reported when calling GPT-5 models with unsupported temperature values. If you need controllable randomness with GPT-5 models, you’ll likely need to use a model family that supports temperature or rely on alternative knobs (for GPT-5 reasoning-capable usage, OpenAI emphasizes parameters like reasoning.effort rather than temperature).
Citations:
- 1: GPT-5 fail gracefully if temperature parameter is provided pydantic/pydantic-ai#2482
- 2: [BUG] Can't use GPT-5 model family - Unsupported parameter: `temperature` is not supported with this model. getzep/graphiti#874
- 3: [Bug]: OpenAI GPT-5 Chat model does not support "temperature" parameter BerriAI/litellm#13781
- 4: [GPT5] 400 Unsupported value: 'temperature' does not support 0 with this model QwenLM/qwen-code#224
- 5: https://www.reddit.com/r/OpenAI/comments/1mw7nl7/cant_we_set_temperature_parameter_anymore_with/
Split temperature and response_format restrictions — gpt-5 accepts JSON mode.
The web search confirms that OpenAI's gpt-5 family accepts response_format=json_object, contradicting the current code that omits it for all gpt-5* models. The gpt-5 temperature constraint is real (API rejects non-1.0 values), but that does not extend to response_format.
The refactoring is necessary: is_minimal_param_model currently controls both constraints in lockstep. Separate them so only gpt-5* temperature is omitted while response_format is sent normally. This restores the server-enforced JSON schema guarantee for gpt-5 models instead of relying solely on prompt instructions and extract_json_object.
♻️ Example split
+ /// Models that reject explicit `temperature` (must use server default of 1.0).
+ fn rejects_temperature(model: &str) -> bool {
+ model.starts_with("gpt-5")
+ }
+
- fn is_minimal_param_model(model: &str) -> bool {
- // Curated list of models that publish `supported_parameters: []`
- // (or a near-empty set) on the PPQ.ai catalog. Hardcoded
- // because querying `/v1/models` per request would add latency
- // and a failure mode for a near-static fact. Extend as new
- // router/auto SKUs appear.
- matches!(model, "autoclaw" | "auto" | "switchpoint/router")
- || model.starts_with("gpt-5")
+ /// Models that reject `response_format = json_object` (PPQ.ai
+ /// router/auto SKUs publishing `supported_parameters: []`).
+ fn is_minimal_param_model(model: &str) -> bool {
+ matches!(model, "autoclaw" | "auto" | "switchpoint/router")
  }

Update `request_temperature()` to use `rejects_temperature()`.
CodeRabbit-style review pass on top of the duplicate-session +
chat-cleanup changes.
Correctness fixes:
- src/reasoning/openai.rs (extract_json_object): clamp depth at zero
on `}` so a stray leading `}` cannot push depth negative and shift
the depth==0 check off by one (would skip the next real `{` as a
candidate `start`).
- src/reasoning/openai.rs (resp.bytes() failure path): re-classify
body-read failures as `ReasoningError::Unreachable` (transport)
instead of `MalformedResponse` (parse). The retry-and-continue
semantics are unchanged; only the public error contract is
realigned with the actual failure mode.
Logging hygiene:
- src/reasoning/openai.rs (log_response_failure): info! now carries
only PII-safe metadata (status, headers, body length, body
sha256[..16]); the truncated body preview is gated to debug! so
rationale text returned in error responses cannot leak into
general logs (FR-120 / TC-103).
New tests:
- src/chat/outbound.rs::tests: three new unit tests for
`build_wrap_with_audience`:
1. distinct audience strings on identical body produce distinct
`inner_event_id` values (the uniqueness invariant on
`mediation_messages`),
2. roundtrip preserves the user-visible body and attaches an
`m-aud` tag with the expected audience value,
3. `audience = None` does NOT attach the tag (legacy compat).
Doc / comment freshness:
- src/mediation/mod.rs: deliver_summary doc clarifies the state
machine stops at `summary_delivered` (not `closed`) and explains
the deferred `summary_delivered → closed` step.
- src/mediation/mod.rs: draft_and_send_initial_message doc no
longer claims role prefixes; mentions `m-aud` audience tag.
- src/mediation/mod.rs: draft_and_send_followup_message doc no
longer claims `Round N. role:` body prefixes; round_number is
documented as tracing-span only.
- src/mediation/mod.rs: drop the redundant `let _ = round_number;`
binding (the `#[instrument(..., fields(round = round_number))]`
attribute already references the arg).
- src/mediation/follow_up.rs: marker-advance comment refers to
`summary_delivered` (not "closed (terminal)").
- tests/phase3_followup_round.rs: stale "Round 1." prefix /
"Round-N label" comments updated to match the new no-prefix
reality.
- tests/phase3_followup_summary.rs: module doc + idempotency
comment / expect message refer to `summary_delivered`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback that the deliver_summary fix left sessions parked in `summary_delivered` forever — the only later close path (`dispute_resolved`) routes through `latest_open_session_for`, which excludes `summary_delivered` by design, so the state machine ended up asymmetric. The eligibility predicate already short-circuits via `is_eligible_lifecycle(Resolved) == false` once Mostro resolves the dispute, so the practical "preventing future mediation opens" phrasing in the review was overstated — but leaving sessions in a non-`closed` terminal state forever is a real code-quality issue worth fixing.

Fix: in `handlers::dispute_resolved`, after the existing open-session supersede-then-close branch, look up any session for the dispute that's in `summary_delivered` and walk it directly through the legal `summary_delivered → closed` transition (no SupersededByHuman intermediate — that variant is reserved for sessions still in an open state when a human pre-empted mediation, and `SummaryDelivered → SupersededByHuman` is not a legal edge in `MediationSessionState::can_transition_to`). Both the state flip and the `session_closed` audit row land in the same transaction as the lifecycle move to `Resolved`, so the close is atomic with the dispute resolution. Audit payload carries `"from_state": "summary_delivered"` so future debuggers can distinguish this close path from the open-session supersede-then-close path that uses the same `SessionClosed` event kind.

New regression test `summary_delivered_session_is_closed_on_resolution` (`tests/phase3_external_resolution_report.rs`) seeds a session in `summary_delivered`, fires `dispute_resolved`, and asserts:
- the session lands in `closed`,
- exactly one `session_closed` event with the new `from_state` discriminator,
- no `superseded_by_human` event was emitted (direct close skips that step),
- the FR-124 final-report DM still fires for this dispute, same as every other shape.
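The transition-legality argument above can be pinned down in a small sketch. The type and method names follow the ones mentioned (`MediationSessionState::can_transition_to`), but the variant set and edge list here are illustrative, not the real state machine.

```rust
/// Illustrative subset of the session state machine; the real enum
/// has more variants and edges.
#[derive(Clone, Copy, PartialEq, Debug)]
enum MediationSessionState {
    SummaryPending,
    SummaryDelivered,
    Closed,
    SupersededByHuman,
}

impl MediationSessionState {
    fn can_transition_to(self, next: Self) -> bool {
        use MediationSessionState::*;
        matches!(
            (self, next),
            (SummaryPending, SummaryDelivered)
                // Deferred close, taken by the dispute_resolved handler.
                | (SummaryDelivered, Closed)
                // Reserved for sessions still open when a human
                // pre-empts; NOT a legal edge from SummaryDelivered.
                | (SummaryPending, SupersededByHuman)
        )
    }
}

fn main() {
    use MediationSessionState::*;
    assert!(SummaryDelivered.can_transition_to(Closed));
    assert!(!SummaryDelivered.can_transition_to(SupersededByHuman));
    assert!(!Closed.can_transition_to(SummaryDelivered));
}
```

Encoding the edges in one `matches!` keeps every illegal pair rejected by default, which is what makes the "direct close, no SupersededByHuman intermediate" argument checkable rather than a comment.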
The earlier review finding about raw response previews leaking at info-level (`log_response_failure`) was already addressed in 4f39855 — info! emits only `body_len` + `body_sha256_prefix`, with the truncated preview gated to debug! — so no further change on that one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two fixes for issues observed in the 2026-04-27 production transcript on dispute 096fb2e4-7eca-4f59-96e8-ae49f69d1328:

1. `deliver_summary` was synchronously walking the session `summary_pending → summary_delivered → closed`. The `mediation::eligibility` predicate treats a `closed` session as "free to re-open" (re-dispute semantics), so the next engine tick saw the dispute as eligible again, opened a second session, sent another English greeting, and re-summarised to the solver. Fix: leave the session in `summary_delivered`. The eligibility EXISTS clause already blocks while the session is in any non-(`closed`, `superseded_by_human`) state, and the live-session / open-session helpers already excluded `summary_delivered`. The legal `summary_delivered → closed` transition is taken later by the `dispute_resolved` handler when Mostro closes the dispute.
2. `Buyer:` / `Seller:` / `Round N.` prefixes leaked transcript scaffolding into chat. The drafter used those prefixes only to keep the inner event id unique between simultaneous buyer/seller sends. Fix: drop the user-visible prefixes and ride the audience out-of-band on a custom `m-aud` tag on the inner event (`chat::outbound::build_wrap_with_audience`). Same uniqueness invariant on `mediation_messages`, clean user-facing body.

Test plan
- `cargo test` — full suite green locally (113+ tests across 30+ binaries)
- `cargo clippy --all-targets` — clean
- `phase3_cooperative_summary.rs` and `phase3_followup_summary.rs` expect post-summary state `summary_delivered` (not `closed`)
- `phase3_followup_round.rs` no longer asserts the `"Round N. "` prefix and explicitly forbids any `Round-…` prefix leaking into the user-visible body
- `phase3_session_open.rs` no longer asserts party label inside the body; `mediation_messages.party` stays the routing source of truth

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements