
fix(mediation): block re-mediation after summary; clean party-facing scaffolding #47

Merged
grunch merged 4 commits into main from fix/mediation-summary-and-chat-cleanup
Apr 27, 2026

Conversation

@grunch
Member

@grunch grunch commented Apr 27, 2026

Summary

Two fixes for issues observed in the 2026-04-27 production transcript on dispute 096fb2e4-7eca-4f59-96e8-ae49f69d1328:

  • Duplicate session reopened after summary. deliver_summary walked the session synchronously through summary_pending → summary_delivered → closed. The mediation::eligibility predicate treats a closed session as "free to re-open" (re-dispute semantics), so the next engine tick saw the dispute as eligible again, opened a second session, sent another English greeting, and re-summarised to the solver. Fix: leave the session in summary_delivered. The eligibility EXISTS clause already blocks re-mediation while the session is in any state other than closed or superseded_by_human, and the live-session / open-session helpers already excluded summary_delivered. The legal summary_delivered → closed transition is taken later by the dispute_resolved handler when Mostro closes the dispute.
  • Visible Buyer: / Seller: / Round N. prefixes leaked transcript scaffolding into chat. The drafter used those prefixes only to keep the inner event id unique between simultaneous buyer/seller sends. Fix: drop the user-visible prefixes and ride the audience out-of-band on a custom m-aud tag on the inner event (chat::outbound::build_wrap_with_audience). The uniqueness invariant on mediation_messages is unchanged, and the user-facing body stays clean.
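The state gate described in the first fix can be sketched as a tiny predicate (the helper name here is hypothetical; the real gate is an SQL EXISTS clause in mediation::eligibility):

```rust
/// Hypothetical mirror of the `mediation::eligibility` state gate:
/// a dispute stays blocked while any of its sessions is in a state
/// other than `closed` or `superseded_by_human`.
fn blocks_remediation(session_state: &str) -> bool {
    !matches!(session_state, "closed" | "superseded_by_human")
}

fn main() {
    // Before the fix: deliver_summary walked straight to `closed`,
    // so the next engine tick saw the dispute as eligible again.
    assert!(!blocks_remediation("closed"));
    // After the fix: the session parks at `summary_delivered`,
    // which still blocks a second session from opening.
    assert!(blocks_remediation("summary_delivered"));
    println!("eligibility sketch ok");
}
```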

Test plan

  • cargo test — full suite green locally (113+ tests across 30+ binaries)
  • cargo clippy --all-targets — clean
  • Updated assertions:
    • phase3_cooperative_summary.rs and phase3_followup_summary.rs expect post-summary state summary_delivered (not closed)
    • phase3_followup_round.rs no longer asserts the "Round N. " prefix and explicitly forbids any Round-… prefix leaking into the user-visible body
    • phase3_session_open.rs no longer asserts party label inside the body; mediation_messages.party stays the routing source of truth
  • Live verification on a Mostro dispute: confirm only one greeting per session and that the seller never sees a duplicate "Hello, I'm Serbero…"

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Language-matched clarification messages for each party
    • Enhanced message tagging for audience identification
  • Improvements

    • Removed visible role and round labels from mediation messages for cleaner formatting
    • Improved message parsing robustness and diagnostic logging
    • Refined session state management timing during dispute resolution

…scaffolding

Two issues observed in the 2026-04-27 transcript on dispute
096fb2e4-…:

1. Duplicate session reopened after summary delivery. `deliver_summary`
   transitioned the session `summary_pending → summary_delivered →
   closed` synchronously. The eligibility predicate in
   `mediation::eligibility` treats a session in state `closed` as
   "session ended, a fresh one may open" (re-dispute semantics), so
   the next engine tick saw the dispute as eligible again and opened
   a second session, sending a brand-new English greeting and
   re-summarising to the solver. Fix: keep the session in
   `summary_delivered`. The eligibility EXISTS clause already blocks
   re-mediation while the session is in any non-(closed,
   superseded_by_human) state, and `latest_open_session_for` /
   `list_live_sessions` already excluded `summary_delivered`. The
   legal `summary_delivered → closed` transition is taken later by
   the `dispute_resolved` handler when Mostro closes the dispute.

2. Visible "Buyer:" / "Seller:" / "Round N." prefixes leaked
   transcript scaffolding into the user-facing chat. The drafter
   used those prefixes to keep the inner event id unique between
   simultaneous buyer/seller sends. Fix: drop the visible prefixes
   and ride the audience out-of-band on a custom `m-aud` tag on the
   inner event (see `chat::outbound::build_wrap_with_audience`).
   The tag changes the inner event id without changing user-visible
   content, preserving the `(session_id, inner_event_id)` uniqueness
   invariant on `mediation_messages`.
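As a toy illustration of why the `m-aud` tag keeps inner event ids distinct for identical content (a std-library hash stands in for the real Nostr event-id computation, and the helper name is hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the inner-event id. In Nostr the event id is a
/// SHA-256 over the serialized event, which includes its tags, so
/// adding a non-rendered `m-aud` tag changes the id without
/// changing the user-visible content.
fn inner_event_id(content: &str, audience: Option<&str>) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    if let Some(aud) = audience {
        ("m-aud", aud).hash(&mut h); // the tag participates in the id
    }
    h.finish()
}

fn main() {
    let msg = "identical clarification text for both parties";
    // Distinct ids preserve the (session_id, inner_event_id)
    // uniqueness invariant on `mediation_messages`...
    assert_ne!(
        inner_event_id(msg, Some("buyer")),
        inner_event_id(msg, Some("seller"))
    );
    // ...while `audience = None` stays deterministic, matching the
    // legacy `build_wrap` behavior.
    assert_eq!(inner_event_id(msg, None), inner_event_id(msg, None));
    println!("audience tag sketch ok");
}
```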

Tests updated:

- `tests/phase3_cooperative_summary.rs` and
  `tests/phase3_followup_summary.rs` now expect the post-summary
  session state to be `summary_delivered` rather than `closed`.
- `tests/phase3_followup_round.rs` no longer asserts the
  `"Round N. "` prefix on user-visible body and explicitly forbids
  any `Round-…` prefix from leaking.
- `tests/phase3_session_open.rs` no longer asserts the party label
  (`buyer` / `seller`) inside the user-visible body; the `party`
  column on `mediation_messages` remains the routing source of
  truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@grunch has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 4 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 55dd30e1-ba6e-43e5-80b8-cb028c4f2c55

📥 Commits

Reviewing files that changed from the base of the PR and between 30c0fbd and de7ae91.

📒 Files selected for processing (8)
  • src/chat/outbound.rs
  • src/handlers/dispute_resolved.rs
  • src/mediation/follow_up.rs
  • src/mediation/mod.rs
  • src/reasoning/openai.rs
  • tests/phase3_external_resolution_report.rs
  • tests/phase3_followup_round.rs
  • tests/phase3_followup_summary.rs

Walkthrough

The PR implements multi-party mediation messaging with language-aware policies and audience-tagged inner events. Changes include adding a language-matching system prompt policy, creating a new build_wrap_with_audience builder function that conditionally tags inner events, removing user-visible party/round labels from mediation messages, refactoring OpenAI response handling with JSON extraction and diagnostics, and adjusting the summary delivery state transition to halt at summary_delivered instead of progressing to closed. Test expectations are updated to reflect the new state semantics and message formatting.

Changes

  • System Prompt (prompts/phase3-system.md): Adds a language-matching policy requiring clarifications to be generated in the recipient's transcript language, with English as the default until the first non-English reply; constrains solver-facing outputs (summary, rationale, classification labels) to English.
  • Outbound Message Construction (src/chat/outbound.rs): Adds a new public function build_wrap_with_audience that conditionally appends the inner-event tag m-aud when an audience is provided; refactors build_wrap to delegate to the new function with audience = None.
  • Mediation Logic (src/mediation/follow_up.rs, src/mediation/mod.rs): Removes transcript role labels and round prefixes from party-facing message bodies; adds a fixed greeting and uses build_wrap_with_audience with explicit audience tags ("buyer"/"seller"); defers the session state transition from summary_delivered to closed to the dispute_resolved handler; retains round_number only for tracing.
  • Response Handling (src/reasoning/openai.rs): Refactors the OpenAI client to dynamically omit response_format based on a model classifier; disables connection pooling and forces HTTP/1.1; adds an Accept-Encoding: identity header; hardens JSON parsing with extract_json_object to handle markdown-wrapped responses; adds richer retryable diagnostics and error logging.
  • Test Expectations (tests/phase3_cooperative_summary.rs, tests/phase3_followup_round.rs, tests/phase3_followup_summary.rs, tests/phase3_session_open.rs): Updates assertions to expect the summary_delivered state instead of closed after delivery; removes checks for user-visible "Round N" and party-name prefixes in message bodies; confirms audience/round metadata is carried out-of-band via inner-event tagging.
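The extract_json_object hardening noted for src/reasoning/openai.rs can be sketched roughly like this (a minimal first-brace / last-brace heuristic; the actual function name is from the walkthrough above, but this body is an assumption, not the real implementation):

```rust
/// Sketch of the JSON-extraction hardening: take the substring
/// between the first `{` and the last `}`, which strips the markdown
/// code fences or prose some OpenAI-compatible gateways wrap around
/// the JSON payload.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    if end < start {
        return None;
    }
    Some(&raw[start..=end])
}

fn main() {
    // A gateway that wraps the payload in prose still yields JSON.
    let wrapped = "Here is the result: {\"classification\":\"cooperative\"} thanks";
    assert_eq!(
        extract_json_object(wrapped),
        Some("{\"classification\":\"cooperative\"}")
    );
    // Plain responses pass through unchanged.
    let plain = "{\"ok\":true}";
    assert_eq!(extract_json_object(plain), Some(plain));
    // Responses with no JSON object at all are rejected.
    assert_eq!(extract_json_object("no json here"), None);
    println!("extract_json_object sketch ok");
}
```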

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • mostronatorcoder

Poem

🐰 Wrapped in audiences, tagged with care,
Messages hop through channels fair.
No more labels cluttering the way—
Just pure content, come what may.
Phase 3 hops on, state by state! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title 'fix(mediation): block re-mediation after summary; clean party-facing scaffolding' clearly and concisely summarizes the two primary changes: preventing re-mediation after summary delivery and removing visible scaffolding from party-facing messages)
  • Docstring Coverage: ✅ Passed (docstring coverage is 100.00%, above the required 80.00% threshold)
  • Linked Issues Check: ✅ Passed (check skipped; no linked issues were found for this pull request)
  • Out of Scope Changes Check: ✅ Passed (check skipped; no linked issues were found for this pull request)





@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30c0fbde32


Comment thread src/mediation/mod.rs
@@ -1294,7 +1340,6 @@ pub async fn deliver_summary(
now,

P1 Badge Close summary-delivered sessions on dispute resolution

This change now leaves sessions at summary_delivered, but the only close path later (dispute_resolved) looks up sessions via latest_open_session_for, which explicitly excludes summary_delivered (src/db/mediation.rs, state NOT IN ('closed','summary_delivered',...)). That means a summarized session will never reach closed, and the eligibility predicate (s.state NOT IN ('closed','superseded_by_human')) will keep treating it as an active blocker, preventing future mediation opens/re-dispute flows for that dispute_id.
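The stuck-state hazard reduces to two predicates disagreeing about `summary_delivered` (hypothetical helper names mirroring the SQL filters quoted above):

```rust
/// Mirrors the close path's lookup filter:
/// state NOT IN ('closed', 'summary_delivered', ...)
fn visible_to_close_path(state: &str) -> bool {
    !matches!(state, "closed" | "summary_delivered" | "superseded_by_human")
}

/// Mirrors the eligibility blocker:
/// s.state NOT IN ('closed', 'superseded_by_human')
fn blocks_new_session(state: &str) -> bool {
    !matches!(state, "closed" | "superseded_by_human")
}

fn main() {
    let s = "summary_delivered";
    // The dispute_resolved handler never finds the session...
    assert!(!visible_to_close_path(s));
    // ...yet the session keeps blocking future mediation opens.
    assert!(blocks_new_session(s));
    println!("stuck-state sketch ok");
}
```

Any session that lands in the first set's exclusions but not the second's is unreachable by the close path while still acting as an eligibility blocker.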

Useful? React with 👍 / 👎.

Comment thread src/reasoning/openai.rs Outdated
Comment on lines +806 to +808
body_len = body_preview.map(|s| s.len()).unwrap_or(0),
body_preview = body_preview.map(|s| truncate(s, 800)).unwrap_or(""),
note = note,

P1 Badge Remove raw response previews from info-level diagnostics

The new diagnostic logger emits up to 800 chars of raw provider response body at info level on failures. In malformed/non-200 cases, OpenAI-compatible gateways can include generated content or echoed request fragments, which may contain party transcript text and rationale; this breaks the existing metadata-only logging discipline and leaks sensitive dispute data into default production logs (RUST_LOG=info).


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
src/reasoning/openai.rs (1)

379-433: ⚠️ Potential issue | 🟠 Major

Body preview at info! may leak party statements / PII.

log_response_failure is invoked on non-success HTTP status, envelope JSON parse failure, and "empty choices", and in those branches it includes body_preview (up to 800 bytes) at info! level. The existing invariant comment at lines 434–440 (FR-120 / TC-103) explicitly states the full model output may contain party statements and free-text rationale that the spec forbids in general logs — which is exactly what may sit in the body when JSON envelope parsing fails (e.g., a gateway returns prose, a partially echoed request, or a model that ignored response_format and dumped the transcript-aware response). A truncated 800‑byte preview at info! will be picked up by the default RUST_LOG=info config and persisted by any log aggregator.

Suggested mitigations (any one is fine):

  • Drop body_preview to debug! so default deployments don't capture it; keep status/headers/length at info!.
  • Reduce the truncation budget (e.g., 200 bytes is enough to identify gateway HTML error pages).
  • Hash + length the body (mirroring the success path at lines 442–448) instead of including the raw bytes.
🛡️ Example: keep operator-actionable metadata at `info!`, push raw body to `debug!`
 fn log_response_failure(
     attempt: u32,
     status: reqwest::StatusCode,
     headers: &HeaderMap,
     body_preview: Option<&str>,
     note: &str,
 ) {
     let header = |name: &str| -> String {
         headers
             .get(name)
             .and_then(|v| v.to_str().ok())
             .unwrap_or("(absent)")
             .to_string()
     };
     info!(
         attempt,
         %status,
         content_type = %header("content-type"),
         content_encoding = %header("content-encoding"),
         transfer_encoding = %header("transfer-encoding"),
         content_length = %header("content-length"),
         body_len = body_preview.map(|s| s.len()).unwrap_or(0),
-        body_preview = body_preview.map(|s| truncate(s, 800)).unwrap_or(""),
         note = note,
         "openai-compatible response failure (diagnostic dump)"
     );
+    if let Some(b) = body_preview {
+        debug!(attempt, %status, body_preview = %truncate(b, 800), "openai-compatible response body preview");
+    }
 }

Also applies to: 785-811

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reasoning/openai.rs` around lines 379 - 433, log_response_failure
currently logs a raw/truncated body_preview at info!, which can leak PII; change
its behavior so that info-level logs only include safe metadata (status,
headers, body length) and push any raw body content to debug! (or replace the
preview with a hash + length), and update all call sites (the non-success HTTP
branch that constructs ReasoningError::Unreachable, the serde_json::from_str
parse-error branch that returns ReasoningError::MalformedResponse, and the
empty-choices branch) to stop passing the raw body for info-level logging; you
can implement this by modifying log_response_failure to accept an enum/flag
(e.g., BodyLog::DebugOrHash) or to always compute a short SHA-256 + length for
info and emit the full truncated body only at debug, and reuse the same change
for the other occurrences of log_response_failure elsewhere in the file.
tests/phase3_followup_summary.rs (2)

393-410: ⚠️ Potential issue | 🟡 Minor

Stale "closed session" wording in the idempotency block.

The comment says "calling advance_session_round again on a closed session must skip (state gate blocks)" and the expect message echoes "second call on a closed session must be a no-op". The session is now in summary_delivered, not closed; update both so future readers don't chase a state that no longer exists at this point.

📝 Suggested fix
-    // (e) Idempotency for extra safety — calling advance_session_round
-    //     again on a closed session must skip (state gate blocks) and
-    //     NOT add more rows.
+    // (e) Idempotency for extra safety — calling advance_session_round
+    //     again on a session sitting at `summary_delivered` must skip
+    //     (state gate blocks anything other than `awaiting_response`)
+    //     and NOT add more rows.
@@
-    .expect("second call on a closed session must be a no-op");
+    .expect("second call on a summary_delivered session must be a no-op");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_followup_summary.rs` around lines 393 - 410, Update the stale
wording that mentions "closed session" to reflect the current session state
`summary_delivered`: change the comment above the idempotency test and the
`.expect("second call on a closed session must be a no-op")` message to say
something like "calling advance_session_round again on a session in
`summary_delivered` must skip (state gate blocks)" and ".expect(\"second call on
a session in summary_delivered must be a no-op\")" so the comment and the
`advance_session_round` expectation match the actual session state and avoid
confusion.

1-19: ⚠️ Potential issue | 🟡 Minor

Stale top-of-file doc — still mentions summary_delivered → closed walk.

The module doc lists the lifecycle as classified → summary_pending → summary_delivered → closed, but the test now asserts the session stops at summary_delivered. Update so the spec the tests pin and the doc comment agree.

📝 Suggested doc fix
-//! - invokes `deliver_summary` exactly once, which walks the
-//!   session through `classified → summary_pending →
-//!   summary_delivered → closed`;
+//! - invokes `deliver_summary` exactly once, which walks the
+//!   session through `classified → summary_pending →
+//!   summary_delivered`. The legal `summary_delivered → closed`
+//!   transition is intentionally deferred to the
+//!   `dispute_resolved` handler (Mostro closing the dispute).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_followup_summary.rs` around lines 1 - 19, Update the stale
module doc at the top of tests/phase3_followup_summary.rs so the lifecycle
string matches the test assertions (the session stops at summary_delivered);
replace the current "classified → summary_pending → summary_delivered → closed"
text with the corrected lifecycle (e.g., "classified → summary_pending →
summary_delivered") so the doc and the assertions about
advance_session_round/deliver_summary align.
src/mediation/mod.rs (3)

1053-1058: ⚠️ Potential issue | 🟡 Minor

Stale deliver_summary doc — state machine no longer ends at closed.

The function-level doc says State machine: classified → summary_pending → summary_delivered → closed, but the implementation now intentionally stops at summary_delivered (the inline comment block at Line 1297-1318 explains why). The two doc passages should agree — the top one is what shows up in rustdoc and is the first thing a maintainer reads.

📝 Suggested fix
-/// State machine: `classified → summary_pending → summary_delivered
-/// → closed`. Each transition is written with
-/// [`db::mediation::set_session_state`] (which `debug_assert!`s
-/// legality in debug builds).
+/// State machine: `classified → summary_pending →
+/// summary_delivered`. Each transition is written with
+/// [`db::mediation::set_session_state`] (which `debug_assert!`s
+/// legality in debug builds). The legal `summary_delivered →
+/// closed` step is intentionally NOT taken here — see the inline
+/// comment in step (6) for the eligibility-predicate rationale —
+/// and is performed later by the `dispute_resolved` handler when
+/// Mostro closes the dispute.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mediation/mod.rs` around lines 1053 - 1058, Update the top-level doc for
deliver_summary to match the implementation: change the state machine
description from "classified → summary_pending → summary_delivered → closed" to
"classified → summary_pending → summary_delivered" (or otherwise indicate it
intentionally stops at summary_delivered), and ensure the doc mentions that
transitions are performed via db::mediation::set_session_state; keep the
existing inline comment explanation intact so docs and code agree about why the
session does not progress to closed.

186-204: ⚠️ Potential issue | 🟡 Minor

Stale draft_and_send_initial_message doc — still describes role prefixes.

The contract bullet says "Builds per-party gift-wraps with role prefixes so the inner event ids cannot collide on identical content", but the implementation now uses the m-aud audience tag for that. Future maintainers reading rustdoc will look for code that doesn't exist.

📝 Suggested fix
 /// Contract:
-/// - Builds per-party gift-wraps with role prefixes so the inner
-///   event ids cannot collide on identical content.
+/// - Builds per-party gift-wraps using
+///   `outbound::build_wrap_with_audience` with `Some("buyer")` /
+///   `Some("seller")` so the inner event ids cannot collide on
+///   identical content (the audience rides on a non-rendered
+///   `m-aud` tag rather than mixed into the user-visible body).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mediation/mod.rs` around lines 186 - 204, The doc for
draft_and_send_initial_message is stale: it still claims the code "Builds
per-party gift-wraps with role prefixes" but the implementation uses the m-aud
audience tag instead; update the rustdoc on draft_and_send_initial_message to
reflect the current behavior (mention that inner event id uniqueness is ensured
via the m-aud audience tag, not role-prefixed wraps), keep the rest of the
bullets about transactional outbox, bounded retry, emitting outbound_sent after
successful publish (mediation_events), and the session state transition
(reference session::open_session) intact, and remove or reword any references to
role prefixes so docs match the code.

352-394: ⚠️ Potential issue | 🟡 Minor

Stale draft_and_send_followup_message doc — still describes the "Round N. role:" body prefix.

Bullet (1) says the body is prefixed with "Round {N}. {role}: " and the closing sentence says "The round-number and role prefixes are cosmetic and do not affect policy_hash". The implementation now drops both — round_number only feeds the trace span and m-aud carries the audience. Update so the rustdoc matches the code; otherwise this is the first place a future contributor will look when reasoning about the visible body.

📝 Suggested fix
-/// 1. **Round-number marker in the body.** Each party-facing content
-///    is prefixed with `"Round {N}. {role}: "` so parties can
-///    distinguish a follow-up question from the opening one.
-///    `round_number` is the 1-based round counter (round 1 is the
-///    first follow-up, *after* the opening round).
+/// 1. **Round-number is metadata-only.** `round_number` is the
+///    1-based mid-session counter (round 1 is the first follow-up,
+///    *after* the opening round). It is recorded on the tracing
+///    span and audit logs but is NOT prefixed into the user-visible
+///    body — that scaffolding was leaking into chat (observed
+///    2026-04-27). Inner-event-id distinctness for identical
+///    buyer/seller text is preserved by the per-party `m-aud` tag
+///    on the inner event (see `build_wrap_with_audience`).
@@
-/// The prompt bundle is NOT modified by this path: the `content`
-/// pushed through the gift-wrap is the per-party text (`buyer_text`
-/// / `seller_text`) as returned by `policy::evaluate` from the same
-/// pinned bundle the session was opened with. The round-number and
-/// role prefixes are cosmetic and do not affect `policy_hash`.
+/// The prompt bundle is NOT modified by this path: the `content`
+/// pushed through the gift-wrap is the per-party text (`buyer_text`
+/// / `seller_text`) as returned by `policy::evaluate` from the same
+/// pinned bundle the session was opened with.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mediation/mod.rs` around lines 352 - 394, The doc comment for
draft_and_send_followup_message is outdated: it still claims the body is
prefixed with "Round {N}. {role}:" and that round-number/role prefixes are
cosmetic, but the implementation no longer adds those prefixes (round_number is
only used for spans and m-aud carries audience); update the rustdoc for
draft_and_send_followup_message to remove the bullet about body prefixes,
clarify that round_number is used only for tracing/span purposes and that
audience is conveyed via the m-aud claim, and keep the note about policy_hash
unchanged (or explicitly state prefixes are not used so policy_hash is
unaffected); reference the function name draft_and_send_followup_message and the
symbols round_number, m-aud, and policy_hash when editing the commentary.
tests/phase3_followup_round.rs (2)

154-160: ⚠️ Potential issue | 🟡 Minor

Seed-rationale comment partially stale.

The seed is still required so count_classification_events returns the correct followup_number for the policy bypass-window, but the rationale "or the Round-N label renders as Round 0. ..." no longer applies — the Round-N label is no longer mixed into user-visible content. Reword so future maintainers don't hunt for a label that does not exist.

📝 Suggested fix
-    // One `classification_produced` event to stand in for the
-    // round-0 initial classification. In production the initial
-    // flow writes one such row; `advance_session_round` derives
-    // `followup_number` from the count of these events (the first
-    // mid-session eval is N=1 = after 1 prior classification),
-    // so the seed must include it or the Round-N label renders
-    // as "Round 0. ...".
+    // One `classification_produced` event to stand in for the
+    // round-0 initial classification. In production the initial
+    // flow writes one such row; `advance_session_round` derives
+    // `followup_number` from the count of these events (the first
+    // mid-session eval is N=1 = after 1 prior classification).
+    // The seed must include it so the policy bypass-window logic
+    // sees the correct ordinal — the visible "Round N." prefix is
+    // gone, but the trace span / audit logs still want a sane
+    // counter.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_followup_round.rs` around lines 154 - 160, Update the
seed-rationale comment near the `classification_produced` test seed so it no
longer mentions a now-removed user-facing "Round-N" label; instead state
concisely that the seed is required so `count_classification_events` (and
therefore `advance_session_round`) returns the correct `followup_number` used by
the policy bypass-window logic. Keep the reference to `classification_produced`,
`count_classification_events`, and `advance_session_round` so maintainers can
find the relevant code paths.

295-297: ⚠️ Potential issue | 🟡 Minor

Stale comment directly contradicts the new assertions.

This block-comment claims "the new two carry the Round 1. prefix", but the assertion at Line 335-338 explicitly forbids any Round -prefixed body. Update so the inline guidance matches what the test actually verifies.

📝 Suggested fix
-    // Assert: 4 outbound rows (round 0 × 2 + round 1 × 2), the new
-    // two carry the `"Round 1. "` prefix, and the session stays in
-    // `awaiting_response` with the marker advanced.
+    // Assert: 4 outbound rows (round 0 × 2 + round 1 × 2), the new
+    // two carry the model-supplied follow-up text verbatim (no
+    // `"Round N. "` body prefix any more — round number is now
+    // out-of-band on the inner event's `m-aud` tag), and the
+    // session stays in `awaiting_response` with the marker advanced.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_followup_round.rs` around lines 295 - 297, The block comment
above the assertions in tests/phase3_followup_round.rs is stale and claims "the
new two carry the `Round 1. ` prefix" which contradicts the subsequent
assertions that explicitly forbid any `Round `-prefixed body (the assertions
around the outbound rows check). Update that comment to state that the new two
do NOT carry the `Round 1. ` prefix (or remove the prefix claim entirely), and
keep the rest of the guidance about expecting 4 outbound rows and the session
remaining in `awaiting_response` with the marker advanced so the comment matches
the actual assertions.
src/mediation/follow_up.rs (1)

380-383: ⚠️ Potential issue | 🟡 Minor

Stale comment — session is no longer "closed" at this point.

After deliver_summary returns Ok, the session is now at summary_delivered, not closed (the summary_delivered → closed step is deferred to dispute_resolved). The marker-advance reasoning is unchanged, but the parenthetical is misleading.

📝 Suggested fix
-            // Mark the round evaluated. Even though the session is
-            // now terminal (closed), keeping the marker current is
-            // a cheap invariant — a future tick never mistakes an
-            // evaluated round for an unevaluated one.
+            // Mark the round evaluated. Even though the session is
+            // now sitting at `summary_delivered` (and the state
+            // gate at the top of this function will skip it on the
+            // next tick), keeping the marker current is a cheap
+            // invariant — a future tick never mistakes an evaluated
+            // round for an unevaluated one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mediation/follow_up.rs` around lines 380 - 383, Update the stale inline
comment in follow_up.rs that currently states the session is "closed" after
deliver_summary returns; change it to reflect that after deliver_summary returns
Ok the session is in the summary_delivered state (not closed) and that the
parenthetical about deferring summary_delivered → closed to dispute_resolved
remains accurate; locate the comment near the marker-advance logic in the
function handling deliver_summary (the block that marks the round evaluated) and
revise the parenthetical wording to mention summary_delivered instead of closed
while keeping the existing rationale about preventing future ticks from
mistaking evaluated rounds for unevaluated ones.
src/chat/outbound.rs (1)

79-148: ⚠️ Potential issue | 🟠 Major

Add tests covering the audience-tagging path.

The new behavior is load-bearing for the (session_id, inner_event_id) uniqueness invariant — two builds with an identical message but different audiences (e.g. Some("buyer") vs Some("seller")) must produce different inner_event_ids, and audience = None must match the legacy build_wrap output — yet none of this is exercised by the existing tests. A regression in the conditional m-aud tag application (e.g. accidentally building the inner event before applying the tag) would silently cause inner-event-id collisions and trip the dedup-invariant guard at runtime in mediation::draft_and_send_initial_message / draft_and_send_followup_message, rather than failing in CI.

🧪 Suggested unit test additions
#[tokio::test]
async fn audience_tag_changes_inner_event_id_for_identical_message() {
    let sender = Keys::generate();
    let shared = Keys::generate();
    let msg = "same content for both parties";
    let buyer = build_wrap_with_audience(&sender, &shared.public_key(), msg, Some("buyer"))
        .await
        .unwrap();
    let seller = build_wrap_with_audience(&sender, &shared.public_key(), msg, Some("seller"))
        .await
        .unwrap();
    assert_ne!(
        buyer.inner_event_id, seller.inner_event_id,
        "m-aud tag must change inner_event_id so the (session_id, inner_event_id) \
         uniqueness invariant is preserved across parties"
    );
}

#[tokio::test]
async fn inner_event_carries_m_aud_tag_when_audience_set() {
    let sender = Keys::generate();
    let buyer = Keys::generate();
    let shared = derive_shared_keys(&sender, &buyer.public_key()).unwrap();
    let built = build_wrap_with_audience(&sender, &shared.public_key(), "hi", Some("buyer"))
        .await
        .unwrap();
    let inner = unwrap_with_shared_key(&shared, &built.outer).unwrap();
    assert_eq!(inner.content, "hi");
    // Verify the m-aud tag rides on the inner event so ingest-side
    // tooling can rely on it (checked via the NIP-01 JSON shape to
    // stay agnostic to the tag accessor API).
    let inner_json = serde_json::to_value(&inner).unwrap();
    assert!(
        inner_json["tags"]
            .as_array()
            .unwrap()
            .iter()
            .any(|t| t[0] == "m-aud" && t[1] == "buyer"),
        "inner event must carry an m-aud tag with the audience value"
    );
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/chat/outbound.rs` around lines 79 - 148, Add unit tests for
build_wrap_with_audience: (1) generate sender and shared keys and assert that
calling build_wrap_with_audience with the same message but different audience
strings (e.g., Some("buyer") vs Some("seller")) produces different
inner_event_id values so the (session_id, inner_event_id) uniqueness invariant
holds; (2) derive shared keys (e.g., derive_shared_keys or equivalent), call
build_wrap_with_audience with Some("buyer"), unwrap the outer with the shared
key (e.g., unwrap_with_shared_key) and assert the inner event content equals the
original message and contains the m-aud tag when audience is set; and (3) assert
that calling build_wrap_with_audience with audience = None matches the legacy
build_wrap output (inner_event_id and inner content) to ensure backward
compatibility. Ensure tests use Keys::generate and reference
build_wrap_with_audience, inner_event_id, and the m-aud tag logic.
🧹 Nitpick comments (6)
src/reasoning/openai.rs (2)

349-374: Body read failure classified as MalformedResponse.

A failure inside resp.bytes().await is a transport-layer issue (truncated stream, broken connection, decompression error) rather than a malformed response shape. Mapping it to ReasoningError::MalformedResponse collapses two distinct failure modes into one variant — operators reading the final error after retries are exhausted will see "MalformedResponse" for what is really a network/connectivity problem.

The retry behavior is already correct (treated as transient); only the terminal classification is misleading. Consider Unreachable (or a dedicated transport variant) so failure modes stay distinguishable in logs and metrics.

-                    last_err = Some(ReasoningError::MalformedResponse(format!(
-                        "body bytes read failed (status={status}): {e}"
-                    )));
+                    last_err = Some(ReasoningError::Unreachable(format!(
+                        "body bytes read failed (status={status}): {e}"
+                    )));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reasoning/openai.rs` around lines 349 - 374, The code currently treats
failures from resp.bytes().await as ReasoningError::MalformedResponse, which
misclassifies transport-layer issues; update the error handling around
resp.bytes().await to set last_err to a transport-oriented variant (e.g.,
ReasoningError::Unreachable or a new ReasoningError::TransportError) instead of
ReasoningError::MalformedResponse, keep the existing log_response_failure call
and retry continue logic unchanged, and ensure the error message includes the
original error string and error_chain for diagnostics (refer to
resp.bytes().await, log_response_failure, last_err, and ReasoningError::* to
locate and change the assignment).

1041-1083: Consider a regression test for envelope JSON failure.

tests/reasoning_anthropic.rs has malformed_json_classification_surfaces_as_error (context snippet 1) which asserts non-JSON prose surfaces as ReasoningError::MalformedResponse. The new post_chat envelope-parse error path (lines 403–418) is now the openai equivalent and is currently uncovered — a small mock-server test mirroring the anthropic one would pin the new diagnostic behavior so future tweaks to log_response_failure / text_str can't silently regress the public error contract.

Want me to draft the openai-side test mirroring the anthropic pattern?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reasoning/openai.rs` around lines 1041 - 1083, Add a regression test that
mirrors
tests/reasoning_anthropic.rs::malformed_json_classification_surfaces_as_error
but targets the OpenAI path that uses post_chat so the envelope JSON parse
failure returns ReasoningError::MalformedResponse; specifically, create a
mock-server test (e.g., in tests/reasoning_openai.rs) that makes
OpenAiProvider::new(...) and calls the code path that triggers post_chat, then
assert that log_response_failure/text_str handling yields a MalformedResponse
error rather than a generic parse or success; ensure the test exercises the
envelope-parse branch that currently uses post_chat and fails parsing so future
changes to log_response_failure or text_str cannot regress the public error
contract.
tests/phase3_cooperative_summary.rs (1)

130-131: Test function name is now misleading.

cooperative_summary_closes_session_and_notifies_assigned_solver no longer reflects the asserted behavior — the session intentionally stops at summary_delivered rather than closed. Rename to something like cooperative_summary_delivers_to_assigned_solver to keep the suite self-documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_cooperative_summary.rs` around lines 130 - 131, Rename the test
function cooperative_summary_closes_session_and_notifies_assigned_solver to
reflect its actual behavior (e.g.,
cooperative_summary_delivers_to_assigned_solver) by updating the async test fn
name in tests/phase3_cooperative_summary.rs (the #[tokio::test] async fn
declaration) and update any references to that function name in the test suite
(imports, mod references, or CI/test filters) so the test still runs under its
new descriptive name.
tests/phase3_followup_summary.rs (1)

209-209: Test function name is now misleading.

summarize_branch_delivers_summary_once_and_closes_session asserts a session that stops at summary_delivered, not closed. Consider renaming to summarize_branch_delivers_summary_once_and_halts_at_summary_delivered (or similar) to keep the suite self-documenting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_followup_summary.rs` at line 209, The test function name
summarize_branch_delivers_summary_once_and_closes_session is misleading; rename
the async test function to
summarize_branch_delivers_summary_once_and_halts_at_summary_delivered and update
any references (attributes like #[tokio::test], mod uses, or callers) in the
file to match the new identifier so the test suite stays self-documenting and
continues to run.
src/mediation/mod.rs (1)

418-423: Defensive let _ = round_number; is redundant.

round_number is already referenced by #[instrument(..., fields(round = round_number))] on Line 395, so the compiler will not warn about an unused binding. The let _ = line is harmless but adds noise. Optional cleanup — keep the explanatory comment if you want, but the binding itself is not load-bearing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mediation/mod.rs` around lines 418 - 423, Remove the redundant defensive
binding "let _ = round_number;" (and optionally keep or move the explanatory
comment) because the #[instrument(..., fields(round = round_number))] on the
surrounding function already references round_number; edit the block containing
round_number, buyer_content, and seller_content to delete that pointless binding
so the code is cleaner while leaving the trim-and-to_string calls for buyer_text
and seller_text intact.
prompts/phase3-system.md (1)

48-65: Consider clarifying ambiguous-language fallback.

The policy reads cleanly for the primary case, but two edge cases are worth pinning so the model has explicit guidance instead of having to choose:

  1. Very short / language-ambiguous first replies ("ok", "sí", "thanks") where confident detection is impossible — does the model keep the existing default (English) or switch on weak signal?
  2. Mixed-language replies (e.g. Spanglish) — pick the dominant language, mirror the writer's mix, or fall back to English?

Tightening these in the prompt avoids language-flapping across rounds, which would itself be a user-visible regression. Optional; safe to defer if you want to see real traffic first.
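
🧩 Illustrative sketch

The ">60% dominant-language by token proportion" tie-breaker could look like the following. Everything here is an assumption for illustration — the stopword lists, the three-token minimum standing in for edge case 1, and the detect_language name are not symbols from the prompt or the codebase:

```rust
/// Illustrative tie-breaker: pick the dominant language by token
/// proportion, falling back to English whenever no language clears
/// the 60% threshold. Stopword lists are deliberately tiny; a real
/// detector would use a proper language-identification model.
fn detect_language(text: &str) -> &'static str {
    const EN: &[&str] = &["the", "and", "is", "to", "of", "you", "ok", "thanks"];
    const ES: &[&str] = &["el", "la", "y", "es", "que", "de", "sí", "gracias"];

    // Normalize: strip punctuation, lowercase, drop empties.
    let tokens: Vec<String> = text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();

    // Edge case 1: very short messages carry too little signal —
    // keep the existing English default rather than flap.
    if tokens.len() < 3 {
        return "en";
    }

    let total = tokens.len() as f64;
    let es_hits = tokens.iter().filter(|t| ES.contains(&t.as_str())).count() as f64;
    let en_hits = tokens.iter().filter(|t| EN.contains(&t.as_str())).count() as f64;

    if es_hits / total > 0.6 {
        "es"
    } else if en_hits / total > 0.6 {
        "en"
    } else {
        // Edge case 2: mixed / ambiguous — keep the default.
        "en"
    }
}
```

In practice the model applies this policy in-prompt; the sketch only pins down what "dominant by token proportion" would mean so the rule is testable.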

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/phase3-system.md` around lines 48 - 65, The Language Matching rules
lack explicit fallback for ambiguous or mixed-language first replies; update the
Language Matching section to add two concrete tie-breakers: (1) for very short
or low-confidence detections (e.g., "ok", "sí", "thanks") keep the existing
default English until a subsequent high-confidence message switches the
buyer_clarification or seller_clarification, and (2) for mixed-language messages
choose the dominant language by token proportion (if one language >60% use it)
otherwise keep English; ensure these rules are applied independently to
buyer_clarification and seller_clarification and note that solver-facing outputs
remain English.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/reasoning/openai.rs`:
- Around line 558-595: The extract_json_object function can let depth go
negative if stray '}' appears before the first '{', so update the '}' handling
in extract_json_object to clamp depth (e.g., only decrement if depth > 0 or set
depth = (depth - 1).max(0)) so a leading unmatched '}' doesn't prevent setting
start on the first real '{'; keep the rest of the logic (setting start when b'{'
and depth == 0, returning &raw[s..=i]) unchanged.
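
🧩 Illustrative sketch

The clamped scanner described above could look like this. The function body is an approximation for illustration — the real extract_json_object may differ (for one, it likely also has to skip braces inside string literals):

```rust
/// Extract the first balanced `{...}` object from `raw`, tolerating
/// stray closing braces before the first `{`. Using an unsigned depth
/// with saturating_sub is one way to "clamp at zero": a leading
/// unmatched `}` can no longer push depth negative and shift the
/// depth == 0 check off by one.
fn extract_json_object(raw: &str) -> Option<&str> {
    let mut depth: usize = 0;
    let mut start: Option<usize> = None;
    for (i, &b) in raw.as_bytes().iter().enumerate() {
        match b {
            b'{' => {
                if depth == 0 {
                    start = Some(i); // first real opening brace
                }
                depth += 1;
            }
            b'}' => {
                depth = depth.saturating_sub(1); // clamp at zero
                if depth == 0 {
                    if let Some(s) = start {
                        return Some(&raw[s..=i]);
                    }
                }
            }
            _ => {}
        }
    }
    None
}
```

With the clamp, an input like `} {"a":1}` yields the object; without it, the stray `}` would drive depth to -1 and the scanner would never see depth == 0 at the real `{`.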

---

Outside diff comments:
In `@src/chat/outbound.rs`:
- Around line 79-148: Add unit tests for build_wrap_with_audience: (1) generate
sender and shared keys and assert that calling build_wrap_with_audience with the
same message but different audience strings (e.g., Some("buyer") vs
Some("seller")) produces different inner_event_id values so the (session_id,
inner_event_id) uniqueness invariant holds; (2) derive shared keys (e.g.,
derive_shared_keys or equivalent), call build_wrap_with_audience with
Some("buyer"), unwrap the outer with the shared key (e.g.,
unwrap_with_shared_key) and assert the inner event content equals the original
message and contains the m-aud tag when audience is set; and (3) assert that
calling build_wrap_with_audience with audience = None matches the legacy
build_wrap output (inner_event_id and inner content) to ensure backward
compatibility. Ensure tests use Keys::generate and reference
build_wrap_with_audience, inner_event_id, and the m-aud tag logic.

In `@src/mediation/follow_up.rs`:
- Around line 380-383: Update the stale inline comment in follow_up.rs that
currently states the session is "closed" after deliver_summary returns; change
it to reflect that after deliver_summary returns Ok the session is in the
summary_delivered state (not closed) and that the parenthetical about deferring
summary_delivered → closed to dispute_resolved remains accurate; locate the
comment near the marker-advance logic in the function handling deliver_summary
(the block that marks the round evaluated) and revise the parenthetical wording
to mention summary_delivered instead of closed while keeping the existing
rationale about preventing future ticks from mistaking evaluated rounds for
unevaluated ones.

In `@src/mediation/mod.rs`:
- Around line 1053-1058: Update the top-level doc for deliver_summary to match
the implementation: change the state machine description from "classified →
summary_pending → summary_delivered → closed" to "classified → summary_pending →
summary_delivered" (or otherwise indicate it intentionally stops at
summary_delivered), and ensure the doc mentions that transitions are performed
via db::mediation::set_session_state; keep the existing inline comment
explanation intact so docs and code agree about why the session does not
progress to closed.
- Around line 186-204: The doc for draft_and_send_initial_message is stale: it
still claims the code "Builds per-party gift-wraps with role prefixes" but the
implementation uses the m-aud audience tag instead; update the rustdoc on
draft_and_send_initial_message to reflect the current behavior (mention that
inner event id uniqueness is ensured via the m-aud audience tag, not
role-prefixed wraps), keep the rest of the bullets about transactional outbox,
bounded retry, emitting outbound_sent after successful publish
(mediation_events), and the session state transition (reference
session::open_session) intact, and remove or reword any references to role
prefixes so docs match the code.
- Around line 352-394: The doc comment for draft_and_send_followup_message is
outdated: it still claims the body is prefixed with "Round {N}. {role}:" and
that round-number/role prefixes are cosmetic, but the implementation no longer
adds those prefixes (round_number is only used for spans and m-aud carries
audience); update the rustdoc for draft_and_send_followup_message to remove the
bullet about body prefixes, clarify that round_number is used only for
tracing/span purposes and that audience is conveyed via the m-aud claim, and
keep the note about policy_hash unchanged (or explicitly state prefixes are not
used so policy_hash is unaffected); reference the function name
draft_and_send_followup_message and the symbols round_number, m-aud, and
policy_hash when editing the commentary.

In `@src/reasoning/openai.rs`:
- Around line 379-433: log_response_failure currently logs a raw/truncated
body_preview at info!, which can leak PII; change its behavior so that
info-level logs only include safe metadata (status, headers, body length) and
push any raw body content to debug! (or replace the preview with a hash +
length), and update all call sites (the non-success HTTP branch that constructs
ReasoningError::Unreachable, the serde_json::from_str parse-error branch that
returns ReasoningError::MalformedResponse, and the empty-choices branch) to stop
passing the raw body for info-level logging; you can implement this by modifying
log_response_failure to accept an enum/flag (e.g., BodyLog::DebugOrHash) or to
always compute a short SHA-256 + length for info and emit the full truncated
body only at debug, and reuse the same change for the other occurrences of
log_response_failure elsewhere in the file.
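
🧩 Illustrative sketch

The "hash + length" variant could compute its info-level fields like this. The real code hashes with SHA-256 (per the follow-up commit); this sketch substitutes std's DefaultHasher so it runs without extra crates, and the safe_body_fields name is an assumption, not a symbol from the codebase:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// PII-safe metadata for a response body: length plus a short
/// fingerprint. Enough to correlate log lines and spot identical
/// payloads without reproducing any body content at info level;
/// the raw (truncated) preview stays gated to debug!.
fn safe_body_fields(body: &[u8]) -> (usize, String) {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    // 16 hex chars of fingerprint — stand-in for sha256[..16].
    (body.len(), format!("{:016x}", h.finish()))
}
```

A caller would emit `info!(status, body_len, body_fingerprint, ...)` with these fields and push the truncated body itself to `debug!`, which is the split the review asks for.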

In `@tests/phase3_followup_round.rs`:
- Around line 154-160: Update the seed-rationale comment near the
`classification_produced` test seed so it no longer mentions a now-removed
user-facing "Round-N" label; instead state concisely that the seed is required
so `count_classification_events` (and therefore `advance_session_round`) returns
the correct `followup_number` used by the policy bypass-window logic. Keep the
reference to `classification_produced`, `count_classification_events`, and
`advance_session_round` so maintainers can find the relevant code paths.
- Around line 295-297: The block comment above the assertions in
tests/phase3_followup_round.rs is stale and claims "the new two carry the `Round
1. ` prefix" which contradicts the subsequent assertions that explicitly forbid
any `Round `-prefixed body (the assertions around the outbound rows check).
Update that comment to state that the new two do NOT carry the `Round 1. `
prefix (or remove the prefix claim entirely), and keep the rest of the guidance
about expecting 4 outbound rows and the session remaining in `awaiting_response`
with the marker advanced so the comment matches the actual assertions.

In `@tests/phase3_followup_summary.rs`:
- Around line 393-410: Update the stale wording that mentions "closed session"
to reflect the current session state `summary_delivered`: change the comment
above the idempotency test and the `.expect("second call on a closed session
must be a no-op")` message to say something like "calling advance_session_round
again on a session in `summary_delivered` must skip (state gate blocks)" and
".expect(\"second call on a session in summary_delivered must be a no-op\")" so
the comment and the `advance_session_round` expectation match the actual session
state and avoid confusion.
- Around line 1-19: Update the stale module doc at the top of
tests/phase3_followup_summary.rs so the lifecycle string matches the test
assertions (the session stops at summary_delivered); replace the current
"classified → summary_pending → summary_delivered → closed" text with the
corrected lifecycle (e.g., "classified → summary_pending → summary_delivered")
so the doc and the assertions about advance_session_round/deliver_summary align.

---

Nitpick comments:
In `@prompts/phase3-system.md`:
- Around line 48-65: The Language Matching rules lack explicit fallback for
ambiguous or mixed-language first replies; update the Language Matching section
to add two concrete tie-breakers: (1) for very short or low-confidence
detections (e.g., "ok", "sí", "thanks") keep the existing default English until
a subsequent high-confidence message switches the buyer_clarification or
seller_clarification, and (2) for mixed-language messages choose the dominant
language by token proportion (if one language >60% use it) otherwise keep
English; ensure these rules are applied independently to buyer_clarification and
seller_clarification and note that solver-facing outputs remain English.

In `@src/mediation/mod.rs`:
- Around line 418-423: Remove the redundant defensive binding "let _ =
round_number;" (and optionally keep or move the explanatory comment) because the
#[instrument(..., fields(round = round_number))] on the surrounding function
already references round_number; edit the block containing round_number,
buyer_content, and seller_content to delete that pointless binding so the code
is cleaner while leaving the trim-and-to_string calls for buyer_text and
seller_text intact.

In `@src/reasoning/openai.rs`:
- Around line 349-374: The code currently treats failures from
resp.bytes().await as ReasoningError::MalformedResponse, which misclassifies
transport-layer issues; update the error handling around resp.bytes().await to
set last_err to a transport-oriented variant (e.g., ReasoningError::Unreachable
or a new ReasoningError::TransportError) instead of
ReasoningError::MalformedResponse, keep the existing log_response_failure call
and retry continue logic unchanged, and ensure the error message includes the
original error string and error_chain for diagnostics (refer to
resp.bytes().await, log_response_failure, last_err, and ReasoningError::* to
locate and change the assignment).
- Around line 1041-1083: Add a regression test that mirrors
tests/reasoning_anthropic.rs::malformed_json_classification_surfaces_as_error
but targets the OpenAI path that uses post_chat so the envelope JSON parse
failure returns ReasoningError::MalformedResponse; specifically, create a
mock-server test (e.g., in tests/reasoning_openai.rs) that makes
OpenAiProvider::new(...) and calls the code path that triggers post_chat, then
assert that log_response_failure/text_str handling yields a MalformedResponse
error rather than a generic parse or success; ensure the test exercises the
envelope-parse branch that currently uses post_chat and fails parsing so future
changes to log_response_failure or text_str cannot regress the public error
contract.

In `@tests/phase3_cooperative_summary.rs`:
- Around line 130-131: Rename the test function
cooperative_summary_closes_session_and_notifies_assigned_solver to reflect its
actual behavior (e.g., cooperative_summary_delivers_to_assigned_solver) by
updating the async test fn name in tests/phase3_cooperative_summary.rs (the
#[tokio::test] async fn declaration) and update any references to that function
name in the test suite (imports, mod references, or CI/test filters) so the test
still runs under its new descriptive name.

In `@tests/phase3_followup_summary.rs`:
- Line 209: The test function name
summarize_branch_delivers_summary_once_and_closes_session is misleading; rename
the async test function to
summarize_branch_delivers_summary_once_and_halts_at_summary_delivered and update
any references (attributes like #[tokio::test], mod uses, or callers) in the
file to match the new identifier so the test suite stays self-documenting and
continues to run.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29fd69cf-11bc-40ce-a419-c829edefe4c5

📥 Commits

Reviewing files that changed from the base of the PR and between ace9517 and 30c0fbd.

📒 Files selected for processing (9)
  • prompts/phase3-system.md
  • src/chat/outbound.rs
  • src/mediation/follow_up.rs
  • src/mediation/mod.rs
  • src/reasoning/openai.rs
  • tests/phase3_cooperative_summary.rs
  • tests/phase3_followup_round.rs
  • tests/phase3_followup_summary.rs
  • tests/phase3_session_open.rs

Comment thread src/reasoning/openai.rs
Comment on lines +134 to +142
fn is_minimal_param_model(model: &str) -> bool {
// Curated list of models that publish `supported_parameters: []`
// (or a near-empty set) on the PPQ.ai catalog. Hardcoded
// because querying `/v1/models` per request would add latency
// and a failure mode for a near-static fact. Extend as new
// router/auto SKUs appear.
matches!(model, "autoclaw" | "auto" | "switchpoint/router")
|| model.starts_with("gpt-5")
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does OpenAI's gpt-5 family reject the response_format=json_object parameter on the chat completions endpoint?

💡 Result:

No. OpenAI’s GPT-5 models (including the GPT-5 chat snapshot) accept the chat completions endpoint parameter response_format={"type":"json_object"} (i.e., “JSON mode”). What the docs say:

  • The Chat Completions API reference defines response_format with a json_object option, describing it as “older JSON mode” and as ensuring the message is valid JSON.
  • OpenAI’s structured-outputs/JSON-mode guide explains that JSON mode uses response_format of type json_object.

So if you are seeing “rejections,” they’re likely model/feature mismatches or different response_format types (e.g., json_schema vs json_object) rather than the GPT-5 family rejecting json_object in general. For example, one community report shows an error for json_schema being “not supported” on gpt-5-chat-latest, which is consistent with the distinction between json_schema (Structured Outputs) support and json_object (JSON mode) support. Practical guidance: prefer response_format type json_schema (Structured Outputs) when supported; otherwise use json_object (JSON mode) with the required prompt instruction to “produce JSON”.


🏁 Script executed:

# First, examine the function and surrounding lines (104-142) to understand full context
sed -n '104,142p' src/reasoning/openai.rs

Repository: MostroP2P/serbero

Length of output: 1831


🏁 Script executed:

# Find all usages of is_minimal_param_model
rg 'is_minimal_param_model' src/reasoning/openai.rs -B 2 -A 2

Repository: MostroP2P/serbero

Length of output: 706


🏁 Script executed:

# Look for related logic around JSON response format
rg 'response_format|json_object' src/reasoning/openai.rs -B 2 -A 2

Repository: MostroP2P/serbero

Length of output: 5217




🌐 Web query:

Do OpenAI's GPT-5 models reject or have constraints on the temperature parameter in chat completions?

💡 Result:

Yes. For OpenAI’s GPT-5 models used via chat completions, the API imposes constraints on the temperature parameter: temperature is effectively not supported except for the default value (1). If you send an explicit temperature other than 1 (e.g., 0.3 or 0), the API can reject the request with an “Unsupported value”/“Only the default (1) value is supported” error. How to interpret “constraints” vs “reject”:

  • In practice, providing temperature=0.7 (or any value other than 1) to GPT-5 models results in a 400 invalid_request_error.
  • Supplying temperature=1 (or omitting temperature so the default is used) is accepted.

OpenAI’s current official docs for chat completions still describe temperature as a parameter in general, but they don’t describe GPT-5’s model-specific restriction on temperature in the snippets found here. The GPT-5 restriction is evidenced by API errors reported when calling GPT-5 models with unsupported temperature values. If you need controllable randomness with GPT-5 models, you’ll likely need to use a model family that supports temperature or rely on alternative knobs (for GPT-5 reasoning-capable usage, OpenAI emphasizes parameters like reasoning.effort rather than temperature).


Split temperature and response_format restrictions — gpt-5 accepts JSON mode.

The web search confirms that OpenAI's gpt-5 family accepts response_format=json_object, contradicting the current code that omits it for all gpt-5* models. The gpt-5 temperature constraint is real (API rejects non-1.0 values), but that does not extend to response_format.

The refactoring is necessary: is_minimal_param_model currently controls both constraints in lockstep. Separate them so only gpt-5* temperature is omitted while response_format is sent normally. This restores the server-enforced JSON schema guarantee for gpt-5 models instead of relying solely on prompt instructions and extract_json_object.

♻️ Example split
+    /// Models that reject explicit `temperature` (must use server default of 1.0).
+    fn rejects_temperature(model: &str) -> bool {
+        model.starts_with("gpt-5")
+    }
+
-    fn is_minimal_param_model(model: &str) -> bool {
-        // Curated list of models that publish `supported_parameters: []`
-        // (or a near-empty set) on the PPQ.ai catalog. Hardcoded
-        // because querying `/v1/models` per request would add latency
-        // and a failure mode for a near-static fact. Extend as new
-        // router/auto SKUs appear.
-        matches!(model, "autoclaw" | "auto" | "switchpoint/router")
-            || model.starts_with("gpt-5")
+    /// Models that reject `response_format = json_object` (PPQ.ai
+    /// router/auto SKUs publishing `supported_parameters: []`).
+    fn is_minimal_param_model(model: &str) -> bool {
+        matches!(model, "autoclaw" | "auto" | "switchpoint/router")
     }

Update request_temperature() to use rejects_temperature().

grunch and others added 3 commits April 27, 2026 17:59
CodeRabbit-style review pass on top of the duplicate-session +
chat-cleanup changes.

Correctness fixes:
- src/reasoning/openai.rs (extract_json_object): clamp depth at zero
  on `}` so a stray leading `}` cannot push depth negative and shift
  the depth==0 check off by one (would skip the next real `{` as a
  candidate `start`).
- src/reasoning/openai.rs (resp.bytes() failure path): re-classify
  body-read failures as `ReasoningError::Unreachable` (transport)
  instead of `MalformedResponse` (parse). The retry-and-continue
  semantics are unchanged; only the public error contract is
  realigned with the actual failure mode.

Logging hygiene:
- src/reasoning/openai.rs (log_response_failure): info! now carries
  only PII-safe metadata (status, headers, body length, body
  sha256[..16]); the truncated body preview is gated to debug! so
  rationale text returned in error responses cannot leak into
  general logs (FR-120 / TC-103).

New tests:
- src/chat/outbound.rs::tests: three new unit tests for
  `build_wrap_with_audience`:
  1. distinct audience strings on identical body produce distinct
     `inner_event_id` values (the uniqueness invariant on
     `mediation_messages`),
  2. roundtrip preserves the user-visible body and attaches an
     `m-aud` tag with the expected audience value,
  3. `audience = None` does NOT attach the tag (legacy compat).

Doc / comment freshness:
- src/mediation/mod.rs: deliver_summary doc clarifies the state
  machine stops at `summary_delivered` (not `closed`) and explains
  the deferred `summary_delivered → closed` step.
- src/mediation/mod.rs: draft_and_send_initial_message doc no
  longer claims role prefixes; mentions `m-aud` audience tag.
- src/mediation/mod.rs: draft_and_send_followup_message doc no
  longer claims `Round N. role:` body prefixes; round_number is
  documented as tracing-span only.
- src/mediation/mod.rs: drop the redundant `let _ = round_number;`
  binding (the `#[instrument(..., fields(round = round_number))]`
  attribute already references the arg).
- src/mediation/follow_up.rs: marker-advance comment refers to
  `summary_delivered` (not "closed (terminal)").
- tests/phase3_followup_round.rs: stale "Round 1." prefix /
  "Round-N label" comments updated to match the new no-prefix
  reality.
- tests/phase3_followup_summary.rs: module doc + idempotency
  comment / expect message refer to `summary_delivered`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback that the deliver_summary fix left
sessions parked in `summary_delivered` forever — the only later
close path (`dispute_resolved`) routes through
`latest_open_session_for`, which excludes `summary_delivered` by
design, so the state machine ended up asymmetric.

The eligibility predicate already short-circuits via
`is_eligible_lifecycle(Resolved) == false` once Mostro resolves
the dispute, so the practical "preventing future mediation opens"
phrasing in the review was overstated — but leaving sessions in a
non-`closed` terminal state forever is a real code-quality issue
worth fixing.

Fix: in `handlers::dispute_resolved`, after the existing
open-session supersede-then-close branch, look up any session for
the dispute that's in `summary_delivered` and walk it directly
through the legal `summary_delivered → closed` transition (no
SupersededByHuman intermediate — that variant is reserved for
sessions still in an open state when a human pre-empted
mediation, and `SummaryDelivered → SupersededByHuman` is not a
legal edge in `MediationSessionState::can_transition_to`). Both
the state flip and the `session_closed` audit row land in the
same transaction as the lifecycle move to `Resolved`, so the
close is atomic with the dispute resolution.

Audit payload carries `"from_state": "summary_delivered"` so
future debuggers can distinguish this close path from the
open-session supersede-then-close path that uses the same
`SessionClosed` event kind.

New regression test
`summary_delivered_session_is_closed_on_resolution`
(`tests/phase3_external_resolution_report.rs`) seeds a session in
`summary_delivered`, fires `dispute_resolved`, and asserts:
- session lands in `closed`,
- exactly one `session_closed` event with the new `from_state`
  discriminator,
- no `superseded_by_human` event was emitted (direct close skips
  that step),
- the FR-124 final-report DM still fires for this dispute, same
  as every other shape.

The earlier review finding about raw response previews leaking at
info-level (`log_response_failure`) was already addressed in
4f39855 — info! emits only `body_len` + `body_sha256_prefix`,
with the truncated preview gated to debug! — so no further change
on that one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@grunch grunch merged commit df19616 into main Apr 27, 2026
2 checks passed
@grunch grunch deleted the fix/mediation-summary-and-chat-cleanup branch April 27, 2026 21:20