fix(prompts): forbid repeated clarifications, harden language switch#50
fix(prompts): forbid repeated clarifications, harden language switch#50
Conversation
Observed 2026-04-28 in production with `gpt-5.4-mini` (Alice/Bob test
trade, dispute 22df3f9d-…), screenshot of seller's chat:
14:21 Serbero (en): "Hello, I'm Serbero, … did you receive the
fiat payment, …"
14:22 Seller "hola no entiendo"
14:22 Serbero (en): same English question, verbatim
14:22 Seller "no entiendo ingles, habla español?"
14:23 Serbero (es): "Me gustaría entender tu versión. … ¿recibiste
el pago fiat? …"
14:23 Seller "acabo de recibir el pago fiat"
14:23 Serbero (es): same Spanish question, verbatim
14:24 Seller "te digo que recibi el pago fiat"
→ Serbero Escalate(LowConfidence)
Two model failures stack:
1. Round 2 was emitted in English even though the seller's only reply
was Spanish. The "switch on first reply in another language" rule
in `phase3-system.md`'s "Language Matching" section was too soft —
the model treated the language switch as optional.
2. Round 4 repeated the round-3 Spanish question verbatim despite the
seller giving a substantive answer ("acabo de recibir el pago
fiat"). The classifier prompt had no rule against repeating a
prior clarification, and no guidance to ask for evidence
(timestamp, transfer reference, redacted screenshot) once the
yes/no question is answered.
The model is what it is — `gpt-5.4-mini` is a weak router target, and
the prompt has to compensate. Two prompt fixes:
- `phase3-classification.md` "Clarifying Questions (per-party)" hard
rules: added an explicit "advance the conversation" rule that
forbids byte-identical or substantively-equivalent repeats of a
previous Serbero outbound to the same party. Includes concrete
examples ("seller said they received the fiat → ask for timestamp,
method, redacted bank statement") and the meta-message carve-out
("no entiendo" / "speak Spanish" is NOT a substantive answer; re-
ask the original question in the party's detected language).
- `phase3-system.md` "Language Matching" section: tightened "switch
on the first reply that is clearly in another language" to
"mandatory and immediate", with a worked example pinned to the
2026-04-28 production incident. Also added the cross-check that
`*_clarification` text MUST match its corresponding `*_language`
code (no `seller_language: "es"` + English `seller_clarification`).
No code change. Lib + integration tests still pass; the keyword-audit
test on the cooperative-self-resolution bundle is unaffected.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe PR introduces a new Changes
Sequence DiagramsequenceDiagram
participant Classifier as Classifier<br/>(OpenAI)
participant FollowUpFlow as Follow-up Flow<br/>(follow_up.rs)
participant DB as Database/State
Classifier->>Classifier: Evaluate session<br/>→ SuggestSelfResolutionWithSummary
Classifier->>FollowUpFlow: Return ClassificationResponse<br/>with seller_confirmed_fiat_receipt
alt seller_confirmed_fiat_receipt == Some(true)
FollowUpFlow->>DB: Update session state<br/>awaiting_response → classified
FollowUpFlow->>DB: Dispatch cooperative<br/>self-resolution invitation
FollowUpFlow->>DB: Deliver summary
DB-->>FollowUpFlow: Session → summary_delivered
else seller_confirmed_fiat_receipt != Some(true)
FollowUpFlow->>DB: Advance evaluator marker<br/>to total_fresh_inbounds
FollowUpFlow->>DB: Log suppression<br/>(confidence, receipt flag)
FollowUpFlow->>DB: Session remains<br/>awaiting_response
DB-->>FollowUpFlow: Eligible for<br/>reconsidering in next round
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 23 minutes and 53 seconds.Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8560b9b3e3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| `## Transcript` section for any `serbero` outbound to the same | ||
| party in earlier rounds. Your text MUST NOT be byte-identical or | ||
| substantively equivalent to a previous Serbero clarification to |
There was a problem hiding this comment.
Remove impossible same-party transcript scan rule
This rule assumes the model can identify which prior serbero outbound belonged to the buyer vs seller, but the runtime transcript only serializes entries as [ts] serbero: <text> without a recipient field (src/reasoning/openai.rs prompt builder). As a result, the model cannot reliably enforce "same party" non-repetition and may compare against the wrong prior message, causing false "repeat" detections and degraded clarification flow.
Useful? React with 👍 / 👎.
| `*_clarification` text MUST match its corresponding `*_language` | ||
| code — emitting `seller_language: "es"` while writing | ||
| `seller_clarification` in English is an internal contradiction |
There was a problem hiding this comment.
Exempt null language rounds from clarification-language match
The new requirement that *_clarification MUST match *_language creates a contract conflict on rounds where no party text exists yet: the classification policy requires buyer_language/seller_language to be null when there is no content, while round-0 still requires English clarifications. This contradiction can push the model to emit "en" on round 0 (violating the null rule) or keep null and violate this new rule, reducing output reliability.
Useful? React with 👍 / 👎.
Maintainer's correction on PR #50: when the seller says "acabo de recibir el pago fiat" (or any equivalent across supported languages), that is NOT a yes/no answer that wants follow-up evidence — it is a satisfaction signal. The trade is going smoothly on the seller's side; in Mostro the seller simply releases the funds at that point and the dispute resolves externally without solver action. The previous commit on this branch told the model to ask for timestamp / method / redacted screenshot, which is exactly the opposite of the cooperative path. Feature 005's whole purpose was to detect this case and dispatch the cooperative-self-resolution invitation ("looks like you're close to coordinating between yourselves…") to both parties, with the solver notified in parallel. The model should classify `coordination_failure_resolvable` with `confidence ≥ 0.75` and `suggested_action = summarize`; the policy layer rewrites that to `SuggestSelfResolutionWithSummary`. Two prompt changes: - `phase3-classification.md` Labels section: expanded the `coordination_failure_resolvable` description with the concrete cues that should classify here (seller confirms receipt of fiat without a "but…", both parties consistently describing a coordination/timing issue, buyer confirms payment sent and seller has not contradicted), with a pointer that "give me your bank statement" follow-ups on an already-resolving case are a defect. - `phase3-classification.md` Hard rules: rewrote the "advance the conversation" rule into three branches keyed by the kind of reply (satisfaction signal → STOP asking, switch to coordination_failure_resolvable + summarize; partial answer or new contradiction → ask next concrete piece, but visibly different; meta-message → re-ask in the detected language). The previous "ask for evidence" examples are gone. The language-switch tightening from the prior commit on this branch is unaffected.
…olution invite A buyer-only "I sent the fiat" claim is self-serving and was enough to trigger the cooperative self-resolution invitation. The classifier now emits a nullable seller_confirmed_fiat_receipt signal, and the follow-up path suppresses the invitation (falling back to the solver-only summary) unless that signal is true. Prompt, spec (FR-015), and tests updated accordingly.
…ound The previous suppression branch walked the session to summary_delivered when seller_confirmed_fiat_receipt was not Some(true). That contradicted the spec edge case (FR-015) the same PR adds: "if the seller later confirms the fiat arrived, the invitation becomes eligible on that later round." The state gate at the top of advance_session_round only re-admits a summary_delivered session when a self_resolution_offered audit row exists, so the suppression path was permanently inert. Drop the deliver_summary fallback. Leave the session in awaiting_response, advance the evaluator marker, and let the next seller-confirmation round re-classify cleanly. Test updated to assert state == awaiting_response and the marker advanced. This also moots the deliver_summary error-path revert gap, since there is no longer a deliver_summary call in this branch.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/reasoning/openai.rs (1)
959-971: Add a parse test forseller_confirmed_fiat_receipt: false.You already pin
NoneandSome(true). Add one case for explicitfalseto fully lock the tri-state contract.Proposed test
+ #[test] + fn parse_classification_parses_seller_receipt_denial_signal() { + let raw = r#"{ + "classification":"coordination_failure_resolvable", + "confidence":0.72, + "suggested_action":"ask_clarification", + "buyer_clarification":"Please share transfer proof.", + "seller_clarification":"Please confirm if funds arrived.", + "rationale":"seller denied receipt", + "flags":[], + "seller_confirmed_fiat_receipt":false + }"#; + let parsed = parse_classification(raw).unwrap(); + assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false)); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/reasoning/openai.rs` around lines 959 - 971, Add another unit test alongside parse_classification_parses_seller_receipt_signal that verifies parse_classification correctly parses an explicit false value for seller_confirmed_fiat_receipt; create a new test (e.g., parse_classification_parses_seller_receipt_signal_false) that feeds JSON with "seller_confirmed_fiat_receipt":false into parse_classification and assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false)) to complete the tri-state coverage (None, Some(true), Some(false)).tests/phase3_self_resolution_role_guard.rs (2)
280-320: Cover theNoneguard state too.The runtime suppresses invitations for both
Some(false)andNone, but this file only proves the explicit-denial path. Adding aNonecase would catch prompt/parser regressions where the field is omitted instead of set.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/phase3_self_resolution_role_guard.rs` around lines 280 - 320, Add coverage for the None guard state by expanding the test that verifies invitations are suppressed: update buyer_only_fiat_claim_does_not_send_self_resolution_invitation (or add a new async test) to run the same assertions for both run_case inputs Some(false) and None (i.e., call run_case with the third-party guard parameter set to None as well), reusing the same session flow, queries (mediation_events/mediation_messages/mediation_sessions) and assertions (offered_count == 0, outbound_count == 2, state == "awaiting_response", marker > 0) so the test protects against regressions where the guard field is omitted; ensure you create a distinct session_id for the None case to avoid DB collisions.
339-359: Avoid keying this test to one Spanish template phrase.
LIKE '%coordinar el siguiente paso%'makes the test fail on harmless copy edits inprompts/phase3-message-templates.md, even if the role guard still works. For this test, asserting on the extra outbound rows /self_resolution_offeredevent would be more stable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/phase3_self_resolution_role_guard.rs` around lines 339 - 359, The test is brittle because it asserts on a specific Spanish template via "content LIKE '%coordinar el siguiente paso%'" — remove that LIKE clause from the SQL and instead count outbound mediation_messages more generally (e.g. SELECT COUNT(*) FROM mediation_messages WHERE session_id = ?1 AND direction = 'outbound') or otherwise assert only on the self_resolution_offered event count; update the query used in the block that assigns (offered_count, invite_count, state) and adjust the corresponding assert on invite_count to rely on the generic outbound count (still expecting 2) so template text edits won't break the test; refer to the variables offered_count, invite_count, state and the SQL query in the test to locate and change the clause.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@prompts/phase3-system.md`:
- Around line 69-77: Update the sentence "The switch to a non-default language
is mandatory and immediate on the very first reply where the language is
unambiguous." to define “unambiguous” concretely: treat it as (1) explicit
metadata: buyer_language or seller_language set to a non-default ISO code, or
(2) language detection: a non-default ISO language detected with confidence >=
0.90; declare metadata takes precedence over detection; and require the model to
switch immediately on the very next reply after either trigger is present.
Ensure the prompt text enforces this precedence and confidence threshold so the
mandatory, immediate switch behavior is deterministic.
- Around line 78-84: The runtime currently renders clarification text via
render_for(language_code) but does not ensure the rendered text language matches
the persisted buyer_language/seller_language (allowing silent fallback to
bundle.fallback_language); modify the runtime path that calls render_for to
either (a) assert that render_for succeeded without using the fallback
(detectable via render_for return value or an added boolean) before persisting
buyer_language/seller_language, or (b) replace the persisted language code with
the resolved language returned by render_for, or (c) run a bounded post-render
language-check (coarse heuristic or LLM verifier) and fail/panic/log if
mismatch; locate changes around render_for, the code that writes the audit event
(where buyer_language/seller_language are stored), and the
bundle.fallback_language usage and implement one of these fixes so stored
language codes always match rendered clarification text.
- Around line 85-92: The prompt text claims languages "switch on the first
reply" and "stay in that language going forward", but runtime resolves languages
each round using the classifier outputs (`buyer_language`, `seller_language`)
and the resolver function `resolve_effective_language`, which falls back to
English when detection is None/unknown; update the prompt text to match runtime
semantics by replacing the "one-time switch" wording with a per-round
re-evaluation description and note that `buyer_clarification` and
`seller_clarification` are independently resolved each round via
`resolve_effective_language` (round 0 defaults to English), or alternatively
adjust `resolve_effective_language` to implement sticky behavior if you intend
one-time switching—pick one and make prompt and code consistent.
---
Nitpick comments:
In `@src/reasoning/openai.rs`:
- Around line 959-971: Add another unit test alongside
parse_classification_parses_seller_receipt_signal that verifies
parse_classification correctly parses an explicit false value for
seller_confirmed_fiat_receipt; create a new test (e.g.,
parse_classification_parses_seller_receipt_signal_false) that feeds JSON with
"seller_confirmed_fiat_receipt":false into parse_classification and
assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false)) to complete the
tri-state coverage (None, Some(true), Some(false)).
In `@tests/phase3_self_resolution_role_guard.rs`:
- Around line 280-320: Add coverage for the None guard state by expanding the
test that verifies invitations are suppressed: update
buyer_only_fiat_claim_does_not_send_self_resolution_invitation (or add a new
async test) to run the same assertions for both run_case inputs Some(false) and
None (i.e., call run_case with the third-party guard parameter set to None as
well), reusing the same session flow, queries
(mediation_events/mediation_messages/mediation_sessions) and assertions
(offered_count == 0, outbound_count == 2, state == "awaiting_response", marker >
0) so the test protects against regressions where the guard field is omitted;
ensure you create a distinct session_id for the None case to avoid DB
collisions.
- Around line 339-359: The test is brittle because it asserts on a specific
Spanish template via "content LIKE '%coordinar el siguiente paso%'" — remove
that LIKE clause from the SQL and instead count outbound mediation_messages more
generally (e.g. SELECT COUNT(*) FROM mediation_messages WHERE session_id = ?1
AND direction = 'outbound') or otherwise assert only on the
self_resolution_offered event count; update the query used in the block that
assigns (offered_count, invite_count, state) and adjust the corresponding assert
on invite_count to rely on the generic outbound count (still expecting 2) so
template text edits won't break the test; refer to the variables offered_count,
invite_count, state and the SQL query in the test to locate and change the
clause.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5bddb35d-0324-41d8-8b19-5624e5be0ef3
📒 Files selected for processing (13)
prompts/phase3-classification.mdprompts/phase3-system.mdspecs/005-cooperative-self-resolution/spec.mdsrc/mediation/follow_up.rssrc/mediation/policy.rssrc/models/reasoning.rssrc/reasoning/openai.rstests/common/mod.rstests/phase3_authority_boundary.rstests/phase3_escalation_triggers.rstests/phase3_followup_summary.rstests/phase3_self_resolution_role_guard.rstests/phase3_take_reasoning_coupling.rs
| Concrete cues that should classify here with `confidence ≥ 0.75` | ||
| and `suggested_action = summarize` (which the policy layer | ||
| routes to the Feature 005 cooperative-self-resolution branch): | ||
|
|
||
| - **Seller confirms receipt of fiat** without a qualifying | ||
| "but…" or fraud red flag. Examples: "I received the fiat", | ||
| "acabo de recibir el pago", "ya me llegó el dinero", "recebi | ||
| o pagamento". This is a satisfaction signal — the dispute can | ||
| resolve externally if the seller now releases the funds in | ||
| their Mostro client. Serbero never names or instructs that | ||
| action; the cooperative invitation simply tells both parties | ||
| they look close to coordinating between themselves and | ||
| monitors for a status change. | ||
| - **Both parties consistently describe a coordination / | ||
| timing issue** (delayed bank wire, timezone gap, payment- | ||
| method confusion) without contradicting each other on the | ||
| underlying facts. | ||
| - **Buyer confirms payment sent AND the seller corroborates | ||
| receipt** in the transcript. Once the seller says the fiat | ||
| arrived, the case becomes a legitimate cooperative-resolution | ||
| candidate; until then, the buyer's claim remains self-serving | ||
| and does NOT by itself justify the self-resolution invitation. | ||
|
|
||
| Counter-example that MUST NOT classify into the cooperative | ||
| self-resolution branch on its own: | ||
|
|
||
| - **Buyer says they sent the fiat, but the seller has not yet | ||
| confirmed receipt.** That is exactly what an honest buyer and a | ||
| dishonest buyer would both say. Do NOT treat this as a | ||
| satisfaction signal and do NOT jump to `suggested_action = | ||
| summarize` just to invite the parties to "coordinate the next | ||
| step". Instead, keep gathering evidence: ask the buyer for | ||
| proof of payment / transfer details and ask the seller whether | ||
| the fiat has arrived. | ||
|
|
||
| Do NOT keep gathering evidence once the cooperative cue is | ||
| unambiguous; "give me your bank statement" / "what timestamp" | ||
| follow-ups on a seller who already said "I got the fiat" are a | ||
| defect (observed 2026-04-28 production transcript, where the | ||
| model re-asked the same Spanish question instead of routing to | ||
| self-resolution). |
There was a problem hiding this comment.
Define seller_confirmed_fiat_receipt in the structured output contract.
These new cooperative-routing rules rely on that field, but this prompt never actually tells the model to emit it. A model can follow the narrative guidance here and still omit the flag; that deserializes as None, and the runtime guard in src/mediation/follow_up.rs Line 456 will suppress the self-resolution invitation even after a clear seller confirmation.
Suggested prompt addition
+## Seller Fiat-Receipt Confirmation (Feature 005)
+
+The classifier MUST also emit:
+
+- `seller_confirmed_fiat_receipt` (boolean | null): `true` only when the
+ seller explicitly confirms the fiat was received; `false` when the seller
+ explicitly says it has not arrived; `null` when the transcript does not
+ support a confident seller-side determination yet.Also applies to: 171-225
🧰 Tools
🪛 LanguageTool
[style] ~26-~26: The word “ya” is non-standard. Did you mean “you”, “your” or “yes”?
Context: ...fiat", "acabo de recibir el pago", "ya me llegó el dinero", "recebi o paga...
(YA)
[style] ~47-~47: Consider an alternative for the overused word “exactly”.
Context: ...ot yet confirmed receipt.** That is exactly what an honest buyer and a dishones...
(EXACTLY_PRECISELY)
…h deterministic Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
The 2026-04-28 Alice/Bob test trade exposed two model-quality defects in production with
gpt-5.4-mini(screenshot shared by maintainer):"hola no entiendo"(Spanish), Serbero re-sent the round-1 question in English instead of switching to Spanish. The seller had to explicitly ask"no entiendo ingles, habla español?"to get Spanish."acabo de recibir el pago fiat"— a substantive answer — and Serbero re-asked the exact same Spanish question instead of requesting proof (timestamp, transfer reference, redacted screenshot). The seller replied"te digo que recibi el pago fiat", then Serbero finally escalated withLowConfidenceafter wasting two rounds.The model is what it is. The prompt has to compensate.
Changes
prompts/phase3-classification.md— new hard rule under "Clarifying Questions (per-party)": the clarification MUST advance the conversation. Forbids byte-identical or substantively-equivalent repeats of any previous Serbero outbound to the same party. Includes concrete adaptation examples ("seller said they received the fiat → next ask MUST request timestamp/method/redacted screenshot, not re-ask the same question") and the meta-message carve-out ("no entiendo" / "speak Spanish" is NOT a substantive answer — re-ask the original question, but in the party's detected language).prompts/phase3-system.md— tightened the "Language Matching" section. Promoted "switch on the first reply that is clearly in another language" from soft to mandatory and immediate, with a worked example pinned to the 2026-04-28 incident. Added the cross-check that*_clarificationtext MUST match its corresponding*_languagecode, soseller_language: "es"+ Englishseller_clarificationis now flagged as an internal contradiction.No code change.
Test plan
cargo test— 296 lib + integration tests pass.cargo clippy --all-targets -- -D warnings— clean.cargo fmt --all -- --check— clean.Summary by CodeRabbit
Bug Fixes
Documentation
Tests