Skip to content

fix(prompts): forbid repeated clarifications, harden language switch#50

Merged
grunch merged 5 commits intomainfrom
fix/clarification-repetition-and-language
Apr 29, 2026
Merged

fix(prompts): forbid repeated clarifications, harden language switch#50
grunch merged 5 commits intomainfrom
fix/clarification-repetition-and-language

Conversation

@grunch
Copy link
Copy Markdown
Member

@grunch grunch commented Apr 28, 2026

Summary

The 2026-04-28 Alice/Bob test trade exposed two model-quality defects in production with gpt-5.4-mini (screenshot shared by maintainer):

  • Language-switch failure: seller wrote "hola no entiendo" (Spanish), Serbero re-sent the round-1 question in English instead of switching to Spanish. The seller had to explicitly ask "no entiendo ingles, habla español?" to get Spanish.
  • Repeated clarification: seller answered "acabo de recibir el pago fiat" — a substantive answer — and Serbero re-asked the exact same Spanish question instead of requesting proof (timestamp, transfer reference, redacted screenshot). The seller replied "te digo que recibi el pago fiat", then Serbero finally escalated with LowConfidence after wasting two rounds.

The model is what it is. The prompt has to compensate.

Changes

  • prompts/phase3-classification.md — new hard rule under "Clarifying Questions (per-party)": the clarification MUST advance the conversation. Forbids byte-identical or substantively-equivalent repeats of any previous Serbero outbound to the same party. Includes concrete adaptation examples ("seller said they received the fiat → next ask MUST request timestamp/method/redacted screenshot, not re-ask the same question") and the meta-message carve-out ("no entiendo" / "speak Spanish" is NOT a substantive answer — re-ask the original question, but in the party's detected language).

  • prompts/phase3-system.md — tightened the "Language Matching" section. Promoted "switch on the first reply that is clearly in another language" from soft to mandatory and immediate, with a worked example pinned to the 2026-04-28 incident. Added the cross-check that *_clarification text MUST match its corresponding *_language code, so seller_language: "es" + English seller_clarification is now flagged as an internal contradiction.

No code change.

Test plan

  • cargo test — 296 lib + integration tests pass.
  • cargo clippy --all-targets -- -D warnings — clean.
  • cargo fmt --all -- --check — clean.
  • Re-run an Alice/Bob trade where the seller writes Spanish on round 1:
    • Round 2 clarification should arrive in Spanish, not English.
    • After the seller answers a yes/no question substantively, the next clarification should ask for evidence (timestamp, method, redacted screenshot), not re-ask the same question.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced self-resolution validation to require seller payment receipt confirmation before proceeding, preventing premature resolution based on incomplete evidence.
  • Documentation

    • Updated policy and specifications to clarify cooperative self-resolution requirements and language matching enforcement for clarifications.
  • Tests

    • Added comprehensive test coverage for self-resolution flow with seller confirmation validation.

Observed 2026-04-28 in production with `gpt-5.4-mini` (Alice/Bob test
trade, dispute 22df3f9d-…), screenshot of seller's chat:

  14:21  Serbero (en): "Hello, I'm Serbero, …  did you receive the
                      fiat payment, …"
  14:22  Seller       "hola no entiendo"
  14:22  Serbero (en): same English question, verbatim
  14:22  Seller       "no entiendo ingles, habla español?"
  14:23  Serbero (es): "Me gustaría entender tu versión. … ¿recibiste
                      el pago fiat? …"
  14:23  Seller       "acabo de recibir el pago fiat"
  14:23  Serbero (es): same Spanish question, verbatim
  14:24  Seller       "te digo que recibi el pago fiat"
  →      Serbero      Escalate(LowConfidence)

Two model failures stack:

1. Round 2 was emitted in English even though the seller's only reply
   was Spanish. The "switch on first reply in another language" rule
   in `phase3-system.md`'s "Language Matching" section was too soft —
   the model treated the language switch as optional.

2. Round 4 repeated the round-3 Spanish question verbatim despite the
   seller giving a substantive answer ("acabo de recibir el pago
   fiat"). The classifier prompt had no rule against repeating a
   prior clarification, and no guidance to ask for evidence
   (timestamp, transfer reference, redacted screenshot) once the
   yes/no question is answered.

The model is what it is — `gpt-5.4-mini` is a weak router target, and
the prompt has to compensate. Two prompt fixes:

- `phase3-classification.md` "Clarifying Questions (per-party)" hard
  rules: added an explicit "advance the conversation" rule that
  forbids byte-identical or substantively-equivalent repeats of a
  previous Serbero outbound to the same party. Includes concrete
  examples ("seller said they received the fiat → ask for timestamp,
  method, redacted bank statement") and the meta-message carve-out
  ("no entiendo" / "speak Spanish" is NOT a substantive answer; re-
  ask the original question in the party's detected language).

- `phase3-system.md` "Language Matching" section: tightened "switch
  on the first reply that is clearly in another language" to
  "mandatory and immediate", with a worked example pinned to the
  2026-04-28 production incident. Also added the cross-check that
  `*_clarification` text MUST match its corresponding `*_language`
  code (no `seller_language: "es"` + English `seller_clarification`).

No code change. Lib + integration tests still pass; the keyword-audit
test on the cooperative-self-resolution bundle is unaffected.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@grunch has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 53 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 783a5918-d774-40e5-9274-d3c1d9bf1938

📥 Commits

Reviewing files that changed from the base of the PR and between 7e0c694 and 5579fbf.

📒 Files selected for processing (1)
  • prompts/phase3-system.md

Walkthrough

The PR introduces a new seller_confirmed_fiat_receipt field to gate cooperative self-resolution invitations, requiring explicit seller-side fiat receipt confirmation before proceeding. Changes span policy documents, data model, OpenAI adapter, follow-up flow logic, and test coverage.

Changes

Cohort / File(s) Summary
Policy & Specification Updates
prompts/phase3-classification.md, prompts/phase3-system.md, specs/005-cooperative-self-resolution/spec.md
Refined classification policy to prevent self-resolution escalation on buyer-only claims; enforced language synchronization for party-specific clarifications; added functional requirements that seller-side fiat receipt confirmation (not buyer claims) must trigger invitations.
Model Extension
src/models/reasoning.rs
Added optional seller_confirmed_fiat_receipt: Option<bool> field to ClassificationResponse to capture classifier-emitted seller receipt confirmation, denial, or inconclusive state.
OpenAI Adapter Integration
src/reasoning/openai.rs
Extended structured output schema with seller_confirmed_fiat_receipt nullable boolean field; updated classification prompt to instruct model to emit based on seller-side evidence only (not buyer claims); added parser and unit test coverage.
Follow-up Flow Gate
src/mediation/follow_up.rs
Implemented runtime check: when SuggestSelfResolutionWithSummary is returned, suppress invitation dispatch if seller_confirmed_fiat_receipt is not Some(true); advances evaluator marker and logs suppression, allowing reconsideration in later rounds.
Test Fixture Updates
src/mediation/policy.rs, tests/common/mod.rs, tests/phase3_authority_boundary.rs, tests/phase3_escalation_triggers.rs, tests/phase3_followup_summary.rs, tests/phase3_take_reasoning_coupling.rs
Updated test helpers and mocks to initialize seller_confirmed_fiat_receipt: None in ClassificationResponse structs.
Integration Test
tests/phase3_self_resolution_role_guard.rs
New end-to-end test module validating the gate: negative path (buyer-only claim + seller_confirmed_fiat_receipt: false) asserts zero invitations and continued awaiting_response state; positive path (seller confirmation: true) asserts invitation dispatch and session transition to summary_delivered.

Sequence Diagram

sequenceDiagram
    participant Classifier as Classifier<br/>(OpenAI)
    participant FollowUpFlow as Follow-up Flow<br/>(follow_up.rs)
    participant DB as Database/State
    
    Classifier->>Classifier: Evaluate session<br/>→ SuggestSelfResolutionWithSummary
    Classifier->>FollowUpFlow: Return ClassificationResponse<br/>with seller_confirmed_fiat_receipt
    
    alt seller_confirmed_fiat_receipt == Some(true)
        FollowUpFlow->>DB: Update session state<br/>awaiting_response → classified
        FollowUpFlow->>DB: Dispatch cooperative<br/>self-resolution invitation
        FollowUpFlow->>DB: Deliver summary
        DB-->>FollowUpFlow: Session → summary_delivered
    else seller_confirmed_fiat_receipt != Some(true)
        FollowUpFlow->>DB: Advance evaluator marker<br/>to total_fresh_inbounds
        FollowUpFlow->>DB: Log suppression<br/>(confidence, receipt flag)
        FollowUpFlow->>DB: Session remains<br/>awaiting_response
        DB-->>FollowUpFlow: Eligible for<br/>reconsidering in next round
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • mostronatorcoder

🐰 A seller's nod now guards the gate,
No buyer's word can seal a fate,
Confirmed receipt—the golden light,
Self-resolution springs just right!
🎯 The cooperative path runs true,
When both sides shine through.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main changes: forbidding repeated clarifications and hardening the language switching logic, which aligns with the core incident fixes detailed in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/clarification-repetition-and-language

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 23 minutes and 53 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8560b9b3e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +131 to +133
`## Transcript` section for any `serbero` outbound to the same
party in earlier rounds. Your text MUST NOT be byte-identical or
substantively equivalent to a previous Serbero clarification to
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove impossible same-party transcript scan rule

This rule assumes the model can identify which prior serbero outbound belonged to the buyer vs seller, but the runtime transcript only serializes entries as [ts] serbero: <text> without a recipient field (src/reasoning/openai.rs prompt builder). As a result, the model cannot reliably enforce "same party" non-repetition and may compare against the wrong prior message, causing false "repeat" detections and degraded clarification flow.

Useful? React with 👍 / 👎.

Comment thread prompts/phase3-system.md
Comment on lines +81 to +83
`*_clarification` text MUST match its corresponding `*_language`
code — emitting `seller_language: "es"` while writing
`seller_clarification` in English is an internal contradiction
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exempt null language rounds from clarification-language match

The new requirement that *_clarification MUST match *_language creates a contract conflict on rounds where no party text exists yet: the classification policy requires buyer_language/seller_language to be null when there is no content, while round-0 still requires English clarifications. This contradiction can push the model to emit "en" on round 0 (violating the null rule) or keep null and violate this new rule, reducing output reliability.

Useful? React with 👍 / 👎.

grunch added 3 commits April 28, 2026 14:36
Maintainer's correction on PR #50: when the seller says "acabo de
recibir el pago fiat" (or any equivalent across supported languages),
that is NOT a yes/no answer that wants follow-up evidence — it is a
satisfaction signal. The trade is going smoothly on the seller's
side; in Mostro the seller simply releases the funds at that point
and the dispute resolves externally without solver action.

The previous commit on this branch told the model to ask for
timestamp / method / redacted screenshot, which is exactly the
opposite of the cooperative path. Feature 005's whole purpose was
to detect this case and dispatch the cooperative-self-resolution
invitation ("looks like you're close to coordinating between
yourselves…") to both parties, with the solver notified in
parallel. The model should classify
`coordination_failure_resolvable` with `confidence ≥ 0.75` and
`suggested_action = summarize`; the policy layer rewrites that to
`SuggestSelfResolutionWithSummary`.

Two prompt changes:

- `phase3-classification.md` Labels section: expanded the
  `coordination_failure_resolvable` description with the concrete
  cues that should classify here (seller confirms receipt of fiat
  without a "but…", both parties consistently describing a
  coordination/timing issue, buyer confirms payment sent and
  seller has not contradicted), with a pointer that "give me
  your bank statement" follow-ups on an already-resolving case
  are a defect.

- `phase3-classification.md` Hard rules: rewrote the "advance the
  conversation" rule into three branches keyed by the kind of
  reply (satisfaction signal → STOP asking, switch to
  coordination_failure_resolvable + summarize; partial answer or
  new contradiction → ask next concrete piece, but visibly
  different; meta-message → re-ask in the detected language).
  The previous "ask for evidence" examples are gone.

The language-switch tightening from the prior commit on this
branch is unaffected.
…olution invite

A buyer-only "I sent the fiat" claim is self-serving and was enough to trigger
the cooperative self-resolution invitation. The classifier now emits a
nullable seller_confirmed_fiat_receipt signal, and the follow-up path
suppresses the invitation (falling back to the solver-only summary) unless
that signal is true. Prompt, spec (FR-015), and tests updated accordingly.
…ound

The previous suppression branch walked the session to summary_delivered
when seller_confirmed_fiat_receipt was not Some(true). That contradicted
the spec edge case (FR-015) the same PR adds: "if the seller later
confirms the fiat arrived, the invitation becomes eligible on that later
round." The state gate at the top of advance_session_round only
re-admits a summary_delivered session when a self_resolution_offered
audit row exists, so the suppression path was permanently inert.

Drop the deliver_summary fallback. Leave the session in
awaiting_response, advance the evaluator marker, and let the next
seller-confirmation round re-classify cleanly. Test updated to assert
state == awaiting_response and the marker advanced. This also moots the
deliver_summary error-path revert gap, since there is no longer a
deliver_summary call in this branch.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
src/reasoning/openai.rs (1)

959-971: Add a parse test for seller_confirmed_fiat_receipt: false.

You already pin None and Some(true). Add one case for explicit false to fully lock the tri-state contract.

Proposed test
+    #[test]
+    fn parse_classification_parses_seller_receipt_denial_signal() {
+        let raw = r#"{
+            "classification":"coordination_failure_resolvable",
+            "confidence":0.72,
+            "suggested_action":"ask_clarification",
+            "buyer_clarification":"Please share transfer proof.",
+            "seller_clarification":"Please confirm if funds arrived.",
+            "rationale":"seller denied receipt",
+            "flags":[],
+            "seller_confirmed_fiat_receipt":false
+        }"#;
+        let parsed = parse_classification(raw).unwrap();
+        assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reasoning/openai.rs` around lines 959 - 971, Add another unit test
alongside parse_classification_parses_seller_receipt_signal that verifies
parse_classification correctly parses an explicit false value for
seller_confirmed_fiat_receipt; create a new test (e.g.,
parse_classification_parses_seller_receipt_signal_false) that feeds JSON with
"seller_confirmed_fiat_receipt":false into parse_classification and
assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false)) to complete the
tri-state coverage (None, Some(true), Some(false)).
tests/phase3_self_resolution_role_guard.rs (2)

280-320: Cover the None guard state too.

The runtime suppresses invitations for both Some(false) and None, but this file only proves the explicit-denial path. Adding a None case would catch prompt/parser regressions where the field is omitted instead of set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_self_resolution_role_guard.rs` around lines 280 - 320, Add
coverage for the None guard state by expanding the test that verifies
invitations are suppressed: update
buyer_only_fiat_claim_does_not_send_self_resolution_invitation (or add a new
async test) to run the same assertions for both run_case inputs Some(false) and
None (i.e., call run_case with the third-party guard parameter set to None as
well), reusing the same session flow, queries
(mediation_events/mediation_messages/mediation_sessions) and assertions
(offered_count == 0, outbound_count == 2, state == "awaiting_response", marker >
0) so the test protects against regressions where the guard field is omitted;
ensure you create a distinct session_id for the None case to avoid DB
collisions.

339-359: Avoid keying this test to one Spanish template phrase.

LIKE '%coordinar el siguiente paso%' makes the test fail on harmless copy edits in prompts/phase3-message-templates.md, even if the role guard still works. For this test, asserting on the extra outbound rows / self_resolution_offered event would be more stable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/phase3_self_resolution_role_guard.rs` around lines 339 - 359, The test
is brittle because it asserts on a specific Spanish template via "content LIKE
'%coordinar el siguiente paso%'" — remove that LIKE clause from the SQL and
instead count outbound mediation_messages more generally (e.g. SELECT COUNT(*)
FROM mediation_messages WHERE session_id = ?1 AND direction = 'outbound') or
otherwise assert only on the self_resolution_offered event count; update the
query used in the block that assigns (offered_count, invite_count, state) and
adjust the corresponding assert on invite_count to rely on the generic outbound
count (still expecting 2) so template text edits won't break the test; refer to
the variables offered_count, invite_count, state and the SQL query in the test
to locate and change the clause.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@prompts/phase3-system.md`:
- Around line 69-77: Update the sentence "The switch to a non-default language
is mandatory and immediate on the very first reply where the language is
unambiguous." to define “unambiguous” concretely: treat it as (1) explicit
metadata: buyer_language or seller_language set to a non-default ISO code, or
(2) language detection: a non-default ISO language detected with confidence >=
0.90; declare metadata takes precedence over detection; and require the model to
switch immediately on the very next reply after either trigger is present.
Ensure the prompt text enforces this precedence and confidence threshold so the
mandatory, immediate switch behavior is deterministic.
- Around line 78-84: The runtime currently renders clarification text via
render_for(language_code) but does not ensure the rendered text language matches
the persisted buyer_language/seller_language (allowing silent fallback to
bundle.fallback_language); modify the runtime path that calls render_for to
either (a) assert that render_for succeeded without using the fallback
(detectable via render_for return value or an added boolean) before persisting
buyer_language/seller_language, or (b) replace the persisted language code with
the resolved language returned by render_for, or (c) run a bounded post-render
language-check (coarse heuristic or LLM verifier) and fail/panic/log if
mismatch; locate changes around render_for, the code that writes the audit event
(where buyer_language/seller_language are stored), and the
bundle.fallback_language usage and implement one of these fixes so stored
language codes always match rendered clarification text.
- Around line 85-92: The prompt text claims languages "switch on the first
reply" and "stay in that language going forward", but runtime resolves languages
each round using the classifier outputs (`buyer_language`, `seller_language`)
and the resolver function `resolve_effective_language`, which falls back to
English when detection is None/unknown; update the prompt text to match runtime
semantics by replacing the "one-time switch" wording with a per-round
re-evaluation description and note that `buyer_clarification` and
`seller_clarification` are independently resolved each round via
`resolve_effective_language` (round 0 defaults to English), or alternatively
adjust `resolve_effective_language` to implement sticky behavior if you intend
one-time switching—pick one and make prompt and code consistent.

---

Nitpick comments:
In `@src/reasoning/openai.rs`:
- Around line 959-971: Add another unit test alongside
parse_classification_parses_seller_receipt_signal that verifies
parse_classification correctly parses an explicit false value for
seller_confirmed_fiat_receipt; create a new test (e.g.,
parse_classification_parses_seller_receipt_signal_false) that feeds JSON with
"seller_confirmed_fiat_receipt":false into parse_classification and
assert_eq!(parsed.seller_confirmed_fiat_receipt, Some(false)) to complete the
tri-state coverage (None, Some(true), Some(false)).

In `@tests/phase3_self_resolution_role_guard.rs`:
- Around line 280-320: Add coverage for the None guard state by expanding the
test that verifies invitations are suppressed: update
buyer_only_fiat_claim_does_not_send_self_resolution_invitation (or add a new
async test) to run the same assertions for both run_case inputs Some(false) and
None (i.e., call run_case with the third-party guard parameter set to None as
well), reusing the same session flow, queries
(mediation_events/mediation_messages/mediation_sessions) and assertions
(offered_count == 0, outbound_count == 2, state == "awaiting_response", marker >
0) so the test protects against regressions where the guard field is omitted;
ensure you create a distinct session_id for the None case to avoid DB
collisions.
- Around line 339-359: The test is brittle because it asserts on a specific
Spanish template via "content LIKE '%coordinar el siguiente paso%'" — remove
that LIKE clause from the SQL and instead count outbound mediation_messages more
generally (e.g. SELECT COUNT(*) FROM mediation_messages WHERE session_id = ?1
AND direction = 'outbound') or otherwise assert only on the
self_resolution_offered event count; update the query used in the block that
assigns (offered_count, invite_count, state) and adjust the corresponding assert
on invite_count to rely on the generic outbound count (still expecting 2) so
template text edits won't break the test; refer to the variables offered_count,
invite_count, state and the SQL query in the test to locate and change the
clause.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5bddb35d-0324-41d8-8b19-5624e5be0ef3

📥 Commits

Reviewing files that changed from the base of the PR and between fec4017 and 7e0c694.

📒 Files selected for processing (13)
  • prompts/phase3-classification.md
  • prompts/phase3-system.md
  • specs/005-cooperative-self-resolution/spec.md
  • src/mediation/follow_up.rs
  • src/mediation/policy.rs
  • src/models/reasoning.rs
  • src/reasoning/openai.rs
  • tests/common/mod.rs
  • tests/phase3_authority_boundary.rs
  • tests/phase3_escalation_triggers.rs
  • tests/phase3_followup_summary.rs
  • tests/phase3_self_resolution_role_guard.rs
  • tests/phase3_take_reasoning_coupling.rs

Comment on lines +20 to +60
Concrete cues that should classify here with `confidence ≥ 0.75`
and `suggested_action = summarize` (which the policy layer
routes to the Feature 005 cooperative-self-resolution branch):

- **Seller confirms receipt of fiat** without a qualifying
"but…" or fraud red flag. Examples: "I received the fiat",
"acabo de recibir el pago", "ya me llegó el dinero", "recebi
o pagamento". This is a satisfaction signal — the dispute can
resolve externally if the seller now releases the funds in
their Mostro client. Serbero never names or instructs that
action; the cooperative invitation simply tells both parties
they look close to coordinating between themselves and
monitors for a status change.
- **Both parties consistently describe a coordination /
timing issue** (delayed bank wire, timezone gap, payment-
method confusion) without contradicting each other on the
underlying facts.
- **Buyer confirms payment sent AND the seller corroborates
receipt** in the transcript. Once the seller says the fiat
arrived, the case becomes a legitimate cooperative-resolution
candidate; until then, the buyer's claim remains self-serving
and does NOT by itself justify the self-resolution invitation.

Counter-example that MUST NOT classify into the cooperative
self-resolution branch on its own:

- **Buyer says they sent the fiat, but the seller has not yet
confirmed receipt.** That is exactly what an honest buyer and a
dishonest buyer would both say. Do NOT treat this as a
satisfaction signal and do NOT jump to `suggested_action =
summarize` just to invite the parties to "coordinate the next
step". Instead, keep gathering evidence: ask the buyer for
proof of payment / transfer details and ask the seller whether
the fiat has arrived.

Do NOT keep gathering evidence once the cooperative cue is
unambiguous; "give me your bank statement" / "what timestamp"
follow-ups on a seller who already said "I got the fiat" are a
defect (observed 2026-04-28 production transcript, where the
model re-asked the same Spanish question instead of routing to
self-resolution).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Define seller_confirmed_fiat_receipt in the structured output contract.

These new cooperative-routing rules rely on that field, but this prompt never actually tells the model to emit it. A model can follow the narrative guidance here and still omit the flag; that deserializes as None, and the runtime guard in src/mediation/follow_up.rs Line 456 will suppress the self-resolution invitation even after a clear seller confirmation.

Suggested prompt addition
+## Seller Fiat-Receipt Confirmation (Feature 005)
+
+The classifier MUST also emit:
+
+- `seller_confirmed_fiat_receipt` (boolean | null): `true` only when the
+  seller explicitly confirms the fiat was received; `false` when the seller
+  explicitly says it has not arrived; `null` when the transcript does not
+  support a confident seller-side determination yet.

Also applies to: 171-225

🧰 Tools
🪛 LanguageTool

[style] ~26-~26: The word “ya” is non-standard. Did you mean “you”, “your” or “yes”?
Context: ...fiat", "acabo de recibir el pago", "ya me llegó el dinero", "recebi o paga...

(YA)


[style] ~47-~47: Consider an alternative for the overused word “exactly”.
Context: ...ot yet confirmed receipt.** That is exactly what an honest buyer and a dishones...

(EXACTLY_PRECISELY)

Comment thread prompts/phase3-system.md Outdated
Comment thread prompts/phase3-system.md
Comment thread prompts/phase3-system.md
…h deterministic

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@grunch grunch merged commit b38ced2 into main Apr 29, 2026
2 checks passed
@grunch grunch deleted the fix/clarification-repetition-and-language branch April 29, 2026 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant