Background
The multi-hop research pipeline rework (see docs/plans/2026-05-26-superclaude-methodology-rework.md and the design doc landing alongside it) introduces a contradiction_rate quality signal that triggers replanning when sources disagree above a threshold.
For v1, contradiction detection is folded into the classify-agent's prompt — it compares key_claims across summaries and flags semantic conflicts. That's cheap but soft: depends on LLM judgment, sensitive to prompt drift, hard to test offline.
Goal
Design a more robust mechanism to replace the v1 in-classify approach. Output is a design memo in docs/plans/, not implementation.
Candidates to weigh
- Dedicated contradiction-scanner Haiku agent. Runs after summarize. Does pairwise comparison of key_claims across all summaries with structured JSON output. Pros: isolated, testable, swappable. Cons: extra agent call per run.
- Embedding-based semantic conflict detection. Compute embeddings of each claim, find pairs with high topical similarity but opposing stance markers. Pros: cheap, deterministic, offline-friendly. Cons: requires an embedding model in the tier stack; stance detection is its own hard problem.
- Citation-graph analysis. Track which sources back which claims; flag when authoritative sources (T1/T2) cite contradictory positions. Pros: ties contradiction to source authority directly. Cons: requires structured claim-to-source provenance we don't track today.
- Temporal authority weighting. When sources from different eras contradict, weight newer sources more for civic/legislative topics. Pros: matches the domain. Cons: not a detection mechanism on its own — a resolution mechanism. Pair with one of the above.
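Whichever detection candidate wins, the pairwise comparison step has the same shape, so the memo could describe it once with the detector as a swappable seam. The following is a minimal sketch under stated assumptions, not a proposed implementation: `Claim`, `cross_source_pairs`, `scan`, and `toy_negation_detector` are hypothetical names, and the toy detector is only a stand-in for the Haiku agent or an embedding-plus-stance check.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterator, List, Tuple


@dataclass(frozen=True)
class Claim:
    # Hypothetical shape: one entry of key_claims plus the summary it came from.
    source_id: str
    text: str


# Assumed seam: any detector takes two claims and says whether they conflict.
Detector = Callable[[Claim, Claim], bool]


def cross_source_pairs(claims: List[Claim]) -> Iterator[Tuple[Claim, Claim]]:
    """Yield each unordered pair of claims that come from different sources."""
    for a, b in combinations(claims, 2):
        if a.source_id != b.source_id:
            yield a, b


def scan(claims: List[Claim], detector: Detector):
    """Return (flagged pairs, number of pairs compared)."""
    pairs = list(cross_source_pairs(claims))
    flagged = [(a, b) for a, b in pairs if detector(a, b)]
    return flagged, len(pairs)


def toy_negation_detector(a: Claim, b: Claim) -> bool:
    """Placeholder only: flags pairs that are identical except for the word 'not'."""
    ta, tb = a.text.lower().split(), b.text.lower().split()
    same_without_not = [w for w in ta if w != "not"] == [w for w in tb if w != "not"]
    return same_without_not and (("not" in ta) != ("not" in tb))
```

Because the detector is passed in, the same scaffold can be exercised offline with a deterministic stub and later pointed at a model-backed detector without changing the caller. Note that the number of compared pairs grows quadratically with the number of claims, which matters for the per-run cost estimate.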
Acceptance criteria
- Design memo with recommended approach, why, and rough implementation cost
- Trade-offs noted vs. the v1 in-classify approach
- Specific contract: how the signal is exposed to the quality-gate (numeric rate? per-claim flags? both?)
- Test plan: how to verify contradiction detection offline
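To make the contract and test-plan criteria concrete, here is one possible shape for the "both" option, plus a scorer for checking a detector against a hand-labelled fixture offline. Everything in it is an assumption for discussion: the field names, the definition of contradiction_rate as flagged pairs divided by compared pairs, and the strict greater-than threshold check are not specified anywhere in the current plan.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ContradictionFlag:
    # Hypothetical per-claim flag: which two claims conflict and where they came from.
    claim_a: str
    claim_b: str
    source_a: str
    source_b: str


@dataclass(frozen=True)
class ContradictionSignal:
    contradiction_rate: float  # assumed definition: flagged pairs / pairs compared
    pairs_compared: int
    flags: List[ContradictionFlag]


def build_signal(flags: List[ContradictionFlag], pairs_compared: int) -> ContradictionSignal:
    """Bundle the numeric rate and the per-claim flags into one payload."""
    rate = len(flags) / pairs_compared if pairs_compared else 0.0
    return ContradictionSignal(rate, pairs_compared, flags)


def should_replan(signal: ContradictionSignal, threshold: float) -> bool:
    """Assumed gate rule: replan only when the rate is strictly above the threshold."""
    return signal.contradiction_rate > threshold


def precision_recall(predicted: Set[Tuple[str, str]],
                     expected: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Score predicted conflicting pairs against a hand-labelled fixture."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall


def to_json(signal: ContradictionSignal) -> str:
    """Serialize the signal so the quality-gate can consume it as plain JSON."""
    return json.dumps(asdict(signal))
```

Exposing both fields lets the quality-gate act on the single number while the flags remain available for debugging and for the resolution step. The scorer needs no model calls, so a fixture of labelled claim pairs can run in CI.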
Out of scope
- Implementation
- Choice of embedding model (defer to the recommendation)
Related
- Design doc: docs/plans/2026-05-26-superclaude-methodology-rework.md (and forthcoming design doc with full rework spec)