US-011: Validate Deterministic Correction Refinement Input And Output #49

Description

@Mateus-Mannes

Summary

Validate deterministic correction refinement input and output. The correction flow should protect the highlight/range contract before and after deterministic refinement, skip refinement for unstable inputs, reject abusive request sizes before comparison work starts, remove stale AI-specific public API fields, and keep the orchestration evaluator as the regression suite.

Scope

  • Add request-size protection before comparison work starts.
  • Add deterministic refinement validation around CorrectionOrchestrationService.
  • Add configurable thresholds under TextComparison:RefinementValidation.
  • Enforce structural range safety before and after deterministic refinement.
  • Remove AI-specific public contract fields.
  • Remove or neutralize unused AI-only internals.
  • Keep deterministic trace data public for debugging and review.

Request Validation

  • The frontend blocks submission when userText.length > 12000 and shows a clear validation message.
  • compare-texts rejects null, empty, whitespace-only, and oversized UserText with 400 BadRequest.
  • Rejected requests do not run static comparison or deterministic refinement.
  • Rejection logs include only proposition id, user text length, and reason.
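The request rules above amount to a single guard that runs before any comparison work starts. The following is a minimal TypeScript sketch under stated assumptions: `validateUserText`, `rejectionLogEntry`, the rejection reason strings, and a string-typed proposition id are hypothetical names, not the project's actual API.

```typescript
// Hypothetical sketch of the compare-texts request guard. Names and reason
// strings are illustrative assumptions, not the project's API.
const MAX_USER_TEXT_LENGTH = 12000;

type RequestRejectionReason = "empty_user_text" | "user_text_too_long";

type RequestCheck =
  | { ok: true }
  | { ok: false; status: 400; reason: RequestRejectionReason };

// Runs before static comparison or deterministic refinement.
function validateUserText(
  userText: string | null | undefined,
  maxLength: number = MAX_USER_TEXT_LENGTH,
): RequestCheck {
  // null, undefined, empty, and whitespace-only input are all rejected.
  if (userText == null || userText.trim().length === 0) {
    return { ok: false, status: 400, reason: "empty_user_text" };
  }
  // String length counts UTF-16 code units, same as the frontend check.
  if (userText.length > maxLength) {
    return { ok: false, status: 400, reason: "user_text_too_long" };
  }
  return { ok: true };
}

// Log payload limited to the three fields the issue allows; it has no
// field that could carry the raw text.
interface RejectionLogEntry {
  propositionId: string; // assumed to be a string id
  userTextLength: number;
  reason: RequestRejectionReason;
}

function rejectionLogEntry(
  propositionId: string,
  userText: string | null | undefined,
  reason: RequestRejectionReason,
): RejectionLogEntry {
  return { propositionId, userTextLength: userText?.length ?? 0, reason };
}
```

JavaScript's `String.length` and .NET's `string.Length` both count UTF-16 code units, so if the endpoint measures length that way, the frontend and endpoint checks agree on exactly where the 12000-character limit falls, even for text containing emoji.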

Refinement Validation

  • Validate static comparison output before calling DeterministicTextComparisonRefiner.
  • Validate final deterministic comparisons before returning TextComparisonResult.
  • Use typed reason codes such as:
    • valid
    • skip_empty_comparisons
    • skip_too_many_comparisons
    • skip_unstable_diff
    • invalid_static_ranges
    • invalid_final_ranges
  • Never log raw originalText, userText, or comparison snippets.
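Typing the reason codes as a closed union makes the compiler reject any outcome outside the list above, and a log type with only numeric fields makes it structurally impossible to log raw text. This TypeScript sketch is illustrative only: `RefinementValidationLog`, `buildValidationLog`, `allowsRefinement`, and the chosen log fields are hypothetical.

```typescript
// Hypothetical sketch: the reason codes listed above as a closed union.
type RefinementValidationReason =
  | "valid"
  | "skip_empty_comparisons"
  | "skip_too_many_comparisons"
  | "skip_unstable_diff"
  | "invalid_static_ranges"
  | "invalid_final_ranges";

type ValidationStage = "static" | "final";

// Only counts and lengths are recorded. There is no field that can hold
// originalText, userText, or a comparison snippet.
interface RefinementValidationLog {
  stage: ValidationStage;
  reason: RefinementValidationReason;
  comparisonCount: number;
  originalTextLength: number;
  userTextLength: number;
}

function buildValidationLog(
  stage: ValidationStage,
  reason: RefinementValidationReason,
  comparisons: readonly unknown[],
  originalText: string,
  userText: string,
): RefinementValidationLog {
  return {
    stage,
    reason,
    comparisonCount: comparisons.length,
    originalTextLength: originalText.length,
    userTextLength: userText.length,
  };
}

// True only when the deterministic refiner may run; every other code
// leads to a static result.
function allowsRefinement(reason: RefinementValidationReason): boolean {
  return reason === "valid";
}
```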

Configurable Thresholds

Add under TextComparison:RefinementValidation:

  • MaxUserTextLength = 12000
  • MaxUserToOriginalLengthRatio = 3.0
  • MaxComparisonCount = 200
  • MaxTotalComparisonCharacters = 6000
  • MaxOriginalCoverageRatio = 0.75
  • MinStaticAccuracyPercentage = 0.15

If any stability threshold is violated (a maximum is exceeded, or the static accuracy falls below MinStaticAccuracyPercentage), skip deterministic refinement and return the validated static output with correctionMode = static.
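One way to bind these thresholds and turn them into a skip decision is sketched below in TypeScript. The option names mirror the configuration keys above, but the metric definitions are assumptions the issue does not pin down: total comparison characters is taken as original plus user characters across all ranges, coverage as covered original characters over original length, and static accuracy as a 0..1 fraction. Mapping every ratio and size violation to `skip_unstable_diff` is likewise an assumption; `checkStability` is a hypothetical name.

```typescript
// Hypothetical sketch of TextComparison:RefinementValidation and a stability gate.
interface RefinementValidationOptions {
  maxUserTextLength: number;
  maxUserToOriginalLengthRatio: number;
  maxComparisonCount: number;
  maxTotalComparisonCharacters: number;
  maxOriginalCoverageRatio: number;
  minStaticAccuracyPercentage: number;
}

const defaultOptions: RefinementValidationOptions = {
  maxUserTextLength: 12000,
  maxUserToOriginalLengthRatio: 3.0,
  maxComparisonCount: 200,
  maxTotalComparisonCharacters: 6000,
  maxOriginalCoverageRatio: 0.75,
  minStaticAccuracyPercentage: 0.15,
};

// Minimal view of a static comparison; half-open [start, end) ranges assumed.
interface StaticComparisonStats {
  originalStart: number;
  originalEnd: number;
  userStart: number;
  userEnd: number;
}

type StabilityReason =
  | "valid"
  | "skip_empty_comparisons"
  | "skip_too_many_comparisons"
  | "skip_unstable_diff";

// Assumes static ranges were already validated, so they do not overlap.
function checkStability(
  originalLength: number,
  userLength: number,
  comparisons: readonly StaticComparisonStats[],
  staticAccuracy: number, // assumed 0..1 fraction
  o: RefinementValidationOptions = defaultOptions,
): StabilityReason {
  if (comparisons.length === 0) return "skip_empty_comparisons";
  if (comparisons.length > o.maxComparisonCount) return "skip_too_many_comparisons";

  let totalChars = 0;
  let coveredOriginal = 0;
  for (const cmp of comparisons) {
    const originalSpan = cmp.originalEnd - cmp.originalStart;
    totalChars += originalSpan + (cmp.userEnd - cmp.userStart);
    coveredOriginal += originalSpan;
  }
  const lengthRatio = originalLength === 0 ? Infinity : userLength / originalLength;
  const coverage = originalLength === 0 ? 1 : coveredOriginal / originalLength;

  if (
    userLength > o.maxUserTextLength ||
    lengthRatio > o.maxUserToOriginalLengthRatio ||
    totalChars > o.maxTotalComparisonCharacters ||
    coverage > o.maxOriginalCoverageRatio ||
    staticAccuracy < o.minStaticAccuracyPercentage
  ) {
    return "skip_unstable_diff";
  }
  return "valid";
}
```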

Structural Range Rules

  • Indexes are in bounds.
  • finalIndex >= initialIndex.
  • Snippets match authoritative full-text slices.
  • Comparisons are sorted by original range and monotonic by user range.
  • Original/user ranges do not overlap.
  • Returned ranges contain non-empty text on both sides.
  • Invalid static output returns one safe full-range static comparison instead of unsafe ranges.
  • Invalid deterministic output logs the failure reason and returns validated static output.
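The structural rules can be checked in a single pass over the comparisons. This TypeScript sketch assumes each comparison stores half-open `[initialIndex, finalIndex)` ranges into the full texts plus the snippets it displays; if the real model uses an inclusive `finalIndex`, the bounds and slice arithmetic shift by one. `findRangeViolation`, `fullRangeComparison`, and the rule names are hypothetical. Rule names carry no text, so they are safe to log alongside `invalid_static_ranges` or `invalid_final_ranges`.

```typescript
// Hypothetical sketch of the structural range rules (half-open ranges assumed).
interface IndexRange {
  initialIndex: number;
  finalIndex: number;
}

interface RangeComparison {
  original: IndexRange;
  user: IndexRange;
  originalSnippet: string;
  userSnippet: string;
}

type RangeRule =
  | "out_of_bounds"
  | "reversed_range"
  | "empty_range"
  | "snippet_mismatch"
  | "unordered_or_overlapping";

function checkRange(r: IndexRange, text: string, snippet: string): RangeRule | null {
  const { initialIndex: start, finalIndex: end } = r;
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end > text.length) {
    return "out_of_bounds";
  }
  if (end < start) return "reversed_range";
  if (end === start) return "empty_range"; // both sides must contain text
  if (text.slice(start, end) !== snippet) return "snippet_mismatch";
  return null;
}

// Returns null when every rule holds, otherwise the first violated rule.
function findRangeViolation(
  comparisons: readonly RangeComparison[],
  originalText: string,
  userText: string,
): RangeRule | null {
  for (let i = 0; i < comparisons.length; i++) {
    const cmp = comparisons[i];
    const violation =
      checkRange(cmp.original, originalText, cmp.originalSnippet) ??
      checkRange(cmp.user, userText, cmp.userSnippet);
    if (violation) return violation;
    if (i > 0) {
      const prev = comparisons[i - 1];
      // Sorted by original range, monotonic by user range, no overlap:
      // each range must start at or after the previous one ends.
      if (
        cmp.original.initialIndex < prev.original.finalIndex ||
        cmp.user.initialIndex < prev.user.finalIndex
      ) {
        return "unordered_or_overlapping";
      }
    }
  }
  return null;
}

// Safe fallback for invalid static output: one comparison spanning both texts.
// Assumes both texts are non-empty (userText is guaranteed by request validation).
function fullRangeComparison(originalText: string, userText: string): RangeComparison {
  return {
    original: { initialIndex: 0, finalIndex: originalText.length },
    user: { initialIndex: 0, finalIndex: userText.length },
    originalSnippet: originalText,
    userSnippet: userText,
  };
}
```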

Public Contract Cleanup

Remove:

  • TextComparison.IsAiRefined
  • TextComparisonResult.AiAttempted
  • CorrectionTraceEntry.Ai
  • CorrectionModes.AiRefined
  • CorrectionModes.Fallback

Keep:

  • CorrectionTrace
  • CorrectionStageTrace
  • SourceComparisonIndex
  • IsDeterministicallyRefined

Update OpenAPI, generated webapp models, public API contract tests, and progress tracking/restoration code.
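A public API contract test could scan serialized responses for the removed fields at any depth. In this TypeScript sketch, the camelCase JSON property names and the serialized strings for the removed correction modes (`aiRefined`, `fallback`) are assumptions about the wire format, and `findRemovedAiFields` is a hypothetical helper.

```typescript
// Hypothetical contract-test helper. JSON names and mode strings are assumed.
const REMOVED_FIELDS = new Set(["isAiRefined", "aiAttempted", "ai"]);
const REMOVED_MODES = new Set(["aiRefined", "fallback"]);

// Walks any JSON value and returns the path of every removed field or mode.
function findRemovedAiFields(value: unknown, path = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => hits.push(...findRemovedAiFields(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const childPath = `${path}.${key}`;
      if (REMOVED_FIELDS.has(key)) hits.push(childPath);
      if (key === "correctionMode" && typeof child === "string" && REMOVED_MODES.has(child)) {
        hits.push(`${childPath}=${child}`);
      }
      hits.push(...findRemovedAiFields(child, childPath));
    }
  }
  return hits;
}
```

Scanning recursively catches a removed field that reappears in a nested model such as a trace entry, which a check on top-level response properties alone would miss.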

Acceptance Criteria

  • Free users receive validated static comparisons only.
  • Pro users receive deterministic normalized output only when deterministic refinement produces valid output that differs from the static result.
  • Unstable diffs return static output and do not run deterministic refinement.
  • Invalid deterministic output falls back to validated static output.
  • Endpoint and frontend enforce the shared 12000-character text limit.
  • Public API and webapp models no longer expose AI fields.
  • Orchestration evaluator fixtures validate deterministic final comparisons and deterministic traces.
  • Current orchestration evaluator cases pass.
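The acceptance criteria can be read as one decision function. This TypeScript sketch injects the stability check, range validator, refiner, and equality check so the flow can be exercised without the real services; it is not CorrectionOrchestrationService itself. All names, the `deterministic` mode string, and the choice to skip refinement when static ranges are invalid are assumptions.

```typescript
// Hypothetical sketch of the orchestration decisions in the acceptance criteria.
type Plan = "free" | "pro";
type CorrectionMode = "static" | "deterministic"; // "deterministic" is an assumed name

interface Deps<C> {
  isStable: (staticComparisons: readonly C[]) => boolean;
  rangesValid: (comparisons: readonly C[]) => boolean;
  fullRange: () => C;
  refine: (staticComparisons: readonly C[]) => C[];
  sameResult: (a: readonly C[], b: readonly C[]) => boolean;
}

interface Outcome<C> {
  correctionMode: CorrectionMode;
  comparisons: readonly C[];
  refinementRan: boolean;
}

function orchestrate<C>(plan: Plan, staticComparisons: readonly C[], d: Deps<C>): Outcome<C> {
  // Static output is validated first; unsafe ranges collapse to one full-range comparison.
  const staticValid = d.rangesValid(staticComparisons);
  const validatedStatic: readonly C[] = staticValid ? staticComparisons : [d.fullRange()];
  const staticOutcome = (refinementRan: boolean): Outcome<C> => ({
    correctionMode: "static",
    comparisons: validatedStatic,
    refinementRan,
  });

  // Free users, invalid static ranges, and unstable diffs never reach the refiner.
  if (plan === "free" || !staticValid || !d.isStable(validatedStatic)) {
    return staticOutcome(false);
  }

  const refined = d.refine(validatedStatic);
  // Invalid or unchanged refinement falls back to the validated static output.
  if (!d.rangesValid(refined) || d.sameResult(refined, validatedStatic)) {
    return staticOutcome(true);
  }
  return { correctionMode: "deterministic", comparisons: refined, refinementRan: true };
}
```

Injecting the collaborators keeps each acceptance criterion testable in isolation, which fits the issue's plan to keep the orchestration evaluator as the regression suite.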

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests