Audit resolution: cherry-pick clean base, add falsifies_if docs, complete guardian conditions#114
Conversation
Devin AI said: "Here's the current state and what to do next.

PR Status Assessment

The Problem
Neither PR has everything: Codex branched directly from …

What Codex Did Well
Codex's falsifies_if insertions are correct — proper docstring placement, domain-specific conditions derived from function logic, and it even fixed pre-existing …

What's Still Missing on PR #114
…

Next Codex Coding Task
Once PR #114 is complete with all phases, close PR #113 — everything good from it will be in #114 via cherry-pick. Regarding the 17 older stale PRs (#3, #20, #26, #52-56, #64, #85, #90-93, #96, #99, #102): those are a separate triage task. Most are WIP drafts from February that are likely superseded by the 640+ commits since then. You may want to bulk-close the ones that are no longer relevant."

Commence coding tasks.
Devin AI: "Codex completed PR #114 with 17 commits across all phases. Here's the assessment:

PR #114 Final State

What Codex Did Right (vs Claude)

What's Still Missing
Based on the session log, one item from Phase 3 was not completed: …

Also: …

Remaining Coding Task for Codex
Post this on PR #114: …

After that, PR #114 is complete and ready to merge. Then close PR #113 — all its good work is already in #114 via cherry-pick."
```diff
  class_i_effectiveness = system.check_recall_effectiveness(class_i_recall)
  class_i_target = Fraction(95, 100)
- class_i_correct = class_i_effectiveness["target_rate"] == float(class_i_target)
+ class_i_correct = class_i_effectiveness["target_rate"] == class_i_target
```
🔴 Float-to-Fraction comparison always fails in recall effectiveness check
The PR changed lines 301 and 315 to compare class_i_effectiveness["target_rate"] directly against Fraction(95, 100) and Fraction(80, 100). However, the check_recall_effectiveness method in src/domains/d_food_safety/implementation.py:320 returns "target_rate": float(target_recovery_rate), i.e. a Python float.

In Python, float(Fraction(95, 100)) == Fraction(95, 100) evaluates to False because 0.95 is not exactly representable in IEEE 754 binary floating point. I verified this empirically: both float(Fraction(95, 100)) == Fraction(95, 100) and float(Fraction(80, 100)) == Fraction(80, 100) return False.

This means class_i_correct and class_iii_correct are always False, so check_recall_effectiveness_targets() will always report a failure — the invariant is now broken.
Previous code that worked (float == float):

Old line 301: class_i_correct = class_i_effectiveness["target_rate"] == float(class_i_target)
This compared float(0.95) == float(0.95) → True.

New line 301: class_i_correct = class_i_effectiveness["target_rate"] == class_i_target
This compares float(0.95) == Fraction(95, 100) → False.
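The failure mode above can be reproduced in a few standalone lines (the variable names here are illustrative, not taken from the repo):

```python
from fractions import Fraction

# Old code path: both sides cast to float, so the comparison holds
# (both sides round to the same IEEE 754 double).
old_style = float(Fraction(95, 100)) == float(Fraction(95, 100))

# New code path: a float compared against an exact Fraction. 0.95 has no
# exact binary representation, so the float side is not exactly 19/20.
new_style = float(Fraction(95, 100)) == Fraction(95, 100)

print(old_style)  # True
print(new_style)  # False
```

The same mismatch occurs for Fraction(80, 100), since 0.8 is also not exactly representable in binary floating point.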
Prompt for agents
The root cause is that check_recall_effectiveness in src/domains/d_food_safety/implementation.py:320 returns float(target_recovery_rate) as the target_rate value. The invariant code on lines 301 and 315 of src/domains/d_food_safety/invariants.py now compares this float value against a Fraction, which always fails.
Two approaches to fix:
1. Fix the implementation (implementation.py, lines 319 and 320): change target_rate and recovery_rate to return Fraction instead of float, aligning with the repo-wide float-to-Fraction migration. This is the preferred approach, since the PR's intent is to eliminate float usage.
2. Or wrap the dict lookups in Fraction() on the invariant side: class_i_correct = Fraction(class_i_effectiveness['target_rate']).limit_denominator(1000) == class_i_target, though this is less clean.
The same issue applies to both line 301 (class_i_correct) and line 315 (class_iii_correct).
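Approach 1 might look like the following sketch. The simplified function body and return shape are assumptions based on the review's description of check_recall_effectiveness; only the target_rate key and the Fraction comparison come from the source.

```python
from fractions import Fraction

def check_recall_effectiveness(target_recovery_rate: Fraction) -> dict:
    # Hypothetical, simplified body: return the exact Fraction instead of
    # casting to float, so Fraction comparisons on the invariant side hold.
    return {"target_rate": target_recovery_rate}

# The invariant-side comparison now succeeds, because both sides are
# exact rationals rather than a float compared against a Fraction.
result = check_recall_effectiveness(Fraction(95, 100))
class_i_correct = result["target_rate"] == Fraction(95, 100)
print(class_i_correct)  # True
```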
Devin AI: "## PR #114 Verification Report

18 commits, 142 files, +1,621/-859 lines. All work complete.

Phase-by-Phase Verification

Quality Check: Falsifies_if Examples from Batch 8
Codex read each function body and wrote specific conditions: …

These are domain-specific and derived from actual function logic — not generic placeholders.

What You Need To Do
…

One Minor Note
The consent log entry is missing …"
PR #113 mixed good fraction commits with broken falsifies_if docstrings and missing guardian conditions. This branch rebuilds on main by cherry-picking the nine good commits, redoing falsifies_if documentation correctly, and completing guardian monitoring.
Cherry-picks (clean base)
Falsifies_if documentation remediation (batches 1–3)
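For illustration, a falsifies_if docstring entry of the kind this remediation adds might read as follows. The function name, docstring convention, and condition shown here are hypothetical, not taken from the repo; the 95% Class I target and the float-vs-Fraction concern come from the review discussion above.

```python
from fractions import Fraction

def check_recall_effectiveness_targets(effectiveness: dict) -> bool:
    """Verify that the Class I recall target is the exact 95% rate.

    falsifies_if: effectiveness["target_rate"] is stored as a float or any
    value not exactly equal to Fraction(95, 100); 0.95 is not exactly
    representable in binary floating point, so a float target breaks this.
    """
    return effectiveness["target_rate"] == Fraction(95, 100)

print(check_recall_effectiveness_targets({"target_rate": Fraction(95, 100)}))  # True
```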
Guardian monitor completion
Example (guardian monitor):
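Since the example body did not survive in this excerpt, here is a minimal hypothetical sketch of what a guardian-monitor condition check could look like. The class name, method names, and the specific conditions are all assumptions, not taken from the PR.

```python
from fractions import Fraction

class GuardianMonitor:
    """Hypothetical monitor that records invariant violations by name."""

    def __init__(self) -> None:
        self.violations: list[str] = []

    def check(self, name: str, condition: bool) -> bool:
        # Record the invariant's name whenever its condition fails
        if not condition:
            self.violations.append(name)
        return condition

monitor = GuardianMonitor()
# Both conditions hold, so no violations are recorded
monitor.check("class_i_target_is_exact", Fraction(95, 100) == Fraction(19, 20))
monitor.check("target_rate_is_not_float", not isinstance(Fraction(95, 100), float))
print(monitor.violations)  # []
```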