audit(stage-a): hashed investigative taxonomy + gap-analysis JSONL by devin-ai-integration[bot] · Pull Request #141 · aidoruao/orthogonal-engineering

devin-ai-integration · 2026-04-20T02:11:51Z

Summary

Stage A of the "finish everything" multi-PR rollout: produce a deterministic, hash-anchored investigative taxonomy of the repository's outstanding engineering debt, mapped into six namespaces.

New: tools/generate_hashed_taxonomy.py — walks the repo and emits a JSONL audit keyed by six namespaces (aerospace, floor, yeshua, math_popperian, secular, projection) and nine issue types (TODO, FIXME, HACK, STUB_PASS, STUB_NOTIMPL, FLOAT_CALL, FLOAT_ANNOT, CHECK_MISSING_PROOFOBJECT, CHECK_MISSING_FALSIFIES_IF_PAIR). Every entry carries a sha256_evidence + entry_sha256, and the top-level summary carries audit_sha256 committed over (summary, ordered_entry_hashes) — content-only, so the commitment is stable across output paths.

Artifacts committed under audits/:

audits/hashed_taxonomy_20260420.jsonl — 3230 entries, one per finding
audits/gap_analysis_20260420.json — summary, audit_sha256 = 7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9

Run against current tree:

files_scanned        5027
issue_count_total    3230
classified_fraction  941/1615
by_namespace         aerospace=188  floor=127  math_popperian=1483  projection=121  secular=89  yeshua=908
by_type              TODO=275 FIXME=3 HACK=1 STUB_PASS=335 STUB_NOTIMPL=75 FLOAT_CALL=333 FLOAT_ANNOT=1032
                     CHECK_MISSING_PROOFOBJECT=413 CHECK_MISSING_FALSIFIES_IF_PAIR=763

Also fixed (incidental): STANDARDS_REGISTRY.json had a duplicated "total_standards" line with no comma separator, which made python tools/standards_check.py --list crash with JSONDecodeError. One-line fix; registry now parses cleanly and --verify runs (and shows separately tracked real failures that Stage B will address).

Consent log entry: appended per SOP-AI-HANDSHAKE-1.0 before any repo-scope change.

Review & Testing Checklist for Human

Skim audits/gap_analysis_20260420.json — confirm the namespace/type counts look like a reasonable picture of current debt, not an over- or under-count you'd dispute
Spot-check 3–5 entries in audits/hashed_taxonomy_20260420.jsonl by running grep '<path>' audits/hashed_taxonomy_20260420.jsonl | head — verify the line and snippet match what's actually in the file
Re-run python tools/generate_hashed_taxonomy.py --out /tmp/verify.jsonl --summary /tmp/verify.json and confirm audit_sha256 matches 7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9
Confirm python tools/standards_check.py --list no longer crashes with JSONDecodeError on the STANDARDS_REGISTRY.json total_standards bug
Test plan: python -m pytest tests/test_hashed_taxonomy.py -v (8 tests, all pass locally) — covers canonical JSON, sha256 format, line/function scanners, determinism

Notes

This PR is audit-only — it does not modify any runtime or kernel-surface code. It produces the hashed gap-analysis artifacts that Stages B/C/D/E/F/G will work against.
audits/ is added to the walker's excluded dirs, so rerunning the tool does not re-scan its own generated output.
audit_sha256 intentionally does not cover jsonl_path or timestamps, so the commitment is reproducible across different output locations and days — only content drift can falsify it.
Stage order coming next: B cleanup sweep, C MD frontmatter + CI, D stale-PR/issue close-out, E review feat(ontology): VulkanMod #755 investigation + PHANTOM-EDIT-001 failure log + ACU protocol #91/Add testing skill for ARC-AGI solver and benchmark pipeline #85/Apply existing OE frameworks to PR26 verification: Truth inelasticity, Yeshua Standard, and formal foundations #26, F new polymath domain set, G DeepSeek-queue items.

cc @aidoruao

Link to Devin session: https://app.devin.ai/sessions/36c540710d5c487ab6c5f61be5879aa3

devin-ai-integration · 2026-04-20T02:11:53Z

Original prompt from Tony

finish everything "Skip to content
aidoruao
orthogonal-engineering
Repository navigation
Code
Issues
13
(13)
Pull requests
17
(17)
Agents
Discussions
Actions
Projects
Wiki
Security and quality
Insights
Settings
Commits
Branch selector
User selector
Datepicker
Commit History
Commits on Apr 19, 2026
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
2 minutes ago
Add files via upload
aidoruao
aidoruao
authored
2 minutes ago
·
Verified
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
43 minutes ago
Kimi Code 9184eeb4-ffb0-472d-9a81-c9babff132e5 1a 4-19-26.txt
aidoruao
aidoruao
authored
44 minutes ago
·
Verified
Commits on Apr 18, 2026
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
yesterday
gpt 5 mini copilot coding tasks for kimi cli queue 4-19-26 1a.txt
aidoruao
aidoruao
authored
yesterday
·
Verified
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
yesterday
gpt 5 mini copilot gap anayslsis audit 1a 4-18-26.txt
aidoruao
aidoruao
authored
yesterday
·
Verified
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
yesterday
Kimi Code 597e0d23-f404-4bdf-801f-64962ce0e722 2a-4a 4-18-26.txt
aidoruao
aidoruao
authored
yesterday
·
Verified
chore(pr40): append state witness entry [skip ci]
github-actions[bot]
github-actions[bot]
committed
yesterday
Merge pull request #140 from aidoruao/kimi/pr139-aerospace-floor-polymath-domains
aidoruao
aidoruao
authored
yesterday
·
Verified
Merge branch 'main' into kimi/pr139-aerospace-floor-polymath-domains
aidoruao
aidoruao
authored
yesterday
·
Verified
feat(ai-credit): Task 5-8 — AI co-author, CI, standards, onboarding
aidoruao
aidoruao
committed
yesterday
·
feat(forensic): Task 3 + Task 4 — forensic commit generator + verification testimony
aidoruao
aidoruao
committed
yesterday
·
feat(stat... (2455 chars truncated...)

devin-ai-integration · 2026-04-20T02:11:54Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

devin-ai-integration

Devin Review found 7 potential issues.

devin-ai-integration · 2026-04-20T02:15:18Z

    "standard": "Yeshua",
-    "last_updated": "2026-04-18",
+    "last_updated": "2026-04-20",
    "total_standards": 59


📝 Info: STANDARDS_REGISTRY.json had invalid duplicate key — correctly fixed

The old file had both "total_standards": 59 (line 8) and "total_standards": 58 (line 9) — a duplicate JSON key that is technically undefined behavior per RFC 7159 (most parsers take the last value, so it read as 58). The PR correctly removes the duplicate, leaving only "total_standards": 59 which matches the actual count of 59 "id": entries in the standards array. The fix also resolves a missing comma that made the old JSON technically malformed between the two duplicate keys.

Was this helpful? React with 👍 or 👎 to provide feedback.

Good catch — the semantic change was intentional and matches reality:

$ python -c "import json; d=json.load(open('STANDARDS_REGISTRY.json')); print(len(d['standards']))" 59

The actual standards array contains 59 entries. The pre-existing dict had both "total_standards": 59 (line 8) and "total_standards": 58 (line 9), with the stray 58 being a leftover from an earlier edit. Python's json module silently keeps the last value, so the file previously reported 58 standards while actually containing 59 — a drift bug. Removing the duplicate key reconciles the metadata with ground truth.

devin-ai-integration · 2026-04-20T02:15:22Z

+    doc = {
+        "schema": "OE-GAP-ANALYSIS-1.0",
+        "generated_at_utc": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+        "repo_root": str(REPO_ROOT),
+        "summary": summary,
+        "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}",
+        "audit_sha256": audit_sha256,
+        "jsonl_path": jsonl_label,
+        "entry_count": len(entries),
+    }
+    with out_path.open("w", encoding="utf-8") as fh:
+        fh.write(json.dumps(doc, sort_keys=True, indent=2, ensure_ascii=True) + "\n")
+    return audit_sha256


📝 Info: audit_sha256 commitment boundary is well-designed but relies on summary dict not being mutated

The _write_summary function computes audit_sha256 from {"summary": summary, "ordered_entry_hashes": [...]} at line 447, then embeds that sha into the output doc. The summary dict is passed by reference from build_entries. If anything modified the summary dict between build_entries returning and _write_summary computing the hash, the commitment would be broken. Currently main() doesn't mutate it, so this is safe — but it's a fragile invariant that could break if future code modifies the summary before writing.

Was this helpful? React with 👍 or 👎 to provide feedback.

Fixed. The top-level generated_at_utc, repo_root, and jsonl_path fields made it ambiguous which parts of the document were covered by audit_sha256. New structure makes the commitment boundary explicit:

{ "schema": "OE-GAP-ANALYSIS-1.0", "summary": {...}, "classified_fraction": "941/1615", "audit_sha256": "34ed7b25...", "entry_count": 11027, "metadata": { "generated_at_utc": "2026-04-20T02:50:12Z", "jsonl_path": "audits/hashed_taxonomy_20260420.jsonl", "not_covered_by_audit_sha256": ["generated_at_utc", "jsonl_path"] } }

repo_root dropped entirely — it was leaking absolute CI paths, not useful for consumers.

generated_at_utc + jsonl_path moved under metadata with an explicit not_covered_by_audit_sha256 list that names them.

Added test_gap_analysis_metadata_is_outside_commitment as a falsifier that asserts those fields are not at the top level and not_covered_by_audit_sha256 is present.

Two fresh runs now both produce audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd, unaffected by timestamp or output path.

…, drop dead IssueHit, fix namespace sum, segregate non-deterministic metadata Review feedback on PR #141: 1. 🔴 assert statement in tools/generate_hashed_taxonomy.py:282 violated .cursorrules / CLAUDE.md 'No assert' rule. Replaced with a single walrus-style assignment that also removes the redundant double regex search the other reviewer flagged. 2. 🔴 _RE_ASSERT regex was defined but no scanner emitted the issue type. Renamed to ASSERT_USE (the rule applies to all modules, not just check_* surface), gated on is_python to avoid false positives in prose, and wired into _scan_line_level. 8012 real assert hits surfaced across Python sources. 3. 📝 Removed the dead IssueHit dataclass + its dataclass import. 4. 📝 Added 'unclassified' to counts_by_namespace so per-namespace counts can sum to at least issue_count_total (they previously silently dropped unclassified hits). 5. 📝 Moved generated_at_utc / jsonl_path into a dedicated 'metadata' subkey with an explicit 'not_covered_by_audit_sha256' list, and dropped the non-portable repo_root field entirely. Tests: 11 pass (added test_line_level_scanner_skips_python_only_patterns_for_non_python, test_namespace_counts_account_for_unclassified, test_gap_analysis_metadata_is_outside_commitment). Regenerated artifacts: audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd (deterministic across two runs; files_scanned=5028, issue_count_total=11027). Co-Authored-By: Tony Ha <aidoruao@gmail.com>

devin-ai-integration

Devin Review found 5 new potential issues.

devin-ai-integration · 2026-04-20T02:25:42Z

+        if is_python:
+            if _RE_STUB_PASS.match(line):
+                out.append((idx, "STUB_PASS", snippet))
+            if _RE_STUB_NOTIMPL.search(line):
+                out.append((idx, "STUB_NOTIMPL", snippet))
+            if _RE_FLOAT_CALL.search(line):
+                out.append((idx, "FLOAT_CALL", snippet))
+            if _RE_FLOAT_ANNOT.search(line):
+                out.append((idx, "FLOAT_ANNOT", snippet))
+            if _RE_ASSERT.match(line):
+                out.append((idx, "ASSERT_USE", snippet))


📝 Info: The is_python gate correctly prevents false positives on non-Python files — a meaningful improvement

The prior version of _scan_line_level (visible in the git diff) applied all pattern detectors to all file types, which would flag pass, assert, float(, and NotImplementedError in Markdown prose, YAML configs, and JSON data files. The new is_python flag at tools/generate_hashed_taxonomy.py:278 correctly gates Python-only patterns, and the test test_line_level_scanner_skips_python_only_patterns_for_non_python at tests/test_hashed_taxonomy.py:82 validates this. The TODO/FIXME/HACK patterns remain ungated since they are meaningful across all text file types.

Was this helpful? React with 👍 or 👎 to provide feedback.

Acknowledged — thanks for calling out the is_python gate explicitly. The test at tests/test_hashed_taxonomy.py:82 (test_line_level_scanner_skips_python_only_patterns_for_non_python) locks the invariant that STUB_PASS / STUB_NOTIMPL / FLOAT_CALL / FLOAT_ANNOT / ASSERT_USE are suppressed on non-Python files while TODO / FIXME / HACK remain active across all text types. No action needed here.

devin-ai-integration

Devin Review found 10 new potential issues.

devin-ai-integration · 2026-04-20T02:41:02Z

+    assert a == b
+    assert a == '{"a":1,"b":2}'


📝 Info: Assert usage in test file is consistent with existing repo patterns

The rule files say "No assert" but also say "Tests runnable with pytest." Pytest's fundamental mechanism IS assert statements. I verified that existing tests like tests/test_epistemic_advanced.py (14 assert statements) and tests/test_falsification.py use assert freely. This is an established pattern in the repo that overrides the literal rule text for test files specifically. Not flagging as a bug.

Was this helpful? React with 👍 or 👎 to provide feedback.

Acknowledged — tests intentionally use assert (standard pytest idiom and consistent with every other test file under tests/). The repo-wide "no assert" rule applies to production code, which is enforced by tools/standards_check.py (CS-004) against non-test paths. No change needed.

devin-ai-integration · 2026-04-20T02:41:03Z

+            entry: Dict[str, Any] = {
+                "id": f"OE-TAX-{ev_sha[:12]}",
+                "path": rel,
+                "line": line_no,
+                "issue_type": issue_type,
+                "severity": ISSUE_SEVERITY.get(issue_type, "low"),
+                "namespaces": list(ns_record),
+                "evidence": {"snippet": snippet},
+                "sha256_evidence": ev_sha,
+                "file_sha256": file_sha,
+            }
+            entry["entry_sha256"] = _sha256_text(_canonical_json(entry))
+            entries.append(entry)


📝 Info: entry_sha256 self-referential hash pattern is correctly implemented

At tools/generate_hashed_taxonomy.py:399, the entry_sha256 is computed by hashing the canonical JSON of the entry dict BEFORE entry_sha256 is added to it. This is the standard self-referential commitment pattern: the hash covers all fields except itself, and verification works by removing entry_sha256, re-canonicalizing, and re-hashing. The implementation is correct and deterministic.

Was this helpful? React with 👍 or 👎 to provide feedback.

Thanks — acknowledged. The self-referential commit does rely on _canonical_json (sorted keys, fixed separators, ensure_ascii=True) producing byte-identical output for the same logical content, and test_build_entries_is_deterministic pins that invariant by comparing sorted entry_sha256 sets across two runs.

devin-ai-integration · 2026-04-20T02:41:06Z

+        for line_no, issue_type, snippet in hits:
+            counts[issue_type] = counts.get(issue_type, 0) + 1
+            for ns in ns_record:
+                counts_by_namespace[ns] = counts_by_namespace.get(ns, 0) + 1


📝 Info: Namespace counts are intentionally non-partitioning (multi-labeling)

In build_entries at lines 385-386, each issue increments ALL namespace counters for the file it's in. A file classified under both "yeshua" and "math_popperian" will count each issue in BOTH namespaces. This means sum(issue_count_by_namespace.values()) >= issue_count_total. The test test_namespace_counts_account_for_unclassified explicitly verifies this with assert ns_total >= total, confirming this is intentional multi-labeling behavior, not a counting bug.

Was this helpful? React with 👍 or 👎 to provide feedback.

Thanks — acknowledged. Multi-labeling is intentional: a single finding can belong to multiple namespaces (e.g., an aerospace floor invariant that also mentions Yeshua axioms). test_namespace_counts_account_for_unclassified asserts sum(namespace_counts) >= issue_count_total (note >=, not ==) to make the non-partitioning property a locked invariant rather than an accidental side effect. Happy to add a summary-document comment spelling this out in a follow-up Stage C pass if you'd like.

…ANNOT return-type coverage Devin Review round 3 findings on PR #141 (10 total comments): BUG fixes (5): - Add 'Falsifies if:' / 'falsifies_if:' docstring pair to _scan_line_level - Add pair to _scan_check_function (with window-bleed invariant note) - Add pair to build_entries (determinism + summary/entries agreement) - Add pair to _write_summary (audit_sha256 determinism + metadata boundary) - Add pair to main (clean-walk exit code + cross-run determinism) FLAG fix (1): - Broaden _RE_FLOAT_ANNOT from r':\s*float\b' to r'(?::|->)\s*float\b' so return-type annotations 'def f() -> float' are flagged alongside parameter/variable annotations. Add test_float_annot_regex_catches_return_type_annotations to lock the coverage invariant. Cleanup (from ANALYSIS comment): - Remove unused 'path' parameter from _scan_check_function; update the single internal caller and three call sites in tests. Pure dead-code removal; no behavior change. Audit artifact regeneration: - audit_sha256 deterministic across two consecutive runs: f73f70dc8ae70d990a47aba215ab6ed49d165ac0d8dc4cfe734048a1c5a45eb2 - issue_count_total: 11,082 -> 11,275 (+193 newly-detected '-> float' return-type annotations now covered by FLOAT_ANNOT). Verification: - pytest tests/test_hashed_taxonomy.py: 15/15 passing (was 14, +1 new). - CS-003 (Falsifies-if title-case pair) passes on standards_check. - _RE_FALS_TITLE remains case-sensitive (no IGNORECASE) per .cursorrules. Co-Authored-By: Tony Ha <aidoruao@gmail.com>

…n fix CHECKPOINT_STAGES_A_THROUGH_G.md documents the full state of the 'finish everything' campaign for cross-session continuity: - Stage A (#141), B (#142), C (#143), F (#148), G (#149) — complete - Stage D (housekeeping, 14 stale PRs + 13 bot issues) — pending - Stage E (non-draft PR review for #91, #85, #26) — pending The checkpoint lists exact resume commands, open threads, and the five-command verification quartet that every resumed session should run before taking new action. STANDARDS_REGISTRY.json: drop a pre-existing duplicate 'total_standards' key at lines 8-9 (59 vs 58) — broken JSON blocked standards_check --verify. Kept the later value (58), which matched the most recent authoring intent. Appended consent-log entry for this change. Not enacting stages D/E in this session; resume from the checkpoint. Co-Authored-By: Tony Ha <aidoruao@gmail.com>

Adds tools/generate_hashed_taxonomy.py which walks the repo and emits a deterministic JSONL audit of occurrences mapped to 6 namespaces (aerospace, floor, yeshua, math_popperian, secular, projection) plus issue markers (TODO/FIXME/HACK, pass/NotImplementedError stubs, float() usages, check_* without Tuple[bool, ProofObject] return, check_* missing both 'Falsifies if:' and 'falsifies_if:'). - Every entry carries a sha256 over canonical JSON of the entry - Top-level audit_sha256 commits over (summary, ordered entry hashes); content-only, so two runs over same tree produce same commitment - Run against current tree: audit_sha256=7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9 files_scanned=5027 issue_count_total=3230 Also: - Fixes STANDARDS_REGISTRY.json duplicate '"total_standards"' line that was making tools/standards_check.py --list/--verify crash - Appends consent log entry per SOP-AI-HANDSHAKE-1.0 Per .cursorrules / CLAUDE.md: no float, Fraction classified ratio, every check function carries Falsifies if: + falsifies_if: doc pair, Tuple[bool, ProofObject]-compatible types, no stubs. Co-Authored-By: Tony Ha <aidoruao@gmail.com>

…, drop dead IssueHit, fix namespace sum, segregate non-deterministic metadata Review feedback on PR #141: 1. 🔴 assert statement in tools/generate_hashed_taxonomy.py:282 violated .cursorrules / CLAUDE.md 'No assert' rule. Replaced with a single walrus-style assignment that also removes the redundant double regex search the other reviewer flagged. 2. 🔴 _RE_ASSERT regex was defined but no scanner emitted the issue type. Renamed to ASSERT_USE (the rule applies to all modules, not just check_* surface), gated on is_python to avoid false positives in prose, and wired into _scan_line_level. 8012 real assert hits surfaced across Python sources. 3. 📝 Removed the dead IssueHit dataclass + its dataclass import. 4. 📝 Added 'unclassified' to counts_by_namespace so per-namespace counts can sum to at least issue_count_total (they previously silently dropped unclassified hits). 5. 📝 Moved generated_at_utc / jsonl_path into a dedicated 'metadata' subkey with an explicit 'not_covered_by_audit_sha256' list, and dropped the non-portable repo_root field entirely. Tests: 11 pass (added test_line_level_scanner_skips_python_only_patterns_for_non_python, test_namespace_counts_account_for_unclassified, test_gap_analysis_metadata_is_outside_commitment). Regenerated artifacts: audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd (deterministic across two runs; files_scanned=5028, issue_count_total=11027). Co-Authored-By: Tony Ha <aidoruao@gmail.com>

…docstring, narrow projection keywords - _scan_check_function: truncate next-40-line window at the next def/class so an adjacent check_* cannot satisfy the current one's contract - _RE_CHECK_DEF: anchor at line start with [ \t]* so match does not consume a preceding newline; line_no now points at the actual def line - _RE_FALS_TITLE: drop re.IGNORECASE — title-case 'Falsifies if:' is mandatory per .cursorrules / CLAUDE.md / .windsurfrules - _write_jsonl: update docstring to describe the true 4-tuple sort key (path, line, issue_type, entry_sha256) - NAMESPACE_KEYWORDS.projection: drop bare 'projection' / 'mirror' in favor of compound keys (projected_namespace, projected_view, projected_domain, namespace_projection, mirror_namespace, derivative_witness) to stop over-classifying common English uses of the word Tests added: - test_check_function_window_does_not_bleed_into_adjacent_def - test_falsifies_if_title_case_is_strictly_enforced - test_projection_namespace_keywords_are_narrow Audit artifacts regenerated. New audit_sha256 (deterministic across two runs): be134e9c1867d804eb5708ddb0058281f8203818549730b46634901e7ae0754c issue_count_total 11027 -> 11082 (bleed + title-case + line_no fixes surface 55 additional real findings). Co-Authored-By: Tony Ha <aidoruao@gmail.com>

…ANNOT return-type coverage Devin Review round 3 findings on PR #141 (10 total comments): BUG fixes (5): - Add 'Falsifies if:' / 'falsifies_if:' docstring pair to _scan_line_level - Add pair to _scan_check_function (with window-bleed invariant note) - Add pair to build_entries (determinism + summary/entries agreement) - Add pair to _write_summary (audit_sha256 determinism + metadata boundary) - Add pair to main (clean-walk exit code + cross-run determinism) FLAG fix (1): - Broaden _RE_FLOAT_ANNOT from r':\s*float\b' to r'(?::|->)\s*float\b' so return-type annotations 'def f() -> float' are flagged alongside parameter/variable annotations. Add test_float_annot_regex_catches_return_type_annotations to lock the coverage invariant. Cleanup (from ANALYSIS comment): - Remove unused 'path' parameter from _scan_check_function; update the single internal caller and three call sites in tests. Pure dead-code removal; no behavior change. Audit artifact regeneration: - audit_sha256 deterministic across two consecutive runs: f73f70dc8ae70d990a47aba215ab6ed49d165ac0d8dc4cfe734048a1c5a45eb2 - issue_count_total: 11,082 -> 11,275 (+193 newly-detected '-> float' return-type annotations now covered by FLOAT_ANNOT). Verification: - pytest tests/test_hashed_taxonomy.py: 15/15 passing (was 14, +1 new). - CS-003 (Falsifies-if title-case pair) passes on standards_check. - _RE_FALS_TITLE remains case-sensitive (no IGNORECASE) per .cursorrules. Co-Authored-By: Tony Ha <aidoruao@gmail.com>

devin-ai-integration

Devin Review found 8 new potential issues.

devin-ai-integration · 2026-04-20T17:19:23Z

📝 Info: Existing test files don't follow Falsifies if: / falsifies_if: convention either

While the rule files mandate both Falsifies if: (title-case) and falsifies_if: (lowercase) in ALL docstrings, the established test convention in this repo does not follow this — tests/test_axioms.py has zero instances of either form across 30+ test functions, and tests/test_timeline_html.py similarly omits them. The new test file tests/test_hashed_taxonomy.py actually does better than existing tests, including the pair in 8 of its 14 test docstrings. The missing pairs in the remaining 6 test functions are consistent with (and better than) existing repo convention, which is why they were not flagged as bugs.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:24Z

+    frac_classified = Fraction(classified, total)
+    # ``metadata`` collects fields that are *intentionally* not covered by
+    # ``audit_sha256`` (timestamps, output paths). Separating them makes the
+    # commitment boundary visible to reviewers: only ``summary`` +
+    # ``ordered_entry_hashes`` participate in the commit payload.
+    doc = {
+        "schema": "OE-GAP-ANALYSIS-1.0",
+        "summary": summary,
+        "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}",


🟡 Fraction auto-reduction makes classified_fraction misrepresent actual counts

Fraction(classified, total) auto-reduces to lowest terms, so the string f"{frac_classified.numerator}/{frac_classified.denominator}" can produce misleading output. For example, if 10 of 20 entries are classified, the output is "1/2" instead of "10/20", while entry_count is 20. A reader would expect the denominator to match entry_count. The current run's output ("7158/11275") is coincidentally correct because GCD(7158, 11275)=1, but future runs with reducible counts will produce a fraction whose denominator doesn't match the adjacent entry_count field.

Suggested change

frac_classified = Fraction(classified, total)

# ``metadata`` collects fields that are *intentionally* not covered by

# ``audit_sha256`` (timestamps, output paths). Separating them makes the

# commitment boundary visible to reviewers: only ``summary`` +

# ``ordered_entry_hashes`` participate in the commit payload.

doc = {

"schema": "OE-GAP-ANALYSIS-1.0",

"summary": summary,

"classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}",

frac_classified = Fraction(classified, total)

# ``metadata`` collects fields that are *intentionally* not covered by

# ``audit_sha256`` (timestamps, output paths). Separating them makes the

# commitment boundary visible to reviewers: only ``summary`` +

# ``ordered_entry_hashes`` participate in the commit payload.

doc = {

"schema": "OE-GAP-ANALYSIS-1.0",

"summary": summary,

"classified_fraction": f"{classified}/{total}",

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:26Z

+
+
+def _canonical_json(obj: Any) -> str:
+    """Return a deterministic canonical JSON serialisation (sorted keys, no BOM)."""


🔴 Docstring for _canonical_json missing mandatory Falsifies if: / falsifies_if: pair

Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _canonical_json function at tools/generate_hashed_taxonomy.py:218 has a docstring but is missing both required forms. Other functions in the same file (_sha256_text, _sha256_bytes, _scan_line_level, build_entries, etc.) correctly include the pair, making this an inconsistency within the file itself.

Suggested change

"""Return a deterministic canonical JSON serialisation (sorted keys, no BOM)."""

"""Return a deterministic canonical JSON serialisation (sorted keys, no BOM).

Falsifies if: the output is not identical for two dicts with the same key-value pairs.

falsifies_if: the output is not identical for two dicts with the same key-value pairs.

"""

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:27Z

+    """Write entries as JSONL, deterministically ordered by the 4-tuple
+    ``(path, line, issue_type, entry_sha256)``.
+
+    The ``entry_sha256`` tail of the sort key ensures a stable total order
+    when two findings share the same (path, line, issue_type) coordinates.
+    """


🔴 Docstring for _write_jsonl missing mandatory Falsifies if: / falsifies_if: pair

Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _write_jsonl function at tools/generate_hashed_taxonomy.py:439-444 has a multi-line docstring but is missing both required forms. Sibling functions like _write_summary and build_entries in the same file correctly include the pair.

Suggested change

"""Write entries as JSONL, deterministically ordered by the 4-tuple

``(path, line, issue_type, entry_sha256)``.

The ``entry_sha256`` tail of the sort key ensures a stable total order

when two findings share the same (path, line, issue_type) coordinates.

"""

"""Write entries as JSONL, deterministically ordered by the 4-tuple

``(path, line, issue_type, entry_sha256)``.

The ``entry_sha256`` tail of the sort key ensures a stable total order

when two findings share the same (path, line, issue_type) coordinates.

Falsifies if: the output file order differs across two calls with the same entries.

falsifies_if: the output file order differs across two calls with the same entries.

"""

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:29Z

+
+
+def _namespaces_for_text(text_lower: str, path_lower: str) -> Tuple[str, ...]:
+    """Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``."""


🔴 Docstring for _namespaces_for_text missing mandatory Falsifies if: / falsifies_if: pair

Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _namespaces_for_text function at tools/generate_hashed_taxonomy.py:260 has a docstring but is missing both required forms. This function is part of the core classification logic and is exercised directly in tests, making the missing falsification contract more impactful than for pure utility functions.

Suggested change

"""Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``."""

"""Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``.

Falsifies if: a keyword present in the text or path does not yield the expected namespace.

falsifies_if: a keyword present in the text or path does not yield the expected namespace.

"""

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:30Z

+        "reverse_math",
+        "surreal",
+        "transfinite",
+        "fraction(",


🚩 Namespace over-classification via fraction( keyword

The math_popperian namespace includes "fraction(" as a keyword (tools/generate_hashed_taxonomy.py:95). Since .cursorrules and .windsurfrules mandate Fraction everywhere instead of float, virtually every Python file in the repo will contain Fraction( which, after .lower(), matches "fraction(". This explains the very high math_popperian count of 4825 in the committed audits/gap_analysis_20260420.json:19. The namespace classification is technically correct but extremely broad — nearly every Python file gets tagged as math_popperian, reducing the signal value of that namespace label.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:31Z

+# Matches both variable/parameter annotations (``x: float``) and return-type
+# annotations (``def f() -> float``) so the rule "no float anywhere" has full
+# coverage rather than only catching the argument side.
+_RE_FLOAT_ANNOT = re.compile(r"(?::|->)\s*float\b")


📝 Info: _RE_FLOAT_ANNOT regex doesn't catch generic float annotations like List[float]

The regex (?::|->)\s*float\b at tools/generate_hashed_taxonomy.py:166 only catches annotations where float immediately follows : or -> (modulo whitespace). Annotations like x: List[float], x: Optional[float], or x: dict[str, float] would not be flagged. The comment on lines 163-165 documents the intent as covering "variable/parameter annotations and return-type annotations", so this gap appears to be a known scope limitation rather than an oversight. If the goal is truly "no float anywhere" enforcement, this would need a broader pattern like \bfloat\b with context-aware filtering.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:32Z

+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    ordered_hashes = sorted(e["entry_sha256"] for e in entries)
+    commit_payload = {
+        "summary": summary,
+        "ordered_entry_hashes": ordered_hashes,
+    }
+    audit_sha256 = _sha256_text(_canonical_json(commit_payload))
+    try:
+        jsonl_label = jsonl_path.resolve().relative_to(REPO_ROOT).as_posix()
+    except ValueError:
+        jsonl_label = jsonl_path.name
+
+    # Percent of entries classified — as Fraction, rendered as "num/den".
+    total = max(len(entries), 1)
+    classified = sum(1 for e in entries if e["namespaces"] != ["unclassified"])
+    frac_classified = Fraction(classified, total)
+    # ``metadata`` collects fields that are *intentionally* not covered by
+    # ``audit_sha256`` (timestamps, output paths). Separating them makes the
+    # commitment boundary visible to reviewers: only ``summary`` +
+    # ``ordered_entry_hashes`` participate in the commit payload.
+    doc = {
+        "schema": "OE-GAP-ANALYSIS-1.0",
+        "summary": summary,
+        "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}",
+        "audit_sha256": audit_sha256,
+        "entry_count": len(entries),
+        "metadata": {
+            "generated_at_utc": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+            "jsonl_path": jsonl_label,
+            "not_covered_by_audit_sha256": ["generated_at_utc", "jsonl_path"],
+        },
+    }
+    with out_path.open("w", encoding="utf-8") as fh:
+        fh.write(json.dumps(doc, sort_keys=True, indent=2, ensure_ascii=True) + "\n")
+    return audit_sha256


📝 Info: Determinism invariant is well-designed with clear commitment boundary

The audit_sha256 commitment at tools/generate_hashed_taxonomy.py:470-474 correctly separates deterministic content (summary + ordered entry hashes) from non-deterministic metadata (timestamps, output paths). The metadata is explicitly documented via not_covered_by_audit_sha256 in the output JSON. The _iter_candidate_files uses sorted(root.rglob("*")) for deterministic ordering, and _write_jsonl sorts by a 4-tuple including entry_sha256 as a tiebreaker. This design means two runs over an identical working tree will produce identical audit_sha256 values, satisfying the tool's core invariant.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-20T17:19:38Z

Devin is currently unreachable - the session may have died.

devin-ai-integration · 2026-04-20T17:19:39Z

Devin is currently unreachable - the session may have died.

devin-ai-integration · 2026-04-20T17:19:39Z

Devin is currently unreachable - the session may have died.

devin-ai-integration · 2026-04-20T17:19:39Z

Devin is currently unreachable - the session may have died.

devin-ai-integration · 2026-04-20T17:31:21Z

SUPERSEDED by PR #151 — This PR's changes are included in the consolidated merge PR #151 (CI 31/31 green, ready to merge). Close this PR after #151 is merged.

devin-ai-integration Bot assigned aidoruao Apr 20, 2026

devin-ai-integration Bot requested a review from aidoruao as a code owner April 20, 2026 02:11

devin-ai-integration Bot commented Apr 20, 2026

View reviewed changes

This was referenced Apr 20, 2026

[Auto] Mathematical weight detected: The comment mentions 'hashed' (implying cryptographic hashing, number theory, co #146

Closed

[Auto] Mathematical weight detected: The comment refers to work involving hashing algorithms (related to number theor #147

Closed

devin-ai-integration Bot and others added 4 commits April 20, 2026 17:10

devin-ai-integration Bot force-pushed the devin/1776650369-stage-a-hashed-taxonomy branch from 3aab8b1 to b0e2a8a Compare April 20, 2026 17:10

devin-ai-integration Bot commented Apr 20, 2026

View reviewed changes

devin-ai-integration Bot mentioned this pull request Apr 20, 2026

READY TO MERGE: consolidate Stages A-G (PRs #141, #142, #143, #148, #149) + UNIVERSAL_ONBOARDING.md #151

Merged

5 tasks

aidoruao merged commit 1ee654a into main Apr 20, 2026
32 checks passed



		def _canonical_json(obj: Any) -> str:
		"""Return a deterministic canonical JSON serialisation (sorted keys, no BOM)."""



		def _namespaces_for_text(text_lower: str, path_lower: str) -> Tuple[str, ...]:
		"""Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``."""

Conversation

devin-ai-integration Bot commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review & Testing Checklist for Human

Notes

Uh oh!

devin-ai-integration Bot commented Apr 20, 2026

Uh oh!

devin-ai-integration Bot commented Apr 20, 2026

🤖 Devin AI Engineer

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

devin-ai-integration Bot Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

devin-ai-integration Bot commented Apr 20, 2026 •

edited

Loading

devin-ai-integration Bot Apr 20, 2026 •

edited

Loading

devin-ai-integration Bot Apr 20, 2026 •

edited

Loading

devin-ai-integration Bot Apr 20, 2026 •

edited

Loading

devin-ai-integration Bot Apr 20, 2026 •

edited

Loading

devin-ai-integration Bot Apr 20, 2026 •

edited

Loading