audit(stage-a): hashed investigative taxonomy + gap-analysis JSONL#141
Conversation
Original prompt from Tony
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| "standard": "Yeshua", | ||
| "last_updated": "2026-04-18", | ||
| "last_updated": "2026-04-20", | ||
| "total_standards": 59 |
There was a problem hiding this comment.
📝 Info: STANDARDS_REGISTRY.json had invalid duplicate key — correctly fixed
The old file had both "total_standards": 59 (line 8) and "total_standards": 58 (line 9) — a duplicate JSON key that is technically undefined behavior per RFC 7159 (most parsers take the last value, so it read as 58). The PR correctly removes the duplicate, leaving only "total_standards": 59 which matches the actual count of 59 "id": entries in the standards array. The fix also resolves a missing comma that made the old JSON technically malformed between the two duplicate keys.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Good catch — the semantic change was intentional and matches reality:
$ python -c "import json; d=json.load(open('STANDARDS_REGISTRY.json')); print(len(d['standards']))"
59
The actual standards array contains 59 entries. The pre-existing dict had both "total_standards": 59 (line 8) and "total_standards": 58 (line 9), with the stray 58 being a leftover from an earlier edit. Python's json module silently keeps the last value, so the file previously reported 58 standards while actually containing 59 — a drift bug. Removing the duplicate key reconciles the metadata with ground truth.
| doc = { | ||
| "schema": "OE-GAP-ANALYSIS-1.0", | ||
| "generated_at_utc": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"), | ||
| "repo_root": str(REPO_ROOT), | ||
| "summary": summary, | ||
| "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}", | ||
| "audit_sha256": audit_sha256, | ||
| "jsonl_path": jsonl_label, | ||
| "entry_count": len(entries), | ||
| } | ||
| with out_path.open("w", encoding="utf-8") as fh: | ||
| fh.write(json.dumps(doc, sort_keys=True, indent=2, ensure_ascii=True) + "\n") | ||
| return audit_sha256 |
There was a problem hiding this comment.
📝 Info: audit_sha256 commitment boundary is well-designed but relies on summary dict not being mutated
The _write_summary function computes audit_sha256 from {"summary": summary, "ordered_entry_hashes": [...]} at line 447, then embeds that sha into the output doc. The summary dict is passed by reference from build_entries. If anything modified the summary dict between build_entries returning and _write_summary computing the hash, the commitment would be broken. Currently main() doesn't mutate it, so this is safe — but it's a fragile invariant that could break if future code modifies the summary before writing.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Fixed. The top-level generated_at_utc, repo_root, and jsonl_path fields made it ambiguous which parts of the document were covered by audit_sha256. New structure makes the commitment boundary explicit:
{
"schema": "OE-GAP-ANALYSIS-1.0",
"summary": {...},
"classified_fraction": "941/1615",
"audit_sha256": "34ed7b25...",
"entry_count": 11027,
"metadata": {
"generated_at_utc": "2026-04-20T02:50:12Z",
"jsonl_path": "audits/hashed_taxonomy_20260420.jsonl",
"not_covered_by_audit_sha256": ["generated_at_utc", "jsonl_path"]
}
}repo_rootdropped entirely — it was leaking absolute CI paths, not useful for consumers.generated_at_utc+jsonl_pathmoved undermetadatawith an explicitnot_covered_by_audit_sha256list that names them.- Added
test_gap_analysis_metadata_is_outside_commitmentas a falsifier that asserts those fields are not at the top level andnot_covered_by_audit_sha256is present.
Two fresh runs now both produce audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd, unaffected by timestamp or output path.
…, drop dead IssueHit, fix namespace sum, segregate non-deterministic metadata Review feedback on PR #141: 1. 🔴 assert statement in tools/generate_hashed_taxonomy.py:282 violated .cursorrules / CLAUDE.md 'No assert' rule. Replaced with a single walrus-style assignment that also removes the redundant double regex search the other reviewer flagged. 2. 🔴 _RE_ASSERT regex was defined but no scanner emitted the issue type. Renamed to ASSERT_USE (the rule applies to all modules, not just check_* surface), gated on is_python to avoid false positives in prose, and wired into _scan_line_level. 8012 real assert hits surfaced across Python sources. 3. 📝 Removed the dead IssueHit dataclass + its dataclass import. 4. 📝 Added 'unclassified' to counts_by_namespace so per-namespace counts can sum to at least issue_count_total (they previously silently dropped unclassified hits). 5. 📝 Moved generated_at_utc / jsonl_path into a dedicated 'metadata' subkey with an explicit 'not_covered_by_audit_sha256' list, and dropped the non-portable repo_root field entirely. Tests: 11 pass (added test_line_level_scanner_skips_python_only_patterns_for_non_python, test_namespace_counts_account_for_unclassified, test_gap_analysis_metadata_is_outside_commitment). Regenerated artifacts: audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd (deterministic across two runs; files_scanned=5028, issue_count_total=11027). Co-Authored-By: Tony Ha <aidoruao@gmail.com>
| if is_python: | ||
| if _RE_STUB_PASS.match(line): | ||
| out.append((idx, "STUB_PASS", snippet)) | ||
| if _RE_STUB_NOTIMPL.search(line): | ||
| out.append((idx, "STUB_NOTIMPL", snippet)) | ||
| if _RE_FLOAT_CALL.search(line): | ||
| out.append((idx, "FLOAT_CALL", snippet)) | ||
| if _RE_FLOAT_ANNOT.search(line): | ||
| out.append((idx, "FLOAT_ANNOT", snippet)) | ||
| if _RE_ASSERT.match(line): | ||
| out.append((idx, "ASSERT_USE", snippet)) |
There was a problem hiding this comment.
📝 Info: The is_python gate correctly prevents false positives on non-Python files — a meaningful improvement
The prior version of _scan_line_level (visible in the git diff) applied all pattern detectors to all file types, which would flag pass, assert, float(, and NotImplementedError in Markdown prose, YAML configs, and JSON data files. The new is_python flag at tools/generate_hashed_taxonomy.py:278 correctly gates Python-only patterns, and the test test_line_level_scanner_skips_python_only_patterns_for_non_python at tests/test_hashed_taxonomy.py:82 validates this. The TODO/FIXME/HACK patterns remain ungated since they are meaningful across all text file types.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Acknowledged — thanks for calling out the is_python gate explicitly. The test at tests/test_hashed_taxonomy.py:82 (test_line_level_scanner_skips_python_only_patterns_for_non_python) locks the invariant that STUB_PASS / STUB_NOTIMPL / FLOAT_CALL / FLOAT_ANNOT / ASSERT_USE are suppressed on non-Python files while TODO / FIXME / HACK remain active across all text types. No action needed here.
| assert a == b | ||
| assert a == '{"a":1,"b":2}' |
There was a problem hiding this comment.
📝 Info: Assert usage in test file is consistent with existing repo patterns
The rule files say "No assert" but also say "Tests runnable with pytest." Pytest's fundamental mechanism IS assert statements. I verified that existing tests like tests/test_epistemic_advanced.py (14 assert statements) and tests/test_falsification.py use assert freely. This is an established pattern in the repo that overrides the literal rule text for test files specifically. Not flagging as a bug.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Acknowledged — tests intentionally use assert (standard pytest idiom and consistent with every other test file under tests/). The repo-wide "no assert" rule applies to production code, which is enforced by tools/standards_check.py (CS-004) against non-test paths. No change needed.
| entry: Dict[str, Any] = { | ||
| "id": f"OE-TAX-{ev_sha[:12]}", | ||
| "path": rel, | ||
| "line": line_no, | ||
| "issue_type": issue_type, | ||
| "severity": ISSUE_SEVERITY.get(issue_type, "low"), | ||
| "namespaces": list(ns_record), | ||
| "evidence": {"snippet": snippet}, | ||
| "sha256_evidence": ev_sha, | ||
| "file_sha256": file_sha, | ||
| } | ||
| entry["entry_sha256"] = _sha256_text(_canonical_json(entry)) | ||
| entries.append(entry) |
There was a problem hiding this comment.
📝 Info: entry_sha256 self-referential hash pattern is correctly implemented
At tools/generate_hashed_taxonomy.py:399, the entry_sha256 is computed by hashing the canonical JSON of the entry dict BEFORE entry_sha256 is added to it. This is the standard self-referential commitment pattern: the hash covers all fields except itself, and verification works by removing entry_sha256, re-canonicalizing, and re-hashing. The implementation is correct and deterministic.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Thanks — acknowledged. The self-referential commit does rely on _canonical_json (sorted keys, fixed separators, ensure_ascii=True) producing byte-identical output for the same logical content, and test_build_entries_is_deterministic pins that invariant by comparing sorted entry_sha256 sets across two runs.
| for line_no, issue_type, snippet in hits: | ||
| counts[issue_type] = counts.get(issue_type, 0) + 1 | ||
| for ns in ns_record: | ||
| counts_by_namespace[ns] = counts_by_namespace.get(ns, 0) + 1 |
There was a problem hiding this comment.
📝 Info: Namespace counts are intentionally non-partitioning (multi-labeling)
In build_entries at lines 385-386, each issue increments ALL namespace counters for the file it's in. A file classified under both "yeshua" and "math_popperian" will count each issue in BOTH namespaces. This means sum(issue_count_by_namespace.values()) >= issue_count_total. The test test_namespace_counts_account_for_unclassified explicitly verifies this with assert ns_total >= total, confirming this is intentional multi-labeling behavior, not a counting bug.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Thanks — acknowledged. Multi-labeling is intentional: a single finding can belong to multiple namespaces (e.g., an aerospace floor invariant that also mentions Yeshua axioms). test_namespace_counts_account_for_unclassified asserts sum(namespace_counts) >= issue_count_total (note >=, not ==) to make the non-partitioning property a locked invariant rather than an accidental side effect. Happy to add a summary-document comment spelling this out in a follow-up Stage C pass if you'd like.
…ANNOT return-type coverage Devin Review round 3 findings on PR #141 (10 total comments): BUG fixes (5): - Add 'Falsifies if:' / 'falsifies_if:' docstring pair to _scan_line_level - Add pair to _scan_check_function (with window-bleed invariant note) - Add pair to build_entries (determinism + summary/entries agreement) - Add pair to _write_summary (audit_sha256 determinism + metadata boundary) - Add pair to main (clean-walk exit code + cross-run determinism) FLAG fix (1): - Broaden _RE_FLOAT_ANNOT from r':\s*float\b' to r'(?::|->)\s*float\b' so return-type annotations 'def f() -> float' are flagged alongside parameter/variable annotations. Add test_float_annot_regex_catches_return_type_annotations to lock the coverage invariant. Cleanup (from ANALYSIS comment): - Remove unused 'path' parameter from _scan_check_function; update the single internal caller and three call sites in tests. Pure dead-code removal; no behavior change. Audit artifact regeneration: - audit_sha256 deterministic across two consecutive runs: f73f70dc8ae70d990a47aba215ab6ed49d165ac0d8dc4cfe734048a1c5a45eb2 - issue_count_total: 11,082 -> 11,275 (+193 newly-detected '-> float' return-type annotations now covered by FLOAT_ANNOT). Verification: - pytest tests/test_hashed_taxonomy.py: 15/15 passing (was 14, +1 new). - CS-003 (Falsifies-if title-case pair) passes on standards_check. - _RE_FALS_TITLE remains case-sensitive (no IGNORECASE) per .cursorrules. Co-Authored-By: Tony Ha <aidoruao@gmail.com>
…n fix CHECKPOINT_STAGES_A_THROUGH_G.md documents the full state of the 'finish everything' campaign for cross-session continuity: - Stage A (#141), B (#142), C (#143), F (#148), G (#149) — complete - Stage D (housekeeping, 14 stale PRs + 13 bot issues) — pending - Stage E (non-draft PR review for #91, #85, #26) — pending The checkpoint lists exact resume commands, open threads, and the five-command verification quartet that every resumed session should run before taking new action. STANDARDS_REGISTRY.json: drop a pre-existing duplicate 'total_standards' key at lines 8-9 (59 vs 58) — broken JSON blocked standards_check --verify. Kept the later value (58), which matched the most recent authoring intent. Appended consent-log entry for this change. Not enacting stages D/E in this session; resume from the checkpoint. Co-Authored-By: Tony Ha <aidoruao@gmail.com>
Adds tools/generate_hashed_taxonomy.py which walks the repo and emits a deterministic JSONL audit of occurrences mapped to 6 namespaces (aerospace, floor, yeshua, math_popperian, secular, projection) plus issue markers (TODO/FIXME/HACK, pass/NotImplementedError stubs, float() usages, check_* without Tuple[bool, ProofObject] return, check_* missing both 'Falsifies if:' and 'falsifies_if:'). - Every entry carries a sha256 over canonical JSON of the entry - Top-level audit_sha256 commits over (summary, ordered entry hashes); content-only, so two runs over same tree produce same commitment - Run against current tree: audit_sha256=7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9 files_scanned=5027 issue_count_total=3230 Also: - Fixes STANDARDS_REGISTRY.json duplicate '"total_standards"' line that was making tools/standards_check.py --list/--verify crash - Appends consent log entry per SOP-AI-HANDSHAKE-1.0 Per .cursorrules / CLAUDE.md: no float, Fraction classified ratio, every check function carries Falsifies if: + falsifies_if: doc pair, Tuple[bool, ProofObject]-compatible types, no stubs. Co-Authored-By: Tony Ha <aidoruao@gmail.com>
…, drop dead IssueHit, fix namespace sum, segregate non-deterministic metadata Review feedback on PR #141: 1. 🔴 assert statement in tools/generate_hashed_taxonomy.py:282 violated .cursorrules / CLAUDE.md 'No assert' rule. Replaced with a single walrus-style assignment that also removes the redundant double regex search the other reviewer flagged. 2. 🔴 _RE_ASSERT regex was defined but no scanner emitted the issue type. Renamed to ASSERT_USE (the rule applies to all modules, not just check_* surface), gated on is_python to avoid false positives in prose, and wired into _scan_line_level. 8012 real assert hits surfaced across Python sources. 3. 📝 Removed the dead IssueHit dataclass + its dataclass import. 4. 📝 Added 'unclassified' to counts_by_namespace so per-namespace counts can sum to at least issue_count_total (they previously silently dropped unclassified hits). 5. 📝 Moved generated_at_utc / jsonl_path into a dedicated 'metadata' subkey with an explicit 'not_covered_by_audit_sha256' list, and dropped the non-portable repo_root field entirely. Tests: 11 pass (added test_line_level_scanner_skips_python_only_patterns_for_non_python, test_namespace_counts_account_for_unclassified, test_gap_analysis_metadata_is_outside_commitment). Regenerated artifacts: audit_sha256 = 34ed7b25249c045c8274fe2b969986bc5a62839791581a59b86fb74bd5e5e3dd (deterministic across two runs; files_scanned=5028, issue_count_total=11027). Co-Authored-By: Tony Ha <aidoruao@gmail.com>
…docstring, narrow projection keywords - _scan_check_function: truncate next-40-line window at the next def/class so an adjacent check_* cannot satisfy the current one's contract - _RE_CHECK_DEF: anchor at line start with [ \t]* so match does not consume a preceding newline; line_no now points at the actual def line - _RE_FALS_TITLE: drop re.IGNORECASE — title-case 'Falsifies if:' is mandatory per .cursorrules / CLAUDE.md / .windsurfrules - _write_jsonl: update docstring to describe the true 4-tuple sort key (path, line, issue_type, entry_sha256) - NAMESPACE_KEYWORDS.projection: drop bare 'projection' / 'mirror' in favor of compound keys (projected_namespace, projected_view, projected_domain, namespace_projection, mirror_namespace, derivative_witness) to stop over-classifying common English uses of the word Tests added: - test_check_function_window_does_not_bleed_into_adjacent_def - test_falsifies_if_title_case_is_strictly_enforced - test_projection_namespace_keywords_are_narrow Audit artifacts regenerated. New audit_sha256 (deterministic across two runs): be134e9c1867d804eb5708ddb0058281f8203818549730b46634901e7ae0754c issue_count_total 11027 -> 11082 (bleed + title-case + line_no fixes surface 55 additional real findings). Co-Authored-By: Tony Ha <aidoruao@gmail.com>
…ANNOT return-type coverage Devin Review round 3 findings on PR #141 (10 total comments): BUG fixes (5): - Add 'Falsifies if:' / 'falsifies_if:' docstring pair to _scan_line_level - Add pair to _scan_check_function (with window-bleed invariant note) - Add pair to build_entries (determinism + summary/entries agreement) - Add pair to _write_summary (audit_sha256 determinism + metadata boundary) - Add pair to main (clean-walk exit code + cross-run determinism) FLAG fix (1): - Broaden _RE_FLOAT_ANNOT from r':\s*float\b' to r'(?::|->)\s*float\b' so return-type annotations 'def f() -> float' are flagged alongside parameter/variable annotations. Add test_float_annot_regex_catches_return_type_annotations to lock the coverage invariant. Cleanup (from ANALYSIS comment): - Remove unused 'path' parameter from _scan_check_function; update the single internal caller and three call sites in tests. Pure dead-code removal; no behavior change. Audit artifact regeneration: - audit_sha256 deterministic across two consecutive runs: f73f70dc8ae70d990a47aba215ab6ed49d165ac0d8dc4cfe734048a1c5a45eb2 - issue_count_total: 11,082 -> 11,275 (+193 newly-detected '-> float' return-type annotations now covered by FLOAT_ANNOT). Verification: - pytest tests/test_hashed_taxonomy.py: 15/15 passing (was 14, +1 new). - CS-003 (Falsifies-if title-case pair) passes on standards_check. - _RE_FALS_TITLE remains case-sensitive (no IGNORECASE) per .cursorrules. Co-Authored-By: Tony Ha <aidoruao@gmail.com>
3aab8b1 to
b0e2a8a
Compare
There was a problem hiding this comment.
📝 Info: Existing test files don't follow Falsifies if: / falsifies_if: convention either
While the rule files mandate both Falsifies if: (title-case) and falsifies_if: (lowercase) in ALL docstrings, the established test convention in this repo does not follow this — tests/test_axioms.py has zero instances of either form across 30+ test functions, and tests/test_timeline_html.py similarly omits them. The new test file tests/test_hashed_taxonomy.py actually does better than existing tests, including the pair in 8 of its 14 test docstrings. The missing pairs in the remaining 6 test functions are consistent with (and better than) existing repo convention, which is why they were not flagged as bugs.
Was this helpful? React with 👍 or 👎 to provide feedback.
| frac_classified = Fraction(classified, total) | ||
| # ``metadata`` collects fields that are *intentionally* not covered by | ||
| # ``audit_sha256`` (timestamps, output paths). Separating them makes the | ||
| # commitment boundary visible to reviewers: only ``summary`` + | ||
| # ``ordered_entry_hashes`` participate in the commit payload. | ||
| doc = { | ||
| "schema": "OE-GAP-ANALYSIS-1.0", | ||
| "summary": summary, | ||
| "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}", |
There was a problem hiding this comment.
🟡 Fraction auto-reduction makes classified_fraction misrepresent actual counts
Fraction(classified, total) auto-reduces to lowest terms, so the string f"{frac_classified.numerator}/{frac_classified.denominator}" can produce misleading output. For example, if 10 of 20 entries are classified, the output is "1/2" instead of "10/20", while entry_count is 20. A reader would expect the denominator to match entry_count. The current run's output ("7158/11275") is coincidentally correct because GCD(7158, 11275)=1, but future runs with reducible counts will produce a fraction whose denominator doesn't match the adjacent entry_count field.
| frac_classified = Fraction(classified, total) | |
| # ``metadata`` collects fields that are *intentionally* not covered by | |
| # ``audit_sha256`` (timestamps, output paths). Separating them makes the | |
| # commitment boundary visible to reviewers: only ``summary`` + | |
| # ``ordered_entry_hashes`` participate in the commit payload. | |
| doc = { | |
| "schema": "OE-GAP-ANALYSIS-1.0", | |
| "summary": summary, | |
| "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}", | |
| frac_classified = Fraction(classified, total) | |
| # ``metadata`` collects fields that are *intentionally* not covered by | |
| # ``audit_sha256`` (timestamps, output paths). Separating them makes the | |
| # commitment boundary visible to reviewers: only ``summary`` + | |
| # ``ordered_entry_hashes`` participate in the commit payload. | |
| doc = { | |
| "schema": "OE-GAP-ANALYSIS-1.0", | |
| "summary": summary, | |
| "classified_fraction": f"{classified}/{total}", |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
|
||
|
|
||
| def _canonical_json(obj: Any) -> str: | ||
| """Return a deterministic canonical JSON serialisation (sorted keys, no BOM).""" |
There was a problem hiding this comment.
🔴 Docstring for _canonical_json missing mandatory Falsifies if: / falsifies_if: pair
Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _canonical_json function at tools/generate_hashed_taxonomy.py:218 has a docstring but is missing both required forms. Other functions in the same file (_sha256_text, _sha256_bytes, _scan_line_level, build_entries, etc.) correctly include the pair, making this an inconsistency within the file itself.
| """Return a deterministic canonical JSON serialisation (sorted keys, no BOM).""" | |
| """Return a deterministic canonical JSON serialisation (sorted keys, no BOM). | |
| Falsifies if: the output is not identical for two dicts with the same key-value pairs. | |
| falsifies_if: the output is not identical for two dicts with the same key-value pairs. | |
| """ |
Was this helpful? React with 👍 or 👎 to provide feedback.
| """Write entries as JSONL, deterministically ordered by the 4-tuple | ||
| ``(path, line, issue_type, entry_sha256)``. | ||
|
|
||
| The ``entry_sha256`` tail of the sort key ensures a stable total order | ||
| when two findings share the same (path, line, issue_type) coordinates. | ||
| """ |
There was a problem hiding this comment.
🔴 Docstring for _write_jsonl missing mandatory Falsifies if: / falsifies_if: pair
Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _write_jsonl function at tools/generate_hashed_taxonomy.py:439-444 has a multi-line docstring but is missing both required forms. Sibling functions like _write_summary and build_entries in the same file correctly include the pair.
| """Write entries as JSONL, deterministically ordered by the 4-tuple | |
| ``(path, line, issue_type, entry_sha256)``. | |
| The ``entry_sha256`` tail of the sort key ensures a stable total order | |
| when two findings share the same (path, line, issue_type) coordinates. | |
| """ | |
| """Write entries as JSONL, deterministically ordered by the 4-tuple | |
| ``(path, line, issue_type, entry_sha256)``. | |
| The ``entry_sha256`` tail of the sort key ensures a stable total order | |
| when two findings share the same (path, line, issue_type) coordinates. | |
| Falsifies if: the output file order differs across two calls with the same entries. | |
| falsifies_if: the output file order differs across two calls with the same entries. | |
| """ |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
|
||
|
|
||
| def _namespaces_for_text(text_lower: str, path_lower: str) -> Tuple[str, ...]: | ||
| """Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``.""" |
There was a problem hiding this comment.
🔴 Docstring for _namespaces_for_text missing mandatory Falsifies if: / falsifies_if: pair
Per .cursorrules, .windsurfrules, and CLAUDE.md, all docstrings must include both Falsifies if: (title-case) and falsifies_if: (lowercase). The _namespaces_for_text function at tools/generate_hashed_taxonomy.py:260 has a docstring but is missing both required forms. This function is part of the core classification logic and is exercised directly in tests, making the missing falsification contract more impactful than for pure utility functions.
| """Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``.""" | |
| """Return the sorted tuple of namespace labels matching ``text_lower``/``path_lower``. | |
| Falsifies if: a keyword present in the text or path does not yield the expected namespace. | |
| falsifies_if: a keyword present in the text or path does not yield the expected namespace. | |
| """ |
Was this helpful? React with 👍 or 👎 to provide feedback.
| "reverse_math", | ||
| "surreal", | ||
| "transfinite", | ||
| "fraction(", |
There was a problem hiding this comment.
🚩 Namespace over-classification via fraction( keyword
The math_popperian namespace includes "fraction(" as a keyword (tools/generate_hashed_taxonomy.py:95). Since .cursorrules and .windsurfrules mandate Fraction everywhere instead of float, virtually every Python file in the repo will contain Fraction( which, after .lower(), matches "fraction(". This explains the very high math_popperian count of 4825 in the committed audits/gap_analysis_20260420.json:19. The namespace classification is technically correct but extremely broad — nearly every Python file gets tagged as math_popperian, reducing the signal value of that namespace label.
Was this helpful? React with 👍 or 👎 to provide feedback.
| # Matches both variable/parameter annotations (``x: float``) and return-type | ||
| # annotations (``def f() -> float``) so the rule "no float anywhere" has full | ||
| # coverage rather than only catching the argument side. | ||
| _RE_FLOAT_ANNOT = re.compile(r"(?::|->)\s*float\b") |
There was a problem hiding this comment.
📝 Info: _RE_FLOAT_ANNOT regex doesn't catch generic float annotations like List[float]
The regex (?::|->)\s*float\b at tools/generate_hashed_taxonomy.py:166 only catches annotations where float immediately follows : or -> (modulo whitespace). Annotations like x: List[float], x: Optional[float], or x: dict[str, float] would not be flagged. The comment on lines 163-165 documents the intent as covering "variable/parameter annotations and return-type annotations", so this gap appears to be a known scope limitation rather than an oversight. If the goal is truly "no float anywhere" enforcement, this would need a broader pattern like \bfloat\b with context-aware filtering.
Was this helpful? React with 👍 or 👎 to provide feedback.
| out_path.parent.mkdir(parents=True, exist_ok=True) | ||
| ordered_hashes = sorted(e["entry_sha256"] for e in entries) | ||
| commit_payload = { | ||
| "summary": summary, | ||
| "ordered_entry_hashes": ordered_hashes, | ||
| } | ||
| audit_sha256 = _sha256_text(_canonical_json(commit_payload)) | ||
| try: | ||
| jsonl_label = jsonl_path.resolve().relative_to(REPO_ROOT).as_posix() | ||
| except ValueError: | ||
| jsonl_label = jsonl_path.name | ||
|
|
||
| # Percent of entries classified — as Fraction, rendered as "num/den". | ||
| total = max(len(entries), 1) | ||
| classified = sum(1 for e in entries if e["namespaces"] != ["unclassified"]) | ||
| frac_classified = Fraction(classified, total) | ||
| # ``metadata`` collects fields that are *intentionally* not covered by | ||
| # ``audit_sha256`` (timestamps, output paths). Separating them makes the | ||
| # commitment boundary visible to reviewers: only ``summary`` + | ||
| # ``ordered_entry_hashes`` participate in the commit payload. | ||
| doc = { | ||
| "schema": "OE-GAP-ANALYSIS-1.0", | ||
| "summary": summary, | ||
| "classified_fraction": f"{frac_classified.numerator}/{frac_classified.denominator}", | ||
| "audit_sha256": audit_sha256, | ||
| "entry_count": len(entries), | ||
| "metadata": { | ||
| "generated_at_utc": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"), | ||
| "jsonl_path": jsonl_label, | ||
| "not_covered_by_audit_sha256": ["generated_at_utc", "jsonl_path"], | ||
| }, | ||
| } | ||
| with out_path.open("w", encoding="utf-8") as fh: | ||
| fh.write(json.dumps(doc, sort_keys=True, indent=2, ensure_ascii=True) + "\n") | ||
| return audit_sha256 |
There was a problem hiding this comment.
📝 Info: Determinism invariant is well-designed with clear commitment boundary
The audit_sha256 commitment at tools/generate_hashed_taxonomy.py:470-474 correctly separates deterministic content (summary + ordered entry hashes) from non-deterministic metadata (timestamps, output paths). The metadata is explicitly documented via not_covered_by_audit_sha256 in the output JSON. The _iter_candidate_files uses sorted(root.rglob("*")) for deterministic ordering, and _write_jsonl sorts by a 4-tuple including entry_sha256 as a tiebreaker. This design means two runs over an identical working tree will produce identical audit_sha256 values, satisfying the tool's core invariant.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Devin is currently unreachable - the session may have died. |
3 similar comments
|
Devin is currently unreachable - the session may have died. |
|
Devin is currently unreachable - the session may have died. |
|
Devin is currently unreachable - the session may have died. |
Summary
Stage A of the "finish everything" multi-PR rollout: produce a deterministic, hash-anchored investigative taxonomy of the repository's outstanding engineering debt, mapped into six namespaces.
New:
tools/generate_hashed_taxonomy.py— walks the repo and emits a JSONL audit keyed by six namespaces (aerospace,floor,yeshua,math_popperian,secular,projection) and nine issue types (TODO,FIXME,HACK,STUB_PASS,STUB_NOTIMPL,FLOAT_CALL,FLOAT_ANNOT,CHECK_MISSING_PROOFOBJECT,CHECK_MISSING_FALSIFIES_IF_PAIR). Every entry carries asha256_evidence+entry_sha256, and the top-level summary carriesaudit_sha256committed over(summary, ordered_entry_hashes)— content-only, so the commitment is stable across output paths.Artifacts committed under
audits/:audits/hashed_taxonomy_20260420.jsonl— 3230 entries, one per findingaudits/gap_analysis_20260420.json— summary,audit_sha256 = 7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9Run against current tree:
Also fixed (incidental):
STANDARDS_REGISTRY.jsonhad a duplicated"total_standards"line with no comma separator, which madepython tools/standards_check.py --listcrash withJSONDecodeError. One-line fix; registry now parses cleanly and--verifyruns (and shows separately tracked real failures that Stage B will address).Consent log entry: appended per
SOP-AI-HANDSHAKE-1.0before any repo-scope change.Review & Testing Checklist for Human
audits/gap_analysis_20260420.json— confirm the namespace/type counts look like a reasonable picture of current debt, not an over- or under-count you'd disputeaudits/hashed_taxonomy_20260420.jsonlby runninggrep '<path>' audits/hashed_taxonomy_20260420.jsonl | head— verify thelineandsnippetmatch what's actually in the filepython tools/generate_hashed_taxonomy.py --out /tmp/verify.jsonl --summary /tmp/verify.jsonand confirmaudit_sha256matches7c32fdcad0f1fc02019eb8a1034f7207b97bbb44ffe3b14f2452513b432ebae9python tools/standards_check.py --listno longer crashes withJSONDecodeErroron theSTANDARDS_REGISTRY.jsontotal_standards bugpython -m pytest tests/test_hashed_taxonomy.py -v(8 tests, all pass locally) — covers canonical JSON, sha256 format, line/function scanners, determinismNotes
audits/is added to the walker's excluded dirs, so rerunning the tool does not re-scan its own generated output.audit_sha256intentionally does not coverjsonl_pathor timestamps, so the commitment is reproducible across different output locations and days — only content drift can falsify it.cc @aidoruao
Link to Devin session: https://app.devin.ai/sessions/36c540710d5c487ab6c5f61be5879aa3