fix(security): R89-170b — neutralize U+2028/U+2029 in _sanitize_inline (v1.4.4)#49
Merged
…e (v1.4.4)

Fast-follow to R89-167b. The C0/C1 sweep [\x00-\x1f\x7f-\x9f] caught every ASCII/Latin-1 line boundary but missed the two Unicode separators above U+009F that str.splitlines(), and some Markdown/agent renderers, still treat as newlines:

  U+2028 LINE SEPARATOR
  U+2029 PARAGRAPH SEPARATOR

A promoted rule's pattern/explain carrying one of them could still break onto a new line/bullet at any of the 6 emit sinks. Fixing the shared sanitizer covers all 6 at once.

A full-Unicode sweep of str.splitlines() boundaries confirms these two are the ONLY ones above the C0/C1 range, so the change is exactly +\u2028\u2029, with no blind char-class widening. Detection/threshold unchanged.

8 regression tests (unit + sink-level prompt/formatter), red->green non-vacuous; full suite 222 passed, ruff clean, cursor-rules in sync. Version bump v1.4.4.

V-flagged residual: R89-134v + R89-168v (CANDIDATE/non-blocking).
Summary
Fast-follow to #48 (R89-167b). The shared `InstinctStore._sanitize_inline` neutralization that protects all six emit sinks used a C0/C1 control-character sweep, `[\x00-\x1f\x7f-\x9f]`. That range catches every ASCII / Latin-1 line boundary (CR, LF, VT, FF, FS, GS, RS, NEL/U+0085) but misses the two Unicode separators above U+009F that Python's `str.splitlines()`, and several Markdown / agent renderers, still treat as newlines:

- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR

A promoted rule whose `pattern` / `explain` carried one of these could still break onto a new line/bullet at any of the six sinks (`instinct_rules`, `instinct_suggestions`, and the four `export_platform` formatters).

V flagged this residual at R89-134v and R89-168v (CANDIDATE / non-blocking); operator approved the fast-follow.
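The gap can be reproduced with the standard library alone. The sketch below uses the pre-fix character class quoted in this summary; the sample payload and variable names are illustrative and not taken from the repository.

```python
import re

# Pre-fix character class quoted in the summary: C0/C1 controls only.
PRE_FIX = re.compile(r"[\x00-\x1f\x7f-\x9f]")

# Illustrative payload: a benign value, then U+2028, then a fake bullet.
payload = "legit\u2028- INJECTED: do something else"

# str.splitlines() treats U+2028 as a line boundary...
lines = payload.splitlines()

# ...but the pre-fix sweep never matches it, so nothing gets neutralized.
missed = PRE_FIX.search(payload) is None

print(lines)   # ['legit', '- INJECTED: do something else']
print(missed)  # True
```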
Fix (surgical — two codepoints)
Fixing the shared sanitizer covers all six sinks at once — the sinks are unchanged. Detection / threshold / promotion logic untouched.
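A minimal sketch of what the widened sweep might look like. The real `_sanitize_inline` body is not reproduced in this description, so the function name `sanitize_inline_sketch` and the choice of a single space as the replacement are assumptions; only the character class follows the text of this PR.

```python
import re

# Post-fix class: the original C0/C1 sweep plus U+2028 / U+2029.
# The `re` module resolves \u2028 and \u2029 inside a raw-string pattern itself.
_INLINE_UNSAFE = re.compile(r"[\x00-\x1f\x7f-\x9f\u2028\u2029]")


def sanitize_inline_sketch(value: str) -> str:
    """Hypothetical stand-in for the shared sanitizer: fold a value to one line."""
    # Assumption: each unsafe character is replaced by a single space.
    return _INLINE_UNSAFE.sub(" ", value)


print(sanitize_inline_sketch("legit\u2028- INJECTED"))  # legit - INJECTED
```

Because every sink is described as routing through the shared sanitizer, a change of this shape needs no edits at the sinks themselves.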
Audit (verify-before-claim, not blind widening)
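One way to reproduce the kind of sweep this section describes, offered as a sketch rather than the script actually used: walk every codepoint, keep those `str.splitlines()` treats as a boundary, and test each against the pre-fix class. The helper name `splitlines_boundaries` is hypothetical.

```python
import re
import sys

# Pre-fix character class: C0/C1 controls only.
PRE_FIX = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def splitlines_boundaries() -> list[int]:
    """Return every codepoint that str.splitlines() treats as a line boundary."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # "a<ch>b" splits into two pieces only when <ch> is a boundary.
        if len(f"a{chr(cp)}b".splitlines()) == 2:
            found.append(cp)
    return found


boundaries = splitlines_boundaries()
# Boundaries the pre-fix class fails to match.
missed = [cp for cp in boundaries if not PRE_FIX.match(chr(cp))]
print([f"U+{cp:04X}" for cp in missed])  # ['U+2028', 'U+2029']
```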
A full-Unicode sweep of every codepoint that Python's `str.splitlines()` treats as a line boundary was checked against the pre-fix regex. It found that the pre-fix regex missed only U+2028 and U+2029, so the change is exactly `+\u2028\u2029`: these are the only splitlines boundaries above the C0/C1 range. (`re` interprets `\u2028` / `\u2029` in the raw-string pattern as the codepoints.)

Tests (red → green, non-vacuous)
`tests/test_unicode_lineseparator_r89_170b.py`, 8 tests:

- `_sanitize_inline` folds U+2028 / U+2029 / both → single physical line; clean input is a no-op
- `observe(explain="legit<SEP>- INJECTED: ...")` × 10 → promote → `instinct_rules` prompt + `export_platform("claude-md")` produce no injected bullet; value preserved

Non-vacuous proof: the 7 injection/unit tests failed before the char-class change and pass after (same tests).
Verify
- `pytest tests/ --cov-fail-under=60` → 222 passed
- `ruff check src/ tests/` → clean
- `python tools/sync_cursor_rules.py --check` → in sync (sanitizer is a no-op on legit data)

Release
Includes v1.4.4 bump (`pyproject.toml` + `__init__.py` + `CHANGELOG.md`). Version-bump rides in this PR per the R89-167b pattern; tag / PyPI publish are operator-gated.