Skip to content

fix(security): R89-170b — neutralize U+2028/U+2029 in _sanitize_inline (v1.4.4)#49

Merged
WRG-11 merged 1 commit into
mainfrom
session/B-r89-170b-unicode-lineseparator
Jun 2, 2026
Merged

fix(security): R89-170b — neutralize U+2028/U+2029 in _sanitize_inline (v1.4.4)#49
WRG-11 merged 1 commit into
mainfrom
session/B-r89-170b-unicode-lineseparator

Conversation

@WRG-11
Copy link
Copy Markdown
Owner

@WRG-11 WRG-11 commented Jun 2, 2026

Summary

Fast-follow to #48 (R89-167b). The shared InstinctStore._sanitize_inline neutralization that protects all six emit sinks used a C0/C1 control-character sweep [\x00-\x1f\x7f-\x9f]. That range catches every ASCII / Latin-1 line boundary (CR, LF, VT, FF, FS, GS, RS, NEL/U+0085) but misses the two Unicode separators above U+009F that Python's str.splitlines() — and several Markdown / agent renderers — still treat as newlines:

  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR

A promoted rule whose pattern/explain carried one of these could still break onto a new line/bullet at any of the six sinks (instinct_rules, instinct_suggestions, and the four export_platform formatters).

V flagged this residual at R89-134v and R89-168v (CANDIDATE / non-blocking); operator approved the fast-follow.

Fix (surgical — two codepoints)

- cleaned = re.sub(r"[\x00-\x1f\x7f-\x9f]+", " ", value)
+ cleaned = re.sub(r"[\x00-\x1f\x7f-\x9f

]+", " ", value)

Fixing the shared sanitizer covers all six sinks at once — the sinks are unchanged. Detection / threshold / promotion logic untouched.

Audit (verify-before-claim, not blind widening)

A full-Unicode sweep of every codepoint Python str.splitlines() treats as a line boundary, checked against the pre-fix regex:

boundaries splitlines() honours: \n \r \v \f \x1c \x1d \x1e \x85 U+2028 U+2029
caught by [\x00-\x1f\x7f-\x9f]:  all EXCEPT U+2028, U+2029
FULL-unicode sweep of missed boundaries = ['U+2028', 'U+2029']

So the change is exactly +

 — these are the only splitlines boundaries above the C0/C1 range. (re interprets / in the raw-string pattern as the codepoints.)

Tests (red → green, non-vacuous)

tests/test_unicode_lineseparator_r89_170b.py — 8 tests:

  • unit: _sanitize_inline folds U+2028 / U+2029 / both → single physical line; clean input is a no-op
  • sink-level: observe(explain="legit<SEP>- INJECTED: ...") × 10 → promote → instinct_rules prompt + export_platform("claude-md") produce no injected bullet; value preserved

Non-vacuous proof — the 7 injection/unit tests failed before the char-class change and pass after (same tests):

pre-fix:  7 failed, 215 passed
post-fix: 222 passed

Verify

  • pytest tests/ --cov-fail-under=60222 passed
  • ruff check src/ tests/ → clean
  • python tools/sync_cursor_rules.py --check → in sync (sanitizer is a no-op on legit data)

Release

Includes v1.4.4 bump (pyproject.toml + __init__.py + CHANGELOG.md). Version-bump rides in this PR per the R89-167b pattern; tag / PyPI publish are operator-gated.

…e (v1.4.4)

Fast-follow to R89-167b. The C0/C1 sweep [\x00-\x1f\x7f-\x9f] caught every
ASCII/Latin-1 line boundary but missed the two Unicode separators above U+009F
that str.splitlines() — and some Markdown/agent renderers — still treat as
newlines:

  U+2028 LINE SEPARATOR
  U+2029 PARAGRAPH SEPARATOR

A promoted rule's pattern/explain carrying one of them could still break onto a
new line/bullet at any of the 6 emit sinks. Fixing the shared sanitizer covers
all 6 at once. A full-Unicode sweep of str.splitlines() boundaries confirms
these two are the ONLY ones above the C0/C1 range, so the change is exactly
+

 — no blind char-class widening. Detection/threshold unchanged.

8 regression tests (unit + sink-level prompt/formatter), red->green non-vacuous;
full suite 222 passed, ruff clean, cursor-rules in sync. Version bump v1.4.4.

V-flagged residual: R89-134v + R89-168v (CANDIDATE/non-blocking).
@WRG-11 WRG-11 merged commit 06fe489 into main Jun 2, 2026
12 checks passed
@WRG-11 WRG-11 deleted the session/B-r89-170b-unicode-lineseparator branch June 2, 2026 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant