Detect whitespace padding used to hide prompt-injection instructions (P9) by korjavin · Pull Request #24 · NVIDIA/SkillSpector

korjavin · 2026-06-11T14:53:02Z

Adds rule P9 "Whitespace Padding" under Prompt Injection, for issue #20. It detects padding that pushes injected instructions out of a reviewer's view while the agent still reads them.

P6 through P8 were taken by System Prompt Leakage, so this uses P9. One id covers all three signals; confidence carries the weighting.

Signals (all reported as P9):

Vertical: 20+ consecutive blank or whitespace-only lines. MEDIUM, raised to HIGH when content follows a gap of 40+ lines. Confidence 0.8 with trailing content, 0.6 without.
Horizontal: 80+ consecutive whitespace characters in a line, including leading indentation. MEDIUM, confidence 0.7.
Ratio: a contiguous whitespace block over 2 KB, or whitespace over 90% of a file larger than 4 KB. LOW, confidence 0.4.
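A minimal sketch of the vertical and horizontal signals, using the thresholds listed above. The function and constant names are hypothetical and are not taken from the PR. This simplified version relies on `str.strip()` and an ASCII space/tab run, so unlike the detector it does not cover zero-width characters.

```python
import re

# Thresholds from the description above; the constant names are illustrative.
VERTICAL_MIN_LINES = 20
VERTICAL_HIGH_LINES = 40
HORIZONTAL_MIN_CHARS = 80

# ASCII-only run for brevity; the PR classifies whitespace by Unicode category.
_HORIZONTAL_RUN = re.compile(r"[ \t]{%d,}" % HORIZONTAL_MIN_CHARS)


def vertical_findings(lines):
    """Return (start_line, gap_length, severity, confidence) per blank-line run."""
    findings = []
    i = 0
    while i < len(lines):
        if lines[i].strip():
            i += 1
            continue
        start = i
        while i < len(lines) and not lines[i].strip():
            i += 1
        gap = i - start
        if gap >= VERTICAL_MIN_LINES:
            has_trailing = i < len(lines)  # content follows the gap
            high = has_trailing and gap >= VERTICAL_HIGH_LINES
            severity = "HIGH" if high else "MEDIUM"
            confidence = 0.8 if has_trailing else 0.6
            findings.append((start + 1, gap, severity, confidence))
    return findings


def horizontal_findings(lines):
    """Return (line_number, run_length, severity, confidence) per padded line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        match = _HORIZONTAL_RUN.search(line)
        if match:
            run = match.end() - match.start()
            findings.append((number, run, "MEDIUM", 0.7))
    return findings
```

Each finding carries the 1-based line where the padding starts, which matches how the findings are described as being reported.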

Whitespace is classified by Unicode category rather than ASCII space/tab: controls (\t \n \r \v \f), categories Zs/Zl/Zp (U+00A0, U+2028, U+2029, U+3000, and so on), and the zero-width family (U+200B/C/D, U+2060, U+FEFF). That zero-width set is now one shared constant (ZERO_WIDTH_CHARS) used by P2's regex and the mcp_tool_poisoning zero-width check, so the two cannot drift; the MCP check also picks up U+2060/U+FEFF.
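The classification described above could look roughly like the following. `is_padding_char` and `ZERO_WIDTH_CHARS` are names used in this PR, but the bodies here are an assumption written from the description, not the merged code.

```python
import unicodedata

# Zero-width family listed in the description; shared by several checks.
ZERO_WIDTH_CHARS = frozenset("\u200b\u200c\u200d\u2060\ufeff")

# ASCII control characters that count as padding.
_ASCII_CONTROLS = frozenset("\t\n\r\v\f")

# Unicode general categories for space, line and paragraph separators.
_SEPARATOR_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})


def is_padding_char(ch):
    """Return True if a single character should count as padding."""
    if ch in _ASCII_CONTROLS or ch in ZERO_WIDTH_CHARS:
        return True
    return unicodedata.category(ch) in _SEPARATOR_CATEGORIES
```

The zero-width characters need the explicit set because their general category is Cf (format), not one of the separator categories.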

Each finding points at the line where padding starts and includes a visible snippet of what was hidden (for example U+00A0 x82 or \n x80).
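One way to produce a visible snippet in that style is to run-length encode the padding. This is a sketch; `describe_padding` is a hypothetical helper and the exact output format of the PR may differ.

```python
from itertools import groupby

# Printable escapes for the common ASCII controls.
_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t", "\v": "\\v", "\f": "\\f"}


def describe_padding(run):
    """Render an invisible run as text such as 'U+00A0 x82' or '\\n x80'."""
    parts = []
    for ch, group in groupby(run):
        count = sum(1 for _ in group)
        label = _ESCAPES.get(ch, "U+%04X" % ord(ch))
        parts.append("%s x%d" % (label, count))
    return ", ".join(parts)
```

Mixed runs are reported segment by segment, so a reviewer can see which code points were combined.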

False-positive guards: markdown fenced code is skipped for the horizontal signal; vendored files are skipped (*.min.js, *.min.css, *.lock, package-lock.json, yarn.lock, *.svg, *.map); binary-ish content (containing U+FFFD) bails out; the ratio signal stays at LOW. Eval-dataset prose and files over 1 MB are already skipped upstream.
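Two of these guards can be sketched with the standard library. The glob list is copied from the description; `is_vendored` and `lines_outside_fences` are hypothetical names, and the fence handling is deliberately simplified (any fence line toggles the state).

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SKIP_GLOBS = (
    "*.min.js", "*.min.css", "*.lock",
    "package-lock.json", "yarn.lock", "*.svg", "*.map",
)

# Opening or closing markdown fence: up to three spaces, then ``` or ~~~.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def is_vendored(path):
    """Return True if the file name matches a vendored/generated glob."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in SKIP_GLOBS)


def lines_outside_fences(lines):
    """Yield (line_number, line) for lines not inside a fenced code block."""
    inside = False
    for number, line in enumerate(lines, start=1):
        if _FENCE.match(line):
            inside = not inside
            continue
        if not inside:
            yield number, line
```

Only the horizontal signal would consume `lines_outside_fences`, since deep indentation is common inside code blocks but a long vertical gap is suspicious anywhere.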

MCP manifest description fields are covered by the same detector (horizontal and block signals; the per-file ratio signal is skipped since fields are short).

Tests cover all three signals at their thresholds, the full Unicode evasion set, the false-positive guards, and the MCP path. Thresholds are named constants, easy to tune against a real corpus. Happy to adjust the signals or thresholds before merge.

rng1995

This is a high-quality addition. I read the detector end to end and the design is sound for an anti-evasion rule.

Strengths

Detection is genuinely Unicode-aware: is_padding_char covers ASCII controls, the Zs/Zl/Zp categories, and the explicit zero-width family, rather than ASCII space/tab only — that closes the obvious bypasses. _split_lines treating CR/CRLF/U+2028/U+2029/U+0085 as boundaries (and the offset table with the trailing sentinel) keeps char-offset math correct, and the boundary off-by-one cases are tested.
All scanning is linear (finditer on a simple alternation, anchored per-line fence match, char/line while-loops), so there's no ReDoS exposure on attacker-controlled content. The only regex change to P2/the MCP check is swapping a literal class for one built from the shared ZERO_WIDTH_CHARS set — same/expanded code points, still a plain character class.
Folding the zero-width set into one shared constant is a real improvement: it prevents P2 and the mcp_tool_poisoning hidden-text check from drifting, and it correctly extends the MCP check to U+2060/U+FEFF.
Sensible false-positive handling: fenced-code skipping for the horizontal signal, vendored/generated globs (P9-only), the ratio signal capped at LOW, dedup so one oversized span doesn't triple-report, and severity/confidence weighting per signal. Thresholds are named constants, which makes corpus tuning trivial.
Tests are thorough: each evasion code point inline and vertical, the MCP description path, threshold boundaries, CRLF/LF offsets, block/ratio dedup, fence and skip-glob guards, and the binary bailout.
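For readers who have not opened the diff, the line-splitting approach praised above can be approximated as follows. This is a sketch under assumptions: `split_lines_with_offsets` is a hypothetical name, and the real `_split_lines` may differ in detail.

```python
import re

# CRLF must come first so it is consumed as a single boundary.
_LINE_BREAK = re.compile("\r\n|[\n\r\u2028\u2029\u0085]")


def split_lines_with_offsets(text):
    """Split on all line boundaries and record each line's start offset.

    The offsets list has one extra trailing entry (len(text)) so that
    offsets[i + 1] is valid for the last line as well.
    """
    lines, offsets = [], []
    start = 0
    for match in _LINE_BREAK.finditer(text):
        lines.append(text[start:match.start()])
        offsets.append(start)
        start = match.end()
    lines.append(text[start:])
    offsets.append(start)
    offsets.append(len(text))  # trailing sentinel
    return lines, offsets
```

A single `finditer` pass over a plain alternation keeps the scan linear in the input length, which is the property the ReDoS remark relies on.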

Security — important, non-blocking: the U+FFFD bailout is itself an evasion vector
detect_whitespace_padding returns [] if _REPLACEMENT_CHAR (U+FFFD) appears anywhere in the content. Since files are read with errors="replace", that's a reasonable guard against truly binary blobs — but a U+FFFD is also a perfectly valid character an author can embed in a UTF-8 SKILL.md. So an adversary can drop a single U+FFFD into the file and disable P9 for the entire file, then pad freely — defeating exactly the rule being added. It's mitigated (the injected text is still subject to P1–P4 and the LLM pass, and a stray U+FFFD in a manifest is itself anomalous), so I'm not blocking on it, but I'd recommend hardening before relying on P9 as a guardrail: make the binary heuristic proportion-based (e.g. bail only when U+FFFD density exceeds some fraction), or strip/ignore U+FFFD and still scan the remainder. A regression test for "embedded U+FFFD must not suppress an otherwise-detected pad" would lock it in.

Minor / optional

Completeness: U+0085 (NEL) and U+180E aren't treated as padding chars. NEL is still caught vertically because it splits lines, but a horizontal/block run built from NEL or U+180E would slip past the in-line/block signals. Low priority given how niche these are.
A region that is simultaneously a vertical gap and has 80+ char horizontal runs per line can emit both a vertical finding and per-line horizontal findings (the dedup only suppresses block/ratio against primaries). Minor noise, not incorrect.
Micro-perf: is_padding_char calls unicodedata.category per character and the block/ratio pass re-encodes per char; fine under the upstream 1 MB cap, but an ASCII fast-path would cut the common case.

Net: correct, well-guarded, well-tested, and a real improvement to the prompt-injection surface. Approving — please consider tightening the U+FFFD bailout as a near-term follow-up.

rng1995 · 2026-06-22T19:44:05Z

@korjavin - Please resolve the conflicts and the minor issues, then merge the PR.

Resolve conflicts:
- README.md: pattern/category counts. Base was 64/16; this branch adds P9 (whitespace padding, +1) and main adds anti-refusal (+1 category, +4 patterns), so the merged totals are 69 patterns / 17 categories. Verified against the per-category counts in the detail table.
- static_patterns_prompt_injection.py: keep P2's zero-width class built from the shared ZERO_WIDTH_CHARS constant (this branch) AND main's new bidi-control pattern [U+202A-U+202E,U+2066-U+2069]. The shared constant expands to exactly main's [U+200B U+200C U+200D U+2060 U+FEFF] class, so no code points are lost.

Claude-Session: https://claude.ai/code/session_012ng7GJhfjXehRtPhQuAHsN

Resolve the minor issues raised in PR review:
- Security (the U+FFFD bailout was itself an evasion vector): bail only when U+FFFD *density* exceeds a threshold (0.30) instead of when a single U+FFFD is present anywhere. A lone replacement char in an otherwise-textual file can no longer disable P9 for the whole file and let an attacker pad freely. Genuine binary blobs (mostly U+FFFD) still bail. Adds a regression test plus a high-density bail test.
- Completeness: treat U+0085 (NEL) and U+180E as padding chars. Both fall outside Zs/Zl/Zp, so a horizontal/block run built from them previously slipped past the in-line/block signals. Adds unit + end-to-end coverage.
- Micro-perf: ASCII fast-path in is_padding_char so the common case skips the per-char unicodedata.category() call.

The remaining "vertical + horizontal can both report on one region" note is left as-is: the reviewer flagged it as minor noise, not incorrect, and the dedup change would risk regressions for no correctness gain.

Claude-Session: https://claude.ai/code/session_012ng7GJhfjXehRtPhQuAHsN

korjavin · 2026-06-24T19:27:15Z

Rebased/merged main and addressed the review feedback. Pushed as a merge commit + a follow-up fix commit.

Conflicts resolved

README.md — pattern/category counts. Base was 64/16; this branch adds P9 (+1 pattern) and main adds anti-refusal (+1 category, +4 patterns), so the merged totals are 69 patterns / 17 categories (verified against the per-category counts in the detail table).
static_patterns_prompt_injection.py (P2) — kept this branch's zero-width class built from the shared ZERO_WIDTH_CHARS constant and main's new bidi-control pattern [U+202A–U+202E, U+2066–U+2069]. The shared constant expands to exactly main's [U+200B U+200C U+200D U+2060 U+FEFF], so no code points are lost.
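As an illustration of the merged P2 state, a character class can be built from the shared constant and sit next to the bidi-control class. The code points come from the text above; the variable names other than `ZERO_WIDTH_CHARS` are hypothetical and the real patterns may be written differently.

```python
import re

# Shared zero-width set, as listed in the conflict note.
ZERO_WIDTH_CHARS = ("\u200b", "\u200c", "\u200d", "\u2060", "\ufeff")

# Plain character class generated from the constant, so every consumer
# of the constant matches the same code points.
ZERO_WIDTH_RE = re.compile(
    "[" + "".join(re.escape(ch) for ch in ZERO_WIDTH_CHARS) + "]"
)

# Bidi-control ranges quoted from main: U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROL_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")
```

Because both patterns are single character classes, matching stays linear regardless of input.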

Review feedback addressed

U+FFFD bailout (the important one) is no longer an evasion vector. P9 now bails only when U+FFFD density exceeds a threshold (0.30) rather than when a single U+FFFD appears anywhere — a lone replacement char can no longer disable P9 for the whole file. Genuine binary blobs (mostly U+FFFD) still bail. Added the regression test you suggested ("embedded U+FFFD must not suppress an otherwise-detected pad") plus a high-density bail test.
Completeness: U+0085 (NEL) and U+180E are now treated as padding chars, so horizontal/block runs built from them are caught. Added unit + end-to-end coverage.
Micro-perf: added an ASCII fast-path in is_padding_char so the common case skips the per-char unicodedata.category() call.
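The three changes above can be sketched together. The 0.30 threshold and the added code points come from this comment; `looks_binary` and the constant names are hypothetical, and the function bodies are an assumption, not the pushed code.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"
REPLACEMENT_DENSITY_LIMIT = 0.30  # threshold stated in the comment above

_ASCII_PADDING = frozenset(" \t\n\r\v\f")
# NEL, U+180E and the zero-width family: none are in Zs/Zl/Zp.
_EXTRA_PADDING = frozenset("\u0085\u180e\u200b\u200c\u200d\u2060\ufeff")
_SEPARATOR_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})


def looks_binary(text):
    """Bail only when U+FFFD makes up more than the density limit."""
    if not text:
        return False
    density = text.count(REPLACEMENT_CHAR) / len(text)
    return density > REPLACEMENT_DENSITY_LIMIT


def is_padding_char(ch):
    """Classify one character, skipping unicodedata for ASCII input."""
    if ch < "\x80":  # ASCII fast path
        return ch in _ASCII_PADDING
    if ch in _EXTRA_PADDING:
        return True
    return unicodedata.category(ch) in _SEPARATOR_CATEGORIES
```

With a density check, a single embedded U+FFFD followed by a long pad no longer suppresses the scan, while content that is mostly replacement characters is still treated as binary.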

Left as-is: the "a region can emit both a vertical and per-line horizontal finding" note — it's minor noise and not incorrect, and reworking the dedup risks regressions for no correctness gain. Happy to follow up if you'd prefer it suppressed.

All P9/static-pattern/MCP tests pass locally (149 passed). The CI run shows action_required — it needs a maintainer to approve the workflow run for this fork PR.

korjavin and others added 12 commits June 11, 2026 14:50

docs: add whitespace padding detection (P9) implementation plan

0d06810

Plan for issue NVIDIA#20 — detect large whitespace padding used to hide prompt-injection instructions from review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat: add whitespace padding detector helper module (P9 Task 1)

0dad1c7

feat: add P9 whitespace padding findings to prompt-injection analyzer

3c81ee2

feat: add P9 whitespace padding detection to MCP manifest fields

884f134

test: P9 acceptance verification and adversarial padding-char coverage

bac008f

docs: add P9 whitespace padding to README and CLAUDE.md, draft issue NVIDIA#20 comment

c46232d

fix: address review findings (U+2028/2029 vertical padding, block dedup, tests)

172fac4

fix: detect U+2028/U+2029 vertical padding in MCP description fields

106ab39

test: cover MCP block-kind P9 path; clarify _check_p9_padding docstring

bc1e96f

fix: codex review - trailing-gap boundary and block/ratio dedup

f427866

docs: move completed plan 20260611-detect-whitespace-padding.md to completed/

0b0ad17

chore: drop internal planning artifacts from feature branch

3d44702

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rng1995 approved these changes Jun 21, 2026

korjavin added 2 commits June 24, 2026 21:25

korjavin wants to merge 14 commits into NVIDIA:main from korjavin:feat/detect-whitespace-padding-injection
