Skip to content

fix: positional args parsing#34

Merged
kenryu42 merged 13 commits intomainfrom
fix/positional-args-parsing
Mar 19, 2026
Merged

fix: positional args parsing#34
kenryu42 merged 13 commits intomainfrom
fix/positional-args-parsing

Conversation

@kenryu42
Copy link
Owner

@kenryu42 kenryu42 commented Mar 19, 2026

Summary

  • Fix rm -rf false positive where 2>/dev/null redirect was parsed as positional arguments (rm -rf false positive: 2>/dev/null redirect parsed as positional arguments #29)
  • Fix shell parser to correctly handle redirects, IO number prefixes, arithmetic expansions $((...)), grouped subshells, positional args, and nested command substitutions
  • Prevent false positives/negatives caused by shell-quote splitting these constructs into unexpected token sequences
  • Add comprehensive test coverage for all new parsing edge cases

Changes

  • src/core/shell.ts — Major parsing improvements:
    • Strip attached IO numbers (e.g. 2>) before redirect operators so they aren't misinterpreted as command arguments
    • Detect and skip arithmetic expansions $((...)) instead of treating inner content as executable commands
    • Handle grouped subshell parentheses to avoid depth-tracking errors
    • Add redirect target analysis (_getRedirectTargetInfo) that extracts embedded command substitutions from redirect targets
    • Add inline $(...) substitution extraction (extractInlineCommandSubstitutions) for tokens containing embedded command substitutions
    • Handle shell-quote { pattern } tokens and positional arg tokens ($1, $@, etc.) gracefully
    • Track unclosed backticks across split tokens to correctly reconstruct redirect targets
  • tests/core/command.test.ts — Added tests for redirect parsing, nested command substitutions inside redirects, and IO-number stripping
  • tests/core/edge-cases.test.ts — Added tests for arithmetic expansion, grouped subshells, and quoted inline substitutions
  • tests/core/parsing-helpers.test.ts — Added tests for splitShellCommands covering positional args, globs with pattern tokens, backtick redirect targets, and escaped content in inline substitutions
  • tests/core/rules-rm.test.ts — Added tests for rm -rf with redirects (2>/dev/null) and attached IO-number disambiguation
  • tests/core/system-info.test.ts — Increased timeout for doctor system-info tests
  • dist/ — Rebuilt distribution bundles

Testing

bun run check

Related Issue

Fixes #29rm -rf false positive: 2>/dev/null redirect parsed as positional arguments

Supersedes #30

PR Checklist

  • I have read the CONTRIBUTING.md
  • Code follows project conventions (type hints, naming, etc.)
  • bun run check passes (lint, types, dead code, rules, tests)
  • Tests added for new rules (minimum 90% coverage required)
  • Tested locally with Claude Code, OpenCode, Gemini CLI or GitHub Copilot CLI
  • Updated documentation if needed (README, AGENTS.md)
  • No version changes in package.json

Summary by CodeRabbit

  • Refactor

    • Improved shell parsing: better handling of redirects, process and command substitutions (including backticks and $()), arithmetic expansions, attached I/O-number forms, inline substitutions, and token extraction—yielding more accurate tokenization and safer analysis.
  • Tests

    • Expanded coverage and edge-case tests for redirects, nested substitutions, arithmetic/backtick behavior, process substitution, numeric vs attached redirect operands, and added timeouts for certain async tests.

- Add redirect operator detection and handling (_isRedirectOp, _getRedirectAdvance)
- Support command substitutions inside redirect targets (e.g., echo x >)
- Handle redirects inside nested command substitutions properly
- Fix ambiguous numeric args before redirects to be preserved for analysis
- Add tests for redirects in rm commands (2>/dev/null, > /dev/null)
- Test command substitution in redirect targets
- Test nested command substitutions with redirects
- Verify numeric args before redirects are preserved and analyzed
@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a94e3596-7f86-45fc-83fd-9c39a173023d

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6f328 and b388c23.

⛔ Files ignored due to path filters (2)
  • dist/bin/cc-safety-net.js is excluded by !**/dist/**
  • dist/index.js is excluded by !**/dist/**
📒 Files selected for processing (2)
  • src/core/shell.ts
  • tests/core/analyze/parsing-helpers.test.ts

📝 Walkthrough

Walkthrough

Rewrites shell parsing in splitShellCommands to normalize input, handle process substitution, redirects, inline/nested command substitutions (backticks and $()), and arithmetic expansion; adds many parser helpers and sentinels and expands tests covering redirects, substitutions, arithmetic, and edge cases.

Changes

Cohort / File(s) Summary
Core Shell Parsing Logic
src/core/shell.ts
Major rewrite of splitShellCommands and many helpers: newline normalization + _stripAttachedIoNumbers, centralized token extraction (_getCommandTokenText), process substitution detection/extraction, redirect detection/target collection (_isRedirectOp, _getRedirectTargetInfo, _collectParenthesizedTokens), inline command/substitution parsing (extractInlineCommandSubstitutions, _findInlineCommandSubstitutionEnd, _pushInlineSubstitutionSegments), backtick/attached-suffix handling, arithmetic substitution support (ARITHMETIC_SENTINEL, _extractArithmeticSubstitution), and numerous new helpers/constants.
Parsing Helpers Tests
tests/core/analyze/parsing-helpers.test.ts
Large additions and adjustments for splitShellCommands: backtick splitting behavior changed (inner tokens only), multiple inline substitutions per token, arithmetic expansion preservation and nesting, malformed/unterminated arithmetic handling, grouped subshell parsing inside $(), redirect target rules (attached vs spaced IO numbers, quoted literals), process-substitution expectations, inline-substitution edge cases, and stripWrappersWithInfo test.
Edge-Case Analysis Tests
tests/core/analyze/edge-cases.test.ts
Added tests asserting blocking/allowing of dangerous commands appearing in grouped subshells, command-substitution redirect targets (attached and spaced), arithmetic expansions with nested/adjacent substitutions, backtick redirect targets, process substitutions, and redirect noise before --.
Explain Command Tests
tests/bin/explain/command.test.ts
New edge-case suite for explainCommand: rm redirect-to-dev-null parsing, nested command-substitution traces, numeric vs attached I/O-number token retention rules, backtick handling inside/outside redirect targets, and related trace assertions.
RM Rules Tests
tests/core/rules-rm.test.ts
Added rm -rf cases with redirections and numeric/attached redirect forms to assert allow/block behavior (including cwd-aware cases and attached 123>/dev/null handling).
System Info Tests
tests/bin/doctor/system-info.test.ts
Added 15000ms timeout to two async tests to reduce flakiness (test(..., 15000)).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Fix: tmpdir traversal bypass #31: Modifies src/core/shell.ts and touches splitShellCommands/env-assignment parsing—likely directly related to these parser changes.
  • Refactor commands #12: Updates core shell parsing logic (splitShellCommands, command/substitution handling), overlapping on parsing/refactor work.

Poem

🐰 I nibble tokens, hop through backticks and braces,

I chase redirs, parentheses, and nested places.
Arithmetic hums while substitutions play,
I sort each segment tidy, hopping all day.
Cheer for parsed shells — a rabbit's small hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: positional args parsing' is concise, clear, and directly relates to the main change—fixing a parser bug where redirects were misinterpreted as positional arguments.
Description check ✅ Passed The PR description comprehensively covers all template sections: clear summary with bullet points, detailed changes per file, testing instructions, issue references, and completed checklist items indicating proper validation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/positional-args-parsing
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 98.94737% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.50%. Comparing base (fb5ccdd) to head (b388c23).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
src/core/shell.ts 98.94% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
- Coverage   99.53%   99.50%   -0.03%     
==========================================
  Files          50       50              
  Lines        4540     5085     +545     
==========================================
+ Hits         4519     5060     +541     
- Misses         21       25       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@greptile-apps
Copy link

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes a false positive (#29) where rm -rf /tmp/foo 2>/dev/null was incorrectly blocked because the 2 in 2>/dev/null was parsed as a positional argument rather than an IO-number redirect prefix. The fix is a comprehensive shell-parsing overhaul in src/core/shell.ts: a pre-pass (_stripAttachedIoNumbers) normalises the raw command string before shell-quote sees it (stripping attached IO numbers, converting $((…)) arithmetic to a sentinel form, and rewriting backticks to $(…)), while the token-walking logic gains dedicated branches for redirect operators, process substitutions, arithmetic expansions, and attached command substitutions.

Key changes and observations:

  • Correct fix for the root cause: 2>/dev/null-style redirects are now stripped cleanly; test coverage in rules-rm.test.ts and parsing-helpers.test.ts confirms both attached (123>/dev/null) and spaced (123 >/dev/null) forms are handled correctly.
  • Redirect target leakage: A trailing plain-string token after a parenthesized redirect target (e.g. >file$(cmd)suffix) is not consumed by _getRedirectTargetInfo and ends up pushed into the command's argument list as a false argument (see inline comment at line 921–935).
  • _isPathLikeBacktickSuffix is too narrow: Only / and . trigger the BACKTICK_ATTACHED_SUFFIX_SENTINEL injection; suffix characters such as -, _, or alphanumerics cause the suffix to detach into its own segment (see inline comment at line 1089).
  • Arithmetic double-paren closing lacks quote-context guard: The command[i+1] === ')' look-ahead that consumes the second ) of $((…)) does not check !inSingle && !inDouble, which could over-consume in malformed input (see inline comment at line 519).
  • Existing depth > 1 redirect gap remains: Redirect operators inside nested subshells at depth > 1 in extractCommandSubstitution are still not handled (acknowledged in prior review threads).
  • Test suite is well-expanded with ~274 new lines covering redirects, arithmetic, process substitutions, backtick edge cases, and inline substitution parsing.

Confidence Score: 3/5

  • Safe to merge for the primary fix, but three new parsing edge cases in the overhauled logic need attention before production.
  • The root-cause fix is correct and well-tested. However, the redirect-target trailing-token leak (line 921), the too-narrow _isPathLikeBacktickSuffix (line 1089), and the missing quote-context guard on the arithmetic double-paren close (line 519) are new logic issues introduced by this PR that don't have corresponding tests. The pre-existing depth>1 redirect gap (flagged in earlier threads) also remains. None of these cause false negatives that would let genuinely dangerous commands through undetected — they are conservatively biased — but they do mean the parsing model still has rough edges that could produce spurious arguments or stray segments in unusual inputs.
  • src/core/shell.ts — specifically _getRedirectTargetInfo (lines 921–935), _isPathLikeBacktickSuffix (lines 1089–1091), and the arithmetic closing-paren look-ahead (lines 519–525).

Important Files Changed

Filename Overview
src/core/shell.ts Major parsing overhaul: adds redirect-op skipping, arithmetic sentinel, backtick-to-$() conversion, process substitution, and inline substitution extraction. Several edge cases work correctly, but trailing tokens after parenthesized redirect targets leak into command segments, _isPathLikeBacktickSuffix is too narrow, and the arithmetic double-paren closing check lacks quote-context guards. The existing depth > 1 redirect-handling gap (flagged in earlier threads) also remains.
tests/core/analyze/parsing-helpers.test.ts Comprehensive test additions covering redirects, arithmetic expansion, backtick substitutions, process substitution, glob redirect targets, inline substitutions, and edge cases. Good coverage of the new parsing paths.
tests/core/rules-rm.test.ts Adds targeted tests for the root-cause fix: rm -rf /tmp/foo 2>/dev/null, attached IO-number stripping, and conservative blocking of numeric targets before redirects.
tests/bin/explain/command.test.ts Adds end-to-end explain-trace tests verifying the parse segments shown to users match the new redirect-stripping behaviour. Good regression coverage for the issue #29 scenario.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["splitShellCommands(command)"] --> B["_stripAttachedIoNumbers()\nStrip IO-number digits (2>, 1>)\nConvert $((…)) to ARITH_SENTINEL\nConvert backtick to $(…)"]
    B --> C["shell-quote parse()\nProduces ParseEntry tokens"]
    C --> D{Token classifier}
    D -->|Shell operator| E["Flush current segment\nStart new segment"]
    D -->|Redirect op| F["_getRedirectTargetInfo()\nExtract redirect target string"]
    F -->|Target contains dollar-paren| G["_pushInlineSubstitutionSegments()\nextractInlineCommandSubstitutions()"]
    F -->|Target is glob pattern| H["advance:2 — skip glob, no segment"]
    D -->|Process substitution| I["extractProcessSubstitution()\nRecurse into body"]
    D -->|Dollar-paren cmd sub| J["extractCommandSubstitution()"]
    J -->|Starts with ARITH_SENTINEL| K["_extractArithmeticSubstitution()\nCollect expression text\nExtract nested cmd subs"]
    J -->|Normal content| L["Depth-track parens\nOperators flush segments\nRedirects call _getRedirectTargetInfo"]
    D -->|String ending in dollar| M["_isAttachedCommandSubstitutionStart\nExtract prefix, recurse into sub"]
    D -->|Plain string or glob| N["_getCommandTokenText()\n_pushInlineSubstitutionSegments()\ncurrent.push(token)"]
    E --> D
    G --> D
    I --> D
    K --> D
    L --> D
    M --> D
    N --> D
Loading

Last reviewed commit: "fix(shell): handle g..."

…rsing

Add five test cases to improve line coverage for src/core/shell.ts from 95.62% to 99.31%:
- Process substitution nested inside command substitution (lines 236-245)
- Backtick with attached path suffix inside command substitution (line 271)
- Attached command substitution inside command substitution (lines 278-292)
- Unclosed backtick handling without hanging (lines 485-488, 1009, 1011)
- Operator token inside parenthesized redirect target (lines 1060-1061)

Remaining uncovered lines (895, 897, 1019, 1060-1061) are defensive/unreachable code
where callers pre-check conditions before calling these code paths.
@kenryu42
Copy link
Owner Author

@greptileai review the PR

Comment on lines +481 to +501

if (!inSingle && char === '`') {
const endIndex = _findBacktickEnd(command, i + 1);
if (endIndex === -1) {
result += char;
atTokenBoundary = false;
i++;
continue;
}

result += `$(${command.slice(i + 1, endIndex)})`;
if (
atTokenBoundary &&
command[endIndex + 1] &&
_isPathLikeBacktickSuffix(command[endIndex + 1] as string)
) {
result += BACKTICK_ATTACHED_SUFFIX_SENTINEL;
}
atTokenBoundary = false;
i = endIndex + 1;
continue;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Backtick inside double-quoted string converts to $(...) before shell-quote sees it

_stripAttachedIoNumbers runs before shell-quote and converts every unquoted backtick to $(...) form. The guard is only !inSingle, so backticks inside double-quoted strings are also transformed.

In bash, "foo`cmd`bar" is valid command-substitution-inside-a-double-quoted-string, so the transformation is semantically correct. However, after the transformation the result is "foo$(cmd)bar", still inside the double-quoted span. When shell-quote later tokenises this, the whole "foo$(cmd)bar" token ends up as a single string token (shell-quote doesn't split command substitutions inside double-quoted tokens). That string token is later passed to extractInlineCommandSubstitutions, which does handle $(...) inside strings, so the nested command is analysed correctly.

The subtle issue is the BACKTICK_ATTACHED_SUFFIX_SENTINEL injection that happens at line 497: the sentinel is appended when atTokenBoundary is true and the character after the closing backtick is / or .. At this point we are inside inDouble = true, so atTokenBoundary was set to false when the " was processed (line 477). That means the sentinel is never appended for backtick-in-double-quotes, so this edge case does not actually fire and the sentinel logic is harmless here.

The real concern is a different one: if inDouble is true when the backtick is encountered, the backtick-end search at line 483 (_findBacktickEnd) does not account for quote context — it simply looks for the next unescaped backtick regardless of whether that backtick is inside a nested single-quoted span. A fragment like "'a`b`c'" (backtick inside single-quotes inside double-quotes) would incorrectly be transformed into "'a$(bc'"(splitting at the first backtick and leaving a stray backtick). Consider adding a!inDoubleguard, or making_findBacktickEnd` quote-aware.

Comment on lines +379 to +384
if (typeof token === 'string') {
_pushInlineSubstitutionSegments(innerSegments, token);
expression += token;
i++;
continue;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _extractArithmeticSubstitution: expression retained in output after inline-sub extraction

When a plain string token is encountered inside an arithmetic expression (e.g., the foo in $((foo+$(cmd)))), both things happen:

  1. _pushInlineSubstitutionSegments(innerSegments, token) — scans the token for any embedded $(...) and appends any found inner-command segments to innerSegments.
  2. expression += token — the raw token text (which still contains any $(...) syntax) is appended to the running expression string.

If that token contains an embedded substitution such as foo$(bar)baz (which shell-quote would deliver as the single string foo$(bar)baz), step 1 correctly extracts ['bar'] into innerSegments, but step 2 leaves foo$(bar)baz in expression. When the expression is eventually flushed as [expression] it will contain the raw $(bar) syntax. This segment is then seen by the rule engine as an expression string — not a danger in practice today because no rule would match on $(bar), but it makes the expression segment misleading and inconsistent with how the same situation is handled in extractCommandSubstitution (which clears currentSegment before recursing into the inner substitution).

Consider stripping the already-extracted inline substitution spans from the expression string before appending, or skipping the expression += step for any portion that was captured by _pushInlineSubstitutionSegments.

@kenryu42
Copy link
Owner Author

@greptileai review the PR

Comment on lines +201 to +203
if (tokens[startIndex] === ARITHMETIC_SENTINEL) {
return _extractArithmeticSubstitution(tokens, startIndex);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Sentinel collision causes false negative for crafted commands

extractCommandSubstitution checks tokens[startIndex] === ARITHMETIC_SENTINEL before anything else. If an analyzed command contains the literal sentinel string in a $(...) substitution — e.g. $(__CC_SAFETY_NET_ARITH_SENTINEL__ rm -rf /) — then _extractArithmeticSubstitution is invoked instead of the normal command-parsing path.

Inside _extractArithmeticSubstitution, plain string tokens (like rm, -rf, /) are appended to an expression buffer and flushed as a single joined segment ['rm-rf/']. Because those tokens are concatenated without separators, the resulting segment rm-rf/ does not match the rm rule and the dangerous command is never detected.

Nested $(...) inside the sentinel-triggered path are still analyzed correctly (the _isCommandSubstitutionStart guard at line 321 handles them), so the false negative only applies to direct argument tokens at the top level of the fake arithmetic expression.

While the sentinel is sufficiently opaque to avoid accidental collisions in real shell scripts, the fix is inexpensive: export the sentinel as an unexported module-level symbol and make extractCommandSubstitution verify that the sentinel was actually injected by _stripAttachedIoNumbers (e.g. by checking the preceding token was $ and the one before that was the arithmetic $(( position) rather than trusting that any matching string is legitimate. Alternatively, keep the sentinel approach but additionally verify that tokens[startIndex - 1] is a ( operator (i.e. the sentinel was placed there by the pre-processor).

Comment on lines +248 to +255
if (depth === 1 && _isRedirectOp(token)) {
const { redirectTarget, advance } = _getRedirectTargetInfo(tokens, i);
if (redirectTarget !== null) {
_pushInlineSubstitutionSegments(innerSegments, redirectTarget);
}
i += advance;
continue;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Redirect targets inside grouped subshells (depth > 1) are added as command arguments

The _isRedirectOp guard at line 248 is wrapped in depth === 1, so it only fires for redirects at the outermost level of a $(...) substitution. For commands like echo $( (rm -rf /tmp/foo 2>/dev/null) ), the > operator at depth 2 reaches _getCommandTokenText({op: '>'}), which returns null (no pattern property) and is silently skipped with i++. The redirect target token /dev/null that immediately follows it IS a plain string, so it gets pushed onto currentSegment, producing the segment ['rm', '-rf', '/tmp/foo', '/dev/null'].

In this specific case the false positive is harmless (the rm-rule flags /dev/null as an unrecognised target and blocks conservatively), but the same mechanism means:

  • echo $( (cmd 2>/dev/null; rm -rf /) ) at depth 2 produces ['cmd', '/dev/null', 'rm', '-rf', '/'] — all concatenated into one segment, so the rule engine sees command cmd with /dev/null rm -rf / as arguments and may not fire the rm rule.
  • Any "safe" command inside a grouped subshell with a redirect is flagged because its redirect target appears to be an extra argument.

The fix is to move the _isRedirectOp check (and the accompanying advance) outside the depth === 1 guard, or, simpler, to apply it for depth >= 1:

- if (depth === 1 && _isRedirectOp(token)) {
+ if (_isRedirectOp(token)) {

The existing depth-tracking already handles the paren balance correctly, so no other changes are needed.

@kenryu42
Copy link
Owner Author

@greptileai review the PR

@kenryu42 kenryu42 merged commit 46a35a7 into main Mar 19, 2026
9 checks passed
@kenryu42 kenryu42 deleted the fix/positional-args-parsing branch March 19, 2026 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rm -rf false positive: 2>/dev/null redirect parsed as positional arguments

1 participant