diff --git a/.github/workflows/e2e-tests.yml b/.github/workflows/e2e-tests.yml index 0e6b864..7ad68f6 100644 --- a/.github/workflows/e2e-tests.yml +++ b/.github/workflows/e2e-tests.yml @@ -107,11 +107,11 @@ jobs: # with -u tests/minimal_init.lua sidesteps both, and it sets the exit # code (0 = pass, non-zero = fail/error) via :cq, so no output parsing. # - # bash_detect specs are Unix-path-only today (issue #46 handoff item 3): - # looks_like_path rejects backslashes and resolve() only treats - # /-prefixed strings as absolute. Excluded until ported to Windows paths. - $specs = Get-ChildItem tests/plugin -Filter *_spec.lua | - Where-Object { $_.Name -ne 'pre_tool_bash_detect_spec.lua' } + # The shell detector is now Windows-aware (issue #46 follow-up): + # pre_tool_shell_detect_spec.lua (renamed from *_bash_detect_spec) runs + # its Windows-path + PowerShell rows here and marks the POSIX rows + # pending, so the whole tests/plugin directory runs on Windows. + $specs = Get-ChildItem tests/plugin -Filter *_spec.lua $failed = @() foreach ($spec in $specs) { Write-Host "-- $($spec.Name) --" diff --git a/docs/adr/0009-shell-detect-split-axes.md b/docs/adr/0009-shell-detect-split-axes.md new file mode 100644 index 0000000..6c45dba --- /dev/null +++ b/docs/adr/0009-shell-detect-split-axes.md @@ -0,0 +1,26 @@ +# Shell write/delete detection splits path-convention from command-grammar + +Status: accepted + +Windows support (issue #46) needs the Tier-1 shell-write/delete detector (`pre_tool/shell_detect.lua`, renamed from `bash_detect.lua`; it feeds the [change](../../CONTEXT.md#change) [status](../../CONTEXT.md#status) values `deleted` / `bash_modified` / `bash_created`) to work on Windows. The obvious framings — "make it per-OS" or "add a Windows branch" — are both wrong, because the detector entangles **two independent axes that do not align onto OS**: *path conventions* (`/`-absolute, `~/`, `/dev/`) and *command grammar* (which verbs/operators write or delete). A git-bash shell on Windows has POSIX grammar with Windows-shaped paths; a PowerShell shell has PowerShell grammar with Windows paths — so OS is not the seam. + +We therefore split the two axes: rename `bash_detect` → `shell_detect` behind the unchanged `M.detect(cmd, cwd)` interface; extract an OS-selected **path-convention adapter** (Unix, byte-identical to the historical behaviour, plus Windows: `C:\`/`C:/`, UNC `\\…`, backslash separators, relative-against-a-backslash-cwd); and add a **PowerShell command grammar** alongside the POSIX one. Both grammars run on every command — they share the redirect operator and differ only in verbs, and the PowerShell tables deliberately exclude the bash aliases (`rm`/`cp`/`mv`/`tee`), so adding PowerShell provably cannot change a POSIX result. + +The PowerShell grammar was **informed by an empirical finding, not guessed** — the same discipline [ADR-0007](0007-windows-shim-via-shared-powershell-discovery.md) forced for the RPC layer. Ground truth (raw hook stdin teed on a Windows box, cross-checked against Claude Code session transcripts) corrected two premises we held going in: + +- **It is a separate tool, not the Bash tool emitting PowerShell.** Claude Code on Windows exposes a distinct **`PowerShell`** tool alongside `Bash` and routes shell file ops through it (the Haiku model deletes via `Remove-Item`, moves via `Move-Item`, writes via `Set-Content`/`Out-File`/`Add-Content`); `tool_name` is `"PowerShell"` with a Bash-shaped `{command}` payload. (Opus, by contrast, drives git-bash and emits POSIX `rm` with Windows-shaped paths — exactly the cross-cutting case the axis split predicted.) +- **Routing was therefore not "fine"** — the detector was not the only gap. The PreToolUse matcher was `Edit|Write|MultiEdit|Bash`, so the hook never fired for the `PowerShell` tool. The full fix is three layers: **matcher** (add `PowerShell` to the claudecode Pre/PostToolUse matchers so the hook fires), **normaliser** (fold `tool_name="PowerShell"` onto canonical `Bash`, so the dispatcher/emitters and `shell_detect` need no awareness of a separate tool), and the **detection split** itself. + +## Considered Options + +- **Per-OS `shell_detect` (grammar tables keyed by OS)** — rejected: cuts along the wrong axis. OS conflates path conventions and command grammar, and the git-bash-on-Windows case (POSIX grammar, Windows paths) breaks the mapping outright. +- **"Just a Windows branch"** — rejected: scatters OS conditionals through a delicate, edge-case-heavy module (its cases come from real historical bugs) along that same wrong axis. +- **Split the two axes** *(chosen)* — path conventions become an OS-selected adapter; command grammars are added by vocabulary, independent of OS. + +## Consequences + +- The path-convention adapter is the highest-value extraction and is needed whatever shell the agent uses (Windows paths show up even under git-bash). The Windows adapter emits canonical backslash paths, because `fnamemodify(":p")` is a no-op on an already-absolute Windows path, so the detector's output must already match the form Edit/Write key the changes registry with — otherwise the neo-tree indicator keys a different entry and misses. +- POSIX behaviour stays byte-identical through the restructure; `pre_tool_shell_detect_spec.lua` (renamed alongside the module) is the safety net, with POSIX rows and Windows rows each marked `pending` on the other OS. +- `M.detect(cmd, cwd)` stays stable, so callers (`pre_tool.handle`) are untouched. +- **Scope honesty:** the PowerShell vocabulary is table-shaped (verb→category tables), but the POSIX matchers remain the original per-verb functions rather than one unified grammar table. "A second grammar slots in as data" is met functionally — PowerShell was added without disturbing POSIX — not as a single shared table; a full POSIX grammar-table refactor was not needed to land Windows and stays out of scope. The broader integration-registry / install-engine consolidation from the same architecture review is likewise deferred. +- This refines [ADR-0007](0007-windows-shim-via-shared-powershell-discovery.md)'s "one implementation per OS" principle for the *detection* layer: detection is split by *capability axis*, not by OS. diff --git a/lua/code-preview/backends/claudecode.lua b/lua/code-preview/backends/claudecode.lua index e5192b5..5f3756f 100644 --- a/lua/code-preview/backends/claudecode.lua +++ b/lua/code-preview/backends/claudecode.lua @@ -25,6 +25,16 @@ end local HOOK_MARKER = "code-preview" local LEGACY_HOOK_MARKER = "claude-preview" -- match old entries during transition +-- Tools whose proposals we intercept. On Windows, Claude Code exposes a +-- distinct `PowerShell` tool alongside `Bash` and routes shell file ops +-- (Remove-Item / Move-Item / Set-Content …) through it — observed with the +-- Haiku model, which deletes via `Remove-Item`. Without `PowerShell` in the +-- matcher the PreToolUse hook never fires for those proposals, so a shell +-- delete/write never marks neo-tree (issue #46 follow-up). The normaliser +-- folds `PowerShell` into the canonical `Bash` path; here we just make sure +-- the hook fires. Harmless on Unix (no such tool is ever emitted there). +local TOOL_MATCHER = "Edit|Write|MultiEdit|Bash|PowerShell" + -- The hook entry is per-OS (issue #46 / ADR-0007): a .sh shim on Unix, a .ps1 -- shim on Windows invoked through PowerShell. The installer writes the -- interpreter explicitly into Claude Code's `command` field, since the file is @@ -104,11 +114,11 @@ function M.install() -- Add our entries. On Windows the command invokes PowerShell explicitly -- against the .ps1 shim; on Unix it's the bare .sh path. See ADR-0007. table.insert(data.hooks.PreToolUse, { - matcher = "Edit|Write|MultiEdit|Bash", + matcher = TOOL_MATCHER, hooks = { { type = "command", command = hook_command(preview) } }, }) table.insert(data.hooks.PostToolUse, { - matcher = "Edit|Write|MultiEdit|Bash", + matcher = TOOL_MATCHER, hooks = { { type = "command", command = hook_command(close) } }, }) diff --git a/lua/code-preview/diff.lua b/lua/code-preview/diff.lua index d12043d..79a6ccb 100644 --- a/lua/code-preview/diff.lua +++ b/lua/code-preview/diff.lua @@ -57,9 +57,10 @@ local function mark_change_and_reveal(abs_file_path, action) local reveal_dir = nil if cfg.neo_tree.reveal_root == "git" then local parent = vim.fn.fnamemodify(abs_file_path, ":h") - local git_out = vim.fn.systemlist( - "git -C " .. vim.fn.shellescape(parent) .. " rev-parse --show-toplevel 2>/dev/null" - ) + -- List-form (no shell): avoids the POSIX-only `2>/dev/null` redirect, which + -- misbehaves under Windows cmd. `shell_error` still gates the result, so a + -- non-repo parent (git's stderr + non-zero exit) simply leaves reveal_dir nil. + local git_out = vim.fn.systemlist({ "git", "-C", parent, "rev-parse", "--show-toplevel" }) if vim.v.shell_error == 0 and git_out[1] and git_out[1] ~= "" then reveal_dir = git_out[1] end diff --git a/lua/code-preview/pre_tool/bash_detect.lua b/lua/code-preview/pre_tool/bash_detect.lua deleted file mode 100644 index 3eafcd2..0000000 --- a/lua/code-preview/pre_tool/bash_detect.lua +++ /dev/null @@ -1,254 +0,0 @@ --- pre_tool/bash_detect.lua — Tier 1 shell-write + rm detection. --- --- Inputs: a Bash command string and the project cwd. --- Output: a structured table { rm_paths = {...}, write_paths = {...} }. --- --- The edge cases here all come from real bugs in the historical bash --- pre-tool detection logic; resist "obvious simplifications" without first --- reading bash_detect_spec.lua. - -local M = {} - --- Resolve a raw token (possibly relative, possibly ~-prefixed) to an absolute --- path. Caller is responsible for unquoting first. -local function resolve(p, cwd) - if p == "" then return nil end - if p:sub(1, 1) == "/" then return p end - if p == "~" then return os.getenv("HOME") or "~" end - if p:sub(1, 2) == "~/" then - return (os.getenv("HOME") or "~") .. "/" .. p:sub(3) - end - return cwd .. "/" .. p -end - --- Strip a single matching pair of surrounding quotes. -local function unquote(s) - if #s >= 2 then - local first, last = s:sub(1, 1), s:sub(-1) - if (first == '"' and last == '"') or (first == "'" and last == "'") then - return s:sub(2, -2) - end - end - return s -end - --- Undo shell backslash-escapes that bash actually requires for filenames: --- \', \", \\, \ , \$, \&. Pseudo-escapes like \n / \t are NOT touched — --- those typically appear inside quoted strings (e.g. `printf '\n\n'`) --- and undoing them here would turn the leak into a fake-looking path. -local SHELL_ESCAPED = { ["'"]=true, ['"']=true, ["\\"]=true, [" "]=true, ["$"]=true, ["&"]=true } -local function shell_unescape(s) - return (s:gsub("\\(.)", function(c) return SHELL_ESCAPED[c] and c or ("\\" .. c) end)) -end - --- True if the string looks like a transient file we should skip. -local function is_transient(p) - if p:match("%.tmp$") or p:match("%.bak$") or p:match("%.swp$") then return true end - if p:sub(-1) == "~" then return true end - if p:sub(1, 5) == "/dev/" then return true end - return false -end - --- Reject obvious non-paths (leaked escape sequences, stray quotes, etc.). --- This catches cases like `printf '\n\n'` where the regex picks --- up `\n\n'` from inside a quoted string. -local function looks_like_path(p) - if p == "" then return false end - -- Reject backslash and double-quote: those typically leak from inside - -- quoted strings (e.g. `printf '\n\n'`). Apostrophe is allowed - -- because shell_unescape produces it legitimately from `\'`. - if p:find('[\\"]') then return false end - local first = p:sub(1, 1) - if first == "/" then return true end - if p:sub(1, 2) == "./" or p:sub(1, 3) == "../" then return true end - if p:sub(1, 2) == "~/" then return true end - if first:match("[%w_]") then return true end - return false -end - --- ── rm detection ───────────────────────────────────────────────── - --- Iterate sub-commands separated by ; && || (single or double). -local function each_subcommand(cmd) - local parts = {} - local buf = {} - local i = 1 - while i <= #cmd do - local c = cmd:sub(i, i) - local n = cmd:sub(i + 1, i + 1) - if c == ";" then - table.insert(parts, table.concat(buf)); buf = {}; i = i + 1 - elseif (c == "&" and n == "&") or (c == "|" and n == "|") then - table.insert(parts, table.concat(buf)); buf = {}; i = i + 2 - elseif c == "&" or c == "|" then - table.insert(parts, table.concat(buf)); buf = {}; i = i + 1 - else - table.insert(buf, c); i = i + 1 - end - end - if #buf > 0 then table.insert(parts, table.concat(buf)) end - return parts -end - -local function detect_rm_in(subcmd, cwd) - local cmd = subcmd:gsub("^%s+", "") - -- Optional `sudo `, then `rm` as standalone command. - local body = cmd:match("^sudo%s+rm%s+(.*)$") or cmd:match("^rm%s+(.*)$") - if not body then return {} end - - local paths = {} - for token in body:gmatch("%S+") do - -- Skip flags (-rf, --force, etc.). Single-dash followed by alpha; or `--`. - if not token:match("^%-") then - local p = shell_unescape(unquote(token)) - -- Strip trailing CR (Windows-style payloads). - p = p:gsub("\r$", "") - if p ~= "" then - local abs = resolve(p, cwd) - if abs then table.insert(paths, abs) end - end - end - end - return paths -end - -function M.detect_rm_paths(cmd, cwd) - local out = {} - local seen = {} - for _, sub in ipairs(each_subcommand(cmd)) do - for _, p in ipairs(detect_rm_in(sub, cwd)) do - if not seen[p] then - seen[p] = true - table.insert(out, p) - end - end - end - return out -end - --- ── write-path detection ──────────────────────────────────────── - --- Match output redirections (`>`, `>>`, `&>`, `&>>`) and capture the filename. --- Excludes FD redirections (`2>&1`) by requiring no digit prefix on bare `>`. -local function detect_redirects(cmd) - local out = {} - -- Walk character by character so we can disambiguate `2>&1` (skip) from - -- `> file` (capture). Cheap and explicit beats a clever regex here. - local i = 1 - while i <= #cmd do - local c = cmd:sub(i, i) - local prev = i > 1 and cmd:sub(i - 1, i - 1) or "" - if c == ">" then - -- Skip if prev is digit (FD redirection like 2>) — unless prev is &. - if prev:match("%d") then - i = i + 1 - else - -- Skip ">>" doubling - if cmd:sub(i + 1, i + 1) == ">" then i = i + 1 end - i = i + 1 - -- Skip whitespace - while i <= #cmd and cmd:sub(i, i):match("%s") do i = i + 1 end - -- Collect token until terminator. Special: `>&1` etc. emit empty. - if cmd:sub(i, i) == "&" then - -- It's an FD dup like `>&1`; skip the dup target. - i = i + 1 - while i <= #cmd and cmd:sub(i, i):match("[%w_]") do i = i + 1 end - else - local start = i - while i <= #cmd do - local ch = cmd:sub(i, i) - if ch:match("[%s&;|<>(){}`]") then break end - i = i + 1 - end - if i > start then - local tok = cmd:sub(start, i - 1) - if tok ~= "" and not tok:match("^/dev/(null|stdout|stderr|tty)$") then - table.insert(out, tok) - end - end - end - end - else - i = i + 1 - end - end - return out -end - --- `mv SRC DST` / `cp SRC DST` — emit DST. Greedy last token; misses quoted --- paths with spaces and the GNU `-t DST SRC...` flag (acceptable for Tier 1). -local function detect_mv_cp(cmd) - local out = {} - for _, sub in ipairs(each_subcommand(cmd)) do - local body = sub:match("^%s*mv%s+(.+)$") or sub:match("^%s*cp%s+(.+)$") - if body then - local tokens = {} - for tok in body:gmatch("%S+") do table.insert(tokens, tok) end - if #tokens > 0 then table.insert(out, tokens[#tokens]) end - end - end - return out -end - --- `tee [-a] FILE` — captures only the first target. -local function detect_tee(cmd) - local out = {} - -- Match "tee" optionally followed by "-a", then the next token. - local body = cmd:match("tee%s+%-a%s+([^%s&;|<>(){}`]+)") - if body then - table.insert(out, body) - else - body = cmd:match("tee%s+([^%s&;|<>(){}`]+)") - if body and body ~= "-a" then table.insert(out, body) end - end - return out -end - --- `sed -i ... FILE` — last token wins; BSD's backup-suffix would be flagged --- too (acceptable for Tier 1). -local function detect_sed_i(cmd) - local out = {} - -- Match `sed` then a flag-token containing `i`, then capture trailing args - -- up to the next pipe/semicolon. - for body in cmd:gmatch("sed%s+%-[%a]*i[%a]*%s+([^|&;]+)") do - local tokens = {} - for tok in body:gmatch("%S+") do table.insert(tokens, tok) end - if #tokens > 0 then table.insert(out, tokens[#tokens]) end - end - return out -end - -function M.detect_write_paths(cmd, cwd) - local raw = {} - for _, p in ipairs(detect_redirects(cmd)) do table.insert(raw, p) end - for _, p in ipairs(detect_mv_cp(cmd)) do table.insert(raw, p) end - for _, p in ipairs(detect_tee(cmd)) do table.insert(raw, p) end - for _, p in ipairs(detect_sed_i(cmd)) do table.insert(raw, p) end - - local out = {} - local seen = {} - for _, r in ipairs(raw) do - local p = shell_unescape(unquote(r)) - if looks_like_path(p) then - local abs = resolve(p, cwd) - if abs and not is_transient(abs) and not seen[abs] then - seen[abs] = true - table.insert(out, abs) - end - end - end - return out -end - ---- Combined entry point used by pre_tool.handle for Bash proposals. ---- @param cmd string the raw Bash command ---- @param cwd string project cwd for relative-path resolution ---- @return table { rm_paths = {...absolute...}, write_paths = {...absolute...} } -function M.detect(cmd, cwd) - return { - rm_paths = M.detect_rm_paths(cmd, cwd), - write_paths = M.detect_write_paths(cmd, cwd), - } -end - -return M diff --git a/lua/code-preview/pre_tool/init.lua b/lua/code-preview/pre_tool/init.lua index 3bdf9ce..6889d6d 100644 --- a/lua/code-preview/pre_tool/init.lua +++ b/lua/code-preview/pre_tool/init.lua @@ -11,7 +11,7 @@ local M = {} local normalisers = require("code-preview.pre_tool.normalisers") local emitters = require("code-preview.pre_tool.emitters") -local bash_detect = require("code-preview.pre_tool.bash_detect") +local shell_detect = require("code-preview.pre_tool.shell_detect") local apply_edit = require("code-preview.apply.edit") local apply_multi_edit = require("code-preview.apply.multi_edit") @@ -167,7 +167,7 @@ end local function handle_bash(input) local cmd = input.tool_input.command or "" - local detected = bash_detect.detect(cmd, input.cwd or "") + local detected = shell_detect.detect(cmd, input.cwd or "") local touched = false diff --git a/lua/code-preview/pre_tool/normalisers.lua b/lua/code-preview/pre_tool/normalisers.lua index 78d961b..26ce41c 100644 --- a/lua/code-preview/pre_tool/normalisers.lua +++ b/lua/code-preview/pre_tool/normalisers.lua @@ -19,6 +19,27 @@ local function identity(raw) return raw end +-- Claude Code's hook payload is already canonical ({tool_name, cwd, tool_input}), +-- so its normaliser is almost identity. The one translation is Windows-only: in +-- addition to `Bash`, Claude Code exposes a distinct `PowerShell` tool and routes +-- shell file ops through it (e.g. the Haiku model deletes via `Remove-Item …`, +-- moves via `Move-Item`, writes via `Set-Content`/`Out-File`). Semantically those +-- are shell proposals — Tier-1 indicator-only, `bash_*` origin prefix — identical +-- in handling to `Bash`. Folding `PowerShell` onto the canonical `Bash` tool here +-- means the dispatcher, emitters, and shell_detect need no awareness of a separate +-- tool: shell_detect's grammar is what tells PowerShell and POSIX commands apart. +-- (STEP 0 of the shell-detect work confirmed tool_name="PowerShell"; the hook +-- matcher gained `PowerShell` so the proposal reaches us at all.) +local function claudecode(raw) + if raw and raw.tool_name == "PowerShell" then + local out = {} + for k, v in pairs(raw) do out[k] = v end + out.tool_name = "Bash" + return out + end + return raw +end + -- OpenCode tools as of 2026-05-19: edit, write, multiedit, bash, apply_patch -- (plus read, glob, grep, which the TS-side allowlist filters out before they -- ever reach this normaliser). Update this map when OpenCode adds a tool the @@ -234,7 +255,7 @@ local function codex(raw) end M.normalisers = { - claudecode = identity, + claudecode = claudecode, opencode = opencode, copilot = copilot, codex = codex, diff --git a/lua/code-preview/pre_tool/shell_detect.lua b/lua/code-preview/pre_tool/shell_detect.lua new file mode 100644 index 0000000..838f227 --- /dev/null +++ b/lua/code-preview/pre_tool/shell_detect.lua @@ -0,0 +1,602 @@ +-- pre_tool/shell_detect.lua — Tier 1 shell-write + delete detection. +-- +-- (Renamed from bash_detect.lua: the detector handles more than bash now.) +-- +-- Inputs: a shell command string and the project cwd. +-- Output: a structured table { rm_paths = {...}, write_paths = {...} }. +-- +-- The detector entangles two INDEPENDENT axes, and the architecture review +-- (issue #46 follow-up) split them on purpose — they are NOT the same as "OS": +-- +-- 1. Path conventions — how a raw token resolves to an absolute path, and +-- what counts as a path at all. POSIX (`/`-absolute, `~/`, `/dev/`) vs +-- Windows (`C:\`/`C:/`, UNC `\\…`, backslash separators, relative against +-- a backslash cwd). Selected by OS via `path_adapter()`. +-- 2. Command grammar — which verbs/operators write or delete. POSIX +-- (`rm`, `>`/`>>`, `mv`, `cp`, `tee`, `sed -i`) and PowerShell +-- (`Remove-Item`, `Set-Content`, `Out-File`, `Move-Item`, …). Both +-- grammars run on every command regardless of OS; they share the redirect +-- operator and differ only in verbs, so adding PowerShell cannot change a +-- POSIX result (verified by the POSIX rows in shell_detect_spec.lua). +-- +-- Why PowerShell at all: on Windows, Claude Code routes shell file ops through +-- a distinct `PowerShell` tool (the Haiku model deletes via `Remove-Item …`). +-- The claudecode normaliser folds that onto the canonical Bash path, so those +-- commands arrive here as PowerShell syntax. See STEP 0 of the handoff. +-- +-- The POSIX edge cases here all come from real bugs in the historical bash +-- pre-tool detection logic; resist "obvious simplifications" without first +-- reading shell_detect_spec.lua. + +local M = {} + +local function is_windows() + return vim.fn.has("win32") == 1 +end + +-- ── Quoting / escaping primitives ──────────────────────────────── + +-- Strip a single matching pair of surrounding quotes. +local function unquote(s) + if #s >= 2 then + local first, last = s:sub(1, 1), s:sub(-1) + if (first == '"' and last == '"') or (first == "'" and last == "'") then + return s:sub(2, -2) + end + end + return s +end + +-- Undo shell backslash-escapes that bash actually requires for filenames: +-- \', \", \\, \ , \$, \&. Pseudo-escapes like \n / \t are NOT touched — +-- those typically appear inside quoted strings (e.g. `printf '\n\n'`) +-- and undoing them here would turn the leak into a fake-looking path. +-- +-- POSIX ONLY. On Windows a backslash is a path separator, not an escape, so the +-- Windows adapter must not run this (it would eat `C:\new`'s separator or fold +-- a UNC `\\` prefix). That difference is exactly why cleaning lives in the path +-- adapter rather than being shared. +local SHELL_ESCAPED = { ["'"]=true, ['"']=true, ["\\"]=true, [" "]=true, ["$"]=true, ["&"]=true } +local function shell_unescape(s) + return (s:gsub("\\(.)", function(c) return SHELL_ESCAPED[c] and c or ("\\" .. c) end)) +end + +-- ── POSIX path adapter (behaviour byte-identical to the historical detector) ── + +local unix_paths = {} + +-- Resolve a raw token (possibly relative, possibly ~-prefixed) to an absolute +-- path. Caller is responsible for cleaning (unquote/unescape) first. +function unix_paths.resolve(p, cwd) + if p == "" then return nil end + if p:sub(1, 1) == "/" then return p end + if p == "~" then return os.getenv("HOME") or "~" end + if p:sub(1, 2) == "~/" then + return (os.getenv("HOME") or "~") .. "/" .. p:sub(3) + end + return cwd .. "/" .. p +end + +-- Unquote, undo bash escapes, drop a trailing CR (Windows-style payloads). +function unix_paths.clean(s) + local p = shell_unescape(unquote(s)) + return (p:gsub("\r$", "")) +end + +-- Reject obvious non-paths (leaked escape sequences, stray quotes, etc.). +-- This catches cases like `printf '\n\n'` where the redirect +-- regex picks up `\n\n'` from inside a quoted string. +function unix_paths.looks_like_path(p) + if p == "" then return false end + -- Reject backslash and double-quote: those typically leak from inside + -- quoted strings (e.g. `printf '\n\n'`). Apostrophe is allowed + -- because shell_unescape produces it legitimately from `\'`. + if p:find('[\\"]') then return false end + local first = p:sub(1, 1) + if first == "/" then return true end + if p:sub(1, 2) == "./" or p:sub(1, 3) == "../" then return true end + if p:sub(1, 2) == "~/" then return true end + if first:match("[%w_]") then return true end + return false +end + +-- True if the string looks like a transient file we should skip. +function unix_paths.is_transient(p) + if p:match("%.tmp$") or p:match("%.bak$") or p:match("%.swp$") then return true end + if p:sub(-1) == "~" then return true end + if p:sub(1, 5) == "/dev/" then return true end + return false +end + +-- ── Windows path adapter ───────────────────────────────────────── +-- +-- A Windows box runs BOTH git-bash (POSIX grammar, `/c/…` or relative paths) +-- and PowerShell (Windows paths), so this adapter is a superset: it accepts the +-- Unix forms too. Output is canonicalised to BACKSLASH separators with `..` +-- collapsed, because that is the exact form Claude Code's Edit/Write proposals +-- key the changes registry with (e.g. `D:\proj\file.lua`), and `fnamemodify` +-- `:p` is a no-op on an already-absolute path — it neither flips separators nor +-- collapses `..`. Emitting any other form would key a different registry entry +-- than the editor path for the same file, and the neo-tree indicator would miss. + +local win_paths = {} + +local function win_is_absolute(p) + return p:match("^%a:[\\/]") ~= nil -- drive-letter: C:\ or C:/ + or p:match("^\\\\") ~= nil -- UNC: \\server\share + or p:sub(1, 1) == "/" -- unix-rooted (git-bash) +end + +-- Canonicalise to backslash separators with ./.. collapsed. +local function win_canon(p) + return (vim.fs.normalize(p):gsub("/", "\\")) +end + +function win_paths.resolve(p, cwd) + if p == "" then return nil end + if p == "~" then return os.getenv("USERPROFILE") or os.getenv("HOME") or "~" end + if p:match("^~[\\/]") then + local home = os.getenv("USERPROFILE") or os.getenv("HOME") or "~" + return win_canon(home .. "/" .. p:sub(3)) + end + if win_is_absolute(p) then + -- UNC must keep its leading "\\"; vim.fs.normalize preserves it. + return win_canon(p) + end + return win_canon((cwd or "") .. "/" .. p) +end + +-- Unquote and drop a trailing CR. No bash unescaping: backslash is a separator. +function win_paths.clean(s) + local p = unquote(s) + return (p:gsub("\r$", "")) +end + +-- Positive-shape match: accept things that look like a real Windows/Unix path, +-- reject leaked escape sequences. We can't blanket-reject backslash like the +-- POSIX adapter does (backslash is a legitimate separator here), so instead we +-- only accept recognised path shapes. The html-comment leak `\n\n'` matches +-- none of them (a single leading `\` is neither a drive, a UNC `\\`, nor a +-- relative/absolute prefix), so it is still rejected. +function win_paths.looks_like_path(p) + if p == "" then return false end + if p:find('"') then return false end -- stray quote leak + if p:match("^%a:[\\/]") then return true end -- C:\ or C:/ + if p:match("^\\\\") then return true end -- UNC \\server + if p:sub(1, 1) == "/" then return true end -- /unix/abs (git-bash) + if p:match("^~[\\/]?") then return true end -- ~ or ~/ or ~\ + if p:match("^%.%.?[\\/]") then return true end -- ./ ../ .\ ..\ + if p:match("^%.[%w_]") then return true end -- .step0 .config dotdirs + if p:sub(1, 1):match("[%w_]") then return true end -- bare / relative name + return false +end + +function win_paths.is_transient(p) + if p:match("%.[Tt][Mm][Pp]$") or p:match("%.[Bb][Aa][Kk]$") or p:match("%.[Ss][Ww][Pp]$") then + return true + end + if p:sub(-1) == "~" then return true end + return false +end + +local function path_adapter() + return is_windows() and win_paths or unix_paths +end + +-- ── Sub-command splitting (shared) ─────────────────────────────── + +-- Iterate sub-commands separated by ; && || (single or double) and newlines. +-- PowerShell uses `;`, `|`, and newlines; `&&`/`||` exist in PS7 but not 5.1 +-- (the installed hook runs under 5.1). POSIX uses all of these. Splitting on +-- the union is safe: a separator absent from a given shell simply never occurs. +local function each_subcommand(cmd) + local parts = {} + local buf = {} + local i = 1 + while i <= #cmd do + local c = cmd:sub(i, i) + local n = cmd:sub(i + 1, i + 1) + if c == ";" or c == "\n" or c == "\r" then + table.insert(parts, table.concat(buf)); buf = {}; i = i + 1 + elseif (c == "&" and n == "&") or (c == "|" and n == "|") then + table.insert(parts, table.concat(buf)); buf = {}; i = i + 2 + elseif c == "&" or c == "|" then + table.insert(parts, table.concat(buf)); buf = {}; i = i + 1 + else + table.insert(buf, c); i = i + 1 + end + end + if #buf > 0 then table.insert(parts, table.concat(buf)) end + return parts +end + +-- ── POSIX command grammar ──────────────────────────────────────── + +local function detect_rm_in(subcmd, cwd, adapter) + local cmd = subcmd:gsub("^%s+", "") + -- Optional `sudo `, then `rm` as standalone command. + local body = cmd:match("^sudo%s+rm%s+(.*)$") or cmd:match("^rm%s+(.*)$") + if not body then return {} end + + local paths = {} + for token in body:gmatch("%S+") do + -- Skip flags (-rf, --force, etc.). Single-dash followed by alpha; or `--`. + if not token:match("^%-") then + local p = adapter.clean(token) + if p ~= "" then + local abs = adapter.resolve(p, cwd) + if abs then table.insert(paths, abs) end + end + end + end + return paths +end + +-- Match output redirections (`>`, `>>`, `&>`, `&>>`) and capture the filename. +-- Excludes FD redirections (`2>&1`) by requiring no digit prefix on bare `>`. +-- Shared by POSIX and PowerShell (both use `>` / `>>`). +local function detect_redirects(cmd) + local out = {} + -- Walk character by character so we can disambiguate `2>&1` (skip) from + -- `> file` (capture). Cheap and explicit beats a clever regex here. + local i = 1 + while i <= #cmd do + local c = cmd:sub(i, i) + local prev = i > 1 and cmd:sub(i - 1, i - 1) or "" + if c == ">" then + -- Skip if prev is digit (FD redirection like 2>) — unless prev is &. + if prev:match("%d") then + i = i + 1 + else + -- Skip ">>" doubling + if cmd:sub(i + 1, i + 1) == ">" then i = i + 1 end + i = i + 1 + -- Skip whitespace + while i <= #cmd and cmd:sub(i, i):match("%s") do i = i + 1 end + -- Collect token until terminator. Special: `>&1` etc. emit empty. + if cmd:sub(i, i) == "&" then + -- It's an FD dup like `>&1`; skip the dup target. + i = i + 1 + while i <= #cmd and cmd:sub(i, i):match("[%w_]") do i = i + 1 end + else + local start = i + while i <= #cmd do + local ch = cmd:sub(i, i) + if ch:match("[%s&;|<>(){}`]") then break end + i = i + 1 + end + if i > start then + local tok = cmd:sub(start, i - 1) + if tok ~= "" and not tok:match("^/dev/(null|stdout|stderr|tty)$") then + table.insert(out, tok) + end + end + end + end + else + i = i + 1 + end + end + return out +end + +-- `mv SRC DST` / `cp SRC DST` — emit DST. Greedy last token; misses quoted +-- paths with spaces and the GNU `-t DST SRC...` flag (acceptable for Tier 1). +local function detect_mv_cp(cmd) + local out = {} + for _, sub in ipairs(each_subcommand(cmd)) do + local body = sub:match("^%s*mv%s+(.+)$") or sub:match("^%s*cp%s+(.+)$") + if body then + local tokens = {} + for tok in body:gmatch("%S+") do table.insert(tokens, tok) end + if #tokens > 0 then table.insert(out, tokens[#tokens]) end + end + end + return out +end + +-- `tee [-a] FILE` — captures only the first target. +local function detect_tee(cmd) + local out = {} + -- Match "tee" optionally followed by "-a", then the next token. + local body = cmd:match("tee%s+%-a%s+([^%s&;|<>(){}`]+)") + if body then + table.insert(out, body) + else + body = cmd:match("tee%s+([^%s&;|<>(){}`]+)") + if body and body ~= "-a" then table.insert(out, body) end + end + return out +end + +-- `sed -i ... FILE` — last token wins; BSD's backup-suffix would be flagged +-- too (acceptable for Tier 1). +local function detect_sed_i(cmd) + local out = {} + -- Match `sed` then a flag-token containing `i`, then capture trailing args + -- up to the next pipe/semicolon. + for body in cmd:gmatch("sed%s+%-[%a]*i[%a]*%s+([^|&;]+)") do + local tokens = {} + for tok in body:gmatch("%S+") do table.insert(tokens, tok) end + if #tokens > 0 then table.insert(out, tokens[#tokens]) end + end + return out +end + +-- ── PowerShell command grammar ─────────────────────────────────── +-- +-- PowerShell cmdlets are PascalCase Verb-Noun (`Remove-Item`) with aliases +-- (`ri`, `del`, `rd`), named parameters (`-Path`, `-Destination`), switches +-- (`-Force`, `-Recurse`), positional args, comma-lists, and here-strings +-- (`@"…"@`) for `-Value`. We deliberately keep the bash aliases (`rm`, `cp`, +-- `mv`, `tee`) OUT of these tables — those are handled by the POSIX grammar +-- above, and excluding them keeps the POSIX behaviour provably unchanged. + +-- Tokenise a PowerShell sub-command. Quoted regions (including here-strings) +-- span whitespace and become a single token with their quotes preserved, so a +-- here-string `-Value @"a\nb"@` never leaks its body words as positional paths. +local function ps_tokenise(s) + local tokens, i, n = {}, 1, #s + while i <= n do + local c = s:sub(i, i) + if c:match("%s") then + i = i + 1 + else + local start = i + while i <= n do + local ch = s:sub(i, i) + if ch == "'" or ch == '"' then + -- Here-string @' / @" (quote preceded by @ then a newline). + local hs_quote = ch + if s:sub(i - 1, i - 1) == "@" then + -- consume to the closing "@ / '@ + i = i + 1 + while i <= n do + if s:sub(i, i) == hs_quote and s:sub(i + 1, i + 1) == "@" then + i = i + 2; break + end + i = i + 1 + end + else + -- ordinary quoted region + i = i + 1 + while i <= n and s:sub(i, i) ~= hs_quote do i = i + 1 end + i = i + 1 -- skip closing quote + end + elseif ch:match("%s") then + break + else + i = i + 1 + end + end + table.insert(tokens, s:sub(start, i - 1)) + end + end + return tokens +end + +-- Split a parameter value on top-level commas (PowerShell `-Path a,b,c`), +-- ignoring commas inside quotes. +local function ps_split_commas(tok) + local parts, buf, i, n = {}, {}, 1, #tok + local q = nil + while i <= n do + local c = tok:sub(i, i) + if q then + buf[#buf + 1] = c + if c == q then q = nil end + elseif c == "'" or c == '"' then + q = c; buf[#buf + 1] = c + elseif c == "," then + parts[#parts + 1] = table.concat(buf); buf = {} + else + buf[#buf + 1] = c + end + i = i + 1 + end + parts[#parts + 1] = table.concat(buf) + return parts +end + +-- Parameters that take a following value (lowercased, leading dash stripped). +local PS_VALUE_PARAMS = { + path = true, literalpath = true, filepath = true, destination = true, + newname = true, target = true, value = true, encoding = true, width = true, +} +-- Switch parameters that take no value. +local PS_SWITCHES = { + force = true, recurse = true, confirm = true, whatif = true, verbose = true, + nonewline = true, noclobber = true, passthru = true, append = true, +} + +-- Collect a parameter value starting at token `i`, joining comma-connected +-- tokens so an array literal written with spaces (`-Path "a", "b"`) is captured +-- as a single value `"a","b"` instead of leaking "b" as a stray positional. +-- Stops at a flag (`-Force`) even mid-comma, since that is malformed anyway. +local function ps_gather_value(tokens, i) + if i > #tokens then return nil, i end + local parts = { tokens[i] } + i = i + 1 + while i <= #tokens do + local prev, cur = parts[#parts], tokens[i] + if (prev:match(",%s*$") or cur:match("^,")) and not cur:match("^%-[%a]") then + parts[#parts + 1] = cur + i = i + 1 + else + break + end + end + return table.concat(parts), i +end + +-- Parse `Verb-Noun` style args into { named = {param=value}, positional = {...} }. +local function ps_parse_args(tokens) + local named, positional = {}, {} + local i = 1 + while i <= #tokens do + local t = tokens[i] + local pname = t:match("^%-([%a][%w]*)$") + if pname then + local key = pname:lower() + if PS_SWITCHES[key] then + i = i + 1 + elseif PS_VALUE_PARAMS[key] then + -- Known value parameter: consume its (possibly comma-list) value. Also + -- swallows `-Value "x"` so the content never lands in `positional`. + named[key], i = ps_gather_value(tokens, i + 1) + else + -- Unknown parameter: consume a following value only if it isn't a flag; + -- otherwise treat as a standalone switch. + local nxt = tokens[i + 1] + if nxt and not nxt:match("^%-[%a]") then + named[key], i = ps_gather_value(tokens, i + 1) + else + i = i + 1 + end + end + else + -- Positional — also gather a comma-list (`Remove-Item a, b`). + positional[#positional + 1], i = ps_gather_value(tokens, i) + end + end + return named, positional +end + +-- Cmdlet/alias → category. Bash aliases (rm/cp/mv/tee) intentionally excluded. +local PS_DELETE = { + ["remove-item"] = true, ri = true, del = true, erase = true, + rd = true, rmdir = true, +} +local PS_WRITE = { + ["set-content"] = true, sc = true, ["add-content"] = true, ac = true, + ["out-file"] = true, ["tee-object"] = true, +} +local PS_MOVE = { + ["move-item"] = true, mi = true, move = true, ["rename-item"] = true, + ren = true, rni = true, +} +local PS_COPY = { + ["copy-item"] = true, cpi = true, copy = true, +} + +-- Pull the path value(s) for a category out of parsed args. `named_keys` lists +-- the value-params to check in priority order; `pos_index` is the positional +-- fallback (1-based). Returns a list of raw tokens (pre-clean), comma-expanded. +local function ps_targets(named, positional, named_keys, pos_index) + for _, k in ipairs(named_keys) do + if named[k] then return ps_split_commas(named[k]) end + end + local p = positional[pos_index] + if p then return ps_split_commas(p) end + return {} +end + +local function detect_ps(subcmd) + local cmd = subcmd:gsub("^%s+", "") + local verb = cmd:match("^([%a][%w%-]*)") + if not verb then return {}, {} end + local key = verb:lower() + + local category + if PS_DELETE[key] then category = "delete" + elseif PS_WRITE[key] then category = "write" + elseif PS_MOVE[key] then category = "move" + elseif PS_COPY[key] then category = "copy" + else return {}, {} end + + local rest = cmd:sub(#verb + 1) + local named, positional = ps_parse_args(ps_tokenise(rest)) + + local rm_raw, write_raw = {}, {} + if category == "delete" then + -- Delete can target many files: `-Path "a","b"` (array) and/or several bare + -- positionals (`Remove-Item a b`). Collect the named path array AND every + -- positional, each comma-expanded. + for _, k in ipairs({ "path", "literalpath" }) do + if named[k] then + for _, x in ipairs(ps_split_commas(named[k])) do rm_raw[#rm_raw + 1] = x end + end + end + for _, pos in ipairs(positional) do + for _, x in ipairs(ps_split_commas(pos)) do rm_raw[#rm_raw + 1] = x end + end + elseif category == "write" then + -- Out-File uses -FilePath; Set/Add-Content use -Path; positional 1 either way. + write_raw = ps_targets(named, positional, { "path", "literalpath", "filepath" }, 1) + else -- move / copy → the destination is the write target + write_raw = ps_targets(named, positional, { "destination", "newname" }, 2) + end + return rm_raw, write_raw +end + +-- ── Aggregation ────────────────────────────────────────────────── + +-- Clean → resolve → filter a list of raw path tokens into absolute paths. +local function finalise(raw_list, cwd, adapter, opts) + opts = opts or {} + local out, seen = {}, {} + for _, r in ipairs(raw_list) do + local p = adapter.clean(r) + -- rm/delete trust their explicit targets and skip the looks_like_path + -- heuristic (matches the historical `rm` behaviour); writes/redirects run + -- it to reject leaked escape sequences and stray tokens. + if opts.skip_path_check or adapter.looks_like_path(p) then + local abs = adapter.resolve(p, cwd) + if abs and (opts.skip_transient or not adapter.is_transient(abs)) and not seen[abs] then + seen[abs] = true + out[#out + 1] = abs + end + end + end + return out +end + +-- POSIX `rm`/`sudo rm` returns already-resolved absolute paths; PowerShell +-- delete returns raw tokens that still need clean/resolve. Keep the two sources +-- explicit so the POSIX ones are never double-resolved. +function M.detect_rm_paths(cmd, cwd) + local adapter = path_adapter() + local out, seen = {}, {} + local function add(abs) + if abs and not seen[abs] then seen[abs] = true; out[#out + 1] = abs end + end + for _, sub in ipairs(each_subcommand(cmd)) do + -- POSIX rm: detect_rm_in returns resolved absolute paths. + for _, abs in ipairs(detect_rm_in(sub, cwd, adapter)) do add(abs) end + -- PowerShell delete: raw tokens → clean → resolve. + local ps_rm = select(1, detect_ps(sub)) + for _, abs in ipairs(finalise(ps_rm, cwd, adapter, { skip_path_check = true, skip_transient = true })) do + add(abs) + end + end + return out +end + +function M.detect_write_paths(cmd, cwd) + local adapter = path_adapter() + local raw = {} + -- Shared redirects + POSIX verbs (these return raw tokens). + for _, p in ipairs(detect_redirects(cmd)) do raw[#raw + 1] = p end + for _, p in ipairs(detect_mv_cp(cmd)) do raw[#raw + 1] = p end + for _, p in ipairs(detect_tee(cmd)) do raw[#raw + 1] = p end + for _, p in ipairs(detect_sed_i(cmd)) do raw[#raw + 1] = p end + -- PowerShell write / move / copy targets (raw tokens). + for _, sub in ipairs(each_subcommand(cmd)) do + local _, ps_write = detect_ps(sub) + for _, p in ipairs(ps_write) do raw[#raw + 1] = p end + end + return finalise(raw, cwd, adapter) +end + +--- Combined entry point used by pre_tool.handle for shell proposals. +--- @param cmd string the raw shell command (POSIX or PowerShell) +--- @param cwd string project cwd for relative-path resolution +--- @return table { rm_paths = {...absolute...}, write_paths = {...absolute...} } +function M.detect(cmd, cwd) + return { + rm_paths = M.detect_rm_paths(cmd, cwd), + write_paths = M.detect_write_paths(cmd, cwd), + } +end + +return M diff --git a/tests/backends/claudecode/test_install.sh b/tests/backends/claudecode/test_install.sh index 34065a7..4a210dd 100644 --- a/tests/backends/claudecode/test_install.sh +++ b/tests/backends/claudecode/test_install.sh @@ -26,7 +26,11 @@ test_install_claude_hooks() { assert_contains "$content" "PostToolUse" "should have PostToolUse hook" || return 1 assert_contains "$content" "code-preview-diff.sh" "should reference diff script" || return 1 assert_contains "$content" "code-close-diff.sh" "should reference close script" || return 1 - assert_contains "$content" "Edit|Write|MultiEdit|Bash" "should match Edit/Write/MultiEdit/Bash tools" || return 1 + # PowerShell is matched too: on Windows Claude Code routes shell file ops + # (Remove-Item / Set-Content …) through a distinct PowerShell tool (issue #46 + # follow-up). The matcher is the same on every OS; the normaliser folds + # PowerShell onto Bash. + assert_contains "$content" "Edit|Write|MultiEdit|Bash|PowerShell" "should match Edit/Write/MultiEdit/Bash/PowerShell tools" || return 1 } # ── Test: Uninstall Claude Code hooks ──────────────────────────── diff --git a/tests/plugin/pre_tool_bash_detect_spec.lua b/tests/plugin/pre_tool_bash_detect_spec.lua deleted file mode 100644 index eb9560f..0000000 --- a/tests/plugin/pre_tool_bash_detect_spec.lua +++ /dev/null @@ -1,82 +0,0 @@ --- pre_tool_bash_detect_spec.lua — Tier 1 shell-write + rm detection. --- --- Table-driven. Each row pins a documented edge case from the historical --- bash pre-tool detection logic. New rows go at the bottom with a short --- comment explaining the case. Resist "obvious simplifications" to --- bash_detect.lua without first reading these rows. - -local bash_detect = require("code-preview.pre_tool.bash_detect") - -local CWD = "/proj" -local HOME = os.getenv("HOME") or "/root" - -local function sorted(t) - local copy = {} - for _, v in ipairs(t) do table.insert(copy, v) end - table.sort(copy) - return copy -end - -describe("bash_detect.detect_rm_paths", function() - local cases = { - { name = "plain rm", cmd = "rm foo.txt", expect = { CWD .. "/foo.txt" } }, - { name = "rm with flags", cmd = "rm -rf build", expect = { CWD .. "/build" } }, - { name = "sudo rm absolute", cmd = "sudo rm /etc/foo", expect = { "/etc/foo" } }, - { name = "rm with home tilde", cmd = "rm ~/.config/x", expect = { HOME .. "/.config/x" } }, - { name = "apostrophe in filename", cmd = "rm \"it's-mine.txt\"", expect = { CWD .. "/it's-mine.txt" } }, - { name = "compound with &&", cmd = "rm a && rm b", expect = { CWD .. "/a", CWD .. "/b" } }, - { name = "compound with ;", cmd = "rm a ; rm b", expect = { CWD .. "/a", CWD .. "/b" } }, - { name = "/tmp not filtered for rm", cmd = "rm /tmp/foo", expect = { "/tmp/foo" } }, - { name = "trailing CR stripped", cmd = "rm a.txt\r", expect = { CWD .. "/a.txt" } }, - { name = "non-rm command ignored", cmd = "echo rm foo", expect = {} }, - -- Real-world case: Claude Code escapes apostrophes in unquoted filenames. - { name = "backslash-escaped apostrophe", cmd = "rm it\\'s-mine.txt", expect = { CWD .. "/it's-mine.txt" } }, - -- Backslash-escaped spaces (`rm my\ file.txt`) would need a real shell - -- tokeniser to split correctly; documented as a Tier 1 limitation. - } - for _, c in ipairs(cases) do - it(c.name, function() - assert.are.same(sorted(c.expect), sorted(bash_detect.detect_rm_paths(c.cmd, CWD))) - end) - end -end) - -describe("bash_detect.detect_write_paths", function() - local cases = { - { name = "single > redirect", cmd = "echo x > foo.txt", expect = { CWD .. "/foo.txt" } }, - { name = "append >> redirect", cmd = "echo x >> log.txt", expect = { CWD .. "/log.txt" } }, - { name = "absolute redirect", cmd = "echo x > /var/log/y", expect = { "/var/log/y" } }, - { name = "/tmp not filtered for writes", cmd = "echo x > /tmp/real", expect = { "/tmp/real" } }, - { name = "FD redirection 2>&1 skipped",cmd = "cmd 2>&1", expect = {} }, - { name = "/dev/null filtered", cmd = "cmd > /dev/null", expect = {} }, - { name = "/dev/stderr filtered", cmd = "cmd > /dev/stderr", expect = {} }, - { name = "mv writes destination", cmd = "mv a.tmp b", expect = { CWD .. "/b" } }, - { name = "cp writes destination", cmd = "cp a b", expect = { CWD .. "/b" } }, - { name = "tee target", cmd = "echo x | tee foo.txt", expect = { CWD .. "/foo.txt" } }, - { name = "tee -a target", cmd = "echo x | tee -a foo.txt", expect = { CWD .. "/foo.txt" } }, - { name = "sed -i target", cmd = "sed -i 's/x/y/' foo.txt", expect = { CWD .. "/foo.txt" } }, - { name = "transient .tmp filtered", cmd = "echo x > scratch.tmp", expect = {} }, - { name = "transient ~ filtered", cmd = "echo x > foo~", expect = {} }, - { name = "home tilde expanded", cmd = "echo x > ~/notes.md", expect = { HOME .. "/notes.md" } }, - -- The HTML-comment false positive: the redirect regex picks up the `>` - -- in `-->`, but looks_like_path rejects the resulting token because it - -- contains a backslash (leaked from inside a quoted string). - { name = "html comment false positive", cmd = "printf '\\n\\n'", expect = {} }, - { name = "dedup duplicates", cmd = "echo a > x.txt && echo b > x.txt", expect = { CWD .. "/x.txt" } }, - -- Apostrophe-escaped paths in redirects (companion to the rm case). - { name = "backslash-escaped apostrophe in redirect", cmd = "echo x > it\\'s.txt", expect = { CWD .. "/it's.txt" } }, - } - for _, c in ipairs(cases) do - it(c.name, function() - assert.are.same(sorted(c.expect), sorted(bash_detect.detect_write_paths(c.cmd, CWD))) - end) - end -end) - -describe("bash_detect.detect (combined)", function() - it("returns both rm and write paths", function() - local r = bash_detect.detect("rm old.txt && echo x > new.txt", CWD) - assert.are.same({ CWD .. "/old.txt" }, r.rm_paths) - assert.are.same({ CWD .. "/new.txt" }, r.write_paths) - end) -end) diff --git a/tests/plugin/pre_tool_handle_spec.lua b/tests/plugin/pre_tool_handle_spec.lua index a575278..31b272d 100644 --- a/tests/plugin/pre_tool_handle_spec.lua +++ b/tests/plugin/pre_tool_handle_spec.lua @@ -32,18 +32,23 @@ describe("pre_tool.handle (Bash)", function() assert.equals("bash_created", changes.get(p)) end) - it("redirect to existing file marks bash_modified", function() - -- Skipped on Windows: this case needs to create the file on disk (io.open - -- of a /tmp path fails on Windows) AND have bash_detect resolve a Windows - -- path, which is Unix-path-only today (issue #46, handoff item 3). The - -- bash-on-Windows work will re-enable this. The new-file/rm cases above use - -- forward-slash paths that don't need an on-disk file, so they still run. - if vim.fn.has("win32") == 1 then - return pending("bash_detect is Unix-path-only on Windows (issue #46)") - end - local p = "/tmp/code-preview-test-existing-" .. tostring(vim.loop.hrtime()) + it("shell write to an existing file marks bash_modified", function() + -- Portable across OSes: use the OS temp dir (no hardcoded /tmp) so the file + -- can actually be created on disk on Windows too. Drive the same shell-write + -- the agent would: a redirect on Unix, and a PowerShell Set-Content on + -- Windows (Claude Code's PowerShell tool, folded onto Bash by the + -- normaliser). Both route through handle_bash → shell_detect. + local dir = vim.fn.fnamemodify(vim.fn.tempname(), ":h") + -- Join with the native separator: the changes registry keys on + -- fnamemodify(":p"), which does not flip separators on Windows, so a mixed + -- path here would not match the detector's canonical backslash output. + local sep = (vim.fn.has("win32") == 1) and "\\" or "/" + local p = dir .. sep .. "code-preview-test-existing-" .. tostring(vim.loop.hrtime()) .. ".txt" local fh = assert(io.open(p, "w")); fh:write("hi"); fh:close() - pre_tool.handle(payload("Bash", { command = "echo x > " .. p }), "claudecode") + local cmd = (vim.fn.has("win32") == 1) + and ('Set-Content -Path "' .. p .. '" -Value "x"') + or ("echo x > " .. p) + pre_tool.handle(payload("Bash", { command = cmd }), "claudecode") assert.equals("bash_modified", changes.get(p)) os.remove(p) end) diff --git a/tests/plugin/pre_tool_normaliser_spec.lua b/tests/plugin/pre_tool_normaliser_spec.lua index dd6d1b1..e728a8d 100644 --- a/tests/plugin/pre_tool_normaliser_spec.lua +++ b/tests/plugin/pre_tool_normaliser_spec.lua @@ -20,6 +20,27 @@ describe("normalisers.normalise (claudecode)", function() it("unknown backend falls back to identity", function() assert.are.same(canonical, normalisers.normalise(canonical, "future-agent")) end) + + -- Windows: Claude Code routes shell file ops through a distinct `PowerShell` + -- tool (Haiku deletes via Remove-Item). It carries the same {command} shape as + -- Bash and is a shell proposal, so the normaliser folds it onto canonical Bash. + it("folds the PowerShell tool onto canonical Bash", function() + local raw = { + tool_name = "PowerShell", + cwd = "/proj", + tool_input = { command = 'Remove-Item -Path "foo.txt" -Force' }, + } + local out = normalisers.normalise(raw, "claudecode") + assert.equals("Bash", out.tool_name) + assert.equals('Remove-Item -Path "foo.txt" -Force', out.tool_input.command) + assert.equals("/proj", out.cwd) + end) + + it("does not mutate the input payload when folding PowerShell", function() + local raw = { tool_name = "PowerShell", cwd = "/proj", tool_input = { command = "ls" } } + normalisers.normalise(raw, "claudecode") + assert.equals("PowerShell", raw.tool_name) + end) end) describe("normalisers.normalise (opencode)", function() diff --git a/tests/plugin/pre_tool_shell_detect_spec.lua b/tests/plugin/pre_tool_shell_detect_spec.lua new file mode 100644 index 0000000..a14f9c1 --- /dev/null +++ b/tests/plugin/pre_tool_shell_detect_spec.lua @@ -0,0 +1,177 @@ +-- pre_tool_shell_detect_spec.lua — Tier 1 shell-write + delete detection. +-- +-- (Renamed from pre_tool_bash_detect_spec.lua when bash_detect → shell_detect.) +-- +-- Two axes, tested as such (see shell_detect.lua's header): +-- +-- * POSIX block — the historical bash rows, byte-for-byte. They pin +-- documented edge cases from real bugs; resist "obvious simplifications" +-- without reading them. They assume Unix path semantics (`/`-absolute, +-- $HOME, /dev/), so on Windows they are marked `pending` — the Unix CI +-- runners are the regression guard for POSIX behaviour. +-- * Windows block — git-bash-on-Windows POSIX commands AND PowerShell +-- commands, both with Windows paths. Inputs are the real STEP 0 samples +-- captured from Claude Code (the Haiku model emits `Remove-Item …` etc. +-- through a `PowerShell` tool). Marked `pending` off Windows. +-- +-- New rows go at the bottom of the relevant block with a short comment. + +local shell_detect = require("code-preview.pre_tool.shell_detect") + +local IS_WIN = vim.fn.has("win32") == 1 +local CWD = "/proj" +local HOME = os.getenv("HOME") or "/root" + +local function sorted(t) + local copy = {} + for _, v in ipairs(t) do table.insert(copy, v) end + table.sort(copy) + return copy +end + +-- ── POSIX grammar (Unix path semantics) ────────────────────────── + +describe("shell_detect.detect_rm_paths (POSIX)", function() + local cases = { + { name = "plain rm", cmd = "rm foo.txt", expect = { CWD .. "/foo.txt" } }, + { name = "rm with flags", cmd = "rm -rf build", expect = { CWD .. "/build" } }, + { name = "sudo rm absolute", cmd = "sudo rm /etc/foo", expect = { "/etc/foo" } }, + { name = "rm with home tilde", cmd = "rm ~/.config/x", expect = { HOME .. "/.config/x" } }, + { name = "apostrophe in filename", cmd = "rm \"it's-mine.txt\"", expect = { CWD .. "/it's-mine.txt" } }, + { name = "compound with &&", cmd = "rm a && rm b", expect = { CWD .. "/a", CWD .. "/b" } }, + { name = "compound with ;", cmd = "rm a ; rm b", expect = { CWD .. "/a", CWD .. "/b" } }, + { name = "/tmp not filtered for rm", cmd = "rm /tmp/foo", expect = { "/tmp/foo" } }, + { name = "trailing CR stripped", cmd = "rm a.txt\r", expect = { CWD .. "/a.txt" } }, + { name = "non-rm command ignored", cmd = "echo rm foo", expect = {} }, + -- Real-world case: Claude Code escapes apostrophes in unquoted filenames. + { name = "backslash-escaped apostrophe", cmd = "rm it\\'s-mine.txt", expect = { CWD .. "/it's-mine.txt" } }, + -- Backslash-escaped spaces (`rm my\ file.txt`) would need a real shell + -- tokeniser to split correctly; documented as a Tier 1 limitation. + } + for _, c in ipairs(cases) do + it(c.name, function() + if IS_WIN then return pending("POSIX path semantics: Unix-only") end + assert.are.same(sorted(c.expect), sorted(shell_detect.detect_rm_paths(c.cmd, CWD))) + end) + end +end) + +describe("shell_detect.detect_write_paths (POSIX)", function() + local cases = { + { name = "single > redirect", cmd = "echo x > foo.txt", expect = { CWD .. "/foo.txt" } }, + { name = "append >> redirect", cmd = "echo x >> log.txt", expect = { CWD .. "/log.txt" } }, + { name = "absolute redirect", cmd = "echo x > /var/log/y", expect = { "/var/log/y" } }, + { name = "/tmp not filtered for writes", cmd = "echo x > /tmp/real", expect = { "/tmp/real" } }, + { name = "FD redirection 2>&1 skipped",cmd = "cmd 2>&1", expect = {} }, + { name = "/dev/null filtered", cmd = "cmd > /dev/null", expect = {} }, + { name = "/dev/stderr filtered", cmd = "cmd > /dev/stderr", expect = {} }, + { name = "mv writes destination", cmd = "mv a.tmp b", expect = { CWD .. "/b" } }, + { name = "cp writes destination", cmd = "cp a b", expect = { CWD .. "/b" } }, + { name = "tee target", cmd = "echo x | tee foo.txt", expect = { CWD .. "/foo.txt" } }, + { name = "tee -a target", cmd = "echo x | tee -a foo.txt", expect = { CWD .. "/foo.txt" } }, + { name = "sed -i target", cmd = "sed -i 's/x/y/' foo.txt", expect = { CWD .. "/foo.txt" } }, + { name = "transient .tmp filtered", cmd = "echo x > scratch.tmp", expect = {} }, + { name = "transient ~ filtered", cmd = "echo x > foo~", expect = {} }, + { name = "home tilde expanded", cmd = "echo x > ~/notes.md", expect = { HOME .. "/notes.md" } }, + -- The HTML-comment false positive: the redirect regex picks up the `>` + -- in `-->`, but looks_like_path rejects the resulting token because it + -- contains a backslash (leaked from inside a quoted string). + { name = "html comment false positive", cmd = "printf '\\n\\n'", expect = {} }, + { name = "dedup duplicates", cmd = "echo a > x.txt && echo b > x.txt", expect = { CWD .. "/x.txt" } }, + -- Apostrophe-escaped paths in redirects (companion to the rm case). + { name = "backslash-escaped apostrophe in redirect", cmd = "echo x > it\\'s.txt", expect = { CWD .. "/it's.txt" } }, + } + for _, c in ipairs(cases) do + it(c.name, function() + if IS_WIN then return pending("POSIX path semantics: Unix-only") end + assert.are.same(sorted(c.expect), sorted(shell_detect.detect_write_paths(c.cmd, CWD))) + end) + end +end) + +describe("shell_detect.detect combined (POSIX)", function() + it("returns both rm and write paths", function() + if IS_WIN then return pending("POSIX path semantics: Unix-only") end + local r = shell_detect.detect("rm old.txt && echo x > new.txt", CWD) + assert.are.same({ CWD .. "/old.txt" }, r.rm_paths) + assert.are.same({ CWD .. "/new.txt" }, r.write_paths) + end) +end) + +-- ── Windows paths: git-bash POSIX commands + PowerShell commands ── +-- +-- cwd is a backslash drive path (Claude Code's hook payload form on Windows). +-- Expected outputs are canonical backslash paths — the form the changes +-- registry keys on (matching Edit/Write), so the neo-tree indicator lands. + +local WCWD = [[C:\proj]] + +describe("shell_detect.detect_rm_paths (Windows)", function() + local cases = { + -- git-bash POSIX rm with a relative path → resolved against the backslash cwd. + { name = "bash rm relative", cmd = [[rm foo.txt]], expect = { [[C:\proj\foo.txt]] } }, + -- The original STEP 0 bug: a Windows-absolute path was treated as relative + -- and the cwd was prepended, producing garbage. It must stay absolute now. + { name = "bash rm windows-absolute", cmd = [[rm -f "D:\other\cp-src.txt"]], expect = { [[D:\other\cp-src.txt]] } }, + -- PowerShell delete — the real Haiku sample. + { name = "Remove-Item -Path relative", cmd = [[Remove-Item -Path ".step0\del.txt" -Force]], expect = { [[C:\proj\.step0\del.txt]] } }, + { name = "Remove-Item windows-absolute",cmd = [[Remove-Item -Path "D:\proj\x.txt"]], expect = { [[D:\proj\x.txt]] } }, + { name = "Remove-Item comma-list", cmd = [[Remove-Item -Path "a.txt","b.txt" -Force]], expect = { [[C:\proj\a.txt]], [[C:\proj\b.txt]] } }, + -- Real Haiku multi-delete: a comma-list with a SPACE after the comma, which + -- naive tokenising splits into "a", + a stray "b" positional. Both files + -- must register (regression: only the first was marked). + { name = "Remove-Item comma-list with spaces", cmd = [[Remove-Item -Path "D:\proj\temp1.txt", "D:\proj\temp2.txt" -Force]], expect = { [[D:\proj\temp1.txt]], [[D:\proj\temp2.txt]] } }, + { name = "Remove-Item multiple positionals", cmd = [[Remove-Item a.txt b.txt]], expect = { [[C:\proj\a.txt]], [[C:\proj\b.txt]] } }, + { name = "Remove-Item switches before positional", cmd = [[Remove-Item -Recurse -Force build]], expect = { [[C:\proj\build]] } }, + { name = "del alias positional", cmd = [[del bad.txt]], expect = { [[C:\proj\bad.txt]] } }, + { name = "ri alias", cmd = [[ri ".step0\gone.log"]], expect = { [[C:\proj\.step0\gone.log]] } }, + -- Non-delete cmdlets must not register. + { name = "Get-Content ignored", cmd = [[Get-Content "x.txt"]], expect = {} }, + } + for _, c in ipairs(cases) do + it(c.name, function() + if not IS_WIN then return pending("Windows path semantics: Windows-only") end + assert.are.same(sorted(c.expect), sorted(shell_detect.detect_rm_paths(c.cmd, WCWD))) + end) + end +end) + +describe("shell_detect.detect_write_paths (Windows)", function() + local cases = { + -- git-bash POSIX redirect (forward slash relative) resolved to backslash. + { name = "bash redirect relative", cmd = [[echo x > out.txt]], expect = { [[C:\proj\out.txt]] } }, + { name = "bash redirect dotdir", cmd = [[echo x > .step0/new.txt]], expect = { [[C:\proj\.step0\new.txt]] } }, + { name = "bash redirect windows-abs", cmd = [[echo x > "D:\logs\y.txt"]], expect = { [[D:\logs\y.txt]] } }, + -- PowerShell writes — real samples. + { name = "Set-Content -Path", cmd = [[Set-Content -Path ".step0\new.txt" -Value "hi"]], expect = { [[C:\proj\.step0\new.txt]] } }, + { name = "Set-Content here-string", cmd = 'Set-Content -Path ".step0\\ps-pswrite.txt" -Value @"\nLine 1\nLine 2\n"@', expect = { [[C:\proj\.step0\ps-pswrite.txt]] } }, + { name = "Add-Content", cmd = [[Add-Content -Path ".step0\ps-existing.txt" -Value "appended"]], expect = { [[C:\proj\.step0\ps-existing.txt]] } }, + { name = "Out-File -FilePath", cmd = [[Out-File -FilePath "log.txt"]], expect = { [[C:\proj\log.txt]] } }, + { name = "Out-File via pipeline positional", cmd = [[Get-ChildItem | Out-File "list.txt"]], expect = { [[C:\proj\list.txt]] } }, + { name = "Move-Item destination", cmd = [[Move-Item -Path ".step0\a" -Destination ".step0\b.txt" -Force]], expect = { [[C:\proj\.step0\b.txt]] } }, + { name = "Copy-Item destination", cmd = [[Copy-Item -Path ".step0\a" -Destination ".step0\c.txt" -Force]], expect = { [[C:\proj\.step0\c.txt]] } }, + { name = "Move-Item windows-abs dest", cmd = [[Move-Item -Path "a" -Destination "D:\dst\b.txt"]], expect = { [[D:\dst\b.txt]] } }, + { name = "UNC destination", cmd = [[Set-Content -Path "\\srv\share\f.txt" -Value x]], expect = { [[\\srv\share\f.txt]] } }, + -- Look-alikes that must NOT be flagged as writes. + { name = "Out-Null not a write", cmd = [[New-Item -ItemType Directory d | Out-Null]], expect = {} }, + { name = "Out-String not a write", cmd = [[Get-Process | Out-String]], expect = {} }, + { name = "transient .tmp filtered", cmd = [[Set-Content -Path "scratch.tmp" -Value x]], expect = {} }, + { name = "FD redirect skipped", cmd = [[some-exe 2>&1]], expect = {} }, + } + for _, c in ipairs(cases) do + it(c.name, function() + if not IS_WIN then return pending("Windows path semantics: Windows-only") end + assert.are.same(sorted(c.expect), sorted(shell_detect.detect_write_paths(c.cmd, WCWD))) + end) + end +end) + +describe("shell_detect.detect combined (Windows)", function() + it("PowerShell delete + write in one ;-separated command", function() + if not IS_WIN then return pending("Windows path semantics: Windows-only") end + local r = shell_detect.detect( + [[Remove-Item -Path "old.txt" -Force; Set-Content -Path "new.txt" -Value "x"]], WCWD) + assert.are.same({ [[C:\proj\old.txt]] }, r.rm_paths) + assert.are.same({ [[C:\proj\new.txt]] }, r.write_paths) + end) +end)