
fix(scanner): support non-ASCII project paths and plugin-provided skills #20

Open
levelfly wants to merge 1 commit into mcpware:main from levelfly:fix/non-ascii-paths-and-plugin-skills


Summary

Two related gaps in scanner.mjs that become visible on Windows when project paths contain non-ASCII characters (CJK etc.) and when skills are shipped by installed plugins.

1. Non-ASCII project path decoding

Claude Code encodes real paths into ~/.claude/projects/ directory names by replacing every non-alphanumeric character with -. A path containing CJK characters therefore produces runs of dashes: one dash per path separator and one per CJK character.

The existing segment-based resolver splits the encoded name on -, DFS-matches each segment against real directory entries, and can never match the empty segments produced by runs of CJK characters. When it returns null, discoverScopes() silently drops the project via if (!realPath) continue — every non-ASCII-path project disappears from the sidebar, along with its memories and sessions.
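To make the failure mode concrete, here is a minimal sketch of the encoding as described above. `encodeProjectPath` is a hypothetical stand-in, not Claude Code's source and not necessarily identical to the `encodeClaudeProjectName()` helper in this PR; it assumes the replacement happens per UTF-16 code unit.

```javascript
// Hypothetical sketch: every character outside [A-Za-z0-9] becomes '-'.
// Assumption: replacement is per UTF-16 code unit (no `u` flag).
function encodeProjectPath(realPath) {
  return realPath.replace(/[^A-Za-z0-9]/g, "-");
}

const encoded = encodeProjectPath("C:\\Users\\me\\项目");
console.log(encoded);            // C--Users-me---
// Splitting on '-' yields empty segments that no directory entry can match.
console.log(encoded.split("-")); // [ 'C', '', 'Users', 'me', '', '', '' ]
// Two different CJK names of equal length collapse to the same string.
console.log(encodeProjectPath("项目") === encodeProjectPath("文档")); // true
```

The empty strings in the split result are the segments the existing resolver cannot match, and the last line shows why the encoding is not reversible on its own.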

Fix: three resolution strategies tried in order:

  1. resolveViaSessionCwd() — reads the first few .jsonl files in the encoded dir and pulls the cwd field. Ground truth from Claude Code itself; no pattern guessing, and it handles the unavoidable encoding collision where two sibling CJK dirs of equal length collapse to the same string.
  2. Existing segment resolver — unchanged, still fast for ASCII paths.
  3. resolveEncodedProjectPathUnicode() — character-level pattern matcher. Crucially, - in the encoded name is not a pure wildcard: since the encoding preserves [A-Za-z0-9-], an encoded - means the original char was not alphanumeric. Treating - as "matches any non-alphanumeric char" correctly rejects a purely numeric name against a pattern of all dashes (digits would have encoded as digits, not dashes), while still accepting CJK names.
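Strategy 1 can be sketched as a pure function over the text of a session file. This is an illustration under assumptions, not the PR's `resolveViaSessionCwd()`: the name `extractCwdFromJsonl`, the line limit, and the decision to skip malformed lines are all hypothetical.

```javascript
// Sketch only: return the first string `cwd` found in JSONL text, or null.
// `maxLines` and the tolerance for non-JSON lines are assumptions.
function extractCwdFromJsonl(text, maxLines = 50) {
  const lines = text.split(/\r?\n/).slice(0, maxLines);
  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      const entry = JSON.parse(line);
      if (entry && typeof entry.cwd === "string" && entry.cwd.length > 0) {
        return entry.cwd;
      }
    } catch {
      // Skip partially written or non-JSON lines.
    }
  }
  return null;
}

const sample = [
  '{"type":"summary"}',
  "not json",
  '{"type":"user","cwd":"D:\\\\work\\\\项目"}',
].join("\n");
console.log(extractCwdFromJsonl(sample)); // D:\work\项目
```

A caller would try this on the first few `.jsonl` files in the encoded directory and fall through to the segment resolver, then the character-level matcher, only when it returns null.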

If all three fail, the project is still accepted rather than silently dropped. A new prettifyEncodedPath() generates a readable fallback name by turning runs of dashes into …, and scanMemories / scanSessions still work because they only need scope.claudeProjectDir (the encoded path), not the real repoDir. Downstream code that reads scope.repoDir (e.g. MCP project config lookup) already handles the null case.
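A fallback name generator along these lines might look like the sketch below. `prettifyEncoded` is a hypothetical stand-in for `prettifyEncodedPath()`, and treating a "run" as two or more consecutive dashes (leaving single dashes alone) is an assumption; the PR's exact rules may differ.

```javascript
// Hypothetical sketch: collapse each run of two or more dashes into an ellipsis.
// Assumption: single dashes are left untouched.
function prettifyEncoded(encodedName) {
  return encodedName.replace(/-{2,}/g, "…");
}

console.log(prettifyEncoded("C--Users-me---")); // C…Users-me…
```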

A post-pass disambiguates duplicate basenames by prepending the parent directory when two projects share a basename.
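The post-pass could be sketched as follows. The function name, the `parent/base` label format, and the separator handling are assumptions made for illustration; empty paths are not handled.

```javascript
// Sketch: label each path by its basename; when a basename is shared,
// prepend the parent directory name. Accepts '/' and '\' separators.
function disambiguateNames(paths) {
  const parts = paths.map((p) => p.split(/[\\/]+/).filter(Boolean));
  const counts = new Map();
  for (const segs of parts) {
    const base = segs[segs.length - 1];
    counts.set(base, (counts.get(base) || 0) + 1);
  }
  return parts.map((segs) => {
    const base = segs[segs.length - 1];
    if (counts.get(base) > 1 && segs.length > 1) {
      return `${segs[segs.length - 2]}/${base}`;
    }
    return base;
  });
}

console.log(disambiguateNames(["D:\\work\\api", "D:\\play\\api", "D:\\work\\web"]));
// [ 'work/api', 'play/api', 'web' ]
```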

2. Plugin-provided skills

README says Skills covers "Personal, Project, and installed Plugins", but scanSkills() only reads the first two. Plugins installed via claude plugin install live at ~/.claude/plugins/cache/<marketplace>/<plugin>/<version>/ and commonly ship a skills/ subdirectory. None of those skills render in the dashboard today.

Fix:

  • Extract per-skill reading into a reusable readSkillEntry() helper.
  • Read ~/.claude/plugins/installed_plugins.json and scan each plugin's <installPath>/skills/.
  • User-scope plugins → Global. Project-scope plugins → matched to the right project by re-encoding the plugin's projectPath via a new encodeClaudeProjectName() helper and comparing to scope.id (the encoded dir name). This tolerates encoding collisions and matches Claude Code's own behavior.
  • Plugin skills get subType: "plugin-skill" and bundle: <pluginName> so the UI can distinguish them.
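The routing rule in the list above can be sketched like this. The record shapes are assumptions: the sketch supposes each plugin entry exposes `name`, `scope` ("user" or "project"), and `projectPath`, and that each project scope exposes `id`. The real layout of installed_plugins.json may differ, and returning null for an unmatched project-scope plugin is also an assumption.

```javascript
// Hypothetical re-encoding helper (stand-in for encodeClaudeProjectName()).
function encodeName(p) {
  return p.replace(/[^A-Za-z0-9]/g, "-");
}

// Sketch: pick the scope a plugin's skills should be listed under.
// Returns "global", a matching project scope id, or null when nothing matches.
function scopeIdForPlugin(plugin, projectScopes) {
  if (plugin.scope !== "project" || !plugin.projectPath) return "global";
  const encodedPath = encodeName(plugin.projectPath);
  const match = projectScopes.find((s) => s.id === encodedPath);
  return match ? match.id : null;
}

// Tag a skill entry so the UI can tell plugin skills apart.
function tagPluginSkill(skill, plugin) {
  return { ...skill, subType: "plugin-skill", bundle: plugin.name };
}

const scopes = [{ id: "D--work---" }, { id: "D--work-api" }];
const plugin = { name: "demo", scope: "project", projectPath: "D:\\work\\项目" };
console.log(scopeIdForPlugin(plugin, scopes));              // D--work---
console.log(tagPluginSkill({ name: "lint" }, plugin).bundle); // demo
```

Comparing encoded names rather than decoded paths means the match still succeeds when the real path could not be resolved, at the cost of inheriting the same collisions the encoding has.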

Test plan

  • After a fresh reinstall via npx @mcpware/claude-code-organizer, non-ASCII-path projects render in the sidebar with decoded names
  • Memories and sessions inside non-ASCII projects are listed
  • ASCII-only projects still resolve via the fast segment resolver (unchanged path)
  • Duplicate basename disambiguation works
  • Encoding collision: when two sibling paths encode to the same string, resolveViaSessionCwd picks the one whose sessions are actually present
  • Plugin skills from both user-scope and project-scope plugins appear in the correct scope with the plugin name as bundle label

All changes are isolated to src/scanner.mjs. No schema / API / UI changes.

Note: this is a resubmission of a previous PR that was closed due to stray personal paths in the description. The fix itself is unchanged.

Two related gaps in scanner.mjs become visible on Windows with non-ASCII
(e.g. CJK) directory names and with skills shipped by installed plugins.

## 1. Project path decoding for non-ASCII dirs

Claude Code encodes project paths into `~/.claude/projects/<encoded>` by
replacing every non-alphanumeric character with '-'. A path containing
CJK characters produces runs of dashes, one per encoded char, plus a
dash for each path separator.

The existing segment-based resolver splits the encoded name on '-' and
DFS-matches each segment against real directory entries. With runs of
empty segments (one per CJK character) it can never find a match and
returns null, at which point `discoverScopes()` silently drops the
project via `if (!realPath) continue`. Projects with non-ASCII names —
along with every session and memory file inside them — become invisible
in the sidebar.

This change adds three resolution strategies, tried in order:

  1. `resolveViaSessionCwd()` — reads the first few `.jsonl` session
     files in the encoded dir and pulls the `cwd` field. This is the
     ground truth: Claude Code writes the real working directory into
     every session entry, so no pattern-matching guesswork is needed.
     It also handles the unavoidable collision where two sibling CJK
     directories of equal length encode to the same string.

  2. The existing segment-based resolver — unchanged, still fast for
     normal ASCII paths.

  3. `resolveEncodedProjectPathUnicode()` — a character-level pattern
     matcher that walks the filesystem from the root and, at each level,
     tries every directory entry whose name length fits the pattern.
     The key insight is that '-' in the encoded name is NOT a pure
     wildcard: since the encoding preserves `[A-Za-z0-9]` (and '-'
     itself), an encoded '-' at some position means the original
     character was NOT alphanumeric. Treating '-' as "matches any
     non-alphanumeric char" correctly rejects a purely numeric
     directory name against a pattern of all dashes (digits would have
     encoded as digits, not dashes), while still allowing CJK names
     to match.
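The per-name check at the heart of strategy 3 can be sketched as below. This is not the PR's `resolveEncodedProjectPathUnicode()`: `nameFitsPattern` is a hypothetical helper that tests one directory name against one equal-length slice of the encoded string, comparing UTF-16 code units and matching case exactly (both assumptions). The filesystem walk is left out.

```javascript
// Sketch: does `name` fit `pattern`, where '-' stands for any
// NON-alphanumeric character and every other character must match exactly?
function nameFitsPattern(name, pattern) {
  if (name.length !== pattern.length) return false;
  for (let i = 0; i < pattern.length; i++) {
    const isAlnum = /[A-Za-z0-9]/.test(name[i]);
    if (pattern[i] === "-") {
      // An alphanumeric would have survived encoding, so it cannot sit here.
      if (isAlnum) return false;
    } else if (name[i] !== pattern[i]) {
      return false;
    }
  }
  return true;
}

const candidates = ["42", "项目", "ab", "文档"];
console.log(candidates.filter((n) => nameFitsPattern(n, "--")));
// [ '项目', '文档' ]
console.log(nameFitsPattern("my app", "my-app")); // true
```

The filter result also shows the residual ambiguity: both CJK siblings fit the same pattern, which is why the session `cwd` lookup is tried first.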

If all three fail, the project is still accepted by `discoverScopes()`
(rather than silently dropped) and given a readable display name by a
new `prettifyEncodedPath()` helper that turns runs of dashes into `…`.
Its memories and sessions are scanned normally because `scanMemories()`
and `scanSessions()` only need `scope.claudeProjectDir` — which is the
encoded path under `~/.claude/projects/` and always valid — not the real
`repoDir`. Downstream code that reads `scope.repoDir` (e.g. MCP project
config lookup) already handles the null case.

A post-pass also disambiguates duplicate display names by prepending the
parent directory when two projects share a basename.

## 2. Plugin-provided skills

The README states that the Skills category covers "Personal (~/.claude/skills),
Project (.claude/skills), and installed Plugins", but `scanSkills()` only
reads the first two. Plugins installed via `claude plugin install` live at
`~/.claude/plugins/cache/<marketplace>/<plugin>/<version>/` and commonly
ship a `skills/` subdirectory; none of those skills render in the dashboard.

This change:

  - Extracts the per-skill reading logic from `scanSkills()` into a
    reusable `readSkillEntry()` helper.
  - Reads `~/.claude/plugins/installed_plugins.json` and, for each
    installed plugin, scans `<installPath>/skills/` for SKILL.md files.
  - Routes user-scope plugins to the Global scope and project-scope
    plugins to the matching project scope. The match is computed by
    re-encoding the plugin's `projectPath` via a new
    `encodeClaudeProjectName()` helper and comparing to `scope.id`
    (the encoded dir name). This tolerates encoding collisions and
    matches Claude Code's own behavior.
  - Marks plugin skills with `subType: "plugin-skill"` and
    `bundle: <pluginName>` so the UI can distinguish them.

## Tested on Windows 11

Before: non-ASCII-path projects are absent from the sidebar; their
memories and sessions are unreachable; plugin-provided skills do not
appear.

After: all projects render with correct names; memories and sessions
in non-ASCII projects are scanned; plugin skills appear in the
appropriate scope labelled with their source plugin.