Skip to content

refactor(debug-files): scan paths via shared walkFiles walker#1150

Open
BYK wants to merge 2 commits into
mainfrom
byk/refactor/debug-files-shared-walker
Open

refactor(debug-files): scan paths via shared walkFiles walker#1150
BYK wants to merge 2 commits into
mainfrom
byk/refactor/debug-files-shared-walker

Conversation

@BYK

@BYK BYK commented Jun 26, 2026

Copy link
Copy Markdown
Member

What

Replace the bespoke recursive directory walker in src/lib/dif/scan.ts
(collectFiles) with the project's shared, well-tested walkFiles walker
(src/lib/scan/) — the same traversal foundation the DSN code-scanner uses.

This is a preliminary, self-contained refactor that the upcoming
debug-files upload WASM-consumption features (embedded PPDB extraction,
--il2cpp-mapping) will build on. It has no behavior change for users.

Why

walkFiles gives parallel readdir, dev:ino-based symlink-cycle detection
(sturdier than the previous visited-realpath set), AbortSignal support, and
far more test coverage than the hand-rolled walker.

Behavior preservation

walkFiles is policy-free, so its DSN-tuned defaults are explicitly overridden
to match the existing debug-file scanning contract — debug files are large
binaries that routinely live in gitignored build output (build/,
target/, Xcode DerivedData):

Option Value Why
respectGitignore false debug files live in gitignored dirs
alwaysSkipDirs [] default skip-dirs are build-output dirs — exactly where DIFs are
maxFileSize Infinity default is 256 KB; DIFs are MBs. Size gating stays in prepareDifs (format-before-size, drives oversizedCount)
followSymlinks true matches previous cycle-safe behavior
classifyBinary false see below

The deliberate format-before-size ordering that drives
PrepareDifsResult.oversizedCount is untouched and remains in prepareDifs.

New shared-walker option: classifyBinary

Adds classifyBinary?: boolean (default true, preserving current behavior) to
WalkOptions. When false, the walker skips the per-file ~8 KB NUL sniff used
to populate WalkEntry.isBinary and reports isBinary: false without reading
file contents. The DIF scanner ignores isBinary, so it opts out to avoid an
extra head-read per file on large binary trees. Also benefits other binary-tree
consumers (e.g. sourcemap discovery).

Tests

  • test/lib/scan/walker.test.ts: classifyBinary: false yields a NUL blob with
    isBinary: false while the default still reports true (proves the sniff is
    skipped).
  • test/lib/dif/scan.test.ts: new scanPaths traversal coverage — recursive
    discovery, explicit-file passthrough, files under .gitignore'd /
    node_modules dirs are still found
    (regression guard for the swap),
    overlapping-arg dedupe, symlink-cycle safety, and ENOENT → ValidationError.

Verification

pnpm run typecheck, pnpm run lint, pnpm run check:deps clean;
test/lib/scan + test/lib/dif suites pass (99 tests).

Replace the bespoke recursive directory walker in dif/scan.ts with the
project's shared, well-tested walkFiles walker (the DSN code-scanner's
traversal foundation): parallel readdir, inode-based symlink-cycle
detection, and far more test coverage.

walkFiles is policy-free, so its DSN-tuned defaults are overridden to
preserve debug-file scanning behavior: debug files are large binaries
that live in gitignored build output, so respectGitignore and the default
build-output skip-dirs are disabled and the 256 KB size cap is lifted
(size gating stays in prepareDifs to keep the format-before-size ordering
that drives oversizedCount).

Adds a classifyBinary opt-out to walkFiles (default true) so consumers
that ignore isBinary -- like this scanner -- skip the per-file 8 KB NUL
sniff.
@github-actions github-actions Bot added the risk: medium PR risk score: medium label Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-1150/

Built to branch gh-pages at 2026-06-26 20:44 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ba2848f. Configure here.

Comment thread src/lib/dif/scan.ts
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Codecov Results 📊

✅ Patch coverage is 90.00%. Project has 5123 uncovered lines.
✅ Project coverage is 81.49%. Comparing base (base) to head (head).

Files with missing lines (2)
File Patch % Lines
src/lib/dif/scan.ts 87.50% ⚠️ 3 Missing and 2 partials
src/lib/scan/walker.ts 100.00% ⚠️ 1 partials
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.46%    81.49%    +0.03%
==========================================
  Files          397       397         —
  Lines        27686     27682        -4
  Branches     17972     17980        +8
==========================================
+ Hits         22555     22559        +4
- Misses        5131      5123        -8
- Partials      1861      1862        +1

Generated by Codecov Action

Addresses a Cursor Bugbot finding on the walkFiles migration: the root
cwd frame is pushed directly rather than through maybeDescend (the only
place inodes are recorded), so a descendant symlink pointing back at the
scan root was followed once, re-listing the root subtree under a second
path prefix. The pre-walkFiles DIF scanner avoided this by recording the
root's realpath before reading children. Seed the root inode at walk
init when followSymlinks is enabled to restore that behavior; the default
(non-following) walk is unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk: medium PR risk score: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant