refactor(debug-files): scan paths via shared walkFiles walker#1150
Open
BYK wants to merge 2 commits into
Open
Conversation
Replace the bespoke recursive directory walker in dif/scan.ts with the project's shared, well-tested walkFiles walker (the DSN code-scanner's traversal foundation): parallel readdir, inode-based symlink-cycle detection, and far more test coverage. walkFiles is policy-free, so its DSN-tuned defaults are overridden to preserve debug-file scanning behavior: debug files are large binaries that live in gitignored build output, so respectGitignore and the default build-output skip-dirs are disabled and the 256 KB size cap is lifted (size gating stays in prepareDifs to keep the format-before-size ordering that drives oversizedCount). Adds a classifyBinary opt-out to walkFiles (default true) so consumers that ignore isBinary -- like this scanner -- skip the per-file 8 KB NUL sniff.
Contributor
|
Contributor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ba2848f. Configure here.
Contributor
Codecov Results 📊✅ Patch coverage is 90.00%. Project has 5123 uncovered lines. Files with missing lines (2)
Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 81.46% 81.49% +0.03%
==========================================
Files 397 397 —
Lines 27686 27682 -4
Branches 17972 17980 +8
==========================================
+ Hits 22555 22559 +4
- Misses 5131 5123 -8
- Partials 1861 1862 +1Generated by Codecov Action |
Addresses a Cursor Bugbot finding on the walkFiles migration: the root cwd frame is pushed directly rather than through maybeDescend (the only place inodes are recorded), so a descendant symlink pointing back at the scan root was followed once, re-listing the root subtree under a second path prefix. The pre-walkFiles DIF scanner avoided this by recording the root's realpath before reading children. Seed the root inode at walk init when followSymlinks is enabled to restore that behavior; the default (non-following) walk is unaffected.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

What
Replace the bespoke recursive directory walker in
src/lib/dif/scan.ts(
collectFiles) with the project's shared, well-testedwalkFileswalker(
src/lib/scan/) — the same traversal foundation the DSN code-scanner uses.This is a preliminary, self-contained refactor that the upcoming
debug-files uploadWASM-consumption features (embedded PPDB extraction,--il2cpp-mapping) will build on. It has no behavior change for users.Why
walkFilesgives parallelreaddir,dev:ino-based symlink-cycle detection(sturdier than the previous visited-
realpathset),AbortSignalsupport, andfar more test coverage than the hand-rolled walker.
Behavior preservation
walkFilesis policy-free, so its DSN-tuned defaults are explicitly overriddento match the existing debug-file scanning contract — debug files are large
binaries that routinely live in gitignored build output (
build/,target/, XcodeDerivedData):respectGitignorefalsealwaysSkipDirs[]maxFileSizeInfinityprepareDifs(format-before-size, drivesoversizedCount)followSymlinkstrueclassifyBinaryfalseThe deliberate format-before-size ordering that drives
PrepareDifsResult.oversizedCountis untouched and remains inprepareDifs.New shared-walker option:
classifyBinaryAdds
classifyBinary?: boolean(defaulttrue, preserving current behavior) toWalkOptions. Whenfalse, the walker skips the per-file ~8 KB NUL sniff usedto populate
WalkEntry.isBinaryand reportsisBinary: falsewithout readingfile contents. The DIF scanner ignores
isBinary, so it opts out to avoid anextra head-read per file on large binary trees. Also benefits other binary-tree
consumers (e.g. sourcemap discovery).
Tests
test/lib/scan/walker.test.ts:classifyBinary: falseyields a NUL blob withisBinary: falsewhile the default still reportstrue(proves the sniff isskipped).
test/lib/dif/scan.test.ts: newscanPathstraversal coverage — recursivediscovery, explicit-file passthrough, files under
.gitignore'd /node_modulesdirs are still found (regression guard for the swap),overlapping-arg dedupe, symlink-cycle safety, and ENOENT →
ValidationError.Verification
pnpm run typecheck,pnpm run lint,pnpm run check:depsclean;test/lib/scan+test/lib/difsuites pass (99 tests).