Skip to content

feat(debug-files): scan inside .zip archives on upload#1141

Open
BYK wants to merge 3 commits into
mainfrom
byk/feat/debug-files-upload-zips
Open

feat(debug-files): scan inside .zip archives on upload#1141
BYK wants to merge 3 commits into
mainfrom
byk/feat/debug-files-upload-zips

Conversation

@BYK

@BYK BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Adds ZIP archive scanning to sentry debug-files upload, matching the legacy sentry-cli behavior (try_open_zip / walk_difs_zip). This is Tier B of the debug-files upload roadmap (Tier A core upload landed in #1139, limits + --derived-data in #1140).

.zip files encountered during a scan are now expanded in memory and their entries run through the same filter pipeline (--type/--id/feature filters, size gate) as on-disk files. Pass --no-zips to skip them.

# .zip archives are scanned in place by default
sentry debug-files upload ./symbols.zip
sentry debug-files upload ./build            # finds DIFs in any *.zip under ./build

# opt out
sentry debug-files upload ./build --no-zips

Implementation

  • New src/lib/dif/zip.tsreadZipDifEntries():
    • Detects archives by .zip extension and PK local-header magic (a misnamed non-archive falls back to normal parsing; a real DIF named foo.zip still works).
    • Extracts entries with fflate.unzipSync.
    • Zip-bomb guard: the pre-decompression filter rejects entries whose declared uncompressed size exceeds the server's max_file_size (or omitted = no gate) before inflation, so oversized entries are never materialized. Directory/empty entries are dropped.
    • Malformed/unreadable archives are skipped (debug log), never thrown.
    • Nested archives are not recursed (a .zip inside a .zip is an opaque, non-object entry), matching the legacy tool.
  • prepareDifs() gains a scanZips option (default true). The on-disk per-file logic is extracted into prepareFileDif(), and the parse step into a shared difFromBuffer() so on-disk files and in-memory ZIP entries share the same parse + filter path. Entry display paths are synthetic "<zip>/<entry>" so logs and DIF names stay meaningful.
  • upload command — adds the --no-zips flag and threads scanZips: !flags["no-zips"] through.

Notes / scope

  • fflate added as a devDependency (bundled at build time; check:deps passes).
  • The whole archive is read into memory (fflate's unzipSync is non-streaming); typical symbol zips are tens/hundreds of MB. Per-entry decompression remains bounded by the size gate.
  • Still deferred (Tier C): --symbol-maps (BCSymbolMap) and --il2cpp-mapping line mappings — these need new symbolic WASM exports.

Testing

  • test/lib/dif/zip.test.ts — 12 unit tests (magic/extension detection, malformed input, size gate, no-recursion, scanZips toggle, --type filtering, container not double-counted, misnamed-archive fallback). Archives built in memory with fflate.zipSync — no binary fixtures.
  • test/commands/debug-files/upload.test.ts — 2 command-level tests (.zip scanned by default; --no-zips ignores entries).
  • All 99 dif + debug-files tests pass; typecheck, lint, check:deps, check:patches, check:fragments clean.

Add ZIP archive scanning to `debug-files upload`, matching the legacy
sentry-cli behavior. `.zip` files encountered during a scan are expanded
in memory and their entries run through the same filter pipeline as
on-disk files. Disable with `--no-zips`.

- New `src/lib/dif/zip.ts`: `readZipDifEntries()` detects archives by
  `.zip` extension + `PK` magic, extracts entries via `fflate.unzipSync`,
  and bounds decompression with a pre-decompression size filter
  (zip-bomb guard). Directory and empty entries are dropped; nested
  archives are not recursed.
- `prepareDifs()` gains a `scanZips` option (default true); the per-file
  path is extracted into `prepareFileDif()` and the parse step into the
  shared `difFromBuffer()` so on-disk and in-memory entries share logic.
- `upload` command: add `--no-zips` flag; thread `scanZips` through.

Tests: 12 zip unit tests + 2 command wiring tests. Docs/skill updated.
@github-actions github-actions Bot added the risk: high PR risk score: high label Jun 25, 2026
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://cli.sentry.dev/_preview/pr-1141/

Built to branch gh-pages at 2026-06-25 20:50 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Comment thread src/lib/dif/zip.ts
Comment on lines +129 to +138
if (isDirectoryEntry(file.name) || file.originalSize === 0) {
return false;
}
if (maxFileSize > 0 && file.originalSize > maxFileSize) {
oversizedCount += 1;
log.warn(
`Skipping ${path}/${file.name}: uncompressed size ${file.originalSize} exceeds maximum file size ${maxFileSize}`
);
return false;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The zip-bomb guard is bypassed for streaming ZIPs because file.originalSize can be undefined, allowing oversized files to be decompressed and risking memory exhaustion.
Severity: HIGH

Suggested Fix

Explicitly check if file.originalSize is a number before performing size comparisons. Reject files where file.originalSize is undefined or not a number to prevent them from being decompressed. For example, modify the filter condition to if (typeof file.originalSize !== 'number' || file.originalSize > maxFileSize) { ... } to safely handle entries with an unknown original size.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/lib/dif/zip.ts#L129-L138

Potential issue: The zip-bomb guard in `src/lib/dif/zip.ts` can be bypassed when
processing streaming ZIP archives. For these archives, the `fflate` library's
`UnzipFileInfo.originalSize` property is `undefined`. The existing checks,
`file.originalSize === 0` and `file.originalSize > maxFileSize`, both evaluate to
`false` in this scenario. This allows an entry of any size within a streaming ZIP to
bypass the size validation, leading to it being fully decompressed into memory. A
malicious actor could exploit this to cause a memory exhaustion denial-of-service by
uploading a crafted streaming ZIP file.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment thread src/lib/dif/scan.ts
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Codecov Results 📊

✅ Patch coverage is 89.61%. Project has 5110 uncovered lines.
✅ Project coverage is 81.52%. Comparing base (base) to head (head).

Files with missing lines (2)
File Patch % Lines
src/lib/dif/scan.ts 89.47% ⚠️ 4 Missing and 2 partials
src/lib/dif/zip.ts 89.74% ⚠️ 4 Missing
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    81.48%    81.52%    +0.04%
==========================================
  Files          396       397        +1
  Lines        27588     27651       +63
  Branches     17912     17950       +38
==========================================
+ Hits         22478     22541       +63
- Misses        5110      5110         —
- Partials      1866      1866         —

Generated by Codecov Action

Comment thread src/lib/dif/scan.ts
Source bundles (and JVM bundles / .src.zip) are a ZIP preceded by an
8-byte SYSB+version header, so they start with SYSB, not PK. Verify
readZipDifEntries returns null for them so they upload as sourcebundle
DIFs rather than being expanded into their inner source files.

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4a41f2e. Configure here.

Comment thread src/lib/dif/zip.ts
@BYK

BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

CI status: all green except an external Anthropic API outage

Every check passes except E2E Tests, which fails only on test/e2e/skill-eval.test.ts (the LLM-driven skill eval). The failure is an upstream outage, not a code issue:

[planner] API error: Invalid response body while trying to fetch
https://api.anthropic.com/v1/messages: Premature close
  • All 126 non-LLM E2E tests pass; only the 2 skill-eval cases fail, both with the same Premature close error talking to api.anthropic.com.
  • Reproduced identically across 3 runs over ~25 min — api.anthropic.com is currently flaky/down.
  • Green here: Lint & Typecheck, Unit Tests, Validate generated files, Build npm (22/24), Build Binary, Build Docs, warden, CodeQL, dependency-review.

This PR does not touch the skill or any LLM path. Will re-run once the Anthropic API recovers.

…ession

Node 24.17.0 / 22.23.0 (CVE-2026-48931 http.Agent fix) added a `data`
listener on idle sockets that makes keep-alive fetch reuse throw false
ERR_STREAM_PREMATURE_CLOSE errors. The skill-eval E2E planner hits this
as 'Premature close' talking to api.anthropic.com. Fixed in 24.18.0
(nodejs/node#64004). A floating `node-version: "24"` silently reuses the
runner's pre-cached buggy 24.17.0, so pin the exact patched version.
@BYK

BYK commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

Correction: the Premature close E2E failure was not an Anthropic outage — it's the Node.js ERR_STREAM_PREMATURE_CLOSE regression from the CVE-2026-48931 security release (24.17.0 / 22.23.0), the same bug we hit in Craft/GCS. The http.Agent fix added a data listener on idle sockets that makes keep-alive fetch reuse throw a false Premature close, which the skill-eval planner trips. Fixed in 24.18.0 (nodejs/node#64004).

The E2E job used floating node-version: "24", which silently reuses the runner's pre-cached buggy 24.17.0. Pinned it to 24.18.0 in d7175e2. Re-running now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

risk: high PR risk score: high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant