Binary files not filtered leading to oversized diff and AI context overflow #11

@SunYanbox

Description

getFileDiff(), getFilesDiff(), and getBranchDiff() in src/gitService.ts do not filter out binary files before feeding the diff into AI generation. Git outputs only a short "Binary files ... differ" summary for files it recognizes as binary, but several scenarios still cause problems:

  1. Uncommon binary extensions (.whl, .blend, .bin, etc.) may be treated as text by Git, producing garbled binary content in the diff output
  2. Even the short "Binary files ... differ" line wastes token budget when many binary files are modified
  3. No .gitattributes is provided or detected at project level
  4. AI-powered features (generatePrContent, generateCommitMessage, generatePrDescription) depend on the diff from getFilesDiff / getBranchDiff, so garbled binary content consumes the context window and can push AI calls past their token limits
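
Scenario 1 is easier to reason about with Git's default check in mind. As far as I know, unless .gitattributes says otherwise, Git inspects file content and treats a file as binary when a NUL byte appears in roughly the first 8000 bytes. A binary format with no early NUL byte is then diffed as text. A minimal sketch of that heuristic follows; the function and constant names are illustrative and do not exist in this repo:

```typescript
// Illustrative only: approximates Git's default content check as described above.
// SNIFF_BYTES and looksBinaryLikeGit are hypothetical names, not project code.
const SNIFF_BYTES = 8000;

/** Returns true when a NUL byte occurs within the first SNIFF_BYTES bytes. */
function looksBinaryLikeGit(content: Uint8Array): boolean {
  const limit = Math.min(content.length, SNIFF_BYTES);
  for (let i = 0; i < limit; i++) {
    if (content[i] === 0) {
      return true;
    }
  }
  return false;
}
```

A file whose first NUL byte sits past that window would be classified as text by this check, which is one way garbled content can end up in the diff.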

Affected Code

  • src/gitService.ts: getFileDiff() (line 293), getFilesDiff() (line 788), getBranchDiff() (line 720)
  • src/aiService.ts: all generation functions consume diff output
  • src/inputService.ts: passes diff to AI without pre-filtering

Suggested Solution

  1. Built-in common binary extension denylist — skip these in getFilesDiff / getBranchDiff:
    .png, .jpg, .jpeg, .gif, .ico, .svg, .webp, .bmp, .tiff, .mp4, .avi, .mov, .wmv, .flv, .mkv, .mp3, .wav, .flac, .ogg, .aac, .wma, .zip, .tar, .gz, .7z, .rar, .exe, .dll, .so, .dylib, .wasm, .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .ttf, .otf, .woff, .woff2, .eot, .pyc, .class, .o, .a, .lib, .obj

  2. Detect binary content in diff output: check if the diff contains "Binary files ... differ" or "GIT binary patch" headers and skip those entries

  3. Optionally generate a default .gitattributes with common binary patterns, or prompt users to set one up

  4. Filter at diff collection entry points (getFilesDiff / getBranchDiff) so downstream callers don't need to worry about it
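
To make suggestions 1, 2, and 4 concrete for discussion, here is a rough sketch of a single post-processing step over the raw diff text. It assumes the diff functions return one unified-diff string with standard "diff --git a/... b/..." headers. Every name below is hypothetical, the extension set is only a subset of the list above, and paths that Git quotes (spaces, non-ASCII) are not handled:

```typescript
// Hypothetical helpers for discussion; none of these exist in src/gitService.ts.

// Subset of the denylist from suggestion 1; extend with the full list as needed.
const BINARY_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".gif", ".ico", ".webp", ".zip", ".gz", ".exe",
  ".dll", ".so", ".wasm", ".pdf", ".woff", ".woff2", ".pyc", ".class",
]);

/** True when the file name's final extension is on the denylist. */
function hasBinaryExtension(filePath: string): boolean {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return BINARY_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

/** Splits a multi-file diff into per-file sections at each "diff --git" header. */
function splitDiffSections(diff: string): string[] {
  return diff.split(/^(?=diff --git )/m).filter((s) => s.trim() !== "");
}

/** Extracts the "b/" path from a section header, or null if it cannot be parsed. */
function sectionPath(section: string): string | null {
  const m = /^diff --git a\/.* b\/(.*?)\r?$/m.exec(section);
  return m && m[1] !== undefined ? m[1] : null;
}

/** True when Git itself marked this section as binary (suggestion 2). */
function hasBinaryMarker(section: string): boolean {
  return (
    /^Binary files .* differ\r?$/m.test(section) ||
    /^GIT binary patch\r?$/m.test(section)
  );
}

/** Drops binary sections and reports which paths were skipped. */
function filterBinaryFromDiff(diff: string): { diff: string; skipped: string[] } {
  const kept: string[] = [];
  const skipped: string[] = [];
  for (const section of splitDiffSections(diff)) {
    const filePath = sectionPath(section);
    const denied = filePath !== null && hasBinaryExtension(filePath);
    if (denied || hasBinaryMarker(section)) {
      skipped.push(filePath !== null ? filePath : "(unknown path)");
    } else {
      kept.push(section);
    }
  }
  return { diff: kept.join(""), skipped };
}
```

If getFilesDiff / getBranchDiff ran their output through something like filterBinaryFromDiff before returning, downstream callers would stay unchanged, and the skipped list could be summarized in one short line for the AI prompt instead of one line per binary file.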

No implementation required for this issue — design discussion and solution architecture only.

Metadata

Labels: enhancement (New feature or request)
