Ingest layer (URL / zip / git clone) is unbounded upstream of the 50 MiB per-file gate #131

## Summary

The per-file read cap added by the bounded-reads work (50 MiB, enforced in `build_context._validate_file_sizes` and the analyzer guards) sits downstream of `InputHandler.resolve()`, which pulls the scan target from a URL, zip, or git clone with no size budget of its own. A large download or a decompression bomb exhausts memory or disk before the per-file gate runs, so the 50 MiB guarantee does not hold for url/zip/git inputs.

## Specifics (`src/skillspector/input_handler.py`)

- `_download_file` (157-159): `client.get(url)` followed by `response.content` buffers the whole body in memory, with no streaming and no Content-Length or byte-count limit. A multi-GB URL is a memory DoS.
- `_extract_zip` (182): `zf.extractall(extract_dir)` with no uncompressed-size or file-count budget. A zip bomb fills the disk. (zipfile sanitises member names, so this is not Zip Slip / path traversal.)
- `_clone_git` (131): `git clone --depth 1` has a 60s timeout but no post-clone size cap; a large shallow repo still lands on disk.
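
To make the `_download_file` problem concrete, here is a minimal sketch of a byte ceiling applied to a streamed body. It is not the project's code: `read_capped`, `DownloadTooLarge`, and `MAX_DOWNLOAD_BYTES` are hypothetical names, the ceiling value is arbitrary, and the helper takes a plain iterable of byte chunks because the issue does not say which HTTP client `client` is.

```python
from typing import Iterable

# Hypothetical ceiling for illustration; not a value taken from the project.
MAX_DOWNLOAD_BYTES = 200 * 1024 * 1024


class DownloadTooLarge(Exception):
    """Raised when a response body exceeds the configured byte ceiling."""


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_DOWNLOAD_BYTES) -> bytes:
    """Accumulate chunks, aborting before the running total passes max_bytes."""
    buf = bytearray()
    for chunk in chunks:
        # Check before appending so the buffer never grows past the ceiling.
        if len(buf) + len(chunk) > max_bytes:
            raise DownloadTooLarge(f"response body exceeds {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)
```

If the client turns out to be httpx, the chunks could come from `response.iter_bytes()` inside a `client.stream("GET", url)` block; with requests, from `response.iter_content(chunk_size=...)` on a `stream=True` request. A Content-Length pre-check can reject obvious cases early, but the header may be absent or wrong, so the running byte count is the check that matters. Writing chunks to a temporary file instead of a `bytearray` would also keep memory flat below the ceiling.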

## Suggested direction

- Stream URL downloads with a hard byte ceiling, and abort as soon as it is exceeded.
- Bound zip extraction by total uncompressed size and member count (check `ZipInfo.file_size` before extracting).
- Cap clone size (a disk-usage check after clone, or a partial-clone filter), and report the cap to the user when it is hit.
- Document in the README that 50 MiB is a per-file analysis limit, not an ingest limit.
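
The zip and clone bullets above can be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: `extract_bounded`, `tree_size`, `ArchiveTooLarge`, and both limit constants are hypothetical, and the limit values are placeholders.

```python
import zipfile
from pathlib import Path

# Hypothetical budgets for illustration; not values taken from the project.
MAX_TOTAL_BYTES = 500 * 1024 * 1024
MAX_MEMBERS = 10_000


class ArchiveTooLarge(Exception):
    """Raised when an archive exceeds the member-count or size budget."""


def extract_bounded(zip_path, extract_dir,
                    max_total_bytes=MAX_TOTAL_BYTES,
                    max_members=MAX_MEMBERS):
    """Extract a zip only if its declared contents fit within both budgets."""
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()
        if len(infos) > max_members:
            raise ArchiveTooLarge(f"{len(infos)} members exceeds {max_members}")
        # ZipInfo.file_size is the declared uncompressed size of each member.
        declared = sum(info.file_size for info in infos)
        if declared > max_total_bytes:
            raise ArchiveTooLarge(f"{declared} bytes exceeds {max_total_bytes}")
        zf.extractall(extract_dir)


def tree_size(root) -> int:
    """Total size in bytes of regular files under root (e.g. a fresh clone)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
```

Both checks run before any bytes are written, so a rejected archive leaves nothing on disk. Because `file_size` is read from the archive's own headers, calling `tree_size` on the extraction directory afterwards is a reasonable second layer, and the same helper could serve as the post-clone disk-usage check for `_clone_git`.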

## Tests missing

Coverage is missing for the zip-bomb, oversized-URL, oversized-clone, and single-file fail-closed cases.
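
For the zip-bomb case, a test fixture does not need a real multi-GB payload. The sketch below builds a small, highly compressible archive in memory; `make_compressible_zip` and `declared_size` are hypothetical helper names, and a real test would feed the resulting bytes to whatever bounded extraction path the fix introduces.

```python
import io
import zipfile


def make_compressible_zip(uncompressed_bytes: int, members: int = 1) -> bytes:
    """Build an in-memory zip whose members are runs of zero bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(members):
            # Zero-filled payloads deflate to a tiny fraction of their size.
            zf.writestr(f"zeros_{i}.bin", b"\0" * uncompressed_bytes)
    return buf.getvalue()


def declared_size(archive: bytes) -> int:
    """Sum of the uncompressed sizes recorded in the archive's headers."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return sum(info.file_size for info in zf.infolist())
```

A test can then set the extraction budget below `declared_size(archive)` and assert that the ingest path raises without writing to the target directory, while the archive itself stays small enough to generate on every run.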

Surfaced during review of the bounded-reads PR (#19) by @rng1995.

