fix(input-handler): validate URLs against SSRF and add zip-slip protection#159
Conversation
|
Thanks for picking this up. I read the post-#159 One gap remains, and I think it is load-bearing. On the zip side I'd leave Happy to be wrong on the redirect point, push back with the hop you think revalidates if I missed one. Otherwise the allowlist and private-IP work is good and I'd like to see it land, with |
…ip-slip protection Three security hardening changes to the input handler: 1. SSRF protection: Both git clone and file download now validate the target hostname against an allowlist of known-safe hosts (GitHub, GitLab, Bitbucket, HuggingFace). URLs resolving to private/loopback IPs are blocked regardless of hostname. 2. Zip-slip prevention: Archive extraction now validates every member path against the extraction directory to prevent path traversal attacks (e.g., ../../etc/passwd entries). 3. Configurable allowlists: ALLOWED_GIT_HOSTS and ALLOWED_DOWNLOAD_HOSTS are module-level constants that can be extended for enterprise deployments with internal forges. Fixes NVIDIA#147
Set follow_redirects=False on the httpx download client. The URL allowlist (_validate_url_host) only checked the initial request URL, so a 3xx redirect from an allowed host to a private/metadata IP bypassed SSRF protection. Since we control the expected download URLs (GitHub raw, HuggingFace), legitimate redirects are not needed. Add test asserting follow_redirects=False is passed to httpx.Client. Credit to @wernerkasselman-au for identifying the redirect gap. Signed-off-by: mimran-khan <mohammed_imran.khan@outlook.com>
1d2baef to
f5305b9
Compare
|
Thanks for the thorough review @wernerkasselman-au — you're right about the redirect bypass, and the
Branch has been rebased on latest |
rng1995
left a comment
There was a problem hiding this comment.
Verdict: Approve — solid SSRF + zip-slip hardening that fails closed.
What's good
_validate_url_host(src/skillspector/input_handler.py, ~L69-88) enforces an exact-or-subdomain host allowlist and rejects hosts resolving to private/loopback/link-local/reserved IPs;_is_private_ip(~L34-49) fails closed on resolution failure (returnsTrue). Applied to both_clone_gitand_download_file.- Zip-slip guard checks each member before
extractall(~L117-123). - Nice test coverage incl. the cloud-metadata endpoint
169.254.169.254.
Non-blocking nits
- Zip-slip check uses
str(member_path).startswith(str(extract_dir.resolve()))(L119) — a string-prefix match that can false-pass a sibling dir like<extract_dir>-evil. PreferPath.is_relative_to. (No actual escape today sincezipfile.extractallalso strips.., but the check as written is weaker than it looks.) _is_git_urlstill uses substring membership (any(allowed in host ...), ~L60-61), sogithub.com.evil.comis classified as a git URL — though the strict_validate_url_hostblocks it downstream, so no exploit.
Coordination
This overlaps #164 (size bounds) and #163/#116 — all rewrite _download_file/_extract_zip. #159 adds SSRF but not size bounds; #164 adds size bounds but not SSRF. Whichever merges second must integrate the other's protection so the result has both SSRF allowlisting and ingest caps.
Tests
Good: private-IP detection, blocked internal/metadata/arbitrary hosts, allowed forges, zip-slip blocked + normal/nested zips fine. LGTM.
Summary
The input handler accepts arbitrary URLs for git clone and file download without validating the target host. This enables Server-Side Request Forgery (SSRF) — an attacker could craft a skill referencing internal endpoints (cloud metadata at 169.254.169.254, internal services at 10.x.x.x, etc.) to exfiltrate data or probe the network.
Additionally,
_extract_zipusesextractall()without path traversal checks, enabling zip-slip attacks where crafted archives write files outside the extraction directory.Changes
SSRF Protection
_is_private_ip()to detect loopback, private, link-local, and reserved addresses_validate_url_host()method enforcing allowlist + private IP check_clone_git()and_download_file()validate before network accessALLOWED_GIT_HOSTSandALLOWED_DOWNLOAD_HOSTSfrozensetsZip-Slip Prevention
_extract_zip()iterates all zip members and validates resolved paths stay within extraction directoryValueErrorwith clear message on path traversal attemptAllowlist Design
Testing
23 new tests covering:
Existing 6 input handler tests continue to pass.
Fixes #147