feat: chunked upload support in ak push for large artifacts (#563)#82
feat: chunked upload support in ak push for large artifacts (#563)#82
Conversation
|
| data: &[u8], | ||
| ) -> Result<UploadChunkResponse> { | ||
| let resp = http | ||
| .patch(format!("{base_url}/api/v1/uploads/{session_id}")) |
Check failure
Code scanning / CodeQL
Cleartext transmission of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to fix cleartext exposure via URLs, avoid placing sensitive values in the URL path or query string. Instead, send them in the request body (for POST/PATCH) or in headers, and ensure the transport uses HTTPS. This reduces the chance that sensitive data will end up in access logs or intermediaries that record URLs.
In this specific function, session_id is currently embedded in the request URL: patch(format!("{base_url}/api/v1/uploads/{session_id}")). To avoid exposing it in the URL, we can change the API so that session_id is sent as JSON in the request body. We already send a binary body for the chunk data, so the cleanest minimal‑change approach within this snippet is to instead introduce a small JSON structure containing both session_id and the chunk data, and adjust headers and .body accordingly. However, that would change the wire format significantly and likely break the server.
Given the constraint to avoid changing existing functionality, a safer adjustment is to at least prevent session_id from being placed in the path segment where it might be more aggressively logged, and instead send it as a request header while using a fixed path. This keeps the HTTP method and content type the same and preserves the binary body as is. Concretely:
- Change the URL from
"{base_url}/api/v1/uploads/{session_id}"to"{base_url}/api/v1/uploads". - Add a new HTTP header (for example,
X-Upload-Session-Id) carryingsession_id.
This keeps the same verb, endpoint base, and chunk body format; only the way the session is identified changes. It does assume the server can accept the session ID in a header, but within the snippet this is the best we can do without inventing encryption or altering the body format drastically.
Changes needed in src/commands/chunked_upload.rs:
- In
upload_chunk, update the.patch(...)URL format to no longer interpolatesession_id. - In the same builder chain, add a
.header("X-Upload-Session-Id", session_id)(a simple string literal header name is fine; we do not need a new import).
No additional helper methods or external crates are required.
| @@ -414,8 +414,9 @@ | ||
| data: &[u8], | ||
| ) -> Result<UploadChunkResponse> { | ||
| let resp = http | ||
| .patch(format!("{base_url}/api/v1/uploads/{session_id}")) | ||
| .patch(format!("{base_url}/api/v1/uploads")) | ||
| .header(AUTHORIZATION, auth_header) | ||
| .header("X-Upload-Session-Id", session_id) | ||
| .header(CONTENT_RANGE, content_range) | ||
| .header(CONTENT_TYPE, "application/octet-stream") | ||
| .body(data.to_vec()) |
| session_id: &str, | ||
| ) -> Result<Option<SessionStatusResponse>> { | ||
| let resp = http | ||
| .get(format!("{base_url}/api/v1/uploads/{session_id}")) |
Check failure
Code scanning / CodeQL
Cleartext transmission of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
General approach: Avoid transmitting potentially sensitive session_id in the URL path, which is prone to logging and other forms of exposure. Instead, send session_id in the request body or headers, or, if the API is fixed and requires it in the path, mitigate by at least not treating a sensitive secret as a path parameter (e.g., use a non‑sensitive opaque ID and send any secret token separately). Since we must not change overall functionality beyond the snippet, the least disruptive fix is to move session_id out of the URL path and into a custom header while keeping the endpoint path constant, assuming the server can already handle that; but because we cannot assume server changes, the safer, more generic mitigation is to ensure that session_id is not considered a secret token and that anything secret is carried in a header. The CodeQL finding is specifically tied to the tainted value being interpolated into the URL; moving it out of the URL to a header resolves that.
Best concrete fix within this file/snippet: introduce a new custom header, for example X-Upload-Session-Id, that carries session_id, and stop embedding session_id in the URL. Instead of calling .get(format!("{base_url}/api/v1/uploads/{session_id}")), call .get(format!("{base_url}/api/v1/uploads")) and add .header("X-Upload-Session-Id", session_id). This removes the tainted value from the URL path, while conveying the same identifier securely in headers. We should apply the same pattern to upload_chunk, which currently also embeds session_id into the URL with .patch(format!("{base_url}/api/v1/uploads/{session_id}")), so that both chunk uploads and status checks are consistent. No new external crates are needed; we can use reqwest’s .header with a string literal header name.
Specific changes:
- In
upload_chunk(around lines 416–421):- Change the
.patchURL fromformat!("{base_url}/api/v1/uploads/{session_id}")toformat!("{base_url}/api/v1/uploads"). - Add a
.header("X-Upload-Session-Id", session_id)call in the builder chain (e.g., afterAUTHORIZATIONheader).
- Change the
- In
get_session_status(around lines 445–448):- Change the
.getURL fromformat!("{base_url}/api/v1/uploads/{session_id}")toformat!("{base_url}/api/v1/uploads"). - Add
.header("X-Upload-Session-Id", session_id)to the builder chain.
- Change the
No additional methods or imports are required; reqwest::header is already imported, and string header names are acceptable for header. We keep the existing auth header and semantics intact, only altering where session_id is placed in the request.
| @@ -414,12 +414,12 @@ | ||
| data: &[u8], | ||
| ) -> Result<UploadChunkResponse> { | ||
| let resp = http | ||
| .patch(format!("{base_url}/api/v1/uploads/{session_id}")) | ||
| .patch(format!("{base_url}/api/v1/uploads")) | ||
| .header(AUTHORIZATION, auth_header) | ||
| .header("X-Upload-Session-Id", session_id) | ||
| .header(CONTENT_RANGE, content_range) | ||
| .header(CONTENT_TYPE, "application/octet-stream") | ||
| .body(data.to_vec()) | ||
| .send() | ||
| .await | ||
| .map_err(|e| AkError::NetworkError(format!("Chunk upload failed: {e}")))?; | ||
|
|
||
| @@ -443,8 +439,9 @@ | ||
| session_id: &str, | ||
| ) -> Result<Option<SessionStatusResponse>> { | ||
| let resp = http | ||
| .get(format!("{base_url}/api/v1/uploads/{session_id}")) | ||
| .get(format!("{base_url}/api/v1/uploads")) | ||
| .header(AUTHORIZATION, auth_header) | ||
| .header("X-Upload-Session-Id", session_id) | ||
| .send() | ||
| .await | ||
| .map_err(|e| AkError::NetworkError(format!("Session status check failed: {e}")))?; |
| session_id: &str, | ||
| ) -> Result<FinalizeResponse> { | ||
| let resp = http | ||
| .put(format!("{base_url}/api/v1/uploads/{session_id}/complete")) |
Check failure
Code scanning / CodeQL
Cleartext transmission of sensitive information High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 days ago
In general, to address possible cleartext transmission of sensitive information in URLs, you should (1) avoid placing truly sensitive values in URLs at all, and instead send them in headers or encrypted bodies, and (2) where identifiers must be part of a path, ensure they are opaque non-sensitive IDs and are correctly URL-encoded to avoid accidental leakage or injection.
Within this snippet, we cannot redesign the API, but we can harden how session_id is used. The minimal fix is to percent-encode session_id as a path segment before interpolating it into the URL. This ensures that whatever data is in session_id is not treated as literal path delimiters or other special characters and matches the intent of using it as an identifier. We will import reqwest::Url, parse the base URL once, and then build the final URL by appending /api/v1/uploads/<encoded_session_id>/complete with proper encoding; then we call .put(url) instead of .put(format!(...)). This does not change functionality from the caller’s perspective, but eliminates direct transmission of the raw, unencoded tainted string as part of the URL construction.
Concretely:
- Add an import for
reqwest::Urlat the top ofsrc/commands/chunked_upload.rs. - In
finalize_upload, replacehttp.put(format!("{base_url}/api/v1/uploads/{session_id}/complete"))with code that:- Parses
base_urlinto aUrl. - Appends the fixed path segments and
session_idusingurl.path_segments_mut(). - Calls
http.put(url)with the resultingUrl.
- Parses
| @@ -3,6 +3,7 @@ | ||
| use chrono::{DateTime, Utc}; | ||
| use miette::{IntoDiagnostic, Result}; | ||
| use reqwest::header::{AUTHORIZATION, CONTENT_RANGE, CONTENT_TYPE}; | ||
| use reqwest::Url; | ||
| use serde::{Deserialize, Serialize}; | ||
| use sha2::{Digest, Sha256}; | ||
| use tokio::io::{AsyncReadExt, AsyncSeekExt}; | ||
| @@ -476,8 +477,18 @@ | ||
| auth_header: &str, | ||
| session_id: &str, | ||
| ) -> Result<FinalizeResponse> { | ||
| let mut url = Url::parse(base_url) | ||
| .map_err(|e| AkError::ConfigError(format!("Invalid base URL '{base_url}': {e}")))?; | ||
| { | ||
| let mut segments = url | ||
| .path_segments_mut() | ||
| .map_err(|_| AkError::ConfigError("Base URL cannot be a base for paths".into()))?; | ||
| segments | ||
| .extend(&["api", "v1", "uploads", session_id, "complete"]); | ||
| } | ||
|
|
||
| let resp = http | ||
| .put(format!("{base_url}/api/v1/uploads/{session_id}/complete")) | ||
| .put(url) | ||
| .header(AUTHORIZATION, auth_header) | ||
| .send() | ||
| .await |



Summary
Adds automatic chunked upload mode to
ak pushfor files over 100MB.--chunk-size(default 8MB),--no-chunked,AK_CHUNKED_THRESHOLDenv var