feat(CubeAPI): support e2b V3 template-build protocol end-to-end by chengjoey · Pull Request #422 · TencentCloud/CubeSandbox

chengjoey · 2026-06-02T07:03:53Z

Add V3 routes: POST /v3/templates, GET .../files/{hash}, POST /v2/.../builds/{bid}, GET .../status (handlers/templates_v3.rs)

feat: #74

cubesandboxbot · 2026-06-02T07:04:21Z

Claude finished @chengjoey's task in 1m 46s —— View job

Code Review: PR #422 — feat(CubeAPI): support e2b V3 template-build protocol

Overview

This PR adds comprehensive E2B V3 protocol support to CubeAPI, including a new OCI registry reverse proxy, in-memory build registry, V3 endpoint handlers (POST /v3/templates, GET files/{hash}, POST /v2/builds/{bid}, GET /status), and the readyCmd → probe translation layer. The implementation is well-structured with clear separation of handlers, services, and models. 21 files changed (~3,150 additions).

Issues Found

1. [Bug] V3 build status log entries all get Utc::now() as timestamp — templates.rs:822

In v3_get_build_status, when converting internal build log lines to V3BuildLogEntry, every entry gets chrono::Utc::now() as its timestamp instead of the original timestamp:

// templates_v3.rs / v3_get_build_status calls get_template_build_status
// which returns Vec<String> (formatted as "TIMESTAMP LINE"), then:
.logs.iter()
.map(|line| V3BuildLogEntry {
    timestamp: chrono::Utc::now(),  // always "now", not the original timestamp
    message: line.clone(),
    level: "info".to_string(),
})

The BuildContext already stores proper BuildLogLine { timestamp, line } structs. The fix is either (a) return structured data from get_template_build_status instead of stringified logs, or (b) have v3_get_build_status read from the BuildRegistry directly instead of going through the string path.

2. [Minor] GET /templates/{tid}/files/{hash} returns 201 CREATED — templates_v3.rs:41

Standard HTTP semantics dictate 200 OK for a successful GET. The 201 CREATED value here is non-standard. If intentional (e.g., required by the E2B SDK), this should be documented in a code comment.

3. [Concern] No rate limiting on V3 template routes — routes.rs:168-212

The build_template_routes function uses with_auth() (auth-only), while sandbox routes use with_auth_and_rate_limit(). The new V3 template build endpoints (POST /v3/templates, POST /v2/.../builds/{bid}) aren't rate-limited. This could be a DoS vector — an attacker could repeatedly trigger expensive CubeMaster pipeline dispatches without any per-key throttling.

4. [Concern] In-memory build registry grows unbounded — builds.rs:104

BuildRegistry wraps a DashMap with no eviction policy for completed/failed builds. The comment acknowledges this ("acceptable for a build flow that always reaches a terminal state within minutes"), but abandoned builds or builds whose clients never poll will leak memory indefinitely. Log lines are truncated at 10k, but the BuildContext structs (including full CreateTemplateRequest) are never removed. Consider a TTL-based eviction or limiting the total entry count.

5. [Testing gap] No integration tests for V3 protocol flow — templates.rs:1199+

Unit tests exist for the helpers (build_resources, merge_envs, parse_ready_url, build_probe, etc.) and for create_template_e2b_mode. However, the core V3 protocol methods have no test coverage:

v3_create_template
v3_trigger_build
v3_get_build_status

The route test (v3_template_build_routes_are_reachable in routes.rs:559) verifies routing + the create response envelope, but doesn't test the dispatch or status polling flow.

6. [Minor] Duplicate registry credential generation logic — templates.rs:192-208 and templates.rs:859-883

issue_registry_credential() (line 859) duplicates logic already present in create_template_e2b_mode() (lines 192-208). Both compute the credential URL, repository prefix, and password. create_template_e2b_mode could call issue_registry_credential() and then add its additional fields.

7. [Minor] cube:// URI in from_template resolution is unresolved — templates.rs:685

format!("cube://{}", parent)

The comment says "letting downstream callers resolve it," but nothing in this PR handles the cube:// scheme. If this is a forward-looking placeholder, consider noting it more explicitly (e.g., with a TODO or feature flag).

8. [Nit] build_registry_router ignores its _state parameter — routes.rs:316

The function signature uses _state: &AppState but the Router is built purely from path definitions. The registry handler (registry::proxy) later accesses state via State<AppState>, so the state IS needed at the merged router level. However, the underscore prefix is misleading — this function does indirectly use the state through the handlers.

Correct Design Decisions

Separate V3 handlers file (handlers/templates_v3.rs): Cleanly distinguishes the new E2B protocol surface from the existing CubeSandbox-native templates handlers.
parse_ready_url safety: The function explicitly rejects non-loopback hosts and URLs without explicit ports, preventing misdirected probes.
with_auth on template routes: New V3 routes inherit auth checking via the existing infrastructure.
In-memory build registry with DashMap: Appropriate for the expected short-lived build lifecycle, with dual-indexing (by build_id and {template_id}::{build_id}) for flexible lookups.
Hop-by-hop header stripping in registry proxy: The HOP_BY_HOP constant at registry.rs:20 correctly prevents proxy-specific headers from being forwarded.

Summary

Category	Count
Bugs	1 (timestamp)
API correctness	1 (GET→201)
Security/Scale	2 (rate limiting, memory eviction)
Testing	1 (missing V3 coverage)
Code quality	3 (duplication, unresolved URI, unused param)

The PR is fundamentally sound and well-structured. The timestamp bug (issue #1) is the only item that would produce incorrect runtime behavior for consumers of the status API. The missing rate limiting (issue #3) is worth addressing before shipping if this API is exposed to unconstrained clients.

fslongjin

I reviewed the V3 template-build implementation with a focus on correctness, architecture, and workaround risk. The main blockers are the current test compilation failure, the fake file-cache hit for unsupported context builds, and the registry proxy/auth model.

fslongjin · 2026-06-08T06:25:45Z

+    /// route exists and returns the V3 envelope.
+    #[tokio::test]
+    async fn v3_template_build_routes_are_reachable() {
+        let server = test_server();


This test currently does not compile because test_server() is async and returns a Future. Please await it before using the TestServer methods:

let server = test_server().await;

Right now cargo test fails with no method named post/get found for opaque type impl Future<Output = TestServer>, so this PR cannot pass the existing Rust test suite in its current state.

fslongjin · 2026-06-08T06:25:45Z

+    /// currently supports) don't need the build context.
+    pub fn v3_get_file_upload(&self, _template_id: &str, _files_hash: &str) -> AppResult<crate::models::V3TemplateFileUpload> {
+        Ok(crate::models::V3TemplateFileUpload {
+            present: true,


Always returning present: true makes the SDK believe the build context already exists server-side, so it will skip uploading the tarball. That is only safe for pure fromImage builds.

For Dockerfile/steps-based V3 builds this silently drops the build context, and the later trigger step may dispatch a registry image reference that was never produced. I think unsupported context builds should fail explicitly here, or this endpoint should return a real upload URL and persist/consume the uploaded context.

done， fromImage will succeed, Dockerfile/steps-based will point to 501

fslongjin · 2026-06-08T06:25:45Z

+            // letting downstream callers resolve it. Adjust to your local
+            // convention if needed.
+            format!("cube://{}", parent)
+        } else if !ctx.image_ref.is_empty() {


This fallback uses ctx.image_ref as long as it is non-empty, but that value is populated during POST /v3/templates; it does not prove that the client has actually pushed the image.

If neither fromImage nor a supported fromTemplate source is provided, we should verify that the registry push has completed before dispatching CubeMaster. Otherwise the API accepts the build and moves the failure to a later image-pull stage, which makes the error much harder to diagnose.

fslongjin · 2026-06-08T06:25:45Z

+            // synthesise a CubeMaster reference of the form `cube://<tid>`,
+            // letting downstream callers resolve it. Adjust to your local
+            // convention if needed.
+            format!("cube://{}", parent)


This introduces a cube://<template> source reference, but I do not see corresponding support in this PR for resolving that scheme downstream in CubeMaster/cubelet.

If the downstream stack does not already understand cube://, this path will look supported at the API layer but fail later during image resolution. Please either implement the resolution in this PR or reject fromTemplate with a clear 501 Not Implemented until the path is actually wired end-to-end.

done, Now the displayed return is 501

fslongjin · 2026-06-08T06:25:45Z

+        let log_entries: Vec<V3BuildLogEntry> = logs
+            .iter()
+            .map(|line| V3BuildLogEntry {
+                timestamp: chrono::Utc::now(),


These timestamps are generated at status-poll time, not at log-write time. That means the same historical log line will receive a new timestamp every time the client polls.

BuildContext already stores BuildLogLine { timestamp, line }, so the V3 conversion should preserve that structured timestamp instead of assigning Utc::now() here.

fslongjin · 2026-06-08T06:25:45Z

+    Router::new()
+        .route("/v2/", get(registry::ping))
+        .route("/v2", get(registry::ping))
+        .route("/v2/*path", any(registry::proxy))


The registry proxy is mounted without CubeAPI auth or rate limiting. I understand Docker registry auth is a different credential domain, but this implementation also does not validate the short-lived RegistryCredential issued by CubeAPI; it simply forwards requests to the upstream registry.

Given the one-click deployment points this at a default local registry, exposing CubeAPI could accidentally expose a public writable registry proxy. Please add credential validation/rate limiting here, or make the deployment contract require an authenticated upstream registry and document that clearly.

fslongjin · 2026-06-08T06:25:45Z

+        // → POST /v2/.../builds/{bid} → GET .../status. Routes are mounted
+        // at the same level as /templates so both `/v3/...` and `/v2/...`
+        // segments are reachable.
+        .route("/v3/templates", post(templates_v3::v3_create_template))


Template build routes are currently protected by auth only, while sandbox routes use with_auth_and_rate_limit.

The new V3 endpoints can allocate in-memory build contexts and trigger expensive CubeMaster template-build pipelines, so they should also be rate-limited or governed by a dedicated build quota.

V3 build-pipeline routes moved to with_auth_and_rate_limit.

fslongjin · 2026-06-08T06:25:45Z

+
+/// Thread-safe, in-process build registry.
+#[derive(Clone, Default)]
+pub struct BuildRegistry {


The in-memory build registry has no eviction policy. Completed, failed, and abandoned builds remain in the DashMap indefinitely, including the original create request, credentials, status fields, logs, and the duplicate pair index.

For a long-running CubeAPI process this can become an unbounded memory leak. Please add a TTL cleanup path for terminal builds and/or a maximum registry size.

BuildRegistry gains TTL + size-cap + background GC

fslongjin · 2026-06-08T06:25:45Z

+                    tag = %parsed.tag,
+                    "manifest pushed; marking build as image-pushed"
+                );
+                state.services.templates.mark_image_pushed(&parsed.tag);


This marks a build as image-pushed based only on the manifest tag. It does not verify that the repository path matches the expected <repo_prefix>/<templateID> for the build context.

The build IDs are UUID-like and hard to guess, but the proxy should still validate both repo and tag against the stored build context before advancing the build state.

done, mark_image_pushed add repo arg, cross-checks the manifest repo

fslongjin · 2026-06-08T06:25:45Z

+    ///   3. `body.from_template` — copy from another known CubeSandbox
+    ///      template (resolved via CubeMaster `get_template`).
+    ///
+    /// `start_cmd` becomes container `args`; `ready_cmd` becomes a Probe.Exec.


Several comments still say readyCmd is translated into Probe.Exec, but the actual implementation parses a loopback URL and creates an HTTP GET probe because CubeMaster/cubelet do not support exec probes here.

Please update the comments/API model docs to say Probe.HttpGet; otherwise future maintainers may assume exec-style ready commands are supported end-to-end.

Add V3 routes: POST /v3/templates, GET .../files/{hash}, POST /v2/.../builds/{bid}, GET .../status (handlers/templates_v3.rs) Signed-off-by: joeyczheng <joeyczheng@tencent.com>

Security: * Per-build short-lived push credentials (`bld_<random>` + 256-bit password) replace the global `_token` shared secret. * Registry reverse-proxy validates Basic auth against an in-memory credential index and enforces repo scoping; 401 with WWW-Authenticate + 403 on cross-build access. Per-credential rate limit added. * `mark_image_pushed` cross-checks the manifest repo, not just the tag. * Startup WARN when public bind meets unauthenticated loopback upstream. Resource bounds: * `BuildRegistry` gains TTL + size-cap + background GC; in-flight builds are never evicted. New `build_registry_*` config knobs. * Registry proxy streams request/response bodies end-to-end instead of buffering, so concurrent multi-GiB pushes no longer pin the heap. * V3 build-pipeline routes moved to `with_auth_and_rate_limit`. Correctness: * `image_pushed` flag is the single source of truth for "client really pushed"; OCI fallback no longer dispatches against an unpushed ref. * `fromTemplate` and `dockerfile`/`steps`-only builds are rejected with 501 until a real resolver/builder ships, instead of failing obscurely deeper in the pipeline. * `BuildLogLine.timestamp` is preserved across status polls. * Probe synthesis comments fixed: `readyCmd` -> `Probe.HttpGet`, not `Probe.Exec`. Signed-off-by: joeyczheng <joeyczheng@tencent.com>

cubesandboxbot · 2026-06-10T09:56:40Z

+    CredentialOutcome::Authenticated(ctx)
+}
+
+fn constant_time_eq_strings(a: &str, b: &str) -> bool {


constant_time_eq_strings — compiler may optimize away the timing-safe loop

When a.len() != b.len() (line 382), the function walks the longer slice XORing each byte with 0 (byte ^ 0) and discards the result (let _ = diff;). A sufficiently smart optimizer can identify diff as dead code and eliminate the loop entirely, defeating the constant-time purpose.

Consider either:

Using a proven constant-time comparison crate (e.g. subtle's ConstantTimeEq);

Writing the result through std::sync::atomic::compiler_fence or core::ptr::read_volatile to prevent elision.

The equal-length path (line 392-396) does use diff as the return value, so that path is fine — only the unequal-length path is affected.

cubesandboxbot · 2026-06-10T09:56:42Z

+                timestamp: Utc::now(),
+                line,
+            });
+            const MAX_LOGS: usize = 10_000;


Vec::drain(0..drop) is O(n) on every append past cap

Once the log exceeds MAX_LOGS (10,000), every new append_log call drains drop elements from the front and shifts all remaining entries left. With a tail capacity of 10K, this is ~10K element moves per append.

Consider using VecDeque instead — pop_front() is O(1), and the push/pop pattern is exactly what it's designed for. The log is append-only and indexed by offset, which VecDeque supports efficiently via make_contiguous when needed.

cubesandboxbot · 2026-06-10T09:56:43Z

+        self.get(&bid)
+    }
+
+    pub fn get(&self, build_id: &str) -> Option<BuildContext> {


get() clones the full BuildContext including the log buffer

Every status poll calls get() or get_by_pair(), which clones the entire BuildContext (line 294). With a log buffer of up to 10K entries, each clone allocates and copies all of them — even when the caller only reads non-log fields.

Consider one of:

Wrapping the DashMap value in Arc<BuildContext> — clone() becomes an atomic refcount increment instead of a deep copy;

Separating the log buffer into an Arc<Vec<BuildLogLine>> field so only the log pointer is cloned rather than the full vector.

cubesandboxbot · 2026-06-10T09:57:08Z

+    CredentialOutcome::Authenticated(ctx)
+}
+
+fn constant_time_eq_strings(a: &str, b: &str) -> bool {


constant_time_eq_strings — compiler may optimize away the timing-safe loop

When a.len() != b.len() (line 382), the function walks the longer slice XORing each byte with 0 (byte ^ 0) and discards the result (let _ = diff;). A sufficiently smart optimizer can identify diff as dead code and eliminate the loop entirely, defeating the constant-time purpose.

Consider either:

Using a proven constant-time comparison crate (e.g. subtle's ConstantTimeEq);

Writing the result through std::sync::atomic::compiler_fence or core::ptr::read_volatile to prevent elision.

The equal-length path (line 392-396) does use diff as the return value, so that path is fine — only the unequal-length path is affected.

cubesandboxbot · 2026-06-10T09:57:09Z

+                timestamp: Utc::now(),
+                line,
+            });
+            const MAX_LOGS: usize = 10_000;


Vec::drain(0..drop) is O(n) on every append past cap

Once the log exceeds MAX_LOGS (10,000), every new append_log call drains drop elements from the front and shifts all remaining entries left. With a tail capacity of 10K, this is ~10K element moves per append.

Consider using VecDeque instead — pop_front() is O(1), and the push/pop pattern is exactly what it's designed for. The log is append-only and indexed by offset, which VecDeque supports efficiently via make_contiguous when needed.

cubesandboxbot · 2026-06-10T09:57:10Z

+        self.get(&bid)
+    }
+
+    pub fn get(&self, build_id: &str) -> Option<BuildContext> {


get() clones the full BuildContext including the log buffer

Every status poll calls get() or get_by_pair(), which clones the entire BuildContext (line 294). With a log buffer of up to 10K entries, each clone allocates and copies all of them — even when the caller only reads non-log fields.

Consider one of:

Wrapping the DashMap value in Arc<BuildContext> — clone() becomes an atomic refcount increment instead of a deep copy;

Separating the log buffer into an Arc<Vec<BuildLogLine>> field so only the log pointer is cloned rather than the full vector.

chengjoey requested review from chenhengqi, fslongjin, ls-ggg, tinklone and up2wing as code owners June 2, 2026 07:03

fslongjin reviewed Jun 8, 2026

View reviewed changes

chengjoey added 2 commits June 10, 2026 17:49

feat(CubeAPI): support e2b V3 template-build protocol end-to-end

e655f59

Add V3 routes: POST /v3/templates, GET .../files/{hash}, POST /v2/.../builds/{bid}, GET .../status (handlers/templates_v3.rs) Signed-off-by: joeyczheng <joeyczheng@tencent.com>

chengjoey force-pushed the feat/e2b-template branch from 6e93954 to 8f8f6fa Compare June 10, 2026 09:49

cubesandboxbot Bot reviewed Jun 10, 2026

View reviewed changes

Uh oh!

Conversation

chengjoey commented Jun 2, 2026

Uh oh!

cubesandboxbot Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #422 — feat(CubeAPI): support e2b V3 template-build protocol

Overview

Issues Found

Correct Design Decisions

Summary

Uh oh!

fslongjin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chengjoey Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

cubesandboxbot Bot commented Jun 2, 2026 •

edited

Loading

chengjoey Jun 10, 2026 •

edited

Loading