feat(templates): E2B-style display names with CubeMaster name cache by HuChundong · Pull Request #607 · TencentCloud/CubeSandbox

HuChundong · 2026-06-22T06:23:42Z

Summary

Add optional human-readable template name (CubeMaster display_name) with cluster-wide case-insensitive uniqueness (20260623051112_template_display_name_index.sql).
Web: list/show display names; PATCH rename on template detail; optional name on create.
API: POST /sandboxes and template paths accept display name or tpl-* id; CubeAPI resolves name→id once at entry, then uses existing tpl-id paths.
CubeMaster: GET /cube/template/lookup, POST /cube/template/display-name; sync name reservation at submit (ReserveTemplateDisplayNameForCreate → 409); release name after failed builds.
Name cache (CubeMaster): display_name → template_id cached in templatecenter (300s TTL, 4096 entries, singleflight); invalidated on rename/delete/create. CubeAPI is a thin client with no local cache.

Architecture

Layer	Approach
CubeAPI	Thin resolver; delegates all name lookups to CubeMaster
CubeMaster	Owns `display_name`, lookup cache, and invalidation
Uniqueness	Global per cluster (no tenant/team namespace today)

Compared to E2B infra: E2B uses team-scoped aliases reserved in RegisterBuild; we use global display_name_key + sync submit check. We release names on build failure (E2B keeps aliases until template delete).

Test plan

go test ./pkg/templatecenter/...
go test ./pkg/service/httpservice/cube/...
cargo test -p cube-api
go test ./pkg/base/dao/migrate/ -run TestMigrationFilenames
Apply migration 20260623051112_template_display_name_index on staging (run audit queries in migration header first)
Create template with name → create sandbox by name when READY
Duplicate name at submit → 409; failed build → name reusable

Migration note

This migration may clear legacy tpl-/snap- prefixed names and dedupe display names before adding the unique index. Run preflight audit queries on staging before production.

Assisted-by: Cursor:Composer

kinwin-ustc · 2026-06-22T07:56:50Z

You can check if PR #612 meets your requirements? The scale of the changes is much smaller

cubesandboxbot · 2026-06-22T08:43:46Z

+
+        match self
+            .fetch_resolved_template(&template_ref, &template_id)
+            .await


Redundant second fetch_resolved_template call

update_template_name already calls fetch_resolved_template on line 144 to get the existing template metadata (including the old display_name for cache invalidation). After the successful update, it calls fetch_resolved_template again on line 164 solely to build the response. This adds an unnecessary cross-service HTTP roundtrip + DB query.

The response could be constructed from the already-fetched existing record (which was returned just 20 lines earlier), substituting the new name for display_name. Consider eliminating this second fetch to reduce latency — especially since the CubeMaster PATCH update already returned a success envelope, so the template definition is expected to still exist.

cubesandboxbot · 2026-06-22T08:43:47Z

+            .await?;
+        let existing = self
+            .fetch_resolved_template(&template_ref, &template_id)
+            .await?;


Extra fetch_resolved_template purely for display_name cache invalidation

delete_template calls fetch_resolved_template solely to retrieve the old display_name for cache invalidation (line 228: for display_name in template_names(&existing.display_name)). This triggers an HTTP roundtrip to CubeMaster + a DB query just to get a single string field that's already available in CubeMaster's response to the lookup endpoint.

Consider having GET /cube/template/lookup also return display_name alongside template_id, so the single lookup call serves both the ID resolution and the cache invalidation data.

cubesandboxbot · 2026-06-22T08:43:48Z

+	return normalized, nil
+}
+
+func mapDefinitionCreateDuplicateError(err error, displayName string) error {


Raw database errors propagated to API responses

mapDefinitionCreateDuplicateError returns err unmodified when the error message does not contain "1062" or "Duplicate entry". This means transient MySQL errors (e.g. deadlocks, connection timeouts, schema mismatches) flow through to the API response's RetMsg field verbatim (via handleTemplateLookupAction line 118 and handleTemplateDisplayNameAction line 168).

While these are authenticated endpoints, exposing raw DB error texts can reveal internal schema details, engine versions, and connection strings. Consider wrapping non-duplicate errors into a generic internal-error message before returning to clients, while logging the raw error server-side.

cubesandboxbot · 2026-06-22T08:43:49Z

+    error::{AppError, AppResult},
+};
+
+const CACHE_TTL: Duration = Duration::from_secs(60);


Cache TTL of 60 seconds is unnecessarily short

Template display names are effectively immutable after creation — they only change through the explicit update_template_name endpoint, which triggers targeted cache invalidation. Every 60s cache miss forces a cross-service HTTP call to CubeMaster's /cube/template/lookup endpoint (which queries the DB).

Since inline invalidation already handles renames on the originating replica, a TTL of 300-600s would dramatically reduce CubeMaster load without increasing the staleness window beyond what the existing invalidation already provides. Multi-replica deployments already tolerate up to one TTL window of staleness; 60s is unnecessarily tight for a mapping that changes at human timescales.

cubesandboxbot · 2026-06-22T08:43:49Z

+//
+
+//! Template name → templateID resolution with an in-memory cache.
+//!


Module doc comment omits several routes using name resolution

The doc comment lists POST /sandboxes, GET/PATCH/DELETE /templates/{templateID}, and compat-adopt as the only routes with name resolution. However, the TemplateService also resolves names for POST /templates/:templateID (rebuild), POST /templates/:templateID/builds/:buildID (start build), and GET /templates/:templateID/builds/:buildID/status (get build status). A developer reading this comment would miss roughly half the affected endpoints. Recommend updating the comment to reflect the full set of routes.

HuChundong · 2026-06-22T08:49:57Z

You can check if PR #612 meets your requirements? The scale of the changes is much smaller

These two PRs overlap slightly, but they have completely different priorities. This PR focuses on syncing the E2B API, making template-based sandbox creation the core feature. Displaying names on the WebUI serves little purpose if those names can't actually be used to spin up sandboxes.

HuChundong

Automated review of b297013 — see inline notes for actionable items.

HuChundong · 2026-06-22T10:17:42Z

+            });
+        }
+        let key = reference.to_ascii_lowercase();
+        if let Some(template_id) = self.cache_get(&key).await {


Multi-replica stale cache on destructive paths

get_template_build_status re-validates cache hits via fetch_resolved_template, but create_sandbox, delete_template, and update_template_name trust cached name→ID mappings for up to 300s on replicas that did not handle the rename/delete.

After a name is freed or reassigned, other CubeAPI replicas can still route POST /sandboxes or DELETE/PATCH to the old template. Consider reusing the build-status revalidation pattern (or a lightweight lookup) on cache hits for mutating endpoints.

HuChundong · 2026-06-22T10:17:42Z

+            .template_names
+            .resolve_template_ref(&template_ref)
+            .await?;
+        let names_for_invalidation = if TemplateNameCache::needs_resolution(&template_ref) {


Service-layer tests missing for name-resolution orchestration

template_names.rs has solid unit tests, but update_template_name, delete_template (name vs tpl-id branches), and sandbox create-by-name paths in this module are untested.

A few focused tests with a mock CubeMasterClient would guard cache invalidation and the delete-by-name skip-fetch behavior added in this PR.

HuChundong · 2026-06-22T10:17:42Z

+
+CALL cubemaster_assert_table_exists('t_cube_template_definition');
+
+-- Drop invalid reserved-prefix names on templates (snapshots keep their names).


Migration preflight clears legacy names irreversibly

Lines 14–37 blank out reserved-prefix names and duplicate display names before adding the unique index. This is reasonable for greenfield but can surprise production upgrades.

Please confirm staging/release notes call this out explicitly (PR body mentions it — worth a pre-migration audit query for affected rows).

HuChundong · 2026-06-22T10:18:06Z

Code review (`b297013`)

Overall the PR is in good shape after the review fixes: name validation is centralized in CubeMaster, internal DB errors are masked, redundant fetches were trimmed, and test coverage on the Master side is solid.

Worth addressing / confirming

Multi-replica cache staleness (medium) — Process-local 300s cache is only invalidated on the writing replica. Mutating paths (POST /sandboxes, DELETE/PATCH /templates/{name}) do not re-validate cache hits, unlike get_template_build_status. Document as a known limitation or add revalidation on cache hits for destructive routes.
CubeAPI service tests (medium) — Add mock-based tests for update_template_name / delete_template orchestration and cache invalidation; unit tests in template_names.rs do not cover these flows.
Migration 0010 preflight (low) — Preflight UPDATEs silently clear invalid/duplicate legacy display names. Ensure staging migration checklist includes an audit query before apply.

Look good

DB uniqueness via display_name_key + placeholder reservation at submit
Generic internal errors in Master handlers
OpenAPI / module docs aligned with name-resolution routes
CubeMaster unit + HTTP handler tests for lookup/rename paths

No blocking issues found for merge after acknowledging multi-replica cache behavior in release notes or follow-up.

cubesandboxbot · 2026-06-22T10:40:14Z

+    async fn lookup_and_cache(&self, reference: &str, key: &str) -> AppResult<String> {
+        let lock = {
+            let mut locks = self.lookup_locks.write().await;
+            locks
+                .entry(key.to_string())
+                .or_insert_with(|| Arc::new(Mutex::new(())))
+                .clone()
+        };


Potential memory leak: lookup_locks entries are never removed. Every unique name ever resolved adds an Arc<Mutex<()>> to the lookup_locks map, but no code cleans them up — not when cache entries are evicted (TTL or capacity), not on invalidation. Over a long-running process with many distinct template names, this grows unbounded, violating the MAX_CACHE_ENTRIES contract on self.entries.

Suggestion: Either clean up lookup_locks entries alongside cache eviction (cache_set / invalidate_name), or switch to a fixed-size striped lock pool (e.g., 64 or 256 slots via usize::from_str_bytes hash) to bound memory and reduce write-lock contention on the lookup_locks RwLock.

cubesandboxbot · 2026-06-22T10:40:17Z

+    /// Resolve a display name for mutating API paths, always refreshing via lookup.
+    pub async fn resolve_template_ref_for_mutation(&self, reference: &str) -> AppResult<String> {
+        let reference = reference.trim();
+        if reference.is_empty() {
+            return Err(AppError::BadRequest(
+                "template reference is empty".to_string(),
+            ));
+        }
+        if !Self::needs_resolution(reference) {
+            return Ok(reference.to_string());
+        }
+        let key = reference.to_ascii_lowercase();
+        self.lookup_and_cache(reference, &key).await
+    }


Doc/behavior mismatch: The doc comment says "always refreshing via lookup," but the implementation calls lookup_and_cache, which checks the cache first and returns a cached value if still valid (300s TTL). Mutating paths (delete, rebuild, adopt-baseline) can therefore operate on stale template_id mappings for up to 300s after a name is reassigned on another replica.

Suggestion: Either bypass the cache for mutation paths (clear entry before lookup), or update the doc comment to accurately describe the caching behavior (e.g., "uses cached resolution when available; cache TTL bounds cross-replica staleness").

cubesandboxbot · 2026-06-22T10:40:22Z

+			}},
+		}
+	}
+	normalized, _ := templatecenter.NormalizeTemplateDisplayName(name)


Error silently discarded: NormalizeTemplateDisplayName can return an error (e.g., name exceeds 256 chars after lookup succeeded, or contains unexpected characters), but the error is discarded via _. The response will have a valid template_id but an empty/incorrect display_name, which is misleading for API consumers.

Suggestion: Log the error or fall back to the original name:

normalized := name if n, err := templatecenter.NormalizeTemplateDisplayName(name); err == nil { normalized = n } else { log.G(ctx).Warnf("lookup returned valid template but name normalization failed: %v", err) }

cubesandboxbot · 2026-06-22T10:44:32Z

PR Review: E2B-style display names with CubeMaster name cache

Overview

This is a well-structured feature addition implementing human-readable template display names. The architecture follows a clean layered pattern: CubeAPI is a thin resolver that delegates all name lookups to CubeMaster via HTTP, which owns the name cache (300s TTL, 4096 entries, singleflight dedup) and all invalidation logic. The display_name_key generated column with a UNIQUE index provides case-insensitive uniqueness at the database level. Test coverage is solid overall with a healthy test-to-code ratio.

Key Findings

Medium: Missing input validation at CubeAPI boundary (CubeAPI/src/models/mod.rs:671)
UpdateTemplateRequest.name is a bare String with no validation. Other request structs use derive[Validate] but this one does not. All documented constraints are enforced only by CubeMaster -- a defense-in-depth gap adding unnecessary network round-trips.

Medium: Internal error details forwarded to API callers (CubeAPI/src/services/templates.rs:102-106)
Transport errors from CubeMaster format messages like "CubeMaster unavailable: {e}" that include internal cluster URLs, returned as 502/503 to API callers. These should be sanitized server-side before forwarding.

Medium: Tracing context dropped in displayNameFromTemplateImageJob (CubeMaster/pkg/templatecenter/template_display_name.go:382)
Uses context.Background() for logging instead of accepting a context.Context parameter. Breaks distributed trace correlation on build failure log lines.

Medium: ReleaseTemplateDisplayNameAfterBuildFailure is untested (template_display_name.go:235)
Called on every build failure to release the display name. Has no tests despite non-trivial branching logic. If broken, names become permanently locked on failed builds.

Low: No rate limiting on /templates/lookup (CubeAPI/src/routes.rs:189)
The template routes use with_auth (auth only), not with_auth_and_rate_limit. The lookup endpoint is an unthrottled name-to-ID oracle for authenticated clients.

Low: Cache invalidation maintainability gap (CubeMaster/pkg/templatecenter/cache.go:204)
invalidateTemplateCaches does not invalidate the display name cache, but there is no comment explaining this. Future developers must remember to invalidate both caches separately.

Minor Items

CopyableText in TemplateDetail.tsx duplicates copy-to-clipboard logic from CopyButton
Five template endpoints absent from OpenAPI spec
evictOneDisplayNameCacheEntry() uses O(n) Items() copy to delete one entry
fresh=true handler path untested in template_display_name_test.go

Positive Notes

Singleflight correctly deduplicates concurrent lookups with separate group keys for fresh vs cached
Migration SQL is well-documented with pre-audit queries and clear rollback
Display_name max length (256) is consistent across Rust, Go, SQL, and OpenAPI spec
TemplateNameCache correctly bypasses resolution for tpl-/snap- prefixed IDs
Inline comments posted for specific issues above

cubesandboxbot · 2026-06-22T10:46:25Z

+    async fn lookup_and_cache(&self, reference: &str, key: &str) -> AppResult<String> {
+        let lock = {
+            let mut locks = self.lookup_locks.write().await;
+            locks
+                .entry(key.to_string())
+                .or_insert_with(|| Arc::new(Mutex::new(())))
+                .clone()
+        };


Potential memory leak: lookup_locks entries are never removed. Every unique name ever resolved adds an Arc<Mutex<()>> to the lookup_locks map, but no code cleans them up — not when cache entries are evicted (TTL or capacity), not on invalidation. Over a long-running process with many distinct template names, this grows unbounded, violating the MAX_CACHE_ENTRIES contract on self.entries.

Suggestion: Either clean up lookup_locks entries alongside cache eviction (cache_set / invalidate_name), or switch to a fixed-size striped lock pool (e.g., 64 or 256 slots via usize::from_str_bytes hash) to bound memory and reduce write-lock contention on the lookup_locks RwLock.

cubesandboxbot · 2026-06-22T10:46:26Z

+    /// Resolve a display name for mutating API paths, always refreshing via lookup.
+    pub async fn resolve_template_ref_for_mutation(&self, reference: &str) -> AppResult<String> {
+        let reference = reference.trim();
+        if reference.is_empty() {
+            return Err(AppError::BadRequest(
+                "template reference is empty".to_string(),
+            ));
+        }
+        if !Self::needs_resolution(reference) {
+            return Ok(reference.to_string());
+        }
+        let key = reference.to_ascii_lowercase();
+        self.lookup_and_cache(reference, &key).await
+    }


Doc/behavior mismatch: The doc comment says "always refreshing via lookup," but the implementation calls lookup_and_cache, which checks the cache first and returns a cached value if still valid (300s TTL). Mutating paths (delete, rebuild, adopt-baseline) can therefore operate on stale template_id mappings for up to 300s after a name is reassigned on another replica.

Suggestion: Either bypass the cache for mutation paths (clear entry before lookup), or update the doc comment to accurately describe the caching behavior (e.g., "uses cached resolution when available; cache TTL bounds cross-replica staleness").

cubesandboxbot · 2026-06-22T10:46:28Z

+			}},
+		}
+	}
+	normalized, _ := templatecenter.NormalizeTemplateDisplayName(name)


Error silently discarded: NormalizeTemplateDisplayName can return an error (e.g., name exceeds 256 chars after lookup succeeded, or contains unexpected characters), but the error is discarded via _. The response will have a valid template_id but an empty/incorrect display_name, which is misleading for API consumers.

Suggestion: Log the error or fall back to the original name:

normalized := name if n, err := templatecenter.NormalizeTemplateDisplayName(name); err == nil { normalized = n } else { log.G(ctx).Warnf("lookup returned valid template but name normalization failed: %v", err) }

cubesandboxbot · 2026-06-23T07:49:58Z

+	return def.DisplayName
+}
+
+func assertDisplayNameAvailableForCreate(ctx context.Context, normalized, templateID string) error {


Performance: Two DB queries under write lock

This function does two separate DB round-trips while holding the per-name write lock:

findTemplateIDByDisplayNameFromDB — SELECT template_id

GetDefinition — full row fetch (only status is needed)

Combine into a single query that fetches both template_id and status:

cubesandboxbot · 2026-06-23T10:49:10Z

+	if err != nil {
+		return err
+	}
+	if err := ValidateTemplateDisplayNameAvailable(ctx, normalized, templateID); err != nil {


TOCTOU: availability check and DB write are not atomic

ValidateTemplateDisplayNameAvailable (line 308) queries the DB to check name availability, then the Update on line 312-314 writes the new name. Between these two operations, a concurrent caller can claim the same name. The DB unique index on display_name_key catches the duplicate at write time, but the resulting error is ambiguous — mapDefinitionCreateDuplicateError maps it to a generic "internal error" for non-key errors (template_display_name.go:356), and the user sees an opaque 500 instead of a clear 409 Conflict.

The existing withDisplayNameCreateLock primitive (line 203) provides exactly the serialization this path needs (check + write under the per-name write lock) but is not used here. Consider wrapping the check+write in withTemplateWriteLock to match the pattern used by ensureTemplateDefinition.

cubesandboxbot · 2026-06-23T10:49:12Z

+
+// ReserveTemplateDisplayNameForCreate checks that displayName is free before persisting
+// a template definition. Names on FAILED templates are cleared so they can be reused.
+func ReserveTemplateDisplayNameForCreate(ctx context.Context, displayName, templateID string) error {


Misleading export: lock is released before the caller INSERTs

ReserveTemplateDisplayNameForCreate acquires the per-name write lock, verifies availability against the DB, then releases the lock immediately (the fn is a no-op func() error { return nil }). The caller must separately INSERT the template definition afterward. Between lock release and the INSERT, any concurrent caller can claim the same name.

By contrast, withDisplayNameCreateLock (line 203) supports the correct pattern: the caller passes the INSERT as fn and it runs under the lock. The exported ReserveTemplateDisplayNameForCreate is a footgun because it looks like a complete "check and reserve" API but only does the check. Consider either removing the export (unexport and use withDisplayNameCreateLock directly in the caller) or documenting prominently that callers must still INSERT under a separate lock.

cubesandboxbot · 2026-06-23T10:49:38Z

            AppError::BadRequest(msg) => (StatusCode::BAD_REQUEST, 400, msg.clone()),
+            AppError::BadGateway(msg) => (StatusCode::BAD_GATEWAY, 502, msg.clone()),
+            AppError::ServiceUnavailable(msg) => {
+                (StatusCode::SERVICE_UNAVAILABLE, 503, msg.clone())


Error messages leak internal details to API clients

These new variants follow the same pattern as AppError::Internal (line 54): the full error message is serialized into the JSON response body. For BadGateway and ServiceUnavailable, this leaks HTTP error details from cross-service CubeMaster calls (timeouts, connection refusals) directly to external callers. Consider logging the full error server-side and returning a generic message to the client, matching the Go-side pattern (template_display_name.go:356 — log detail, return "internal error").

cubesandboxbot · 2026-06-23T10:54:42Z

+--    WHERE t.kind = 'template' AND TRIM(t.display_name) <> '' AND t.id <> keep.keep_id;
+
+-- +goose NO TRANSACTION
+-- +goose Up


Up is destructive, Down is not reversible

The two UPDATEs (clearing tpl-/snap- prefixed names and deduplicating display_name) permanently destroy original metadata. The Down migration only drops the column and index — it cannot restore the cleared names. If the migration is applied and then rolled back, the original display names are lost. Consider either making Up non-destructive (log warnings instead of clearing) or adding a backup table/column that Down can restore from.

cubesandboxbot · 2026-06-23T10:54:44Z

@@ -20,6 +20,11 @@ import (
 const (
 	templateDefinitionCacheTTL = 360 * time.Minute


Per-replica cache: inconsistent state across replicas for up to 300s

The cache comment (lines 21-25) acknowledges this is per-process only, but during rename/create on replica A, other replicas serve stale positive or not-found cache entries until TTL expiry. A caller hitting replica B after a rename still gets the old templateID for up to 300s; a newly created template is invisible on replicas with a not-found cache entry for 30s. The fresh=true parameter mitigates this but requires callers to know when to use it.

Consider: shorter TTL for not-found cache (already 30s is reasonable), or a shared backing store (Redis) for multi-replica deployments.

cubesandboxbot · 2026-06-23T10:54:48Z

+	if fresh {
+		groupKey = "fresh:" + cacheKey
+	}
+	result, err := templateDisplayNameFetchGroup.Do(groupKey, func() (interface{}, error) {


Double cache check is correct but fragile — should be documented

The cache is checked before entering the singleflight group (lines 98-104), and again inside the callback (lines 110-117). The inner check handles the race where a concurrent goroutine populated the cache between the outer check and entering Do. Without a comment explaining this, a future reader could remove the inner check as "redundant dead code," breaking the coalescing guarantee. Consider adding a brief inline comment.

cubesandboxbot · 2026-06-23T10:54:50Z

+    let display_name = display_name.trim();
+    if display_name.is_empty() {
+        Vec::new()
+    } else {


names: Vec<String> is always 0 or 1 — consider Option<String>

template_names (line 85) returns either vec![] or vec!["name"]. Every downstream consumer (Go sends display_name as scalar, Rust wraps in Vec, TypeScript reads names[0]) treats this as a single optional value. If forward-compatibility is intended (e.g., aliases), document it explicitly; otherwise this indirection adds unnecessary cognitive load across the full stack.

cubesandboxbot · 2026-06-23T10:55:14Z

+	templateDisplayNameCache.Set(key, templateID, templateDisplayNameCacheTTL)
+}
+
+func evictOneDisplayNameNotFoundCacheEntry() {


Not-found cache eviction is O(n)

When the cache reaches templateDisplayNameNotFoundCacheMaxLen (1024), every subsequent insert calls Items() which allocates and returns a shallow copy of the entire map, then iterates to find one key to delete. Under a burst of not-found probes (e.g., enumeration of nonexistent names), this creates unnecessary GC pressure. A FIFO ring buffer or counter-based eviction would be O(1).

cubesandboxbot · 2026-06-23T10:56:00Z

+	if err != nil {
+		return err
+	}
+	if err := ValidateTemplateDisplayNameAvailable(ctx, normalized, templateID); err != nil {


TOCTOU: availability check and DB write are not atomic

ValidateTemplateDisplayNameAvailable (line 308) queries the DB to check name availability, then the Update on line 312-314 writes the new name. Between these two operations, a concurrent caller can claim the same name. The DB unique index on display_name_key catches the duplicate at write time, but the resulting error is ambiguous — mapDefinitionCreateDuplicateError maps it to a generic "internal error" for non-key errors (template_display_name.go:356), and the user sees an opaque 500 instead of a clear 409 Conflict.

The existing withDisplayNameCreateLock primitive (line 203) provides exactly the serialization this path needs (check + write under the per-name write lock) but is not used here. Consider wrapping the check+write in withTemplateWriteLock to match the pattern used by ensureTemplateDefinition.

cubesandboxbot · 2026-06-23T10:56:00Z

+
+// ReserveTemplateDisplayNameForCreate checks that displayName is free before persisting
+// a template definition. Names on FAILED templates are cleared so they can be reused.
+func ReserveTemplateDisplayNameForCreate(ctx context.Context, displayName, templateID string) error {


Misleading export: lock is released before the caller INSERTs

ReserveTemplateDisplayNameForCreate acquires the per-name write lock, verifies availability against the DB, then releases the lock immediately (the fn is a no-op func() error { return nil }). The caller must separately INSERT the template definition afterward. Between lock release and the INSERT, any concurrent caller can claim the same name.

By contrast, withDisplayNameCreateLock (line 203) supports the correct pattern: the caller passes the INSERT as fn and it runs under the lock. The exported ReserveTemplateDisplayNameForCreate is a footgun because it looks like a complete "check and reserve" API but only does the check. Consider either removing the export (unexport Reserve and use withDisplayNameCreateLock directly in the caller) or documenting prominently that callers must still INSERT under a separate lock.

cubesandboxbot · 2026-06-23T11:15:23Z

+    /// use `tpl-` or `snap-` prefix; cluster-wide case-insensitive unique among templates
+    /// that currently hold the name (assigned when the definition is created;
+    /// released after failed builds). Sandboxes may reference this name only after
+    /// the template build succeeds (template ready on a node).


Defense-in-depth: Name validation is missing on the Rust/CubeAPI side

The name field is a plain Option<String> with no #[validate(...)] annotation. Actual syntax enforcement (character set, length, tpl-/snap- prefix rejection) happens only on the Go/CubeMaster side inside NormalizeTemplateDisplayName. Invalid names are forwarded across the internal HTTP link before being rejected.

Consider adding a #[validate(custom = ...)] validator that mirrors the Go-side rules so CubeAPI rejects bad input at the API boundary. This also keeps the utoipa-annotated constraints schema-accurate at the edge.

cubesandboxbot · 2026-06-23T11:15:58Z

@@ -100,12 +108,14 @@ func (g *templateFetchGroup) Do(key string, fn func() (interface{}, error)) (int
 	g.calls[key] = call


Latent correctness bug: custom singleflight silently returns (nil, nil) on panic

This deferred cleanup runs regardless of whether fn() panics. If the callback panics, call.val and call.err remain at their Go zero values (nil, nil). All goroutines waiting on <-call.done (line 104) wake up and return (nil, nil), receiving a wrong result.

In findTemplateIDByDisplayName (template_display_name.go:131), the result is type-asserted: templateID, _ := result.(string) — a nil interface value produces "", which is then returned as a valid (but wrong) template ID.

Recommend adding recover() in this deferred function that sets call.err and re-panics, or replacing with golang.org/x/sync/singleflight.

cubesandboxbot · 2026-06-23T11:16:23Z

+				return clearDefinitionDisplayNameLocked(ctx, templateID, def)
+			})
+		}
+	}


Cache inconsistency on unexpected GetDefinition error

When GetDefinition returns an unexpected error (transient DB error, not ErrTemplateNotFound), def is nil and the code falls through to clearDefinitionDisplayNameLocked(ctx, templateID, def) at line 290 (now line 290 in the new file). Inside that function (line 265: if def != nil { invalidateTemplateDisplayNameCache(def.DisplayName) }), the cache invalidation is skipped because def is nil. The DB update still runs (setting display_name to ""), but the old display name remains in the in-memory cache until TTL expiry, causing subsequent lookups to return a stale template ID.

Recommend returning the error instead of falling through, or fetching the display name separately for cache invalidation.

cubesandboxbot · 2026-06-23T11:17:02Z

+	if err != nil {
+		code := int(errorcode.ErrorCode_MasterInternalError)
+		retMsg := "internal error"
+		switch {


Enumeration oracle: distinct HTTP codes reveal name state

This error handling block returns distinct status codes for different failure modes:

404 (ErrorCode_NotFound) → name not found

400 (ErrorCode_MasterParamsError) → name exists but is ambiguous or invalid

200 → name exists and resolves to a template ID

An authenticated user can probe whether any specific display name exists, enumerate template names, and distinguish "not found" from "invalid syntax." This feeds into the cluster-wide uniqueness concern — an attacker can build a catalog of template names.

Recommend returning a generic failure code (e.g., 404) for all three lookup failures, and relying on client-side regex validation (mirroring NormalizeTemplateDisplayName) for the create-form UX check instead of API-level discrimination.

cubesandboxbot · 2026-06-23T11:52:22Z

+	}
+	normalized, err := NormalizeTemplateDisplayName(req.DisplayName)
+	if err != nil {
+		log.G(context.Background()).Warnf("template image job display_name normalize failed: %v", err)


Observation: displayNameFromTemplateImageJob (line 382) takes no context.Context parameter and uses context.Background() for logging (lines 388, 393). This severs distributed trace correlation — log lines emitted here have no link to the originating request.

Every other public function in this file accepts a context.Context. Adding one here and propagating it from callers (job_repo.go line 157, image_job_runner.go lines 211-214) would be a small change with significant observability benefit.

cubesandboxbot · 2026-06-23T11:52:24Z


+/// Body for PATCH /templates/:templateID (update display name).
+#[derive(Debug, Deserialize, ToSchema)]
+pub struct UpdateTemplateRequest {


Observation: UpdateTemplateRequest.name is a bare String with no validation annotations. Other request structs in this file (e.g. NewSandbox) use #[derive(Validate)] with constraints. The documented constraints (max 256 chars, ASCII letters/digits/hyphen/underscore only, no tpl-/snap- prefix) are enforced only by CubeMaster, not at the CubeAPI boundary.

This is a defense-in-depth gap: invalid names pass through CubeAPI to CubeMaster before being rejected, costing a network round-trip. A CubeMaster regression in error code mapping could produce unexpected HTTP status codes instead of the documented 400.

cubesandboxbot · 2026-06-23T11:52:25Z

+        CubeMasterError::Http(e) if e.is_timeout() || e.is_connect() => {
+            AppError::ServiceUnavailable(format!("CubeMaster unavailable: {e}"))
+        }
+        CubeMasterError::Http(e) => AppError::BadGateway(format!("CubeMaster request failed: {e}")),


Observation: When CubeMaster returns an unexpected transport error, map_resolve_err formats messages like "CubeMaster unavailable: {e}"/"CubeMaster request failed: {e}" where {e} is the full reqwest::Error Display string. This can include the internal CubeMaster URL (e.g., http://cubemaster.internal:8080/...) and connection details, which are forwarded to the API caller as 502/503 responses.

Suggestion: Log the full error server-side and return a sanitized message like "upstream service unavailable" to avoid leaking internal network topology.

cubesandboxbot · 2026-06-23T11:52:28Z

+
+// ReleaseTemplateDisplayNameAfterBuildFailure clears display_name on a template
+// whose image build failed so the name can be reused on a later create.
+func ReleaseTemplateDisplayNameAfterBuildFailure(ctx context.Context, templateID string) {


Observation: ReleaseTemplateDisplayNameAfterBuildFailure is called on every build failure to release the display name so it can be reused (from failTemplateImageJob in job_repo.go line 150). If this function breaks, failed templates permanently lock their display names, preventing users from retrying under the same name.

This function has no unit tests despite non-trivial logic: it skips on empty templateID, tolerates ErrTemplateNotFound, and logs on other errors. None of these paths are exercised.

Suggestion: Add a unit test covering the success path, the empty-ID skip, and the ErrTemplateNotFound tolerance.

cubesandboxbot · 2026-06-23T11:52:40Z

+	}
+	normalized, err := NormalizeTemplateDisplayName(req.DisplayName)
+	if err != nil {
+		log.G(context.Background()).Warnf("template image job display_name normalize failed: %v", err)


Observation: displayNameFromTemplateImageJob (line 382) takes no context.Context parameter and uses context.Background() for logging (lines 388, 393). This severs distributed trace correlation — any log line emitted here has no link to the originating request.

Every other public function in this file accepts a context.Context. Adding one here and plumbing it from callers (job_repo.go line 157, image_job_runner.go lines 211-214) would be a small change with significant observability benefit.

Suggestion: Add ctx context.Context parameter and propagate from callers.

cubesandboxbot · 2026-06-23T11:52:41Z


+/// Body for PATCH /templates/:templateID (update display name).
+#[derive(Debug, Deserialize, ToSchema)]
+pub struct UpdateTemplateRequest {


Observation: UpdateTemplateRequest.name is a bare String with no validation. Other request structs in this file (e.g. NewSandbox) use #[derive(Validate)] with constraints. The documented constraints (max 256 chars, ASCII letters/digits/hyphen/underscore only, no tpl-/snap- prefix) are enforced only by CubeMaster, not at the CubeAPI boundary.

This is a defense-in-depth gap: invalid names pass through CubeAPI to CubeMaster before being rejected, costing a network round-trip. A CubeMaster regression in error code mapping could produce unpredictable HTTP status codes instead of the documented 400.

Suggestion: Add #[derive(Validate)] with length and regex constraints matching the documented rules, or add a manual validator. The #[validate] derive on CreateTemplateRequest (line 601) is also never called — consider fixing both.

cubesandboxbot · 2026-06-23T11:52:41Z

+        CubeMasterError::Http(e) if e.is_timeout() || e.is_connect() => {
+            AppError::ServiceUnavailable(format!("CubeMaster unavailable: {e}"))
+        }
+        CubeMasterError::Http(e) => AppError::BadGateway(format!("CubeMaster request failed: {e}")),


Observation: When CubeMaster returns an unexpected transport error, map_resolve_err formats messages like "CubeMaster unavailable: {e}"/"CubeMaster request failed: {e}" where {e} is the full reqwest::Error Display string. This can include the internal CubeMaster URL (e.g., http://cubemaster.internal:8080/...) and connection details, which are forwarded to the API caller as 502/503 responses.

Suggestion: Log the full error server-side and return a sanitized message like "upstream service unavailable" to avoid leaking internal network topology.

cubesandboxbot · 2026-06-23T11:52:43Z

+
+// ReleaseTemplateDisplayNameAfterBuildFailure clears display_name on a template
+// whose image build failed so the name can be reused on a later create.
+func ReleaseTemplateDisplayNameAfterBuildFailure(ctx context.Context, templateID string) {


Observation: ReleaseTemplateDisplayNameAfterBuildFailure is called on build failure to release the display name so it can be reused (called from failTemplateImageJob in job_repo.go line 150). If this function breaks, failed templates permanently lock their display names, and users cannot retry under the same name.

This function has no unit tests despite non-trivial logic: it skips on empty templateID, tolerates ErrTemplateNotFound, and logs on other errors. None of these paths are exercised.

Suggestion: Add a unit test covering the success path, the empty-ID skip, and the ErrTemplateNotFound tolerance.

Add cluster-wide template display names (CubeMaster display_name) so sandboxes and APIs can reference templates by human-readable name. CubeMaster owns lookup, reservation at definition creation, bounded positive/negative caches with invalidation, and a DB migration with dedupe; CubeAPI exposes thin name resolution and GET /templates/lookup; Web adds create hints, list/detail rename, and i18n. Assisted-by: Cursor:Composer Signed-off-by: carmake <gycm520@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>

HuChundong requested review from chenhengqi, fslongjin, ls-ggg, tinklone and up2wing as code owners June 22, 2026 06:23

HuChundong force-pushed the feat/template-name-e2b-align branch from d9adc05 to ff87c16 Compare June 22, 2026 07:36

HuChundong force-pushed the feat/template-name-e2b-align branch from ff87c16 to 2ed302c Compare June 22, 2026 08:28

cubesandboxbot Bot reviewed Jun 22, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 2ed302c to 214dc12 Compare June 22, 2026 08:52

HuChundong marked this pull request as draft June 22, 2026 08:55

HuChundong force-pushed the feat/template-name-e2b-align branch from 214dc12 to f801c11 Compare June 22, 2026 09:06

HuChundong marked this pull request as ready for review June 22, 2026 09:16

HuChundong force-pushed the feat/template-name-e2b-align branch from 19ceabf to b297013 Compare June 22, 2026 09:25

HuChundong commented Jun 22, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 9438441 to 4483cba Compare June 22, 2026 10:32

cubesandboxbot Bot reviewed Jun 22, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 4483cba to 52079e1 Compare June 22, 2026 10:55

cubesandboxbot Bot reviewed Jun 23, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 5e7a6b5 to 4f5faba Compare June 23, 2026 10:44

cubesandboxbot Bot reviewed Jun 23, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 4f5faba to 6118ff2 Compare June 23, 2026 11:03

cubesandboxbot Bot reviewed Jun 23, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch 2 times, most recently from 2954300 to 6983aa7 Compare June 23, 2026 11:25

cubesandboxbot Bot reviewed Jun 23, 2026

View reviewed changes

HuChundong force-pushed the feat/template-name-e2b-align branch from 6983aa7 to 25a5804 Compare June 23, 2026 12:16

HuChundong marked this pull request as ready for review June 23, 2026 12:35


		CALL cubemaster_assert_table_exists('t_cube_template_definition');

		-- Drop invalid reserved-prefix names on templates (snapshots keep their names).

		@@ -20,6 +20,11 @@ import (
		const (
		templateDefinitionCacheTTL = 360 * time.Minute

		@@ -100,12 +108,14 @@ func (g *templateFetchGroup) Do(key string, fn func() (interface{}, error)) (int
		g.calls[key] = call

Uh oh!

Conversation

HuChundong commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

Test plan

Migration note

Uh oh!

kinwin-ustc commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

HuChundong commented Jun 22, 2026

Uh oh!

HuChundong left a comment

Choose a reason for hiding this comment

Uh oh!

HuChundong Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

HuChundong Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

HuChundong Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

HuChundong commented Jun 22, 2026

Code review (b297013)

Worth addressing / confirming

Look good

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: E2B-style display names with CubeMaster name cache

Overview

Key Findings

Minor Items

Positive Notes

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 23, 2026

Choose a reason for hiding this comment

HuChundong commented Jun 22, 2026 •

edited

Loading

kinwin-ustc commented Jun 22, 2026 •

edited

Loading

Code review (`b297013`)

cubesandboxbot Bot commented Jun 22, 2026 •

edited

Loading