Skip to content

Feat/webCaseCenter#615

Open
liciazhu wants to merge 45 commits into
TencentCloud:masterfrom
liciazhu:feat/webCaseCenterV5
Open

Feat/webCaseCenter#615
liciazhu wants to merge 45 commits into
TencentCloud:masterfrom
liciazhu:feat/webCaseCenterV5

Conversation

@liciazhu

Copy link
Copy Markdown
Contributor

add cube web case center and mod web sandbox detail

@liciazhu liciazhu force-pushed the feat/webCaseCenterV5 branch 3 times, most recently from b9cb05d to 321e017 Compare June 25, 2026 01:52
.env("CUBE_API_ENVD_AUTH", &self.envd_auth)
.current_dir(&work_dir);

if std::env::var("CUBE_API_KEY").is_err() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Hardcoded default CUBE_API_KEY

When CUBE_API_KEY is not set in the parent process environment, the example subprocess receives a well-known default key (cube_0000000000...). Any user who triggers an example run can extract this key from the process environment and use it to authenticate arbitrary API calls.

Either remove the fallback entirely (scripts should handle a missing key gracefully) or use an obviously non-functional sentinel (e.g., empty string) that scripts can detect and refuse to use.

}

// 3. Any healthy/ready template
if let Ok(tpls) = templates.list_templates().await {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Template validation iterates healthy candidates twice + makes 2N gRPC calls

Line 448 chains tpls.iter() (the full set) after the filtered subset, so every template is visited at least once and healthy/ready templates are validated twice. The inner templates.get_template() call is a gRPC round-trip to CubeMaster — in the worst case this makes 2N sequential gRPC calls.

The fix: replace .chain(tpls.iter()) with .chain(tpls.iter().filter(|t| t.status != "healthy" && t.status != "ready")), or better, inline the status check from list_templates and skip get_template entirely for templates already confirmed as healthy/ready.

Comment thread CubeAPI/src/services/examples.rs Outdated
}
};

let fingerprint = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultHasher uses random seeds per-process — cache misses after every restart

DefaultHasher::new() uses RandomState with per-process seeds, so the same requirements.txt content produces a different hash after every server restart. The fingerprint stamp file written by a previous process lifetime will never match, causing pip3 install to rerun unconditionally on the first example run after each restart.

Replace with a deterministic hash (e.g., SipHasher::new_with_keys(0, 0)) or simply compare the file content directly for these trivially small requirements files.

Comment thread CubeAPI/src/db.rs Outdated
.map_err(anyhow::Error::from)
}

#[allow(dead_code)]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code with explicit #[allow(dead_code)] — will drift from schema

This method is never called (explicitly acknowledged by the attribute). DB access methods that go unused will silently drift as columns are renamed/added/removed, then fail when someone finally calls them. Either wire it into a real caller or remove it.

Comment thread CubeAPI/src/config/mod.rs
fn default_sandbox_proxy_url() -> String {
std::env::var("AGENTHUB_SANDBOX_PROXY_URL").unwrap_or_else(|_| "http://127.0.0.1".to_string())
}
fn default_envd_auth() -> String {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Default envd_auth is a well-known base64 value

The default Basic cm9vdDo= decodes to root: with an empty password — the envd built-in default. This shared credential is sent to every sandbox's Jupyter/envd endpoint. Anyone who learns it (including by running code in their own sandbox and observing request headers) can access the in-sandbox services of other sandboxes. Use per-sandbox credentials (the sandbox detail already includes an envd_access_token field) instead of a cluster-wide shared secret.

// Pull all images in parallel, then inspect each.
let handles: Vec<_> = STORE_IMAGES
/// Pulls all store images from the registry then returns the updated metadata.
pub async fn refresh_store_meta(State(state): State<AppState>) -> impl IntoResponse {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security & performance: Concurrent Docker pulls with no rate limiting

This spawns one docker pull --quiet per catalog image concurrently via spawn_blocking. With 6+ images (some multi-GB), this can saturate Docker daemon capacity and network bandwidth. Also, POST /store/refresh should use with_auth_and_rate_limit in the router (like the node isolation endpoint) rather than plain with_auth. Consider adding a semaphore to cap concurrent pulls (e.g., 2-3 at a time).

.env("CUBE_API_ENVD_AUTH", &self.envd_auth)
.current_dir(&work_dir);

if std::env::var("CUBE_API_KEY").is_err() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Hardcoded default CUBE_API_KEY

When CUBE_API_KEY is not set in the parent process environment, the example subprocess receives a well-known default key (cube_0000000000...). Any user who triggers an example run can extract this key from the process environment and use it to authenticate arbitrary API calls.

Either remove the fallback entirely (scripts should handle a missing key gracefully) or use an obviously non-functional sentinel (e.g., empty string) that scripts can detect and refuse to use.

}

// 3. Any healthy/ready template
if let Ok(tpls) = templates.list_templates().await {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Template validation iterates healthy candidates twice + makes 2N gRPC calls

Line 448 chains tpls.iter() (the full set) after the filtered subset, so every template is visited at least once and healthy/ready templates are validated twice. The inner templates.get_template() call is a gRPC round-trip to CubeMaster — in the worst case this makes 2N sequential gRPC calls.

The fix: replace .chain(tpls.iter()) with .chain(tpls.iter().filter(|t| t.status != "healthy" && t.status != "ready")), or better, inline the status check from list_templates and skip get_template entirely for templates already confirmed as healthy/ready.

Comment thread CubeAPI/src/services/examples.rs Outdated
}
};

let fingerprint = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultHasher uses random seeds per-process → cache misses after every restart

DefaultHasher::new() uses RandomState with per-process seeds, so the same requirements.txt content produces a different hash after every server restart. The fingerprint stamp file written by a previous process lifetime will never match, causing pip3 install to rerun unconditionally on the first example run after each restart.

Replace with a deterministic hash (e.g., SipHasher::new_with_keys(0, 0)) or simply compare the file content directly for these trivially small requirements files.

Comment thread CubeAPI/src/examples/registry.rs Outdated
/// calls return the same `&'static` slice, so the front-end gets a stable
/// ordering without the borrow checker complaining about temporaries.
pub fn scenario_registry() -> &'static [ScenarioSpec] {
Box::leak(Box::new(vec![

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scenario_registry() currently allocates and leaks a new Vec<ScenarioSpec> on every call, even though the comment says the registry is materialized once. Endpoints such as list, source, and run can call this repeatedly, so a long-lived API process will grow memory over time. Please store the vector behind OnceLock/LazyLock and return the initialized slice, ideally with a regression test that repeated calls reuse the same registry.

Comment thread CubeAPI/src/services/exec.rs Outdated
let start = Instant::now();

let output = match req.language.as_str() {
"python" => self.exec_python(sandbox_id, domain, &req.code).await?,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout_secs is only used after the Jupyter/envd request has already returned; the service still awaits exec_python/exec_bash without a real deadline. A hung or slow sandbox request can keep the handler occupied, and larger user timeouts will still be cut off by the outer 30s route timeout. Please wrap the actual execution future in tokio::time::timeout and align the route timeout with the allowed request budget.


let start = Instant::now();
let max_secs = sc.timeout_secs.unwrap_or(120);
let run_result = timeout(Duration::from_secs(max_secs), cmd.output()).await;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timing out cmd.output() drops the future but does not explicitly kill the spawned process or its process tree, so long-running Python/Go/Chromium children can continue after the API has returned. This path also installs requirements.txt during a request, mutating the server's Python environment and using a source-tree stamp file without concurrency control. Please spawn and kill/wait the child on timeout, and move dependency installation to build/deploy time or an isolated, locked cache that fails closed.

Comment thread CubeAPI/src/routes.rs Outdated
fn build_examples_run_routes(state: &AppState, auth_configured: bool) -> Router<AppState> {
let routes = Router::new().route("/examples/run", post(examples::run_example));
if auth_configured {
routes.layer(middleware::from_fn_with_state(state.clone(), unified_auth))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POST /examples/run is moved to a long-timeout router with auth only, but it no longer gets the rate limiting used by other resource-creating or execution endpoints. Because this route starts local processes and can create sandboxes or browser sessions for up to several minutes, missing rate limits increases abuse and resource-exhaustion risk. Please apply the existing auth+rate-limit layer here, or add a dedicated concurrency/rate budget for example runs.

// `hidden: true` filter so even an attacker who knows the IDs cannot reach
// these scenarios through the HTTP API.

export const EXAMPLE_SCENARIOS: ExampleScenario[] = [

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a second hand-maintained example registry in the frontend while the backend registry also defines scenario IDs, files, categories, store items, visibility, and run metadata. The two sources can drift, producing UI entries that cannot run or backend examples with stale labels/topology in the UI. Please make the backend /examples response include the UI metadata, or generate both Rust and TS registries from a shared data source.

pub async fn exec_code(
State(state): State<AppState>,
Path(sandbox_id): Path<String>,
Json(body): Json<ExecCodeRequest>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExecCodeRequest derives Validate, but this handler only extracts Json(body) and never calls body.validate() before executing the request. As a result, the documented code length constraint is not enforced and the API can accept payloads larger than the model contract suggests. Please validate here and map failures to a 400 response, or use a shared validated-JSON extractor.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds the compiled examples/cube-bench/cube-bench binary alongside the Go source. Committing the build artifact substantially increases PR size, is platform/architecture-specific, and makes supply-chain review noisier. Please remove the binary and add an ignore rule for this executable or for example build outputs.

Comment thread CubeAPI/src/services/examples.rs Outdated
exit_code,
success,
elapsed_ms,
steps: Vec::new(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run response now exposes steps, and the frontend adds a StepTimeline, but the service always returns steps: Vec::new() and the page does not render the timeline from backend data. This leaves a partially implemented API/UI surface that users cannot observe and future maintainers may assume is complete. Please either generate and render real step logs, or remove the unused field/component/comment until the feature is wired through.

"""

with Sandbox.create(template=template_id) as sandbox:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This added blank line contains trailing whitespace, so git diff --check still fails for the PR. Please remove the stray spaces (and any other diff-check formatting issues) to avoid CI or review noise.

Comment thread web/src/pages/SandboxCases.tsx Outdated
</div>
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git diff --check also reports this as a new blank line at EOF. Please remove the extra trailing blank line so the PR is clean for whitespace checks.

@liciazhu liciazhu force-pushed the feat/webCaseCenterV5 branch from 6cfb21c to 488bea9 Compare June 25, 2026 07:33
liciazhu added 2 commits June 25, 2026 17:11
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
liciazhu added 21 commits June 25, 2026 17:11
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
…SandboxCases

Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
… prefix

Signed-off-by: liciazhu <liciazhu@tencent.com>
…put, centralise credentials

Signed-off-by: liciazhu <liciazhu@tencent.com>
…stry

Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
…licy

Signed-off-by: liciazhu <liciazhu@tencent.com>
POST /store/refresh previously spawned an unbounded number of
concurrent blocking tasks — one docker-pull per catalog entry —
which could saturate Docker daemon and network bandwidth when the
catalog is large.

Introduce a tokio Semaphore (MAX_CONCURRENT_PULLS = 3) so that at
most 3 docker pulls run simultaneously. Each catalog image acquires
a permit before entering the spawn_blocking call and releases it
automatically on drop.

Assisted-by: CodeBuddy:claude-sonnet-4-6
Signed-off-by: liciazhu <liciazhu@tencent.com>
…ement Improvement in scenario_registry();Consistent Auth + Rate Limiting on /examples/run;Remove Committed Binary from Repository;Whitespace Formatting Fixes

Signed-off-by: liciazhu <liciazhu@tencent.com>
…xec-code

Signed-off-by: liciazhu <liciazhu@tencent.com>
@liciazhu liciazhu force-pushed the feat/webCaseCenterV5 branch from 488bea9 to 2f8f47f Compare June 25, 2026 09:12
Signed-off-by: liciazhu <liciazhu@tencent.com>
.env("CUBE_TEMPLATE_ID", &template_id)
.env("SSL_CERT_FILE", ssl_cert)
.env("AGENTHUB_SANDBOX_PROXY_URL", &self.sandbox_proxy_url)
.env("CUBE_API_ENVD_AUTH", &self.envd_auth)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: envd_auth credential leaked into example subprocess environment

The full envd_auth header value (credential for the sandbox proxy) is injected as the CUBE_API_ENVD_AUTH environment variable into every example subprocess. Example scripts (including user-edited code in req.code) can read os.environ["CUBE_API_ENVD_AUTH"] and authenticate as root to the sandbox proxy, bypassing API-level authorization.

Impact: Escalation from "can run an example" to "can execute arbitrary code inside any sandbox."

Suggestion: Do not pass envd_auth to subprocesses. If example scripts need proxy access, create a scoped time-limited token instead.

Comment on lines +429 to +448
if let Ok(tpls) = templates.list_templates().await {
let matched = tpls.iter().find(|t| {
(t.status == "healthy" || t.status == "ready")
&& t.image_info.as_deref() == Some(image_ref.as_str())
});
if let Some(t) = matched {
tracing::info!(
store_item_id = %sid,
image = %image_ref,
template_id = %t.template_id,
"matched template via store_item_id"
);
return Ok(t.template_id.clone());
}
}
}
}

// 3. Any healthy/ready template
if let Ok(tpls) = templates.list_templates().await {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: N+1 HTTP cascade in template resolution

resolve_template_id performs up to ~15 sequential HTTP round-trips to CubeMaster per example run: up to 3 individual get_template calls for candidates (line ~405), then a list_templates call (line ~429), then another list_templates call (line ~448, duplicating the result from line ~429), then iterates all returned templates calling get_template individually for each.

Impact: Latency floor of O(N) sequential network hops to CubeMaster on every example run.

Suggestion:

  1. Call list_templates once and reuse the result
  2. Check the list's own status field rather than re-verifying each candidate with get_template (the list already returns status)
  3. Use tokio::try_join_all for candidate probing in the first step

Comment thread CubeAPI/src/config/mod.rs
std::env::var("CUBE_API_DEFAULT_KEY")
.ok()
.filter(|s| !s.is_empty())
.or_else(|| Some("cube_0000000000000000000000000000000000000000".to_string()))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Hardcoded fallback API key

When CUBE_API_DEFAULT_KEY is not set, this fallback value cube_0000000000000000000000000000000000000000 is used as an API key and injected into every example subprocess. This is a known static key shared across all deployments that don't explicitly configure it — effectively a backdoor credential.

Suggestion: Remove the hardcoded fallback and require explicit configuration. Log a loud warning at startup if the env var is unset.

className="jupyter-svg-output max-h-[300px] overflow-auto rounded-md bg-white p-2 ring-1 ring-border/60"
dangerouslySetInnerHTML={{
__html: DOMPurify.sanitize(result.svg, {
USE_PROFILES: { svg: true, svgFilters: true },

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: SVG XSS via DOMPurify's SVG profile

Jupyter result.svg payloads are sanitized with DOMPurify.sanitize(result.svg, { USE_PROFILES: { svg: true, svgFilters: true } }). The SVG profile allows <script>, event handlers (onload, etc.), and <use> with external references. An SVG like <svg><script>fetch(document.cookie)</script></svg> would execute in the WebUI origin.

Impact: Stored XSS — any user who can produce a Jupyter SVG output can run JS in administrators' browsers.

Suggestion: Either strip SVG output entirely (render as text), or use a stricter sanitizer that removes scripts, event handlers, and external references from SVGs.

const out: Node<TopologyNodeData>[] = [];
for (let col = 0; col < layers.length; col++) {
const layer = layers[col];
const controlInLayer = layer.filter((id) => nodes.find((n) => n.id === id)?.plane === 'control');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: O(N²) scan in topology layout

nodes.find((n) => n.id === id) is called in the innermost loop of layoutNodes, scanning the entire nodes array for each ID in each layer. For a topology with ~20 nodes this is fine, but the pattern repeats 3× per layer (control filter, data filter, node lookup).

Suggestion: Build a Map<string, LocalizedNode> outside the loop and use .get(id) — reduces from O(N²) to O(N) and is a one-line change.

if program == "go" {
let dir_name = format!(".tmp_run_{}", Uuid::new_v4());
let dir = base_dir.join(&dir_name);
std::fs::create_dir_all(&dir).map_err(|e| {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Blocking std::fs calls on async runtime

Filesystem operations (create_dir_all, write, read_to_string, remove_dir_all, remove_file, copy) run directly on the tokio async thread. Under high concurrency, each blocks a thread that could otherwise process requests.

Suggestion: Use tokio::fs::* equivalents or wrap in tokio::task::spawn_blocking.

/// - `stderr` → `{ "type": "stderr", "text": "..." }`
/// - `result` → `{ "type": "result", "text": "...", ... }`
/// - `error` → `{ "type": "error", "name": "...", "value": "...", "traceback": [...] }`
fn parse_jupyter_ndjson(bytes: &[u8]) -> AppResult<JupyterOutput> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Coverage: Untested pure parsing functions

This file contains several pure, deterministic parsing functions — parse_jupyter_ndjson (line 295), parse_connect_stream (line 374), build_response (line 264), connect_envelope (line 366), decode_b64_lossy (line 427), and parse_exit_status (line 434) — with zero test coverage. These parse raw binary streams from envd/Jupyter; malformed input would crash the handler or produce wrong results.

The rest of the crate has an established #[cfg(test)] pattern (12+ files use it). These parsing functions are ideal candidates for unit tests as they require no network, subprocess, or database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants