Skip to content

feat(CubeAPI): inject sandbox env_vars on sandbox creation#634

Open
xiongxz wants to merge 3 commits into
TencentCloud:masterfrom
xiongxz:codex/fix-cubeapi-env-vars-injection
Open

feat(CubeAPI): inject sandbox env_vars on sandbox creation#634
xiongxz wants to merge 3 commits into
TencentCloud:masterfrom
xiongxz:codex/fix-cubeapi-env-vars-injection

Conversation

@xiongxz

@xiongxz xiongxz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

The SDK's envVars field was silently dropped at sandbox creation: CreateSandboxRequest.containers was hard-coded to vec![], so user-supplied env vars never reached the container. The downstream path (CubeMaster template merge + Cubelet OCI builder) already supports per-container envs — only the CubeAPI wiring was missing. This wires NewSandbox.env_vars into the create request.

Scope

This PR puts create-time env vars into the container's PID-1 OCI Process.Env, so they are inherited by processes that start after those vars are present — cold-boot / rootfs-only paths, services started after a snapshot restore, and non-envd agents/services that read os.environ at startup. It does not affect processes already running at template-snapshot time: a normal Template is booted, runtime/service processes start, and memory is snapshotted; sandbox creation restores from that snapshot rather than re-running the entrypoint, so those restored processes keep their snapshot-time env and do not re-read the per-sandbox vars.

It also does not cover envd-backed command execution (commands.run / run_code). envd's process.Start (e2b-dev/infra packages/envd, internal/services/process/handler/handler.go) builds each command's env from an in-process defaults.EnvVars map plus the per-request envs, taking only PATH from its inherited environment; vars baked into PID-1's env therefore never reach an envd-spawned command. defaults.EnvVars is populated solely by envd's /init endpoint, which is what #566 wires up. The two PRs are complementary layers:

  • fix(CubeAPI): propagate create-time env vars to envd-backed commands #566 — envd-backed command execution (commands.run / run_code); post-create envd /init, snapshot-restore-safe.
  • This PR — lower-level container env propagation for processes actually started after the env vars are present (cold-boot / rootfs-only paths, post-restore-started services, non-envd agents).

Changes

  • CubeAPI/src/services/sandboxes.rs: new build_env_container converts Option<EnvVars> into a single-container ContainerSpec carrying the vars on envs, replacing the previous containers: vec![].
  • The container's image and command/args are intentionally left empty:
    • CubeMaster's applyTemplateToContainer fills the image from the resolved template.
    • Cubelet's command.WithProcessArgs falls back to the image Entrypoint/Cmd, so cube-entrypoint.sh still runs and envd still starts.
  • With no env vars, an empty list is returned — prior behaviour is unchanged.
  • Validation: keys must be non-empty, whitespace-free, with no = and no control characters (is_valid_env_key); values must be non-empty, not whitespace-only, and NUL-free; reserved names (LD_PRELOAD/LD_AUDIT/LD_LIBRARY_PATH/GCONV_PATH/GCONV_MODULES) are dropped. All drops are logged via tracing::warn!.
  • Tests cover the empty-map, populated, all-dropped early-return, reserved+whitespace+invalid filtering, and is_valid_env_key cases.

How it works

POST /sandboxes (envVars) → build_env_container (validate + filter) → ContainerSpec { envs } → CubeMaster template merge (ctr.Envs = append(ctr.Envs, templateCtr.Envs...); the template envs are a copy of the image ENV, appended after the user envs) → Cubelet env.GenOpt feeds first the image ENV, then the container envs, into containerd's oci.WithEnv. oci.WithEnv dedupes by key keeping the last value, writing the result into the OCI Process.Env (PID 1), inherited by envd and all user commands.

Safety

  • Image-owned vars consumed by cube-entrypoint.sh (ENVD_PORT, ENVD_BIN, ENVD_LOG_FILE, ENVD_EXTRA_ARGS) and PATH are protected: the template's Envs (a full copy of the image ENV) is appended after the user's, so the template value wins on key conflicts via containerd's oci.WithEnv last-value-wins dedup (the call site is Cubelet/pkg/container/env/env.go). A user cannot override them through env vars.
  • Non-image dangerous vars (LD_PRELOAD/LD_AUDIT/LD_LIBRARY_PATH/GCONV_PATH/GCONV_MODULES) are not covered by that protection (they are not in the image ENV, so no template copy shadows them), so they are dropped by the reserved-name check. These inject code via the dynamic linker or glibc's gconv mechanism. The check is intentionally case-sensitive (glibc's dynamic linker is case-sensitive). The container is privileged and the owner has root, so this is API-boundary defence-in-depth rather than a hard isolation boundary.
  • HOSTNAME and CUBE_CONTAINER_ID_* are injected by the runtime/Cubelet and cannot be overridden through this path.
  • envd_access_token is a vestigial field (always None), so this change does not affect token auth.

Known limitation

When a user-supplied env var shares a key with an image/template ENV, the template value wins (due to the append order in applyTemplateToContainer plus oci.WithEnv last-value-wins). This does not fully match E2B's "user wins" semantics, but only affects keys that collide with image-owned vars; business env vars (the common case, e.g. OPENAI_API_KEY, SESSION_ID) inject correctly. Fully aligning with E2B requires flipping the CubeMaster append order — to be addressed in a follow-up.

@xiongxz xiongxz force-pushed the codex/fix-cubeapi-env-vars-injection branch from 2ab40af to 8f06d19 Compare June 24, 2026 09:04
@xiongxz

xiongxz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

All CI checks are passing now. Could a maintainer please review when you have a chance? Thanks.

@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Should we perform some validation on the incoming environment variables? For example, empty values, or key environment variables that customers cannot customize, such as LD_PRELOAD

xiongxz added a commit to xiongxz/CubeSandbox that referenced this pull request Jun 24, 2026
Validate env vars before forwarding them to the container, per the PR
TencentCloud#634 review:
- Drop entries with an empty key or value.
- Drop reserved names a sandbox owner must not set: LD_PRELOAD and
  LD_AUDIT. They are not image-owned, so the template-merge protection
  does not cover them; they would inject a shared library into every
  dynamically linked process in the sandbox (including envd). A warn!
  is logged when any reserved name is dropped.

Also remove a stale unused import surfaced while rebuilding.

Co-Authored-By: Claude <noreply@anthropic.com>
@xiongxz

xiongxz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

@kinwin-ustc Thanks for the catch — added validation in build_env_container (65addd5):

  • Empty values/keys: dropped (no effect, only noise).
  • Reserved names: LD_PRELOAD and LD_AUDIT are dropped, with a warn! logged when any are. These are not image-owned, so the template-merge protection does not cover them — they would inject a shared library into every dynamically linked process in the sandbox (including envd).

On scope: I considered a broader reserved set (LD_LIBRARY_PATH, HTTP(S)_PROXY/ALL_PROXY, PYTHONPATH/NODE_PATH/...) but kept it to the pure-injection vectors that have no legitimate use. The others all have strong legitimate uses, and the container is privileged (the owner has root), so this is API-boundary defence-in-depth rather than a hard isolation boundary. Happy to expand the list if you'd prefer.

The image-owned vars consumed by cube-entrypoint.sh (ENVD_*) and the Cubelet-injected HOSTNAME/CUBE_CONTAINER_ID_* remain protected downstream (template env + last-write-wins), so they don't need to be listed here.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Vec::new();
};
// Reserved env var names a sandbox owner must not set via the API.
const RESERVED_ENV_KEYS: &[&str] = &["LD_PRELOAD", "LD_AUDIT"];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Reserved env var deny-list is critically incomplete

LD_PRELOAD and LD_AUDIT are blocked, but several equally dangerous env vars are missing:

  • LD_LIBRARY_PATH — Directs the dynamic linker to search user-specified directories first. A sandbox user who can write files can hijack libc to inject code into every process (including envd). This is as dangerous as LD_PRELOAD.
  • BASH_ENV / ENV (POSIX) — Bash/POSIX shells execute the file referenced by these variables as a script on non-interactive startup. Direct code injection via system()/popen().
  • IFS — The Internal Field Separator can subvert how shell scripts parse commands, turning safe commands into dangerous ones.
  • GLIBC_TUNABLES — Modern glibc tunables can alter runtime security boundaries.
  • GODEBUG, GOMEMLIMIT, GOGC — Cubelet is written in Go, and these runtime env vars can affect its behavior inside the sandbox.

Recommendation: At minimum add these to RESERVED_ENV_KEYS. A longer-term improvement would be an allow-list pattern (only ^[A-Z_][A-Z0-9_]*$ keys that match known-safe patterns) instead of a deny-list, which is inherently reactive.

let dropped: Vec<&String> = vars
.keys()
.filter(|k| RESERVED_ENV_KEYS.contains(&k.as_str()))
.collect();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quality: Double traversal of the HashMap

The function iterates vars.keys() once to collect reserved names for the warning log (lines 737-740), then iterates vars.into_iter() a second time (line 749) to filter and build the envs vec. In the common case (no reserved keys), the first pass allocates a Vec<&String> that is immediately dropped.

Recommendation: Merge into a single filter_map pass with a side-effect collection for the warning:

let mut dropped: Vec<String> = Vec::new();
let envs: Vec<EnvVar> = vars
    .into_iter()
    .filter_map(|(key, value)| {
        if key.is_empty() || value.is_empty() {
            return None;
        }
        if RESERVED_ENV_KEYS.contains(&key.as_str()) {
            dropped.push(key);
            return None;
        }
        Some(EnvVar { key, value })
    })
    .collect();

This eliminates the double traversal and defers the dropped allocation until a reserved key is actually encountered.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
.into_iter()
.filter(|(key, value)| {
!key.is_empty() && !value.is_empty() && !RESERVED_ENV_KEYS.contains(&key.as_str())
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security + correctness: Missing env var key validation

The filter checks !key.is_empty() but doesn't validate that keys conform to POSIX naming conventions ([A-Za-z_][A-Za-z0-9_]*). Keys containing shell metacharacters ($, `, |, ;, \n), =, or leading/trailing whitespace pass through silently. These could cause issues downstream in Cubelet or the container runtime.

Additionally, whitespace-only values (e.g., " ") pass the !value.is_empty() check but carry no usable content — the doc comment claims such entries are dropped.

Recommendation: Add POSIX key validation and reject whitespace-only keys/values:

fn is_valid_env_key(key: &str) -> bool {
    !key.is_empty()
        && key.len() <= 128
        && key.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_')
        && key.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

Also consider adding a reasonable MAX_ENV_VAR_VALUE_LENGTH (e.g., 4096) to prevent oversized values from causing memory pressure downstream.

}

#[test]
fn env_container_wraps_each_var_as_an_envvar() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test gap: Whitespace-only keys/values, tracing warning, and ContainerSpec field assertions

A few notable test coverage gaps:

  1. Whitespace-only values: No test exercises keys with whitespace-only values (e.g., ("KEY", " ")) — the filter currently lets them through since " ".is_empty() == false.
  2. tracing::warn! emission: The warning log for reserved keys is never asserted — install a tracing subscriber in the test to catch regressions in the warning format.
  3. Partial ContainerSpec assertion: The happy-path test only checks image.image, command, args, and envs. The remaining fields (name, working_dir, resources, volume_mounts, dns_config, r_limit, security_context, probe, annotations, image.storage_media) are all set to None but never verified, leaving room for silent drift.
  4. No serialization round-trip: The containers field was previously always vec![], and there's no test verifying that a CreateSandboxRequest with a populated containers field serializes to the expected JSON shape for CubeMaster.

}))
}

/// Build the single-container payload that carries the user-supplied env vars

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation: Template-vs-user env var priority differs from E2B semantics

The PR description correctly states that template envs win on key collision — this is verified by applyTemplateToContainer's append order (ctr.Envs = append(ctr.Envs, templateCtr.Envs...)), where template entries come after user entries in the array, and the OCI runtime spec applies last-value-wins.

However, this is the opposite of E2B's behavior, where user-supplied env vars override template defaults. A note in the doc comment would help users familiar with E2B:

/// Note on priority: template env vars are appended after user-supplied
/// ones, so template values win on key collision. This differs from E2B,
/// where user vars override template defaults.

Also, the PR description references containerd's replaceOrAppendEnvValues, but this function is never called in the codebase — the actual mechanism is the OCI runtime spec's last-value-wins semantics. Consider correcting the PR description.

xiongxz and others added 2 commits June 25, 2026 04:12
The SDK's envVars field was silently dropped at sandbox creation:
CreateSandboxRequest.containers was hard-coded to vec![], so user env
vars never reached the container. The downstream path (CubeMaster
template merge + Cubelet OCI builder) already supports container envs,
so only the CubeAPI wiring was missing.

Add build_env_container, which converts NewSandbox.env_vars into a
single-container ContainerSpec carrying the vars on envs. The image and
command are left empty on purpose: CubeMaster's applyTemplateToContainer
fills the image from the template, and Cubelet's command.WithProcessArgs
falls back to the image Entrypoint/Cmd, so cube-entrypoint.sh still runs
and envd still starts. With no env vars we return an empty list,
preserving prior behaviour.

Also fix create_sandbox_request_serializes_lifecycle_flags, which was
missing the required distribution_scope field and broke cargo test on
master.

Co-Authored-By: Claude <noreply@anthropic.com>

Assisted-by: Codex:GPT-5
Signed-off-by: xiongxz <xiuzhang.xiong@lexmount.com>
Validate env vars before forwarding them to the container, per the PR
TencentCloud#634 review:
- Drop entries with an empty key or value.
- Drop reserved names a sandbox owner must not set: LD_PRELOAD and
  LD_AUDIT. They are not image-owned, so the template-merge protection
  does not cover them; they would inject a shared library into every
  dynamically linked process in the sandbox (including envd). A warn!
  is logged when any reserved name is dropped.

Also remove a stale unused import surfaced while rebuilding.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: xiongxz <xiuzhang.xiong@lexmount.com>
@xiongxz xiongxz force-pushed the codex/fix-cubeapi-env-vars-injection branch from 65addd5 to c441da1 Compare June 25, 2026 04:12
@xiongxz

xiongxz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review pass — went through all five and acted on each:

#2 (double traversal) — adopted. Merged into a single filter_map pass with dropped_reserved/dropped_invalid collected as side effects; no second iteration of the map.

#3 (key/value validation) — adopted, scoped to correctness not POSIX. Added is_valid_env_key: rejects empty, =-containing, and control-char keys. The = case is the real correctness bug — downstream serialises as KEY=VALUE and strings.Cut (in containerd's oci.WithEnv/replaceOrAppendEnvValues) splits on the first =, so a key like FOO=BAR would silently become key FOO; a value-less entry there is even treated as unset. Whitespace-only values are now dropped too (doc already promised this). I deliberately did not enforce [A-Za-z_][A-Za-z0-9_]*: real keys use lowercase, dashes, dots and digits (app.db.url, npm_config_foo), and a strict allow-list would break legitimate callers for no security gain in a privileged+root container. No length cap for the same reason — happy to add MAX_ENV_VAR_VALUE_LENGTH if you'd like a hard ceiling, but it's policy not correctness.

#4 (tests) — mostly adopted. Added: whitespace-only value dropped, invalid-key (FOO=BAR, \n, \0) dropped, full ContainerSpec field assertions (all the previously-unchecked None fields), and an is_valid_env_key unit test. Skipped the tracing::warn! capture test — there's no in-tree subscriber harness for it, the boilerplate outweighs the value, and the drop behaviour it logs is already covered. Also skipped the serialization round-trip: that exercises serde config on CreateSandboxRequest, not build_env_container, so it belongs elsewhere if we want it.

#5 (docs + E2B priority) — adopted, and you're right that the priority difference deserves more than a footnote. Two things here:

  1. replaceOrAppendEnvValues correction. Good catch — it's never called in this repo; it's an internal of containerd's oci.WithEnv. Rewrote the doc comment and PR body to cite the actual in-repo path: Cubelet/pkg/container/env/env.go builds [image.Env...] ++ [container.Envs...] and feeds it to oci.WithEnv, whose dedup is last-value-wins.

  2. The priority is the reverse of E2B, and here's the full chain. E2B's contract is "user wins" on a key collision; CubeSandbox is the opposite — the image/template value wins:

    • CubeMaster cubeboxutil.go:126 appends the template envs after the user envs: ctr.Envs = append(ctr.Envs, templateCtr.Envs...), and the template Envs is a copy of the image ENV (getCubeboxReqTemplate). So the array is [user...] ++ [image-ENV-copy...].
    • Cubelet env.go then does oci.WithEnv(image.Env) followed by oci.WithEnv(containerEnvs).
    • oci.WithEnv keeps the last value per key, so the image-ENV copy (appended last) overrides the user value on any collision.

    Impact is narrow: only a user var whose key collides with an image-baked ENV (ENVD_*, PATH, or anything the template image sets) is silently overridden. Business vars (OPENAI_API_KEY, SESSION_ID, …) inject correctly because they don't collide. It's also load-bearing for safety — this reversal is exactly why a user can't override ENVD_PORT/ENVD_BIN/PATH through env vars.

    Fully matching E2B ("user wins") means flipping the append order in cubeboxutil.go:126 to append(slices.Clone(templateCtr.Envs), ctr.Envs...). But that removes the merge protection from ENVD_*/HOSTNAME/CUBE_CONTAINER_ID_*, so the deny-list would have to grow to cover them, and the template isn't cached (fresh unmarshal per request), so there's no aliasing risk. That's a larger change than this PR's wiring fix, so I've recorded it as a follow-up rather than bundling it here.

#1 (expand the deny-list) — partially adopted. The container is privileged and the owner already has root, so this list is API-boundary defence-in-depth, not a hard boundary — it stays scoped to pure-injection vectors with no legitimate use. Working through the suggestions:

  • LD_LIBRARY_PATHadded. Genuinely the closest peer to LD_PRELOAD: it can hijack the dynamic linker's search path to load a trojan libc/other lib into every dynamically linked process (including envd). Same no-legitimate-use-that-isn't-already-better-served-directly class as the others.
  • BASH_ENV/ENV/IFS — shell-injection vectors, but the owner has root and already runs arbitrary code; these only add surprise against a non-root sub-process. Defence-in-depth the root owner bypasses; kept off to avoid breaking legitimate shell tooling.
  • GODEBUG/GOMEMLIMIT/GOGC — pushing back. These inject into the container's PID 1 and the user's own code, not into Cubelet (Cubelet is a separate process on the host). A sandbox user running a Go program legitimately sets these; blocking them breaks valid use. envd itself is Rust, not Go. GLIBC_TUNABLES likewise mostly affects the user's own glibc programs.
  • Interpreter load-path vars (PYTHONPATH/NODE_PATH/PERL5OPT/RUBYLIB/JAVA_TOOL_OPTIONS…) — same reasoning: strong legitimate use as standard runtime configuration, and root bypasses them.
  • On allow-list vs deny-list long-term: agreed a deny-list is reactive; a documented allow-pattern is a reasonable follow-up once we've seen what keys real callers send.

The reserved set is now LD_PRELOAD, LD_AUDIT, LD_LIBRARY_PATH.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Vec::new();
};
// Reserved env var names a sandbox owner must not set via the API.
const RESERVED_ENV_KEYS: &[&str] = &["LD_PRELOAD", "LD_AUDIT", "LD_LIBRARY_PATH"];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security concern: The env var blocklist is missing GCONV_PATH and GCONV_MODULES, which are well-known code-injection vectors via glibc's gconv mechanism. When any process in the container calls iconv_open() (transitively through glibc locale operations, shell builtins, or many language runtimes), glibc searches the paths in GCONV_PATH for a shared library — enabling the same class of code injection as LD_PRELOAD. This has been weaponised in real-world exploits (e.g., CVE-2024-2961).

Suggestion:

const RESERVED_ENV_KEYS: &[&str] = &[
    "GCONV_MODULES",
    "GCONV_PATH",
    "LD_AUDIT",
    "LD_LIBRARY_PATH",
    "LD_PRELOAD",
];

let envs: Vec<EnvVar> = vars
.into_iter()
.filter_map(|(key, value)| {
if RESERVED_ENV_KEYS.contains(&key.as_str()) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security concern: The RESERVED_ENV_KEYS.contains() check performs an exact string match and is_valid_env_key does not trim whitespace from the key. An attacker sending " LD_PRELOAD" (leading space) or "LD_PRELOAD " (trailing space) will pass the key-format check (spaces/tabs are not control characters) and won't match the exact "LD_PRELOAD" literal. Linux execve treats leading spaces as part of the var name (so the dynamic linker ignores it), but if downstream CubeMaster or Cubelet normalizes keys (e.g., strings.TrimSpace), the trimmed key would bypass the API filter entirely.

Suggestion: Trim the key before both the format check and the reserved-list check:

.filter_map(|(key, value)| {
    let key = key.trim().to_string();
    if key.is_empty() {
        dropped_invalid.push(/* ... */);
        return None;
    }
    // ... rest using `key` instead of the original
})

or a control character) from sandbox create request"
);
}
if envs.is_empty() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test gap: This early-return path (when every env var is dropped during validation, leaving an empty list) is not tested. For example, build_env_container(Some(HashMap::from([("LD_PRELOAD".into(), "x".into())]))) or build_env_container(Some(HashMap::from([("KEEP".into(), "".into())]))) should both return Vec::new(), but neither is verified. Adding a test for this path would ensure the filtering-logic-to-empty-result contract holds.

@xiongxz xiongxz force-pushed the codex/fix-cubeapi-env-vars-injection branch from c441da1 to d1eaa20 Compare June 25, 2026 04:19
/// serialised as `KEY=VALUE` and downstream code splits on the first `=`), or one
/// containing a control character (e.g. a newline, which would break the `KEY=VALUE`
/// line format).
fn is_valid_env_key(key: &str) -> bool {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace-only keys pass validation. A key like " " or "\t" is non-empty, contains no =, and contains no control characters, so is_valid_env_key accepts it. This would result in a semantically meaningless env var name being forwarded downstream. Consider also rejecting keys that are (or become after trimming) whitespace-only.

Additionally, values receive almost no validation — only value.trim().is_empty() is checked. A value containing \n, \r, or \0 could be problematic depending on how the container runtime serializes the ContainerSpec. While JSON serialization would escape these correctly, it's worth validating or documenting the assumption.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Vec::new();
};
// Reserved env var names a sandbox owner must not set via the API.
const RESERVED_ENV_KEYS: &[&str] = &["LD_PRELOAD", "LD_AUDIT", "LD_LIBRARY_PATH"];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserved-key check is case-sensitive. Linux's ld.so is case-sensitive, so ld_preload won't bypass the protection on Linux. However, for defense-in-depth, consider also normalizing the key case for comparison, or documenting why case sensitivity is intentional.

Completeness: There are many other env vars with similar privilege-escalation potential that are not blocked: LD_DEBUG, GCONV_PATH, GIO_EXTRA_MODULES, JAVA_TOOL_OPTIONS, PYTHONPATH, PERL5LIB, RUBYLIB, BASH_ENV, BASH_FUNC_*. Whether these are exploitable depends on which runtimes exist inside the sandbox, but the current blocklist covers only the three most obvious candidates.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment on lines +914 to +930
let containers = build_env_container(Some(vars));
assert_eq!(containers.len(), 1);
let container = &containers[0];
// The remaining single-container fields stay unset, matching the
// doc: image/command/args are resolved downstream.
assert!(container.image.image.is_empty());
assert!(container.image.storage_media.is_none());
assert!(container.command.is_none());
assert!(container.args.is_none());
assert!(container.working_dir.is_none());
assert!(container.resources.is_none());
assert!(container.volume_mounts.is_none());
assert!(container.dns_config.is_none());
assert!(container.r_limit.is_none());
assert!(container.security_context.is_none());
assert!(container.probe.is_none());
assert!(container.annotations.is_none());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brittle test assertion. This test asserts that every single field of ContainerSpec is None or empty (image, command, args, working_dir, resources, volume_mounts, dns_config, r_limit, security_context, probe, annotations, and more). Any field added to ContainerSpec will break this test even if build_env_container is still correct. Consider only asserting the fields that build_env_container is responsible for setting (i.e., envs), and spot-checking the rest.

/// serialised as `KEY=VALUE` and downstream code splits on the first `=`), or one
/// containing a control character (e.g. a newline, which would break the `KEY=VALUE`
/// line format).
fn is_valid_env_key(key: &str) -> bool {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace-only keys pass validation. A key like " " or "\t" is non-empty, contains no =, and contains no control characters, so is_valid_env_key accepts it. This would result in a semantically meaningless env var name being forwarded downstream. Consider also rejecting keys that are (or become after trimming) whitespace-only.

Additionally, values receive almost no validation — only value.trim().is_empty() is checked. A value containing \n, \r, or \0 could be problematic depending on how the container runtime serializes the ContainerSpec. While JSON serialization would escape these correctly, it's worth validating or documenting the assumption.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Vec::new();
};
// Reserved env var names a sandbox owner must not set via the API.
const RESERVED_ENV_KEYS: &[&str] = &["LD_PRELOAD", "LD_AUDIT", "LD_LIBRARY_PATH"];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserved-key check is case-sensitive. Linux's ld.so is case-sensitive, so ld_preload won't bypass the protection on Linux. However, for defense-in-depth, consider also normalizing the key case for comparison, or documenting why case sensitivity is intentional.

Completeness: There are many other env vars with similar privilege-escalation potential that are not blocked: LD_DEBUG, GCONV_PATH, GIO_EXTRA_MODULES, JAVA_TOOL_OPTIONS, PYTHONPATH, PERL5LIB, RUBYLIB, BASH_ENV, BASH_FUNC_*. Whether these are exploitable depends on which runtimes exist inside the sandbox, but the current blocklist covers only the three most obvious candidates.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment on lines +914 to +930
let containers = build_env_container(Some(vars));
assert_eq!(containers.len(), 1);
let container = &containers[0];
// The remaining single-container fields stay unset, matching the
// doc: image/command/args are resolved downstream.
assert!(container.image.image.is_empty());
assert!(container.image.storage_media.is_none());
assert!(container.command.is_none());
assert!(container.args.is_none());
assert!(container.working_dir.is_none());
assert!(container.resources.is_none());
assert!(container.volume_mounts.is_none());
assert!(container.dns_config.is_none());
assert!(container.r_limit.is_none());
assert!(container.security_context.is_none());
assert!(container.probe.is_none());
assert!(container.annotations.is_none());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brittle test assertion. This test asserts that every single field of ContainerSpec is None or empty (image, command, args, working_dir, resources, volume_mounts, dns_config, r_limit, security_context, probe, annotations, and more). Any field added to ContainerSpec will break this test even if build_env_container is still correct. Consider only asserting the fields that build_env_container is responsible for setting (i.e., envs), and spot-checking the rest.

Address review feedback on build_env_container:

- Merge the reserved-name warning and the value filter into a single
  filter_map pass over the map.
- Add is_valid_env_key: reject empty, whitespace-only, whitespace-padded,
  '='-containing, and control-char keys. A '=' in the key corrupts the
  downstream KEY=VALUE split (containerd splits on the first '='), and a
  value-less entry there is treated as unset; leading/trailing whitespace
  would bypass the reserved-name check under a future downstream trim.
- Drop whitespace-only values and values containing a NUL byte (a NUL
  truncates the KEY=VALUE C string at execve).
- Expand the reserved list to the pure code-injection vectors with no
  legitimate use: LD_PRELOAD, LD_AUDIT, LD_LIBRARY_PATH (dynamic linker),
  and GCONV_PATH, GCONV_MODULES (glibc gconv). The check is intentionally
  case-sensitive — glibc's dynamic linker is case-sensitive.
- Expand tests: whitespace/padded/NUL cases, the all-dropped early-return
  path, and is_valid_env_key coverage; simplify the ContainerSpec assertions
  to the fields this function owns.
- Correct the doc/PR description: the env merge happens in Cubelet's
  env.GenOpt via containerd oci.WithEnv (last-value-wins), not the internal
  replaceOrAppendEnvValues helper.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: xiongxz <xiuzhang.xiong@lexmount.com>
@xiongxz xiongxz force-pushed the codex/fix-cubeapi-env-vars-injection branch from d1eaa20 to f4502d9 Compare June 25, 2026 04:41
@xiongxz

xiongxz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the second pass — went through all six:

GCONV_PATH / GCONV_MODULES missing — added. Right call; same class as LD_PRELOAD. gconv modules are .sos loaded on iconv_open(), so GCONV_PATH (and GCONV_MODULES, which can reference one by absolute path) are pure code-injection vectors with no legitimate use in a sandbox. Reserved set is now LD_PRELOAD, LD_AUDIT, LD_LIBRARY_PATH, GCONV_PATH, GCONV_MODULES.

" LD_PRELOAD" whitespace bypass — fixed. You noted ld.so is whitespace-sensitive so it's inert on Linux today, but you're right that a future downstream TrimSpace would reopen it. is_valid_env_key now rejects any key with leading/trailing whitespace (and whitespace-only keys), so a padded reserved name is dropped as invalid before the exact-match check ever runs. We reject rather than trim, so valid input is never silently mutated.

Whitespace-only keys / value control chars — keys fixed; values, NUL only. Whitespace-only keys are now rejected (same change as above). For values, I reject only a NUL byte — it truncates the KEY=VALUE C string at execve. \n/\r are forwarded verbatim: they're user data, JSON/protobuf escape them for transport, and the runtime treats them as ordinary bytes. Rejecting them would break legitimate multiline values for no correctness gain.

Early-return test gap — added. env_container_returns_empty_when_all_entries_dropped covers the path where a non-empty map filters down to nothing.

Case-sensitivity — documented, intentionally kept. Since ld.so is case-sensitive, ld_preload is inert and the exact match is correct; added a doc line saying so. On completeness (PYTHONPATH/PERL5LIB/BASH_ENV/GIO_EXTRA_MODULES/…): same reasoning as before — strong legitimate use as runtime config, and the privileged+root owner bypasses them, so they stay out. The list stays scoped to pure injection vectors with no legitimate use.

Brittle full-field assertion — simplified. Fair point (and I'll note it partly contradicts the earlier ask for full-field assertions). Trimmed it to the contract this function owns: one container, image left empty, envs populated with the survivors. Dropped the exhaustive None list so it won't break on an unrelated ContainerSpec field.

@zyl1121

zyl1121 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

This looks very close to #566, but the injection point is different.

My understanding is that this PR injects create-time env vars into the container / OCI env, while #566 focused on envd initialization for envd-backed command execution. My earlier concern with the container-level approach was that if envd is already started on the template-based startup path, commands.run may still not see the injected envs even though later-started processes do.

Did you happen to verify this specifically on a real cluster for the post-create execution paths? In particular, after Sandbox.create(envs=...) returns, can commands.run(...) immediately read those env vars? If you also checked whether it affects run_code, that result would be very helpful too.

@xiongxz

xiongxz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the careful read — and you're right to flag this. I traced it into the envd source (e2b-dev/infra packages/envd, which is what /usr/bin/envd is built from per docker/Dockerfile.cube-base) and confirmed the concern: commands.run / run_code will not see these vars through this PR alone.

envd's process.Start (internal/services/process/handler/handler.go:147-170) builds each command's env from defaults.EnvVars (an in-process global) plus the per-request envs, taking only PATH from its inherited environment. So vars placed in the container's PID-1 OCI env — which is what this PR does — never reach an envd-spawned command. defaults.EnvVars is populated solely by envd's /init endpoint (internal/api/init.go:189-194), which is precisely the mechanism #566 wires up. For the envd-backed execution path, #566 is the correct fix and I don't intend this PR to cover it.

Where this PR does carry weight — and where #566 can't help — is processes started independently in the sandbox: a separate agent or service with its own process and port that the template image launches, rather than something envd spawns per-command. Those inherit the container's PID-1 environment normally, so the create-time vars are visible to them from the moment the container starts. #566's /init only feeds envd's internal defaults.EnvVars, which is consumed solely by envd's own process.Start and does not propagate to processes started outside envd. So the two are genuinely complementary, at different layers:

One concrete benefit worth calling out: the common agent pattern is to read config from env at process startup (12-factor). Because this PR bakes the vars into PID-1's env at creation time, such a process sees them immediately on start — there's no post-create injection step and no race window between container start and a later /init. Today we work around the absence of this by exposing an /init-style endpoint on the agent's own port and pushing envs after the sandbox is up; this PR removes that adaptation.

Happy to add a short "Scope" note to the PR description making the commands.run limitation explicit and pointing at #566, if that's useful.

@zyl1121

zyl1121 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Thanks for the clarification. I agree these two PRs are complementary, but I still have one concern about the wording / scope.

The key nuance is CubeSandbox's Template lifecycle: a normal Template is not just a rootfs image, it goes through Boot & Snapshot first. The VM is cold-booted, runtime/service processes may already be started, and then the memory/state is snapshotted. Later sandbox creation restores from that snapshot instead of freshly starting the template entrypoint again. Because of that, processes already alive at template snapshot time usually will not re-read create-time OCI Process.Env during sandbox creation, so the effect may not match the expectation that all template services are newly started and will pick up those env vars.

So my understanding is: #566 fixes the immediate envd-backed execution path by initializing envd's runtime env state after sandbox creation, while #634 improves the lower-level container env propagation path for cases where processes are actually started after those env vars are present, such as cold-boot/rootfs-only paths, services started after restore, or future non-envd agents/services. That split makes sense to me; maybe it would be worth considering narrowing the “template-launched services” wording in the PR description a bit, to avoid implying that already-running snapshot-restored services will also re-read these env vars.

@xiongxz

xiongxz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Good catch — that's a real nuance and I'd over-stated it. The snapshot lifecycle is exactly as you describe: a normal Template is booted, runtime/service processes start, and memory is snapshotted; sandbox creation restores from that snapshot rather than re-running the entrypoint, so processes already alive at snapshot time keep their snapshot-time env and won't re-read the per-sandbox vars.

I've narrowed the Scope wording in the PR description to match the split you laid out:

  • fix(CubeAPI): propagate create-time env vars to envd-backed commands #566 — envd-backed execution (commands.run / run_code); post-create envd /init, snapshot-restore-safe.
  • This PR — lower-level container env propagation for processes that actually start after the env vars are present (cold-boot / rootfs-only paths, services started after a snapshot restore, non-envd agents/services), with an explicit note that snapshot-restored already-running processes don't re-read them.

Thanks — that keeps the description from implying more than it delivers.

@xiongxz

xiongxz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Hi @kinwin-ustc — gentle nudge to take another look when you have a moment.

Your earlier point on validation is now in place: env-var keys must be non-empty, whitespace-free, =-free, and control-char-free, values are NUL-free, and a reserved-name list (LD_PRELOAD / LD_AUDIT / LD_LIBRARY_PATH / GCONV_PATH / GCONV_MODULES) drops the pure code-injection vectors — all drops are logged via tracing::warn!. The two rounds of bot review and @zyl1121's commands.run point have been addressed too, and the description now has an explicit Scope section making clear what this PR covers (PID-1 container env, inherited by processes started after the vars are present) vs. what's left to #566 (envd-backed commands.run / run_code).

All CI checks are green and tests pass. Is there anything else you'd like added or adjusted before this is ready to merge? Happy to handle whatever's outstanding.

@kinwin-ustc

Copy link
Copy Markdown
Collaborator

We already have a method to inject environment variables when creating templates, which can truly inject environment variables into all programs in the sandbox, including programs pre-launched during snapshot creation, as well as programs launched after the sandbox is established. This has a wider applicability than the method in this PR. I would like to understand the real demand scenarios covered by this PR.

@xiongxz

xiongxz commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Thanks — you're right that the template-time path (Template.build(envs=...) / --env, baked into the snapshot via container_overrides.envs in template_request.go) is broader: it reaches both pre-launched and post-create processes.

The gap this PR targets is per-sandbox env vars that differ on each Sandbox.create()OPENAI_API_KEY, SESSION_ID, per-tenant config. Template envs are frozen per template (identical for every sandbox), so they can't carry per-instance values; standing up a separate template per value isn't viable (cold-boot + snapshot + replication each time).

Is there an existing per-sandbox path I've missed? Sandbox.create(env_vars=...) exists in the SDK but is currently a no-op — CubeAPI drops envVars (containers: vec![]), which is exactly what this PR wires up. If the template mechanism or another path truly covers per-sandbox envs, we'll happily close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants