Skip to content

fix(runtime): reject invalid box resource sizing at create boundary (P1-9)#662

Open
lilongen wants to merge 1 commit into
boxlite-ai:mainfrom
lilongen:fix/p1-9-cpus-zero-validation
Open

fix(runtime): reject invalid box resource sizing at create boundary (P1-9)#662
lilongen wants to merge 1 commit into
boxlite-ai:mainfrom
lilongen:fix/p1-9-cpus-zero-validation

Conversation

@lilongen
Copy link
Copy Markdown
Contributor

@lilongen lilongen commented Jun 5, 2026

Problem (P1-9)

Creating a box with cpus=0 (and undersized memory/disk) was silently accepted. create() persisted the box in Configured status and returned a box_id — so the SDK reported success — but the box could never boot: libkrun krun_set_vm_config(0, ...) returns EINVAL (-22). Under auto_remove, the unbootable box then vanished on first start, leaving the caller with a dangling id:

rt.create(BoxOptions(image="ubuntu:24.04", cpus=0), name="...")
# -> box.id = 4936bb66...        (SDK reports success)
rt.get_info("4936bb66...")        # -> None
rt.remove("4936bb66...")          # -> HTTP 404 "Sandbox not found"

Root cause: create_inner() performed no resource-sizing validation before persisting. The existing BoxOptions::sanitize() only checks option combinations (auto_remove/detach, isolate_mounts) and runs in the boot path, not at create.

Fix

Validate resource sizing at the create boundary so every entry point (REST, Python/Node/Go SDK, CLI) fails fast instead of handing back a doomed box_id:

  • New BoxOptions::validate_resources() + consts MIN_CPUS=1, MIN_MEMORY_MIB=256, MIN_DISK_SIZE_GB=1. Only explicitly-set (Some) fields are checked; None keeps the engine default.
  • Called from create_inner() before the box is persisted; returns InvalidArgument → REST maps to HTTP 400 via the existing error table.

Lower-bound validation only (per scope decision); no upper bound added.

Testing

All run locally:

  • Rust unit/integration (cargo test -p boxlite --no-default-features --lib): 2 reproducers through runtime.create() (cpus=0 / memory=128 / disk=0 rejected, valid sizing still creates) + 2 focused validate_resources() unit tests (reject + accept-at-minimum boundary).
  • Two-side verification: reverting the create_inner call line makes both reproducers FAIL (pointing at the bug); restoring makes them PASS.
  • Real REST E2E (boxlite serve + curl): POST /v1/boxes {"cpus":0}HTTP 400 invalid argument: cpus must be >= 1, got 0 (fixed) vs HTTP 201 + later krun_set_vm_config EINVAL(-22) (unfixed).
  • Real Python SDK repro (ticket's exact call): rt.create(BoxOptions(cpus=0)) now raises; valid sizing still creates.
  • make test:unit:python: 114 passed, 0 failed.
  • cargo fmt -p boxlite -- --check and cargo clippy -p boxlite --no-default-features --lib clean.

Note

The exact get_info()=None / remove() 404 symptom requires auto_remove=true (cloud/SDK default), where the failed box is reaped. On local REST (auto_remove=false) the box is preserved as Configured/Failed. Both share the same root cause — this fix cuts it off at create for both paths.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Implemented mandatory resource validation that enforces minimum CPU, memory, and disk space requirements before container creation, preventing invalid undersized configurations from being persisted and subsequently failing to start.
  • Tests

    • Added comprehensive test coverage for resource constraint enforcement, including validation of rejection for insufficient resources and acceptance of configurations at minimum boundaries.

cpus=0 (and undersized memory/disk) were silently accepted by create():
the box was persisted in Configured status and a box_id returned, but it
could never boot — libkrun set_vm_config(0, ...) returns EINVAL — so the
box was unusable and, under auto_remove, vanished on first start. Callers
saw create() "succeed", then get_info() return None and remove() 404
("Sandbox not found"), with no error surfaced by create().

Validate cpus>=1, memory_mib>=256, disk_size_gb>=1 in
BoxOptions::validate_resources(), called from create_inner() before the
box is persisted. This fails fast with InvalidArgument at every entry
point (REST -> HTTP 400, Python/Node/Go SDK, CLI) instead of handing back
a doomed box_id. Only explicitly-set (Some) fields are checked; None keeps
the engine default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 14:06
@cla-assistant
Copy link
Copy Markdown

cla-assistant Bot commented Jun 5, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


lile seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Rejects invalid resource sizing at create() time to prevent persisting unbootable boxes (e.g., cpus=0) and returning a “dangling” box_id.

Changes:

  • Add BoxOptions::validate_resources() with minimum thresholds for CPU/memory/disk.
  • Call options.validate_resources()? during runtime create() to fail fast.
  • Add unit + async runtime tests covering invalid and boundary-valid sizing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/boxlite/src/runtime/rt_impl.rs Enforces resource validation during create() and adds regression tests for P1-9 scenarios.
src/boxlite/src/runtime/options.rs Introduces resource minimum constants + validate_resources() and unit tests for validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +941 to +945
let err = zero_cpus.validate_resources().unwrap_err();
assert!(
matches!(err, BoxliteError::InvalidArgument(ref m) if m.contains("cpus")),
"cpus=0 should be rejected as InvalidArgument: {err:?}"
);
Comment on lines +350 to +354
// Reject invalid resource sizing at the boundary (P1-9). Otherwise an
// unbootable box (e.g. cpus=0) is persisted, create() returns a
// box_id, and the box silently vanishes on first start under
// auto_remove — leaving the caller chasing a dangling id.
options.validate_resources()?;
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e09896b-0881-48a5-9639-d28448a67f32

📥 Commits

Reviewing files that changed from the base of the PR and between 71e1754 and 7e6f0ba.

📒 Files selected for processing (2)
  • src/boxlite/src/runtime/options.rs
  • src/boxlite/src/runtime/rt_impl.rs

📝 Walkthrough

Walkthrough

This PR adds resource validation to the box creation flow. BoxOptions now defines minimum sizing constants and validates explicitly provided cpus, memory_mib, and disk_size_gb fields. RuntimeImpl::create_inner calls this validation upfront, rejecting invalid configurations before persistence. Comprehensive unit and integration tests cover both undersized rejections and valid boundary cases.

Changes

Resource validation on box creation

Layer / File(s) Summary
BoxOptions resource constraints and validation
src/boxlite/src/runtime/options.rs
BoxOptions defines MIN_CPUS, MIN_MEMORY_MIB, and MIN_DISK_SIZE_GB constants. The validate_resources() method checks only explicitly set fields (cpus, memory_mib, disk_size_gb) and returns BoxliteError::InvalidArgument with field-specific messages when values fall below minimums. Unit tests verify rejection of undersized values including cpus=0 and acceptance of defaults and minimum-boundary values.
Runtime enforcement via create_inner
src/boxlite/src/runtime/rt_impl.rs
RuntimeImpl::create_inner calls options.validate_resources()? immediately after the shutdown check, preventing persistence of unbootable boxes. Integration tests verify create() rejects cpus=0, undersized/zero memory_mib and disk_size_gb, and accepts valid explicit sizing and default configurations.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 Resource bounds now guard the gate,
No box shall start with sizing irate,
Validation early, failures fast,
Ensures our engines forever last! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main fix: rejecting invalid box resource sizing at the create boundary. It is concise, specific, and clearly conveys the primary change.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

/// Minimum valid vCPU count. `cpus=0` is silently accepted by the type
/// (`u8`) but can never boot — libkrun `set_vm_config(0, ...)` returns
/// EINVAL.
pub const MIN_CPUS: u8 = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not good to add the restriction here? every service has it's own custom restriction

Copy link
Copy Markdown
Member

@DorianZheng DorianZheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to add global restriction to the sdk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants