Skip to content

Devops#503

Open
vishwejeet wants to merge 2 commits intoalibaba:mainfrom
superagi:devops
Open

Devops#503
vishwejeet wants to merge 2 commits intoalibaba:mainfrom
superagi:devops

Conversation

@vishwejeet
Copy link

Summary

  • What is changing and why?

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

- Introduced default workspace volume size configuration in StorageConfig.
- Implemented endpoints for retrieving sandbox logs and establishing interactive terminal sessions via WebSocket.
- Enhanced Kubernetes sandbox service to support pausing and resuming workloads, along with PVC management during these operations.
- Updated middleware to allow WebSocket connections without HTTP auth and request ID processing.
- Created a comprehensive runbook detailing the deployment process for SuperSandbox on Kubernetes.
- Included prerequisites, installation steps for gVisor and agent-sandbox-controller, and configuration guidelines.
- Provided verification steps and API quick reference for managing sandboxes.
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d720925f99

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +68 to +72
async def __call__(self, scope, receive, send):
"""Pass WebSocket connections through without HTTP auth."""
if scope["type"] == "websocket":
await self.app(scope, receive, send)
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce API-key checks on WebSocket terminal sessions

When server.api_key is configured, this early return exempts every WebSocket handshake from AuthMiddleware, and the new sandbox_terminal() route in server/src/api/lifecycle.py does not perform any manual header validation before opening a shell. That leaves /sandboxes/{id}/terminal reachable without OPEN-SANDBOX-API-KEY, which bypasses the service's only auth control for the most privileged sandbox operation.

Useful? React with 👍 / 👎.

Comment on lines +687 to +691
raise NotImplementedError("Pause is not supported for BatchSandbox provider")

def resume_workload(self, sandbox_id: str, namespace: str) -> None:
"""Resume is not supported for BatchSandbox provider."""
raise NotImplementedError("Resume is not supported for BatchSandbox provider")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid turning default BatchSandbox pause/resume into 500s

On Kubernetes setups that use the default provider selection (create_workload_provider() falls back to batchsandbox), both pause and resume now raise NotImplementedError here. KubernetesSandboxService.pause_sandbox() and resume_sandbox() catch that as a generic exception and translate it to HTTP 500, so the existing pause/resume endpoints regress from a clear 'unsupported' response to an internal-server-error for the default provider.

Useful? React with 👍 / 👎.

Comment on lines +175 to +179
self.k8s_client.create_pvc(
namespace=namespace,
name=pvc_name,
storage_size=self._workspace_volume_size,
labels={SANDBOX_ID_LABEL: sandbox_id},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Tie workspace PVC cleanup to the Sandbox lifecycle

This workspace claim is created as a standalone PVC and only receives a label; nothing in this flow adds an ownerReference after the Sandbox CR is created. As a result, agent-sandbox workloads that expire via shutdownPolicy=Delete or are removed out of band will leak their PVC permanently, because delete_workload() is never called to run the manual cleanup path.

Useful? React with 👍 / 👎.

Comment on lines +1790 to +1794
raise NotImplementedError("Logs are not yet implemented for Docker runtime")

def exec_sandbox_terminal(self, sandbox_id: str, command: str = "/bin/bash"):
"""Open terminal to a Docker sandbox container."""
raise NotImplementedError("Terminal is not yet implemented for Docker runtime")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return a supported error for Docker logs and terminal

The new /sandboxes/{id}/logs and /sandboxes/{id}/terminal routes are registered unconditionally in server/src/api/lifecycle.py, but the Docker backend now throws raw NotImplementedErrors here. On a Docker deployment, the log endpoint becomes an HTTP 500 and the terminal socket closes with code 1011, which looks like a server failure instead of an unsupported capability.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant