fix: critical bugs across execd, server, and SDKs#468
Conversation
Post() was using http.MethodPut instead of http.MethodPost due to a copy-paste error from the Put() method. This caused all Jupyter API calls that used client.Post() to send PUT requests instead of POST. Made-with: Cursor
getQueryType() would panic with index-out-of-range when given an empty or whitespace-only query string because strings.Fields() returns an empty slice. Add a length check before accessing the first element. Also add the missing rows.Err() check after the SELECT iteration loop, which is required to detect errors that cause rows.Next() to return false prematurely. Made-with: Cursor
When cmd.Start() fails, the done channel was never closed, causing the two tailStdPipe goroutines to block forever on <-done. Close the channel and wait for the goroutines before returning. Also defer-close stdout and stderr file descriptors returned by stdLogDescriptor(), which were previously never closed. Made-with: Cursor
os.NewFile(uintptr(syscall.Stdin), os.DevNull) wraps the real stdin (fd 0) with a misleading name, so background commands still read from the process stdin. Use os.Open(os.DevNull) to actually open /dev/null. Made-with: Cursor
When engine.Run() fails (e.g., port already in use), the process logged the error but exited with code 0, making it invisible to process supervisors and container orchestrators. Made-with: Cursor
create_sandbox_service() was called twice: once at module level in lifecycle.py (used by all routes) and once in the lifespan handler in main.py (stored in app.state but never referenced). The duplicate instance maintained separate internal state (expiration timers, OSSFS ref counts, Docker client) that was never synchronized with the one actually serving requests. Made-with: Cursor
CodesAdapter was not including self.execd_endpoint.headers when building HTTP headers, unlike CommandsAdapter in the sandbox SDK which correctly spreads them. This omission could cause requests to fail in server-proxy mode where routing headers are required. Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
This PR addresses several correctness and reliability issues in the execd runtime, the lifecycle server startup flow, and Python SDK adapters/model conversion.
Changes:
- Fix execd runtime robustness (SQL empty-query panic prevention,
rows.Err()handling, improved command startup/IO handling, non-zero exit on server start failure). - Fix HTTP method bug in execd Jupyter auth client (
Post()now actually uses POST). - Improve Python SDKs: propagate
execd_endpoint.headersin the code-interpreter adapter and add OSSFS volume backend conversion in the sandbox SDK.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/main.py | Removes duplicate sandbox service initialization from server lifespan startup path. |
| sdks/sandbox/python/src/opensandbox/adapters/converter/sandbox_model_converter.py | Adds OSSFS backend support when converting domain Volume to API Volume. |
| sdks/code-interpreter/python/src/code_interpreter/adapters/code_adapter.py | Ensures endpoint-required headers are included in execd requests. |
| components/execd/pkg/runtime/sql.go | Prevents panic on empty SQL; adds rows.Err() handling after iteration. |
| components/execd/pkg/runtime/command.go | Improves command file descriptor cleanup and background stdin handling. |
| components/execd/pkg/jupyter/auth/client.go | Fixes Post() to use http.MethodPost instead of PUT. |
| components/execd/main.go | Exits with non-zero status when execd server fails to start. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
|
Thanks for submission, please sign the CLA. 😊 |
|
It looks like there are some linter errors. Could you fix them it? https://github.com/alibaba/OpenSandbox/actions/runs/23200575894/job/67517987478?pr=468 |
to_api_volume() only handled host and pvc backends, silently dropping the ossfs configuration. OSSFS volumes created by users would be sent to the API without their backend config, causing mount failures. Map all OSSFS domain fields (bucket, endpoint, credentials, version, options) to the corresponding API model. Made-with: Cursor
5555915 to
8d40208
Compare
Fixed. The pyright errors were caused by passing |
Pangjiping
left a comment
There was a problem hiding this comment.
LGTM. @ninan-nn Please take a look at this when you have some free time.
Summary
Post()sent PUT requestsrows.Err()checkrunCommandwhencmd.Start()failssandbox_serviceinstantiation in server lifespanexecd_endpoint.headersin code-interpreter adapterTesting
Breaking Changes
Checklist