Skip to content

fix: critical bugs across execd, server, and SDKs#468

Merged
ninan-nn merged 8 commits intoalibaba:mainfrom
wishhyt:fix/critical-bugs
Mar 18, 2026
Merged

fix: critical bugs across execd, server, and SDKs#468
ninan-nn merged 8 commits intoalibaba:mainfrom
wishhyt:fix/critical-bugs

Conversation

@wishhyt
Copy link
Contributor

@wishhyt wishhyt commented Mar 17, 2026

Summary

  • Fix copy-paste bug in execd auth client where Post() sent PUT requests
  • Fix panic on empty SQL query and missing rows.Err() check
  • Fix goroutine/fd leaks in runCommand when cmd.Start() fails
  • Fix background command stdin still reading from real stdin instead of /dev/null
  • Exit with non-zero code when execd server fails to start
  • Remove duplicate sandbox_service instantiation in server lifespan
  • Add missing execd_endpoint.headers in code-interpreter adapter
  • Add OSSFS volume backend support in SDK model converter

Testing

  • Unit tests (existing tests not broken)
  • Manual verification of each fix

Breaking Changes

  • None

Checklist

  • Clearly described motivation per commit
  • Each commit is atomic and focused
  • Security impact considered
  • Backward compatibility considered

wishhyt added 7 commits March 17, 2026 22:48
Post() was using http.MethodPut instead of http.MethodPost due to a
copy-paste error from the Put() method. This caused all Jupyter API
calls that used client.Post() to send PUT requests instead of POST.

Made-with: Cursor
getQueryType() would panic with index-out-of-range when given an empty
or whitespace-only query string because strings.Fields() returns an
empty slice. Add a length check before accessing the first element.

Also add the missing rows.Err() check after the SELECT iteration loop,
which is required to detect errors that cause rows.Next() to return
false prematurely.

Made-with: Cursor
When cmd.Start() fails, the done channel was never closed, causing
the two tailStdPipe goroutines to block forever on <-done. Close the
channel and wait for the goroutines before returning.

Also defer-close stdout and stderr file descriptors returned by
stdLogDescriptor(), which were previously never closed.

Made-with: Cursor
os.NewFile(uintptr(syscall.Stdin), os.DevNull) wraps the real stdin
(fd 0) with a misleading name, so background commands still read from
the process stdin. Use os.Open(os.DevNull) to actually open /dev/null.

Made-with: Cursor
When engine.Run() fails (e.g., port already in use), the process
logged the error but exited with code 0, making it invisible to
process supervisors and container orchestrators.

Made-with: Cursor
create_sandbox_service() was called twice: once at module level in
lifecycle.py (used by all routes) and once in the lifespan handler
in main.py (stored in app.state but never referenced). The duplicate
instance maintained separate internal state (expiration timers, OSSFS
ref counts, Docker client) that was never synchronized with the one
actually serving requests.

Made-with: Cursor
CodesAdapter was not including self.execd_endpoint.headers when
building HTTP headers, unlike CommandsAdapter in the sandbox SDK which
correctly spreads them. This omission could cause requests to fail in
server-proxy mode where routing headers are required.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 17, 2026 14:55
@CLAassistant
Copy link

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses several correctness and reliability issues in the execd runtime, the lifecycle server startup flow, and Python SDK adapters/model conversion.

Changes:

  • Fix execd runtime robustness (SQL empty-query panic prevention, rows.Err() handling, improved command startup/IO handling, non-zero exit on server start failure).
  • Fix HTTP method bug in execd Jupyter auth client (Post() now actually uses POST).
  • Improve Python SDKs: propagate execd_endpoint.headers in the code-interpreter adapter and add OSSFS volume backend conversion in the sandbox SDK.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
server/src/main.py Removes duplicate sandbox service initialization from server lifespan startup path.
sdks/sandbox/python/src/opensandbox/adapters/converter/sandbox_model_converter.py Adds OSSFS backend support when converting domain Volume to API Volume.
sdks/code-interpreter/python/src/code_interpreter/adapters/code_adapter.py Ensures endpoint-required headers are included in execd requests.
components/execd/pkg/runtime/sql.go Prevents panic on empty SQL; adds rows.Err() handling after iteration.
components/execd/pkg/runtime/command.go Improves command file descriptor cleanup and background stdin handling.
components/execd/pkg/jupyter/auth/client.go Fixes Post() to use http.MethodPost instead of PUT.
components/execd/main.go Exits with non-zero status when execd server fails to start.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@Pangjiping
Copy link
Collaborator

Thanks for submission, please sign the CLA. 😊

@Pangjiping
Copy link
Collaborator

Pangjiping commented Mar 18, 2026

It looks like there are some linter errors. Could you fix them it?

https://github.com/alibaba/OpenSandbox/actions/runs/23200575894/job/67517987478?pr=468

to_api_volume() only handled host and pvc backends, silently dropping
the ossfs configuration. OSSFS volumes created by users would be sent
to the API without their backend config, causing mount failures.

Map all OSSFS domain fields (bucket, endpoint, credentials, version,
options) to the corresponding API model.

Made-with: Cursor
@wishhyt wishhyt force-pushed the fix/critical-bugs branch from 5555915 to 8d40208 Compare March 18, 2026 04:36
@wishhyt
Copy link
Contributor Author

wishhyt commented Mar 18, 2026

It looks like there are some linter errors. Could you fix them it?

https://github.com/alibaba/OpenSandbox/actions/runs/23200575894/job/67517987478?pr=468

Fixed. The pyright errors were caused by passing str | None fields to a parameter expecting str. Added explicit None checks to narrow the types before constructing the API model.

Copy link
Collaborator

@Pangjiping Pangjiping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @ninan-nn Please take a look at this when you have some free time.

Copy link
Collaborator

@ninan-nn ninan-nn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM : >

@ninan-nn ninan-nn merged commit 02cc2fd into alibaba:main Mar 18, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants