Fix race in closing transfer streams#154
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the transfer-stream framing/teardown protocol to avoid transport-level half-closes (which vsock-proxy can translate into bidirectional shutdown) by introducing an application-level EOF marker (zero-length frame), and adds tests/CI coverage around the framing layer.
Changes:
- Switch stream shutdown semantics from
Close()/CloseWrite()to a zero-length “EOF frame”, and makeClose()idempotent withsync.Once. - Add frame-size limits (10 MiB) and extend the shim bridge to understand zero-length EOF frames.
- Add unit + fuzz tests for framing, and add CI jobs to run unit tests (race) and fuzz targets.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| plugins/vminit/streaming/plugin.go | Adds EOF-frame handling, maxFrameSize, and makes Close() send EOF once (no transport close). |
| plugins/vminit/streaming/plugin_test.go | Adds unit and fuzz coverage for the framing protocol and EOF behavior. |
| plugins/shim/streaming/plugin.go | Updates shim bridge to send/recognize EOF frames instead of CloseWrite(). |
| integration/transfer_test.go | Updates integration test stream framing to match EOF-frame protocol + size guard. |
| .github/workflows/ci.yml | Adds unit-test (race) and fuzz-test CI jobs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
473b3bd to
1b5dbb8
Compare
Signed-off-by: Derek McGowan <derek@mcg.dev>
1b5dbb8 to
f16c587
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
plugins/vminit/streaming/plugin.go:233
- Send() writes the payload length as a uint32 but doesn't enforce maxFrameSize (or guard against uint32 overflow). This can result in sending frames that the peer will now reject (or, for extremely large messages, an incorrect wrapped length prefix). Recommend validating len(data) <= maxFrameSize (and <= math.MaxUint32) and returning an error before writing anything to the connection.
func (s *vsockStream) Send(a typeurl.Any) error {
data, err := proto.Marshal(typeurl.MarshalProto(a))
if err != nil {
return fmt.Errorf("failed to marshal stream message: %w", err)
}
if err := binary.Write(s.conn, binary.BigEndian, uint32(len(data))); err != nil {
return fmt.Errorf("failed to write frame length: %w", err)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Close()/CloseWrite()) with application-level EOF (zero-length frame) for transfer streams. The vsock proxy sends a bidirectional SHUTDOWN ontransport close, which races with in-flight data and can cause the peer to lose the last chunk.
plugins/shim/streaming) to use the same zero-length EOF protocol — the original commit only fixed the direct-vsock path but the production TTRPC→VM pathstill used
CloseWrite().bridgeVMToTTRPCso the shim bridge exits cleanly on VM-side EOF instead of forwarding an empty message.maxFrameSize(10 MiB) guard onRecv()to prevent OOM from a buggy or malicious peer sending a huge length prefix.sync.OncetoClose()so the EOF marker is sent exactly once.connection error handling.
Fuzz*targets across all packages, 60s each).