Skip to content

Fix race in closing transfer streams#154

Merged
dmcgowan merged 1 commit intocontainerd:mainfrom
dmcgowan:fix-transfer-close-race
Apr 14, 2026
Merged

Fix race in closing transfer streams#154
dmcgowan merged 1 commit intocontainerd:mainfrom
dmcgowan:fix-transfer-close-race

Conversation

@dmcgowan
Copy link
Copy Markdown
Member

  • Replace transport-level close (Close()/CloseWrite()) with application-level EOF (zero-length frame) for transfer streams. The vsock proxy sends a bidirectional SHUTDOWN on
    transport close, which races with in-flight data and can cause the peer to lose the last chunk.
  • Fix the shim bridge (plugins/shim/streaming) to use the same zero-length EOF protocol — the original commit only fixed the direct-vsock path but the production TTRPC→VM path
    still used CloseWrite().
  • Handle zero-length frames in bridgeVMToTTRPC so the shim bridge exits cleanly on VM-side EOF instead of forwarding an empty message.
  • Add maxFrameSize (10 MiB) guard on Recv() to prevent OOM from a buggy or malicious peer sending a huge length prefix.
  • Add sync.Once to Close() so the EOF marker is sent exactly once.
  • Add unit tests (10 cases) and fuzz tests (3 targets) for the streaming framing protocol covering roundtrip, EOF, bidirectional, large messages, oversized frame rejection, and
    connection error handling.
  • Add CI jobs for unit tests (with race detector) and fuzz tests (auto-discovers all Fuzz* targets across all packages, 60s each).

Copilot AI review requested due to automatic review settings April 11, 2026 06:19
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the transfer-stream framing/teardown protocol to avoid transport-level half-closes (which vsock-proxy can translate into bidirectional shutdown) by introducing an application-level EOF marker (zero-length frame), and adds tests/CI coverage around the framing layer.

Changes:

  • Switch stream shutdown semantics from Close()/CloseWrite() to a zero-length “EOF frame”, and make Close() idempotent with sync.Once.
  • Add frame-size limits (10 MiB) and extend the shim bridge to understand zero-length EOF frames.
  • Add unit + fuzz tests for framing, and add CI jobs to run unit tests (race) and fuzz targets.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
plugins/vminit/streaming/plugin.go Adds EOF-frame handling, maxFrameSize, and makes Close() send EOF once (no transport close).
plugins/vminit/streaming/plugin_test.go Adds unit and fuzz coverage for the framing protocol and EOF behavior.
plugins/shim/streaming/plugin.go Updates shim bridge to send/recognize EOF frames instead of CloseWrite().
integration/transfer_test.go Updates integration test stream framing to match EOF-frame protocol + size guard.
.github/workflows/ci.yml Adds unit-test (race) and fuzz-test CI jobs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread plugins/vminit/streaming/plugin.go
Comment thread plugins/shim/streaming/plugin.go Outdated
Comment thread plugins/shim/streaming/plugin.go
Comment thread integration/transfer_test.go
@dmcgowan dmcgowan force-pushed the fix-transfer-close-race branch from 473b3bd to 1b5dbb8 Compare April 13, 2026 06:00
Signed-off-by: Derek McGowan <derek@mcg.dev>
Copilot AI review requested due to automatic review settings April 14, 2026 18:11
@dmcgowan dmcgowan force-pushed the fix-transfer-close-race branch from 1b5dbb8 to f16c587 Compare April 14, 2026 18:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

plugins/vminit/streaming/plugin.go:233

  • Send() writes the payload length as a uint32 but doesn't enforce maxFrameSize (or guard against uint32 overflow). This can result in sending frames that the peer will now reject (or, for extremely large messages, an incorrect wrapped length prefix). Recommend validating len(data) <= maxFrameSize (and <= math.MaxUint32) and returning an error before writing anything to the connection.
func (s *vsockStream) Send(a typeurl.Any) error {
	data, err := proto.Marshal(typeurl.MarshalProto(a))
	if err != nil {
		return fmt.Errorf("failed to marshal stream message: %w", err)
	}
	if err := binary.Write(s.conn, binary.BigEndian, uint32(len(data))); err != nil {
		return fmt.Errorf("failed to write frame length: %w", err)
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread plugins/vminit/streaming/plugin.go
Comment thread plugins/shim/streaming/plugin.go
Comment thread plugins/vminit/streaming/plugin_test.go
Comment thread integration/transfer_test.go
@dmcgowan dmcgowan merged commit 576f82a into containerd:main Apr 14, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants