Skip to content

add unit test coverage for NetworkOutputBuffer #4957

@aglinxinyuan

Description

@aglinxinyuan

amber/src/main/scala/org/apache/texera/amber/engine/architecture/sendsemantics/partitioners/Partitioner.scala defines the NetworkOutputBuffer class — the per-receiver batched-tuple sender that every concrete Partitioner subclass uses to push data downstream. It has stateful buffer + flush semantics that nothing currently pins:

Path Behavior to pin
addTuple below batchSize Tuple appended to the internal buffer; nothing sent yet.
addTuple at the exact batchSize boundary Auto-flush — a single DataFrame carrying every buffered tuple is sent to the receiver.
Multiple auto-flush cycles Each batch produces exactly one DataFrame; tuples don't leak across batches; sequence numbers / receiver stay consistent.
flush() on a non-empty buffer One DataFrame sent; buffer reset to empty.
flush() on an empty buffer No-op (no DataFrame sent) — pin so a regression that sends an empty DataFrame per call breaks here.
sendState Flush-bookended: any pending tuples drain first (in their own DataFrame), then a StateFrame is sent, then a trailing flush() runs (no-op since the post-state buffer is already empty). The bookend flushes ensure state ordering relative to data is preserved.
Constructor params batchSize defaults to ApplicationConfig.defaultDataTransferBatchSize; can be overridden. to and dataOutputPort exposed as val.

The test wires a real NetworkOutputGateway with a recording handler so the assertions exercise the production codepath end-to-end (sequence number assignment, channel-id construction, payload type) rather than mocking the gateway.

Priority

P3 - Low

Task Type

  • Testing / QA

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No fields configured for Task.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions