Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 174 additions & 0 deletions include/raft.h.in
Original file line number Diff line number Diff line change
Expand Up @@ -1022,6 +1022,180 @@ RAFT_API void raft_seed(struct raft *r, unsigned random);
* The @update output parameter contains changes that user code should perform
* after the raft_step() call returns (e.g. new entries that must be persisted,
* new messages that must be sent, etc.).
*
* Memory Ownership Model for raft_event Fields
* =============================================
*
* The caller must understand the memory ownership semantics for each event type
* to avoid memory leaks or use-after-free bugs. This section documents who owns
* the memory for each field in the raft_event union after raft_step() returns.
*
* RAFT_START Event
* ----------------
* - event->start.term: Scalar value (no ownership concerns)
* - event->start.voted_for: Scalar value (no ownership concerns)
* - event->start.metadata: Ownership fully remains with the caller, who can
* free it (and any nested field) after the call to raft_step.
* - event->start.start_index: Scalar value (no ownership concerns)
* - event->start.entries: PARTIAL OWNERSHIP TRANSFERRED. The ownership of the
* event->start.entries pointer always remains with the caller. On success,
* the entries themselves are consumed by the RAFT instance and must not be
* freed. On error, ownership of the individual entries remain with the
* caller. If successfully consumed by RAFT, raft_free will be called on
* entries[x].batch once per unique batch when RAFT no longer needs these
* entries to be kept in memory. Hence the caller must NOT free the entries
* after a successful call to raft_step().
*
* The entries must be allocated with raft_malloc (or another raft_*
* allocation functions). The batch pointer of each entry must point to the
* base address of the allocated memory where the entry is located. As an
* example, with 7 entries placed into two batches, the batch pointers may
* look like the following:
*
* entries[0].batch = <mem-location-1>; // Allocation for the first 4 entries
* entries[1].batch = <mem-location-1>;
* entries[2].batch = <mem-location-1>;
* entries[3].batch = <mem-location-1>;
* entries[4].batch = <mem-location-2>; // Allocation for the last 3 entries
* entries[5].batch = <mem-location-2>;
* entries[6].batch = <mem-location-2>;
*
* Importantly, batch locations cannot alternate, e.g. the following would be
* invalid:
*
* entries[0].batch = <mem-location-1>;
* entries[1].batch = <mem-location-2>;
* entries[2].batch = <mem-location-1>;
*
* - event->start.n_entries: Scalar value (no ownership concerns)
*
* RAFT_RECEIVE Event
* ------------------
* - event->receive.message: The message structure itself is NOT owned by raft
* (caller retains ownership), EXCEPT for the following nested fields which ARE
* owned by raft after raft_step() is called, REGARDLESS of success or error:
*
* For RAFT_APPEND_ENTRIES messages:
* - message->append_entries.entries: OWNERSHIP PARTIALLY TRANSFERRED upon
* calling raft_step(). The caller retains ownership of the entries array.
* The batch memory referenced by entries[].batch is owned by raft after
* the call. The caller must NOT free this memory, even if raft_step()
* succeeds. The ownership of the entries' memory is however left to the
* caller in case of a failure of raft_step().
*
* IMPORTANT: Contrary to RAFT_START, RAFT_RECEIVE expects all the entries
* to belong to a SINGLE batch, so all the entries' batch pointer must be
* the same. This is because we expect RAFT_RECEIVE events to come from the
* network and hence to have been placed in a single allocation. The batch
* pointer does not need to point to the start of the first entry's data,
* it simply must point to the address to pass to raft_free when it RAFT
* will deallocate it.
*
* For RAFT_INSTALL_SNAPSHOT messages:
* - message->install_snapshot.data.base: OWNERSHIP TRANSFERRED, the snapshot
* data buffer is owned by raft after a successful call to raft_step. The
* caller must NOT free this memory, unless raft_step returns an error, in
* which case the caller retains ownership of the memory.
* - message->install_snapshot.conf: ownership is retained by the caller.
*
* All other message types: No nested ownership transfer; the message content
* is copied or consumed during processing.
*
* IMPORTANT: The ownership transfer happens immediately when raft_step() is
* called, not based on the result. This means the caller must NEVER attempt to
* free or clean up these transferred fields, regardless of whether raft_step()
* succeeds or fails.
*
* RAFT_PERSISTED_ENTRIES Event
* ----------------------------
* - event->persisted_entries.index: Scalar value (no ownership concerns)
*
* RAFT_PERSISTED_SNAPSHOT Event
* -----------------------------
* - event->persisted_snapshot.metadata: Passed by value; the structure is copied.
* The nested configuration is copied internally. Caller retains ownership of
* the event structure.
* - event->persisted_snapshot.offset: Scalar value (no ownership concerns)
* - event->persisted_snapshot.last: Scalar value (no ownership concerns)
*
* RAFT_CONFIGURATION Event
* ------------------------
* - event->configuration.index: Scalar value (no ownership concerns)
* - event->configuration.conf: OWNERSHIP TRANSFERRED to raft. The configuration
* will be consumed if needed or closed if not needed. The caller must NOT close
* this configuration after calling raft_step(), regardless of success or error.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be best to make a copy, like of other similar cases. Opinions are welcome.

*
* RAFT_SNAPSHOT Event
* -------------------
* - event->snapshot.metadata: Passed by value; the structure is copied. The
* nested configuration is NOT deeply copied - the caller must ensure the
* configuration remains valid during the raft_step() call, but may free it
* afterward. Caller retains ownership.
* - event->snapshot.trailing: Scalar value (no ownership concerns)
*
* RAFT_TIMEOUT Event
* ------------------
* No event-specific fields (no ownership concerns)
*
* RAFT_SUBMIT Event
* -----------------
* - event->submit.entries: OWNERSHIP TRANSFERRED to raft on success. On error,
* the caller must free the entries. The entries must have their batch field
* properly set (non-NULL) as raft expects batched entries. Each entry's buf.base
* must point into the batch memory. On error, the caller is responsible for
* freeing both the entries array and the referenced batches.
* - event->submit.n: Scalar value (no ownership concerns)
*
* RAFT_CATCH_UP Event
* -------------------
* - event->catch_up.server_id: Scalar value (no ownership concerns)
*
* RAFT_TRANSFER Event
* -------------------
* - event->transfer.server_id: Scalar value (no ownership concerns)
*
* Error Handling and Memory Cleanup
* ==================================
*
* When raft_step() returns an error:
* - For RAFT_START events: The caller must free the metadata structure itself
* (but NOT the configuration field, which has been transferred). The entries
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned, this seems unnecessarily complicated, and I would change the current implementation to make a copy of the configuration field.

* are automatically freed by raft_step() on error, so the caller must NOT
* free them.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change this too and leave all ownership to the caller.

* - For RAFT_SUBMIT events: The caller must clean up entries using
* entryBatchesDestroy() or equivalent.
* - For RAFT_RECEIVE events: Ownership of transferred nested fields (entries,
* snapshot data, configurations) has ALREADY been transferred even on error,
* so the caller must NOT free them.
* - For RAFT_CONFIGURATION events: Ownership has been transferred; the caller
* must NOT close the configuration.
* - For all other events: No cleanup required by caller.
*
* The caller should check the return value and handle errors appropriately:
*
* int rv = raft_step(r, &event, &update);
* if (rv != 0) {
* if (event.type == RAFT_START) {
* // Clean up metadata structure (but not the configuration!)
* if (event.start.metadata != NULL) {
* raft_free(event.start.metadata);
* }
* // DO NOT free entries - raft_step already freed them on error
* } else if (event.type == RAFT_SUBMIT) {
* // Clean up entries on submit error
* entryBatchesDestroy(event.submit.entries, event.submit.n);
* }
* // Handle other error cases...
* } else {
* // On success
* if (event.type == RAFT_START) {
* // Still need to free the metadata structure itself
* if (event.start.metadata != NULL) {
* raft_free(event.start.metadata);
* }
* // DO NOT free entries or configuration
* }
* }
*/
RAFT_API int raft_step(struct raft *r,
struct raft_event *event,
Expand Down
Loading