refactor(storage): replace batch (queue,state) index with active_batch membership

Replace the secondary index over the batch table's mutable `state` column with
an `active_batch` membership table that answers the only queue-scoped query the
pipeline needs: "which batches in this queue are still active?" (the batch
controller uses it to find conflict dependencies; the cancel controller uses it
to find the batch holding a request). A row is intended to exist while its batch
is non-terminal, so the table stays bounded by the live speculation window
rather than growing with batch history. `queue` leads the PK so listing is a
PK-prefix scan and the table is shardable by queue. That access pattern ports
cleanly to a key-value store (queue = partition key, batch_id = sort key),
unlike a server-maintained secondary index over a mutable non-key column.

Membership is best-effort, not an exact mirror of batch state, and is
maintained without transactions:

- Create writes the membership row before the batch row. This ordering is
  required for correctness: whenever a batch row is visible to a reader its
  membership row is already present, so a concurrent ListActive can never miss
  an active batch. INSERT IGNORE keeps the membership write idempotent across
  retries.
- If the batch insert then fails, Create deliberately leaves the membership row
  in place. A returned error does not prove the batch row was not written (an
  ambiguous failure can commit the batch row and still return an error), so
  deleting would risk permanently orphaning a live, non-terminal batch from
  ListActive. A dangling membership is the safe direction.
- ListActive resolves each member by primary key: a terminal batch's membership
  is best-effort removed (race-free, because a terminal batch is fully committed
  and its id is never reused); a missing batch is skipped but NOT removed (it
  may belong to an in-flight Create that has written its membership but not yet
  its batch row). Cleanup failures are swallowed so a read never fails on index
  maintenance, and terminal-state writers (merge, speculate, dlq) need not touch
  the index.

Genuinely dangling rows (failed/crashed creates) and batches stuck in a
non-terminal state are left for a future reconcile/prune job, documented in
schema/README.md.

Integration tests cover the self-heal and membership invariants:

- TestActiveBatch_SelfHealsTerminalMembership
- TestActiveBatch_SkipsDanglingMembershipWithoutDeleting
- TestActiveBatch_CreateKeepsMembershipOnDuplicate
- TestActiveBatch_CreateKeepsMembershipOnFailedInsert

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
+	// Read all membership rows and release the connection before resolving each
+	// batch, since Get issues its own query.
+	ids, err := s.activeBatchIDs(ctx, queue)
+	if err != nil {
+		return nil, err
 	}

-	query := "SELECT id, queue, contains, dependencies, score, state, version FROM batch WHERE queue = ? AND state IN (?" + strings.Repeat(", ?", len(states)-1) + ")"
-
-	args := make([]any, 1+len(states))
-	args[0] = queue
-	for i, state := range states {
-		args[i+1] = state
+	var results []entity.Batch
+	for _, id := range ids {
+		batch, err := s.Get(ctx, id)
+		if err != nil {
+			if storage.IsNotFound(err) {
+				// Missing batch: either an in-flight Create or a dangling row. We
+				// can't tell them apart, so skip without deleting.
+				continue
+			}
+			return nil, fmt.Errorf("failed to get active batch id=%q queue=%q: %w", id, queue, err)
+		}
+		if batch.State.IsTerminal() {
+			// Stale membership: the batch has finished. Race-free to remove since
+			// its id is never reused.
+			s.removeActive(ctx, queue, id)
+			continue
+		}
+		results = append(results, batch)
 	}

-	rows, err := s.db.QueryContext(ctx, query, args...)
+	return results, nil
+}
+
+// activeBatchIDs reads the batch IDs recorded as active for the queue, owning the
+// result set's lifecycle so the caller can resolve each batch after it's closed.
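
The body of `activeBatchIDs` is not shown in this diff. The following is a minimal sketch of what its doc comment describes: drain and close the result set before returning, so that later per-batch `Get` calls do not overlap an open cursor. Everything here is an assumption made for illustration: the `active_batch(queue, batch_id)` column names, string ids, the free-function signature taking a `*sql.DB`, and the helper names `idRows`, `collectIDs`, and `sliceRows`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// idRows is the subset of *sql.Rows used below; *sql.Rows satisfies it.
type idRows interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// collectIDs drains rows into a slice and always closes them before returning.
func collectIDs(rows idRows) ([]string, error) {
	defer rows.Close()

	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, fmt.Errorf("failed to scan active batch id: %w", err)
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		return nil, fmt.Errorf("failed to iterate active batch ids: %w", err)
	}
	return ids, nil
}

// activeBatchIDs is a hypothetical reconstruction: a primary-key-prefix scan
// on queue, with the result set fully consumed before the caller continues.
func activeBatchIDs(ctx context.Context, db *sql.DB, queue string) ([]string, error) {
	rows, err := db.QueryContext(ctx,
		"SELECT batch_id FROM active_batch WHERE queue = ?", queue)
	if err != nil {
		return nil, fmt.Errorf("failed to list active batch ids queue=%q: %w", queue, err)
	}
	return collectIDs(rows)
}

// sliceRows is a tiny in-memory idRows, used only to exercise collectIDs
// without a database driver.
type sliceRows struct {
	ids    []string
	pos    int
	closed bool
}

func (r *sliceRows) Next() bool {
	if r.pos >= len(r.ids) {
		return false
	}
	r.pos++
	return true
}

func (r *sliceRows) Scan(dest ...any) error {
	if len(dest) != 1 {
		return fmt.Errorf("expected 1 destination, got %d", len(dest))
	}
	p, ok := dest[0].(*string)
	if !ok {
		return fmt.Errorf("unsupported destination type %T", dest[0])
	}
	*p = r.ids[r.pos-1]
	return nil
}

func (r *sliceRows) Err() error   { return nil }
func (r *sliceRows) Close() error { r.closed = true; return nil }
```

Splitting the drain step behind a small interface is a choice made for this sketch so the close-before-return behavior can be checked without a live database; the real method may simply work on `*sql.Rows` directly.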
+The `batch` table is reachable only by its primary key (`id`). It carries no secondary index: every access pattern is expressed as a primary-key get or as a key-prefix scan over a companion membership table (see `active_batch` below). This keeps the access patterns portable to a key-value / document store, where a server-maintained secondary index over a mutable, non-key column (such as `state`) is not a primitive every backend offers cheaply.

-The `batch` table has a composite secondary index on `(queue, state)`. This index supports the `GetByQueueAndStates` query, which retrieves batches filtered by queue and one or more states. Without this index, the query would require a full table scan.
+## active_batch table

-#### Trade-offs
+`active_batch` is the membership index that answers "which batches in this queue are still active?", the only queue-scoped query the pipeline needs (the batch controller uses it to find conflict dependencies; the cancel controller uses it to find the batch holding a request). A row is intended to exist per non-terminal batch, so the table stays bounded by the live speculation window rather than full batch history. The correspondence is best-effort, not exact: readers treat membership as a hint and resolve each batch by primary key (see *Maintenance and self-healing* below).

-- **Write overhead**: Every `INSERT` and `UPDATE` to the `batch` table must also update the secondary index, adding latency to write operations.
-- **Storage cost**: The index consumes additional disk space proportional to the number of rows in the table.
-- **Lock contention**: Under high write concurrency, index maintenance can increase lock contention on the affected index pages.
+`queue` leads the composite primary key `(queue, batch_id)`, so listing a queue's active batches is a primary-key-prefix scan and the table is shardable by queue. On a key-value store the same shape maps directly onto a partition key (`queue`) and sort key (`batch_id`) with no secondary index.

-#### Future: Prune job
+### Maintenance and self-healing

-As the `batch` table grows, the secondary index will grow with it, increasing storage costs and degrading write performance. To mitigate this, a prune job should be introduced to periodically delete batches in terminal states (`succeeded`, `failed`, `cancelled`) that are older than a configurable retention period. This keeps the table and its indexes bounded in size, ensuring consistent query and write performance over time.
+`BatchStore.Create` writes the membership row before the batch row, so a batch row is never visible to `ListActive` without its membership. If the batch insert then fails, `Create` leaves the membership row in place: a returned error doesn't prove the row wasn't written (an ambiguous failure can commit it and still error), so deleting could permanently hide a live batch. A dangling row is the safe direction.
+
+On read, `ListActive` resolves each member by primary key. A **terminal** batch's membership is best-effort removed (race-free, since its id is never reused). A **missing** batch is skipped but not removed, since it may belong to an in-flight `Create` that hasn't written its batch row yet. Cleanup failures are swallowed, so reads never fail on index maintenance and terminal-state writers (merge, speculate, dlq) never touch the index. Because the two writes are independent (no transaction), the design tolerates partial failure via idempotent retries and read-time reconciliation.
+
+### Future: prune / reconcile job
+
+Read-time reconciliation only removes terminal memberships, so two kinds of stale row need a periodic sweep: dangling memberships whose batch never landed (a failed or crashed create), and memberships of batches that are stuck in a non-terminal state (e.g. an orphan stuck in `created` after a mid-process failure). A reconcile job should sweep both, keeping the table bounded independently of read traffic.