fix: return all eligible FIFO partition jobs per sweep instead of one#18
fix: return all eligible FIFO partition jobs per sweep instead of one#18
Conversation
Previously GetDueJobsAsync returned at most one job per partition per sweep via a partition_heads CTE that joined on MIN(sequence_number). This capped throughput to one job per partition per poll cycle. The fix removes the partition_heads CTE and replaces it with a simple blocked_partitions filter: any partition with a Processing job or a Pending job with prior attempts is excluded entirely; all other partitions contribute all their eligible jobs up to the batch size cap. Ordering by (scheduled_at, sequence_number) preserves FIFO delivery. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6e49d2c36
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| var candidates = unpartitioned | ||
| .Concat(partitionHeads) | ||
| var candidates = eligible |
There was a problem hiding this comment.
Keep one leaseable head per partition
Selecting directly from eligible now allows multiple jobs with the same partition key to be returned in one GetDueJobsAsync sweep. Because QueuePoller.RunAsync leases every returned job and QueuePump can hand them to concurrent workers, a later job in the partition can run before an earlier one has completed or retried (for example after the first job is rescheduled with Attempts > 0), which breaks the documented “processed one at a time” partition semantics and can reorder side effects.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
You're right — this was a semantic break. Returning multiple jobs from the same partition per sweep allows a later job to be leased and executed before an earlier one finishes retrying, which violates the 'processed one at a time' FIFO guarantee. Reverted in e1c471b; the PR now only contains the flaky timing test fix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
500ms slack over a 1s grace period is too tight for CI. The key invariant is that StopAsync returns well before the 3s worker finishes; 2.5s proves that without being brittle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Returning multiple jobs from the same partition per sweep breaks the documented invariant: a later job can be leased and executed before an earlier one finishes retrying, reordering side effects. All partition-logic changes are reverted. Only the flaky timing test fix (grace-period upper bound 1.5s → 2.5s) remains in this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Current setup will not allow this since JobWorker can run with multiple threads |
Summary
GetDueJobsAsyncreturned at most one job per partition per sweep via apartition_headsCTE that joined onMIN(sequence_number), capping throughput to one job per partition per poll cycle regardless of batch size.partition_headsCTE from all three SQL dialects (PostgreSQL, SQL Server, MySQL) and the EF Core LINQ fallback, replacing it with a simpleblocked_partitionsfilter.batchSize, ordered by(scheduled_at, sequence_number).Changes
PostgreSqlDialect,SqlServerDialect,MySqlDialect— simplifiedGetDueJobsSQLEntityFrameworkCoreStorage— simplified LINQ fallback (removedpartitionHeadsgroupBy)InMemoryStorage— removedpartitionHeadsgroupBy, same eligibility logicIAtomizerStorage— updated XML docs to reflect new contractAtomizerStorageContractTests— updated FIFO-07 contract tests; added batch-size cap testInMemoryStorageTests— updated one test that asserted old one-per-partition behaviourTest plan
dotnet test tests/Atomizer.Tests— 125/125 passeddotnet test tests/Atomizer.EntityFrameworkCore.Tests— 136/136 passed (includes SQLite contract tests)csharpier check .— no formatting issues🤖 Generated with Claude Code