feat: add ActivePartitionBatchRing.GetKeysByPartition()#943

Merged
pracucci merged 3 commits intomainfrom
add-GetKeysByPartition
Apr 1, 2026

Conversation

@pracucci
Contributor

What this PR does:

In this PR I'm adding ActivePartitionBatchRing.GetKeysByPartition(). This will be used to optimise the Mimir distributors when writing to Kafka partitions.

ActivePartitionBatchRing.GetKeysByPartition() is a simplified alternative to `DoBatchWithOptions()` for partition rings where the replication factor and quorum are always 1.

ActivePartitionBatchRing.GetKeysByPartition() uses a two-pass approach with a single backing array for all index slices, so it performs only 5 allocations per call regardless of the partition count:

```
BenchmarkActivePartitionBatchRing_GetKeysByPartition/partitions=10    63µs    12.6KiB   5 allocs/op
BenchmarkActivePartitionBatchRing_GetKeysByPartition/partitions=100   88µs    18.9KiB   5 allocs/op
```

Which issue(s) this PR fixes:

N/A

Checklist

  • Tests updated

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci changed the title Add ActivePartitionBatchRing.GetKeysByPartition() feat: add ActivePartitionBatchRing.GetKeysByPartition() Mar 27, 2026

@jhalterman jhalterman left a comment

LGTM

@pracucci pracucci enabled auto-merge (squash) April 1, 2026 08:13
@pracucci pracucci merged commit 1cf1f4c into main Apr 1, 2026
16 of 17 checks passed
@pracucci pracucci deleted the add-GetKeysByPartition branch April 1, 2026 08:43
pracucci added a commit to grafana/mimir that referenced this pull request Apr 2, 2026
…l to produce Kafka records (#14898)

#### What this PR does

In this PR I'm introducing an optimization to reduce distributor CPU
utilization when writing to ingest storage with a large number of
partitions (order of thousands) by batching all partitions into a single
`ProduceSync()` call instead of one per partition.

Previously, `DoBatchWithOptions` spawned a goroutine per partition, each
making its own `ProduceSync()` call. This was the dominant CPU cost
(~35% cumulative in profiling with 3.5K partitions). Now,
`sendWriteRequestToPartitions()` uses
`ActivePartitionBatchRing.GetKeysByPartition()` to group keys by
partition, builds a single `[]PartitionWriteRequest`, and calls
`MultiWriteSync()` which sends all records in one `ProduceSync()` call.
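The shape of that change can be sketched like this. The type and function names mirror those mentioned in the PR description, but their signatures are assumptions; the point is the structural shift from one produce call per partition to a single batched call.

```go
package main

import "fmt"

// PartitionWriteRequest carries the records destined for one partition.
// The field layout here is an assumption for illustration.
type PartitionWriteRequest struct {
	Partition int
	Records   [][]byte
}

// producer stands in for the Kafka writer: one call produces a whole batch.
type producer interface {
	MultiWriteSync(reqs []PartitionWriteRequest) error
}

// sendWriteRequestToPartitions sketches the optimization: instead of one
// goroutine and one ProduceSync() per partition, build a single slice of
// per-partition requests and hand it to the writer in one call.
func sendWriteRequestToPartitions(p producer, recordsByPartition map[int][][]byte) error {
	reqs := make([]PartitionWriteRequest, 0, len(recordsByPartition))
	for part, recs := range recordsByPartition {
		reqs = append(reqs, PartitionWriteRequest{Partition: part, Records: recs})
	}
	return p.MultiWriteSync(reqs) // one batched produce, not len(reqs) calls
}

// countingProducer records how many write calls were made.
type countingProducer struct{ calls int }

func (c *countingProducer) MultiWriteSync(reqs []PartitionWriteRequest) error {
	c.calls++
	return nil
}

func main() {
	p := &countingProducer{}
	records := map[int][][]byte{
		0: {[]byte("a")},
		1: {[]byte("b"), []byte("c")},
		2: {[]byte("d")},
	}
	_ = sendWriteRequestToPartitions(p, records)
	fmt.Println("write calls:", p.calls) // → write calls: 1
}
```

With thousands of partitions, this removes thousands of goroutine spawns and synchronous produce round-trips per write request, which is where the profiled CPU savings come from.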

To do this change, I've updated the vendored dskit to get
`ActivePartitionBatchRing.GetKeysByPartition()` ([dskit
PR](grafana/dskit#943)).

There are a couple of small behavioural changes:

- `cortex_ingest_storage_writer_latency_seconds` now tracks the latency
to write to all partitions in a single call, instead of per-partition
latency.
- Success metrics (`sent_bytes_total`, `input_bytes_total`, latency) are
only tracked when all records succeed, to keep them consistent on
partial failures.

When testing this optimization in a Mimir test cluster with 3.5K Kafka
partitions, the distributors' CPU utilization was reduced by 33%:

<img width="720" height="471" alt="screenshot_2026-03-27_at_16 16
59_720"
src="https://github.com/user-attachments/assets/041afbdb-08a0-4355-b6c3-40f93535aecc"
/>


#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
