Add hook HookQueueStateCount + read middleware as hooks and vice versa #1203
See [1] for context, but here we implement `HookQueueStateCount` in the existing OTEL middleware so that it can emit metrics around queue counts, a pretty handy feature that we have a definite request for, and which I'd guess most River users would like access to. [1] riverqueue/river#1203
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2024369e5
Here, add a new hook called `HookQueueStateCount` which gets invoked to produce job queue count statistics. We do this by adding a new maintenance service which, like other maintenance services, runs only on the leader, so we only have one client performing counts at any given time. Furthermore, in order to not introduce a potential operational problem without opt-in from River users, the counts only run if a `HookQueueStateCount` hook/middleware has been added to the client.

The reason we do all this is to implement a feature requested by one of our users: for `otelriver` in contrib to be able to emit queue count metrics, which seems like a pretty reasonable ask for the package, and something that every River user would likely want access to in their ops charts.

A slight oddity, but one which I think is _probably_ okay, is that the new hook ideally stays a hook, but the existing `otelriver` middleware is a middleware. It'd be nice not to have to put `otelriver.Middleware` into both a client's `Hooks` and `Middleware` configuration, so we modify the client to allow for hooks that are middleware and middleware that are hooks. This lets `otelriver.Middleware` continue doing what it was already doing, but also start producing new counts as a hook.
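To make the hook/middleware duality above concrete, here's a minimal Go sketch. The type and method names (`HookQueueStateCount`, `QueueStateCounts`, `metricsHook`) are assumptions based on the description, not River's actual API; the real interfaces may differ.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// QueueStateCounts is a hypothetical shape for the data the hook receives:
// queue name -> job state -> count. River's real type may differ.
type QueueStateCounts map[string]map[string]int

// HookQueueStateCount is the assumed hook interface: implementations are
// invoked periodically with fresh queue/state counts.
type HookQueueStateCount interface {
	QueueStateCount(ctx context.Context, counts QueueStateCounts) error
}

// formatCounts renders counts as metric-style lines, sketching what an
// otelriver-like implementation might emit as gauges.
func formatCounts(counts QueueStateCounts) []string {
	var lines []string
	for queue, states := range counts {
		for state, n := range states {
			lines = append(lines, fmt.Sprintf("river.queue.count{queue=%q,state=%q} = %d", queue, state, n))
		}
	}
	sort.Strings(lines) // map iteration order is random; sort for determinism
	return lines
}

// metricsHook stands in for a middleware-like type that also satisfies the
// hook interface, so one value can go in both Hooks and Middleware.
type metricsHook struct{}

var _ HookQueueStateCount = &metricsHook{}

func (m *metricsHook) QueueStateCount(ctx context.Context, counts QueueStateCounts) error {
	for _, line := range formatCounts(counts) {
		fmt.Println(line)
	}
	return nil
}

func main() {
	var hook HookQueueStateCount = &metricsHook{}
	_ = hook.QueueStateCount(context.Background(), QueueStateCounts{
		"default": {"available": 3, "running": 1},
	})
}
```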
bgentry
left a comment
It should probably be straightforward to update riverui to work with these changes, but we should make sure before merging since this breaks an API it uses.
```go
JobCancel(ctx context.Context, params *JobCancelParams) (*rivertype.JobRow, error)
JobCountByAllStates(ctx context.Context, params *JobCountByAllStatesParams) (map[rivertype.JobState]int, error)
JobCountByQueueAndState(ctx context.Context, params *JobCountByQueueAndStateParams) ([]*JobCountByQueueAndStateResult, error)
```
This is a breaking change for the UI. It has a few callers in https://github.com/riverqueue/riverui/blob/fa202780c19504cb8510d0b2c8d92f8ff31d8dc9/handler_api_endpoint.go
I was going to hold off on asking about this, but in light of this break, I'm wondering whether we should give any thought to whether the UI runs its own queries for this purpose or somehow reuses logic in River so that we don't have multiple redundant queriers. But it's probably going to be too hard to try to share that state between backend nodes and UI frontend nodes.
```sql
SELECT
    queue,
    state,
    COUNT(*) AS count
FROM /* TEMPLATE: schema */river_job
WHERE queue = ANY(@queue_names::text[])
GROUP BY queue, state
ORDER BY queue, state;
```
Also I believe this might break the prior query's efficient usage of the prioritized fetch index; I'm running an agent w/ benchmarks now to get some more details on whether this is a significant perf regression.
Yeah so I opened #1211 with a perf benchmark showing this is a huge regression. Essentially the old split format was able to effectively leverage the index whereas this variant does not:
GPT 5.4 high explanation:
The new query is inefficient for the current schema because it filters on `queue` and groups on `(queue, state)`, but the important existing btree on `river_job` is keyed `(state, queue, priority, scheduled_at, id)`. PostgreSQL btree indexes are most useful when the query constrains the leftmost columns first, and this query no longer does that. Because `state` is not fixed in the predicate, the planner cannot use the index as a selective access path in the same way, so it tends to fall back to a parallel sequential scan and aggregate after reading a large slice of the table.

The old query shape was more appropriate for the existing index because it split the work into separate branches with constant `state` predicates, specifically `state = 'available'` and `state = 'running'`. That matches the left edge of the `(state, queue, ...)` index, and once Postgres has a fixed `state`, it can efficiently narrow by `queue` within the same index. In practice that gave us index-only or bitmap/index-driven plans instead of scanning the table broadly, which is why the old query performed much better on selective queue lists.

If we want to fix this without adding indexes, the main option is to reshape the new query so it restores a constant-`state` predicate per branch. Concretely, that means counting each state in its own grouped subquery, then `UNION ALL`-ing those results together or joining them back to the requested queue list. That is more verbose than the current grouped query, but it realigns the SQL with the index we already have and gives the planner a much better chance of using it efficiently.
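As a rough illustration of that reshape, the per-state `UNION ALL` form might look like the following. This is a sketch only: it shows two state branches, and the column/parameter names are taken from the query above; the real fix would need one branch per state being counted.

```sql
-- Sketch: restore a constant-state predicate per branch so each branch
-- can use the left edge of the (state, queue, ...) index.
SELECT queue, 'available' AS state, COUNT(*) AS count
FROM /* TEMPLATE: schema */river_job
WHERE state = 'available'
  AND queue = ANY(@queue_names::text[])
GROUP BY queue
UNION ALL
SELECT queue, 'running' AS state, COUNT(*) AS count
FROM /* TEMPLATE: schema */river_job
WHERE state = 'running'
  AND queue = ANY(@queue_names::text[])
GROUP BY queue
-- ...one branch per state of interest...
ORDER BY queue, state;
```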
The benchmark is probably useful regardless of how we proceed on this PR but I'll let you decide how to proceed on it ✌️
This adds a benchmark on top of #1203 to make the queue-state count query regression easy to reproduce and discuss. The query in that branch now groups by `(queue, state)` while filtering only on `queue`, which no longer lines up with the existing `(state, queue, priority, scheduled_at, id)` index on `river_job`. The new benchmark seeds a migrated `river_job` table and compares the current `JobCountByQueueAndState` implementation against the legacy query shape for a few queue-list sizes. That gives us a durable way to show the planner behavior and quantify the difference before deciding whether to reshape the SQL or add another index.
This adds a benchmark on top of #1203 to make the queue-state count query regression easy to reproduce and discuss. It compares the current `JobCountByQueueAndState` implementation against the legacy query shape on the same migrated `river_job` schema. The benchmark stays lightweight by default, but can be scaled locally with `RIVER_BENCH_QUEUE_STATE_COUNT_NUM_JOBS` to reproduce the planner regression with a couple hundred thousand rows and quantify the gap.