piercefreeman
left a comment
I don't see any obvious downside to this, but I'm holding off on approving a merge to main until we do a bit of independent benchmarking. At the limit, indexes can slow down writes more than they speed up reads, so we should do some more quantitative benchmarking against our test cluster.
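One quick way to put a number on the write-side cost, assuming access to the test cluster (the table and column names below are illustrative, not the actual schema):

```sql
-- Time a representative write path with and without the new indexes.
-- BUFFERS shows how much extra I/O index maintenance adds per statement.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE queued_instances
SET locked_by = NULL, lock_expires_at = NULL
WHERE lock_expires_at < now();
```

Running this before and after the migration on a comparable data volume gives a rough per-statement delta for the index maintenance overhead.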
Force-pushed from 8c7852d to 4b99528
We actually could consider making some of the indexes async, especially for things like GC... Anyhow, I'll bench this first in the soak test.
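On the async point: Postgres can already build indexes without blocking concurrent writes via `CREATE INDEX CONCURRENTLY`. A sketch, with a hypothetical index name and predicate (not the PR's actual DDL):

```sql
-- Builds the index without holding a write-blocking lock on the table.
-- Note: CONCURRENTLY cannot run inside a transaction block, so the
-- migration step that issues it has to run outside a transaction.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_runner_instances_gc
    ON runner_instances (finished_at)
    WHERE status = 'done';
```

The tradeoff is a slower build and a slight chance of being left invalid on failure, which the migration would need to detect and retry.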
piercefreeman
left a comment
Based on observed prod behavior on a dev branch, does seem to speed things up.
@piercefreeman, to clarify: I don't think we were running with this one - the improvement is from #320, but that one needs dissection. In that other PR the changes on the Rust side could be irrelevant; on the DB side it tweaks the indexes and autovacuum, while this one includes the same index addition plus two more, and no autovacuum changes. My understanding is that the key improvement came from the autovacuum tweaks. Not saying the indexes in this PR aren't useful - they still seem to be - but they're not what's helping with the TOAST issue in prod.
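For reference, the kind of per-table autovacuum tweak being discussed looks like this (the values here are illustrative, not the ones from #320):

```sql
-- Default autovacuum waits for ~20% of a table to be dead tuples,
-- which is far too lax on a large, hot queue table. Lowering the
-- scale factor makes vacuum (and TOAST cleanup) kick in much sooner.
ALTER TABLE queued_instances SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);
```

The corresponding TOAST table inherits its own settings, so a tuning pass would also want to check `pg_class.reloptions` for the TOAST relation.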
Must have mixed up my non-mainline branches. In that case, let's prefer #320... I want us to motivate any indexes we create, and we at least have some proof that the other one didn't degrade performance. Ideally we'd have a cleaner ablation test where we add things one by one.
Superseded by #320
This PR adds targeted Postgres indexes to reduce long-term queueing and cleanup slowdown as table sizes grow, without changing runtime behavior.
Runtime Calls Affected
- select:queued_instances (poll_queued_instances_once)
- update:queued_instances_expired_unlock (reclaim_expired_instance_locks)
- select:runner_instances_gc_candidates (collect_done_instances_impl)

Expected Performance Impact
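For concreteness, the three paths above would be served by partial indexes along these lines. The index names, columns, and predicates here are illustrative sketches, not the PR's actual migration DDL:

```sql
-- Hot-loop poll: pick up queued work in order.
CREATE INDEX idx_queued_instances_poll
    ON queued_instances (queued_at)
    WHERE status = 'queued';

-- Lock reclamation: find rows whose lock lease has expired.
CREATE INDEX idx_queued_instances_expired_unlock
    ON queued_instances (lock_expires_at)
    WHERE locked_by IS NOT NULL;

-- GC: find finished instances eligible for cleanup.
CREATE INDEX idx_runner_instances_gc_candidates
    ON runner_instances (finished_at)
    WHERE status = 'done';
```

Partial indexes keep the write amplification small because each one only indexes the narrow slice of rows its query path actually scans.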
The thinking is that GC paths doing seqscans could impair the performance of the hot loops by locking the database; this is consistent with our observations in production, where database calls take progressively longer, and past a certain threshold the system pivots into catastrophically bad performance (most likely once the long seqscan durations exceed the GC polling timeouts).
Operating without access to the production database, I wasn't able to verify at this time that this is the root cause, or even a significant slowdown; the metrics for this are also not yet in place.
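Absent those metrics, one quick check against the live database is Postgres's own statistics views (table names assumed from the query paths above):

```sql
-- Tables with many sequential scans relative to index scans are the
-- likely culprits; seq_tup_read shows how many rows those scans chewed
-- through, and n_dead_tup hints at vacuum/TOAST bloat.
SELECT relname, seq_scan, seq_tup_read, idx_scan, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('queued_instances', 'runner_instances')
ORDER BY seq_tup_read DESC;
```

Sampling this before and after the index migration would show directly whether the GC paths stopped falling back to seqscans.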
One important check we could do before merging is disabling the GC: if there are no GC runs, there are no GC seqscans, and thus there should be no slowdowns.