feat(gateway): add reconciler lease for HA multi-replica deployments #1577

This is WIP.

Need to figure out what we want to do in CI for HA setups.

/ok to test 26b9e70

Introduce a database-backed lease that ensures only one gateway replica runs the watch and reconcile loops. Includes lease primitives with CAS safety, cooperative cancellation via watch channels, SQLite bypass for single-replica deployments, and integration tests covering failover, contention, and CAS chain integrity.
Signed-off-by: Derek Carr <decarr@redhat.com>
Branch updated from 26b9e70 to ac1e51f.

/ok to test ac1e51f

My agent had this for feedback:

I did test this locally with Postgres and validated that it worked as expected, so I am okay to merge this as-is to get the ball rolling.

Output from my test:
• Validated PR #1577 locally on branch pr-1577.
What passed
Important caveat

Thanks @TaylorMutch, merging so we can move forward with your PR next.
Summary
Introduce a database-backed reconciler lease so that only one gateway replica runs the watch and reconcile loops in Postgres-backed HA deployments. SQLite (single-replica) deployments skip the lease and run unconditionally as before.
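As a rough illustration of the SQLite bypass, the gating decision might look like the Python sketch below. The PR page does not show the gateway's code, so `Backend`, `backend_from_url`, `needs_reconciler_lease`, and `start_reconcilers` are hypothetical names, and inferring the backend from the connection URL scheme is an assumption made only for this example.

```python
from enum import Enum


class Backend(Enum):
    SQLITE = "sqlite"
    POSTGRES = "postgres"


def backend_from_url(url: str) -> Backend:
    # Hypothetical: pick the backend from the connection URL scheme.
    scheme = url.split(":", 1)[0].lower()
    if scheme == "sqlite":
        return Backend.SQLITE
    if scheme in ("postgres", "postgresql"):
        return Backend.POSTGRES
    raise ValueError(f"unsupported database URL scheme: {scheme!r}")


def needs_reconciler_lease(backend: Backend) -> bool:
    # SQLite implies a single replica, so there is nobody to coordinate with.
    return backend is Backend.POSTGRES


def start_reconcilers(backend: Backend, run_loops, run_with_lease):
    # SQLite: run the watch and reconcile loops unconditionally, as before.
    if not needs_reconciler_lease(backend):
        return run_loops()
    # Postgres: only the replica that wins the lease runs the loops.
    return run_with_lease(run_loops)
```

Keeping the check in one place means a single-replica SQLite deployment never enters the lease code path at all.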
The lease is a lightweight JSON record stored in the objects table and updated with compare-and-swap (CAS) for cross-replica safety. A lease coordinator on each replica attempts acquisition, renews the lease while holding it, and releases it on shutdown for fast failover. Watch and reconcile loops now accept a cancellation channel for cooperative shutdown.
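The acquire/renew/release cycle can be sketched in Python as a minimal in-memory model. This is not the PR's code: `ObjectsTable`, `Row`, `LEASE_KEY`, `try_acquire`, `renew`, `release`, `LeaseCoordinator`, and `reconcile_loop` are hypothetical names. A per-row version counter stands in for whatever CAS token the real objects table uses, `threading.Event` stands in for the watch channel, and the current time is passed in explicitly so the behavior is deterministic.

```python
import json
import threading
from dataclasses import dataclass
from typing import Callable, Optional

LEASE_KEY = "reconciler-lease"  # hypothetical object key


@dataclass
class Row:
    value: str    # JSON-encoded lease record
    version: int  # bumped on every write; serves as the CAS token


class ObjectsTable:
    """In-memory stand-in for the shared objects table (hypothetical schema)."""
    def __init__(self) -> None:
        self._rows: dict[str, Row] = {}
        self._mu = threading.Lock()

    def get(self, key: str) -> Optional[Row]:
        with self._mu:
            return self._rows.get(key)

    def cas(self, key: str, expected: Optional[int], value: str) -> bool:
        # Write only if the stored version equals `expected` (None = row absent).
        with self._mu:
            row = self._rows.get(key)
            current = row.version if row else None
            if current != expected:
                return False
            self._rows[key] = Row(value, (current or 0) + 1)
            return True


def _write(table: ObjectsTable, row: Optional[Row], holder: Optional[str],
           expires_at: float) -> bool:
    record = json.dumps({"holder": holder, "expires_at": expires_at})
    return table.cas(LEASE_KEY, row.version if row else None, record)


def try_acquire(table: ObjectsTable, holder: str, ttl: float, now: float) -> bool:
    row = table.get(LEASE_KEY)
    if row is not None:
        rec = json.loads(row.value)
        if rec["holder"] not in (None, holder) and rec["expires_at"] > now:
            return False  # another replica holds an unexpired lease
    return _write(table, row, holder, now + ttl)  # fails if someone wrote first


def renew(table: ObjectsTable, holder: str, ttl: float, now: float) -> bool:
    row = table.get(LEASE_KEY)
    if row is None or json.loads(row.value)["holder"] != holder:
        return False  # another replica took the lease over
    return _write(table, row, holder, now + ttl)


def release(table: ObjectsTable, holder: str) -> bool:
    # Clear the holder so a standby can acquire without waiting out the TTL.
    row = table.get(LEASE_KEY)
    if row is None or json.loads(row.value)["holder"] != holder:
        return False
    return _write(table, row, None, 0.0)


class LeaseCoordinator:
    """Drives one replica's lease; the loops stop when `cancel` is set."""
    def __init__(self, table: ObjectsTable, holder: str, ttl: float) -> None:
        self.table, self.holder, self.ttl = table, holder, ttl
        self.holding = False
        self.cancel = threading.Event()

    def tick(self, now: float) -> bool:
        # Called periodically on every replica, well inside the TTL.
        if self.holding:
            self.holding = renew(self.table, self.holder, self.ttl, now)
            if not self.holding:
                self.cancel.set()  # lost the lease: ask the loops to exit
        elif try_acquire(self.table, self.holder, self.ttl, now):
            self.holding = True
            self.cancel = threading.Event()  # fresh signal for this term
        return self.holding

    def shutdown(self) -> None:
        self.cancel.set()  # signal the loops to stop, then hand the lease back
        if self.holding:
            release(self.table, self.holder)
            self.holding = False


def reconcile_loop(cancel: threading.Event, reconcile_once: Callable[[], None],
                   interval: float) -> None:
    # Cooperative cancellation: the signal is checked between passes.
    while not cancel.is_set():
        reconcile_once()
        cancel.wait(interval)
```

In a fuller version, a replica would presumably start its watch and reconcile loops with the current `cancel` event when `tick` first returns true, and wait for them to exit before calling `release`. Renewing well before `expires_at` also leaves headroom for clock skew between replicas.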
Related Issue
Closes #1429
Changes
Testing
mise run pre-commit passes
Checklist