fix(spurctld): use correct namespace when creating leader election Lease #81

Merged
shiv-tyagi merged 2 commits into ROCm:main from shiv-tyagi:fix/leader-election-namespace
Apr 14, 2026

@shiv-tyagi
Member

Summary

  • try_acquire() set the Lease ObjectMeta.namespace to leases.resource_url(), which returns the full K8s API path (e.g. /apis/coordination.k8s.io/v1/namespaces/spur/leases) instead of just "spur"
  • This caused a 400 BadRequest ("the namespace of the provided object does not match the namespace sent on the request") on every Lease creation attempt, making leader election permanently broken on fresh deployments
  • The bug was masked when a Lease already existed from a prior run (the PATCH/renew path doesn't include namespace in the body)
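
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch: `resource_url`, `create_lease`, and `renew_lease` below are hypothetical stand-ins written for illustration, not the kube client API or the actual spurctld code, and the namespace check is a simplified imitation of what the API server does on create.

```rust
// Hypothetical, simplified model of the bug. None of these functions are
// real kube or spurctld APIs; they only mimic the behaviour described above.

/// Mimics a helper that returns the full request path for namespaced Leases.
fn resource_url(namespace: &str) -> String {
    format!("/apis/coordination.k8s.io/v1/namespaces/{}/leases", namespace)
}

/// Models Lease creation: the namespace in the request body must match the
/// namespace in the request path, otherwise the request is rejected.
fn create_lease(request_ns: &str, body_ns: &str) -> Result<(), String> {
    if body_ns == request_ns {
        Ok(())
    } else {
        Err(format!(
            "400 BadRequest: the namespace of the provided object does not \
             match the namespace sent on the request (request: {:?}, body: {:?})",
            request_ns, body_ns
        ))
    }
}

/// Models the renew path: the patch body carries no namespace, so there is
/// nothing to mismatch. It only works if the Lease already exists.
fn renew_lease(lease_exists: bool) -> Result<(), String> {
    if lease_exists {
        Ok(())
    } else {
        Err(String::from("404 NotFound: Lease does not exist yet"))
    }
}

fn main() {
    let ns = "spur";

    // Buggy create: the full API path is used as the namespace.
    println!("buggy create: {:?}", create_lease(ns, &resource_url(ns)));

    // Correct create: the plain namespace string is used.
    println!("fixed create: {:?}", create_lease(ns, ns));

    // Renewing a Lease left over from a prior run never hits the check.
    println!("renew existing: {:?}", renew_lease(true));
}
```

In this model the buggy create fails on every attempt, while renewing a pre-existing Lease succeeds, which is consistent with the bug only surfacing on fresh deployments.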

Fix

Pass the namespace string through to try_acquire() and use it directly in the Lease ObjectMeta instead of calling resource_url().
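
A minimal sketch of the shape of this change, under stated assumptions: the `ObjectMeta` struct here is a local stand-in with only two fields, and `lease_meta_before` / `lease_meta_after` are hypothetical helpers, since the real signature of try_acquire() is not shown in this PR description. Only the Lease name `spurctld-leader` and the example path come from the text.

```rust
// Illustrative stand-in for the Lease metadata; not the real k8s type.
#[derive(Debug, Default, PartialEq)]
struct ObjectMeta {
    name: Option<String>,
    namespace: Option<String>,
}

const LEASE_NAME: &str = "spurctld-leader";

/// Before (hypothetical): the namespace field is filled from the resource URL,
/// so it ends up holding a full API path.
fn lease_meta_before(resource_url: &str) -> ObjectMeta {
    ObjectMeta {
        name: Some(LEASE_NAME.to_string()),
        namespace: Some(resource_url.to_string()),
    }
}

/// After (hypothetical): the caller passes the namespace string through and
/// it is used directly.
fn lease_meta_after(namespace: &str) -> ObjectMeta {
    ObjectMeta {
        name: Some(LEASE_NAME.to_string()),
        namespace: Some(namespace.to_string()),
    }
}

fn main() {
    let url = "/apis/coordination.k8s.io/v1/namespaces/spur/leases";

    let before = lease_meta_before(url);
    let after = lease_meta_after("spur");

    println!("before: name={:?} namespace={:?}", before.name, before.namespace);
    println!("after:  name={:?} namespace={:?}", after.name, after.namespace);
}
```

Passing the namespace explicitly avoids having to parse it back out of a URL, at the cost of one extra parameter on the call path.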

Before (main branch)

spurctld is stuck in an infinite retry loop and never acquires leadership:
[Screenshot 2026-04-14 122129]

After (this PR)

Lease created and acquired on first attempt:
[Screenshot 2026-04-14 123150]

Testing

  • All 776 unit tests pass (cargo test)
  • Bug reproduced on live 3-node K8s cluster (v1.33)
  • Fix verified on same cluster — Lease created, leader acquired, nodes registered
  • Added K8s integration test (TEST 7 in k8s_test.sh) that verifies:
    • Controller starts with --enable-leader-election
    • spurctld-leader Lease is created in the correct namespace
    • No leader election errors in logs

The Lease object's namespace was set to `leases.resource_url()`, which
returns the full API path (e.g. "/apis/coordination.k8s.io/v1/namespaces/spur/leases")
instead of the actual namespace string. This caused a 400 BadRequest on
every Lease creation attempt, making leader election non-functional.

Pass the namespace through to try_acquire() and use it directly.

Made-with: Cursor

Verifies that spurctld with --enable-leader-election creates the
K8s Lease, acquires leadership without errors, and the Lease lands
in the correct namespace.

Made-with: Cursor
@shiv-tyagi force-pushed the fix/leader-election-namespace branch from bf22f30 to b1df087 on April 14, 2026 07:28
@shiv-tyagi marked this pull request as draft on April 14, 2026 07:33
@shiv-tyagi force-pushed the fix/leader-election-namespace branch 3 times, most recently from cfb758a to 87f2b74, on April 14, 2026 08:11
@shiv-tyagi
Member Author

This fixes what we have today but I filed #82 to suggest some design improvements.

@shiv-tyagi requested a review from powderluv on April 14, 2026 08:42
@shiv-tyagi marked this pull request as ready for review on April 14, 2026 08:43
@shiv-tyagi merged commit 66c3994 into ROCm:main on Apr 14, 2026
5 checks passed