Skip to content

fix(gravity): grow tunnel slots for discovered endpoints#247

Merged
jhaynie merged 1 commit intomainfrom
fix/gravity-tunnel-slots-for-discovered-endpoints
May 5, 2026
Merged

fix(gravity): grow tunnel slots for discovered endpoints#247
jhaynie merged 1 commit intomainfrom
fix/gravity-tunnel-slots-for-discovered-endpoints

Conversation

@jhaynie
Copy link
Copy Markdown
Member

@jhaynie jhaynie commented May 5, 2026

Summary

  • grow tunnel stream slots before reconnecting endpoint indexes discovered after startup
  • keep existing tunnel stream entries intact while appending missing slots
  • update the regression test for high endpoint indexes to verify slots are now in bounds

Validation

  • go test ./...

Production evidence

During us-west Ion rollout, Hadrons cycled control to the new Ion and fired OnEndpointReady, but Ion saw 0 StreamSessionPackets and timed out forwarders with no tunnel streams. This fixes the missing tunnel slot path that prevented reconnectSingleEndpoint from opening streams for endpoint indexes beyond the initial tunnelStreams length.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced tunnel stream slot allocation mechanism to ensure safe handling when creating per-endpoint streams and prevent potential race conditions during initialization.
  • Tests

    • Updated test suite to verify dynamic tunnel stream slot expansion and preservation behavior when additional endpoints are discovered.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

Introduces per-endpoint tunnel stream slot preallocation via a new ensureTunnelStreamSlots method in GravityClient. The method is called from establishTunnelStreams to ensure sufficient tunnel stream slots exist before creating per-endpoint streams. A corresponding test replaces an out-of-bounds scenario test with one validating slot expansion and preservation.

Changes

Tunnel Stream Slot Preallocation

Layer / File(s) Summary
Core Implementation
gravity/grpc_client.go
New private method ensureTunnelStreamSlots(endpointIndex, streamsPerGravity int) computes required slots as (endpointIndex + 1) * streamsPerGravity, locks tunnel stream state, and extends g.streamManager.tunnelStreams with nil entries. Method is called in establishTunnelStreams after determining streamsPerGravity to preallocate slots before stream creation.
Tests
gravity/endpoint_independence_test.go
TestTunnelOffsetOutOfBounds replaced with TestEnsureTunnelStreamSlotsForAddedEndpoint. New test verifies that calling ensureTunnelStreamSlots(5, streamsPerGravity) allocates sufficient slots for endpoint 5, preserves existing slots, and ensures computed tunnel offset is within bounds.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
gravity/endpoint_independence_test.go (1)

827-828: 🏗️ Heavy lift

Exercise the reconnect path, not only the helper.

Line 828 proves ensureTunnelStreamSlots itself resizes correctly, but the production regression was the missing call from the reconnect/stream-establish path. If that call is removed or skipped later, this test still passes. Please add one assertion through establishTunnelStreams/the reconnect flow so the regression test protects the call site too.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@gravity/endpoint_independence_test.go` around lines 827 - 828, The test
currently only verifies ensureTunnelStreamSlots directly; change it to exercise
the reconnect/stream-establish path by invoking the actual
reconnect/establishment flow (call establishTunnelStreams or the function that
triggers reconnect behavior) and assert the pool's stream slot count after that
call; specifically, ensure the test calls establishTunnelStreams (or the
reconnect entrypoint used in production) after setting streamsPerGravity and
still observes the resized slots from ensureTunnelStreamSlots, so the regression
where the helper is skipped during reconnect is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@gravity/endpoint_independence_test.go`:
- Around line 827-828: The test currently only verifies ensureTunnelStreamSlots
directly; change it to exercise the reconnect/stream-establish path by invoking
the actual reconnect/establishment flow (call establishTunnelStreams or the
function that triggers reconnect behavior) and assert the pool's stream slot
count after that call; specifically, ensure the test calls
establishTunnelStreams (or the reconnect entrypoint used in production) after
setting streamsPerGravity and still observes the resized slots from
ensureTunnelStreamSlots, so the regression where the helper is skipped during
reconnect is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d98a7aaf-266a-4bab-9176-514514531913

📥 Commits

Reviewing files that changed from the base of the PR and between c1b94cd and 1da8231.

📒 Files selected for processing (2)
  • gravity/endpoint_independence_test.go
  • gravity/grpc_client.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
gravity/grpc_client.go (2)

2030-2043: Tunnel slot growth logic is correct and state-preserving.

This helper safely grows tunnelStreams under lock and preserves existing entries while appending missing slots, which matches the reconnect safety requirement for higher endpoint indexes.


4788-4789: Preallocation is correctly wired before tunnel offset indexing.

Calling ensureTunnelStreamSlots before computing/using tunnelOffset closes the out-of-bounds path during endpoint reconnect for newly discovered indexes.

@jhaynie jhaynie merged commit da7db9e into main May 5, 2026
5 checks passed
@jhaynie jhaynie deleted the fix/gravity-tunnel-slots-for-discovered-endpoints branch May 5, 2026 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant