fix(sensor): bound fillRange so a Datastore timeout can't hot-loop #84
Merged
Conversation
When iter.Next() returned a non-Done error (a Datastore DeadlineExceeded), fillRange did `continue` with no break, backoff, or cap, and ctx is context.Background(), so there was no deadline to abort it. A terminal iterator error therefore spun into an unbounded loop: it re-issued the same failing query forever, emitting one "Failed to get next block" warn per iteration.

On 2026-06-10 this turned a single transient Datastore latency spike on the shared `amoy` database into an ~8-hour stale-data outage and a ~100k logs/sec storm, in both dev and prod simultaneously (both read the same Datastore, so each instance's query flood kept the DB slow for the other).

- fillRange now breaks on a persistent iterator error, matching rpc.go and heimdall.go, so the provider goroutine returns to normal polling instead of wedging.
- Bound the backfill range to blockBufferSize blocks behind the head. After a freeze/recovery the head can jump far ahead of prevBlockNumber; querying more than the buffer can hold is wasted work (the oldest are evicted anyway). The 512 buffer cap is now a named constant used in both places, and a clamp logs how many blocks were skipped so gaps stay visible.
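For reviewers who want the shape of the two changes without opening the diff, here is a minimal sketch. It is not the code from this PR: `blockIterator`, `fakeIter`, `errDone`, `errDeadline`, `drain`, and `clampStart` are hypothetical names, the iterator is a stand-in for the real Datastore iterator, and the inclusive `[start, head]` window is an assumption about how the range is counted. Only `blockBufferSize` and the value 512 come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// blockBufferSize mirrors the 512-block buffer cap named in the description.
const blockBufferSize = 512

// Hypothetical stand-ins for the iterator's "done" sentinel and for a
// Datastore DeadlineExceeded error.
var (
	errDone     = errors.New("iterator done")
	errDeadline = errors.New("deadline exceeded")
)

// blockIterator is a hypothetical, simplified view of the Datastore iterator.
type blockIterator interface {
	Next() (uint64, error)
}

// clampStart bounds an inclusive backfill window [start, head] to at most
// blockBufferSize blocks. It returns the adjusted start and the number of
// blocks skipped, so the caller can log the gap.
func clampStart(start, head uint64) (clamped, skipped uint64) {
	if head < start || head-start < blockBufferSize {
		return start, 0 // window already fits in the buffer
	}
	clamped = head - blockBufferSize + 1
	return clamped, clamped - start
}

// drain reads blocks until the iterator reports errDone. Any other error ends
// the loop immediately; the buggy shape was a bare `continue` at that point,
// which retried the same failing call with no bound.
func drain(it blockIterator) ([]uint64, error) {
	var blocks []uint64
	for {
		n, err := it.Next()
		if errors.Is(err, errDone) {
			return blocks, nil
		}
		if err != nil {
			return blocks, fmt.Errorf("failed to get next block: %w", err)
		}
		blocks = append(blocks, n)
	}
}

// fakeIter yields blocks in order and, if failAt >= 0, fails persistently
// from that call index onward.
type fakeIter struct {
	blocks []uint64
	failAt int
	calls  int
}

func (f *fakeIter) Next() (uint64, error) {
	i := f.calls
	f.calls++
	if f.failAt >= 0 && i >= f.failAt {
		return 0, errDeadline
	}
	if i >= len(f.blocks) {
		return 0, errDone
	}
	return f.blocks[i], nil
}

func main() {
	start, skipped := clampStart(100, 1000)
	fmt.Printf("backfill from %d, skipped %d blocks\n", start, skipped)

	it := &fakeIter{blocks: []uint64{489, 490, 491}, failAt: 2}
	blocks, err := drain(it)
	fmt.Printf("got %d blocks after %d calls, err: %v\n", len(blocks), it.calls, err)
}
```

With a persistent failure, `drain` makes exactly one failing call and returns, so the caller can fall back to its normal polling cadence rather than spinning. Returning the skipped count from `clampStart` keeps the gap visible to whoever logs it.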



Description
When iter.Next() returned a non-Done error (a Datastore DeadlineExceeded), fillRange did `continue` with no break, backoff, or cap, and ctx is context.Background(), so there was no deadline to abort it. A terminal iterator error therefore spun into an unbounded loop: it re-issued the same failing query forever, emitting one "Failed to get next block" warn per iteration.

On 2026-06-10 this turned a single transient Datastore latency spike on the shared `amoy` database into an ~8-hour stale-data outage and a ~100k logs/sec storm, in both dev and prod simultaneously (both read the same Datastore, so each instance's query flood kept the DB slow for the other).

Jira / Linear Tickets
Testing