feat: add missed block gap fill worker #60
Conversation
📝 Walkthrough

Adds a new GapFillWorker background service that scans and retries rows in `failed_blocks`.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant GW as GapFillWorker
    participant DB as Database (PgPool)
    participant RPC as RPC Server
    participant IDX as Indexer
    participant MET as Metrics
    participant BC as Broadcast Channel
    loop continuous run()
        GW->>DB: SELECT eligible failed_blocks FOR UPDATE SKIP LOCKED (backoff filter, LIMIT)
        alt rows claimed
            GW->>DB: UPDATE last_failed_at = NOW() for claimed rows
            GW->>RPC: Fetch block data (rate-limited)
            alt RPC success with block
                GW->>IDX: build BlockBatch & call write_batch_and_clear_failed_block(batch, block_number)
                IDX->>DB: ensure_partitions_exist(pool, block_number)
                IDX->>DB: write batch (COPY / tx) and DELETE failed_blocks row
                GW->>DB: SELECT COUNT(*) FROM failed_blocks
                GW->>MET: update missing_blocks gauge
                GW->>BC: broadcast recovery event
            else RPC missing / error or write failure
                GW->>DB: increment retry_count, set next last_failed_at (exponential backoff)
            end
        else no eligible rows
            GW->>GW: sleep IDLE_SLEEP
        end
    end
```
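The "exponential backoff" step in the diagram follows the schedule used in the review's suggested SQL (2 * 2^min(retry_count, 10) minutes since `last_failed_at`). A minimal sketch of that schedule; the helper name is ours, not from the PR:

```rust
use std::time::Duration;

// Backoff matching the review's SQL predicate: a claimed row becomes eligible
// again 2 * 2^min(retry_count, 10) minutes after its last failure. The cap at
// retry_count = 10 bounds the wait at 2048 minutes (~34 hours).
fn backoff_for(retry_count: u32) -> Duration {
    let minutes = 2u64 * 2u64.pow(retry_count.min(10));
    Duration::from_secs(minutes * 60)
}
```

This keeps early retries fast (2 minutes after the first failure) while preventing unrecoverable blocks from being hammered forever.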
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
🧹 Nitpick comments (2)
backend/crates/atlas-server/src/indexer/gap_fill_worker.rs (2)
103-111: Consider persisting `copy_client` across batches for connection reuse.

Currently, `connect_copy_client` is called on every `process_batch()` invocation, establishing a new TCP connection each time. For a worker that runs infrequently (5-minute idle sleep), this overhead is acceptable, but persisting the connection in the struct would reduce latency and resource usage. This is a minor optimization that could be addressed in a follow-up if gap-fill volume increases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` around lines 103 - 111, Persist the copy client instead of creating it every process_batch call: add a field (e.g., copy_client: Option<CopyClient> or Arc<CopyClient>) to the worker struct used in gap_fill_worker.rs, initialize it lazily by calling Indexer::connect_copy_client(...) once (e.g., in the worker constructor or on first process_batch run) and reuse that instance in subsequent process_batch invocations, and handle reconnects by detecting connection errors from the existing copy_client and re-calling Indexer::connect_copy_client to replace it; reference the existing connect_copy_client function and the local variable copy_client in the diff to locate where to change.
177-180: Consider using a more efficient query for a frequently accessed metric.

`SELECT COUNT(*) FROM failed_blocks` is O(n), but since `failed_blocks` is expected to be small (only unrecoverable blocks), this is acceptable. If the table grows large, consider using `pg_class.reltuples` as done elsewhere in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` around lines 177 - 180, The current code calls get_missing_block_count() after successful gap-fill which likely runs a COUNT(*) on failed_blocks (used by set_indexer_missing_blocks); replace or guard that expensive COUNT query with a cheaper estimate using pg_class.reltuples like elsewhere in the codebase when table size is large: update the get_missing_block_count() implementation (or add a new helper) to prefer querying pg_class.reltuples for relname = 'failed_blocks' and fall back to SELECT COUNT(*) only for small tables or when a precise value is required, and ensure the call sites (including where succeeded > 0 and set_indexer_missing_blocks is invoked) use the updated helper.
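A hypothetical sketch of the counting strategy this comment proposes: prefer the cheap `pg_class.reltuples` estimate once the table is large, and keep the exact count only while it is small. Both query strings and the threshold are illustrative, not code from the PR:

```rust
// Estimate query: reltuples is maintained by VACUUM/ANALYZE and is O(1) to
// read, at the cost of being approximate.
const ESTIMATE_SQL: &str =
    "SELECT reltuples::bigint FROM pg_class WHERE relname = 'failed_blocks'";
// Exact query: O(n) scan, fine while failed_blocks stays small.
const EXACT_SQL: &str = "SELECT COUNT(*) FROM failed_blocks";

/// Choose which query to run given a prior row-count estimate.
fn choose_count_query(estimated_rows: i64) -> &'static str {
    const EXACT_THRESHOLD: i64 = 10_000; // illustrative cutoff
    if estimated_rows <= EXACT_THRESHOLD {
        EXACT_SQL
    } else {
        ESTIMATE_SQL
    }
}
```

The gauge only needs to be roughly right, so an estimate is an acceptable trade once the exact count stops being cheap.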
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e20ebd0e-64ce-40b7-a000-9cd68f12f743
📒 Files selected for processing (8)
- backend/Cargo.toml
- backend/crates/atlas-server/Cargo.toml
- backend/crates/atlas-server/src/indexer/gap_fill_worker.rs
- backend/crates/atlas-server/src/indexer/indexer.rs
- backend/crates/atlas-server/src/indexer/mod.rs
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/tests/integration/gap_fill.rs
- backend/crates/atlas-server/tests/integration/main.rs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/crates/atlas-server/src/main.rs (1)
388-406: ⚠️ Potential issue | 🟠 Major

Avoid starting gap-fill while reindex truncation can still be running.

`Indexer::run()` performs the `reindex` truncate after the indexer task starts, but gap-fill is spawned immediately afterward. With `--reindex`, stale `failed_blocks` rows can be selected and written concurrently with the truncate, reintroducing old data or failing on table locks. Gate gap-fill during reindex startup, or add a startup barrier so it begins only after the reset completes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/main.rs` around lines 388 - 406, The gap-fill worker is being spawned immediately while Indexer::run() may still be performing the reindex truncate, causing races with stale failed_blocks; fix by introducing a startup barrier (e.g., a tokio::sync::oneshot or watch channel) so gap-fill only starts after the indexer signals the reindex/truncate is complete: create a oneshot sender/receiver before constructing GapFillWorker (referencing GapFillWorker::new and gap_fill_worker.run()), pass the receiver into the gap-fill worker (or hold off spawning it), and modify Indexer::run() to send/notify on the sender right after the truncate completes; then spawn tokio::spawn(async move { run_with_retry(|| gap_fill_worker.run()).await }) only after receiving the readiness signal (or after Indexer::run() has signaled).
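The prompt above suggests a `tokio::sync::oneshot` barrier. The same shape can be sketched with a std channel so it runs standalone; everything here is a stand-in for the real indexer and gap-fill tasks:

```rust
use std::sync::mpsc;
use std::thread;

// Startup barrier sketch: the indexer signals once its destructive
// reindex/truncate step is done, and gap-fill blocks until that signal.
fn ordered_startup() -> Vec<&'static str> {
    let (ready_tx, ready_rx) = mpsc::channel::<()>();
    let mut events = Vec::new();

    // Indexer side: run the truncate, then signal readiness.
    let indexer = thread::spawn(move || {
        // ... TRUNCATE / reindex reset would happen here ...
        ready_tx.send(()).expect("gap-fill side dropped its receiver");
    });

    // Gap-fill side: block on the barrier before touching failed_blocks.
    ready_rx.recv().expect("indexer dropped the sender before signalling");
    events.push("reindex-complete");
    events.push("gap-fill-started");

    indexer.join().unwrap();
    events
}
```

With tokio, `oneshot::Receiver` plays the `recv()` role and the spawn of the gap-fill task simply awaits it before entering its loop.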
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs`:
- Around line 79-87: The run() loop treats a zero recovered count as idle
because process_batch() currently returns only successful recoveries; change
process_batch() to return both attempted/selected count and recovered count
(e.g., -> Result<(usize /*attempted*/, usize /*recovered*/)> or a small struct)
and update run() to use the attempted value to decide sleeping (sleep only when
attempted == 0). Update calls and uses of process_batch() (including the other
loop that also calls process_batch()) to unpack the two values and log both
attempted and recovered counts, keeping the IDLE_SLEEP behavior tied to no
attempts rather than no recoveries.
- Around line 40-44: The SELECT_READY_SQL currently reads rows without claiming
them and orders ASC; change the selection to atomically claim rows and pick
newest gaps first by: open a transaction, run a SELECT block_number FROM
failed_blocks WHERE last_failed_at < ... ORDER BY last_failed_at DESC LIMIT $1
FOR UPDATE SKIP LOCKED (use the SELECT_READY_SQL symbol), then immediately
UPDATE those block_number(s) to mark them claimed (e.g., set last_failed_at =
NOW() or a dedicated claimed/claimed_at column and increment retry_count) within
the same transaction so other workers skip them, and use the UPDATE ...
RETURNING block_number for processing; ensure the logic that uses
SELECT_READY_SQL and any subsequent update/processing code handles the
transaction and returned rows atomically.
- Around line 191-203: The increment_retry function currently swallows DB update
errors; change increment_retry to return Result<(), sqlx::Error> (or a crate
error type) and return Err(e) when the sqlx::query().execute(...) fails (still
logging with tracing::warn! before returning), so callers can react to DB
failures; update all call sites (e.g., places that currently do
self.increment_retry(block_number).await;) to use
self.increment_retry(block_number).await?; and adjust the signatures of
run_with_retry (and any upstream async functions) to propagate the error (return
Result) so a failed UPDATE to the failed_blocks retry_count is not silently
ignored.
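An illustrative sketch of the `increment_retry` change this prompt asks for: propagate the UPDATE failure instead of swallowing it, and guard `rows_affected == 1`. `DbError` stands in for `sqlx::Error` and `exec_update` for the real `sqlx::query(...).execute(pool)` call:

```rust
#[derive(Debug, PartialEq)]
enum DbError {
    Io(&'static str),
    UnexpectedRows(u64),
}

// Stand-in for the real sqlx execute call; returns rows affected.
fn exec_update(simulate_failure: bool) -> Result<u64, DbError> {
    if simulate_failure {
        Err(DbError::Io("connection reset"))
    } else {
        Ok(1)
    }
}

fn increment_retry(block_number: u64, simulate_failure: bool) -> Result<(), DbError> {
    let rows = exec_update(simulate_failure).map_err(|e| {
        // real code would tracing::warn! here before propagating
        eprintln!("retry bump failed for block {block_number}: {e:?}");
        e
    })?;
    // anyhow::ensure!-style guard: exactly one failed_blocks row must change
    if rows != 1 {
        return Err(DbError::UnexpectedRows(rows));
    }
    Ok(())
}
```

Callers then switch from `self.increment_retry(n).await;` to `self.increment_retry(n).await?;` so a lost retry bump surfaces instead of silently vanishing.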
In `@backend/crates/atlas-server/src/indexer/indexer.rs`:
- Around line 1053-1069: The SELECT MAX(...) aggregate can return a NULL row, so
change the query result type from Option<(i64,)> to Option<(Option<i64>,)>
(i.e., make the column nullable) and update the match to handle Some((None,)) as
"no partitions" and Some((Some(max),)) to set current_max.store(max as u64,
Ordering::Relaxed) and perform the existing partition_num comparison; keep the
None => 0 fallback logic by treating NULL the same as None (or use .and_then()
to flatten Option<(Option<i64>,)> into Option<i64> before matching). Target the
code around the sqlx::query_as(...) call and the match that uses current_max,
partition_num and pool in indexer.rs.
In `@backend/crates/atlas-server/tests/integration/gap_fill.rs`:
- Around line 11-13: The current module-local SERIALIZER mutex and the use of
.ok() hide failures and don't prevent other integration tests from losing rows
when reset_failed_blocks() truncates failed_blocks; change to a
cross-suite/global locking strategy (e.g., a process-wide static lock or a DB
advisory lock) so cleanup is serialized across all test modules, and remove
swallowing of errors by replacing .ok() with fail-fast handling (use expect with
descriptive message or propagate error with ? in reset_failed_blocks() and any
callers); update references to SERIALIZER and reset_failed_blocks() to use the
new global/advisory lock and to surface errors immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 80e3d7c3-a5d0-4405-b093-272818379c04
📒 Files selected for processing (6)
- backend/Cargo.toml
- backend/crates/atlas-server/Cargo.toml
- backend/crates/atlas-server/src/indexer/gap_fill_worker.rs
- backend/crates/atlas-server/src/indexer/indexer.rs
- backend/crates/atlas-server/src/main.rs
- backend/crates/atlas-server/tests/integration/gap_fill.rs
✅ Files skipped from review due to trivial changes (2)
- backend/Cargo.toml
- backend/crates/atlas-server/Cargo.toml
```rust
/// Serializes gap-fill tests: process_batch queries all eligible failed_blocks,
/// so tests must not run concurrently or they'll pick up each other's rows.
static SERIALIZER: LazyLock<Mutex<()>> = LazyLock::new(|| Mutex::new(()));
```
Make the destructive cleanup globally serialized and fail-fast.

`SERIALIZER` only protects this module, but `reset_failed_blocks()` deletes the entire `failed_blocks` table. Other integration tests can still run concurrently and lose rows; `.ok()` also hides cleanup failures. Prefer a shared cross-suite lock/advisory lock strategy, and use `expect`/`?` for these deletes.
Also applies to: 69-79
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/crates/atlas-server/tests/integration/gap_fill.rs` around lines 11 -
13, The current module-local SERIALIZER mutex and the use of .ok() hide failures
and don't prevent other integration tests from losing rows when
reset_failed_blocks() truncates failed_blocks; change to a cross-suite/global
locking strategy (e.g., a process-wide static lock or a DB advisory lock) so
cleanup is serialized across all test modules, and remove swallowing of errors
by replacing .ok() with fail-fast handling (use expect with descriptive message
or propagate error with ? in reset_failed_blocks() and any callers); update
references to SERIALIZER and reset_failed_blocks() to use the new
global/advisory lock and to surface errors immediately.
- Use a FOR UPDATE SKIP LOCKED CTE to atomically claim failed_blocks rows, preventing double-processing by concurrent workers; order DESC
- process_batch returns (attempted, succeeded); run() sleeps only when attempted == 0 (no eligible work), not when all retries fail
- increment_retry returns Result<()> and propagates errors with ?; anyhow::ensure! guards rows_affected == 1
- Fix NULL MAX aggregate bug in ensure_partitions_exist: change Option<(i64,)> to Option<(Option<i64>,)> so an empty pg_class result decodes correctly instead of panicking on NULL
- Update integration test cleanup to use .expect() instead of .ok()
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/crates/atlas-server/src/indexer/indexer.rs (1)
282-336: ⚠️ Potential issue | 🟠 Major

Don't advance `last_indexed_block` after writing a sparse batch.

This batch can skip failed blocks into `failed_blocks`, but line 336 still updates `indexer_state.last_indexed_block` from `batch.last_block`. That makes the stored watermark non-contiguous and can report/restart past a hole; either keep a separate non-contiguous head or update the persisted contiguous watermark only through the last block before the first failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/indexer.rs` around lines 282 - 336, The code currently advances the persisted watermark (indexer_state.last_indexed_block) to batch.last_block even when some blocks in the batch failed (failed_blocks), producing a non-contiguous persisted watermark; change the logic after write_batch to only advance the persisted contiguous watermark up to the last block BEFORE the first failure (i.e., if failed_blocks.is_empty() keep using batch.last_block, otherwise compute contiguous_last = first_failed_block - 1 and persist that), while still allowing head/tracking state (what you publish via publish_committed_batch / committed_blocks) to reflect the non-contiguous head; update the code paths that set indexer_state.last_indexed_block (and any persisted watermark writes) to use this contiguous_last value and leave failed_blocks and any in-memory head variables alone.
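The contiguous-watermark rule this comment proposes reduces to a small pure function. A sketch with illustrative inputs; the real code would read them from the batch and its failed-block set:

```rust
// Never persist a last_indexed_block past the first failed block in the
// batch: everything up to (first failure - 1) is provably contiguous.
fn contiguous_watermark(batch_last_block: u64, failed: &[u64]) -> u64 {
    match failed.iter().min() {
        // a hole exists: stop the persisted watermark just before it
        Some(&first_failed) => first_failed.saturating_sub(1),
        // no failures: the whole batch is contiguous
        None => batch_last_block,
    }
}
```

The non-contiguous head (what gets published to subscribers) can still track `batch.last_block`; only the restart watermark needs this clamp.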
♻️ Duplicate comments (1)
backend/crates/atlas-server/tests/integration/gap_fill.rs (1)
11-13: ⚠️ Potential issue | 🟡 Minor

Keep the destructive cleanup isolated across integration suites.

`SERIALIZER` only protects this module, but `reset_failed_blocks()` still deletes the entire `failed_blocks` table. Any other integration test using that table can lose rows while these tests run; use a cross-suite DB advisory lock held for the full test or make the worker testable against scoped rows.

Also applies to: 69-79
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/tests/integration/gap_fill.rs` around lines 11 - 13, The SERIALIZER Mutex only serializes this module but reset_failed_blocks() still truncates the entire failed_blocks table and can interfere with other suites; change the test synchronization to use a cross-suite database advisory lock (e.g., acquire a Postgres advisory_lock before running gap-fill tests and hold it until they finish) or modify reset_failed_blocks() to operate on scoped rows (delete only rows belonging to the test's tenant/job id) instead of deleting the whole table; update references to SERIALIZER and calls to reset_failed_blocks() so the tests acquire/release the DB advisory lock (or pass a scoped identifier) to ensure cleanup is isolated across integration suites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs`:
- Around line 45-56: The current SELECT_READY_SQL only retries rows from
failed_blocks and doesn't detect missing block ranges; implement a separate fast
gap-scanner that performs a descending windowed/fast-count scan over the blocks
table to detect contiguous holes and emit gap_start/gap_end ranges
(newest-first) to the same queue that SELECT_READY_SQL feeds; add a new SQL
constant (e.g. SELECT_GAPS_SQL) and integrate it into gap_fill_worker.rs's
scheduling loop so the worker alternates or merges results from SELECT_GAPS_SQL
(gap detection) and SELECT_READY_SQL (failed_rows), and ensure the emitted items
include block range bounds and priority/newest ordering so the queue receives
both failed_rows and newly-discovered ranges for processing.
In `@backend/crates/atlas-server/src/indexer/indexer.rs`:
- Around line 1035-1071: The code treats 0 as both “unknown” and a valid
partition index causing fresh DBs with no blocks_p0 to short-circuit; change the
logic to use an explicit sentinel (e.g. UNKNOWN_MAX_PARTITION) instead of 0 for
uninitialized current_max, update the current_max initializers in Indexer and
GapFillWorker to AtomicU64::new(UNKNOWN_MAX_PARTITION), and adjust
ensure_partitions_exist to treat UNKNOWN_MAX_PARTITION as “need discovery”
(perform the pg_class MAX lookup and initialize current_max accordingly) so
partition 0 can be created; also add a unit test in a #[cfg(test)] mod tests
block in the same file that simulates a fresh DB and verifies partition-0
creation and correct cache initialization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9a8af734-9b28-48dc-9e00-903d53d352da
📒 Files selected for processing (3)
- backend/crates/atlas-server/src/indexer/gap_fill_worker.rs
- backend/crates/atlas-server/src/indexer/indexer.rs
- backend/crates/atlas-server/tests/integration/gap_fill.rs
…cache from partition 0

Using 0 as the initial value for current_max_partition caused a silent bug: when block 0 was processed on a fresh DB (no blocks_p0 yet), partition_num == 0 <= current_max_val == 0 triggered the fast path and returned Ok(()) without ever creating the partition. Replace the initializer with u64::MAX (UNKNOWN_MAX_PARTITION), update the fast-path guard to skip when the sentinel is present, and use the same sentinel in GapFillWorker. Add unit tests covering the sentinel value and the no-DB-needed fast path.
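The sentinel fix in this commit can be sketched as a pure guard; the constant name mirrors the commit message, while the function is illustrative rather than the PR's actual code:

```rust
// u64::MAX marks an uninitialized partition cache, so partition 0 on a
// fresh DB can no longer satisfy the fast path by accident.
const UNKNOWN_MAX_PARTITION: u64 = u64::MAX;

/// True when the cached max partition already covers `partition_num`
/// (the no-DB-needed fast path). The sentinel always forces discovery.
fn fast_path_ok(current_max: u64, partition_num: u64) -> bool {
    current_max != UNKNOWN_MAX_PARTITION && partition_num <= current_max
}
```

With a plain 0 initializer, `fast_path_ok(0, 0)` would wrongly return true on a fresh database; the sentinel makes that case fall through to the pg_class MAX lookup.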
@coderabbitai review

✅ Actions performed: Review triggered.
♻️ Duplicate comments (1)
backend/crates/atlas-server/src/indexer/gap_fill_worker.rs (1)
45-56: ⚠️ Potential issue | 🟠 Major

Add the actual gap scanner and preserve newest-first order after claiming.

`SELECT_READY_SQL` still only sources `failed_blocks`, so holes without a failed-row entry remain undiscovered. Also, the candidate `ORDER BY block_number DESC` is not guaranteed to be preserved by the final `UPDATE ... RETURNING`; wrap the update in a `claimed` CTE and select from it with an explicit final order.

Suggested SQL shape for the ordering fix:

```diff
 const SELECT_READY_SQL: &str = "
     WITH candidates AS (
         SELECT block_number FROM failed_blocks
         WHERE last_failed_at < NOW() - make_interval(mins => (2 * power(2, LEAST(retry_count, 10)))::int)
         ORDER BY block_number DESC
         LIMIT $1
         FOR UPDATE SKIP LOCKED
-    )
-    UPDATE failed_blocks SET last_failed_at = NOW()
-    FROM candidates
-    WHERE failed_blocks.block_number = candidates.block_number
-    RETURNING failed_blocks.block_number";
+    ),
+    claimed AS (
+        UPDATE failed_blocks SET last_failed_at = NOW()
+        FROM candidates
+        WHERE failed_blocks.block_number = candidates.block_number
+        RETURNING failed_blocks.block_number
+    )
+    SELECT block_number FROM claimed
+    ORDER BY block_number DESC";
```

Does PostgreSQL guarantee row order for UPDATE ... RETURNING when the source CTE has ORDER BY but the final RETURNING query has no ORDER BY?
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` around lines 45 - 56, SELECT_READY_SQL currently only reads from failed_blocks and its ORDER BY can be lost by UPDATE ... RETURNING; change it to first produce a candidates set that includes missing blocks (e.g., scanning the block range/gaps rather than only failed_rows) and then wrap the UPDATE in a claimed CTE: perform UPDATE ... FROM candidates ... RETURNING into claimed, then SELECT block_number FROM claimed ORDER BY block_number DESC to guarantee newest-first. Update the constant SELECT_READY_SQL to (1) source gaps as well as failed_blocks (the gap scanner logic) into candidates and (2) use a claimed CTE and final SELECT ... ORDER BY block_number DESC to preserve ordering when returning claimed blocks.
🧹 Nitpick comments (1)
backend/crates/atlas-server/src/indexer/gap_fill_worker.rs (1)
168: Coalesce recovered-block notifications per cycle.

Line 168 broadcasts once per recovered historical block; with `BATCH_SIZE = 50`, this can wake every SSE subscriber dozens of times in one cycle. Consider sending once after the loop when `succeeded > 0`, alongside the missing-block metric refresh.

Proposed refactor:

```diff
-        let _ = self.block_events_tx.send(());
         tracing::info!(block = block_num, "gap-fill: block recovered");
         succeeded += 1;
 @@
     if succeeded > 0 {
+        let _ = self.block_events_tx.send(());
         self.metrics
             .set_indexer_missing_blocks(self.get_missing_block_count().await?);
     }
```
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` at line 168, The code currently calls self.block_events_tx.send(()) inside the per-block recovery loop, causing many SSE wakeups; change gap_fill_worker.rs so block_events_tx.send(()) is moved out of the inner loop and invoked once per cycle when at least one block was recovered (e.g., after the loop when succeeded > 0 and alongside the missing-block metric refresh), ensuring you still handle the send result (ignore or log the error) and keep BATCH_SIZE behavior unchanged.
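The coalescing behavior above can be isolated into a tiny helper. This sketch uses a std channel as a stand-in for the broadcast sender; names and inputs are illustrative:

```rust
use std::sync::mpsc;

// One broadcast per gap-fill cycle instead of one per recovered block.
fn notify_once_per_cycle(recovered: &[u64], tx: &mpsc::Sender<()>) {
    if !recovered.is_empty() {
        // single wakeup for the whole batch; send errors are ignored,
        // matching the `let _ =` pattern in the worker
        let _ = tx.send(());
    }
}
```

Subscribers that only care about "something changed" lose nothing from the coalescing, and a 50-block recovery cycle wakes them once instead of 50 times.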
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9631c8c7-e5fe-4b4c-b7ac-5c5ff0618d76
📒 Files selected for processing (2)
- backend/crates/atlas-server/src/indexer/gap_fill_worker.rs
- backend/crates/atlas-server/src/indexer/indexer.rs
Summary

- Retries blocks recorded in `failed_blocks` with backoff
- Claims `failed_blocks` rows atomically and refreshes the missing-blocks metric

Summary by CodeRabbit

- New Features
- Tests
- Bug Fixes
Closes #57