Describe the bug
The `indexer` service gets stuck in an infinite loop, silently failing to poll new events when deployed to Google Cloud Run. The only visible log is a generic `polling failed: connection closed` repeating every 5 seconds. This occurs because the persistent `tokio-postgres` connection is dropped by the database (likely due to idle timeouts during container suspension/scale-to-zero), and the current indexer logic lacks an auto-reconnect mechanism, leaving it permanently stuck.
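For illustration, a minimal sketch of the stuck-loop pattern (function names and the placeholder query are hypothetical, not the service's actual code):

```rust
// Hypothetical sketch of the failure: the tokio-postgres client is created
// once at startup and never replaced, so once the database severs the idle
// connection every query fails and the loop spins forever.
use std::time::Duration;

async fn run_indexer(client: tokio_postgres::Client) {
    loop {
        if let Err(e) = poll_once(&client).await {
            // `client` is never re-established, so this line repeats every 5s.
            eprintln!("polling failed: {e}");
        }
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}

async fn poll_once(client: &tokio_postgres::Client) -> anyhow::Result<()> {
    // Placeholder for the real cursor read / RPC fetch / insert logic.
    client.execute("SELECT 1", &[]).await?;
    Ok(())
}
```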
To Reproduce
Steps to reproduce the behavior:
- Deploy the `indexer` service to Cloud Run.
- Allow the container to run and eventually suspend/throttle due to lack of requests.
- Observe the Cloud Run logs: the polling loop will begin emitting `polling failed: connection closed` every 5 seconds without recovering.
- New events emitted on the Sui network during this time are not indexed into the database.
Expected behavior
The indexer should gracefully handle database connection drops, automatically reconnect, and resume polling from the last recorded cursor. The `anyhow` errors should also bubble up with sufficient context to identify the exact point of failure (RPC vs. database).
Screenshots
N/A - backend service log issue.
Environment:
- Environment: Google Cloud Run (Serverless)
- Service: `apps/indexer`
Additional context
As a preliminary step to confirm the issue and aid debugging, we've merged a "logging assist" PR that adds `.context(...)` to all `anyhow` boundaries across the database and RPC operations. This changes the error from a generic `connection closed` to one that explicitly identifies the failing operation (e.g., `polling failed: process_page failed: database insert failed: connection closed`).
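The shape of those contexts looks roughly like the following (function and table names are illustrative, not the actual code):

```rust
// Illustrative sketch of the logging-assist pattern: each fallible boundary
// names its operation, so the chained message pinpoints the failing layer.
use anyhow::{Context, Result};
use tokio_postgres::Client;

async fn insert_event(client: &Client, payload: &str) -> Result<()> {
    client
        .execute("INSERT INTO events (payload) VALUES ($1)", &[&payload])
        .await
        .context("database insert failed")?;
    Ok(())
}

async fn process_page(client: &Client, page: &[String]) -> Result<()> {
    for payload in page {
        insert_event(client, payload).await?;
    }
    Ok(())
}

async fn poll_once(client: &Client, page: &[String]) -> Result<()> {
    // The polling loop logs this result with anyhow's alternate formatter
    // ("{e:#}"), which prints the full chain after the "polling failed:" prefix.
    process_page(client, page).await.context("process_page failed")
}
```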
Recommended Fix:
Replace the single, raw `tokio-postgres` connection inside the indexer with a connection pool such as `deadpool-postgres`, which handles connection checkout automatically and transparently reconnects when connections are severed by the infrastructure.
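A rough sketch of what that could look like, assuming `deadpool-postgres` with its `rt_tokio1` feature; the connection settings and the query are placeholders:

```rust
// Sketch of the proposed fix: check a client out of a deadpool-postgres pool
// on every iteration instead of holding one tokio-postgres client for the
// lifetime of the process. With verified recycling, dead connections are
// detected at checkout and replaced, so the loop recovers on its own.
use std::time::Duration;

use anyhow::{Context, Result};
use deadpool_postgres::{Config, ManagerConfig, Pool, RecyclingMethod, Runtime};
use tokio_postgres::NoTls;

fn make_pool() -> Result<Pool> {
    let mut cfg = Config::new();
    // Placeholder connection settings; the real values come from config/env.
    cfg.host = Some("127.0.0.1".into());
    cfg.dbname = Some("indexer".into());
    cfg.user = Some("postgres".into());
    // Verified recycling pings each connection before handing it out, so
    // connections severed during container suspension are discarded.
    cfg.manager = Some(ManagerConfig {
        recycling_method: RecyclingMethod::Verified,
    });
    Ok(cfg.create_pool(Some(Runtime::Tokio1), NoTls)?)
}

async fn run_indexer(pool: Pool) {
    loop {
        if let Err(e) = poll_once(&pool).await {
            // "{e:#}" prints the whole anyhow context chain.
            eprintln!("polling failed: {e:#}");
        }
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}

async fn poll_once(pool: &Pool) -> Result<()> {
    let client = pool.get().await.context("checkout from pool failed")?;
    // Placeholder for resuming from the last recorded cursor.
    client.execute("SELECT 1", &[]).await.context("database query failed")?;
    Ok(())
}
```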