
indexer: silent failures and infinite crash loop due to dropped Postgres connection #91

@rustydb

Describe the bug
When deployed to Google Cloud Run, the indexer service gets stuck in a loop in which every poll for new events fails. The only visible log line is a generic `polling failed: connection closed`, repeated every 5 seconds. This happens because the persistent tokio-postgres connection is closed by the database (likely due to idle timeouts while the container is suspended or scaled to zero) and the indexer has no reconnect logic, so it never recovers.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the indexer service to Cloud Run.
  2. Let the container run until Cloud Run suspends or throttles it for lack of requests.
  3. Observe the Cloud Run logs: the polling loop begins emitting `polling failed: connection closed` every 5 seconds and never recovers.
  4. New events emitted on the Sui network during this time are not indexed into the database.

Expected behavior
The indexer should handle dropped database connections gracefully: reconnect automatically and resume polling from the last recorded cursor. The anyhow errors should also propagate with enough context to identify the exact point of failure (RPC vs. database).
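The expected behavior can be sketched with the standard library alone. This is a minimal illustration, not the indexer's actual code: `DbError`, `run_polling`, and the `connect` / `poll_once` closures are hypothetical names, and a real loop would be async and sleep between ticks.

```rust
use std::fmt;

/// Hypothetical error type that separates a dead connection from other failures.
#[derive(Debug)]
pub enum DbError {
    ConnectionClosed,
    Other(String),
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::ConnectionClosed => write!(f, "connection closed"),
            DbError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

/// Runs `ticks` poll iterations and returns the final cursor.
/// A closed connection is dropped and re-established on the next tick,
/// and polling resumes from the cursor that was last advanced successfully.
pub fn run_polling<C>(
    mut connect: impl FnMut() -> Result<C, DbError>,
    mut poll_once: impl FnMut(&mut C, u64) -> Result<u64, DbError>,
    start_cursor: u64,
    ticks: usize,
) -> u64 {
    let mut conn: Option<C> = None;
    let mut cursor = start_cursor;
    for _ in 0..ticks {
        // Lazily (re)establish the connection before polling.
        if conn.is_none() {
            match connect() {
                Ok(c) => conn = Some(c),
                Err(e) => {
                    eprintln!("connect failed: {e}");
                    continue;
                }
            }
        }
        let c = conn.as_mut().expect("connection was just established");
        match poll_once(c, cursor) {
            // Only a successful poll advances the cursor.
            Ok(next) => cursor = next,
            Err(DbError::ConnectionClosed) => {
                eprintln!("polling failed: connection closed; reconnecting on next tick");
                conn = None; // drop the dead handle instead of reusing it forever
            }
            Err(e) => eprintln!("polling failed: {e}"),
        }
    }
    cursor
}
```

The key difference from the behavior described in this issue is the `conn = None` branch: a closed connection is discarded rather than retried indefinitely, while the cursor is left untouched so no events are skipped.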

Screenshots
N/A - backend service log issue.

Environment:

  • Environment: Google Cloud Run (Serverless)
  • Service: apps/indexer

Additional context
As a preliminary step to confirm the issue and aid debugging, we've merged a "logging assist" PR that adds `.context(...)` at all anyhow boundaries across the database and RPC operations. The error now names the failing operation instead of reporting a generic `connection closed` (e.g., `polling failed: process_page failed: database insert failed: connection closed`).
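To show how such a layered message is built up, here is a std-only sketch that imitates the effect of anyhow's `.context(...)` by prefixing a label at each boundary. The helper `with_context` and the functions `insert_events`, `poll_once`, and `log_line` are hypothetical stand-ins; only `process_page` is a name taken from the example message above. With anyhow itself, the same chain is printed by formatting the error with `{:#}`.

```rust
/// Std-only stand-in for anyhow's `.context(...)`: prefixes an error with a label.
fn with_context<T>(result: Result<T, String>, label: &str) -> Result<T, String> {
    result.map_err(|e| format!("{label}: {e}"))
}

/// Hypothetical database layer that fails the way a dropped connection would.
fn insert_events() -> Result<(), String> {
    Err("connection closed".to_string())
}

/// Each boundary adds one label, so the final message names every layer.
fn process_page() -> Result<(), String> {
    with_context(insert_events(), "database insert failed")
}

fn poll_once() -> Result<(), String> {
    with_context(process_page(), "process_page failed")
}

/// Builds the log line the polling loop would emit, if the poll failed.
pub fn log_line() -> Option<String> {
    poll_once().err().map(|e| format!("polling failed: {e}"))
}
```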

Recommended Fix:
Replace the single, raw tokio-postgres connection in the indexer with a connection pool such as deadpool-postgres, so that connections are checked out per operation and connections severed by the infrastructure are replaced transparently.
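The idea behind this fix can be shown with a toy, single-threaded pool. This is a conceptual sketch only and not deadpool-postgres's API: `Conn`, `Pool`, `get`, and `put` are hypothetical, and the `closed` flag stands in for whatever liveness check a real pool performs when it recycles a connection.

```rust
/// Hypothetical connection handle; `closed` models a connection severed by the server.
pub struct Conn {
    pub id: u32,
    pub closed: bool,
}

/// Toy pool: idle connections are health-checked at checkout.
pub struct Pool {
    idle: Vec<Conn>,
    next_id: u32,
}

impl Pool {
    pub fn new() -> Self {
        Pool { idle: Vec::new(), next_id: 0 }
    }

    /// Returns a live connection, discarding idle ones that were closed.
    pub fn get(&mut self) -> Conn {
        while let Some(conn) = self.idle.pop() {
            if !conn.closed {
                return conn;
            }
            // Closed while idle: drop it and keep looking.
        }
        // Nothing reusable: open a fresh connection (stands in for a real connect).
        self.next_id += 1;
        Conn { id: self.next_id, closed: false }
    }

    /// Hands a connection back to the pool after use.
    pub fn put(&mut self, conn: Conn) {
        self.idle.push(conn);
    }
}
```

Because each poll checks a connection out instead of holding one forever, a connection that died during container suspension costs at most one discarded handle rather than leaving the loop permanently stuck.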

Labels: bug
