Add /status endpoint with per-network health and metrics #85

@anhthii

Summary

Add a GET /status HTTP endpoint that exposes per-network indexing health and metrics, identified by network_id. Designed for Grafana dashboards and operational monitoring.

Motivation

The current /health endpoint returns a simple {"status":"ok"} with no per-network visibility. Operators need to monitor individual chain health, detect lagging networks, and alert on degraded indexing performance.

Health Model

Health is derived per-network from total pending blocks (head gap + catchup backlog):

| Status   | Condition                  |
|----------|----------------------------|
| healthy  | pending_blocks < 50        |
| slow     | 50 <= pending_blocks < 250 |
| degraded | pending_blocks >= 250      |
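
As a rough illustration of the thresholds above, health derivation could look like the following sketch. The HealthStatus name comes from the implementation plan; the constant names and the DeriveHealth function are hypothetical and only mirror the table.

```go
package main

// HealthStatus is the per-network health label reported by /status.
type HealthStatus string

const (
	Healthy  HealthStatus = "healthy"
	Slow     HealthStatus = "slow"
	Degraded HealthStatus = "degraded"
)

// Thresholds taken from the health table; the constant names are hypothetical.
const (
	slowThreshold     = 50  // pending_blocks at or above this is no longer healthy
	degradedThreshold = 250 // pending_blocks at or above this is degraded
)

// DeriveHealth maps a total pending-block count to a health status.
// Cases are evaluated top to bottom, so "slow" covers 50..249.
func DeriveHealth(pendingBlocks uint64) HealthStatus {
	switch {
	case pendingBlocks < slowThreshold:
		return Healthy
	case pendingBlocks < degradedThreshold:
		return Slow
	default:
		return Degraded
	}
}
```

Keeping the thresholds as named constants would make it straightforward to turn them into per-network configuration later, if that is ever needed.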

Pending blocks calculation

head_gap = latest_block - indexed_block
catchup_pending_blocks = sum of remaining blocks across all active catchup ranges
pending_blocks = head_gap + catchup_pending_blocks
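
The calculation above could be sketched as follows. CatchupRange and its fields are hypothetical: the sketch assumes each active range tracks the next block to index (Current) and an inclusive End, which may not match how the indexer actually stores catchup progress.

```go
package main

// CatchupRange is a hypothetical view of one active catchup range.
// Blocks Current..End (inclusive) are assumed to still need indexing.
type CatchupRange struct {
	Current uint64
	End     uint64
}

// Remaining returns the number of blocks left in the range,
// or 0 once Current has moved past End.
func (r CatchupRange) Remaining() uint64 {
	if r.Current > r.End {
		return 0
	}
	return r.End - r.Current + 1
}

// PendingBlocks returns head_gap, catchup_pending_blocks and pending_blocks.
func PendingBlocks(latest, indexed uint64, ranges []CatchupRange) (headGap, catchup, pending uint64) {
	// Guard against unsigned underflow if the cached chain head is
	// momentarily behind the indexed block.
	if latest > indexed {
		headGap = latest - indexed
	}
	for _, r := range ranges {
		catchup += r.Remaining()
	}
	return headGap, catchup, headGap + catchup
}
```

With latest_block 5000000, indexed_block 4999800 and a single range holding 10000 remaining blocks, this yields head_gap 200, catchup_pending_blocks 10000 and pending_blocks 10200.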

Data Source

Worker-cached: no additional RPC calls. The RegularWorker already fetches the chain head and tracks currentBlock on every tick, so workers push that state into a shared in-memory Registry. The /status handler reads only from this cache, so a response is at most one poll interval stale.
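
A minimal sketch of such a registry, assuming a map guarded by sync.RWMutex. The NetworkState type, its fields, and the Snapshot method are hypothetical; only the Registry name and the Update call appear in this proposal.

```go
package main

import (
	"sort"
	"sync"
	"time"
)

// NetworkState is a hypothetical per-network record pushed by a worker.
type NetworkState struct {
	NetworkID     string
	LatestBlock   uint64
	IndexedBlock  uint64
	LastIndexedAt time.Time
}

// Registry is a thread-safe in-memory store keyed by network ID.
type Registry struct {
	mu     sync.RWMutex
	states map[string]NetworkState
}

// NewRegistry returns an empty registry ready for use.
func NewRegistry() *Registry {
	return &Registry{states: make(map[string]NetworkState)}
}

// Update stores the latest state for a network, replacing any previous
// entry. Workers would call this once per tick.
func (r *Registry) Update(s NetworkState) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.states[s.NetworkID] = s
}

// Snapshot returns a copy of all states sorted by network ID, so the
// handler never holds the lock while encoding a response.
func (r *Registry) Snapshot() []NetworkState {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]NetworkState, 0, len(r.states))
	for _, s := range r.states {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].NetworkID < out[j].NetworkID })
	return out
}
```

Sorting the snapshot is a design choice in this sketch, not a requirement: Go map iteration order is randomized, and a stable order keeps the JSON output easy to diff.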

Response Shape

GET /status
{
  "timestamp": "2026-03-25T12:00:00Z",
  "version": "1.0.0",
  "networks": [
    {
      "network_id": "abc-123",
      "chain_name": "ethereum_sepolia",
      "internal_code": "ETHER_SEPOLIA_TESTNET",
      "network_type": "evm",
      "health": "healthy",
      "latest_block": 1000000,
      "indexed_block": 999998,
      "pending_blocks": 2,
      "head_gap": 2,
      "catchup_pending_blocks": 0,
      "catchup_ranges": 0,
      "failed_blocks": 0,
      "last_indexed_at": "2026-03-25T11:59:55Z"
    },
    {
      "network_id": "def-456",
      "chain_name": "tron_mainnet",
      "internal_code": "TRON_MAINNET",
      "network_type": "tron",
      "health": "degraded",
      "latest_block": 5000000,
      "indexed_block": 4999800,
      "pending_blocks": 10200,
      "head_gap": 200,
      "catchup_pending_blocks": 10000,
      "catchup_ranges": 1,
      "failed_blocks": 3,
      "last_indexed_at": "2026-03-25T11:59:30Z"
    }
  ]
}
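
For reference, the response above could map onto Go types along these lines. NetworkStatus and StatusResponse are named in the implementation plan; the field types (for example uint64 for block counts) and the EncodeStatus helper are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"time"
)

// NetworkStatus mirrors one entry of the "networks" array.
type NetworkStatus struct {
	NetworkID            string    `json:"network_id"`
	ChainName            string    `json:"chain_name"`
	InternalCode         string    `json:"internal_code"`
	NetworkType          string    `json:"network_type"`
	Health               string    `json:"health"`
	LatestBlock          uint64    `json:"latest_block"`
	IndexedBlock         uint64    `json:"indexed_block"`
	PendingBlocks        uint64    `json:"pending_blocks"`
	HeadGap              uint64    `json:"head_gap"`
	CatchupPendingBlocks uint64    `json:"catchup_pending_blocks"`
	CatchupRanges        int       `json:"catchup_ranges"`
	FailedBlocks         uint64    `json:"failed_blocks"`
	LastIndexedAt        time.Time `json:"last_indexed_at"`
}

// StatusResponse mirrors the top-level /status payload.
type StatusResponse struct {
	Timestamp time.Time       `json:"timestamp"`
	Version   string          `json:"version"`
	Networks  []NetworkStatus `json:"networks"`
}

// EncodeStatus (hypothetical helper) serializes the response.
// A nil slice is normalized so clients see "networks": [] rather than null.
func EncodeStatus(resp StatusResponse) ([]byte, error) {
	if resp.Networks == nil {
		resp.Networks = []NetworkStatus{}
	}
	return json.Marshal(resp)
}
```

time.Time values are encoded by encoding/json in RFC 3339 format, which matches the timestamps shown in the example payload.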

Implementation Plan

| File | Change |
|------|--------|
| internal/status/status.go | NEW: types (HealthStatus, NetworkStatus, StatusResponse), Registry (thread-safe in-memory store), health derivation |
| internal/worker/manager.go | Add registry field + Registry() getter |
| internal/worker/factory.go | Create registry, register chain configs, pass to Manager |
| internal/worker/base.go | Add registry field to BaseWorker |
| internal/worker/regular.go | Call registry.Update() after processing blocks each tick |
| cmd/indexer/main.go | Register /status route, pass manager to health server |
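
The route registration step might look roughly like this sketch using net/http. SnapshotFunc stands in for whatever the Manager's Registry() getter ends up exposing, and StatusHandler, NewMux, and Probe are hypothetical names; the response types are trimmed to two network fields to keep the example short.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"time"
)

// Trimmed response types; the full field list is in the Response Shape section.
type NetworkStatus struct {
	NetworkID string `json:"network_id"`
	Health    string `json:"health"`
}

type StatusResponse struct {
	Timestamp time.Time       `json:"timestamp"`
	Version   string          `json:"version"`
	Networks  []NetworkStatus `json:"networks"`
}

// SnapshotFunc is a hypothetical stand-in for reading the worker registry.
type SnapshotFunc func() []NetworkStatus

// StatusHandler serves the cached status as JSON and rejects non-GET methods.
func StatusHandler(version string, snapshot SnapshotFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		resp := StatusResponse{
			Timestamp: time.Now().UTC(),
			Version:   version,
			Networks:  snapshot(),
		}
		w.Header().Set("Content-Type", "application/json")
		// The status line is already written if encoding fails, so the
		// error is dropped in this sketch; real code would likely log it.
		_ = json.NewEncoder(w).Encode(resp)
	})
}

// NewMux registers /status on a fresh mux; the existing /health route
// would be registered alongside it.
func NewMux(version string, snapshot SnapshotFunc) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/status", StatusHandler(version, snapshot))
	return mux
}

// Probe exercises a handler in-process and returns the status code and
// Content-Type header. It exists only to make the sketch easy to check.
func Probe(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Header().Get("Content-Type")
}
```

Passing a function rather than the Manager itself keeps the handler decoupled from the worker package, which could make it easier to test; the actual change may well pass the manager directly, as the plan describes.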

Grafana-Relevant Fields

All fields are scoped by network_id for dashboard filtering:

  • pending_blocks — total unprocessed blocks (health signal)
  • head_gap — real-time lag at chain head
  • catchup_pending_blocks — historical backfill remaining
  • catchup_ranges — number of active catchup ranges
  • failed_blocks — blocks that failed processing
  • indexed_block / latest_block — absolute block positions
  • last_indexed_at — freshness of last successful index
