agents-oss · skokaina · Apr 1, 2026 · Apr 1, 2026
diff --git a/README.md b/README.md
@@ -28,6 +28,7 @@ agentspec generate agent.yaml --framework langgraph
 - [x] **Scan** an existing codebase and auto-generate the manifest
 - [x] **Evaluate** agent quality against JSONL datasets with CI pass/fail gates
 - [x] **Deploy** to Kubernetes — operator injects sidecar, exposes `/health/ready` and `/gap`
+- [x] **Track** token usage per model — in-process metering, no external infrastructure
 - [x] **Export** to A2A / AgentCard format
 - [ ] Visual dashboard for fleet-wide agent observability (coming soon)
 - [ ] Native OpenTelemetry trace export (coming soon)
@@ -39,7 +40,7 @@ agentspec generate agent.yaml --framework langgraph
 <img src="docs/graphics/agentspec-architecture.png" alt="AgentSpec Architecture" width="800" />
 
 - **`agent.yaml`** is the single source of truth — the SDK reads it at runtime, the CLI validates and audits it, the operator deploys it
-- **Sidecar** is injected automatically by the operator and exposes live `/health/ready`, `/gap`, and `/explore` endpoints without touching agent code
+- **Sidecar** is injected automatically by the operator and exposes live `/health/ready`, `/gap`, `/explore`, and `/usage` endpoints without touching agent code
 - **CLI** wraps the SDK for local development — validate, audit, generate, scan, evaluate
 - **MCP Server** bridges the sidecar to Claude Code and VS Code for in-editor introspection
 
@@ -126,6 +127,13 @@ npm install @agentspec/sdk
     [medium] MEM-04 — Vector store namespace isolated
 ```
 
+**Token usage** (`kubectl get agentobservations`):
+```
+NAME               PHASE     GRADE  SCORE  TOKENS   CHECKED
+budget-assistant   Healthy   B      82     12450    30s ago
+gymcoach           Healthy   A      95     3200     15s ago
+```
+
 ---
 
 ## Manifest

diff --git a/docs/concepts/operating-modes.md b/docs/concepts/operating-modes.md
@@ -15,7 +15,7 @@ source of confusion when working with VS Code, MCP, and the CLI.
 | **When to use** | Local dev, or cluster agent via per-agent port-forward | K8s cluster with AgentSpec Operator deployed |
 | **URL target** | `http://localhost:4001` (direct or port-forwarded per agent) | Operator service URL (one URL for all agents) |
 | **Data freshness** | **Live** — computed fresh on each request | **Stored** — last heartbeat (up to `RATE_LIMIT_SECONDS` stale) |
-| **Endpoints** | `GET /gap`, `GET /proof`, `GET /health/ready`, `GET /explore` | `GET /api/v1/agents/{name}/gap`, `/proof`, `/health` |
+| **Endpoints** | `GET /gap`, `GET /proof`, `GET /health/ready`, `GET /explore`, `GET /usage` | `GET /api/v1/agents/{name}/gap`, `/proof`, `/health`, `/usage` |
 | **Auth** | None (port-forward is already a trust boundary) | `X-Admin-Key` header |
 | **VS Code config** | `agentspec.sidecarUrl` | `agentspec.cluster.controlPlaneUrl` + `agentspec.cluster.adminKey` |
 
@@ -53,6 +53,7 @@ All endpoints return **live** data computed at request time:
 - `GET /proof` — compliance proof records
 - `GET /health/ready` — live health checks
 - `GET /explore` — runtime capabilities
+- `GET /usage` — aggregated token usage from the audit ring
 
 ### VS Code configuration
 
@@ -116,6 +117,7 @@ All endpoints return **stored** data from the last heartbeat push:
 - `GET /api/v1/agents/{name}/gap` — last known gap report
 - `GET /api/v1/agents/{name}/proof` — proof records
 - `GET /api/v1/agents/{name}/health` — last health check result
+- `GET /api/v1/agents/{name}/usage` — last token usage snapshot
 
 ### VS Code configuration
 

diff --git a/docs/concepts/runtime-introspection.md b/docs/concepts/runtime-introspection.md
@@ -191,6 +191,79 @@ These categories only appear in runtime `HealthReport`s (not in CLI pre-flight o
 | `service` | `AgentSpecReporter` | TCP connectivity for `spec.requires.services` entries |
 | `model` | `AgentSpecReporter` | Provider API endpoint reachable (resolves `$env:` at runtime) |
 
+## Token Usage Tracking
+
+`AgentSpecReporter` includes a built-in `UsageLedger` that aggregates LLM token counts in-process. No external infrastructure is required: token counts travel with the existing heartbeat push.
+
+### Recording usage
+
+After each LLM call, record the token counts:
+
+```typescript
+reporter.usage.record('openai/gpt-4o', promptTokens, completionTokens)
+```
+
+For LangGraph agents, `instrument_call_model` records usage automatically when a `ledger` is provided:
+
+```python
+from agentspec_langgraph import instrument_call_model, UsageLedger
+
+ledger = UsageLedger()
+call_model = instrument_call_model(
+    original_call_model,
+    reporter=reporter,
+    model_id="groq/llama-3.3-70b-versatile",
+    ledger=ledger,
+)
+```
+
+### How it flows
+
+```
+LLM response → UsageLedger.record() → heartbeat push → CRD status → VS Code
+                                       sidecar GET /usage ─────────→ VS Code
+```
+
+Each heartbeat ships a **window snapshot** (e.g., the last 30s of usage), then resets the counters. The control plane stores each window with the heartbeat row, so the operator endpoint returns the most recent window rather than a running total.
+
+### Querying usage
+
+**Sidecar mode** (live, from audit ring):
+```
+GET /usage
+```
+
+**Operator mode** (stored, from last heartbeat):
+```
+GET /api/v1/agents/{name}/usage
+```
+
+Response (operator mode adds a `receivedAt` timestamp):
+```json
+{
+  "windowStartedAt": "2026-03-31T12:00:00.000Z",
+  "models": [
+    { "modelId": "openai/gpt-4o", "totalTokens": 1250, "callCount": 8 }
+  ],
+  "totalTokens": 1250,
+  "totalCalls": 8
+}
+```
+
+### CRD visibility
+
+Token usage appears in the `AgentObservation` CRD status and in `kubectl` output:
+
+```bash
+kubectl get agentobservations
+# NAME          PHASE     GRADE  SCORE  TOKENS  CHECKED
+# gymcoach      Healthy   A      92     1250    2m ago
+```
+
+### VS Code
+
+The agent detail panel shows a **Token Usage** section with total tokens, call count, and a per-model breakdown table.
+
 ## Caching and Refresh
 
 `AgentSpecReporter` caches the last `HealthReport` to avoid hammering external APIs on every request to `/agentspec/health`.
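The window-snapshot behaviour described under "How it flows" can be sketched in a few lines of Python. This is an illustration only: `WindowLedger` and `snapshot_and_reset` are hypothetical names, not the actual `UsageLedger` API, and the snapshot shape is assumed to match the documented `/usage` response.

```python
from datetime import datetime, timezone


class WindowLedger:
    """Hypothetical in-process ledger; not the real UsageLedger implementation."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        # Start a fresh window with empty per-model counters.
        self._window_started_at = datetime.now(timezone.utc)
        self._per_model = {}

    def record(self, model_id: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate tokens and call count for one LLM call.
        entry = self._per_model.setdefault(model_id, {"totalTokens": 0, "callCount": 0})
        entry["totalTokens"] += prompt_tokens + completion_tokens
        entry["callCount"] += 1

    def snapshot_and_reset(self) -> dict:
        # Build the payload for one heartbeat, then begin a new window.
        models = [{"modelId": model_id, **counts} for model_id, counts in self._per_model.items()]
        snapshot = {
            "windowStartedAt": self._window_started_at.isoformat(),
            "models": models,
            "totalTokens": sum(m["totalTokens"] for m in models),
            "totalCalls": sum(m["callCount"] for m in models),
        }
        self._reset()
        return snapshot
```

Because the counters are cleared on every snapshot, two consecutive snapshots never double-count a call; the trade-off is that a snapshot which fails to reach the control plane is not retried in this sketch.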

diff --git a/packages/control-plane/api/agents.py b/packages/control-plane/api/agents.py
@@ -3,22 +3,25 @@
 GET /api/v1/agents/{name}/health — last known HealthReport for an agent
 GET /api/v1/agents/{name}/gap    — last known GapReport for an agent
 GET /api/v1/agents/{name}/proof  — last known proof records for an agent
+GET /api/v1/agents/{name}/usage  — last known token usage for an agent
 
 All endpoints require the X-Admin-Key header (verify_admin_key dependency).
 The {name} path parameter is validated against the k8s resource name pattern.
 """
 from __future__ import annotations
 
 import logging
+from typing import Any, Optional, Type
 
 from fastapi import APIRouter, Depends, HTTPException, Path
+from pydantic import BaseModel
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 
 from auth.keys import verify_admin_key
 from db.base import get_session
 from db.models import Agent, Heartbeat
-from schemas import AgentSummary, StoredHealthReport, StoredGapReport, StoredProofRecords
+from schemas import AgentSummary, StoredHealthReport, StoredGapReport, StoredProofRecords, StoredUsageReport
 
 logger = logging.getLogger(__name__)
 router = APIRouter()
@@ -56,6 +59,32 @@ async def _get_latest_heartbeat(
     return agent, latest
 
 
+async def _get_validated_field(
+    name: str,
+    session: AsyncSession,
+    field: str,
+    schema: Type[BaseModel],
+    raw_data: Optional[dict[str, Any]] = None,
+) -> dict:
+    """Fetch a field from the latest heartbeat, validate through a Pydantic model, return dict.
+
+    When raw_data is provided it is used instead of getattr(latest, field), e.g. to wrap the
+    proof list as {"records": [...]}. receivedAt is always merged in from the heartbeat row.
+    """
+    _, latest = await _get_latest_heartbeat(name, session)
+    data = raw_data if raw_data is not None else getattr(latest, field, None)
+    if data is None:
+        data = {"receivedAt": latest.received_at.isoformat()}
+    else:
+        data = {**data, "receivedAt": latest.received_at.isoformat()}
+    try:
+        report = schema.model_validate(data)
+    except Exception:
+        logger.error("Stored %s data for '%s' failed schema validation", field, name)
+        raise HTTPException(status_code=500, detail=f"Stored {field} data is corrupt")
+    return report.model_dump()
+
+
 @router.get(
     "/agents",
     response_model=list[AgentSummary],
@@ -107,15 +136,7 @@ async def get_agent_gap(
     session: AsyncSession = Depends(get_session),
 ) -> dict:
     """Return the last known GapReport for a remote agent (from its latest heartbeat)."""
-    _, latest = await _get_latest_heartbeat(name, session)
-    try:
-        report = StoredGapReport.model_validate(
-            {**(latest.gap or {}), "receivedAt": latest.received_at.isoformat()}
-        )
-    except Exception:
-        logger.error("Stored gap data for '%s' failed schema validation", name)
-        raise HTTPException(status_code=500, detail="Stored gap data is corrupt")
-    return report.model_dump()
+    return await _get_validated_field(name, session, "gap", StoredGapReport)
 
 
 @router.get(
@@ -128,11 +149,19 @@ async def get_agent_proof(
 ) -> dict:
     """Return the last known proof records for a remote agent (from its latest heartbeat)."""
     _, latest = await _get_latest_heartbeat(name, session)
-    try:
-        stored = StoredProofRecords.model_validate(
-            {"records": latest.proof or [], "receivedAt": latest.received_at.isoformat()}
-        )
-    except Exception:
-        logger.error("Stored proof data for '%s' failed schema validation", name)
-        raise HTTPException(status_code=500, detail="Stored proof data is corrupt")
-    return stored.model_dump()
+    return await _get_validated_field(
+        name, session, "proof", StoredProofRecords,
+        raw_data={"records": latest.proof or []},
+    )
+
+
+@router.get(
+    "/agents/{name}/usage",
+    dependencies=[Depends(verify_admin_key)],
+)
+async def get_agent_usage(
+    name: str = _K8S_NAME_PATH,
+    session: AsyncSession = Depends(get_session),
+) -> dict:
+    """Return the last known token usage for a remote agent (from its latest heartbeat)."""
+    return await _get_validated_field(name, session, "usage", StoredUsageReport)
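For reviewers who want to exercise the new route, a client call could be built roughly as follows. This is a sketch using only the standard library; `build_usage_request`, the base URL, and the key value are placeholders, and the only details taken from the PR are the `/api/v1/agents/{name}/usage` path and the `X-Admin-Key` header.

```python
from urllib.parse import quote
from urllib.request import Request


def build_usage_request(control_plane_url: str, agent_name: str, admin_key: str) -> Request:
    """Build (but do not send) a GET request for an agent's last usage snapshot."""
    # Percent-encode the name so it cannot alter the URL path.
    path = f"/api/v1/agents/{quote(agent_name, safe='')}/usage"
    return Request(
        control_plane_url.rstrip("/") + path,
        headers={"X-Admin-Key": admin_key},
        method="GET",
    )


# Placeholder URL and key; substitute the operator service URL and a real admin key.
request = build_usage_request("http://localhost:8080", "gymcoach", "example-admin-key")
```

Passing `request` to `urllib.request.urlopen` would send it; the JSON body should then contain the `StoredUsageReport` fields, with defaults (`totalTokens: 0`, empty `models`) when the latest heartbeat carried no usage.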
diff --git a/packages/control-plane/api/heartbeat.py b/packages/control-plane/api/heartbeat.py
@@ -117,7 +117,7 @@ async def heartbeat(
     _check_rate_limit(agent_id)
 
     # 6. Derive phase / grade / score
-    status_patch = build_status_patch(data.health, data.gap)
+    status_patch = build_status_patch(data.health, data.gap, data.usage)
     now = datetime.now(timezone.utc)
 
     # 7. Persist heartbeat
@@ -127,6 +127,7 @@ async def heartbeat(
         health=data.health,
         gap=data.gap,
         proof=data.proof,
+        usage=data.usage,
     )
     session.add(hb)
 

diff --git a/packages/control-plane/db/models.py b/packages/control-plane/db/models.py
@@ -43,5 +43,6 @@ class Heartbeat(Base):
     health: Mapped[dict] = mapped_column(JSON, nullable=False)
     gap: Mapped[dict] = mapped_column(JSON, nullable=False)
     proof: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
+    usage: Mapped[dict | None] = mapped_column(JSON, nullable=True)
 
     agent: Mapped[Agent] = relationship(back_populates="heartbeats")
diff --git a/packages/control-plane/k8s/upsert.py b/packages/control-plane/k8s/upsert.py
@@ -19,7 +19,7 @@
 NAMESPACE = "agentspec-remote"
 
 
-def build_status_patch(health: dict[str, Any], gap: dict[str, Any]) -> dict[str, Any]:
+def build_status_patch(health: dict[str, Any], gap: dict[str, Any], usage: dict[str, Any] | None = None) -> dict[str, Any]:
     """
     Derive AgentObservation .status from heartbeat data.
 
@@ -49,13 +49,16 @@ def build_status_patch(health: dict[str, Any], gap: dict[str, Any]) -> dict[str,
     else:
         grade = "F"
 
-    return {
+    patch: dict[str, Any] = {
         "phase": phase,
         "grade": grade,
         "score": score,
         "health": health,
         "gap": gap,
     }
+    if usage is not None:
+        patch["usage"] = usage
+    return patch
 
 
 async def upsert_agent_observation(

diff --git a/packages/control-plane/schemas.py b/packages/control-plane/schemas.py
@@ -39,6 +39,7 @@ class HeartbeatRequest(BaseModel):
     health: dict[str, Any]
     gap: dict[str, Any]
     proof: list[dict[str, Any]] = Field(default_factory=list)
+    usage: Optional[dict[str, Any]] = None
 
 
 # ── Stored health report (GET /agents/{name}/health response) ─────────────────
@@ -95,6 +96,20 @@ class StoredProofRecords(BaseModel):
     receivedAt: Optional[str] = None
 
 
+# ── Stored usage report (GET /agents/{name}/usage response) ─────────────────
+
+class StoredUsageReport(BaseModel):
+    """Schema for the usage snapshot stored in heartbeat rows.
+
+    Strips unknown fields (extra='ignore') — same pattern as StoredHealthReport.
+    """
+    windowStartedAt: Optional[str] = None
+    models: list[dict[str, Any]] = Field(default_factory=list)
+    totalTokens: int = 0
+    totalCalls: int = 0
+    receivedAt: Optional[str] = None
+
+
 # ── Agent summary (list endpoint) ─────────────────────────────────────────────
 
 class AgentSummary(BaseModel):
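Since each heartbeat row stores one usage window rather than a running total, a cumulative figure has to be summed by whoever reads the rows. A minimal sketch, assuming each window follows the `StoredUsageReport` shape; `sum_usage_windows` is a hypothetical helper and is not part of this PR.

```python
from collections import defaultdict
from typing import Any, Optional


def sum_usage_windows(windows: list[Optional[dict[str, Any]]]) -> dict[str, Any]:
    """Combine per-heartbeat usage windows into one cumulative report."""
    per_model: dict[str, dict[str, int]] = defaultdict(
        lambda: {"totalTokens": 0, "callCount": 0}
    )
    for window in windows:
        # The usage column is nullable, so heartbeats without usage appear as None.
        for model in (window or {}).get("models", []):
            entry = per_model[model["modelId"]]
            entry["totalTokens"] += model.get("totalTokens", 0)
            entry["callCount"] += model.get("callCount", 0)
    models = [{"modelId": model_id, **counts} for model_id, counts in per_model.items()]
    return {
        "models": models,
        "totalTokens": sum(m["totalTokens"] for m in models),
        "totalCalls": sum(m["callCount"] for m in models),
    }
```

The helper recomputes the totals from the per-model entries instead of trusting each window's `totalTokens`, so a window with inconsistent totals cannot skew the result.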