10 changes: 9 additions & 1 deletion README.md
@@ -28,6 +28,7 @@ agentspec generate agent.yaml --framework langgraph
- [x] **Scan** an existing codebase and auto-generate the manifest
- [x] **Evaluate** agent quality against JSONL datasets with CI pass/fail gates
- [x] **Deploy** to Kubernetes — operator injects sidecar, exposes `/health/ready` and `/gap`
- [x] **Track** token usage per model — in-process metering, no external infrastructure
- [x] **Export** to A2A / AgentCard format
- [ ] Visual dashboard for fleet-wide agent observability (coming soon)
- [ ] Native OpenTelemetry trace export (coming soon)
@@ -39,7 +40,7 @@ agentspec generate agent.yaml --framework langgraph
<img src="docs/graphics/agentspec-architecture.png" alt="AgentSpec Architecture" width="800" />

- **`agent.yaml`** is the single source of truth — the SDK reads it at runtime, the CLI validates and audits it, the operator deploys it
- **Sidecar** is injected automatically by the operator and exposes live `/health/ready`, `/gap`, and `/explore` endpoints without touching agent code
- **Sidecar** is injected automatically by the operator and exposes live `/health/ready`, `/gap`, `/explore`, and `/usage` endpoints without touching agent code
- **CLI** wraps the SDK for local development — validate, audit, generate, scan, evaluate
- **MCP Server** bridges the sidecar to Claude Code and VS Code for in-editor introspection

@@ -126,6 +127,13 @@ npm install @agentspec/sdk
[medium] MEM-04 — Vector store namespace isolated
```

**Token usage** (`kubectl get agentobservations`):
```
NAME               PHASE     GRADE   SCORE   TOKENS   CHECKED
budget-assistant   Healthy   B       82      12,450   30s ago
gymcoach           Healthy   A       95      3,200    15s ago
```

---

## Manifest
4 changes: 3 additions & 1 deletion docs/concepts/operating-modes.md
@@ -15,7 +15,7 @@ source of confusion when working with VS Code, MCP, and the CLI.
| **When to use** | Local dev, or cluster agent via per-agent port-forward | K8s cluster with AgentSpec Operator deployed |
| **URL target** | `http://localhost:4001` (direct or port-forwarded per agent) | Operator service URL (one URL for all agents) |
| **Data freshness** | **Live** — computed fresh on each request | **Stored** — last heartbeat (up to `RATE_LIMIT_SECONDS` stale) |
| **Endpoints** | `GET /gap`, `GET /proof`, `GET /health/ready`, `GET /explore` | `GET /api/v1/agents/{name}/gap`, `/proof`, `/health` |
| **Endpoints** | `GET /gap`, `GET /proof`, `GET /health/ready`, `GET /explore`, `GET /usage` | `GET /api/v1/agents/{name}/gap`, `/proof`, `/health`, `/usage` |
| **Auth** | None (port-forward is already a trust boundary) | `X-Admin-Key` header |
| **VS Code config** | `agentspec.sidecarUrl` | `agentspec.cluster.controlPlaneUrl` + `agentspec.cluster.adminKey` |

@@ -53,6 +53,7 @@ All endpoints return **live** data computed at request time:
- `GET /proof` — compliance proof records
- `GET /health/ready` — live health checks
- `GET /explore` — runtime capabilities
- `GET /usage` — aggregated token usage from the audit ring

### VS Code configuration

@@ -116,6 +117,7 @@ All endpoints return **stored** data from the last heartbeat push:
- `GET /api/v1/agents/{name}/gap` — last known gap report
- `GET /api/v1/agents/{name}/proof` — proof records
- `GET /api/v1/agents/{name}/health` — last health check result
- `GET /api/v1/agents/{name}/usage` — last token usage snapshot
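
As a minimal sketch of calling one of the stored endpoints listed above, the following builds (without sending) a request for an agent's usage snapshot. The `X-Admin-Key` header and the path come from this document; the helper name `build_usage_request`, the operator URL, and the key value are hypothetical placeholders.

```python
from urllib.parse import quote
from urllib.request import Request


def build_usage_request(control_plane_url: str, agent_name: str, admin_key: str) -> Request:
    """Build (but do not send) a GET request for an agent's stored usage snapshot."""
    # Percent-encode the agent name so it always stays a single path segment.
    path = f"/api/v1/agents/{quote(agent_name, safe='')}/usage"
    return Request(
        control_plane_url.rstrip("/") + path,
        headers={"X-Admin-Key": admin_key},
        method="GET",
    )


# Hypothetical values: substitute your own operator service URL and admin key.
request = build_usage_request("http://operator.example:8080", "gymcoach", "example-admin-key")
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` would return the stored snapshot, assuming the control plane is reachable from where the script runs.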

### VS Code configuration

73 changes: 73 additions & 0 deletions docs/concepts/runtime-introspection.md
@@ -191,6 +191,79 @@ These categories only appear in runtime `HealthReport`s (not in CLI pre-flight o
| `service` | `AgentSpecReporter` | TCP connectivity for `spec.requires.services` entries |
| `model` | `AgentSpecReporter` | Provider API endpoint reachable (resolves `$env:` at runtime) |

## Token Usage Tracking

`AgentSpecReporter` includes a built-in `UsageLedger` that aggregates LLM token counts in-process. No external infrastructure is required: the counts travel with the existing heartbeat push.

### Recording usage

After each LLM call, record the token counts:

```typescript
reporter.usage.record('openai/gpt-4o', promptTokens, completionTokens)
```

For LangGraph agents, `instrument_call_model` records automatically when a `ledger` is provided:

```python
from agentspec_langgraph import instrument_call_model, UsageLedger

ledger = UsageLedger()
call_model = instrument_call_model(
    original_call_model,
    reporter=reporter,
    model_id="groq/llama-3.3-70b-versatile",
    ledger=ledger,
)
```

### How it flows

```
LLM response → UsageLedger.record() → heartbeat push → CRD status → VS Code
sidecar GET /usage ─────────→ VS Code
```

Each heartbeat ships a **window snapshot** (e.g., last 30s of usage), then resets the counters. The control plane stores each window with the heartbeat row.
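
The snapshot-then-reset behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual `UsageLedger` implementation: the class name `WindowedLedgerSketch`, the method `snapshot_and_reset`, and the choice to count prompt plus completion tokens as `totalTokens` are all assumptions. The output field names mirror the response format documented in this file.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class _ModelTotals:
    total_tokens: int = 0
    call_count: int = 0


class WindowedLedgerSketch:
    """Hypothetical stand-in for UsageLedger; not the library's real code."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: dict[str, _ModelTotals] = {}
        self._window_started_at = datetime.now(timezone.utc)

    def record(self, model_id: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Assumption: a call's total is prompt tokens plus completion tokens.
        with self._lock:
            totals = self._models.setdefault(model_id, _ModelTotals())
            totals.total_tokens += prompt_tokens + completion_tokens
            totals.call_count += 1

    def snapshot_and_reset(self) -> dict[str, Any]:
        # Take the snapshot and clear the counters under one lock, so a
        # concurrent record() lands in exactly one window.
        with self._lock:
            models = [
                {"modelId": model_id, "totalTokens": t.total_tokens, "callCount": t.call_count}
                for model_id, t in self._models.items()
            ]
            snapshot = {
                "windowStartedAt": self._window_started_at.isoformat(),
                "models": models,
                "totalTokens": sum(m["totalTokens"] for m in models),
                "totalCalls": sum(m["callCount"] for m in models),
            }
            self._models = {}
            self._window_started_at = datetime.now(timezone.utc)
            return snapshot


ledger = WindowedLedgerSketch()
ledger.record("openai/gpt-4o", 100, 50)
ledger.record("openai/gpt-4o", 200, 25)
first_window = ledger.snapshot_and_reset()   # what one heartbeat would ship
second_window = ledger.snapshot_and_reset()  # empty: counters were reset
```

Resetting inside the same critical section as the read is the design point here: it keeps consecutive windows from overlapping or dropping calls.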

### Querying usage

**Sidecar mode** (live, from audit ring):
```
GET /usage
```

**Operator mode** (stored, from last heartbeat):
```
GET /api/v1/agents/{name}/usage
```

Response:
```json
{
  "windowStartedAt": "2026-03-31T12:00:00.000Z",
  "models": [
    { "modelId": "openai/gpt-4o", "totalTokens": 1250, "callCount": 8 }
  ],
  "totalTokens": 1250,
  "totalCalls": 8
}
```
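
As a small illustration of consuming this response, the sketch below derives the average tokens per call for each model. The helper name `average_tokens_per_call` is hypothetical; the field names are those of the response shown above.

```python
from typing import Any


def average_tokens_per_call(report: dict[str, Any]) -> dict[str, float]:
    """Return the mean tokens per call for every model in a usage report."""
    averages: dict[str, float] = {}
    for entry in report.get("models", []):
        calls = entry.get("callCount", 0)
        if calls > 0:  # skip models with no calls to avoid dividing by zero
            averages[entry["modelId"]] = entry.get("totalTokens", 0) / calls
    return averages


sample_report = {
    "windowStartedAt": "2026-03-31T12:00:00.000Z",
    "models": [{"modelId": "openai/gpt-4o", "totalTokens": 1250, "callCount": 8}],
    "totalTokens": 1250,
    "totalCalls": 8,
}
averages = average_tokens_per_call(sample_report)
print(averages)  # {'openai/gpt-4o': 156.25}
```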

### CRD visibility

Token usage appears in the `AgentObservation` CRD status and in `kubectl` output:

```bash
kubectl get agentobservations
# NAME       PHASE     GRADE   SCORE   TOKENS   CHECKED
# gymcoach   Healthy   A       92      1250     2m ago
```

### VS Code

The agent detail panel shows a **Token Usage** section with total tokens, call count, and a per-model breakdown table.

## Caching and Refresh

`AgentSpecReporter` caches the last `HealthReport` to avoid hammering external APIs on every request to `/agentspec/health`.
65 changes: 47 additions & 18 deletions packages/control-plane/api/agents.py
@@ -3,22 +3,25 @@
GET /api/v1/agents/{name}/health — last known HealthReport for an agent
GET /api/v1/agents/{name}/gap — last known GapReport for an agent
GET /api/v1/agents/{name}/proof — last known proof records for an agent
GET /api/v1/agents/{name}/usage — last known token usage for an agent

All endpoints require the X-Admin-Key header (verify_admin_key dependency).
The {name} path parameter is validated against the k8s resource name pattern.
"""
from __future__ import annotations

import logging
from typing import Any, Optional, Type

from fastapi import APIRouter, Depends, HTTPException, Path
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from auth.keys import verify_admin_key
from db.base import get_session
from db.models import Agent, Heartbeat
from schemas import AgentSummary, StoredHealthReport, StoredGapReport, StoredProofRecords
from schemas import AgentSummary, StoredHealthReport, StoredGapReport, StoredProofRecords, StoredUsageReport

logger = logging.getLogger(__name__)
router = APIRouter()
@@ -56,6 +59,32 @@ async def _get_latest_heartbeat(
    return agent, latest


async def _get_validated_field(
    name: str,
    session: AsyncSession,
    field: str,
    schema: Type[BaseModel],
    raw_data: Optional[dict[str, Any]] = None,
) -> dict:
    """Fetch a field from the latest heartbeat, validate through a Pydantic model, return dict.

    When raw_data is provided it is used instead of getattr(latest, field).
    This handles the common pattern of merging extra fields (e.g. receivedAt) before validation.
    """
    _, latest = await _get_latest_heartbeat(name, session)
    data = raw_data if raw_data is not None else getattr(latest, field, None)
    if data is None:
        data = {"receivedAt": latest.received_at.isoformat()}
    else:
        data = {**data, "receivedAt": latest.received_at.isoformat()}
    try:
        report = schema.model_validate(data)
    except Exception:
        logger.error("Stored %s data for '%s' failed schema validation", field, name)
        raise HTTPException(status_code=500, detail=f"Stored {field} data is corrupt")
    return report.model_dump()


@router.get(
    "/agents",
    response_model=list[AgentSummary],
@@ -107,15 +136,7 @@ async def get_agent_gap(
    session: AsyncSession = Depends(get_session),
) -> dict:
    """Return the last known GapReport for a remote agent (from its latest heartbeat)."""
    _, latest = await _get_latest_heartbeat(name, session)
    try:
        report = StoredGapReport.model_validate(
            {**(latest.gap or {}), "receivedAt": latest.received_at.isoformat()}
        )
    except Exception:
        logger.error("Stored gap data for '%s' failed schema validation", name)
        raise HTTPException(status_code=500, detail="Stored gap data is corrupt")
    return report.model_dump()
    return await _get_validated_field(name, session, "gap", StoredGapReport)


@router.get(
@@ -128,11 +149,19 @@ async def get_agent_proof(
) -> dict:
    """Return the last known proof records for a remote agent (from its latest heartbeat)."""
    _, latest = await _get_latest_heartbeat(name, session)
    try:
        stored = StoredProofRecords.model_validate(
            {"records": latest.proof or [], "receivedAt": latest.received_at.isoformat()}
        )
    except Exception:
        logger.error("Stored proof data for '%s' failed schema validation", name)
        raise HTTPException(status_code=500, detail="Stored proof data is corrupt")
    return stored.model_dump()
    return await _get_validated_field(
        name, session, "proof", StoredProofRecords,
        raw_data={"records": latest.proof or []},
    )


@router.get(
    "/agents/{name}/usage",
    dependencies=[Depends(verify_admin_key)],
)
async def get_agent_usage(
    name: str = _K8S_NAME_PATH,
    session: AsyncSession = Depends(get_session),
) -> dict:
    """Return the last known token usage for a remote agent (from its latest heartbeat)."""
    return await _get_validated_field(name, session, "usage", StoredUsageReport)
3 changes: 2 additions & 1 deletion packages/control-plane/api/heartbeat.py
@@ -117,7 +117,7 @@ async def heartbeat(
    _check_rate_limit(agent_id)

    # 6. Derive phase / grade / score
    status_patch = build_status_patch(data.health, data.gap)
    status_patch = build_status_patch(data.health, data.gap, data.usage)
    now = datetime.now(timezone.utc)

    # 7. Persist heartbeat
@@ -127,6 +127,7 @@ async def heartbeat(
        health=data.health,
        gap=data.gap,
        proof=data.proof,
        usage=data.usage,
    )
    session.add(hb)

1 change: 1 addition & 0 deletions packages/control-plane/db/models.py
@@ -43,5 +43,6 @@ class Heartbeat(Base):
    health: Mapped[dict] = mapped_column(JSON, nullable=False)
    gap: Mapped[dict] = mapped_column(JSON, nullable=False)
    proof: Mapped[list] = mapped_column(JSON, nullable=False, default=list)
    usage: Mapped[dict | None] = mapped_column(JSON, nullable=True)

    agent: Mapped[Agent] = relationship(back_populates="heartbeats")
7 changes: 5 additions & 2 deletions packages/control-plane/k8s/upsert.py
@@ -19,7 +19,7 @@
NAMESPACE = "agentspec-remote"


def build_status_patch(health: dict[str, Any], gap: dict[str, Any]) -> dict[str, Any]:
def build_status_patch(health: dict[str, Any], gap: dict[str, Any], usage: dict[str, Any] | None = None) -> dict[str, Any]:
    """
    Derive AgentObservation .status from heartbeat data.

@@ -49,13 +49,16 @@ def build_status_patch(health: dict[str, Any], gap: dict[str, Any]) -> dict[str,
    else:
        grade = "F"

    return {
    patch: dict[str, Any] = {
        "phase": phase,
        "grade": grade,
        "score": score,
        "health": health,
        "gap": gap,
    }
    if usage is not None:
        patch["usage"] = usage
    return patch


async def upsert_agent_observation(
15 changes: 15 additions & 0 deletions packages/control-plane/schemas.py
@@ -39,6 +39,7 @@ class HeartbeatRequest(BaseModel):
    health: dict[str, Any]
    gap: dict[str, Any]
    proof: list[dict[str, Any]] = Field(default_factory=list)
    usage: Optional[dict[str, Any]] = None


# ── Stored health report (GET /agents/{name}/health response) ─────────────────
@@ -95,6 +96,20 @@ class StoredProofRecords(BaseModel):
    receivedAt: Optional[str] = None


# ── Stored usage report (GET /agents/{name}/usage response) ─────────────────

class StoredUsageReport(BaseModel):
    """Schema for the usage snapshot stored in heartbeat rows.

    Strips unknown fields (extra='ignore') — same pattern as StoredHealthReport.
    """
    windowStartedAt: Optional[str] = None
    models: list[dict[str, Any]] = Field(default_factory=list)
    totalTokens: int = 0
    totalCalls: int = 0
    receivedAt: Optional[str] = None


# ── Agent summary (list endpoint) ─────────────────────────────────────────────

class AgentSummary(BaseModel):