shepard-obs-stack

The Eye — self-hosted observability for AI coding assistants.

You use Claude Code, Codex, or Gemini CLI every day. You have no idea how much they cost, which tools they call, or whether they're actually helping. This fixes that.

More screenshots

Tools — 5K calls across all three CLIs, top tools ranked, failing tools by error count:

Operations — live event rate, breakdown by source and event type:

Claude Code Deep Dive — per-model cost, token breakdown, cache efficiency, productivity ratio:

Claude Code Deep Dive (Tools) — tool decisions, active time breakdown:

Quality — cache hit rates, error rates, session trends:

Highlights

One command to start: ./scripts/init.sh — 6 services, 8 dashboards, under a minute
Three CLIs supported: Claude Code, Codex, Gemini CLI — hooks + native OpenTelemetry
Eight Grafana dashboards auto-provisioned: cost, tools, operations, quality, per-provider deep dives, and session timeline
Minimal dependencies — Docker, plus bash, curl, and jq on the host for hooks and tests. No Python, no Node, no cloud accounts
Optional Rust accelerator — drop-in shepard-hook binary replaces bash+jq+curl. Hooks auto-detect it; falls back to bash if absent
Six Claude Code skills — /obs-status, /obs-cost, /obs-sessions, /obs-tools, /obs-alerts, /obs-query — query the stack without leaving your terminal
Works offline — everything runs on localhost, your data stays on your machine

Quick Start

Prerequisites: Docker (with Compose v2), curl, jq, and at least one AI CLI installed.

git clone https://github.com/shepard-system/shepard-obs-stack.git
cd shepard-obs-stack
./scripts/init.sh          # starts stack + health check
./hooks/install.sh         # injects hooks into your CLI configs

Open localhost:3000 (admin / shepherd). Use your CLI as usual — data appears in dashboards within seconds.

./scripts/test-signal.sh   # verify the full pipeline (11 checks)

Dashboards

Unified (cross-provider)

Dashboard	Question it answers
Cost	How much is this costing me?
Tools	Who is performing and who is wandering?
Operations	What is happening right now?
Quality	How well is the system working?

Deep Dive (per-provider)

Dashboard	What you see
Claude Code	Token usage, cost by model, tool decisions, active time
Codex	Sessions, API latency percentiles, reasoning tokens
Gemini CLI	Token breakdown, latency heatmap, tool call routing

Session Timeline

Dashboard	What you see
Session Timeline	Synthetic traces from all 3 CLI session logs — tool call waterfall, MCP timing, sub-agents

Click any Trace ID to open the full waterfall in Grafana Explore → Tempo.

Dashboard template variables: Tools and Operations support $source and $git_repo filtering. Deep Dive dashboards use $model. Session Timeline uses $provider. Cost and Quality show aggregated data without filters.

How It Works

AI CLIs emit telemetry through two channels:

AI CLI (Claude Code / Codex / Gemini)
    │
    ├── bash hooks → OTLP metrics (tool calls, events, git context)
    │                 └─→ OTel Collector :4318
    │
    └── native OTel → gRPC (tokens, cost, logs, traces)
                       └─→ OTel Collector :4317
                             │
                             ├── metrics → Prometheus :9090
                             ├── logs → Loki :3100
                             └── traces → Tempo :3200
                                           │
Loki recording rules ──── remote_write ───→ Prometheus
                                           │
Grafana :3000 ←── PromQL + LogQL ──────────┘

Hooks provide what native OTel cannot: git repo context and labeled tool/event counters. Everything else (tokens, cost, sessions) comes from native OTel export.

Hook Setup

./hooks/install.sh              # all detected CLIs
./hooks/install.sh claude       # specific CLI
./hooks/install.sh codex gemini # selective
./hooks/uninstall.sh            # clean removal

The installer auto-detects installed CLIs and merges hook configuration into their config files (creating backups first).

CLI	Hooks	Native OTel signals
Claude Code	`PreToolUse`, `PostToolUse`, `SessionStart`, `Stop`	metrics + logs
Codex CLI	`agent-turn-complete`	logs
Gemini CLI	`AfterTool`, `AfterAgent`, `AfterModel`, `SessionEnd`	metrics + logs + traces

Claude Code Skills

Six slash-command skills for querying the obs stack directly from Claude Code — no browser needed.

Skill	What it does
`/obs-status`	Stack health: service status, scrape targets, last telemetry, active alerts
`/obs-cost`	Cost report by provider and model (supports `today`, `yesterday`, `week`, `24h`)
`/obs-sessions`	Recent sessions with model, duration, tool count, cost
`/obs-tools`	Top tools, error rates, usage by provider and repo
`/obs-alerts`	Active alerts with severity and resolution hints
`/obs-query`	Free-form PromQL or LogQL — run any query inline

Skills are installed automatically when you clone the repo (they live in .claude/skills/). All API calls go through scripts/obs-api.sh — a centralized helper that's ready for auth and TLS when you need it:

# Default: plain HTTP to localhost (single-machine, no auth)
./scripts/obs-api.sh prometheus /api/v1/query --data-urlencode 'query=up'

# With auth (set env vars when hardening or going multi-machine)
SHEPARD_API_TOKEN=secret ./scripts/obs-api.sh prometheus /api/v1/query ...
SHEPARD_CA_CERT=/path/to/ca.pem ./scripts/obs-api.sh loki /ready

Rust Accelerator (optional)

All hooks work out of the box with bash + jq + curl. For faster execution, you can optionally install the Rust accelerator — a single static binary that replaces the entire bash pipeline:

./scripts/install-accelerator.sh           # latest release → hooks/bin/ (no sudo)
./scripts/install-accelerator.sh v0.1.0    # specific version

The installer downloads a pre-built binary from GitHub Releases (linux/macOS, x64/arm64) and verifies it against the SHA256SUMS file published with each release. The binary is placed in hooks/bin/ (gitignored, project-local).

Hooks auto-detect it via hooks/lib/accelerator.sh (project-local → PATH → bash fallback). No configuration needed — if the binary is present, hooks use it; if not, they fall back to bash.

Remove with ./hooks/uninstall.sh or simply delete hooks/bin/.

Alerting

Alertmanager runs on :9093 with 16 alert rules in three tiers:

Tier	Alerts	Examples
Infrastructure	6	`OTelCollectorDown`, `CollectorHighMemory`, `PrometheusHighMemory`, export failures
Pipeline	5	`LokiDown`, `ShepherdServicesDown`, `TempoDown`, `PrometheusTargetDown`, `LokiRecordingRulesFailing`
Business logic	5	`HighSessionCost` (>$10/hr), `HighTokenBurn` (>50k tok/min), `HighToolErrorRate` (>10%), `SensitiveFileAccess`, `NoTelemetryReceived`

Inhibit rules suppress business-logic alerts when infrastructure is down.

Native Telegram, Slack, and Discord receivers are included — uncomment and configure in configs/alertmanager/alertmanager.yaml:

# telegram_configs:
#   - bot_token: 'YOUR_BOT_TOKEN'
#     chat_id: YOUR_CHAT_ID
#     send_resolved: true

Services

Service	Port	Purpose
Grafana	3000	Dashboards & explore
Prometheus	9090	Metrics & alerts
Loki	3100	Log aggregation
Tempo	3200	Distributed tracing
Alertmanager	9093	Alert routing
OTel Collector	4317/4318	OTLP gRPC + HTTP

Architecture

C4 diagrams (click to expand)

System Context

Containers

Hook Components

Hook Event Flow

Event Schema Normalization

Project Structure

shepard-obs-stack/
├── docker-compose.yaml
├── .env.example
├── .claude/skills/            # Claude Code slash-command skills
│   ├── obs-status/            # /obs-status — stack health
│   ├── obs-cost/              # /obs-cost — cost report
│   ├── obs-sessions/          # /obs-sessions — session summary
│   ├── obs-tools/             # /obs-tools — tool usage
│   ├── obs-alerts/            # /obs-alerts — active alerts
│   └── obs-query/             # /obs-query — free-form PromQL/LogQL
├── hooks/
│   ├── bin/                   # Rust accelerator binary (gitignored, downloaded)
│   ├── lib/                   # shared: accelerator, git context, OTLP metrics + traces, sensitive file detection, session parser
│   ├── claude/                # PreToolUse + PostToolUse + SessionStart + Stop
│   ├── codex/                 # notify.sh (agent-turn-complete)
│   ├── gemini/                # AfterTool + AfterAgent + AfterModel + SessionEnd
│   ├── install.sh             # auto-detect + inject
│   └── uninstall.sh           # clean removal
├── scripts/
│   ├── init.sh                # bootstrap
│   ├── install-accelerator.sh # download Rust accelerator to hooks/bin/
│   ├── obs-api.sh             # centralized API client (auth-ready)
│   ├── test-signal.sh         # pipeline verification (11 checks)
│   └── render-c4.sh           # render PlantUML → SVG
├── tests/
│   ├── run-all.sh             # test orchestrator (--e2e for Docker smoke)
│   ├── test-shell-syntax.sh   # bash -n + shellcheck
│   ├── test-config-validate.sh # JSON + YAML validation
│   ├── test-hooks.sh          # behavioral tests (41 tests)
│   ├── test-parsers.sh        # session parser tests (24 tests)
│   └── fixtures/              # minimal session logs (Claude, Codex, Gemini)
├── configs/
│   ├── otel-collector/        # receivers → processors → exporters
│   ├── prometheus/            # scrape targets + alert rules
│   ├── alertmanager/          # routing, Telegram/Slack/Discord receivers
│   ├── loki/                  # storage + 15 recording rules
│   ├── tempo/                 # trace storage, 7d retention
│   └── grafana/               # provisioning + 8 dashboard JSONs
└── docs/c4/                   # architecture diagrams

Testing

113 automated tests across 4 suites, plus a Docker-based E2E smoke test:

bash tests/run-all.sh         # unit tests: syntax, configs, hooks, parsers
bash tests/run-all.sh --e2e   # + Docker E2E (starts stack, runs test-signal.sh)

Suite	Tests	What it checks
Shell Syntax	23	`bash -n` on all scripts, shellcheck (if installed)
Config Validation	25	JSON dashboards (jq) + YAML configs (PyYAML) + promtool rules + alert regression
Hook Behavior	41	PreToolUse guard, PostToolUse metrics, Stop compaction, all Gemini hooks, Codex, install/uninstall
Session Parsers	24	Span count, required fields, attributes, error status, trace_id consistency

CI runs automatically on push/PR via GitHub Actions.

Contributing

Issues and pull requests are welcome. Before submitting changes, run the tests:

bash tests/run-all.sh

License

Elastic License 2.0 — free to use, modify, and distribute. Cannot be offered as a hosted or managed service.

Part of the Shepard System.

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
.claude/skills		.claude/skills
.github/workflows		.github/workflows
configs		configs
docs		docs
hooks		hooks
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODEOWNERS		CODEOWNERS
LICENSE		LICENSE
README.md		README.md
docker-compose.yaml		docker-compose.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

shepard-obs-stack

Table of Contents

Highlights

Quick Start

Dashboards

Unified (cross-provider)

Deep Dive (per-provider)

Session Timeline

How It Works

Hook Setup

Claude Code Skills

Rust Accelerator (optional)

Alerting

Services

Architecture

System Context

Containers

Hook Components

Hook Event Flow

Event Schema Normalization

Project Structure

Testing

Contributing

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors 2

Languages

Folders and files

Latest commit

History

Repository files navigation

shepard-obs-stack

Table of Contents

Highlights

Quick Start

Dashboards

Unified (cross-provider)

Deep Dive (per-provider)

Session Timeline

How It Works

Hook Setup

Claude Code Skills

Rust Accelerator (optional)

Alerting

Services

Architecture

System Context

Containers

Hook Components

Hook Event Flow

Event Schema Normalization

Project Structure

Testing

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors 2

Languages

Packages