Observability: Metrics, Traces, Logging, SLO Dashboards, Alerting

Production readiness requires Prometheus metrics, OpenTelemetry (OTel) traces, log correlation, sampling control, and default Grafana dashboards with SLO-driven alerts (latency, error rates, agent health, AI cost).

Where to start:
- Metrics: expose Prometheus endpoints in the API and agent; see any `metrics.go`, or where metrics are injected in `main.go`.
- Traces: add distributed tracing and sampling logic to the API, the agent, and key flows (`internal/diagnostics/`; for the circuit breaker layer, see `CLAUDE.md`).
- Logging: add context IDs and use them to correlate log entries; see the logrus config.
- Dashboards/alerts: publish default Grafana dashboard JSON, plus SLO recipes for latency, errors, agent heartbeat, and budget.

Integrate these with the key flows: diagnostics, remediation, and agent lifecycle.
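One possible shape for per-flow sampling control is sketched below: each flow gets its own sampling ratio, and the decision is derived deterministically from the trace ID so every service reaches the same verdict for a given trace. This mirrors the idea behind ratio-based samplers in OpenTelemetry SDKs but is a standalone approximation, not the OTel API. The flow keys and ratios are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// flowRates holds hypothetical per-flow sampling ratios in [0, 1].
var flowRates = map[string]float64{
	"diagnostics":     0.10,
	"remediation":     1.00, // assumed: always trace remediation actions
	"agent_lifecycle": 0.25,
}

// defaultRate applies to flows not listed in flowRates (assumed value).
const defaultRate = 0.01

// ShouldSample decides from the trace ID alone, so the result is the same
// wherever it is evaluated for a given trace and flow.
func ShouldSample(flow string, traceID [16]byte) bool {
	rate, ok := flowRates[flow]
	if !ok {
		rate = defaultRate
	}
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	// Treat the low 8 bytes as a 63-bit uniform value and compare it
	// against rate scaled to the same range.
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < uint64(rate*float64(1<<63))
}

func main() {
	traceID := [16]byte{0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6,
		0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36}
	for _, flow := range []string{"diagnostics", "remediation", "agent_lifecycle", "other"} {
		fmt.Printf("%-16s sampled=%v\n", flow, ShouldSample(flow, traceID))
	}
}
```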

References: `internal/diagnostics/`, `CLAUDE.md`, Grafana dashboards, `main.go`.

Observability: Metrics, Traces, Logging, SLO Dashboards, Alerting #103
