Skip to content

Latest commit

 

History

History
68 lines (43 loc) · 5.75 KB

File metadata and controls

68 lines (43 loc) · 5.75 KB

AgentOps with Kubernetes

What This Is

A hands-on course on AgentOps — running production-grade AI agents on Kubernetes. Builds on the LLMOps foundation taught in 302-llmops. Students take the LLM serving stack (vLLM, RAG, fine-tuned model) and add agent capabilities: Hermes Agent (NousResearch), MCP tool servers, Kubernetes Agent Sandbox for isolated agent execution, OTEL/Tempo distributed tracing, cost middleware, two-layer guardrails (MCP middleware + Hermes prompt prefix), DeepEval gate in the training pipeline, and a capstone where students ship an insurance_check MCP tool end-to-end through TDD → GitOps → eval gate → ArgoCD → Grafana.

Prerequisite course: 302-llmops v1.0.0+. AgentOps assumes you have a running LLM serving stack on KIND. Take 302 first.

Core Value

Teach practitioners how to operate AI agents on Kubernetes — agent architecture, tool calling via MCP, sandbox isolation, distributed tracing of agent reasoning, cost attribution, eval gates in CI, and guardrails. The bridge between "I can deploy a model" (LLMOps) and "I can deploy an agent that uses that model safely in production" (AgentOps).

Inheritance

This repo is the AgentOps split-out from schoolofdevops/302-llmops v0.19.0. The combined v0.19.0 release shipped LLMOps (Labs 0-6) and AgentOps (Labs 7-13); this repo holds the AgentOps half going forward.

Full history of the original combined course: https://github.com/schoolofdevops/302-llmops/tree/v0.19.0 (tag SHA 3c4e0b120efd93a147d61f916a943e6a775ec717)

See MIGRATION-FROM-302-LLMOPS.md for the full migration dossier.

What Was Validated at v0.19.0 (Phase 3: AgentOps Labs Day 2)

(Copied verbatim from schoolofdevops/302-llmops .planning/PROJECT.md §"Validated in Phase 3: AgentOps Labs Day 2" at SHA 3c4e0b120efd93a147d61f916a943e6a775ec717. Every item below was live-tested on a KIND cluster.)

  • Hermes Agent (NousResearch v0.12.0) configured for Smile Dental — 3 MCP tool servers (triage, treatment_lookup, book_appointment) + multi-step workflow validated live (Lab 07)
  • Two-phase LLM strategy — Day 2 labs switch to free-tier API; both Groq (llama-3.3-70b-versatile) and Gemini (gemini-2.5-flash) live-tested
  • Kubernetes Agent Sandbox v0.4.3 — CRDs installed, agent deployed as Sandbox + SandboxWarmPool (replicas=2) + NetworkPolicy + Sandbox Router gateway (Lab 08)
  • Cold-vs-warm timing demo — observed warm 7.95s / cold refill 25.03s / cold 2.54s
  • Agent observability — Grafana Tempo + OTEL Collector deployed; 3 MCP tools auto-instrumented; cost middleware emits agent_llm_tokens_total + agent_llm_cost_usd_total; Grafana dashboard auto-discovered (Lab 09)
  • D-18 partial compliance documented honestly: tool/retriever spans hierarchical; Hermes-internal agent.request/llm.completion not visible (closed binary)

Inherited Key Decisions

(Verbatim from schoolofdevops/302-llmops .planning/PROJECT.md Key Decisions table — AgentOps-relevant rows only. LOCKED unless explicitly revisited.)

Decision Rationale Status
Kubernetes Agent Sandbox for agentic module First-class K8s primitive for agent workloads — new, differentiated, production-relevant Inherited
Agent framework: Hermes Agent NousResearch/hermes-agent — model-agnostic, lightweight ($5 VPS), 40+ tools, MCP support, Docker sandbox built-in, MIT licensed, 47k stars. Configure and deploy, don't build from scratch. Inherited
No LangGraph/CrewAI Over-abstracted Pythonic frameworks are dated. Hermes is the modern approach — self-improving, persistent memory, multi-platform. Inherited
Two-phase LLM strategy Labs 00-05 use local SmolLM2-135M (LLMOps focus). Labs 06+ switch to free-tier API (Gemini/Groq) for agentic capabilities — local 135M model can't do tool-calling reliably. Inherited
Support both Gemini and Groq Abstract behind OpenAI-compatible API so students can use either free-tier provider Inherited
Move AgentOps to schoolofdevops/303-agentops Companion course builds on LLMOps foundation; separate repo isolates dependencies and sequencing. This repo
Drop eval gate from Argo Workflows pipeline (in 302) Eval gating is contextually agentic. LLMOps pipeline teaches orchestration; eval lives here. This repo carries the eval gate

Known Issues Inherited

  • D-18 partial compliance: Hermes is a closed binary (NousResearch/hermes-agent v0.12.0). Tool spans and retriever spans are visible in Grafana Tempo (auto-instrumented via OTEL). Hermes-internal agent.request and llm.completion spans are NOT visible because instrumentation hooks are not exposed. Documented in Lab 09. Workaround paths (custom Hermes build, alternative agent runtime) deferred.

Constraints

  • Hardware: Same 16GB-RAM CPU-only KIND budget as 302-llmops.
  • Platform: Same macOS + Windows + Linux requirement.
  • Free-tier LLM API: Either Groq or Gemini free tier. Students must not be required to pay.
  • Naming: Smile Dental (inherited from 302-llmops, globally accessible branding).

Active

(v0.1.0 — to be defined via REQUIREMENTS.md when this repo's first milestone is planned. The v0.19.0 baseline above is the inherited foundation; new milestones build on top.)

Out of Scope (inherited from 302-llmops split decision)

  • LLMOps content (data pipelines, RAG, LoRA fine-tuning, OCI packaging, plain vLLM serving, Prometheus/Grafana for vLLM, autoscaling for vLLM, GitOps for vLLM, Argo Workflows training pipeline without eval gate). Lives in 302-llmops.

Phase 3 validation date: pre-2026-05-07 (combined v0.19.0 release) This repo bootstrapped from 302-llmops v0.19.0 on 2026-05-07 — see MIGRATION-FROM-302-LLMOPS.md