Autonomous Infrastructure Observability Agent
Detects architectural entropy in system log streams using semantic vector drift analysis — before crashes happen.
Traditional monitoring tools alert on thresholds — CPU > 90%, error rate > 5%. But many real failures are "silent killers":
- A database index change slows queries gradually.
- A retry storm fills up a connection pool over minutes.
- A zombie service is "up" but not processing.
These don't throw ERROR logs until it's too late.
Echo-Ops treats logs as vectors in high-dimensional space. A healthy system has a characteristic "shape" — its logs cluster near a known-good baseline. When behaviour changes, the shape drifts — even if no explicit errors are logged.
```
Healthy Logs → Embed → Endee Baseline Index
Live Logs    → Embed → Compare vs. Baseline
                        Cosine Distance > 0.30 → ANOMALY
                                  ↓
                          Agentic LLM Loop
                        (calls diagnostic tools)
                                  ↓
                          Root Cause Report
```
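Conceptually, the drift check reduces to a cosine distance between a live log embedding and its nearest healthy baseline vector. A minimal sketch of that math, with small synthetic NumPy vectors standing in for real 384-dim embeddings (the brute-force `min` over baseline rows plays the role of Endee's ANN query):

```python
import numpy as np

DRIFT_THRESHOLD = 0.30  # the threshold from the diagram above

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; 0.0 means identical direction."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_score(live: np.ndarray, baseline: np.ndarray) -> float:
    """Distance from a live embedding to its nearest baseline vector.
    baseline has shape (n_vectors, dim); Endee answers this via ANN search."""
    return min(cosine_distance(live, row) for row in baseline)

# Synthetic 4-dim "embeddings": the healthy cluster points near [1, 0, 0, 0]
baseline = np.array([[1.0, 0.05, 0.0, 0.0],
                     [0.95, 0.0, 0.05, 0.0]])
healthy = np.array([1.0, 0.02, 0.01, 0.0])     # stays close to the cluster
anomalous = np.array([0.2, 0.9, 0.3, 0.1])     # semantically different logs

assert drift_score(healthy, baseline) < DRIFT_THRESHOLD
assert drift_score(anomalous, baseline) > DRIFT_THRESHOLD
```

No explicit ERROR string is involved anywhere: the anomaly shows up purely as a change in direction of the embedding vectors.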
```
┌─────────────────────────────────────────────┐
│                Echo-Ops Agent               │
│                                             │
│  ingestion/      agent/           api/      │
│ ┌───────────┐  ┌──────────────┐  ┌───────┐  │
│ │ log_gen   │  │ detector.py  │  │server │  │
│ │ embedder  │→ │ (Endee ANN)  │→ │ SSE   │  │
│ │ + cache   │  │ agent.py     │  │ /     │  │
│ │ endee_    │  │ (Groq)       │  │static │  │
│ │ client    │  │ tools.py     │  │       │  │
│ └───────────┘  └──────────────┘  └───────┘  │
└─────────────────────────────────────────────┘
        │ HTTP              │ SSE
        ▼                   ▼
 ┌─────────────┐     ┌──────────────┐
 │  Endee DB   │     │   Browser    │
 │   :8080     │     │  Dashboard   │
 └─────────────┘     └──────────────┘
```
| Operation | Endee Endpoint | Purpose |
|---|---|---|
| `create_index` | `POST /api/v1/index/create` | Create `echo_ops_baseline` index (384-dim, cosine) |
| `upsert` | `POST /api/v1/index/{name}/upsert` | Batch-write healthy log embeddings (baseline) |
| `query` | `POST /api/v1/index/{name}/query` | ANN search: "how far is the current log pattern from healthy?" |
Why Endee specifically? The drift check runs every 5 seconds against a 300+ vector index. Standard cloud vector DBs add 50–200ms of network latency per query. Endee's C++ HNSW core gives sub-millisecond local lookups, making real-time streaming analysis viable.
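As a rough sketch of what those three calls look like on the wire, here are helpers that build the URL and JSON body for each endpoint. Only the endpoint paths come from the table above; the body field names (`dimension`, `metric`, `vectors`, `top_k`, etc.) are assumptions, not Endee's documented schema:

```python
BASE = "http://localhost:8080/api/v1"

def create_index_request(name: str = "echo_ops_baseline", dim: int = 384):
    """Build the request for POST /api/v1/index/create (body fields assumed)."""
    return f"{BASE}/index/create", {"name": name, "dimension": dim, "metric": "cosine"}

def upsert_request(name: str, ids: list, vectors: list):
    """Build the request for POST /api/v1/index/{name}/upsert."""
    body = {"vectors": [{"id": i, "values": v} for i, v in zip(ids, vectors)]}
    return f"{BASE}/index/{name}/upsert", body

def query_request(name: str, vector: list, top_k: int = 1):
    """Build the request for POST /api/v1/index/{name}/query: nearest
    healthy neighbour(s) of the current log embedding."""
    return f"{BASE}/index/{name}/query", {"vector": vector, "top_k": top_k}

url, body = query_request("echo_ops_baseline", [0.0] * 384)
```

With a running Endee instance, each pair would be sent as `requests.post(url, json=body)`.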
The embedder caches on the log template (e.g. "User {id} checkout failed") not the full message. Because logs repeat the same ~20 templates, we get ~80-90% cache hit rate — drastically reducing embedding model calls and keeping Endee write volume low.
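Reducing a raw message to its template can be as simple as masking the variable fields with regexes. A hypothetical sketch (the real embedder may normalize differently; the patterns and placeholder names here are illustrative):

```python
import re

# Order matters: mask the most specific patterns before the generic ones.
_PATTERNS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "{uuid}"),
    (re.compile(r"\b\d+\.\d+\b"), "{float}"),
    (re.compile(r"\b\d+\b"), "{id}"),
]

def to_template(message: str) -> str:
    """Collapse variable fields so repeated log shapes share one cache key."""
    for pattern, placeholder in _PATTERNS:
        message = pattern.sub(placeholder, message)
    return message

to_template("User 4821 checkout failed")  # → "User {id} checkout failed"
to_template("User 77 checkout failed")    # → same template, same cache entry
```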
```python
import functools

@functools.lru_cache(maxsize=256)
def _embed_template(template: str) -> tuple:
    # only called on a cache miss (~10-20% of logs)
    return tuple(model.encode(template).tolist())
```

The agent is not a prompt wrapper. It's a genuine ReAct (Reason + Act) loop:
- Observe: receives `(service, drift_score, sample_logs)` from the detector
- Reason: the LLM decides which tools to call
- Act: calls `get_recent_commits`, `get_top_db_queries`, `get_resource_snapshot`
- Observe: tool results feed back into the conversation
- Synthesize: produces a Root Cause Analysis Report as structured JSON
The LLM drives the investigation. We don't hardcode "if checkout → check DB". The agent figures that out.
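The mechanical half of that loop, dispatching the LLM's tool-call requests to real functions and feeding the observations back, might be sketched as below. The tool names come from the list above; the message format follows the OpenAI-style function-calling convention that Groq exposes, and the stub implementations are placeholders for what lives in `agent/tools.py`:

```python
import json

# Stub diagnostic tools -- the real versions live in agent/tools.py.
def get_recent_commits(service: str) -> list:
    return [{"sha": "abc123", "msg": f"tune {service} DB index"}]

def get_top_db_queries(service: str) -> list:
    return [{"query": "SELECT ...", "p99_ms": 950}]

def get_resource_snapshot(service: str) -> dict:
    return {"cpu": 0.42, "pool_in_use": 98}

TOOLS = {f.__name__: f for f in (get_recent_commits,
                                 get_top_db_queries,
                                 get_resource_snapshot)}

def run_tool_calls(tool_calls: list, messages: list) -> None:
    """Execute each tool the LLM requested and append the result as a
    'tool' message, so the next LLM turn reasons over the observations."""
    for call in tool_calls:
        fn = TOOLS[call["name"]]
        result = fn(**json.loads(call["arguments"]))
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})

messages = []
run_tool_calls([{"name": "get_resource_snapshot",
                 "arguments": json.dumps({"service": "checkout"})}], messages)
```

Because the dispatch table is just `{name: function}`, adding a new diagnostic tool is one function plus one schema entry; the investigation strategy itself stays with the LLM.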
```
echo-ops/
├── config.py              # All config in one place
├── main.py                # Entry point & orchestrator
├── docker-compose.yml     # Endee vector DB
├── requirements.txt
├── ingestion/
│   ├── log_generator.py   # Synthetic log stream (healthy + anomaly)
│   ├── embedder.py        # MiniLM + LRU cache
│   └── endee_client.py    # Endee HTTP API wrapper
├── agent/
│   ├── detector.py        # Drift detection engine
│   ├── tools.py           # Diagnostic tools + OpenAI function schemas
│   └── agent.py           # Agentic LLM ReAct loop (Groq)
├── api/
│   └── server.py          # FastAPI + SSE stream
└── static/
    └── index.html         # Real-time dashboard
```
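On the delivery side, `api/server.py` streams drift events to the dashboard over SSE, which boils down to framing each event in the `text/event-stream` format. A minimal sketch (the `drift` event name and payload fields are assumptions, not the server's actual schema):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one server-sent event: an 'event:' line, a 'data:' line,
    and the blank-line terminator required by text/event-stream."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# In FastAPI, frames like this are yielded from an async generator wrapped
# in StreamingResponse(..., media_type="text/event-stream").
frame = sse_event("drift", {"service": "checkout", "drift_score": 0.41})
```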
- Python 3.11+
- Docker (for Endee Vector DB)
- Groq API key (free at console.groq.com/keys)
```bash
git clone https://github.com/codeRisshi25/echo-ops.git
cd echo-ops
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

```bash
cp .env.example .env
# Edit .env and set GROQ_API_KEY
```

```bash
docker compose up -d
# Verify it's running:
curl http://localhost:8080/api/v1/index/list
```

```bash
# Demo mode: automatically injects an anomaly at t=25s
python main.py --demo

# Open dashboard
open http://localhost:8000
```

- t=0s — "Building healthy baseline (300 vectors)" — Endee index populated
- t=10s — Dashboard goes live, healthy log stream visible, drift score near 0
- t=25s — Anomaly injected (checkout retry storm)
- t=30s — Drift score spikes above 0.30, agent wakes up
- t=35s — LLM calls tools (commits, DB queries, resources)
- t=40s — Root Cause Report appears in dashboard: "DB index change in checkout service"
| Component | Technology | Cost |
|---|---|---|
| Vector DB | Endee (C++ HNSW) | Free, self-hosted |
| Embedding | fastembed + BAAI/bge-small-en-v1.5 (ONNX) | Free, local |
| LLM / Agent | Any Groq-compatible model | Free tier available |
| API Server | FastAPI + uvicorn | Open source |
| Dashboard | Vanilla HTML/CSS/JS | — |
Risshi Raj Sen — github.com/codeRisshi25