The RAG server provides Signal with long-term memory and context. It indexes career history, project data, and other documents so that every response from the system can be grounded in personalized, persistent information.
The RAG server powers Signal’s agentic retrieval pipeline (sketched in the code below). It:
- Processes Markdown and JSON-based career history, FAQs, and project files
- Splits documents into semantically meaningful chunks
- Stores embeddings in ChromaDB for vector search
- Uses prototype classification to adjust retrieval parameters (broad vs narrow intent)
- Reranks results with Cohere’s reranker API for true semantic relevance
- Applies a relevance cutoff to filter noise while preserving high-quality context
- Serves a simple REST API (`/query`) for retrieval and context injection
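As a rough illustration of the indexing side, here is a minimal sketch using the OpenAI and ChromaDB Node clients. The heading-based `chunkMarkdown` helper, the `signal-docs` collection name, and the chunk size are assumptions for illustration, not the server's actual chunking logic:

```js
import fs from "node:fs/promises";
import OpenAI from "openai";
import { ChromaClient } from "chromadb";

const openai = new OpenAI();       // reads OPENAI_API_KEY from the environment
const chroma = new ChromaClient(); // defaults to a local Chroma instance

// Naive chunker: split on Markdown headings, then cap chunk length.
// (Illustrative only — the real server aims for semantically meaningful chunks.)
function chunkMarkdown(text, maxChars = 1500) {
  const sections = text.split(/\n(?=#{1,3} )/);
  const chunks = [];
  for (const section of sections) {
    for (let i = 0; i < section.length; i += maxChars) {
      chunks.push(section.slice(i, i + maxChars).trim());
    }
  }
  return chunks.filter(Boolean);
}

async function indexFile(path, collectionName = "signal-docs") {
  const collection = await chroma.getOrCreateCollection({ name: collectionName });
  const chunks = chunkMarkdown(await fs.readFile(path, "utf8"));

  // Embed all chunks in one batch with text-embedding-3-large.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: chunks,
  });

  // Store chunks and their embeddings for vector search.
  await collection.add({
    ids: chunks.map((_, i) => `${path}#${i}`),
    embeddings: data.map((d) => d.embedding),
    documents: chunks,
    metadatas: chunks.map(() => ({ source: path })),
  });
}
```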
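On the query side, a sketch of the agentic path: prototype classification compares the query embedding against pre-embedded "broad" and "narrow" example phrases to pick retrieval parameters, ChromaDB returns candidates, Cohere reranks them, and a relevance cutoff drops the rest. The prototype phrases, parameter values, cutoff thresholds, and port are illustrative assumptions, not the server's actual tuning:

```js
import express from "express";
import OpenAI from "openai";
import { ChromaClient } from "chromadb";
import { CohereClient } from "cohere-ai";

const openai = new OpenAI();
const chroma = new ChromaClient();
const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

// Prototype classification: one representative phrase per intent (illustrative).
const prototypePhrases = {
  broad: "Give me an overview of your background and experience",
  narrow: "Which vector store does the RAG server use?",
};
let prototypes; // { broad: number[], narrow: number[] }, filled in at startup

async function embed(input) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input,
  });
  return data.map((d) => d.embedding);
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Broad queries cast a wide net with a lenient cutoff; narrow ones stay strict.
function retrievalParams(queryEmbedding) {
  const broad = cosine(queryEmbedding, prototypes.broad);
  const narrow = cosine(queryEmbedding, prototypes.narrow);
  return broad >= narrow
    ? { nResults: 20, topN: 8, cutoff: 0.3 }
    : { nResults: 8, topN: 4, cutoff: 0.5 };
}

const app = express();
app.use(express.json());

app.post("/query", async (req, res) => {
  const { query } = req.body;
  const [queryEmbedding] = await embed(query);
  const params = retrievalParams(queryEmbedding);

  // Vector search in ChromaDB with intent-adjusted breadth.
  const collection = await chroma.getCollection({ name: "signal-docs" });
  const hits = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: params.nResults,
  });
  const documents = hits.documents[0];

  // Rerank with Cohere, then drop anything below the relevance cutoff.
  const { results } = await cohere.rerank({
    model: "rerank-v3.5",
    query,
    documents,
    topN: params.topN,
  });
  const context = results
    .filter((r) => r.relevanceScore >= params.cutoff)
    .map((r) => documents[r.index]);

  res.json({ context });
});

// Pre-embed the prototypes, then start serving.
embed(Object.values(prototypePhrases)).then(([broad, narrow]) => {
  prototypes = { broad, narrow };
  app.listen(3001);
});
```

In this sketch, callers POST a JSON body with a `query` field and get back the filtered context chunks, ready to inject into a prompt.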
This ensures responses aren’t generic AI text but are tied directly to my work, projects, and history.
The RAG server acts as Signal’s memory layer, feeding relevant context into the system so responses reflect my background.
- Runtime: Node.js
- Framework: Express
- Vector store: ChromaDB
- Embeddings: OpenAI text-embedding-3-large
- Reranker: Cohere Rerank v3.5
- Dev tooling: ESLint, Prettier, Husky, and shared configs via dev-config
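The stack above is wired together through a few environment variables. A hypothetical wiring module, assuming names like `CHROMA_URL` and `PORT` (only `OPENAI_API_KEY` is an SDK default; the rest are illustrative):

```js
import OpenAI from "openai";
import { ChromaClient } from "chromadb";
import { CohereClient } from "cohere-ai";

// OPENAI_API_KEY is the OpenAI SDK's default; the other variable names are assumptions.
export const openai = new OpenAI();
export const chroma = new ChromaClient({
  path: process.env.CHROMA_URL ?? "http://localhost:8000", // connection option in the chromadb JS client
});
export const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });
export const port = Number(process.env.PORT ?? 3001);
```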
Signal’s services can be run locally, but setup involves multiple moving parts.
For now, the best way to explore Signal is the live demo.
Future work may include a simplified docker-compose flow for local development.
The RAG server shows how I think about system decomposition, clarity of responsibilities, and production-grade AI pipelines. By separating retrieval into its own service and making it agentic with classification, reranking, and cutoffs, the architecture stays modular and maintainable. This mirrors how I approach building scalable, team-friendly systems that balance speed, clarity, and correctness.