Welcome to Augmented Generation with LLMs, a curated collection of interactive Colab notebooks exploring different approaches to enhancing Large Language Model (LLM) outputs through context-, cache-, and retrieval-based techniques. Built using LangChain, Ollama, vector databases, and more, this repo demonstrates powerful patterns to improve LLM performance and memory capabilities.
## Cache-Augmented Generation

- Concept: Enhances LLM inference by reusing previously computed results with a smart cache layer.
- Tech Stack:
- Pickling for storing & retrieving cached outputs
- In-memory caching logic
- Minimal recomputation, blazing speed ⚡
- ✅ Ideal for repetitive or FAQ-style inputs.
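A minimal sketch of this pattern, assuming a pickle-backed prompt cache (the file name and function names below are illustrative, not taken from the notebooks): prompts are hashed, answers are pickled, and a repeated prompt skips the model call entirely.

```python
import hashlib
import os
import pickle

CACHE_PATH = "llm_cache.pkl"  # illustrative file name, not from the notebooks


def load_cache(path=CACHE_PATH):
    """Load the pickled prompt->answer cache, or start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_cache(cache, path=CACHE_PATH):
    """Persist the cache so later sessions can reuse prior answers."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def cached_generate(prompt, llm, cache):
    """Return a cached answer if the prompt was seen before, else call the LLM."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in cache:          # cache miss: compute once, store the result
        cache[key] = llm(prompt)
    return cache[key]             # cache hit: no recomputation
```

Here `llm` is any callable mapping a prompt string to a response, e.g. a LangChain/Ollama model wrapped in a function.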
## Context-Augmented Generation

- Concept: Augments responses using relevant context from previous interactions or documents.
- Tech Stack:
- Custom context memory
- LangChain’s prompt management
- Context-based generation pipeline
- 📚 Boosts response richness and continuity.
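One way to sketch the context layer (the class and method names are assumptions, not the notebooks' API): keep a rolling window of past turns and prepend it to each new prompt so the model answers with continuity.

```python
class ContextMemory:
    """Rolling window of past (user, assistant) turns used to augment prompts."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []

    def add(self, user_msg, assistant_msg):
        """Record a completed exchange, dropping the oldest beyond the window."""
        self.turns.append((user_msg, assistant_msg))
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, question):
        """Prepend remembered context so the next answer stays consistent."""
        history = "\n".join(
            f"User: {u}\nAssistant: {a}" for u, a in self.turns
        )
        return f"{history}\nUser: {question}\nAssistant:"
```

LangChain's prompt templates and memory classes play the same role in the notebooks; this sketch just makes the mechanics explicit.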
## Retrieval-Augmented Generation (RAG)

- Concept: Integrates Vector DBs and Embeddings to retrieve relevant chunks from external data for precise answering.
- Tech Stack:
- LangChain + FAISS/Chroma
- Embedding models via Ollama
- Retrieval-Augmented Generation (RAG) flow
- 🔎 Perfect for knowledge-based systems and document Q&A.
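The retrieval step can be sketched without any vector database: embed the query and each chunk, rank by cosine similarity, and stuff the top matches into the prompt. The toy bag-of-words "embedding" below is a stand-in for a real embedding model served via Ollama, and FAISS/Chroma replace the linear scan at scale.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real setup uses an Ollama embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (FAISS/Chroma do this fast)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rag_prompt(query, chunks, k=2):
    """Build a prompt grounded in the retrieved context."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

The resulting prompt is then passed to the LLM, which answers from the retrieved chunks rather than from its parameters alone.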
## Tech Stack Overview

| Tech / Tool | Purpose |
|---|---|
| LLMs | Generative responses |
| LangChain | Chaining prompts, memory, and tools |
| Ollama | Lightweight local LLMs |
| Vector DB | Fast document retrieval |
| Embeddings | Semantic search capability |
| Pickling | Output caching |
| Cache Memory | Efficient reuse of responses |
| Jupyter Notebook | Interactive development |
## Getting Started

- Open any notebook in Jupyter Notebook or Google Colab.
- Follow the instructions in each cell.
- Make sure the required models are pulled via Ollama and the libraries are installed (`langchain`, `faiss-cpu`, etc.).
Questions or contributions? Open an issue or PR anytime.