This repository presents a comparative analysis of Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) architectures, focusing on API cost optimization and local deployment capabilities. The implementation is inspired by the paper *Don't Do RAG* by Chan et al.
This case study explores:
- Cost Reduction: Comparing API costs between RAG and CAG approaches
- Local Deployment: CPU-optimized implementation using Ollama
- Multi-Model Analysis: Implementation across different models and platforms
  - Google Gemini: for cloud-based API implementation
  - Llama 3.2 3B (via Ollama): for local CPU-optimized deployment
CAG Implementation:
- Efficient caching mechanism
- Reduced API calls
- Cost-effective knowledge retrieval
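The caching idea can be sketched as a small in-memory store keyed by the knowledge context: the corpus is ingested once, and subsequent questions reuse the cached context instead of triggering fresh API calls. This is only an illustration of the mechanism; the class and method names below are hypothetical, not the repo's actual API.

```python
import hashlib

class KnowledgeCache:
    """Minimal sketch: preload knowledge once, reuse it for every question."""

    def __init__(self):
        self._store = {}   # context hash -> cached knowledge
        self.api_calls = 0

    def _key(self, context: str) -> str:
        return hashlib.sha256(context.encode("utf-8")).hexdigest()

    def load(self, context: str) -> str:
        """Ingest and cache the knowledge context (one-time cost)."""
        key = self._key(context)
        if key not in self._store:
            self.api_calls += 1          # one call to ingest the corpus
            self._store[key] = context   # stand-in for a model KV cache
        return key

    def answer(self, key: str, question: str) -> str:
        """Answer from the cached context without re-sending the corpus."""
        context = self._store[key]
        return f"answer({question!r}) over {len(context)} cached chars"

cache = KnowledgeCache()
key = cache.load("Paris is the capital of France.")
cache.load("Paris is the capital of France.")  # cache hit: no new API call
print(cache.api_calls)  # 1
```

The point of the sketch is the call-count asymmetry: ingestion is paid once per corpus, not once per question.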
Local Optimization:
- CPU-optimized Ollama integration
- Efficient batch processing
- Thread optimization
- Memory-efficient operations
- Install Dependencies:
```bash
pip install -r requirements.txt
```
- Install Ollama (for running the model locally):
```bash
# For Windows, download from https://ollama.ai/download
# For Linux:
curl https://ollama.ai/install.sh | sh
```
- Setup Environment:
```bash
cp .env.template .env
# Add your API keys
```
- Pull Llama Model:
```bash
ollama pull llama3.2
```

Run the CAG benchmark with Ollama:
```bash
python kvcache_ollama.py --dataset hotpotqa-train \
    --similarity bertscore \
    --maxKnowledge 27 \
    --maxQuestion 27 \
    --model "llama3.2" \
    --output "./results/cag_ollama_results.txt"
```

Run the RAG benchmark with Gemini:
```bash
python rag_gemini.py --dataset hotpotqa-train \
    --similarity bertscore \
    --maxKnowledge 120 \
    --maxQuestion 120 \
    --output "./results/rag_gemini_results.txt"
```

Results from different implementations can be found in:
- CAG Results: `CAG/New_Results/CAG/`
- RAG Results: `CAG/New_Results/RAG/`
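The cost comparison largely reduces to call counts: RAG pays an API call per question, while CAG pays a one-time cost to load the knowledge and then answers from the cache. A back-of-envelope estimator makes this concrete; the per-call price and call counts below are illustrative placeholders, not measured values from this repo.

```python
def api_cost(num_questions: int, calls_per_question: float,
             one_time_calls: int, price_per_call: float) -> float:
    """Total API cost = one-time ingestion calls + per-question calls."""
    return (one_time_calls + num_questions * calls_per_question) * price_per_call

PRICE = 0.001  # illustrative $ per API call, not a real provider rate

# RAG: one API call per question; CAG: one ingestion call, then cache hits.
rag_cost = api_cost(120, calls_per_question=1, one_time_calls=0, price_per_call=PRICE)
cag_cost = api_cost(120, calls_per_question=0, one_time_calls=1, price_per_call=PRICE)
print(f"RAG: ${rag_cost:.3f}  CAG: ${cag_cost:.3f}")
```

Under this simplification the gap grows linearly with the number of questions asked over a fixed corpus.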
```
CAG/
├── kvcache_gemini.py    # CAG implementation with Gemini
├── kvcache_ollama.py    # CAG implementation with Ollama
├── rag_gemini.py        # RAG implementation with Gemini
├── rag_groq.py          # RAG implementation with Groq
├── rag_ollama.py        # RAG implementation with Ollama
└── New_Results/         # Performance results and comparisons
```
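The scripts above expose a similar command-line interface (the flag names appear in the usage commands). A sketch of how such a parser could be defined with `argparse`; the defaults and help strings here are assumptions, not the scripts' actual values.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="RAG/CAG benchmark runner")
    parser.add_argument("--dataset", required=True, help="e.g. hotpotqa-train")
    parser.add_argument("--similarity", default="bertscore",
                        help="answer-similarity metric")
    parser.add_argument("--maxKnowledge", type=int, default=27,
                        help="number of knowledge passages to load")
    parser.add_argument("--maxQuestion", type=int, default=27,
                        help="number of questions to evaluate")
    parser.add_argument("--model", default="llama3.2", help="Ollama model tag")
    parser.add_argument("--output", required=True, help="results file path")
    return parser

args = build_parser().parse_args(
    ["--dataset", "hotpotqa-train", "--maxKnowledge", "27",
     "--output", "./results/cag_ollama_results.txt"]
)
print(args.maxKnowledge, args.model)
```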
This implementation builds upon the work presented in:
```bibtex
@misc{chan2024dontragcacheaugmentedgeneration,
      title={Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks},
      author={Brian J Chan and Chao-Ting Chen and Jui-Hung Cheng and Hen-Hsen Huang},
      year={2024},
      eprint={2412.15605},
      archivePrefix={arXiv}
}
```

This project is licensed under the MIT License.