Multimodal Enterprise RAG

A hybrid Retrieval-Augmented Generation (RAG) system for enterprise question answering across multiple modalities, including text, images, and audio.

Setup

1. Environment Setup

# Create and activate a virtual environment, then install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Required System Tools

# macOS (Homebrew); on other platforms use the system package manager
brew install tesseract  # for OCR
brew install ffmpeg     # for audio/video processing

3. Neo4j Setup

Download and start a Neo4j instance locally.

Set these environment variables:

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
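
Before ingesting anything, it can help to confirm that these variables are set and that the server is reachable. The following is a minimal sketch, not the project's own code: `Neo4jSettings`, `load_neo4j_settings`, and `check_neo4j` are hypothetical names, and the check assumes the official `neo4j` Python driver is installed.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection settings read from the NEO4J_* environment variables."""
    uri: str
    user: str
    password: str


def load_neo4j_settings(env: Mapping[str, str] = os.environ) -> Neo4jSettings:
    """Read the three NEO4J_* variables and fail early if any is missing or empty."""
    names = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Neo4jSettings(env["NEO4J_URI"], env["NEO4J_USER"], env["NEO4J_PASSWORD"])


def check_neo4j(settings: Neo4jSettings) -> None:
    """Open a driver and verify that the server answers (requires the neo4j package)."""
    # Imported lazily so that load_neo4j_settings works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(settings.uri, auth=(settings.user, settings.password)) as driver:
        driver.verify_connectivity()


# Typical use, with the variables exported in the current shell:
#   check_neo4j(load_neo4j_settings())
```

If the variables are missing, `load_neo4j_settings` raises before any network call is made, which keeps configuration errors separate from connection errors.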

4. Qdrant Setup (Vector Database)

Start Qdrant using Docker:

# Start Qdrant container
docker run -d -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant

The container will be accessible at:
- HTTP API: http://localhost:6333
- gRPC: localhost:6334

To stop the container (run `docker ps` to find its name or ID):
```
docker stop <container_name>
```
To start it again:
```
docker start <container_name>
```
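
To check that the container is answering on the HTTP port, one option is to list its collections. This is a sketch using only the standard library; `parse_collection_names` and `list_qdrant_collections` are hypothetical helpers, and the response shape (`result.collections[].name`) is the one returned by Qdrant's `GET /collections` endpoint.

```python
import json
from urllib.request import urlopen


def parse_collection_names(payload: dict) -> list[str]:
    """Extract collection names from the JSON body of GET /collections."""
    collections = payload.get("result", {}).get("collections", [])
    return [item["name"] for item in collections]


def list_qdrant_collections(base_url: str = "http://localhost:6333", timeout: float = 5.0) -> list[str]:
    """Call the Qdrant HTTP API and return the names of existing collections."""
    with urlopen(f"{base_url.rstrip('/')}/collections", timeout=timeout) as response:
        return parse_collection_names(json.load(response))


# Typical use once the container is running:
#   print(list_qdrant_collections())
```

A fresh instance returns an empty list; a connection error here usually means the container is not running or the port mapping differs.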

5. Together.ai LLM Setup

Add your API key to the .env file:
```
TOGETHER_API_KEY=your_key_here
```
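
A quick way to catch a missing or placeholder key before running the pipeline is to parse the file and check it. The sketch below uses only the standard library and handles simple `KEY=value` lines; it is an illustration, not how the project loads its configuration, and `parse_env_text`, `load_env_file`, and `missing_keys` are hypothetical names.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks, '#' comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read and parse an env file; return an empty dict if it does not exist."""
    env_file = Path(path)
    return parse_env_text(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}


def missing_keys(values: dict[str, str], required: tuple[str, ...]) -> list[str]:
    """Return required keys that are absent, empty, or still set to a README placeholder."""
    placeholders = {"your_key_here", "your_password"}
    return [key for key in required if not values.get(key) or values[key] in placeholders]


# Typical use:
#   print(missing_keys(load_env_file(), ("TOGETHER_API_KEY",)))
```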

Pushing Data

To ingest data, place your files in the data/ directory and run the hybrid storage script with the path to ingest:

# Push data to all storage systems (Qdrant, Neo4j, and Whoosh)
PYTHONPATH=src python src/scripts/push_to_hybrid_storage.py <file_path>

The script supports the following file types:

- Text files (.txt, .md, .pdf, .html)
- Image files (.jpg, .jpeg, .png, .gif, .bmp)
- Audio files (.mp3, .wav, .m4a, .ogg)
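
To push a whole directory, the extension list above can be used to filter files and invoke the script once per file. This is a rough sketch under stated assumptions: the modality labels and the helper names (`classify_modality`, `build_push_commands`, `push_directory`) are illustrative, and the script may classify files differently internally.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-types list; the labels are illustrative.
MODALITY_BY_EXTENSION = {
    **dict.fromkeys((".txt", ".md", ".pdf", ".html"), "text"),
    **dict.fromkeys((".jpg", ".jpeg", ".png", ".gif", ".bmp"), "image"),
    **dict.fromkeys((".mp3", ".wav", ".m4a", ".ogg"), "audio"),
}

SCRIPT = "src/scripts/push_to_hybrid_storage.py"


def classify_modality(path: str) -> Optional[str]:
    """Map a file path to 'text', 'image', or 'audio'; None if unsupported."""
    return MODALITY_BY_EXTENSION.get(Path(path).suffix.lower())


def build_push_commands(paths: list[str]) -> list[list[str]]:
    """Return one push command per supported file, in sorted path order."""
    return [[sys.executable, SCRIPT, p] for p in sorted(paths) if classify_modality(p)]


def push_directory(data_dir: str = "data") -> None:
    """Run the push script for every supported file under data_dir."""
    files = [str(p) for p in Path(data_dir).rglob("*") if p.is_file()]
    env = {**os.environ, "PYTHONPATH": "src"}  # mirrors the PYTHONPATH=src prefix
    for command in build_push_commands(files):
        subprocess.run(command, env=env, check=True)
```

With `check=True`, the loop stops at the first file that fails to ingest, so a bad file is not silently skipped.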

Running Tests

Run all tests:

PYTHONPATH=src python -m pytest tests/

Run a specific test file, for example:

PYTHONPATH=src python -m pytest tests/test_text_ingestor.py

Running the Pipeline

PYTHONPATH=src python src/crew_pipeline/main_pipeline.py

What works

- Multimodal ingestion
- Hybrid storage and retrieval
- Entity and relationship extraction
- CrewAI agent pipeline

To Improve

- Answer hallucinations in non-lookup queries
- Evaluation log output for each query
- Stronger entity relationships
- Better documentation
- Query-type handling with type-specific retrieval techniques

Notes

Tested on Python 3.11.7.

Built with:
- CrewAI
- Qdrant
- Neo4j
- Together.ai
