Document Chatbot

A Python-based document chatbot that uses vector embeddings and retrieval-augmented generation (RAG) to answer questions about your documents. Built with LangChain, HuggingFace embeddings, and the Chroma vector database.

Features

Document Processing: Load and process Markdown and PDF documents
Vector Embeddings: Uses HuggingFace sentence-transformers models to compute embeddings at no cost
Semantic Search: Find relevant document chunks using similarity search
Question Answering: Generate contextual answers based on retrieved document content
Local Processing: No API costs for embeddings (HuggingFace models run locally)
Flexible LLM Support: Works with the Groq API for fast answer generation on its free tier
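Semantic search ranks document chunks by how close their embedding vectors are to the embedding of the question. The sketch below illustrates that ranking with cosine similarity over plain Python lists. It is not the project's code: Chroma performs the search internally and may use a different distance metric, and the three-dimensional vectors are made-up toy values standing in for real sentence-transformers output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk texts most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values); real models produce
# vectors with hundreds of dimensions.
chunks = {
    "Alice follows the White Rabbit.": [0.9, 0.1, 0.0],
    "The Mad Hatter hosts a tea party.": [0.1, 0.9, 0.2],
    "The Queen shouts 'Off with their heads!'": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # pretend embedding of "Who hosts the tea party?"
for text, score in top_k(query, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

With these toy vectors the tea-party chunk ranks first, because its vector points in nearly the same direction as the query vector.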

Project Structure

Document Chatbot/
├── createDatabase.py   # Document processing and vector database creation
├── query.py            # Query interface for asking questions
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (API keys)
├── .gitignore          # Git ignore rules
├── data/               # Document storage directory
│   └── *.md            # Markdown documents
└── chroma/             # Vector database (auto-generated)

Installation

1. Clone and Setup

git clone <repository-url>
cd "Document Chatbot"
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt
pip install "unstructured[md]"
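To confirm the installation worked, you can check that the main packages are importable from the activated virtual environment. This is a minimal sketch: the contents of requirements.txt are not shown here, so the module names in REQUIRED_MODULES are assumptions based on the stack this README describes, and the script name check_deps.py is hypothetical.

```python
import importlib.util

# Assumed top-level import names for the stack described in this README;
# adjust to match the actual requirements.txt.
REQUIRED_MODULES = ["langchain", "chromadb", "sentence_transformers", "unstructured"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

importlib.util.find_spec only locates the module without importing it, so the check is quick and has no side effects from package initialization.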

3. Environment Setup

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key_here
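How query.py reads this key is not shown here; projects like this commonly call load_dotenv() from the python-dotenv package. The stdlib-only sketch below illustrates what such a loader does, assuming simple KEY=value lines: it skips blank lines and # comments, strips one pair of matching quotes, and does not overwrite variables already set in the environment. It does not handle export prefixes or multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines and copy them into os.environ.

    Variables that are already set in the environment are left unchanged.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")  # split on the first "=" only
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    load_env_file()
    print("GROQ_API_KEY set:", bool(os.environ.get("GROQ_API_KEY")))
```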

Usage

Step 1: Prepare Your Documents

Place your documents in the data/ directory:

Supported formats: Markdown (.md), PDF (.pdf)
The data/ directory is created automatically if it doesn't exist
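The step above amounts to collecting every supported file in data/ and creating the folder when it is missing. The sketch below shows one way to do that with pathlib; it is an illustration, not the code in createDatabase.py. It assumes documents sit directly in data/ (as in the Project Structure listing) and does not search subfolders; the function name find_documents is hypothetical.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED_SUFFIXES = {".md", ".pdf"}


def find_documents(data_dir: str = "data") -> list[Path]:
    """Create data_dir if needed and return the supported files directly inside it, sorted."""
    root = Path(data_dir)
    root.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    docs = find_documents()
    print(f"Found {len(docs)} document(s):")
    for doc in docs:
        print(" -", doc)
```

Lower-casing the suffix means files such as notes.PDF are picked up as well.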

Step 2: Create Vector Database

Process your documents and create the vector database:

python createDatabase.py
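Before embedding, documents are usually split into smaller chunks. The splitter, chunk size, and overlap that createDatabase.py uses are not stated in this README (LangChain's RecursiveCharacterTextSplitter is a common choice). The sketch below shows the simpler idea of fixed-size character chunks with overlap, so that text near a chunk boundary appears in both neighbouring chunks. The values chunk_size=300 and overlap=100 are illustrative assumptions, not the project's settings.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    pieces = chunk_text("x" * 700, chunk_size=300, overlap=100)
    print([len(p) for p in pieces])
```

A larger overlap keeps more context around each boundary at the cost of storing more, partly duplicated, chunks in the vector database.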

Step 3: Query Your Documents

Ask questions about your documents:

python query.py "What is the main character's name?"
python query.py "How does Alice meet the Mad Hatter?" --threshold 0.4
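The README does not define the scale or default of --threshold. Assuming it is a minimum relevance score in the range 0 to 1, where higher means more relevant (the convention of LangChain's similarity_search_with_relevance_scores), the sketch below shows how retrieved chunks could be filtered and assembled into a prompt that asks the model to answer only from the retrieved context. The helper names and the prompt wording are hypothetical.

```python
def filter_by_threshold(results: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep chunk texts whose relevance score is at least `threshold`, best first."""
    kept = [(text, score) for text, score in results if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that tells the model to answer only from the given context."""
    context = "\n\n---\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n---\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical (chunk, relevance score) pairs from the vector store.
    results = [
        ("Alice meets the Hatter at a tea party.", 0.72),
        ("The Cheshire Cat grins.", 0.35),
        ("The Hatter and the March Hare argue.", 0.51),
    ]
    chunks = filter_by_threshold(results, threshold=0.4)
    if not chunks:
        print("No sufficiently relevant context found.")
    else:
        print(build_prompt("How does Alice meet the Mad Hatter?", chunks))
```

Raising the threshold sends fewer but more relevant chunks to the model; if nothing passes, it is usually better to report that than to let the model answer without context.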
