An interactive Streamlit application that enables intelligent question-answering over multiple document formats using Sarvam AI's powerful language models. Upload your documents (PDF, Word, Text, CSV, Excel, PPT, etc.), ask questions, and get accurate answers based on the document content.
This application requires a Sarvam AI API key. You can get 1000 free credits by signing up at the Sarvam AI Dashboard.
- Multi-Format Document Processing: Upload and process multiple document types (PDF, DOCX, TXT, MD, CSV, XLSX, PPTX)
- Intelligent Q&A: Ask questions about your documents and get contextual answers
- Sarvam AI Integration: Leverages Sarvam AI's language models for high-quality responses
- Vector Search: Uses embeddings for semantic search and relevant content retrieval
- Customizable Settings:
- Adjust context window size
- Control response token length
- Modify chunk size for document processing
- Configure retrieval parameters (Top K)
- Set temperature for response creativity
- Source Tracking: View source documents and relevant passages
- System Prompt Customization: Define assistant behavior and response guidelines
- Python 3.8 or higher
- Sarvam AI API key (Get your free credits here)
- Internet connection for API access
- Clone the repository

```bash
git clone https://github.com/yourusername/pdf-qa-sarvam.git
cd pdf-qa-sarvam
```

- Create and activate a virtual environment (optional but recommended)

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

Dependencies

The application requires the following packages:
```
streamlit
llama-index
llama-index-embeddings-fastembed
requests
PyPDF2
python-docx
pandas
openpyxl
python-pptx
```
You can install them directly:

```bash
pip install streamlit llama-index llama-index-embeddings-fastembed requests PyPDF2 python-docx pandas openpyxl python-pptx
```

Usage

- Start the Streamlit application

```bash
streamlit run app.py
```

- Enter your Sarvam AI API key
- Optionally modify the base URL (default: https://api.sarvam.ai)
- Adjust model settings as needed
- Customize the system prompt for assistant behavior
- Upload Documents
- Click "Browse files" to select your documents
- Multiple files can be uploaded simultaneously
- Process Documents
- Click "Process Documents" to index your documents
- Wait for the processing to complete
- Ask Questions
- Enter your question in the text input field
- Click "Get Answer" to receive a response
- View source documents in the expandable section
Model Settings
- Context Window Size: 1024-8192 tokens (default: 4500)
- Max Response Tokens: 64-2048 tokens (default: 512)
- Chunk Size: 256-4096 tokens (default: 1024)
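Chunk size trades retrieval granularity against context: smaller chunks give more precise matches but produce more pieces to index. The idea can be sketched with a simple character-based splitter (LlamaIndex's actual splitter is token- and sentence-aware; the overlap value here is purely illustrative):

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 128) -> list[str]:
    """Split text into overlapping chunks (simplified, character-based)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap preserves context across boundaries
    return chunks

doc = "x" * 3000
chunks = chunk_text(doc, chunk_size=1024, overlap=128)
print(len(chunks))  # smaller chunk_size -> more, finer-grained chunks
```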
Advanced Options
- Temperature: 0.0-1.0 (default: 0.1) - Controls response randomness
- Top K Retrieval: 1-10 (default: 3) - Number of document chunks to retrieve
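Why a low temperature (0.1) gives focused answers: temperature divides the model's logits before sampling, so low values sharpen the token distribution toward the most likely choice. A minimal illustration with made-up logits:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.1)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # more spread out
print(round(cool[0], 3), round(warm[0], 3))
```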
System Prompt

Customize the assistant's behavior and response guidelines. Default prompt:

```
You are a helpful Q&A assistant. Answer questions based only on the provided documents.
If the answer is not in the documents, say "I cannot find this information in the provided documents."
Provide clear, concise answers with relevant details from the documents.
```
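Conceptually, the system prompt is combined with the retrieved chunks and the user's question into a single chat request. A sketch assuming a simple two-message layout (the actual assembly happens inside LlamaIndex, and the function name here is hypothetical):

```python
def build_messages(system_prompt: str, context_chunks: list[str], question: str) -> list[dict]:
    """Assemble a chat request: system prompt first, then context plus question."""
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

msgs = build_messages(
    "You are a helpful Q&A assistant.",
    ["Chunk about invoices.", "Chunk about warranty."],
    "When are invoices due?",
)
```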
The application is built using:
- Frontend: Streamlit for interactive UI
- Document Processing: LlamaIndex for document indexing and retrieval
- Embeddings: FastEmbed with BAAI/bge-small-en-v1.5
- Language Model: Custom SarvamAI LLM wrapper for Sarvam AI API
- Vector Storage: In-memory vector store
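The custom SarvamAI wrapper itself is not shown in this README; below is a minimal sketch of how such a wrapper might build and send its request. The endpoint path (`/v1/chat/completions`), model name (`sarvam-m`), and payload shape are assumptions modelled on OpenAI-style chat APIs, not confirmed Sarvam specifics:

```python
import requests

def build_request(api_key: str, messages: list, model: str = "sarvam-m",
                  temperature: float = 0.1, max_tokens: int = 512):
    """Build headers and payload for a chat-completions call (shape assumed)."""
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    payload = {"model": model, "messages": messages,
               "temperature": temperature, "max_tokens": max_tokens}
    return headers, payload

def sarvam_chat(api_key: str, messages: list,
                base_url: str = "https://api.sarvam.ai") -> str:
    """POST the request; the '/v1/chat/completions' path is an assumption."""
    headers, payload = build_request(api_key, messages)
    resp = requests.post(f"{base_url}/v1/chat/completions",
                         headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

headers, payload = build_request("test-key", [{"role": "user", "content": "hi"}])
```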
| Step | Description |
|---|---|
| Upload | Document files are uploaded and temporarily stored |
| Processing | Documents are chunked and embedded using FastEmbed (format-specific readers used when available) |
| Indexing | VectorStoreIndex creates searchable embeddings |
| Querying | User questions trigger semantic search and LLM response generation |
| Response | Answers with source references are displayed |
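The pipeline above can be sketched end to end with a toy in-memory store. The real app uses FastEmbed vectors and LlamaIndex's VectorStoreIndex; here a bag-of-words counter stands in for the embedding, purely for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for FastEmbed vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class InMemoryVectorStore:
    def __init__(self):
        self.chunks: list[tuple[str, Counter]] = []

    def add(self, chunk: str):                       # Indexing step
        self.chunks.append((chunk, embed(chunk)))

    def query(self, question: str, top_k: int = 3):  # Querying step
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [c[0] for c in ranked[:top_k]]

store = InMemoryVectorStore()
store.add("Invoices are due within 30 days of receipt.")
store.add("The warranty covers parts and labour for one year.")
print(store.query("When are invoices due?", top_k=1))
```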
API Connection Error
- Verify your API key is correct
- Check internet connection
- Ensure the base URL is correct (default: https://api.sarvam.ai)
Document Processing Failed
- Ensure document files are not corrupted
- Check that the file size is reasonable (< 10MB recommended)
- Verify chunk size isn't too large for your system memory
- Ensure required format-specific libraries are installed (python-docx, pandas, openpyxl, python-pptx)
Slow Responses
- Reduce chunk size for faster processing
- Lower the Top K retrieval value
- Decrease max response tokens
Memory Issues
- Process documents in smaller batches
- Reduce chunk size
- Clear session and restart the app
Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Fork the repository
# Create your feature branch
git checkout -b feature/AmazingFeature
# Commit your changes
git commit -m 'Add some AmazingFeature'
# Push to the branch
git push origin feature/AmazingFeature
# Open a Pull Request
```

- Sarvam AI — for providing the language model API
- LlamaIndex — for document indexing and retrieval framework
- Streamlit — for the interactive web interface
- FastEmbed — for efficient embeddings
Manish Tiwari
- 🐦 Twitter: @compmanish
- 📧 Email: Mail
- 🔗 Project Link: https://github.com/predictivemanish/pdf-qa-sarvam
