- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install system dependencies (macOS):
  brew install libmagic
- Install Python dependencies:
  pip install -r requirements.txt
- Set up environment variables:
  cp env_example.txt .env
  # Edit .env with your API keys
- Test the setup:
  python test_setup.py
- Run the tool:
  python main.py --info
  python main.py sample_file.pdf
- Go to OpenAI API
- Create an account and generate an API key
- Add to the .env file: OPENAI_API_KEY=your_key_here
- Go to Google AI Studio
- Create a project and generate an API key
- Add to the .env file: GOOGLE_API_KEY=your_key_here
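Before running the tool, it can help to confirm that both keys actually made it into the .env file. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helper names, not part of this project, and the sketch assumes both keys are wanted (only OPENAI_API_KEY matters if you process documents only).

```python
from pathlib import Path

# Key names taken from the setup steps above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")
PLACEHOLDER = "your_key_here"


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [key for key in required if values.get(key, "") in ("", PLACEHOLDER)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory")
    else:
        missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project root. It only checks that the keys are present and not left at the placeholder value; it does not verify them against the OpenAI or Google APIs.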
macOS:
# Required for file type detection
brew install libmagic
# Optional: For video/audio processing (if you encounter issues)
brew install ffmpeg
Linux:
# Required for file type detection
sudo apt-get install libmagic1
# Optional: For video/audio processing
sudo apt-get install ffmpeg
Windows:
# Install libmagic through conda or pip
pip install python-magic-bin
# FFmpeg can be installed from https://ffmpeg.org/download.html
- "No module named 'pyaudioop'" warning
  - Python 3.13 removed the standard-library audioop module, which some audio libraries (such as pydub) depend on
  - The tool will still work for document processing
  - Audio/video features may have limited functionality
  - Consider using Python 3.11 or 3.12 if you need full multimedia support
- "MoviePy not available" warning
  - This affects video processing capabilities
  - Document processing (PDF, Word, Excel) is unaffected
  - For video support, try: pip install --force-reinstall moviepy
- "failed to find libmagic" error
  - Install the libmagic system library:
    - On macOS: brew install libmagic
    - On Linux: sudo apt-get install libmagic1
    - On Windows: pip install python-magic-bin
- API errors
  - Verify that the API keys in the .env file are correct
  - Check that your OpenAI/Google accounts have sufficient credits
  - Ensure the .env file is in the project root directory
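Several of the warnings above come down to facts you can check directly from Python. The sketch below gathers them in one place using only the standard library; `check_environment` is a hypothetical helper, not something shipped with this tool. The libmagic lookup is an approximation: python-magic uses its own loading logic, and on Windows the library bundled with python-magic-bin may not be visible to `ctypes.util.find_library`.

```python
import ctypes.util
import importlib.util
import sys


def check_environment():
    """Collect the facts behind the common warnings into a plain dict."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        # The standard-library audioop module was removed in Python 3.13.
        "audioop_available": importlib.util.find_spec("audioop") is not None,
        # find_library returns None when no libmagic shared library is found.
        "libmagic_path": ctypes.util.find_library("magic"),
        # Only checks that the package can be located, not that it works.
        "moviepy_installed": importlib.util.find_spec("moviepy") is not None,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If `audioop_available` is False, expect the 'pyaudioop' warning; if `libmagic_path` is None on macOS or Linux, the "failed to find libmagic" error is the likely result.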
Test only the OpenAI processor (documents):
python -c "from src.file_processors.openai_processor import OpenAIProcessor; print('OpenAI OK')"
Test only the Gemini processor (multimedia):
python -c "from src.file_processors.gemini_processor import GeminiProcessor; print('Gemini OK')"
If you only need document processing (PDF, Word, Excel), you can use this minimal requirements.txt:
openai>=1.3.0
PyPDF2>=3.0.1
python-docx>=0.8.11
openpyxl>=3.1.2
xlrd>=2.0.1
python-magic>=0.4.27
python-dotenv>=1.0.0
For multimedia support, add:
google-generativeai>=0.3.0
moviepy>=1.0.3
pydub>=0.25.1
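To see which of these packages are present in the active virtual environment, a short check with `importlib.metadata` is one option. This is a sketch under assumptions: `installed_versions` is a hypothetical helper, the package lists are copied from the requirements above, and the code reports installed versions without comparing them against the minimum versions.

```python
from importlib import metadata

# Distribution names as listed in the requirements above.
DOCUMENT_PACKAGES = [
    "openai", "PyPDF2", "python-docx", "openpyxl",
    "xlrd", "python-magic", "python-dotenv",
]
MULTIMEDIA_PACKAGES = ["google-generativeai", "moviepy", "pydub"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for group, names in (("documents", DOCUMENT_PACKAGES),
                         ("multimedia", MULTIMEDIA_PACKAGES)):
        for name, version in installed_versions(names).items():
            print(f"[{group}] {name}: {version or 'not installed'}")
```

On Windows, python-magic may show as not installed if you installed python-magic-bin instead, as suggested in the system dependency notes.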
✅ Document Processing (OpenAI)
- PDF files (.pdf)
- Text files (.txt)
- Word documents (.doc, .docx)
- Excel spreadsheets (.xls, .xlsx)
✅ Multimedia Processing (Gemini)
- Video files (.mp4, .avi, .mov, .mkv) - with graceful degradation
- Audio files (.mp3, .wav, .m4a, .webm, .ogg) - with graceful degradation
✅ Core Features
- Parallel processing
- Detailed error reporting
- CLI interface
- Programmatic API
- File validation
- Progress tracking
Document processing does not depend on the multimedia libraries, so it keeps working even when MoviePy or the audio dependencies fail to load. In that case multimedia files are still accepted, with reduced functionality.
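The split between the two backends can be pictured as a lookup on the file extension. The sketch below is illustrative only: `pick_backend` and the two extension sets are hypothetical names built from the lists above, and the tool itself relies on libmagic for file type detection, so its actual routing may inspect file contents instead of names.

```python
from pathlib import Path

# Extensions copied from the supported-format lists above.
DOCUMENT_EXTENSIONS = {".pdf", ".txt", ".doc", ".docx", ".xls", ".xlsx"}
MULTIMEDIA_EXTENSIONS = {
    ".mp4", ".avi", ".mov", ".mkv",          # video
    ".mp3", ".wav", ".m4a", ".webm", ".ogg",  # audio
}


def pick_backend(filename):
    """Return 'openai', 'gemini', or None for an unsupported extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "openai"
    if suffix in MULTIMEDIA_EXTENSIONS:
        return "gemini"
    return None


if __name__ == "__main__":
    for name in ("report.PDF", "clip.mkv", "archive.zip"):
        print(name, "->", pick_backend(name))
```

Matching on the lowercased suffix keeps the check case-insensitive, so `report.PDF` and `report.pdf` are treated the same way.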