A Windows-optimized version of JoyCaption that eliminates Linux-specific dependencies and provides a streamlined setup experience for Windows users.
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models. The model combines Meta-Llama-3.1-8B with Google's SigLIP vision encoder to provide high-quality image descriptions perfect for AI art generation and dataset preparation.
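The pairing described above (SigLIP features projected into Llama's embedding space) can be sketched schematically. This is an illustrative toy, not the actual `ImageCaption.py` code: the matrices are random stand-ins for trained weights, and the dimensions are shrunk for speed (the real sizes are noted in comments).

```python
import random

# Toy dimensions for speed; real sizes: SigLIP so400m hidden = 1152,
# Llama-3.1-8B hidden = 4096, and ~729 patch tokens per 384x384 image.
VISION_DIM, LLM_DIM, NUM_PATCHES, NUM_TEXT_TOKENS = 8, 16, 4, 3

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# 1. Vision encoder output: one feature vector per image patch (stand-in values).
patch_features = rand_matrix(NUM_PATCHES, VISION_DIM)

# 2. A learned projection maps vision features into the LLM embedding space.
#    Random here; in the real model these weights are trained.
projection = rand_matrix(VISION_DIM, LLM_DIM)

def project(features, weights):
    """Multiply each feature vector by the projection matrix (plain matmul)."""
    return [[sum(f * w for f, w in zip(feat, col)) for col in zip(*weights)]
            for feat in features]

image_tokens = project(patch_features, projection)

# 3. Projected image tokens are prepended to the prompt's text embeddings,
#    and the combined sequence is what the language model actually sees.
text_embeddings = rand_matrix(NUM_TEXT_TOKENS, LLM_DIM)
llm_input = image_tokens + text_embeddings
```

The key takeaway is that the vision encoder and language model stay separate pretrained components; only a small projection bridges their embedding spaces.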
- Windows Native: Removes the `liger_kernel` dependency that requires `triton` (Linux-only)
- One-Click Setup: Automated installation with batch files
- Portable Installation: Self-contained Conda environment in `installer_files/Miniconda`
- Memory Optimization: Multiple VRAM options (NF4, 8-bit, BF16)
- Easy Relocation: Simple environment reconfiguration when moved
- High-Quality Captions: Leverages Meta-Llama-3.1-8B for natural, detailed descriptions
- Gradio Interface: User-friendly web interface for image captioning
- Batch Processing Ready: Can be extended for batch image captioning workflows
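The batch-processing extension mentioned above could look roughly like the following sketch: walk a folder, caption each image, and write a sidecar `.txt` per image (the usual layout for diffusion-training datasets). `caption_image` is a hypothetical placeholder for a call into the loaded JoyCaption model, as `ImageCaption.py` does for a single image.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def caption_image(image_path: Path) -> str:
    # Hypothetical stand-in: a real batch script would invoke the loaded
    # JoyCaption model here instead of returning a fixed string.
    return f"placeholder caption for {image_path.name}"

def caption_folder(folder: Path) -> list[Path]:
    """Write a sidecar .txt caption next to every image; return the files written."""
    written = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files (and previously written captions)
        out = image.with_suffix(".txt")
        out.write_text(caption_image(image), encoding="utf-8")
        written.append(out)
    return written
```

Keeping the model loaded once and looping over files this way avoids paying the multi-gigabyte model load per image.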
The original JoyCaption Gradio app (`app.py`) requires `liger_kernel`, which depends on `triton`, a library that only works on Linux systems. This creates a significant barrier for Windows users who want to run JoyCaption locally.
Original Implementation Issues:
- Requires `liger_kernel` for memory optimization
- `liger_kernel` depends on `triton` (Linux/CUDA specific)
- Complex setup process for Windows users
- Potential compatibility issues with Windows CUDA installations
Our Windows-Optimized Solution (`ImageCaption.py`):
- Eliminates Linux dependencies: No more `liger_kernel` or `triton` requirements
- Maintains full functionality: All JoyCaption model capabilities preserved
- Improves Windows compatibility: Native Windows operation without WSL or Linux subsystems
- Streamlined installation: Automated setup with batch files
- Memory efficiency: Alternative quantization methods for different VRAM levels
- Operating System: Windows 10/11 (64-bit)
- GPU: NVIDIA GPU with CUDA support (recommended for optimal performance)
- VRAM: Minimum 8GB (NF4 quantization), 12GB+ (8-bit), 16GB+ (BF16)
- Storage: 20GB+ free disk space for model files and dependencies
- Internet: Required for initial model download and setup
1. `GetConda.bat`: Downloads and installs Miniconda to `installer_files\Miniconda`.
2. `SetEnv.bat`: Configures the Conda environment paths. Run this again if you move the folder to a new location.
3. `InstallRequirements.bat`: Creates the Python environment and installs all required packages.
4. `StartTextCaptioner.bat`: Launches the JoyCaption interface.
For advanced users who need to perform manual operations:
`Cmd.bat`: Opens a preconfigured command prompt with all necessary Conda paths set.
The application automatically selects the best quantization based on your available VRAM:
| VRAM | Quantization | Description |
|---|---|---|
| 8GB+ | NF4 | 4-bit quantization for low VRAM |
| 12GB+ | 8-bit | 8-bit quantization for medium VRAM |
| 16GB+ | BF16 | Brain Float 16 for high VRAM |
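The selection logic in the table can be expressed as a simple threshold function. This is an illustrative sketch, not the exact code in `ImageCaption.py`; the function name and mode strings are assumptions, though the VRAM thresholds match the table above.

```python
def choose_quantization(free_vram_gb: float) -> str:
    """Pick a quantization mode from available VRAM, mirroring the README table.

    Illustrative sketch; names and structure are assumptions, the thresholds
    follow the table above.
    """
    if free_vram_gb >= 16:
        return "bf16"   # full Brain Float 16 weights
    if free_vram_gb >= 12:
        return "8-bit"  # 8-bit quantized loading
    return "nf4"        # 4-bit NormalFloat quantization

# In practice the chosen mode would map to model-loading arguments, e.g.
# (hedged sketch of the transformers/bitsandbytes API):
#   "nf4"   -> BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
#   "8-bit" -> BitsAndBytesConfig(load_in_8bit=True)
#   "bf16"  -> torch_dtype=torch.bfloat16
```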
```
Joycaption/
├── ImageCaption.py          # Main application (Windows optimized)
├── GetConda.bat             # Download Conda installer
├── SetEnv.bat               # Set environment variables
├── InstallRequirements.bat  # Install requirements
├── StartTextCaptioner.bat   # Start the application
├── Cmd.bat                  # Manual command prompt
├── requirements.txt         # Python dependencies
└── installer_files/         # Conda installation directory
    └── Miniconda/           # Miniconda installation
        ├── pkgs/            # All Miniconda base packages
        └── Environments/    # All Miniconda environments
```
If you need to move the entire folder to a different location:
1. Move the complete folder to the new location
2. Run `SetEnv.bat` to reconfigure the environment paths
3. Continue using the application normally
Environment not found after moving folder:
- Solution: Run `SetEnv.bat` to reconfigure paths
CUDA out of memory:
- Lower the quantization (use NF4 for lower VRAM)
- Close other GPU-intensive applications
Import errors:
- Ensure all requirements are installed by running `InstallRequirements.bat`
- Check that CUDA is properly installed
If you encounter issues, you can manually reset the environment:
1. Delete the `installer_files` folder
2. Run `GetConda.bat`
3. Run `SetEnv.bat`
4. Run `InstallRequirements.bat`
This project is licensed under the Apache License 2.0 - the same license as the original JoyCaption project. See the LICENSE file for details.
- Original JoyCaption: Created by fpgaminer - A groundbreaking open-source VLM for image captioning
- Built upon: The gradio-app implementation from the original repository
- Model Components:
- Meta-Llama-3.1-8B language model
- Google SigLIP vision encoder (siglip-so400m-patch14-384)
- Community: Thanks to the AI/ML community for supporting open-source VLM development
- Windows Optimization: Independently developed for improved Windows compatibility
Contributions are welcome! Please feel free to submit a Pull Request. Areas where contributions would be particularly helpful:
- Performance optimizations
- Additional quantization options
- UI improvements
- Documentation enhancements
Note: This is an independent optimization of the original JoyCaption project, focused specifically on improving Windows compatibility and ease of installation.