A research project for evaluating Large Language Models (LLMs) on medical image classification tasks across three datasets: AFB (Acid-Fast Bacilli), BreakHis (Breast Cancer Histopathological), and CRIC (Cervical Cancer Screening).
This project provides a unified framework for:
- Running LLM-based classification experiments on medical imaging datasets
- Fine-tuning models for improved performance
- Evaluating results comprehensively with multiple metrics
- Supporting both binary and multiclass classification tasks
Supported datasets:
- AFB (Acid-Fast Bacilli): Binary classification for tuberculosis detection
- BreakHis: Breast cancer histopathological image classification (binary and 8-class)
- CRIC: Cervical cancer screening (binary and 6-class classification)
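
The dataset and task combinations listed above can be captured in a small lookup table. The sketch below is illustrative only: the names `DATASET_TASKS` and `num_classes` are hypothetical and not taken from the repository, the `binary`/`multi` keys mirror the command-line arguments used later in this README, and AFB is assumed to have no multiclass variant because none is listed.

```python
# Hypothetical lookup of supported tasks; class counts come from the list above.
DATASET_TASKS = {
    "afb": {"binary": 2},
    "breakhis": {"binary": 2, "multi": 8},
    "cric": {"binary": 2, "multi": 6},
}


def num_classes(dataset: str, classification_type: str) -> int:
    """Return the number of classes, or raise ValueError for an unsupported pair."""
    tasks = DATASET_TASKS.get(dataset.lower())
    if tasks is None:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    if classification_type not in tasks:
        raise ValueError(
            f"{dataset!r} has no {classification_type!r} task; "
            f"choose from {sorted(tasks)}"
        )
    return tasks[classification_type]
```

Validating the pair up front like this would let a run fail fast, before any API call is made, if an unsupported combination is requested.
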
- Clone the repository:

git clone <repository-url>
cd LLM-TB-Clean

- Install dependencies:

pip install -r requirements.txt

- Set up environment variables by creating a .env file:

# Data paths
BASE_DATA_DIR=path/to/your/data
# API Keys
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
WANDB_API_KEY=your_wandb_api_key

Run a single experiment:

python run_dataset.py <dataset> <classification_type> [prompt_size]

Examples:

python run_dataset.py afb binary medium
python run_dataset.py breakhis multi full
python run_dataset.py cric binary short

Run experiments across all sizes and variants:

python run_dataset.py afb binary --run_all_sizes

Fine-tune a model for better performance:

python finetuning.py <dataset> <classification_type> --prompt_size <size>

Example:

python finetuning.py afb binary --prompt_size full --num_positive 25 --num_negative 25

Generate sample visualizations:

python visualize_samples.py

Edit config.py to modify:
- Dataset paths and file locations
- Sample sizes per class
- Image file extensions
- Prompt file mappings
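
As a rough illustration of what these settings might look like, the sketch below lays out a `config.py`-style module. Every name and value in it (`DATASET_DIRS`, `SAMPLES_PER_CLASS`, `IMAGE_EXTENSIONS`, `PROMPT_FILES`, the one-subdirectory-per-dataset layout, and the prompt file names) is a placeholder assumed for illustration; only `BASE_DATA_DIR`, the `prompts/<dataset>/` directories, and the `short`/`medium`/`full` prompt sizes come from this README. Check the actual `config.py` for the real names.

```python
import os
from pathlib import Path

# BASE_DATA_DIR matches the .env variable; this assumes it has already been
# loaded into the process environment. The fallback string is a placeholder.
BASE_DATA_DIR = Path(os.environ.get("BASE_DATA_DIR", "path/to/your/data"))

# Hypothetical dataset locations (assumed one subdirectory per dataset).
DATASET_DIRS = {name: BASE_DATA_DIR / name for name in ("afb", "breakhis", "cric")}

# Placeholder values, not the repository's defaults.
SAMPLES_PER_CLASS = 25
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

# Hypothetical mapping from prompt size to a file inside prompts/<dataset>/.
PROMPT_FILES = {"short": "short.txt", "medium": "medium.txt", "full": "full.txt"}


def prompt_path(dataset: str, prompt_size: str) -> Path:
    """Build the path to a prompt file for the given dataset and size."""
    return Path("prompts") / dataset / PROMPT_FILES[prompt_size]


def is_image(path: Path) -> bool:
    """Check a file against the configured image extensions (case-insensitive)."""
    return path.suffix.lower() in IMAGE_EXTENSIONS
```
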
Edit run_dataset.py to modify:
- Default model list
- Model provider configurations
- Fine-tuned model IDs
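
One way such a model list could be organized is sketched below. The structure and names (`MODELS`, `iter_models`) are hypothetical, the model identifiers are deliberately left as placeholders, and the two providers are inferred only from the `OPENAI_API_KEY` and `GEMINI_API_KEY` variables in the .env example; consult `run_dataset.py` for the actual layout.

```python
# Hypothetical layout; replace the placeholder strings with real model IDs.
MODELS = {
    "openai": {
        "api_key_env": "OPENAI_API_KEY",
        "models": ["<openai-model-id>", "<your-fine-tuned-model-id>"],
    },
    "gemini": {
        "api_key_env": "GEMINI_API_KEY",
        "models": ["<gemini-model-id>"],
    },
}


def iter_models(providers=None):
    """Yield (provider, model_id) pairs, optionally restricted to some providers."""
    for provider, cfg in MODELS.items():
        if providers is None or provider in providers:
            for model_id in cfg["models"]:
                yield provider, model_id
```

Keeping fine-tuned IDs in the same list as the base models would let one experiment loop compare both without special-casing either.
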
Project structure:
├── config.py # Central configuration
├── run_dataset.py # Main experiment runner
├── finetuning.py # Model fine-tuning
├── visualize_samples.py # Sample visualization
├── test_finetuning.py # Testing utilities
├── utils/
│ ├── data_utils.py # Data processing utilities
│ ├── dataset_utils.py # Dataset-specific functions
│ ├── experiment_utils.py # Experiment execution
│ ├── llm_utils.py # LLM interaction utilities
│ ├── retry_wrapper.py # Retry logic for API calls
│ └── wandb_utils.py # Weights & Biases integration
├── prompts/
│ ├── afb/ # AFB prompts and variants
│ ├── breakhis/ # BreakHis prompts and variants
│ └── cric/ # CRIC prompts and variants
├── requirements.txt # Python dependencies
└── README.md # This file
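
The tree lists `utils/retry_wrapper.py` as holding retry logic for API calls. A common way to implement that idea is exponential backoff, sketched below. This is a generic illustration and not the repository's code: the `retry` decorator and all of its parameters are hypothetical.

```python
import functools
import time


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting; in normal use it defaults to `time.sleep`.
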