
calebheinzman/LMMs-for-Histopathology


LLM-Based Medical Image Classification

A research project for evaluating Large Language Models (LLMs) on medical image classification tasks across three datasets: AFB (Acid-Fast Bacilli), BreakHis (Breast Cancer Histopathological), and CRIC (Cervical Cancer Screening).

Overview

This project provides a unified framework for:

  • Running LLM-based classification experiments on medical imaging datasets
  • Fine-tuning models for improved performance
  • Evaluating results with multiple metrics
  • Supporting both binary and multiclass classification tasks
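
As an illustration of the "multiple metrics" item above, the following is a minimal sketch of the standard binary metrics (accuracy, precision, recall, F1). The function name `binary_metrics` and the label encoding (1 = positive) are assumptions made for this example; the repository's own evaluation code may compute a different or larger set of metrics.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Zero denominators fall back to 0.0 so that a run with no positive predictions still produces a complete report.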

Datasets Supported

  1. AFB (Acid-Fast Bacilli): Binary classification for tuberculosis detection
  2. BreakHis: Breast cancer histopathological image classification (binary and 8-class)
  3. CRIC: Cervical cancer screening (binary and 6-class classification)

Installation

  1. Clone the repository:
git clone <repository-url>
cd LMMs-for-Histopathology
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables by creating a .env file:
# Data paths
BASE_DATA_DIR=path/to/your/data

# API Keys
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
WANDB_API_KEY=your_wandb_api_key
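
Before launching an experiment it can help to check that the .env file is complete. The sketch below parses KEY=VALUE text with the standard library only and reports missing entries. The helper names `parse_env_text` and `missing_keys` are hypothetical, and treating all four variables as required is an assumption; the project may load its environment differently.

```python
# Variable names taken from the .env example above; treating every one of
# them as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = ("BASE_DATA_DIR", "OPENAI_API_KEY",
                 "GEMINI_API_KEY", "WANDB_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

A typical use would be `missing_keys(parse_env_text(open(".env").read()))`, aborting with a clear message if the returned list is not empty.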

Usage

Basic Experiment

Run a single experiment:

python run_dataset.py <dataset> <classification_type> [prompt_size]

Examples:

python run_dataset.py afb binary medium
python run_dataset.py breakhis multi full
python run_dataset.py cric binary short
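
The command shape above can be modelled with `argparse`. This is only a sketch of an interface that accepts the documented invocations; the actual parser in run_dataset.py may differ. The choice lists are inferred from the examples in this README (`afb`, `breakhis`, `cric`; `binary`, `multi`; `short`, `medium`, `full`) and are assumptions, as is the name `build_parser`.

```python
import argparse


def build_parser():
    """Build a parser matching the usage line shown in this README."""
    parser = argparse.ArgumentParser(prog="run_dataset.py")
    parser.add_argument("dataset", choices=["afb", "breakhis", "cric"])
    parser.add_argument("classification_type", choices=["binary", "multi"])
    # prompt_size is optional in the usage line, hence nargs="?".
    parser.add_argument("prompt_size", nargs="?", default=None,
                        choices=["short", "medium", "full"])
    parser.add_argument("--run_all_sizes", action="store_true",
                        help="run every prompt size instead of a single one")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.dataset, args.classification_type, args.prompt_size)
```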

Comprehensive Evaluation

Run experiments across all prompt sizes and variants:

python run_dataset.py afb binary --run_all_sizes
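
For comparison, a sweep over prompt sizes can also be approximated by hand by issuing one single-experiment command per size. The sketch below only builds the command lines. The size list is inferred from the examples in this README, the function name `commands_for_all_sizes` is hypothetical, and the built-in flag may additionally cover prompt variants that this loop does not.

```python
import sys

# Prompt sizes seen in the README examples; the real list may be longer.
PROMPT_SIZES = ("short", "medium", "full")


def commands_for_all_sizes(dataset, classification_type, sizes=PROMPT_SIZES):
    """Return one run_dataset.py command (as an argv list) per prompt size."""
    return [
        [sys.executable, "run_dataset.py", dataset, classification_type, size]
        for size in sizes
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.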

Fine-tuning

Fine-tune a model for better performance:

python finetuning.py <dataset> <classification_type> --prompt_size <size>

Example:

python finetuning.py afb binary --prompt_size full --num_positive 25 --num_negative 25
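
The `--num_positive` and `--num_negative` options suggest that the fine-tuning set is drawn with a fixed number of examples per class. A minimal sketch of such balanced sampling follows. The record layout (dicts with a `label` key, 1 = positive, 0 = negative), the seed handling, and the name `sample_balanced` are assumptions for illustration, not the behavior of finetuning.py.

```python
import random


def sample_balanced(records, num_positive, num_negative, seed=0):
    """Draw a fixed number of positive and negative records, then shuffle.

    Hypothetical helper: `records` is assumed to be a list of dicts with a
    "label" key holding 1 (positive) or 0 (negative).
    """
    rng = random.Random(seed)  # a fixed seed keeps the selection reproducible
    positives = [r for r in records if r["label"] == 1]
    negatives = [r for r in records if r["label"] == 0]
    if len(positives) < num_positive or len(negatives) < num_negative:
        raise ValueError("not enough records to satisfy the requested counts")

    chosen = rng.sample(positives, num_positive)
    chosen += rng.sample(negatives, num_negative)
    rng.shuffle(chosen)  # avoid feeding all positives before all negatives
    return chosen
```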

Visualization

Generate sample visualizations:

python visualize_samples.py

Configuration

Dataset Configuration

Edit config.py to modify:

  • Dataset paths and file locations
  • Sample sizes per class
  • Image file extensions
  • Prompt file mappings
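
To make the four kinds of settings above concrete, here is one way a per-dataset entry could be laid out. The class `DatasetConfig`, its field names, the default extensions, and the example values are all hypothetical; the real config.py may be organized quite differently.

```python
import os
from dataclasses import dataclass, field
from pathlib import Path

# BASE_DATA_DIR comes from the .env file; "data" is an assumed fallback.
BASE_DATA_DIR = Path(os.environ.get("BASE_DATA_DIR", "data"))


@dataclass(frozen=True)
class DatasetConfig:
    """Illustrative container for the settings listed above."""
    name: str
    data_dir: Path                                 # dataset path
    samples_per_class: int                         # sample size per class
    image_extensions: tuple = (".png", ".jpg")     # accepted image suffixes
    prompt_files: dict = field(default_factory=dict)  # prompt size -> file


def is_image(config, filename):
    """Check a file name against the configured extensions, ignoring case."""
    return Path(filename).suffix.lower() in config.image_extensions


# Example entry with made-up values.
AFB = DatasetConfig(
    name="afb",
    data_dir=BASE_DATA_DIR / "afb",
    samples_per_class=50,
    prompt_files={"full": "prompts/afb/full.txt"},
)
```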

Model Configuration

Edit run_dataset.py to modify:

  • Default model list
  • Model provider configurations
  • Fine-tuned model IDs (add your own here)

Project Structure

├── config.py                 # Central configuration
├── run_dataset.py           # Main experiment runner
├── finetuning.py            # Model fine-tuning
├── visualize_samples.py     # Sample visualization
├── test_finetuning.py       # Testing utilities
├── utils/
│   ├── data_utils.py        # Data processing utilities
│   ├── dataset_utils.py     # Dataset-specific functions
│   ├── experiment_utils.py  # Experiment execution
│   ├── llm_utils.py         # LLM interaction utilities
│   ├── retry_wrapper.py     # Retry logic for API calls
│   └── wandb_utils.py       # Weights & Biases integration
├── prompts/
│   ├── afb/                 # AFB prompts and variants
│   ├── breakhis/            # BreakHis prompts and variants
│   └── cric/                # CRIC prompts and variants
├── requirements.txt         # Python dependencies
└── README.md               # This file
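
The tree lists utils/retry_wrapper.py as retry logic for API calls. Its contents are not shown here, so the following is a generic sketch of the technique, a decorator that retries with exponential backoff, and not the repository's implementation. The name `retry` and all of its parameters are assumptions.

```python
import functools
import time


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,),
          sleep=time.sleep):
    """Retry the wrapped call with exponential backoff.

    The delay doubles after each failure: base_delay, 2 * base_delay, ...
    `sleep` is injectable so the wait can be replaced in tests.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In practice the `exceptions` tuple would be narrowed to the transient errors of the API client in use (rate limits, timeouts), so that genuine bugs are not retried.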

About

Large Multimodal Models for Histopathology: Research project for pathology classification using AFB, BreakHis, and CRIC datasets
