Pneumonia Detection from Chest X-Ray Images

A machine learning system for binary classification of chest X-ray images to detect pneumonia. This project includes comprehensive EDA, model training with hyperparameter tuning, FastAPI deployment, and Docker containerization.

DISCLAIMER: This model is for educational purposes only and is NOT suitable for clinical diagnosis. Medical imaging interpretation requires trained professionals. This model should not be used as a substitute for professional medical advice, diagnosis, or treatment.

Problem Description

Pneumonia is an infection that inflames the air sacs in one or both lungs. Early and accurate detection is crucial for effective treatment. This project aims to assist in the classification of chest X-ray images into two categories:

  • NORMAL: Healthy lung X-ray with no signs of pneumonia
  • PNEUMONIA: X-ray showing signs of pneumonia infection

The system achieves this through deep learning models trained on the Kaggle Chest X-Ray Pneumonia dataset.

Dataset

Source: Kaggle Chest X-Ray Images (Pneumonia)

Structure:

dataset/
├── train/
│   ├── NORMAL/     (~1,341 images)
│   └── PNEUMONIA/  (~3,875 images)
├── val/
│   ├── NORMAL/     (~8 images)
│   └── PNEUMONIA/  (~8 images)
└── test/
    ├── NORMAL/     (~234 images)
    └── PNEUMONIA/  (~390 images)

Note: The dataset exhibits class imbalance (~1:3 NORMAL:PNEUMONIA ratio), which is addressed using weighted BCE loss during training.
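
Because the class labels come from the directory names, each split can be loaded directly with torchvision's ImageFolder. A minimal sketch, assuming the dataset/ layout above (transforms omitted):

# Sketch: loading a split where subdirectory names (NORMAL, PNEUMONIA) serve as labels.
from torchvision import datasets

train_ds = datasets.ImageFolder("dataset/train")
print(train_ds.class_to_idx)   # {'NORMAL': 0, 'PNEUMONIA': 1}
print(len(train_ds))           # 5,216 training images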

Exploratory Data Analysis

This section presents a comprehensive analysis of the chest X-ray dataset, including distribution statistics, image properties, and quality assessment. All visualizations were generated in the Jupyter notebook (notebooks/notebook.ipynb).

Key Findings Summary

| Metric | Value |
|---|---|
| Total Images | 5,856 |
| Training Set | 5,216 images |
| Validation Set | 16 images |
| Test Set | 624 images |
| Class Ratio (NORMAL:PNEUMONIA) | 1:2.89 |
| Corrupted Files | 0 |
| Image Format | JPEG (stored as RGB, grayscale content) |

Dataset Structure

The dataset is divided into three splits: training, validation, and test. The bar chart below shows the distribution of images across splits and classes.

Dataset Structure

Observations:

  • The training set contains the majority of images (5,216 total)
  • Training split: 1,341 NORMAL + 3,875 PNEUMONIA images
  • Validation set is notably small (only 16 images total)
  • Test set provides 624 images for final evaluation (234 NORMAL + 390 PNEUMONIA)

Class Distribution

Understanding class imbalance is critical for training robust models. The visualizations below show the significant imbalance between NORMAL and PNEUMONIA classes.

Class Distribution

Key Statistics:

  • Class Imbalance Ratio: 1:2.89 (NORMAL:PNEUMONIA)
  • NORMAL class: ~26% of dataset
  • PNEUMONIA class: ~74% of dataset

Mitigation Strategy: Weighted Binary Cross-Entropy loss is used during training to compensate for class imbalance. The weight is calculated as: weight = num_normal / num_pneumonia
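
A minimal sketch of this weighting with PyTorch's BCEWithLogitsLoss, using the training-split counts reported above (the project's exact loss setup lives in src/train/trainer.py and may differ in detail):

# Sketch: weighted BCE loss; PNEUMONIA is assumed to be the positive class (label 1).
import torch
import torch.nn as nn

num_normal, num_pneumonia = 1341, 3875                    # training-split counts
pos_weight = torch.tensor([num_normal / num_pneumonia])   # ~0.35, down-weights the majority class

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                                # raw model outputs for a batch of 8
labels = torch.randint(0, 2, (8, 1)).float()              # 0 = NORMAL, 1 = PNEUMONIA
loss = criterion(logits, labels)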


Image Dimensions

Chest X-ray images in the dataset have variable dimensions. Understanding the distribution helps inform preprocessing decisions.

Image Dimensions

Analysis:

  • Width Range: Varies significantly across images
  • Height Range: Shows similar variability
  • Aspect Ratios: Most images are roughly square, but variations exist
  • Scatter Plot: Shows the relationship between width and height, revealing common dimension clusters

Preprocessing Decision: All images are resized to 224x224 pixels for model input, maintaining consistency with ImageNet pretrained models (MobileNetV2, ResNet18).


Color Mode Analysis

Medical X-ray images are inherently grayscale, but storage format may vary.

Color Mode

Findings:

  • Images are stored as RGB format (3 channels)
  • Actual content is grayscale (all three channels contain identical values)
  • No conversion needed during preprocessing; PyTorch models expect 3-channel input
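
A quick way to verify this is to compare the three channels of a loaded image directly. A minimal sketch (the file name is illustrative):

# Sketch: confirm that the three RGB channels of an X-ray are identical.
import numpy as np
from PIL import Image

img = np.array(Image.open("dataset/train/NORMAL/IM-0115-0001.jpeg").convert("RGB"))  # illustrative path
r, g, b = img[..., 0], img[..., 1], img[..., 2]
print("channels identical:", np.array_equal(r, g) and np.array_equal(g, b))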

Sample X-Ray Images

Visual inspection of sample images from both classes helps understand the classification task.

Sample X-Rays

Visual Observations:

  • Top Row (NORMAL): Clear lung fields with visible rib structures, no opacity
  • Bottom Row (PNEUMONIA): Visible infiltrates, consolidation, or opacity in lung regions
  • Image quality and positioning vary across samples
  • Some pneumonia cases show subtle signs, demonstrating task difficulty

Outlier Detection

Box plots reveal the distribution of image dimensions and identify potential outliers.

Outlier Box Plots

Analysis:

  • Width and height distributions show the presence of outliers (images significantly larger or smaller than typical)
  • Most images fall within a reasonable range for medical imaging
  • Outliers are handled gracefully by the resizing transform during preprocessing

Data Quality Assessment

| Check | Status | Notes |
|---|---|---|
| Corrupted Files | None Found | All images load successfully |
| Missing Labels | None | Directory structure provides labels |
| Duplicate Images | Not Detected | Based on file names |
| Format Consistency | Consistent | All JPEG format |

Preprocessing Pipeline

Based on the EDA findings, the following preprocessing steps are applied:

  1. Resize: All images to 224x224 pixels
  2. Normalization: ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  3. Data Augmentation (training only):
    • Random horizontal flip
    • Random rotation (up to 10 degrees)
    • Random affine transforms
    • Color jitter for brightness/contrast variation
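
A minimal torchvision sketch of this pipeline; the values marked as illustrative are assumptions, and the project's actual transforms live in src/train/transforms.py:

# Sketch: training vs. evaluation transforms following the EDA decisions above.
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),   # illustrative magnitudes
    transforms.ColorJitter(brightness=0.1, contrast=0.1),         # illustrative magnitudes
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

eval_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])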

Training Results

This section shows the model training process and final performance metrics. Based on the EDA findings (class imbalance, image properties), we applied weighted BCE loss and transfer learning.

Training Curves

The training curves below show the model's learning progress over epochs, including loss convergence and metric improvements.

Training Curves

Observations:

  • Training and validation loss decrease steadily, indicating good convergence
  • F1 score improves consistently across epochs
  • AUC-ROC shows strong discriminative ability from early epochs
  • No significant overfitting observed (validation metrics track training closely)

Final Test Metrics

After training, the best model (MobileNetV2) was evaluated on the held-out test set of 624 images.

Results Metrics

| Metric | Value |
|---|---|
| Test Accuracy | 90.38% |
| Test Precision | 89.66% |
| Test Recall | 95.64% |
| Test F1 Score | 92.56% |
| Test AUC-ROC | 96.19% |

Key Insights:

  • High Recall (95.64%): The model correctly identifies 95.64% of pneumonia cases, which is critical for medical screening where missing a positive case is costly
  • Balanced Precision (89.66%): While prioritizing recall, the model maintains good precision to minimize false alarms
  • Strong AUC (96.19%): Excellent discriminative ability across all classification thresholds
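
For reference, these metrics can be recomputed from saved test-set predictions with scikit-learn. A minimal sketch with placeholder values (y_true are ground-truth labels, y_prob the model's sigmoid outputs):

# Sketch: computing the reported metrics from labels and predicted probabilities.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true = [0, 0, 1, 1, 1]                   # placeholder labels (0 = NORMAL, 1 = PNEUMONIA)
y_prob = [0.10, 0.62, 0.81, 0.43, 0.95]    # placeholder predicted probabilities
y_pred = [int(p >= 0.5) for p in y_prob]   # default 0.5 threshold

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auc-roc  :", roc_auc_score(y_true, y_prob))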

Model Architecture Comparison

We trained and compared three model architectures to understand the trade-offs between model complexity, training time, and performance.

Model Comparison

| Model | Parameters | Size | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|
| SimpleCNN | 390K | 1.49 MB | 84.29% | 88.42% | 86.15% | 87.27% | 91.28% |
| MobileNetV2 | 2.2M | 8.6 MB | 90.38% | 89.66% | 95.64% | 92.56% | 96.19% |
| ResNet18 | 11.2M | 42.67 MB | 80.77% | 76.68% | 99.49% | 86.61% | 95.69% |

Key Findings:

  • MobileNetV2 provides the best balance of performance, model size, and generalization
  • SimpleCNN achieves competitive results with only 390K parameters (smallest model)
  • ResNet18 achieves the highest recall (99.49%) but the lowest precision (76.68%), indicating a bias toward over-predicting the PNEUMONIA class rather than genuinely better generalization

Model Explainability (Grad-CAM)

Understanding where the model "looks" when making predictions is crucial for medical AI applications. We use Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize which regions of the X-ray images most influence the model's decisions.

Why Explainability Matters

  1. Trust: Clinicians need to verify the model focuses on medically relevant areas
  2. Debugging: Identify if the model learns spurious correlations (e.g., image artifacts)
  3. Education: Help understand what distinguishes normal from pneumonia X-rays
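
A minimal Grad-CAM sketch using forward/backward hooks on a torchvision MobileNetV2; the choice of model.features[-1] as the target layer and the random input tensor are illustrative assumptions, and the project's own implementation lives in the notebook:

# Sketch: Grad-CAM heatmap for a single image with a binary-output MobileNetV2.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.mobilenet_v2(weights=None)
model.classifier[1] = torch.nn.Linear(model.last_channel, 1)   # binary head, as in this project
model.eval()

store = {}
layer = model.features[-1]                                     # last convolutional block
layer.register_forward_hook(lambda m, i, o: store.update(act=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed X-ray tensor
logit = model(x)
model.zero_grad()
logit.sum().backward()                     # gradient of the PNEUMONIA logit

weights = store["grad"].mean(dim=(2, 3), keepdim=True)         # global-average-pooled gradients
cam = F.relu((weights * store["act"]).sum(dim=1))              # weighted sum of feature maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalized [0, 1] heatmap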

NORMAL Class Attention

Grad-CAM visualizations for correctly classified NORMAL (healthy) X-rays:

Grad-CAM Normal

Observations:

  • Model attention is distributed across both lung fields
  • Focus areas include the clear lung parenchyma regions
  • Absence of concentrated hot spots in any particular region

PNEUMONIA Class Attention

Grad-CAM visualizations for correctly classified PNEUMONIA X-rays:

Grad-CAM Pneumonia

Observations:

  • Model attention concentrates on areas with infiltrates or consolidation
  • Hot spots align with visible opacity in the lung fields
  • Attention patterns differ from NORMAL cases, focusing on abnormal regions

Comparison Grid

Side-by-side comparison of NORMAL vs PNEUMONIA attention patterns:

Grad-CAM Comparison

Key Findings:

  • The model learns to focus on medically relevant regions (lung fields, not image edges or artifacts)
  • PNEUMONIA cases show concentrated attention on opacity/consolidation areas
  • NORMAL cases show more diffuse attention across clear lung tissue
  • Attention patterns are consistent with clinical interpretation of chest X-rays
  • No evidence of the model relying on spurious features (e.g., text labels, imaging equipment artifacts)

Installation

Prerequisites

  • Python 3.13+
  • uv package manager

Setup

  1. Clone the repository:
git clone https://github.com/yourusername/pneumonia-detection.git
cd pneumonia-detection
  2. Install dependencies:
uv sync
  3. Download the dataset from Kaggle and extract it to the dataset/ directory.

Usage

Exploratory Data Analysis

Open and run the Jupyter notebook for comprehensive EDA:

uv run jupyter notebook notebooks/notebook.ipynb

The notebook includes:

  • Dataset structure visualization
  • Class distribution analysis
  • Image dimension statistics
  • Color mode verification
  • Sample image grid
  • Outlier detection
  • Grad-CAM explainability (after training)

Model Training

Train a single model:

# SimpleCNN baseline
uv run python -m src.train.train --model SimpleCNN --epochs 20 --lr 0.001

# MobileNetV2 transfer learning
uv run python -m src.train.train --model MobileNetV2 --epochs 20 --lr 0.001

# ResNet18 transfer learning
uv run python -m src.train.train --model ResNet18 --epochs 20 --lr 0.0005

View all training options:

uv run python -m src.train.train --help

Hyperparameter Sweep

Run automated hyperparameter tuning:

# Run all 10 default configurations
uv run python -m src.train.sweep --epochs 10

# Run subset of configurations
uv run python -m src.train.sweep --runs 3 --epochs 5

Results are saved to models/experiments.csv and the best model is copied to models/best_model.pth.

Training CLI Options

| Argument | Default | Description |
|---|---|---|
| --model | SimpleCNN | Model architecture (SimpleCNN, MobileNetV2, ResNet18) |
| --lr | 0.001 | Learning rate |
| --weight-decay | 0.0001 | L2 regularization |
| --dropout | 0.3 | Dropout probability |
| --epochs | 20 | Training epochs |
| --batch-size | 32 | Batch size |
| --image-size | 224 | Input image size |
| --augmentation | light | Augmentation tier (none, light, heavy) |
| --data-dir | dataset | Dataset directory |
| --output-dir | models | Output directory |

API Service

Running Locally

Start the FastAPI server:

uv run uvicorn src.predict.predict:app --host 0.0.0.0 --port 8000

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | API information |
| /healthz | GET | Health check |
| /predict | POST | Classify chest X-ray image |
| /docs | GET | OpenAPI documentation |

Example Requests

Health check:

curl http://localhost:8000/healthz

Expected response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "mps"
}

Image prediction (NORMAL):

curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" \
  http://localhost:8000/predict

Expected response:

{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 303.64,
  "model_path": "models/best_model.pth"
}

Image prediction (PNEUMONIA):

curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" \
  http://localhost:8000/predict

Expected response:

{
  "prediction": "PNEUMONIA",
  "confidence": 0.9972,
  "inference_time_ms": 12.59,
  "model_path": "models/best_model.pth"
}
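
The same endpoint can be called from Python; a minimal sketch using the requests library (file path taken from the curl example above):

# Sketch: calling the /predict endpoint from Python.
import requests

url = "http://localhost:8000/predict"
image_path = "dataset/test/PNEUMONIA/person100_bacteria_475.jpeg"

with open(image_path, "rb") as f:
    response = requests.post(url, files={"file": f})

response.raise_for_status()
print(response.json())   # e.g. {"prediction": "PNEUMONIA", "confidence": ..., ...}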

Docker Deployment

Building the Image

docker build -t pneumonia-api .

Running the Container

docker run -p 8000:8000 pneumonia-api

With custom model path:

docker run -p 8000:8000 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=models/custom_model.pth \
  pneumonia-api

Environment Variables

| Variable | Default | Description |
|---|---|---|
| MODEL_PATH | models/best_model.pth | Path to model checkpoint |
| IMAGE_SIZE | 224 | Input image size |
| THRESHOLD | 0.5 | Classification threshold |
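
For reference, a minimal sketch of how THRESHOLD typically turns a sigmoid probability into the returned label; this is an assumption about the inference logic (the actual code lives in src/predict/inference.py):

# Sketch: converting a model logit into a NORMAL/PNEUMONIA label using the threshold.
import os
import torch

threshold = float(os.getenv("THRESHOLD", "0.5"))

logit = torch.tensor([1.7])                 # stand-in for the model's raw output
prob = torch.sigmoid(logit).item()          # assumed probability of the PNEUMONIA class
label = "PNEUMONIA" if prob >= threshold else "NORMAL"
print(label, round(prob, 4))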

Testing the Containerized API

# Health check
curl http://localhost:8000/healthz

Expected response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu"
}

# Prediction (NORMAL image)
curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" http://localhost:8000/predict

Expected response:

{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 61.1,
  "model_path": "models/best_model.pth"
}

# Prediction (PNEUMONIA image)
curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" http://localhost:8000/predict

Expected response:

{
  "prediction": "PNEUMONIA",
  "confidence": 0.9854,
  "inference_time_ms": 10.31,
  "model_path": "models/best_model.pth"
}

Project Structure

pneumonia-detection/
├── dataset/                 # Dataset directory (not in repo)
│   ├── train/
│   ├── val/
│   └── test/
├── models/                  # Saved models and checkpoints
├── notebooks/
│   └── notebook.ipynb       # EDA and training notebook
├── screenshots/             # Visualizations from notebook
│   ├── 01_dataset_structure.png
│   ├── 02_class_distribution.png
│   ├── 03_image_dimensions.png
│   ├── 04_color_mode.png
│   ├── 05_sample_xrays.png
│   ├── 06_outlier_boxplots.png
│   ├── 07_training_curves.png
│   ├── 08_results_metrics.png
│   ├── 09_model_comparison.png
│   ├── 10_gradcam_normal.png
│   ├── 11_gradcam_pneumonia.png
│   └── 12_gradcam_comparison.png
├── scripts/
│   └── extract_notebook_images.py  # Extract images from notebook
├── src/
│   ├── train/               # Training modules
│   │   ├── config.py        # Configuration and reproducibility
│   │   ├── dataset.py       # Dataset and dataloaders
│   │   ├── models.py        # Model definitions
│   │   ├── transforms.py    # Image transforms
│   │   ├── metrics.py       # Evaluation metrics
│   │   ├── trainer.py       # Training loop
│   │   ├── train.py         # Training CLI
│   │   └── sweep.py         # Hyperparameter sweep
│   └── predict/             # Inference modules
│       ├── inference.py     # Inference logic
│       └── predict.py       # FastAPI application
├── Dockerfile               # Container definition
├── pyproject.toml           # Project dependencies
├── uv.lock                  # Locked dependencies
└── README.md

Model Architectures

SimpleCNN (Baseline)

Custom 4-layer CNN:

  • Conv2d(3→32) → BatchNorm → ReLU → MaxPool
  • Conv2d(32→64) → BatchNorm → ReLU → MaxPool
  • Conv2d(64→128) → BatchNorm → ReLU → MaxPool
  • Conv2d(128→256) → BatchNorm → ReLU → AdaptiveAvgPool
  • Flatten → Dropout → Linear(256→1)

Parameters: ~390K | Size: ~1.5 MB
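
A PyTorch sketch of this baseline following the layer list above (kernel sizes and padding are illustrative choices not specified in the summary):

# Sketch: the SimpleCNN baseline described above.
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, dropout: float = 0.3):
        super().__init__()

        def block(c_in, c_out, pool=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                      nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
            if pool:
                layers.append(nn.MaxPool2d(2))
            return nn.Sequential(*layers)

        self.features = nn.Sequential(
            block(3, 32), block(32, 64), block(64, 128),
            block(128, 256, pool=False), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Dropout(dropout), nn.Linear(256, 1))

    def forward(self, x):
        return self.classifier(self.features(x))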

MobileNetV2 (Transfer Learning)

  • Pretrained on ImageNet
  • Modified classifier: Dropout → Linear(1280→1)

Parameters: ~2.2M | Size: ~8.6 MB

ResNet18 (Transfer Learning)

  • Pretrained on ImageNet
  • Modified fc: Dropout → Linear(512→1)

Parameters: ~11.2M | Size: ~42.7 MB
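
A sketch of how both pretrained backbones can be adapted for single-logit binary output with torchvision (the dropout value mirrors the training default; the project's definitions live in src/train/models.py and may differ in detail):

# Sketch: replacing the ImageNet classification heads with a single-logit binary classifier.
import torch.nn as nn
from torchvision import models

dropout = 0.3

mobilenet = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
mobilenet.classifier = nn.Sequential(nn.Dropout(dropout), nn.Linear(mobilenet.last_channel, 1))

resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet.fc = nn.Sequential(nn.Dropout(dropout), nn.Linear(resnet.fc.in_features, 1))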

Reproducibility

All experiments use:

  • SEED = 42 for NumPy, PyTorch, CUDA, and MPS
  • Deterministic algorithms enabled
  • All code, comments, and documentation in English
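
A minimal sketch of the seeding routine (the project's version lives in src/train/config.py and may differ in detail):

# Sketch: seeding Python, NumPy, and PyTorch (CPU/CUDA/MPS) and enabling determinism.
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                   # also seeds CUDA and MPS generators
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    torch.use_deterministic_algorithms(True, warn_only=True)

set_seed(42)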

To reproduce results:

# Set seed explicitly
uv run python -m src.train.train --seed 42 --model MobileNetV2

Technical Stack

  • Python: 3.13
  • Deep Learning: PyTorch
  • Hardware: MPS (Apple Silicon) / CUDA / CPU auto-detection
  • Package Manager: uv
  • API Framework: FastAPI
  • Containerization: Docker

License

This project is for educational purposes. The dataset is from Kaggle and subject to its license terms.

Acknowledgments


Remember: This tool is for educational and research purposes only. Always consult qualified medical professionals for health-related decisions.
