A machine learning system for binary classification of chest X-ray images to detect pneumonia. This project includes comprehensive EDA, model training with hyperparameter tuning, FastAPI deployment, and Docker containerization.
DISCLAIMER: This model is for educational purposes only and is NOT suitable for clinical diagnosis. Medical imaging interpretation requires trained professionals. This model should not be used as a substitute for professional medical advice, diagnosis, or treatment.
Pneumonia is an infection that inflames the air sacs in one or both lungs. Early and accurate detection is crucial for effective treatment. This project aims to assist in the classification of chest X-ray images into two categories:
- NORMAL: Healthy lung X-ray with no signs of pneumonia
- PNEUMONIA: X-ray showing signs of pneumonia infection
The system achieves this through deep learning models trained on the Kaggle Chest X-Ray Pneumonia dataset.
Source: Kaggle Chest X-Ray Images (Pneumonia)
Structure:
```
dataset/
├── train/
│   ├── NORMAL/      (~1,341 images)
│   └── PNEUMONIA/   (~3,875 images)
├── val/
│   ├── NORMAL/      (~8 images)
│   └── PNEUMONIA/   (~8 images)
└── test/
    ├── NORMAL/      (~234 images)
    └── PNEUMONIA/   (~390 images)
```
Note: The dataset exhibits class imbalance (~1:3 NORMAL:PNEUMONIA ratio), which is addressed using weighted BCE loss during training.
This section presents a comprehensive analysis of the chest X-ray dataset, including distribution statistics, image properties, and quality assessment. All visualizations were generated in the Jupyter notebook (notebooks/notebook.ipynb).
| Metric | Value |
|---|---|
| Total Images | 5,856 |
| Training Set | 5,216 images |
| Validation Set | 16 images |
| Test Set | 624 images |
| Class Ratio (NORMAL:PNEUMONIA) | 1:2.89 |
| Corrupted Files | 0 |
| Image Format | JPEG (stored as RGB, grayscale content) |
The dataset is divided into three splits: training, validation, and test. The bar chart below shows the distribution of images across splits and classes.
Observations:
- The training set contains the majority of images (5,216 total)
- Training split: 1,341 NORMAL + 3,875 PNEUMONIA images
- Validation set is notably small (only 16 images total)
- Test set provides 624 images for final evaluation (234 NORMAL + 390 PNEUMONIA)
Understanding class imbalance is critical for training robust models. The visualizations below show the significant imbalance between NORMAL and PNEUMONIA classes.
Key Statistics:
- Class Imbalance Ratio: 1:2.89 (NORMAL:PNEUMONIA)
- NORMAL class: ~26% of dataset
- PNEUMONIA class: ~74% of dataset
Mitigation Strategy: Weighted Binary Cross-Entropy loss is used during training to compensate for class imbalance. The weight is calculated as `weight = num_normal / num_pneumonia`.
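A minimal sketch of how this weight might be wired into PyTorch's `BCEWithLogitsLoss` (the class counts come from the training split described above; the batch values are illustrative, and the project's actual loss setup lives in `src/train/trainer.py`):

```python
import torch

# Class counts from the training split (see the EDA section)
num_normal, num_pneumonia = 1341, 3875

# Weight applied to the positive (PNEUMONIA) class. It is < 1 because
# PNEUMONIA is the majority class, so its loss contribution is down-weighted.
pos_weight = torch.tensor([num_normal / num_pneumonia])

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Illustrative batch of 4 logits; label 1 = PNEUMONIA
logits = torch.tensor([[2.0], [-1.0], [0.5], [-0.3]])
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
loss = criterion(logits, labels)
```

Because `BCEWithLogitsLoss` takes raw logits, no sigmoid is applied before the loss; the sigmoid happens only at inference time.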
Chest X-ray images in the dataset have variable dimensions. Understanding the distribution helps inform preprocessing decisions.
Analysis:
- Width Range: Varies significantly across images
- Height Range: Shows similar variability
- Aspect Ratios: Most images are roughly square, but variations exist
- Scatter Plot: Shows the relationship between width and height, revealing common dimension clusters
Preprocessing Decision: All images are resized to 224x224 pixels for model input, maintaining consistency with ImageNet pretrained models (MobileNetV2, ResNet18).
Medical X-ray images are inherently grayscale, but storage format may vary.
Findings:
- Images are stored as RGB format (3 channels)
- Actual content is grayscale (all three channels contain identical values)
- No conversion needed during preprocessing; PyTorch models expect 3-channel input
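A quick way to reproduce the "RGB container, grayscale content" finding is to compare the three channels of a loaded image (the helper name and example path are illustrative):

```python
import numpy as np
from PIL import Image


def is_grayscale_rgb(path: str) -> bool:
    """Return True if all three channels of an RGB image are identical."""
    arr = np.asarray(Image.open(path).convert("RGB"))
    return bool(
        np.array_equal(arr[..., 0], arr[..., 1])
        and np.array_equal(arr[..., 1], arr[..., 2])
    )

# e.g. is_grayscale_rgb("dataset/test/NORMAL/IM-0001-0001.jpeg")
```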
Visual inspection of sample images from both classes helps understand the classification task.
Visual Observations:
- Top Row (NORMAL): Clear lung fields with visible rib structures, no opacity
- Bottom Row (PNEUMONIA): Visible infiltrates, consolidation, or opacity in lung regions
- Image quality and positioning vary across samples
- Some pneumonia cases show subtle signs, demonstrating task difficulty
Box plots reveal the distribution of image dimensions and identify potential outliers.
Analysis:
- Width and height distributions show the presence of outliers (images significantly larger or smaller than typical)
- Most images fall within a reasonable range for medical imaging
- Outliers are handled gracefully by the resizing transform during preprocessing
| Check | Status | Notes |
|---|---|---|
| Corrupted Files | None Found | All images load successfully |
| Missing Labels | None | Directory structure provides labels |
| Duplicate Images | Not Detected | Based on file names |
| Format Consistency | Consistent | All JPEG format |
Based on the EDA findings, the following preprocessing steps are applied:
- Resize: All images to 224x224 pixels
- Normalization: ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- Data Augmentation (training only):
- Random horizontal flip
- Random rotation (up to 10 degrees)
- Random affine transforms
- Color jitter for brightness/contrast variation
This section shows the model training process and final performance metrics. Based on the EDA findings (class imbalance, image properties), we applied weighted BCE loss and transfer learning.
The training curves below show the model's learning progress over epochs, including loss convergence and metric improvements.
Observations:
- Training and validation loss decrease steadily, indicating good convergence
- F1 score improves consistently across epochs
- AUC-ROC shows strong discriminative ability from early epochs
- No significant overfitting observed (validation metrics track training closely)
After training, the best model (MobileNetV2) was evaluated on the held-out test set of 624 images.
| Metric | Value |
|---|---|
| Test Accuracy | 90.38% |
| Test Precision | 89.66% |
| Test Recall | 95.64% |
| Test F1 Score | 92.56% |
| Test AUC-ROC | 96.19% |
Key Insights:
- High Recall (95.64%): The model correctly identifies 95.64% of pneumonia cases, which is critical for medical screening where missing a positive case is costly
- Balanced Precision (89.66%): While prioritizing recall, the model maintains good precision to minimize false alarms
- Strong AUC (96.19%): Excellent discriminative ability across all classification thresholds
We trained and compared three model architectures to understand the trade-offs between model complexity, training time, and performance.
| Model | Parameters | Size | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|
| SimpleCNN | 390K | 1.49 MB | 84.29% | 88.42% | 86.15% | 87.27% | 91.28% |
| MobileNetV2 | 2.2M | 8.6 MB | 90.38% | 89.66% | 95.64% | 92.56% | 96.19% |
| ResNet18 | 11.2M | 42.67 MB | 80.77% | 76.68% | 99.49% | 86.61% | 95.69% |
Key Findings:
- MobileNetV2 provides the best balance of performance, model size, and generalization
- SimpleCNN achieves competitive results with only 390K parameters (smallest model)
- ResNet18 has the highest recall (99.49%) but the lowest precision (76.68%), suggesting it over-predicts the PNEUMONIA class rather than generalizing better
Understanding where the model "looks" when making predictions is crucial for medical AI applications. We use Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize which regions of the X-ray images most influence the model's decisions.
Explainability matters here for three reasons:
- Trust: Clinicians need to verify the model focuses on medically relevant areas
- Debugging: Identify whether the model has learned spurious correlations (e.g., image artifacts)
- Education: Help understand what distinguishes normal from pneumonia X-rays
Grad-CAM visualizations for correctly classified NORMAL (healthy) X-rays:
Observations:
- Model attention is distributed across both lung fields
- Focus areas include the clear lung parenchyma regions
- Absence of concentrated hot spots in any particular region
Grad-CAM visualizations for correctly classified PNEUMONIA X-rays:
Observations:
- Model attention concentrates on areas with infiltrates or consolidation
- Hot spots align with visible opacity in the lung fields
- Attention patterns differ from NORMAL cases, focusing on abnormal regions
Side-by-side comparison of NORMAL vs PNEUMONIA attention patterns:
Key Findings:
- The model learns to focus on medically relevant regions (lung fields, not image edges or artifacts)
- PNEUMONIA cases show concentrated attention on opacity/consolidation areas
- NORMAL cases show more diffuse attention across clear lung tissue
- Attention patterns are consistent with clinical interpretation of chest X-rays
- No evidence of the model relying on spurious features (e.g., text labels, imaging equipment artifacts)
- Python 3.13+
- uv package manager
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/pneumonia-detection.git
   cd pneumonia-detection
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

3. Download the dataset from Kaggle and extract it to the `dataset/` directory.
Open and run the Jupyter notebook for comprehensive EDA:

```bash
uv run jupyter notebook notebooks/notebook.ipynb
```

The notebook includes:
- Dataset structure visualization
- Class distribution analysis
- Image dimension statistics
- Color mode verification
- Sample image grid
- Outlier detection
- Grad-CAM explainability (after training)
Train a single model:

```bash
# SimpleCNN baseline
uv run python -m src.train.train --model SimpleCNN --epochs 20 --lr 0.001

# MobileNetV2 transfer learning
uv run python -m src.train.train --model MobileNetV2 --epochs 20 --lr 0.001

# ResNet18 transfer learning
uv run python -m src.train.train --model ResNet18 --epochs 20 --lr 0.0005
```

View all training options:

```bash
uv run python -m src.train.train --help
```

Run automated hyperparameter tuning:

```bash
# Run all 10 default configurations
uv run python -m src.train.sweep --epochs 10

# Run subset of configurations
uv run python -m src.train.sweep --runs 3 --epochs 5
```

Results are saved to `models/experiments.csv` and the best model is copied to `models/best_model.pth`.
| Argument | Default | Description |
|---|---|---|
| `--model` | SimpleCNN | Model architecture (SimpleCNN, MobileNetV2, ResNet18) |
| `--lr` | 0.001 | Learning rate |
| `--weight-decay` | 0.0001 | L2 regularization |
| `--dropout` | 0.3 | Dropout probability |
| `--epochs` | 20 | Training epochs |
| `--batch-size` | 32 | Batch size |
| `--image-size` | 224 | Input image size |
| `--augmentation` | light | Augmentation tier (none, light, heavy) |
| `--data-dir` | dataset | Dataset directory |
| `--output-dir` | models | Output directory |
Start the FastAPI server:

```bash
uv run uvicorn src.predict.predict:app --host 0.0.0.0 --port 8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API information |
| `/healthz` | GET | Health check |
| `/predict` | POST | Classify chest X-ray image |
| `/docs` | GET | OpenAPI documentation |
Health check:

```bash
curl http://localhost:8000/healthz
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "mps"
}
```

Image prediction (NORMAL):

```bash
curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" \
  http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 303.64,
  "model_path": "models/best_model.pth"
}
```

Image prediction (PNEUMONIA):

```bash
curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" \
  http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "PNEUMONIA",
  "confidence": 0.9972,
  "inference_time_ms": 12.59,
  "model_path": "models/best_model.pth"
}
```

Build the Docker image:

```bash
docker build -t pneumonia-api .
```

Run the container:

```bash
docker run -p 8000:8000 pneumonia-api
```

With custom model path:

```bash
docker run -p 8000:8000 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=models/custom_model.pth \
  pneumonia-api
```

| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | models/best_model.pth | Path to model checkpoint |
| `IMAGE_SIZE` | 224 | Input image size |
| `THRESHOLD` | 0.5 | Classification threshold |
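`THRESHOLD` is compared against the sigmoid of the model's single output logit. A pure-Python sketch of that decision rule (the function name is illustrative; the service's exact confidence formula may differ):

```python
import math


def classify(logit: float, threshold: float = 0.5) -> tuple[str, float]:
    """Map a raw model logit to a predicted label and a confidence score."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid: probability of PNEUMONIA
    if p >= threshold:
        return "PNEUMONIA", p
    return "NORMAL", 1.0 - p  # confidence in the NORMAL call


label, conf = classify(2.0)  # sigmoid(2.0) ~ 0.88 -> "PNEUMONIA"
```

Raising the threshold trades recall for precision: the same logit of 2.0 would be classified NORMAL at `threshold=0.95`.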
```bash
# Health check
curl http://localhost:8000/healthz
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu"
}
```

```bash
# Prediction (NORMAL image)
curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 61.1,
  "model_path": "models/best_model.pth"
}
```

```bash
# Prediction (PNEUMONIA image)
curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "PNEUMONIA",
  "confidence": 0.9854,
  "inference_time_ms": 10.31,
  "model_path": "models/best_model.pth"
}
```

```
pneumonia-detection/
├── dataset/                        # Dataset directory (not in repo)
│   ├── train/
│   ├── val/
│   └── test/
├── models/                         # Saved models and checkpoints
├── notebooks/
│   └── notebook.ipynb              # EDA and training notebook
├── screenshots/                    # Visualizations from notebook
│   ├── 01_dataset_structure.png
│   ├── 02_class_distribution.png
│   ├── 03_image_dimensions.png
│   ├── 04_color_mode.png
│   ├── 05_sample_xrays.png
│   ├── 06_outlier_boxplots.png
│   ├── 07_training_curves.png
│   ├── 08_results_metrics.png
│   ├── 09_model_comparison.png
│   ├── 10_gradcam_normal.png
│   ├── 11_gradcam_pneumonia.png
│   └── 12_gradcam_comparison.png
├── scripts/
│   └── extract_notebook_images.py  # Extract images from notebook
├── src/
│   ├── train/                      # Training modules
│   │   ├── config.py               # Configuration and reproducibility
│   │   ├── dataset.py              # Dataset and dataloaders
│   │   ├── models.py               # Model definitions
│   │   ├── transforms.py           # Image transforms
│   │   ├── metrics.py              # Evaluation metrics
│   │   ├── trainer.py              # Training loop
│   │   ├── train.py                # Training CLI
│   │   └── sweep.py                # Hyperparameter sweep
│   └── predict/                    # Inference modules
│       ├── inference.py            # Inference logic
│       └── predict.py              # FastAPI application
├── Dockerfile                      # Container definition
├── pyproject.toml                  # Project dependencies
├── uv.lock                         # Locked dependencies
└── README.md
```
Custom 4-layer CNN:
- Conv2d(3→32) → BatchNorm → ReLU → MaxPool
- Conv2d(32→64) → BatchNorm → ReLU → MaxPool
- Conv2d(64→128) → BatchNorm → ReLU → MaxPool
- Conv2d(128→256) → BatchNorm → ReLU → AdaptiveAvgPool
- Flatten → Dropout → Linear(256→1)
Parameters: ~390K | Size: ~1.5 MB
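The layer list above translates roughly to the following module. This is a sketch consistent with the stated parameter count, not a copy of `src/train/models.py`:

```python
import torch.nn as nn


def block(c_in: int, c_out: int, pool: bool = True) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU, optionally followed by 2x2 max pooling."""
    layers = [
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class SimpleCNN(nn.Module):
    def __init__(self, dropout: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            block(3, 32),
            block(32, 64),
            block(64, 128),
            block(128, 256, pool=False),  # last block ends in adaptive pooling
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(256, 1),  # single logit for binary classification
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

With 3x3 kernels throughout, this comes to roughly 390K trainable parameters, matching the figure above.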
MobileNetV2 (transfer learning):
- Pretrained on ImageNet
- Modified classifier: Dropout → Linear(1280→1)
Parameters: ~2.2M | Size: ~8.6 MB
ResNet18 (transfer learning):
- Pretrained on ImageNet
- Modified fc: Dropout → Linear(512→1)
Parameters: ~11.2M | Size: ~42.7 MB
All experiments use:
- SEED = 42 for NumPy, PyTorch, CUDA, and MPS
- Deterministic algorithms enabled
- All code, comments, and documentation in English
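A sketch of the kind of seeding helper `src/train/config.py` would contain (the function name is illustrative):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every RNG the training loop touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds all devices, including CUDA and MPS
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Prefer deterministic kernels; warn instead of raising where none exists
    torch.use_deterministic_algorithms(True, warn_only=True)


set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)  # identical to a: same seed, same draw
```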
To reproduce results:

```bash
# Set seed explicitly
uv run python -m src.train.train --seed 42 --model MobileNetV2
```

- Python: 3.13
- Deep Learning: PyTorch
- Hardware: MPS (Apple Silicon) / CUDA / CPU auto-detection
- Package Manager: uv
- API Framework: FastAPI
- Containerization: Docker
This project is for educational purposes. The dataset is from Kaggle and subject to its license terms.
- Dataset: Chest X-Ray Images (Pneumonia) by Paul Mooney
- Pretrained models from torchvision
Remember: This tool is for educational and research purposes only. Always consult qualified medical professionals for health-related decisions.