A machine learning system for binary classification of chest X-ray images to detect pneumonia. This project includes comprehensive EDA, model training with hyperparameter tuning, FastAPI deployment, and Docker containerization.
DISCLAIMER: This model is for educational purposes only and is NOT suitable for clinical diagnosis. Medical imaging interpretation requires trained professionals. This model should not be used as a substitute for professional medical advice, diagnosis, or treatment.
Pneumonia is an infection that inflames the air sacs in one or both lungs. Early and accurate detection is crucial for effective treatment. This project aims to assist in the classification of chest X-ray images into two categories:
- NORMAL: Healthy lung X-ray with no signs of pneumonia
- PNEUMONIA: X-ray showing signs of pneumonia infection
The system achieves this through deep learning models trained on the Kaggle Chest X-Ray Pneumonia dataset.
Source: Kaggle Chest X-Ray Images (Pneumonia)
Structure:
```
dataset/
├── train/
│   ├── NORMAL/      (~1,341 images)
│   └── PNEUMONIA/   (~3,875 images)
├── val/
│   ├── NORMAL/      (~8 images)
│   └── PNEUMONIA/   (~8 images)
└── test/
    ├── NORMAL/      (~234 images)
    └── PNEUMONIA/   (~390 images)
```
Note: The dataset exhibits class imbalance (~1:3 NORMAL:PNEUMONIA ratio), which is addressed using weighted BCE loss during training.
This section presents a comprehensive analysis of the chest X-ray dataset, including distribution statistics, image properties, and quality assessment. All visualizations were generated in the Jupyter notebook (notebooks/notebook.ipynb).
| Metric | Value |
|---|---|
| Total Images | 5,856 |
| Training Set | 5,216 images |
| Validation Set | 16 images |
| Test Set | 624 images |
| Class Ratio (NORMAL:PNEUMONIA) | 1:2.89 |
| Corrupted Files | 0 |
| Image Format | JPEG (stored as RGB, grayscale content) |
The dataset is divided into three splits: training, validation, and test. The bar chart below shows the distribution of images across splits and classes.
Observations:
- The training set contains the majority of images (5,216 total)
- Training split: 1,341 NORMAL + 3,875 PNEUMONIA images
- Validation set is notably small (only 16 images total)
- Test set provides 624 images for final evaluation (234 NORMAL + 390 PNEUMONIA)
Understanding class imbalance is critical for training robust models. The visualizations below show the significant imbalance between NORMAL and PNEUMONIA classes.
Key Statistics:
- Class Imbalance Ratio: 1:2.89 (NORMAL:PNEUMONIA)
- NORMAL class: ~26% of dataset
- PNEUMONIA class: ~74% of dataset
Mitigation Strategy: Weighted Binary Cross-Entropy loss is used during training to compensate for class imbalance. The weight is calculated as `weight = num_normal / num_pneumonia`.
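A minimal sketch of how this weight might be wired into PyTorch's `BCEWithLogitsLoss` (the class counts come from the training split described above; the batch values are illustrative, and the project's actual loss setup lives in `src/train/trainer.py`):

```python
import torch

# Class counts from the training split (see the EDA section)
num_normal, num_pneumonia = 1341, 3875

# Weight applied to the positive (PNEUMONIA) class. It is < 1 because
# PNEUMONIA is the majority class, so its loss contribution is down-weighted.
pos_weight = torch.tensor([num_normal / num_pneumonia])

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Illustrative batch of 4 logits; label 1 = PNEUMONIA
logits = torch.tensor([[2.0], [-1.0], [0.5], [-0.3]])
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
loss = criterion(logits, labels)
```

Because `BCEWithLogitsLoss` takes raw logits, no sigmoid is applied before the loss; the sigmoid happens only at inference time.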
Chest X-ray images in the dataset have variable dimensions. Understanding the distribution helps inform preprocessing decisions.
Analysis:
- Width Range: Varies significantly across images
- Height Range: Shows similar variability
- Aspect Ratios: Most images are roughly square, but variations exist
- Scatter Plot: Shows the relationship between width and height, revealing common dimension clusters
Preprocessing Decision: All images are resized to 224x224 pixels for model input, maintaining consistency with ImageNet pretrained models (MobileNetV2, ResNet18).
Medical X-ray images are inherently grayscale, but storage format may vary.
Findings:
- Images are stored as RGB format (3 channels)
- Actual content is grayscale (all three channels contain identical values)
- No conversion needed during preprocessing; PyTorch models expect 3-channel input
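A quick way to reproduce the "RGB container, grayscale content" finding is to compare the three channels of a loaded image (the helper name and example path are illustrative):

```python
import numpy as np
from PIL import Image


def is_grayscale_rgb(path: str) -> bool:
    """Return True if all three channels of an RGB image are identical."""
    arr = np.asarray(Image.open(path).convert("RGB"))
    return bool(
        np.array_equal(arr[..., 0], arr[..., 1])
        and np.array_equal(arr[..., 1], arr[..., 2])
    )

# e.g. is_grayscale_rgb("dataset/test/NORMAL/IM-0001-0001.jpeg")
```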
Visual inspection of sample images from both classes helps understand the classification task.
Visual Observations:
- Top Row (NORMAL): Clear lung fields with visible rib structures, no opacity
- Bottom Row (PNEUMONIA): Visible infiltrates, consolidation, or opacity in lung regions
- Image quality and positioning vary across samples
- Some pneumonia cases show subtle signs, demonstrating task difficulty
Box plots reveal the distribution of image dimensions and identify potential outliers.
Analysis:
- Width and height distributions show the presence of outliers (images significantly larger or smaller than typical)
- Most images fall within a reasonable range for medical imaging
- Outliers are handled gracefully by the resizing transform during preprocessing
| Check | Status | Notes |
|---|---|---|
| Corrupted Files | None Found | All images load successfully |
| Missing Labels | None | Directory structure provides labels |
| Duplicate Images | Not Detected | Based on file names |
| Format Consistency | Consistent | All JPEG format |
Based on the EDA findings, the following preprocessing steps are applied:
- Resize: All images to 224x224 pixels
- Normalization: ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- Data Augmentation (training only):
- Random horizontal flip
- Random rotation (up to 10 degrees)
- Random affine transforms
- Color jitter for brightness/contrast variation
This section shows the model training process and final performance metrics. Based on the EDA findings (class imbalance, image properties), we applied weighted BCE loss and transfer learning.
The training curves below show the model's learning progress over epochs, including loss convergence and metric improvements.
Observations:
- Training and validation loss decrease steadily, indicating good convergence
- F1 score improves consistently across epochs
- AUC-ROC shows strong discriminative ability from early epochs
- No significant overfitting observed (validation metrics track training closely)
After training, the best model (MobileNetV2) was evaluated on the held-out test set of 624 images.
| Metric | Value |
|---|---|
| Test Accuracy | 90.38% |
| Test Precision | 89.66% |
| Test Recall | 95.64% |
| Test F1 Score | 92.56% |
| Test AUC-ROC | 96.19% |
Key Insights:
- High Recall (95.64%): The model correctly identifies 95.64% of pneumonia cases, which is critical for medical screening where missing a positive case is costly
- Balanced Precision (89.66%): While prioritizing recall, the model maintains good precision to minimize false alarms
- Strong AUC (96.19%): Excellent discriminative ability across all classification thresholds
We trained and compared three model architectures to understand the trade-offs between model complexity, training time, and performance.
| Model | Parameters | Size | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|
| SimpleCNN | 390K | 1.49 MB | 84.29% | 88.42% | 86.15% | 87.27% | 91.28% |
| MobileNetV2 | 2.2M | 8.6 MB | 90.38% | 89.66% | 95.64% | 92.56% | 96.19% |
| ResNet18 | 11.2M | 42.67 MB | 80.77% | 76.68% | 99.49% | 86.61% | 95.69% |
Key Findings:
- MobileNetV2 provides the best balance of performance, model size, and generalization
- SimpleCNN achieves competitive results with only 390K parameters (smallest model)
- ResNet18 has the highest recall (99.49%) but the lowest precision (76.68%), suggesting it over-predicts the PNEUMONIA class rather than generalizing better
Understanding where the model "looks" when making predictions is crucial for medical AI applications. We use Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize which regions of the X-ray images most influence the model's decisions.
Explainability matters here for three reasons:
- Trust: Clinicians need to verify the model focuses on medically relevant areas
- Debugging: Identify whether the model has learned spurious correlations (e.g., image artifacts)
- Education: Help understand what distinguishes normal from pneumonia X-rays
Grad-CAM visualizations for correctly classified NORMAL (healthy) X-rays:
Observations:
- Model attention is distributed across both lung fields
- Focus areas include the clear lung parenchyma regions
- Absence of concentrated hot spots in any particular region
Grad-CAM visualizations for correctly classified PNEUMONIA X-rays:
Observations:
- Model attention concentrates on areas with infiltrates or consolidation
- Hot spots align with visible opacity in the lung fields
- Attention patterns differ from NORMAL cases, focusing on abnormal regions
Side-by-side comparison of NORMAL vs PNEUMONIA attention patterns:
Key Findings:
- The model learns to focus on medically relevant regions (lung fields, not image edges or artifacts)
- PNEUMONIA cases show concentrated attention on opacity/consolidation areas
- NORMAL cases show more diffuse attention across clear lung tissue
- Attention patterns are consistent with clinical interpretation of chest X-rays
- No evidence of the model relying on spurious features (e.g., text labels, imaging equipment artifacts)
- Python 3.13+
- uv package manager
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/pneumonia-detection.git
   cd pneumonia-detection
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

3. Download the dataset from Kaggle and extract it to the `dataset/` directory.
Open and run the Jupyter notebook for comprehensive EDA:

```bash
uv run jupyter notebook notebooks/notebook.ipynb
```

The notebook includes:
- Dataset structure visualization
- Class distribution analysis
- Image dimension statistics
- Color mode verification
- Sample image grid
- Outlier detection
- Grad-CAM explainability (after training)
Train a single model:

```bash
# SimpleCNN baseline
uv run python -m src.train.train --model SimpleCNN --epochs 20 --lr 0.001

# MobileNetV2 transfer learning
uv run python -m src.train.train --model MobileNetV2 --epochs 20 --lr 0.001

# ResNet18 transfer learning
uv run python -m src.train.train --model ResNet18 --epochs 20 --lr 0.0005
```

View all training options:

```bash
uv run python -m src.train.train --help
```

Run automated hyperparameter tuning:

```bash
# Run all 10 default configurations
uv run python -m src.train.sweep --epochs 10

# Run subset of configurations
uv run python -m src.train.sweep --runs 3 --epochs 5
```

Results are saved to `models/experiments.csv` and the best model is copied to `models/best_model.pth`.
| Argument | Default | Description |
|---|---|---|
| `--model` | SimpleCNN | Model architecture (SimpleCNN, MobileNetV2, ResNet18) |
| `--lr` | 0.001 | Learning rate |
| `--weight-decay` | 0.0001 | L2 regularization |
| `--dropout` | 0.3 | Dropout probability |
| `--epochs` | 20 | Training epochs |
| `--batch-size` | 32 | Batch size |
| `--image-size` | 224 | Input image size |
| `--augmentation` | light | Augmentation tier (none, light, heavy) |
| `--data-dir` | dataset | Dataset directory |
| `--output-dir` | models | Output directory |
Start the FastAPI server:

```bash
uv run uvicorn src.predict.predict:app --host 0.0.0.0 --port 8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API information |
| `/healthz` | GET | Health check |
| `/predict` | POST | Classify chest X-ray image |
| `/docs` | GET | OpenAPI documentation |
Health check:

```bash
curl http://localhost:8000/healthz
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "mps"
}
```

Image prediction (NORMAL):

```bash
curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" \
  http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 303.64,
  "model_path": "models/best_model.pth"
}
```

Image prediction (PNEUMONIA):

```bash
curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" \
  http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "PNEUMONIA",
  "confidence": 0.9972,
  "inference_time_ms": 12.59,
  "model_path": "models/best_model.pth"
}
```

Build the Docker image:

```bash
docker build -t pneumonia-api .
```

Run the container:

```bash
docker run -p 8000:8000 pneumonia-api
```

With custom model path:

```bash
docker run -p 8000:8000 \
  -v /path/to/models:/app/models \
  -e MODEL_PATH=models/custom_model.pth \
  pneumonia-api
```

| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | models/best_model.pth | Path to model checkpoint |
| `IMAGE_SIZE` | 224 | Input image size |
| `THRESHOLD` | 0.5 | Classification threshold |
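`THRESHOLD` is compared against the sigmoid of the model's single output logit. A pure-Python sketch of that decision rule (the function name is illustrative; the service's exact confidence formula may differ):

```python
import math


def classify(logit: float, threshold: float = 0.5) -> tuple[str, float]:
    """Map a raw model logit to a predicted label and a confidence score."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid: probability of PNEUMONIA
    if p >= threshold:
        return "PNEUMONIA", p
    return "NORMAL", 1.0 - p  # confidence in the NORMAL call


label, conf = classify(2.0)  # sigmoid(2.0) ~ 0.88 -> "PNEUMONIA"
```

Raising the threshold trades recall for precision: the same logit of 2.0 would be classified NORMAL at `threshold=0.95`.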
```bash
# Health check
curl http://localhost:8000/healthz
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cpu"
}
```

```bash
# Prediction (NORMAL image)
curl -X POST -F "file=@dataset/test/NORMAL/IM-0001-0001.jpeg" http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "NORMAL",
  "confidence": 0.996,
  "inference_time_ms": 61.1,
  "model_path": "models/best_model.pth"
}
```

```bash
# Prediction (PNEUMONIA image)
curl -X POST -F "file=@dataset/test/PNEUMONIA/person100_bacteria_475.jpeg" http://localhost:8000/predict
```

Expected response:

```json
{
  "prediction": "PNEUMONIA",
  "confidence": 0.9854,
  "inference_time_ms": 10.31,
  "model_path": "models/best_model.pth"
}
```

```
pneumonia-detection/
├── dataset/                        # Dataset directory (not in repo)
│   ├── train/
│   ├── val/
│   └── test/
├── models/                         # Saved models and checkpoints
├── notebooks/
│   └── notebook.ipynb              # EDA and training notebook
├── screenshots/                    # Visualizations from notebook
│   ├── 01_dataset_structure.png
│   ├── 02_class_distribution.png
│   ├── 03_image_dimensions.png
│   ├── 04_color_mode.png
│   ├── 05_sample_xrays.png
│   ├── 06_outlier_boxplots.png
│   ├── 07_training_curves.png
│   ├── 08_results_metrics.png
│   ├── 09_model_comparison.png
│   ├── 10_gradcam_normal.png
│   ├── 11_gradcam_pneumonia.png
│   └── 12_gradcam_comparison.png
├── scripts/
│   └── extract_notebook_images.py  # Extract images from notebook
├── src/
│   ├── train/                      # Training modules
│   │   ├── config.py               # Configuration and reproducibility
│   │   ├── dataset.py              # Dataset and dataloaders
│   │   ├── models.py               # Model definitions
│   │   ├── transforms.py           # Image transforms
│   │   ├── metrics.py              # Evaluation metrics
│   │   ├── trainer.py              # Training loop
│   │   ├── train.py                # Training CLI
│   │   └── sweep.py                # Hyperparameter sweep
│   └── predict/                    # Inference modules
│       ├── inference.py            # Inference logic
│       └── predict.py              # FastAPI application
├── Dockerfile                      # Container definition
├── pyproject.toml                  # Project dependencies
├── uv.lock                         # Locked dependencies
└── README.md
```
Custom 4-layer CNN:
- Conv2d(3→32) → BatchNorm → ReLU → MaxPool
- Conv2d(32→64) → BatchNorm → ReLU → MaxPool
- Conv2d(64→128) → BatchNorm → ReLU → MaxPool
- Conv2d(128→256) → BatchNorm → ReLU → AdaptiveAvgPool
- Flatten → Dropout → Linear(256→1)
Parameters: ~390K | Size: ~1.5 MB
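The layer list above translates roughly to the following module. This is a sketch consistent with the stated parameter count, not a copy of `src/train/models.py`:

```python
import torch.nn as nn


def block(c_in: int, c_out: int, pool: bool = True) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU, optionally followed by 2x2 max pooling."""
    layers = [
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class SimpleCNN(nn.Module):
    def __init__(self, dropout: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            block(3, 32),
            block(32, 64),
            block(64, 128),
            block(128, 256, pool=False),  # last block ends in adaptive pooling
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(256, 1),  # single logit for binary classification
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

With 3x3 kernels throughout, this comes to roughly 390K trainable parameters, matching the figure above.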
MobileNetV2 (transfer learning):
- Pretrained on ImageNet
- Modified classifier: Dropout → Linear(1280→1)
Parameters: ~2.2M | Size: ~8.6 MB
ResNet18 (transfer learning):
- Pretrained on ImageNet
- Modified fc: Dropout → Linear(512→1)
Parameters: ~11.2M | Size: ~42.7 MB
All experiments use:
- SEED = 42 for NumPy, PyTorch, CUDA, and MPS
- Deterministic algorithms enabled
- All code, comments, and documentation in English
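A sketch of the kind of seeding helper `src/train/config.py` would contain (the function name is illustrative):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every RNG the training loop touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds all devices, including CUDA and MPS
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Prefer deterministic kernels; warn instead of raising where none exists
    torch.use_deterministic_algorithms(True, warn_only=True)


set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)  # identical to a: same seed, same draw
```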
To reproduce results:

```bash
# Set seed explicitly
uv run python -m src.train.train --seed 42 --model MobileNetV2
```

- Python: 3.13
- Deep Learning: PyTorch
- Hardware: MPS (Apple Silicon) / CUDA / CPU auto-detection
- Package Manager: uv
- API Framework: FastAPI
- Containerization: Docker
This project is for educational purposes. The dataset is from Kaggle and subject to its license terms.
- Dataset: Chest X-Ray Images (Pneumonia) by Paul Mooney
- Pretrained models from torchvision
Remember: This tool is for educational and research purposes only. Always consult qualified medical professionals for health-related decisions.