This project utilizes Deep Learning and Transfer Learning techniques to classify white blood cells from microscopic images. The goal is to accurately distinguish between normal cells and those affected by Acute Lymphoblastic Leukemia (ALL) using the C-NMC dataset.
Acute lymphoblastic leukemia (ALL) is the most common type of childhood cancer and accounts for approximately 25% of pediatric cancers. The cells in the C-NMC dataset have been segmented from microscopic images and are representative of real-world images. Distinguishing immature leukemic blasts from normal cells under the microscope is challenging because the two are morphologically similar; the ground-truth labels were therefore annotated by an expert oncologist.
We employ DenseNet121, a densely connected convolutional network architecture where each layer connects to every other layer in a feed-forward fashion. This mitigates the vanishing-gradient problem, strengthens feature propagation, and substantially reduces the number of parameters.
| Stage | Component | Description |
|---|---|---|
| 1. Base | DenseNet121 | Pre-trained on ImageNet with Global Average Pooling. Used for feature extraction (Top layers removed, weights frozen). |
| 2. Norm | Batch Normalization | Stabilizes learning by normalizing inputs (axis=-1, momentum=0.99, epsilon=0.001). |
| 3. Dense | Fully Connected (256) | Custom classification head with ReLU activation and L2 Regularization. |
| 4. Dropout | Dropout (0.40) | Randomly sets 40% of neurons to 0 to prevent overfitting. |
| 5. Output | Dense (2) | Softmax layer for binary classification probabilities (ALL vs. HEM). |
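The stages in the table can be sketched in Keras roughly as follows. This is a minimal sketch, not the notebook's actual code: the `build_model` name, the 224x224x3 input size, and the categorical cross-entropy loss (which presumes one-hot labels) are assumptions, while the layer settings, the L2 factor of 1e-4, and the Adam learning rate of 0.0001 are the values stated in this README.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen DenseNet121 base plus the small classification head from the table."""
    # Stage 1: pre-trained base without its top layers; pooling="avg" applies
    # Global Average Pooling to the final feature map.
    base = tf.keras.applications.DenseNet121(
        include_top=False,
        weights=weights,
        input_shape=input_shape,  # assumed input size
        pooling="avg",
    )
    base.trainable = False  # feature extraction only: base weights stay frozen

    model = tf.keras.Sequential([
        base,
        # Stage 2: batch normalization with the parameters listed in the table.
        layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001),
        # Stage 3: fully connected layer with ReLU and L2 weight regularization.
        layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),
        # Stage 4: dropout at a rate of 0.40.
        layers.Dropout(0.40),
        # Stage 5: two-way softmax output (ALL vs. HEM).
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",  # assumption: labels are one-hot encoded
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check.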
Unlike traditional sequential networks, DenseNet connects every layer to every subsequent layer within a dense block. This improves the flow of information and gradients throughout the network.
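To make the effect of dense connectivity concrete, the sketch below tracks how the channel count grows as each layer appends its feature maps to everything that came before. The numbers used (a 64-channel stem, growth rate 32, blocks of 6, 12, 24, and 16 layers, and 0.5 compression in the transition layers) are the standard DenseNet-121 configuration rather than anything stated in this README, and `densenet_channels` is a hypothetical helper name.

```python
def densenet_channels(block_layers=(6, 12, 24, 16), growth_rate=32,
                      stem_channels=64, compression=0.5):
    """Return the channel count at the end of each dense block."""
    channels = stem_channels
    trace = []
    for i, n_layers in enumerate(block_layers):
        # Every layer in a block concatenates `growth_rate` new feature maps
        # onto the input seen by all later layers in that block.
        channels += n_layers * growth_rate
        trace.append(channels)
        # A transition layer follows every block except the last and
        # compresses the channel count.
        if i < len(block_layers) - 1:
            channels = int(channels * compression)
    return trace


print(densenet_channels())  # [256, 512, 1024, 1024]
```

The final value, 1024, is the length of the feature vector that Global Average Pooling hands to the classification head.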
To combat overfitting in medical imaging, we apply L2 regularization to the dense layer weights.
- L2 (Ridge): Prevents large weights by penalizing the squared magnitude of coefficients.
  $$Loss_{L2} = \lambda \sum w_i^2$$ (where λ = 1e-4)
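As a small worked example of the penalty term, the sketch below evaluates it for a toy weight vector. The function names and the toy weights are illustrative only; in the actual model the framework adds this term to the training loss automatically.

```python
def l2_penalty(weights, lam=1e-4):
    """Return lam * sum(w_i ** 2) for a flat iterable of weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=1e-4):
    """Add the L2 penalty to a data loss such as cross-entropy."""
    return data_loss + l2_penalty(weights, lam)


toy_weights = [1.0, -2.0, 3.0]  # illustrative values, not from the model
print(l2_penalty(toy_weights))  # 1e-4 * (1 + 4 + 9), about 0.0014
```

Because the penalty grows with the square of each weight, a single large weight costs far more than several small ones, which is what pushes the dense layer toward small, evenly spread weights.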
We utilize the Adam optimizer (Adaptive Moment Estimation) starting with a learning rate of 0.0001. It computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients.
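A single Adam update for one scalar parameter can be sketched as below. The learning rate is the 0.0001 quoted above; `beta1`, `beta2`, and `eps` are the defaults from the original Adam paper and are an assumption here, since the README does not say which values the notebook uses.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Two parameters with very different gradient magnitudes, first step each.
small, _, _ = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
large, _, _ = adam_step(theta=1.0, grad=50.0, m=0.0, v=0.0, t=1)
print(small, large)
```

Both parameters move by roughly the learning rate even though their gradients differ by a factor of 100, which is the per-parameter adaptivity described above.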
A custom callback loop monitors the validation metrics to implement:
- Learning Rate Scheduling: Decays the learning rate by a factor of 0.5 if accuracy plateaus (with a patience of 1).
- Early Stopping: Halts training after 3 learning rate adjustments without improvement to save computational resources.
- Model Checkpointing: Restores the weights from the epoch with the highest validation performance.
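The interaction of these three behaviors can be simulated in plain Python on a list of per-epoch validation accuracies. This is one plausible reading of the description, not the notebook's callback: it assumes validation accuracy is the monitored metric, that an improvement resets the adjustment counter, and that training stops at the next plateau once 3 adjustments have been made without improvement. The `run_schedule` name and the sample accuracies are hypothetical.

```python
def run_schedule(val_accuracies, lr=1e-4, factor=0.5, patience=1, stop_patience=3):
    """Simulate LR decay, early stopping, and best-epoch tracking."""
    best_acc = float("-inf")
    best_epoch = None      # epoch whose weights would be restored
    wait = 0               # epochs since the last improvement or adjustment
    adjustments = 0        # LR decays since the last improvement
    stopped_epoch = None

    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            # Improvement: remember this epoch and reset both counters.
            best_acc, best_epoch = acc, epoch
            wait = 0
            adjustments = 0
            continue
        wait += 1
        if wait < patience:
            continue
        if adjustments >= stop_patience:
            # Already decayed stop_patience times without improvement: stop.
            stopped_epoch = epoch
            break
        lr *= factor       # plateau: decay the learning rate
        adjustments += 1
        wait = 0

    return {"lr": lr, "best_epoch": best_epoch, "stopped_epoch": stopped_epoch}


# Hypothetical validation accuracies for illustration only.
print(run_schedule([0.80, 0.85, 0.84, 0.84, 0.83, 0.84, 0.90]))
```

With these sample values the best epoch is 2, the learning rate is halved at epochs 3, 4, and 5, and training stops at epoch 6, so the late jump at epoch 7 is never reached. That trade-off is inherent to any patience-based stopping rule.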
To assess the performance and generalization capability of the DenseNet121 model, we analyzed both the training history and the final predictions on the unseen Test Set.
The confusion matrix provides a detailed breakdown of the model's true vs. predicted classifications on the testing data (1600 images).
Based on the evaluation of the test set, the model demonstrates strong discriminatory power between ALL (Leukemia) and HEM (Normal) cells.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ALL (Malignant) | 0.96 | 0.98 | 0.97 | 1091 |
| HEM (Normal) | 0.95 | 0.90 | 0.93 | 509 |
| Overall Accuracy | | | 95.5% | 1600 |
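The figures in the table can be cross-checked against each other. F1 is the harmonic mean of precision and recall, and overall accuracy equals the support-weighted average of the per-class recalls. The sketch below recomputes both from the rounded table values, so small differences in the last digit are expected; the helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_recall(per_class):
    """Support-weighted mean recall; per_class is a list of (recall, support)."""
    total = sum(support for _, support in per_class)
    return sum(recall * support for recall, support in per_class) / total


# (precision, recall, support) copied from the table above.
report = {"ALL": (0.96, 0.98, 1091), "HEM": (0.95, 0.90, 509)}

for name, (p, r, _) in report.items():
    print(name, "F1 from rounded P/R:", round(f1_score(p, r), 3))

accuracy = weighted_recall([(r, n) for _, r, n in report.values()])
print("accuracy from rounded recalls:", round(accuracy, 3))
```

The recomputed accuracy lands close to the reported 95.5%. The HEM recall of 0.90 also means that roughly one in ten normal cells in the test set is flagged as malignant.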
The project uses the C-NMC (Classification of Normal vs Malignant Cells) Leukemia dataset.
- Source: Kaggle - Leukemia Classification
- Classes:
  1. HEM (Normal)
  2. ALL (Malignant / Leukemia)
- Split Configuration:
  - Training: 70%
  - Validation: 15%
  - Test: 15% (1600 images evaluated in the dashboard: 1091 ALL, 509 HEM)
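A 70/15/15 split can be sketched in plain Python as follows. This sketch splits each class separately (a stratified split) so that the ALL/HEM ratio is preserved in every subset; whether the notebook stratifies, and which random seed it uses, is not stated here, so both are assumptions, as is the `stratified_split` name.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, split per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels for illustration; real labels would come from the dataset folders.
toy_labels = ["all"] * 200 + ["hem"] * 100
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```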
Follow these steps to get the project running on your local machine.
Open your terminal or command prompt and run:
# Clone the project
git clone https://github.com/aryannverse/Leukemia-Detection-Using-DenseNet121-CNN.git
Install the required libraries listed in requirements.txt.
pip install -r requirements.txt
- Download the dataset from the Kaggle Link.
- Extract the downloaded folder.
- Important: Ensure the path in the notebook matches your local data location:
data_dir = 'C-NMC_Leukemia/training_data'
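Before running the notebook, a quick check like the one below can confirm that the path resolves on your machine. It is an optional sketch: `list_subfolders` is a hypothetical helper, and it makes no assumption about how the dataset is organized beyond `data_dir` being a folder.

```python
from pathlib import Path


def list_subfolders(data_dir):
    """Return the sorted names of the folders directly inside data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root.resolve()}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


data_dir = 'C-NMC_Leukemia/training_data'
try:
    print(list_subfolders(data_dir))
except FileNotFoundError as err:
    # Reports the absolute path that was tried, which helps spot a wrong
    # working directory.
    print(err)
```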
Launch Jupyter to view and run the training process.
jupyter notebook Leukemia_Classification.ipynb
Either train your own model or use the pretrained model in the repo, then run the Streamlit app using:
streamlit run leukemia_app.py