Training Pipeline Enhancements by sridhs21 · Pull Request #16 · SCOREC/reconClassifier

sridhs21 · 2025-08-29T02:27:51Z

This PR enhances the training pipeline to address the vanishing gradient problems that were preventing a working mixed precision training implementation. The primary focus is automatic mixed precision (AMP) support, together with the architectural and pipeline changes needed to keep gradients stable. The changes include:

Automatic Mixed Precision (AMP) training support:
- Adds --use-amp and --amp-dtype flags to enable mixed precision training with both float16 and bfloat16 support. This can significantly reduce memory usage and training time on compatible GPUs while maintaining numerical stability through proper gradient scaling.
Enhanced U-Net architecture with residual connections:
- Replaces the basic U-Net with a residual block-based design that includes batch normalization and skip connections. The residual design improves gradient flow and training stability, and the added dropout layers reduce overfitting, which was a common problem when implementing AMP.
Improved training pipeline:
- Implements patch-based training through XPointPatchDataset for better data augmentation, adds feature normalization for training stability, includes gradient clipping to prevent exploding gradients, and adds early stopping with patience to prevent overfitting. The training data is also undersampled: patches with no X-points are removed until their count matches the number of patches that contain X-points, which produces a balanced dataset.
Enhanced optimization:
- Switches from Adam to AdamW optimizer with weight decay for better generalization, adds cosine annealing learning rate scheduling, and improves checkpoint functionality to save/load all training state, including AMP scalers.
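
The undersampling step described above can be sketched as follows. This is a minimal illustration and not the code from this PR: the function name `undersample_balanced`, the `has_xpoint` predicate, and the toy patch tuples are hypothetical, and the actual logic in XPointPatchDataset may differ.

```python
import random


def undersample_balanced(patches, has_xpoint, seed=0):
    """Keep every patch with an X-point and an equal-size random sample of the rest.

    `patches` is any list of patch objects and `has_xpoint` is a predicate on a
    patch; both are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    positives = [p for p in patches if has_xpoint(p)]
    negatives = [p for p in patches if not has_xpoint(p)]
    # Drop negatives until their count matches the positives.
    kept_negatives = rng.sample(negatives, min(len(negatives), len(positives)))
    balanced = positives + kept_negatives
    rng.shuffle(balanced)  # avoid a block of positives followed by negatives
    return balanced


# Toy data: (patch_id, number_of_xpoints); every fifth patch has one X-point.
toy_patches = [(i, 1 if i % 5 == 0 else 0) for i in range(50)]
balanced = undersample_balanced(toy_patches, has_xpoint=lambda p: p[1] > 0)
n_pos = sum(1 for p in balanced if p[1] > 0)
n_neg = len(balanced) - n_pos
print(n_pos, n_neg)  # 10 10
```

Fixing the seed keeps the retained negative patches reproducible between runs; resampling with a different seed each epoch would be another reasonable choice.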

I have also updated the README file to describe the flags that can be used.
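
For readers who want a picture of how the two AMP flags might be parsed, here is a hedged sketch using `argparse`. Only the flag names `--use-amp` and `--amp-dtype` and the two supported dtypes come from the description above; the default value, help strings, and the `needs_grad_scaler` variable are assumptions for illustration and may not match XPointMLTest.py.

```python
import argparse


def build_parser():
    """Build a parser with the two AMP flags named in the PR description."""
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    parser.add_argument(
        "--use-amp",
        action="store_true",
        help="enable automatic mixed precision training",
    )
    parser.add_argument(
        "--amp-dtype",
        choices=["float16", "bfloat16"],
        default="float16",  # assumed default, not confirmed by the PR
        help="reduced-precision dtype used when AMP is enabled",
    )
    return parser


args = build_parser().parse_args(["--use-amp", "--amp-dtype", "bfloat16"])

# Loss/gradient scaling is generally needed only for float16, whose exponent
# range is much narrower than float32; bfloat16 keeps the float32 exponent range.
needs_grad_scaler = args.use_amp and args.amp_dtype == "float16"
print(args.use_amp, args.amp_dtype, needs_grad_scaler)  # True bfloat16 False
```

Restricting `--amp-dtype` with `choices` makes an unsupported dtype fail at argument parsing instead of partway through training.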

cwsmith

Looks good. Thank you. I appreciate the update to the README and the verbose PR description.

Would you please remove the whitespace only changes?

Update: as discussed in the meeting, if changing the whitespace breaks Python, then please ignore the whitespace request.

sridhs21 added 7 commits July 30, 2025 17:06

Implemented Automatic Mixed Precision (AMP) for training

6147606

Major architecture upgrade: ResNet-style U-Net + modern training

527a874

Merge branch 'main' of github.com:scorec/reconClassifier into MixedPrecision

024d9ef

Made changes to make implementations merged from main work

633e111

made changes to make implementations from main work

0054ff2

Fixed bug and updated README for added flags.

e8767ee

fixed bug - psi return value

265780b

sridhs21 closed this Aug 29, 2025

revert to default params

f99434f

sridhs21 reopened this Aug 29, 2025

cwsmith requested changes Aug 29, 2025

cwsmith reviewed Aug 29, 2025

XPointMLTest.py (resolved)

XPointMLTest.py (resolved)

cwsmith merged commit 68bd475 into main Oct 20, 2025
2 checks passed

cwsmith deleted the MixedPrecision branch October 20, 2025 19:26

cwsmith merged 8 commits into main from MixedPrecision

2 participants
