Add multigpu pt #288

Draft
gkumbhat wants to merge 10 commits into caikit:main from gkumbhat:add_multigpu_pt

Conversation

gkumbhat (Collaborator) commented Dec 1, 2023

Changes

  • Add FSDP configuration for PT trainer
  • Add torch elastic launch

NOTE: This PR should be merged only after #287 has been merged and this branch has been rebased on top of it.

This currently gives the following error:

ValueError: expected to be in states [<TrainingState.IDLE: 1>] but current state is TrainingState.FORWARD_BACKWARD

gkumbhat and others added 10 commits November 26, 2023 14:49
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>