This repository was archived by the owner on Nov 3, 2023. It is now read-only.
Support mixed precision with CPU driver#118
Open
amogkam wants to merge 45 commits into ray-project:main from
Conversation
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.7 to 1.5.2.
- [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases)
- [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)
- [Commits](Lightning-AI/pytorch-lightning@1.4.7...1.5.2)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…ors into ptl-1.5-support
…ors into cpu-head-mixed-precision
…ors into cpu-head-mixed-precision
…rs into cpu-head-mixed-precision
…ors into cpu-head-mixed-precision
matthewdeng (Contributor) reviewed Jan 27, 2022

Thanks for the thorough description! Unfortunately my understanding of PTL is quite limited, so I still have to ask a clarifying question below 😅
    # Swap out the accelerator if necessary.
    # This is needed to support CPU head with GPU workers or Ray Client.
    current_accelerator = self.lightning_module.trainer.accelerator
    if self.use_gpu and isinstance(current_accelerator, CPUAccelerator):
Contributor

I'm not quite following this logic: by removing the `isinstance(current_accelerator, CPUAccelerator)` check, what scenario does this solve? Wouldn't the problem case (`CPUAccelerator`) be changed to `DelayedGPUAccelerator` both before and after this PR?
    -def train_func(dir, plugin, callbacks=None):
    +def train_func(dir, plugin, callbacks=None, amp=False):
Contributor
Should there be a test with `amp=True`?
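Such a test would mainly need to thread the new `amp` flag through to the Trainer's precision setting. A minimal sketch of that mapping, assuming the helper below (`make_trainer_kwargs` is hypothetical, not part of this repo; in PTL 1.5, native automatic mixed precision is enabled by passing `precision=16` to the `Trainer`):

```python
def make_trainer_kwargs(amp: bool, use_gpu: bool) -> dict:
    """Hypothetical helper: map a test's amp/use_gpu flags to Trainer kwargs."""
    kwargs = {"max_epochs": 1}
    if amp:
        # PTL enables native automatic mixed precision via precision=16.
        kwargs["precision"] = 16
    if use_gpu:
        kwargs["gpus"] = 1
    return kwargs

# A test with amp=True would then build its Trainer from these kwargs:
print(make_trainer_kwargs(amp=True, use_gpu=True))
# → {'max_epochs': 1, 'precision': 16, 'gpus': 1}
```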
Closes #99. GPU tests were run manually and all are passing.
In PTL, 16-bit precision only works on GPU. You tell your Trainer you want GPUs by setting `gpus=1`. However, if you do this with Ray Lightning, the driver process itself requires GPUs. This prevents you from using Ray Lightning with Ray Client, and if you are using Tune, it forces an extra GPU to be reserved but never actually used. To fix this, we previously implemented a "hack" that swaps out the accelerator for a custom accelerator so the driver doesn't require GPUs: #67. However, that swap only took place if the initial accelerator was a `CPUAccelerator`. This prevents mixed precision from being used, since PTL complains about 16-bit precision with a `CPUAccelerator`. Instead, this PR performs the swap regardless of which accelerator is initially set.
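The behavior change can be illustrated with a small sketch. The classes below are stand-ins, not the actual PTL or ray_lightning implementations; the point is only the control flow of the swap:

```python
class CPUAccelerator:
    """Stand-in for PTL's CPU accelerator."""

class GPUAccelerator:
    """Stand-in for a non-CPU accelerator (e.g. what precision=16 requires)."""

class DelayedGPUAccelerator:
    """Stand-in for the custom accelerator that defers GPU setup to workers."""

def swap_accelerator_before(current, use_gpu):
    # Pre-PR behavior: only swap when the driver starts with a CPUAccelerator.
    # With precision=16 the initial accelerator is not a CPUAccelerator,
    # so the swap was skipped and the driver still needed a GPU.
    if use_gpu and isinstance(current, CPUAccelerator):
        return DelayedGPUAccelerator()
    return current

def swap_accelerator_after(current, use_gpu):
    # This PR: swap regardless of the initial accelerator.
    if use_gpu:
        return DelayedGPUAccelerator()
    return current

# Before: a non-CPU accelerator slipped through unswapped.
print(type(swap_accelerator_before(GPUAccelerator(), use_gpu=True)).__name__)
# After: the driver always gets the delayed accelerator when use_gpu=True.
print(type(swap_accelerator_after(GPUAccelerator(), use_gpu=True)).__name__)
```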