
Add GPU & Distributed Training Support #22

Open

JoeHolt wants to merge 9 commits into master from exp-pr

Conversation

@JoeHolt (Collaborator) commented Apr 5, 2021

No description provided.

@JoeHolt changed the title from "Copied over to new PR" to "Add GPU & Distributed Training Support" on Apr 5, 2021
@JoeHolt (Collaborator, Author) commented Apr 5, 2021

I believe the only relevant file from my other PR was _dist.py, so I moved it over into this cleaner PR.

@JoeHolt (Collaborator, Author) commented Apr 5, 2021

When we last met, you asked that I update the API of DaskClassifier to look like this:

DaskClassifier(..., batch_size="geodamp", batch_size__latency=60, batch_size__factor=5)

I have a few questions:

  • How should I implement the latency? Should I start tracking the current epoch internally within the class? I.e., every time run_single_epoch is called, increment the internal epoch count by 1.
  • How can I implement the specific dampening algorithm? I.e., what is the easiest way to make _dist.py use GeoDamp vs. PadaDamp?

@stsievert (Owner) commented:

> relevant file from my other PR

Does tests/test_dist_damping.py need to be moved over from #15 too?

> How should I implement the latency delay?

Something like this:

def initialize(self, X_train):
    self._meta = {..., "num_examples": 0, "len_dataset": len(X_train)}

def _partial_fit(self, X, y):
    self._meta["num_examples"] += len(X)
    _epochs = self._meta["num_examples"] / self._meta["len_dataset"]
    ...
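The epoch-tracking idea above can be sketched as a runnable toy. The class and attribute names here are illustrative only, not the actual adadamp API:

```python
# Minimal sketch of the fractional-epoch bookkeeping described above.
# EpochTracker is a hypothetical stand-in for the DaskBaseDamper class.
class EpochTracker:
    def initialize(self, X_train):
        # Record the dataset size up front so epochs can be derived later,
        # without needing X_train in scope during partial_fit.
        self._meta = {"num_examples": 0, "len_dataset": len(X_train)}

    def _partial_fit(self, X, y=None):
        self._meta["num_examples"] += len(X)
        # Fractional number of epochs seen so far.
        return self._meta["num_examples"] / self._meta["len_dataset"]

tracker = EpochTracker()
tracker.initialize(list(range(100)))
print(tracker._partial_fit(list(range(25))))  # 0.25
print(tracker._partial_fit(list(range(25))))  # 0.5
```

With this, "epochs" is just a derived quantity, so there is no separate counter to keep in sync with run_single_epoch.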

> I.e., what is the easiest way to make _dist.py use GeoDamp vs. PadaDamp?

I think the easiest way is:

from typing import Any, Dict

class GeoDamp:
    def __init__(self, delay=60, factor=5, initial=128):
        self.delay = delay
        self.factor = factor
        self.initial = initial

    def damp(self, meta: Dict[str, Any]) -> int:
        # meta is DaskBaseDamper._meta
        epochs = meta["epochs"]
        return self.initial * (self.factor ** (epochs // self.delay))

This will require some parsing of the arguments. DaskClassifier(batch_size="geodamp", batch_size__delay=70) should turn into DaskClassifier(batch_size=GeoDamp(delay=70)).

@JoeHolt (Collaborator, Author) commented Apr 14, 2021

I implemented the new API based on your description but had some questions:

  • It seems to make more sense to set the batch-size arguments during a call to partial_fit rather than fit. Does that make sense?
  • How can I adapt the classes in damping.py to work with _dist.py? Those classes seem to need many more __init__ arguments than the simple example we used.
  • Do I need to update the test files as well?

@stsievert (Owner) left a comment:

I'd put the dampers (BaseDamper, Damper, ...) in a new file, _dampers.py.

The current tests should pass. I'd make a new test to ensure that GeoDamp works as expected. I think test_damping.py has some code that will be useful.

device: str = "cpu",
grads_per_worker: int=128,
max_epochs: int=20,
grads_per_worker=32,
@stsievert (Owner):

Suggested change:

- grads_per_worker=32,
+ grads_per_worker: int = 128,

adadamp/_dist.py Outdated
self.worker_max_batch_size = worker_max_batch_size
self.min_workers = min_workers
self.max_workers = max_workers
self.n_workers_ = min_workers
@stsievert (Owner):

This should be set in initialize.

The Scikit-learn API says that parameters ending in underscores should only be set when fit is called (https://scikit-learn.org/stable/glossary.html#term-attributes)
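An illustrative sketch of that convention (the Estimator class and its parameters are made up for the example): __init__ only stores hyperparameters verbatim, while derived attributes with a trailing underscore are set during fitting:

```python
# Hypothetical estimator following the scikit-learn attribute convention.
class Estimator:
    def __init__(self, min_workers=1):
        # __init__ stores hyperparameters as-is; no derived state here.
        self.min_workers = min_workers

    def fit(self, X):
        # Derived/fitted state gets the trailing underscore, set at fit time.
        self.n_workers_ = self.min_workers
        return self

est = Estimator(min_workers=4).fit([0, 1])
print(est.n_workers_)  # 4
```

This also keeps get_params/set_params round-trips working, since scikit-learn clones estimators by re-calling __init__ with the stored hyperparameters.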

adadamp/_dist.py Outdated
args = (X, y) if y is not None else (X,)
return TensorDataset(*args)

def get_batch_size(self, batch_size, kwargs: Dict[str, Any]) -> BaseDamper:
@stsievert (Owner):

Suggested change:

- def get_batch_size(self, batch_size, kwargs: Dict[str, Any]) -> BaseDamper:
+ def _get_damper(self, batch_size, kwargs: Dict[str, Any]) -> BaseDamper:

adadamp/_dist.py Outdated
self.batch_size_ = self.batch_size

if not isinstance(self.batch_size_, BaseDamper):
raise ValueError("BatchSize not subclass of BaseDamper")
@stsievert (Owner):

This chunk of code should go in initialize.

adadamp/_dist.py Outdated
return self

def _run_single_epoch(self, X, y=None, **fit_params):
def run_single_epoch(self, X, y=None, **fit_params):
@stsievert (Owner):

Suggested change:

- def run_single_epoch(self, X, y=None, **fit_params):
+ def _run_single_epoch(self, X, y=None, **fit_params):

)
from .dampers import (
SimpleBaseDamper,
SimpleGeoDamp
@stsievert (Owner):

I see – there's already BaseDamper and GeoDamp. It might be okay to remove this from __init__.py. Users can always do from adadamp.dampers import GeoDamp.

adadamp/_dist.py Outdated
from torch.utils.data import Dataset, IterableDataset, TensorDataset
from torch.nn.modules.loss import _Loss as Loss
from torch.utils.data import Dataset, IterableDataset, TensorDataset, DataLoader
from adadamp.adadamp.dampers import SimpleBaseDamper, SimpleGeoDamp
@stsievert (Owner):

Suggested change:

- from adadamp.adadamp.dampers import SimpleBaseDamper, SimpleGeoDamp
+ from .dampers import SimpleBaseDamper, SimpleGeoDamp

@stsievert (Owner):

It helps to run pip install -e . in the root of this repo.

@JoeHolt (Collaborator, Author):

What does this do?

@stsievert (Owner):

pip install -e /path/to/adadamp installs the AdaDamp package. It looks like from adadamp.adadamp.dampers import ... is not using the adadamp package; instead, it looks like it's using a relative path.

It might be cleaner to do from adadamp.dampers import SimpleBaseDamper, SimpleGeoDamp. That's what from .dampers import is doing (but relative, not absolute). Here's more detail: https://realpython.com/absolute-vs-relative-python-imports/

adadamp/_dist.py Outdated
}
self._initialized = True

if isinstance(self.batch_size, str):
@stsievert (Owner):

Suggested change:

+ if not isinstance(self.batch_size, (str, int, np.integer, SimpleBaseDamper)):
+     raise ValueError("self.batch_size needs to be ...")
...
+ elif isinstance(self.batch_size, (int, np.integer)):
...
- if not isinstance(self.batch_size_, SimpleBaseDamper):
-     raise ValueError("self.batch_size is not ...")
