Owner
I still have to read this through, but can't this "sharding" be reduced to SequenceKFold somehow? It looks awfully similar.
Collaborator (Author)
I think "SequenceKFold" can (and should) be implemented using "sharding", but not the other way around. It is indeed a copy-pasted version of SequenceKFold, with some unnecessary stuff removed.
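For illustration, here is a minimal sketch of what a sequence-sharding helper of this kind could look like; the function names and signatures are hypothetical, not the PR's actual code. It also hints at the point above: a SequenceKFold-style splitter can be derived from the shards, but sharding cannot easily be expressed in terms of K-fold splits.

```python
import numpy as np

def shard_sequences(n_sequences, n_shards, random_state=None):
    """Split sequence indices into n_shards roughly equal, shuffled groups.

    Hypothetical helper for illustration; the PR's actual code may differ.
    """
    rng = np.random.RandomState(random_state)
    seq_ids = rng.permutation(n_sequences)
    return np.array_split(seq_ids, n_shards)

def kfold_from_shards(shards):
    """Derive SequenceKFold-style (train, test) splits from the shards:
    fold k uses shard k as the test set and the rest as the training set."""
    for k, test_ids in enumerate(shards):
        train_ids = np.concatenate(
            [s for i, s in enumerate(shards) if i != k])
        yield train_ids, test_ids
```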
I tried to implement the "iterative parameter mixing" strategy for distributed training of the structured perceptron.
The idea is the following: the training data is split into shards; a separate perceptron is trained for one epoch on each shard (these epochs can run in parallel); after each epoch the per-shard weights are mixed (averaged) and pushed back to every shard, so the next epoch starts from the mixed weights.
So communication should involve only transferring the learned weights, and each shard could have its own training data.
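A minimal sketch of that mixing loop, assuming a hypothetical `train_one_epoch(w, X, y, lengths)` callable that runs one perceptron epoch starting from weights `w` and returns the updated weights (in the PR this role is played by OneEpochPerceptron):

```python
import numpy as np

def iterative_parameter_mixing(shards, train_one_epoch, n_features, n_iter=10):
    """Iterative parameter mixing over a list of data shards.

    shards          : list of (X, y, lengths) tuples, one per shard
    train_one_epoch : callable (w, X, y, lengths) -> new weight vector;
                      a stand-in for one epoch of perceptron training
                      (the PR uses OneEpochPerceptron for this)
    """
    w = np.zeros(n_features)
    for _ in range(n_iter):
        # Each shard trains for one epoch starting from the shared weights;
        # these calls are independent and could run in parallel.
        shard_weights = [train_one_epoch(w.copy(), X, y, lengths)
                         for X, y, lengths in shards]
        # Mix (average) the per-shard weights; only the weight vectors
        # need to be communicated between shards.
        w = np.mean(shard_weights, axis=0)
    return w
```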
ParallelStructuredPerceptron is an attempt to reimplement StructuredPerceptron in terms of OneEpochPerceptrons. It has an n_jobs parameter, and ideally it should use multiprocessing or multithreading (numpy/scipy releases the GIL and the bottleneck is in the dot products, isn't it?) for faster training. But I didn't manage to make multiprocessing work without copying each shard's X/y/lengths on every iteration, so n_jobs=N currently just creates N OneEpochPerceptrons and trains them sequentially.
Ideally, I want OneEpochPerceptron to be easy to use with IPython.parallel in a distributed environment, and ParallelStructuredPerceptron to be easy to use on a single machine.
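For the distributed case, a rough sketch of how one epoch per shard could be pushed through IPython.parallel; the engine-side data layout and the OneEpochPerceptron API (coef_, fit) used below are assumptions, not the PR's final interface:

```python
# A rough sketch only: assumes an IPython.parallel cluster is running
# (e.g. `ipcluster start -n 4`) and that each engine already holds its own
# shard in variables X, y, lengths, with OneEpochPerceptron importable there.
from IPython.parallel import Client
import numpy as np

rc = Client()
dview = rc[:]                      # a DirectView: one engine per shard

def run_epoch(w):
    # Runs on each engine; X, y, lengths come from the engine's namespace.
    # The OneEpochPerceptron API used here is a guess at the PR's interface.
    clf = OneEpochPerceptron()
    clf.coef_ = w.copy()
    clf.fit(X, y, lengths)
    return clf.coef_

n_features, n_iter = 10000, 10     # example values
w = np.zeros(n_features)
for _ in range(n_iter):
    shard_weights = dview.apply_sync(run_epoch, w)   # one epoch per shard
    w = np.mean(shard_weights, axis=0)               # mix (average) weights
```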
Issues with the current implementation:
The sequence_ids shuffling method is changed so that ParallelStructuredPerceptron and StructuredPerceptron learn exactly the same weights given the same random_state.
With n_jobs=1, ParallelStructuredPerceptron is about 10% slower than StructuredPerceptron on my data; I think we could merge these classes when (and if) ParallelStructuredPerceptron is ready.