More efficient multiprocessing for computing MI #2
CorentinJ wants to merge 1 commit into cvjena:master from CorentinJ:master
Conversation
Callidior
left a comment
Thanks for pointing this out! The speed could definitely be improved here by setting up the worker processes only once.
However, I am concerned that your suggested fix breaks the correctness of the method because of the mutual_information object that is passed to each worker when the pool is initialized. This object is not constant and changes its state after each step. With your implementation, the workers receive the mutual_information object only once at the beginning and will not update its state to stay in sync with the object in the main process.
Thus, we should update this object in the workers' memory as well, in a fashion like this:

```python
if self.parallelized and (it > 0):
    mi = pool.map(_parallel_mi, candidates)
else:
    mi = [mutual_information(i) for i in candidates]
max_ind = np.argmax(mi)
mutual_information.append(candidates[max_ind])
del candidates[max_ind]
if self.parallelized:
    for worker_ind in range(pool._processes):
        pool.apply_async(_update_pool, (mutual_information,))
```

To make sure that the new object is passed to all worker processes and to avoid race conditions, we should also introduce a `multiprocessing.Barrier` for the update:
```python
def _init_pool(mi, barrier):
    global _mi, _barrier
    _mi = mi
    _barrier = barrier

def _update_pool(mi):
    global _mi, _barrier
    _mi = mi
    _barrier.wait()
```

Since the barrier must be passed to the workers upon initialization, we need to specify the number of worker processes explicitly when initializing the pool:
```python
barrier = Barrier(os.cpu_count())
pool = Pool(barrier.parties, initializer=_init_pool, initargs=(mutual_information, barrier))
```

It would be nice if you could implement these changes and also verify that the results are still the same as before, in addition to comparing just the run-time. I would then be happy to merge it.
Oh, so that's why the speed gain is so large; not surprising then. I completely missed that argument. It would probably be better to refactor that global variable away too, wouldn't it?
That would be preferable, but I currently don't see an elegant way of doing it. Nevertheless, I would still expect a speed-up from synchronizing that variable between individual steps among the processes, though not as big as the one you initially observed.
The repeated creation of the pool in the inner loop leads to a serious performance hit:

```
USPS:
Time elapsed: 61.996s
MIRFLICKR:
Time elapsed: 359.179s
```

Moving the pool outside the loop:

```
USPS:
Time elapsed: 17.587s
MIRFLICKR:
Time elapsed: 140.803s
```