@@ -7,8 +7,9 @@ usage of workers across the Dask cluster. It is disabled by default.
 Memory imbalance and duplication
 --------------------------------
 Whenever a Dask task returns data, it is stored on the worker that executed the task for
-as long as it's a dependency of other tasks, is referenced by a ``Client`` through a
-``Future``, or is part of a :doc:`published dataset <publish>`.
+as long as it's a dependency of other tasks, is referenced by a
+:class:`~distributed.Client` through a :class:`~distributed.Future`, or is part of a
+:doc:`published dataset <publish>`.
 
 Dask assigns tasks to workers following criteria of CPU occupancy, :doc:`resources`, and
 locality. In the trivial use case of tasks that are not connected to each other, take
@@ -38,7 +39,7 @@ The AMM can be enabled through the :doc:`Dask configuration file <configuration>
 
 The above is the recommended setup and will run all enabled *AMM policies* (see below)
 every two seconds. Alternatively, you can manually start/stop the AMM from the
-``Client`` or trigger a one-off iteration:
+:class:`~distributed.Client` or trigger a one-off iteration:
 
 .. code-block:: python
 
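For reference, the "recommended setup" the hunk above refers to is the configuration-file snippet that sits just before this context in the full document. A minimal sketch of that configuration, using distributed's documented config keys (the ``2s`` interval matches the "every two seconds" default mentioned above):

```yaml
distributed:
  scheduler:
    active-memory-manager:
      # Start the AMM together with the scheduler
      start: true
      # Run all enabled policies at this interval
      interval: 2s
```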
@@ -58,6 +59,10 @@ long as they don't harm data integrity. These suggestions can be of two types:
 - Delete one or more replicas of an in-memory task. The AMM will never delete the last
   replica of a task, even if a policy asks to.
 
+There are no "move" operations. A move is performed in two passes: first a policy
+creates a copy; in the next AMM iteration, the same or another policy deletes the
+original (if the copy succeeded).
+
 Unless a policy puts constraints on which workers should be impacted, the AMM will
 automatically create replicas on workers with the lowest memory usage first and delete
 them from workers with the highest memory usage first.
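The two-pass "move" described in the added lines above can be illustrated with a toy model. This is a sketch only, not distributed's implementation: the real AMM operates on scheduler ``TaskState``/``WorkerState`` objects, and all names here (``move_policy``, ``run_iteration``, the ``replicas`` dict) are invented for illustration.

```python
# Toy model of the AMM suggestion protocol: policies are generators that
# yield ("replicate", key, worker) or ("drop", key, worker) suggestions.

def move_policy(replicas, src, dst):
    """Suggest moving key "x" from `src` to `dst` across two AMM iterations."""
    if dst not in replicas["x"]:
        yield ("replicate", "x", dst)  # pass 1: create the copy
    elif src in replicas["x"]:
        yield ("drop", "x", src)       # pass 2: copy exists, drop the original

def run_iteration(replicas, suggestions):
    """Apply suggestions, refusing to ever drop the last replica of a key."""
    for op, key, worker in suggestions:
        if op == "replicate":
            replicas[key].add(worker)
        elif op == "drop" and len(replicas[key]) > 1:
            replicas[key].discard(worker)

replicas = {"x": {"w1"}}
run_iteration(replicas, move_policy(replicas, "w1", "w2"))  # first iteration: copy
run_iteration(replicas, move_policy(replicas, "w1", "w2"))  # second iteration: drop
print(replicas)  # {'x': {'w2'}}
```

Note how the drop only happens once the copy is confirmed to exist, and how the loop enforces the "never delete the last replica" guarantee regardless of what a policy asks for.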
@@ -90,7 +95,7 @@ Built-in policies
 ReduceReplicas
 ++++++++++++++
 class
-    ``distributed.active_memory_manager.ReduceReplicas``
+    :class:`distributed.active_memory_manager.ReduceReplicas`
 parameters
     None
 
@@ -128,7 +133,7 @@ define two methods:
     Create one replica of the target task on the worker with the lowest memory among
     the listed candidates.
 ``yield "drop", <TaskState>, None``
-    Delete one replica of the target task one the worker with the highest memory
+    Delete one replica of the target task on the worker with the highest memory
     usage across the whole cluster.
 ``yield "drop", <TaskState>, {<WorkerState>, <WorkerState>, ...}``
     Delete one replica of the target task on the worker with the highest memory
@@ -186,8 +191,8 @@ Example
 The following custom policy ensures that keys "foo" and "bar" are replicated on all
 workers at all times. New workers will receive a replica soon after connecting to the
 scheduler. The policy will do nothing if the target keys are not in memory somewhere or
-if all workers already hold a replica.
-Note that this example is incompatible with the ``ReduceReplicas`` built-in policy.
+if all workers already hold a replica. Note that this example is incompatible with the
+:class:`~distributed.active_memory_manager.ReduceReplicas` built-in policy.
 
 In mymodule.py (it must be accessible by the scheduler):
 