
Support some level of priority scheduling for DataFlowTasks #75

@maltezfaria


At some point, we may want to add (back?) support for a PriorityScheduler along the lines of what can be found in

The idea was that when a DataFlowTask is spawned, it is added to the DAG but the actual scheduling of the underlying task is delegated to a worker task which takes nodes from a runnable priority queue and executes them. After a task is finished, it inserts itself in a finished queue for removal (by another worker).

Pushing tasks into the runnable queue happens in one of two ways:

  1. A task inserts itself into runnable if it is dependency-free when it is created.
  2. If a task has dependencies when it is created, its last unfinished dependency is responsible for putting it in the runnable queue. The cleanup phase of finishing a task therefore involves:
    • Removing itself as a dependency from every node on its outgoing edges
    • Adding each of those nodes to runnable if it was that node's last unfinished dependency
    • Removing itself from the DAG
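
To make the two insertion paths concrete, here is a minimal single-threaded sketch of the bookkeeping. It is written in Python purely for illustration and is not the DataFlowTasks API: `Node`, `PriorityDAG`, `spawn`, `run_next`, `cleanup`, and `run_all` are hypothetical names, and "lower priority value runs first" is an assumption. In the scheme described above, `run_next` and `cleanup` would be driven by separate worker tasks rather than called in a loop.

```python
import heapq
from collections import deque


class Node:
    """Hypothetical stand-in for a DataFlowTask living in the DAG."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority  # assumption: lower value runs first
        self.pending = set()      # dependencies that have not been cleaned up yet
        self.successors = []      # nodes on the outgoing edges


class PriorityDAG:
    def __init__(self):
        self.nodes = set()       # nodes currently in the DAG
        self.runnable = []       # heap of (priority, insertion order, node)
        self.finished = deque()  # executed nodes waiting for cleanup
        self._seq = 0            # tie-breaker so nodes are never compared

    def _push_runnable(self, node):
        heapq.heappush(self.runnable, (node.priority, self._seq, node))
        self._seq += 1

    def spawn(self, name, priority, deps=()):
        node = Node(name, priority)
        for dep in deps:
            if dep in self.nodes:  # dependencies already removed impose no edge
                node.pending.add(dep)
                dep.successors.append(node)
        self.nodes.add(node)
        if not node.pending:
            # Path 1: dependency-free at creation, so it inserts itself.
            self._push_runnable(node)
        return node

    def run_next(self):
        """Take the highest-priority runnable node and 'execute' it."""
        _, _, node = heapq.heappop(self.runnable)
        # The actual work of the task would happen here.
        self.finished.append(node)
        return node

    def cleanup(self):
        """Cleanup phase for every node in the finished queue."""
        while self.finished:
            node = self.finished.popleft()
            for succ in node.successors:
                succ.pending.discard(node)
                if not succ.pending:
                    # Path 2: the last unfinished dependency releases the successor.
                    self._push_runnable(succ)
            self.nodes.discard(node)


def run_all(dag):
    """Drive execution and cleanup in turn; returns the execution order."""
    order = []
    while dag.runnable:
        order.append(dag.run_next().name)
        dag.cleanup()
    return order
```

For example, spawning `a` (priority 2) and `b` (priority 1) with no dependencies, then `c` (priority 0) depending on `a`, and `d` (priority 5) depending on both, makes `b` run before `a`; `c` and `d` only become runnable once `a` has been cleaned up.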

This was dropped for simplicity until the package matures a bit more, and because I was not very happy with the way tasks were executed under this scheduler. It may be worth bringing it back to life at some point, though, since priority scheduling can help guide the scheduler through the critical path more easily.
