- n records
- start with an equidistant graph
- machine types are known; a small algorithm determines how many records each machine receives
- everything computed is written to disk
- decide how many reducers we want
- write final output to a machine
- has an associated time
- has an associated state
  - completed execution
  - failed execution
  - straggling
- event queue specifies the tasks for a given stage of computation; it can serve as a global synchronizer
- task queue is directly correlated to the computation execution graph
- scheduler mediates between the task queue and the computation graph, and coordinates between the event queue and the task queue
- will jump ahead, skipping steps where no computations/events are occurring
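
The notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual design: it assumes the time and state bullets describe events, that machine types reduce to integer relative speeds, and that a failed task is retried after a fixed delay. The names `Event`, `State`, `split_records`, and `run` are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    STRAGGLING = "straggling"


@dataclass(order=True)
class Event:
    time: float                            # simulated time at which the event fires
    state: State = field(compare=False)    # events are ordered by time only
    task: str = field(compare=False)


def split_records(n, speeds):
    """Give each machine a share of n records proportional to its integer speed."""
    total = sum(speeds)
    shares = [n * s // total for s in speeds]
    # Records lost to rounding down go one each to the fastest machines.
    leftover = n - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def run(events, retry_delay=1.0):
    """Process events in time order; the clock jumps straight to the next event."""
    queue = list(events)
    heapq.heapify(queue)
    clock = 0.0
    log = []
    while queue:
        event = heapq.heappop(queue)
        clock = event.time                 # idle time is skipped, not stepped through
        log.append((clock, event.task, event.state))
        if event.state is State.FAILED:
            # Assumed policy: reschedule the failed task after retry_delay.
            retry = Event(clock + retry_delay, State.COMPLETED, event.task)
            heapq.heappush(queue, retry)
    return log


if __name__ == "__main__":
    print(split_records(10, [1, 1, 2]))
    for entry in run([
        Event(5.0, State.COMPLETED, "map-1"),
        Event(2.0, State.FAILED, "map-0"),
        Event(100.0, State.STRAGGLING, "map-2"),
    ]):
        print(entry)
```

Keeping the events in a heap ordered by time is what lets the clock jump from one event to the next: nothing is simulated for the gap between time 5.0 and time 100.0 in the example.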