ODESSA is a framework that combines a middleware with a domain-specific language (DSL) to deploy and execute scientific workflows scalably. SWIG, a simple DSL, lets the user describe a workflow as a directed acyclic graph (DAG). The middleware can automatically (1) exploit data parallelism opportunities in the workflow and (2) scale the workflow according to the availability of cloud resources.
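As a generic illustration of why a DAG is a convenient representation for scheduling, the sketch below groups the nodes of a DAG into levels with Kahn's algorithm: nodes in the same level do not depend on one another and could be dispatched concurrently. This is a hypothetical helper, not ODESSA's code. The edge-list input stands in for a parsed SWIG script, whose actual syntax is not shown here, and the node names are made up.

```python
from collections import defaultdict


def topological_levels(edges):
    """Group the nodes of a DAG into levels (Kahn's algorithm).

    `edges` is a hypothetical list of (parent, child) pairs standing in for
    a parsed SWIG script. Nodes in the same level have no dependencies on
    each other, so they could run at the same time.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # The first level holds the roots: nodes with no incoming edges.
    frontier = sorted(n for n in nodes if indegree[n] == 0)
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for node in frontier:
            for child in children[node]:
                indegree[child] -= 1
                # A child becomes ready once all of its parents are done.
                if indegree[child] == 0:
                    next_frontier.append(child)
        frontier = sorted(next_frontier)

    # If some nodes were never reached, the graph was not acyclic.
    if sum(len(level) for level in levels) != len(nodes):
        raise ValueError("the workflow graph contains a cycle")
    return levels
```

For a diamond-shaped workflow, `topological_levels([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])` returns `[["A"], ["B", "C"], ["D"]]`, so `B` and `C` could run in parallel once `A` finishes.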
This repository contains our implementation of ODESSA. In particular, the master of our middleware architecture is implemented here for evaluating ODESSA's performance: it runs each test 26 times, 13 times with the Random Walk scheduling algorithm and 13 times with Round Robin.
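The two scheduling policies compared in the evaluation can be sketched as task-to-worker assignment functions. Round Robin is the standard cyclic assignment. The Random Walk policy is not specified in this README, so the sketch assumes it picks a random worker for each task; ODESSA's actual algorithm may differ. Task and worker names are placeholders.

```python
import itertools
import random


def round_robin(tasks, workers):
    """Assign tasks to workers in a fixed cyclic order."""
    cycle = itertools.cycle(workers)
    return {task: next(cycle) for task in tasks}


def random_walk(tasks, workers, seed=None):
    """Assign each task to a randomly chosen worker.

    ASSUMPTION: modelled here as an independent uniform random choice per
    task; ODESSA's actual Random Walk policy may differ.
    """
    rng = random.Random(seed)  # seeded generator makes runs reproducible
    return {task: rng.choice(workers) for task in tasks}
```

With a fixed `seed`, `random_walk` is reproducible, which helps when comparing runs; Round Robin needs no randomness and spreads tasks evenly by construction.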
`workflow_seq.txt` contains the SWIG script of the sequential Word Count workflow, and `workflow.txt` contains the SWIG script of the scalable Word Count workflow.
The `input_nMB.txt` files are the inputs of the Word Count problem; each one contains a string of size n MB.
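The data parallelism that the middleware exploits in Word Count can be illustrated with a split, count, and merge sketch: the input string is divided into chunks, each chunk is counted independently, and the partial counts are summed. This is a local, single-process illustration with hypothetical function names. It does not show how ODESSA distributes chunks across worker VMs, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def split_words(text, n_chunks):
    """Split the whitespace-separated words of `text` into about n_chunks lists."""
    words = text.split()
    size = max(1, -(-len(words) // n_chunks))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


def count_chunk(words):
    """Count one chunk; each call is independent of the others."""
    return Counter(words)


def word_count(text, n_chunks=4):
    """Map: count every chunk separately. Reduce: sum the partial counts."""
    # Each count_chunk call could, in principle, run on a separate worker.
    partials = [count_chunk(chunk) for chunk in split_words(text, n_chunks)]
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing
    return total
```

Because the per-chunk counts do not depend on each other, the result is the same as counting the whole string at once, no matter how many chunks are used.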
The user supplies cloud resources (e.g., Aliyun ECS VMs) by providing the private IP addresses of the master and worker VMs.
The IP addresses of all workers go in a .txt file (e.g., `worker_ip.txt`), one address per line.
The master's IP address, `master_ip`, must also be passed to each worker when it is started (see the steps below).
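The worker IP file format described above can be checked with a small helper. This is a hypothetical sketch, not the parsing done by `master.py` or `worker.py`; skipping blank lines is an assumption made here for convenience.

```python
import ipaddress


def parse_worker_ips(text):
    """Return the worker IPs listed in `text`, one address per line.

    Blank lines are skipped (an assumption, not necessarily ODESSA's behaviour).
    Raises ValueError if a line is not a valid IPv4 or IPv6 address.
    """
    ips = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        address = line.strip()
        if not address:
            continue
        try:
            ipaddress.ip_address(address)  # validates the address syntax only
        except ValueError:
            raise ValueError(f"line {line_no}: not an IP address: {address!r}") from None
        ips.append(address)
    return ips


def read_worker_ips(path="worker_ip.txt"):
    """Read and validate a worker IP file such as worker_ip.txt."""
    with open(path) as f:
        return parse_worker_ips(f.read())
```

Validating the file on the master before starting a run catches typos early, rather than surfacing them as connection failures later.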
- Upload all files in `/master` to the master VM, and all files in `/worker` to each worker VM.
- Upload one SWIG script to the master.
- Upload one input file (e.g., `input_nMB.txt`) to a single worker VM.
- Upload the worker IP file to the master and to each worker.
- On each worker VM, run:
  `python3 worker.py [master_ip] [worker_ip.txt]`
- On the master, run:
  `python3 master.py [root] [dag.txt] [worker_ip.txt] [input_filename] [input_location]`
  where `root` is the root of the workflow DAG (`A` in the provided examples), `dag.txt` is the SWIG script (`workflow.txt` or `workflow_seq.txt`), `input_filename` is the name of the input file (e.g., `input_nMB.txt`), and `input_location` is the IP address of the VM that holds the input file.
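The argument order of the two commands above can be restated as an `argparse` sketch. This is illustrative only and assumes purely positional arguments; `master.py` and `worker.py` may read `sys.argv` directly, and the destination names `dag` and `worker_ips` are hypothetical.

```python
import argparse


def parse_master_args(argv=None):
    """Parse the five positional arguments of master.py in the documented order."""
    parser = argparse.ArgumentParser(prog="master.py")
    parser.add_argument("root", help="root node of the workflow DAG (A in the examples)")
    parser.add_argument("dag", help="SWIG script: workflow.txt or workflow_seq.txt")
    parser.add_argument("worker_ips", help="file listing one worker IP per line")
    parser.add_argument("input_filename", help="input file name, e.g. input_nMB.txt")
    parser.add_argument("input_location", help="IP of the VM that holds the input file")
    return parser.parse_args(argv)


def parse_worker_args(argv=None):
    """Parse the two positional arguments of worker.py."""
    parser = argparse.ArgumentParser(prog="worker.py")
    parser.add_argument("master_ip", help="private IP address of the master VM")
    parser.add_argument("worker_ips", help="file listing one worker IP per line")
    return parser.parse_args(argv)
```

Calling either function with no arguments parses the real command line; passing a list, as in a test, parses that list instead.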
When the master reports `ALL FINISHED`, the results of all 26 tests are in `result.txt`. The times taken by the Random Walk scheduling algorithm are on the odd-numbered lines, and the times taken by Round Robin are on the even-numbered lines.
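The alternating layout of `result.txt` can be separated into the two series with a short helper. This sketch assumes each non-empty line holds a single number (the README does not state the unit); the function names are hypothetical.

```python
from statistics import mean


def split_results(lines):
    """Split result.txt lines into (random_walk, round_robin) timings.

    Line numbering is 1-based: odd-numbered lines are Random Walk and
    even-numbered lines are Round Robin. Blank lines are skipped but still
    counted toward the line number, so they do not shift the parity.
    """
    random_walk, round_robin = [], []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        target = random_walk if line_no % 2 == 1 else round_robin
        target.append(float(text))
    return random_walk, round_robin


def summarize(lines):
    """Return the mean time of each scheduling algorithm."""
    random_walk, round_robin = split_results(lines)
    return {"Random Walk": mean(random_walk), "Round Robin": mean(round_robin)}
```

For example, `with open("result.txt") as f: print(summarize(f))`. After a complete run, each list returned by `split_results` should hold 13 values.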