MTLSys: Latency-aware Pruning for Multi-task Learning

Authors: Abhigyan Khaund, Joseph Miano

Run examples/demos

Set up conda environment

  • Run the following command to set up the conda environment:
    • Windows: conda env create --name smr_env --file smr_env_windows.yml
    • Linux: conda env create --name smr_env --file smr_env_linux.yml
    • Mac: conda env create --name smr_env --file smr_env_mac.yml
  • Activate the conda environment by running: conda activate smr_env
  • Keep this environment active throughout the next steps
  • Note: the Model Variant Generation part of the examples/demos runs on GPU, so you must have a GPU with NVIDIA drivers configured on your system.
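Before continuing to the GPU-dependent steps, it can save time to verify the driver install up front. A minimal sanity check (an illustrative helper, not part of the repo), assuming that `nvidia-smi`, which ships with the NVIDIA drivers, being on `PATH` and running cleanly is a reasonable proxy for a working driver setup:

```python
import shutil
import subprocess

def nvidia_driver_present() -> bool:
    """Best-effort check that NVIDIA drivers are installed and working."""
    smi = shutil.which("nvidia-smi")  # bundled with the NVIDIA drivers
    if smi is None:
        return False
    try:
        # nvidia-smi exits 0 when it can talk to the driver
        return subprocess.run([smi], capture_output=True).returncode == 0
    except OSError:
        return False

print("GPU drivers detected" if nvidia_driver_present()
      else "nvidia-smi not found; the Model Variant Generation demo will not run")
```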

Download trained model-variants

  • From the repo_team14 directory, change into examples/models: cd examples/models
  • Run the command: gdown "https://drive.google.com/uc?id=1pG_6ncWFn8Gy4pIaz4Q4EE6JbGmD6RkM"
  • Unzip the model-variants.zip file. The model files should populate the examples/models/model_variants directory.
  • If prompted to replace existing files, allow the replacement
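If `unzip` is not available, the extraction step can equivalently be done from Python with the standard library. A small sketch of the idea (the function name is illustrative; it assumes you are in examples/models with model-variants.zip already downloaded via gdown as above):

```python
import pathlib
import zipfile

def extract_variants(zip_name: str = "model-variants.zip",
                     dest: str = ".") -> list[str]:
    """Extract the downloaded archive; returns the list of extracted files."""
    zip_path = pathlib.Path(zip_name)
    if not zip_path.exists():
        return []  # nothing downloaded yet
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # populates model_variants/ alongside the zip
        return zf.namelist()

extracted = extract_variants()
print(f"{len(extracted)} files extracted")
```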

Inference

Note: run this example first, before the model variant generation example, so that the inference demo can leverage the variants trained on the larger dataset, which were downloaded and extracted in the "Download trained model-variants" step. To run the inference system example on a small sample dataset:

  • Navigate to the repo_team14 directory and start Jupyter by running: jupyter notebook
  • Open the notebook examples/Inference_System_Demo.ipynb
  • In the open notebook, open the "Kernel" menu, select "Change Kernel", and choose the smr_env kernel
  • Run all cells
  • Wait for a few minutes as profiling takes time.

Model Variant Generation

Note: the variants generated here will not perform well because they are trained on a small sample dataset; this example is intended to show the variant generation process. To run the MTL model variant generation example on a small sample dataset:

  • Open the notebook examples/Generate_Model_Variants_MTL_Example.ipynb
  • Run all cells
  • The model variant files will be created and visible in the examples/models/model_variants directory.

Run full system

To download the dataset:

  • Run the command: gdown "https://drive.google.com/uc?id=18YAbwahQT808HjJ0ZthqX6oKNkYZd-Yf"
  • Unzip all downloaded files.

Build and Run Code

Set up conda environment (if not already set up)

  • Run the following command to set up the conda environment:
    • Windows: conda env create --name smr_env --file smr_env_windows.yml
    • Linux: conda env create --name smr_env --file smr_env_linux.yml
    • Mac: conda env create --name smr_env --file smr_env_mac.yml
  • Activate the conda environment by running: conda activate smr_env
  • Keep this environment active throughout the next steps
  • Note: you must have a GPU with NVIDIA CUDA and drivers configured on your system, since parts of the examples/demos run on GPU.
  • Note: whenever opening an ipynb file of interest, open the "Kernel" menu, then "Change Kernel" and select the smr_env kernel

Variant Generator

Run Generate_Model_Variants_MTL.ipynb to generate all the model-variants for the MTL version of the system. Run Generate_Model_Variants_SingleTask.ipynb to generate all the model-variants for the single-task version (with no MTL models) of the system.

Inference System

  • Run inference/Inference.ipynb to simulate the different systems, run queries against them, and generate the results in the report.
  • RequestDetails can be used to specify requests to the system
  • By default, the system runs on CPU hardware. To run it on GPU, replace the following in inference/Inference.ipynb and inference/utils.py:
    • Replace instances of model = torch.load(model_path, map_location=torch.device('cpu')) with model = torch.load(model_to_use.file_path)
    • Replace model.cpu() with model.cuda()
    • Replace request_details.input_image with request_details.input_image.cuda()
    • Replace output = model(image) with output = model(image.cuda())
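Rather than editing each call site, the CPU/GPU switch above can be captured with a single device flag. A hedged sketch of that pattern (the function name and the commented usage are illustrative, not the repo's actual API; the model files are assumed to be trusted checkpoints):

```python
import torch

def load_for_inference(model_path: str, use_gpu: bool = False):
    """Load a saved model onto CPU or GPU with one flag."""
    # Pick the device once; map_location lets CUDA-saved checkpoints
    # load on a CPU-only machine.
    device = torch.device("cuda" if use_gpu and torch.cuda.is_available()
                          else "cpu")
    # weights_only=False is needed on newer PyTorch (where it defaults to
    # True) because these checkpoints are full pickled modules.
    model = torch.load(model_path, map_location=device, weights_only=False)
    return model.to(device).eval(), device

# Usage sketch: move each input to the same device before the forward pass.
# model, device = load_for_inference("<path to a variant .pt file>", use_gpu=True)
# output = model(request_details.input_image.to(device))
```

This keeps one code path for both hardware targets instead of four parallel edits.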

View Experimental Results

  • TaskHead_Length_Experiments.ipynb shows our code to test the effect of task head length on latency
  • Pruning_Robustness_Experiments.ipynb shows code testing the relationship between pruning amount and accuracy
  • inference/Inference.ipynb shows code testing the system inference performance, including Pareto curves and mishit plots

About

Latency-aware Pruning for Multi-task Learning. Final Project for CS8803: Systems for Machine Learning at Georgia Tech
