jiawei96-liu/distributed_PS_ML

Training ML tasks with in-network aggregation

A distributed parameter-server (PS) training architecture accelerated by P4 programmable switches.
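
To make the idea of in-network aggregation concrete, here is a minimal sketch of the arithmetic involved: each worker's gradient is converted to fixed-point integers, summed element-wise (the step a programmable switch could perform in place of the parameter server), and converted back to a float average. This is an illustration only. The function names, the fixed-point scale, and the sample gradients are assumptions and are not taken from this repository's code.

```python
# Hypothetical sketch of gradient aggregation; names and SCALE are
# illustrative assumptions, not this repository's actual implementation.
SCALE = 1 << 16  # assumed fixed-point scaling factor


def quantize(grad):
    """Convert a list of float gradients to fixed-point integers."""
    return [round(g * SCALE) for g in grad]


def switch_aggregate(int_grads):
    """Element-wise integer sum across workers (the in-network step)."""
    return [sum(column) for column in zip(*int_grads)]


def dequantize_mean(int_sum, num_workers):
    """Convert the integer sum back to a float average per element."""
    return [s / SCALE / num_workers for s in int_sum]


# Two workers, three gradient elements each (made-up values).
worker_grads = [
    [0.5, -0.25, 1.0],
    [0.25, 0.75, -1.0],
]
summed = switch_aggregate([quantize(g) for g in worker_grads])
mean_grad = dequantize_mean(summed, len(worker_grads))
print(mean_grad)  # [0.375, 0.25, 0.0]
```

Integers are used in the sketch because switch data planes generally operate on integer arithmetic; whether this project quantizes gradients this way is not stated here.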

Dependencies

PyTorch is required.

System packages:

sudo apt install libjpeg-dev zlib1g-dev libssl-dev libffi-dev python-dev build-essential libxml2-dev libxslt1-dev

Python dependencies:

pip3 install pulp numpy tensorboard

CPU-only PyTorch:

pip3 install torch==1.10.0+cpu torchvision==0.11.1+cpu torchaudio==0.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
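
After installing, it can help to confirm that the Python packages named above are importable. The following is a small sketch using only the standard library; the helper name check_modules is hypothetical and the script is not part of this repository.

```python
# Hypothetical helper, not part of this repository: reports which of the
# packages listed in this README can be found by the current interpreter.
import importlib.util

REQUIRED = ["torch", "torchvision", "torchaudio", "numpy", "pulp", "tensorboard"]


def check_modules(names):
    """Map each module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed with the corresponding pip3 command above.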

Usage

Start the server:

python3 server.py
