Offline Federated Reinforcement Learning Simulator in 2D-Navigation

This project was supported in part by the Institute for Information and Communications Technology Planning and Evaluation (IITP) under Grant 2021-0-00900

Introduction

This simulator (i) runs offline federated reinforcement learning in a 2D-Navigation environment with two heterogeneous tasks, where tasks are distinguished by destination location, and (ii) measures how often the navigator reaches the destination under the global policy produced by federated learning.

Environments: Navigation-2D (Components: a start-point, an end-point, a navigator, moving obstacles)
Tasks: 0 and 3 (task IDs range over 0~9)
Learning Methods: Federated Learning (Server) + Offline Reinforcement Learning (Client)
Evaluation Metric: Arrival Success Rate (goal score: 95% or higher)
Settings: 1 Server, 4 Clients (based on the d3rlpy and Flower frameworks)
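
How the server combines the client policies is not spelled out in this README. As a minimal sketch, assuming the server uses plain federated averaging (FedAvg, the default strategy in Flower), each round would weight every client's parameters by its number of training samples. The function name and the flat-list parameter layout below are illustrative and are not taken from this repository.

```python
from typing import List, Sequence, Tuple


def federated_average(
    updates: Sequence[Tuple[List[float], int]]
) -> List[float]:
    """Sample-weighted average of client parameter vectors (FedAvg).

    Each update is (parameters, num_examples). Real policies hold
    multi-dimensional tensors; flat lists keep this sketch dependency-free.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one training example")
    size = len(updates[0][0])
    averaged = [0.0] * size
    for params, n in updates:
        if len(params) != size:
            raise ValueError("all clients must share one model shape")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


# Hypothetical round: two task-0 clients and two task-3 clients,
# each holding 300 samples (made-up parameter values).
clients = [
    ([1.0, 0.0], 300),
    ([1.0, 2.0], 300),
    ([3.0, 2.0], 300),
    ([3.0, 4.0], 300),
]
global_params = federated_average(clients)
```

Because the clients here hold equal amounts of data, the result is the plain mean of their parameters; with unequal dataset sizes, larger clients would pull the global model toward their task.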

System Architecture

Environments

(i) Red: Navigator, (ii) Blue: Obstacles, (iii) Green: Destination

Install

1. Docker

docker pull mkris0714/flrl:base
docker run --gpus all -e LC_ALL=C.UTF-8 -p 8080:8080 -it mkris0714/flrl:base /bin/bash

2. Git

pip Requirements (pip dependency resolving)

pip install -U pip==20.3
pip install -r requirements.txt --use-deprecated=legacy-resolver

apt Requirements

apt install freeglut3-dev

Environment Install (based on Python 3.8)

cp -r navigation_2d/ /opt/conda/lib/python3.8/site-packages/
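
The path above is specific to the conda install inside the Docker image. Outside that image, the target directory can be looked up instead of hard-coded. The following is a minimal sketch using only the standard library; `install_env` is a hypothetical helper, not part of this repository.

```python
import os
import shutil
import sysconfig


def site_packages_dir() -> str:
    """Return the active interpreter's pure-Python package directory."""
    return sysconfig.get_paths()["purelib"]


def install_env(src: str = "navigation_2d", dest_root: str = "") -> str:
    """Copy the environment package into site-packages (like `cp -r`)."""
    dest_root = dest_root or site_packages_dir()
    dest = os.path.join(dest_root, os.path.basename(os.path.normpath(src)))
    # dirs_exist_ok requires Python 3.8+, matching the version noted above.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


# Show where the package would be installed for this interpreter.
print(site_packages_dir())
```

Calling `install_env()` from the repository root would then mirror the `cp -r` command for whichever Python interpreter is active.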

[Optional] Resolving GPU Errors (based on NVIDIA RTX 3090, CUDA 11)

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
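
To judge whether the pinned build is needed, or whether it fixed the problem, it can help to check what PyTorch actually sees. A small sketch follows; the function name is illustrative.

```python
def describe_torch_cuda() -> str:
    """Summarize the installed PyTorch build and its view of the GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available"
    return (
        f"torch {torch.__version__}: built for CUDA {torch.version.cuda}, "
        f"device 0 is {torch.cuda.get_device_name(0)}"
    )


print(describe_torch_cuda())
```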

Execute (e.g., two tasks: task0, task3)

cd offline_federated_rl/

1. Expert Model Generation

python train_expert.py --env_id 0
python train_expert.py --env_id 3

outputs: models_env_id_0, models_env_id_3, tensorboard, buffers

2. Expert Offline Dataset Generation

python fl_gather_buffer.py --env_id 0 --expert_steps 100000 --num_trajectories 300 --num_clients 2 --dataset_name 100
python fl_buffer_to_mdp_dataset.py --env_id 0 --num_trajectories 300 --num_clients 2 --dataset_name 100

python fl_gather_buffer.py --env_id 3 --expert_steps 100000 --num_trajectories 300 --num_clients 2 --dataset_name 300
python fl_buffer_to_mdp_dataset.py --env_id 3 --num_trajectories 300 --num_clients 2 --dataset_name 300

outputs: buffers_fl
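
As its name suggests, the second script in each pair presumably converts the gathered buffers into a d3rlpy `MDPDataset`, which is built from flat, step-aligned arrays of observations, actions, rewards, and terminal flags. The repository's buffer format is not shown here, so the sketch below assumes a hypothetical one (a list of trajectories, each a list of `(observation, action, reward)` steps) purely to illustrate the flattening.

```python
from typing import List, Sequence, Tuple

# Hypothetical step layout: (observation, action, reward).
Step = Tuple[Sequence[float], Sequence[float], float]


def flatten_trajectories(trajectories: Sequence[Sequence[Step]]):
    """Flatten episodes into step-aligned lists plus terminal flags."""
    observations: List[Sequence[float]] = []
    actions: List[Sequence[float]] = []
    rewards: List[float] = []
    terminals: List[float] = []
    for trajectory in trajectories:
        last = len(trajectory) - 1
        for t, (obs, act, rew) in enumerate(trajectory):
            observations.append(obs)
            actions.append(act)
            rewards.append(rew)
            # Assumption: the last step of every episode is terminal.
            # Timeouts may need separate handling in a real pipeline.
            terminals.append(1.0 if t == last else 0.0)
    return observations, actions, rewards, terminals


# Two toy episodes with 2-D observations and 2-D actions (assumed shapes).
episodes = [
    [((0.0, 0.0), (0.1, 0.0), 0.0), ((0.1, 0.0), (0.1, 0.0), 1.0)],
    [((0.5, 0.5), (0.0, -0.1), 1.0)],
]
obs, act, rew, term = flatten_trajectories(episodes)
```

With d3rlpy and NumPy installed, the four lists could be converted to arrays and passed to `MDPDataset(observations, actions, rewards, terminals)` in d3rlpy versions that accept that four-array constructor.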

3. Federated Global Model Generation

python fl_server.py

python fl_client.py --env_id 0 --num_trajectories 300 --dataset_name 100

python fl_client.py --env_id 0 --num_trajectories 300 --dataset_name 101

python fl_client.py --env_id 3 --num_trajectories 300 --dataset_name 300

python fl_client.py --env_id 3 --num_trajectories 300 --dataset_name 301

outputs: forl_logs
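
The server and the four clients each need their own process (for example, five terminals), with the server normally started first. The dataset names 101 and 301 do not appear in step 2; presumably they are the second per-client shards produced by `--num_clients 2`, but that is an inference from the commands, not something this README states. Below is a minimal launcher sketch, assuming it is run from `offline_federated_rl/`; the 5-second startup delay is an arbitrary guess.

```python
import subprocess
import sys
import time
from typing import List

# (env_id, dataset_name) pairs copied from the client commands above.
CLIENTS = [(0, 100), (0, 101), (3, 300), (3, 301)]


def client_command(env_id: int, dataset_name: int) -> List[str]:
    """Build the argv for one fl_client.py process."""
    return [
        sys.executable, "fl_client.py",
        "--env_id", str(env_id),
        "--num_trajectories", "300",
        "--dataset_name", str(dataset_name),
    ]


def launch_all(startup_delay: float = 5.0) -> int:
    """Start the server, then all clients; return the server's exit code."""
    server = subprocess.Popen([sys.executable, "fl_server.py"])
    time.sleep(startup_delay)  # give the server time to start listening
    clients = [subprocess.Popen(client_command(e, d)) for e, d in CLIENTS]
    for proc in clients:
        proc.wait()
    return server.wait()
```

Calling `launch_all()` would run one full federated training session; nothing is started merely by importing or defining these functions.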

4. Performance Evaluation (for the global models from rounds 160~170)

xvfb-run -a python eval.py --env_id 0 --model_name example_models --start=160 --end=170

xvfb-run -a python eval.py --env_id 3 --model_name example_models --start=160 --end=170

outputs: task_0.csv, task_3.csv

5. Performance

Client Algo.	Round	Task0 Test Score (%)	Task3 Test Score (%)
CQL-FL	160	94	95
CQL-FL	161	95	99
CQL-FL	162	98	97
CQL-FL	163	95	100
CQL-FL	164	98	99
CQL-FL	165	90	98
CQL-FL	166	97	97
CQL-FL	167	96	96
CQL-FL	168	95	98
CQL-FL	169	94	98
CQL-FL	170	95	97
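
To check these results against the 95% goal score, the scores can be summarized per task. The values below are copied from the table above; the helper name is illustrative.

```python
GOAL = 95  # goal score in percent, from the Introduction

ROUNDS = list(range(160, 171))
SCORES = {
    "Task0": [94, 95, 98, 95, 98, 90, 97, 96, 95, 94, 95],
    "Task3": [95, 99, 97, 100, 99, 98, 97, 96, 98, 98, 97],
}


def summarize(scores, rounds=ROUNDS, goal=GOAL):
    """Return (mean score, list of rounds that fall below the goal)."""
    mean = sum(scores) / len(scores)
    below = [r for r, s in zip(rounds, scores) if s < goal]
    return mean, below


for task, scores in SCORES.items():
    mean, below = summarize(scores)
    print(f"{task}: mean {mean:.1f}%, rounds below {GOAL}%: {below}")
```

On these numbers, Task3 meets the goal in every listed round, while Task0 falls short in rounds 160, 165, and 169.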


