Institution: Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Software and Information Systems
Course: Deep Reinforcement Learning
Date Published: 24/12/2025
Due Date: 16/01/2024
Submission Format: The assignment should be submitted in pairs via the Moodle course site.
Zip File Contents: The submission must include a zip file containing:
- A report in PDF format containing answers to questions and requested code outputs.
- Detailed explanations and analysis.
- Short instructions for running the scripts.
- The scripts of your solutions.
Technical Requirements:
- Scripts must be written in Python using the PyTorch library for Neural Networks.
- Use TensorBoard for visualization and graphs.
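A minimal sketch of the required TensorBoard logging (the log-directory name and scalar tag are illustrative, not prescribed):

```python
# Hypothetical logging skeleton: record one scalar per training episode.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="logs/cartpole_actor_critic")
for episode in range(3):          # placeholder loop; real training goes here
    episode_reward = 0.0          # accumulate the per-episode return in practice
    writer.add_scalar("reward/episode", episode_reward, episode)
writer.close()
```

The logged scalars then appear under the `reward/episode` tag when TensorBoard is pointed at `logs/`.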
Report Guidelines:
- The report can be written in English or Hebrew.
- The length must not exceed six pages.
- Include your names and IDs in the report.
While humans and animals can learn new tasks in just a few trials, deep reinforcement learning algorithms usually require a large number of trials. Standard tools require re-collecting large datasets and training from scratch for new tasks. Intuitively, knowledge from one task should facilitate learning related tasks more quickly.
In this assignment, you will design a reinforcement learning algorithm that leverages prior experience to solve new tasks quickly, an approach referred to in the literature as meta-reinforcement learning.
Section 1: Actor-Critic for Additional Control Problems
In this section, you will implement the actor-critic architecture (from HW2) for two additional small control problems: Acrobot-v1 and MountainCarContinuous-v0.
Goals:
- Achieve the respective goals: reaching the mountain top and bringing the acrobot to a pre-specified height.
- Standardization for Transfer Learning: The size of the input and output for all tasks must be identical.
- For problems with smaller inputs, pad with 0.
- For problems with smaller outputs, create "empty" actions that are never used.
Requirements:
- You must also retrain the architecture for the CartPole problem.
- You may use different architectures for each problem, but each must include at least one hidden layer (Input -> Hidden -> Output).
- Provide statistics for running time and the number of training iterations required for convergence.
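A minimal actor/critic pair satisfying the Input -> Hidden -> Output constraint might look like the following sketch (layer sizes and activations are illustrative, not prescribed):

```python
# Illustrative actor-critic modules with one hidden layer each.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim: int = 6, act_dim: int = 3, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, x):
        # Action logits; invalid ("empty") actions should be masked before softmax.
        return self.net(x)

class Critic(nn.Module):
    def __init__(self, obs_dim: int = 6, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # Scalar state-value estimate.
        return self.net(x)
```

Deeper variants (e.g. two hidden layers) are equally valid as long as at least one hidden layer is present.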
Section 2: Fine-Tuning
In this section, you will fine-tune a model trained on a source problem and apply it to a target problem.
Tasks: Apply the following to two pairs (Source -> Target):
- Acrobot -> Cartpole
- Cartpole -> MountainCar
Procedure:
- Take the model fully trained on the source.
- Re-initialize the weights of the output layer.
- Train the new network on the target.
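The re-initialization step can be sketched as follows. The helper name is hypothetical, and it assumes the last `nn.Linear` module in the network is the output layer:

```python
# Illustrative helper: keep all trained weights except the output layer.
import torch
import torch.nn as nn

def reinit_output_layer(model: nn.Module) -> nn.Module:
    """Xavier-reinitialize the final Linear layer; all other weights are kept."""
    last_linear = [m for m in model.modules() if isinstance(m, nn.Linear)][-1]
    nn.init.xavier_uniform_(last_linear.weight)
    nn.init.zeros_(last_linear.bias)
    return model
```

After this call, training proceeds on the target environment exactly as in Section 1, but starting from the source network's hidden-layer weights.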
Analysis:
- Provide statistics on running time and training iterations.
- Compare results to Section 1. Did fine-tuning lead to faster convergence?
Section 3: Progressive Networks
In this section, you will implement a simplified version of Progressive Networks.
Tasks: Apply the following settings (Sources -> Target):
- {Acrobot, MountainCar} -> Cartpole
- {Cartpole, Acrobot} -> MountainCar
Procedure:
- Use the fully-trained source networks created in Section 1 and connect them to the untrained target network.
- Frozen Sources: The source networks remain frozen throughout the process.
- Adapters: Implementing adapters (marked with 'a' in the diagrams) is optional.
Connections:
- Single Hidden Layer: Connect the hidden layers of the sources to the output of the target network.
- Multiple Hidden Layers: Connect the top hidden layer of each source to the target output, then connect each lower source hidden layer to the target hidden layer above it, continuing downward until you run out of hidden layers in one of the architectures.
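Under the assumption that each frozen source column is exposed as a module mapping the standardized observation to its hidden activations, the single-hidden-layer case might be sketched as (class and attribute names are illustrative):

```python
# Simplified progressive-network actor: frozen source hidden activations
# feed the target's output layer through lateral Linear connections.
import torch
import torch.nn as nn

class ProgressiveActor(nn.Module):
    def __init__(self, sources, obs_dim: int = 6, act_dim: int = 3, hidden: int = 128):
        super().__init__()
        self.sources = nn.ModuleList(sources)   # pre-trained columns (obs -> hidden)
        for p in self.sources.parameters():
            p.requires_grad = False             # sources stay frozen
        self.hidden = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, act_dim)
        # One trainable lateral connection per source hidden layer.
        self.laterals = nn.ModuleList(
            nn.Linear(hidden, act_dim, bias=False) for _ in sources
        )

    def forward(self, x):
        y = self.out(self.hidden(x))
        for src, lat in zip(self.sources, self.laterals):
            y = y + lat(src(x))                 # add frozen source features laterally
        return y
```

Only the target column and the lateral connections receive gradients; the optional adapters would replace the plain `nn.Linear` laterals with small nonlinear modules.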
Analysis:
- Train until convergence. Did transfer learning improve training?
- Provide statistics on running time and training iterations.
Important Note: Transfer learning is tricky. If you do not succeed in showing improvement, you must document your efforts and explain how you attempted to get the architectures to work.
DRL-ass3/
├── train.py # Main CLI entry point
├── README.md
├── requirements.txt # Python dependencies
├── src/
│ ├── actor_critic.py # Section 1: Actor-Critic implementation
│ ├── environments.py # Standardized environment wrappers
│ ├── fine_tuning.py # Section 2: Fine-tuning trainer
│ ├── progressive_networks.py # Section 3: Progressive Networks
│ └── utils.py # Configuration and utilities
├── models/ # Saved model checkpoints
├── logs/ # TensorBoard logs
└── report/
├── report.pdf # Final report (4 pages)
├── report.tex # LaTeX source
└── generate_pdf.py # PDF generation script
pip install -r requirements.txt

# Run all sections sequentially
python train.py --section all
# Run individual sections
python train.py --section 1 # Train individual networks (Section 1)
python train.py --section 2 # Fine-tuning experiments (Section 2)
python train.py --section 3 # Progressive networks (Section 3)
# With custom parameters
python train.py --section all --episodes 2000 --lr 0.001 --hidden 128

tensorboard --logdir logs/

Then open http://localhost:6006 in your browser.
- Observation dimension: 6 (padded with zeros for smaller environments)
- Action dimension: 3 (with action masking for environments with fewer actions)
- Actor: 6 (input) → 128 (hidden) → 128 (hidden) → 3 (output)
- Critic: 6 (input) → 128 (hidden) → 128 (hidden) → 1 (output)
- Xavier/Glorot initialization for all weights
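The actor described above can be sketched as follows (ReLU activations are an assumption, since the spec does not name the nonlinearity):

```python
# Sketch of the listed actor: 6 -> 128 -> 128 -> 3 with Xavier/Glorot init.
import torch.nn as nn

def make_actor() -> nn.Sequential:
    net = nn.Sequential(
        nn.Linear(6, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 3),
    )
    for m in net:
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)
    return net
```

The critic is identical except for a single output unit (6 -> 128 -> 128 -> 1).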