This repository contains the source code for the paper "Learning the Value Systems of Societies with Multi-objective Preference-based Inverse Reinforcement Learning", accepted at AAMAS 2026 (OpenReview). Our algorithm, SVSL-P, observes a given MOMDP environment, a set of value labels, and preference demonstrations from a (here, simulated) society of diverse agents with different value systems (multi-objective preference weights). It then simultaneously learns: a reward vector for the MOMDP that implements a value alignment specification for the given set of values; a set of up to L preference weight vectors describing the different observed preferences; a clustering of the agents into these value systems; and a weight-conditioned policy Π(s,a|W) that approximates the optimal policy for any given weight vector (in particular, those selected as the society's clusters).
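For orientation, the sketch below summarizes the four objects SVSL-P learns. It is illustrative Python only: the class and field names are placeholders and do not correspond to the repository's actual API.

```python
# Illustrative only: placeholder names, not the repository's API.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class SVSLPResult:
    # One learned reward function per value label: R_v(s, a) -> float
    value_rewards: List[Callable[[object, object], float]]
    # Up to L preference weight vectors, one per learned value system (cluster)
    cluster_weights: List[Sequence[float]]
    # agent_assignment[i] = index of the value system agent i is clustered into
    agent_assignment: List[int]
    # Weight-conditioned policy Π(s, a | W): probability of action a in state s
    policy: Callable[[object, object, Sequence[float]], float]
```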
The repository also includes the other algorithms used in the paper's evaluation, namely Envelope Q-learning from MORL-baselines, a modification of our previous algorithm Value Learning From Preferences, and a custom implementation of Preference-based Multi-Objective Reinforcement Learning.
This is the development branch. The version of the code accepted at AAMAS 2026 is available in the branch AAMAS2026 and in the supplementary material of the OpenReview version.
### If not modifying the full code
- Choose an empty main folder and go inside it.
- Create a virtual environment with Python 3.13+. We used 3.13.5 in the paper.
```sh
python3.13 -m venv .venv
source .venv/bin/activate
```
- Clone the following repositories inside the main folder.
- MORL baselines fork.
- Mushroom RL fork. Then make sure to select the branch "andres-dev":
```sh
cd mushroom-rl-kz
git checkout andres-dev
cd ..
```
- Clone this repository in the main folder:
```sh
git clone https://github.com/andresh26-uam/ValueLearningInMOMDP.git
cd ValueLearningInMOMDP
```
- Install packages:
```sh
pip install ../mushroom-rl-kz
pip install ../morl-baselines-reward
```
- Requirements:
```sh
pip install -r full_requirements.txt
```
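Optionally, check that the two forks are importable before continuing. This assumes they keep the upstream package names (`mushroom_rl` and `morl_baselines`); adjust the imports if the forks rename them.

```python
# Sanity check: assumes the forks keep the upstream import names.
import mushroom_rl       # installed from ../mushroom-rl-kz
import morl_baselines    # installed from ../morl-baselines-reward

print("Forked dependencies imported successfully.")
```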
### If modifying the full code
- Perform steps 1-3 from "If not modifying the full code" above (main folder, virtual environment, and cloning the forks).
- Clone the following repositories in folder F (the main folder):
```sh
cd ..  # if needed, to get back to F
```
- Clone: Baraacuda. Then make sure to select the branch "andres-dev":
```sh
cd baraacuda
git checkout andres-dev
cd ..
```
- Clone: Imitation fork.
- Requirements. Remove or comment out lines 89 and 95 in full_requirements.txt, then install:
```sh
pip install -r full_requirements.txt
```
- Go into ValueLearningInMOMDP.
- Create a virtual environment with Python 3.13+. We used 3.13.5 in the paper.
```sh
python3.13 -m venv .venv
source .venv/bin/activate
pip install -r full_requirements.txt
```
### Generate preference datasets (and execute the Envelope Q-learning baseline)
- FF environment:
```sh
sh script.sh -ffmo -genrt -algo pc -L 10 -expol envelope -pol envelope
```
- MVC environment:
```sh
sh script.sh -mvc -genrt -algo pc -L 10 -expol envelope -pol envelope
```

### Training
The code is not memory efficient: you need at least 16 GB of RAM (preferably 32 GB), and you should run only one of the following commands at a time (a simple sequential driver is sketched after the list).
- SVSL-P, FF:
```sh
sh script.sh -ffmo -trval -algo cpbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- PbMORL, FF:
```sh
sh script.sh -ffmo -trval -algo pbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- SVSL, FF:
```sh
sh script.sh -ffmo -trval -algo pc -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- SVSL-P, MVC:
```sh
sh script.sh -mvc -trval -algo cpbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
- PbMORL, MVC:
```sh
sh script.sh -mvc -trval -algo pbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
- SVSL, MVC:
```sh
sh script.sh -mvc -trval -algo pc -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
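Because the runs should not overlap, a small driver like the one below can queue the training commands sequentially. This is an illustrative sketch, not part of the repository; it simply shells out to the same `script.sh` invocations listed above.

```python
# Illustrative sequential driver: runs the training commands above one at a
# time, since the code is memory-hungry and runs should not overlap.
import subprocess

COMMANDS = [
    'sh script.sh -ffmo -trval -algo cpbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -ffmo -trval -algo pbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -ffmo -trval -algo pc -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -mvc -trval -algo cpbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
    'sh script.sh -mvc -trval -algo pbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
    'sh script.sh -mvc -trval -algo pc -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
]

for cmd in COMMANDS:
    print(f"Running: {cmd}")
    subprocess.run(cmd, shell=True, check=True)  # blocks until the run finishes
```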
The results can be accessed, e.g., for the first case, under the folder `results/ffmo/experiments/repr_cpbmorl_ffmo_EnvelopeClusteredPBMORL_from_envelope`.
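For a quick look at what a run produced, something like the following lists the files in an experiment folder. The internal layout depends on the algorithm, environment, and seeds, so treat this as a generic sketch rather than a description of a fixed output format.

```python
# Generic sketch: list the files produced by the first experiment above.
from pathlib import Path

exp = Path("results/ffmo/experiments/repr_cpbmorl_ffmo_EnvelopeClusteredPBMORL_from_envelope")
for path in sorted(exp.rglob("*")):
    print(path.relative_to(exp))
```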