This repository contains the source code for the paper "Learning the Value Systems of Societies with Multi-objective Preference-based Inverse Reinforcement Learning", accepted at AAMAS 2026 (OpenReview). Our algorithm, SVSL-P, observes a given MOMDP environment, a set of value labels, and preference demonstrations from a (here, simulated) society of diverse agents with different value systems (multi-objective preference weights). It then simultaneously learns: a reward vector for the MOMDP that implements a value alignment specification for the given set of values; a set of up to L preference weight vectors describing the different observed preferences; a clustering of the agents into these value systems; and a weight-conditioned policy Π(s,a|W) that approximates the optimal policy for any given weight vector (in particular, those selected as the society's clusters).
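For orientation, the sketch below summarizes the four objects SVSL-P learns. It is illustrative Python only: the class and field names are placeholders and do not correspond to the repository's actual API.

```python
# Illustrative only: placeholder names, not the repository's API.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class SVSLPResult:
    # One learned reward function per value label: R_v(s, a) -> float
    value_rewards: List[Callable[[object, object], float]]
    # Up to L preference weight vectors, one per learned value system (cluster)
    cluster_weights: List[Sequence[float]]
    # agent_assignment[i] = index of the value system agent i is clustered into
    agent_assignment: List[int]
    # Weight-conditioned policy Π(s, a | W): probability of action a in state s
    policy: Callable[[object, object, Sequence[float]], float]
```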
The repository also includes the other algorithms used in the paper's evaluation, namely Envelope Q-learning from MORL-baselines, a modification of our previous algorithm Value Learning From Preferences, and a custom implementation of Preference-based Multi-Objective Reinforcement Learning.
This is the development branch. The version of the code accepted at AAMAS 2026 is available in the branch AAMAS2026 and in the supplementary material of the OpenReview version.
### If not modifying the full code
- Choose an empty main folder and go inside it.
- Create a virtual environment with Python 3.13+. We used 3.13.5 in the paper.
```sh
python3.13 -m venv .venv
source .venv/bin/activate
```
- Clone the following repositories inside the main folder.
- MORL baselines fork.
- Mushroom RL fork. Then make sure to select the branch "andres-dev":
```sh
cd mushroom-rl-kz
git checkout andres-dev
cd ..
```
- Clone this repository in the main folder:
```sh
git clone https://github.com/andresh26-uam/ValueLearningInMOMDP.git
cd ValueLearningInMOMDP
```
- Install packages:
```sh
pip install ../mushroom-rl-kz
pip install ../morl-baselines-reward
```
- Requirements:
```sh
pip install -r full_requirements.txt
```
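Optionally, check that the two forks are importable before continuing. This assumes they keep the upstream package names (`mushroom_rl` and `morl_baselines`); adjust the imports if the forks rename them.

```python
# Sanity check: assumes the forks keep the upstream import names.
import mushroom_rl       # installed from ../mushroom-rl-kz
import morl_baselines    # installed from ../morl-baselines-reward

print("Forked dependencies imported successfully.")
```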
### If modifying the full code
- Perform steps 1-3 from "If not modifying the full code" above (main folder, virtual environment, and cloning the forks).
- Clone the following repositories in folder F (the main folder):
```sh
cd ..  # if needed, to get back to F
```
- Clone: Baraacuda. Then make sure to select the branch "andres-dev":
```sh
cd baraacuda
git checkout andres-dev
cd ..
```
- Clone: Imitation fork.
- Requirements. Remove or comment out lines 89 and 95 in full_requirements.txt, then install:
```sh
pip install -r full_requirements.txt
```
- Go into ValueLearningInMOMDP.
- Create a virtual environment with Python 3.13+. We used 3.13.5 in the paper.
```sh
python3.13 -m venv .venv
source .venv/bin/activate
pip install -r full_requirements.txt
```
### Generate preference datasets (and execute the Envelope Q-learning baseline)
- FF environment:
```sh
sh script.sh -ffmo -genrt -algo pc -L 10 -expol envelope -pol envelope
```
- MVC environment:
```sh
sh script.sh -mvc -genrt -algo pc -L 10 -expol envelope -pol envelope
```

### Training
The code is not memory efficient: you need at least 16 GB of RAM (preferably 32 GB), and you should run only one of the following commands at a time (a simple sequential driver is sketched after the list).
- SVSL-P, FF:
```sh
sh script.sh -ffmo -trval -algo cpbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- PbMORL, FF:
```sh
sh script.sh -ffmo -trval -algo pbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- SVSL, FF:
```sh
sh script.sh -ffmo -trval -algo pc -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34
```
- SVSL-P, MVC:
```sh
sh script.sh -mvc -trval -algo cpbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
- PbMORL, MVC:
```sh
sh script.sh -mvc -trval -algo pbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
- SVSL, MVC:
```sh
sh script.sh -mvc -trval -algo pc -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35
```
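Because the runs should not overlap, a small driver like the one below can queue the training commands sequentially. This is an illustrative sketch, not part of the repository; it simply shells out to the same `script.sh` invocations listed above.

```python
# Illustrative sequential driver: runs the training commands above one at a
# time, since the code is memory-hungry and runs should not overlap.
import subprocess

COMMANDS = [
    'sh script.sh -ffmo -trval -algo cpbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -ffmo -trval -algo pbmorl -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -ffmo -trval -algo pc -L 10 -prefix "repr" -pdata "" -seeds 25,26,27,28,29,30,31,32,33,34',
    'sh script.sh -mvc -trval -algo cpbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
    'sh script.sh -mvc -trval -algo pbmorl -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
    'sh script.sh -mvc -trval -algo pc -L 15 -prefix "repr" -pdata "" -seeds 26,27,28,29,30,31,32,33,35',
]

for cmd in COMMANDS:
    print(f"Running: {cmd}")
    subprocess.run(cmd, shell=True, check=True)  # blocks until the run finishes
```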
The results can be accessed, e.g., for the first case, under the folder `results/ffmo/experiments/repr_cpbmorl_ffmo_EnvelopeClusteredPBMORL_from_envelope`.
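For a quick look at what a run produced, something like the following lists the files in an experiment folder. The internal layout depends on the algorithm, environment, and seeds, so treat this as a generic sketch rather than a description of a fixed output format.

```python
# Generic sketch: list the files produced by the first experiment above.
from pathlib import Path

exp = Path("results/ffmo/experiments/repr_cpbmorl_ffmo_EnvelopeClusteredPBMORL_from_envelope")
for path in sorted(exp.rglob("*")):
    print(path.relative_to(exp))
```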