Multiturn-RL

Multiturn RL experiments

Setup

$ uv sync

Run

$ cd experiments
$ ./train.sh

TODO

- Make custom chat scheduler for terminal environment
- Create custom reward functions for various tasks (SWE-Bench, TACO, math, etc.)
- Collect data
  - ToRL dataset?
- Run experiments
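
As a starting point for the reward-function item, here is a minimal sketch of a rule-based reward for math tasks. It is an illustration only, not code from this repo: the names `extract_boxed_answer` and `math_reward` are hypothetical, and it assumes the model writes its final answer as `\boxed{...}` and that exact string match against the ground truth is good enough.

```python
import re
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Assumption: answers do not contain nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def math_reward(response: str, ground_truth: str) -> float:
    """Hypothetical binary reward: 1.0 on exact match, else 0.0."""
    answer = extract_boxed_answer(response)
    if answer is None:
        # No parsable final answer in the response.
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0
```

Exact string matching is deliberately strict; equivalent forms such as `1/2` and `0.5` would score 0.0, so a real reward would likely need answer normalization.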

Data

1. Math
- NuminaMath
- AIME
- MATH
- etc.

2. Coding
- TACO

3. SWE
- SWE-Bench

Notes:

Should we do regular RL training before integrating with the terminal? It seems like it will take a while before the model starts using the terminal.
