MultiturnRL/Multiturn-RL-SWE

Multiturn-RL

Multiturn RL experiments

Setup

$ uv sync

Run

$ cd experiments
$ ./train.sh

TODO

  • Make custom chat scheduler for terminal environment
  • Create custom reward functions for various tasks (SWE-Bench, TACO, math, etc.)
  • Collect data
    • ToRL dataset?
  • Run experiments
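The math reward in the TODO list could start as a simple binary check. Below is a minimal sketch. It assumes the policy is prompted to put its final answer in `\boxed{...}` and that an exact string match after light normalization is acceptable. `extract_boxed`, `normalize`, and `math_reward` are hypothetical names, not functions that exist in this repo.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the contents of the LAST \boxed{...} in `text`,
    # tracking brace depth so nested groups like \frac{1}{2} survive.
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    # Light normalization (an assumption): drop $ signs, all whitespace,
    # and a trailing period. No symbolic equivalence checking.
    ans = ans.replace("$", "")
    ans = re.sub(r"\s+", "", ans)
    return ans.rstrip(".")


def math_reward(completion: str, reference: str) -> float:
    # Binary reward: 1.0 if the final boxed answer matches the reference, else 0.0.
    pred = extract_boxed(completion)
    if pred is None:
        return 0.0
    return 1.0 if normalize(pred) == normalize(reference) else 0.0


print(math_reward("So the answer is \\boxed{42}.", "42"))
```

Exact match is cheap and hard to game, but it scores equivalent forms such as `0.5` and `\frac{1}{2}` as wrong. A symbolic equivalence check would be a natural next step for datasets like NuminaMath and MATH.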

Data

  1. Math
    • NuminaMath
    • AIME
    • MATH
    • etc.
  2. Coding
    • TACO
  3. SWE
    • SWE-Bench

Notes:

  • Should we do regular RL training before integrating with the terminal? It seems likely to take a while before the model starts using the terminal.
