o1 Replication

An effort to replicate o1.

TODO:

Clean up the code and separate data from code.
Make all scripts take CLI arguments and move them into a main scripts/ folder.
Split the project into separate parts, using proper Python modules throughout.
Use classes to make things more modular and safe, which should make the research more replicable and reliable overall.
Build a general pipeline for generating SFT data, training on it, and so on.
Build a pipeline for the RL work as well, with many reward functions for different domains.
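One way the "many reward functions for different domains" item could be organised is a small registry keyed by domain name, exposed through a CLI entry point. The sketch below is illustrative only: the names `REWARDS`, `register`, `score`, and `main`, the two example domains, and the flag names are all assumptions and do not describe the repository's actual code.

```python
import argparse
import re
from typing import Callable, Dict, List, Optional

# Hypothetical type: a reward function scores a completion against a reference.
RewardFn = Callable[[str, str], float]

# Hypothetical registry mapping a domain name to its reward function.
REWARDS: Dict[str, RewardFn] = {}


def register(domain: str) -> Callable[[RewardFn], RewardFn]:
    """Decorator that stores a reward function under the given domain name."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARDS[domain] = fn
        return fn
    return decorator


@register("math")
def math_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last number in the completion equals the reference."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference.strip() else 0.0


@register("exact")
def exact_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the completion matches the reference after trimming."""
    return 1.0 if completion.strip() == reference.strip() else 0.0


def score(domain: str, completion: str, reference: str) -> float:
    """Look up the reward function for a domain and apply it."""
    if domain not in REWARDS:
        raise KeyError(f"no reward function registered for domain {domain!r}")
    return REWARDS[domain](completion, reference)


def main(argv: Optional[List[str]] = None) -> float:
    """CLI entry point; a script in scripts/ could simply call main()."""
    parser = argparse.ArgumentParser(description="Score one completion.")
    parser.add_argument("--domain", choices=sorted(REWARDS), required=True)
    parser.add_argument("--completion", required=True)
    parser.add_argument("--reference", required=True)
    args = parser.parse_args(argv)
    value = score(args.domain, args.completion, args.reference)
    print(value)
    return value
```

Calling `main(["--domain", "math", "--completion", "so x = 7", "--reference", "7"])` scores a single example. Adding a new domain under this design only requires one more decorated function, which keeps the RL pipeline itself unchanged.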

Folders and files (18 commits):
rl
.gitignore
.gitmodules
README.md
main.py
pyproject.toml
uv.lock
