Releases · databricks/compose-rl
v0.8.0
What's Changed
- Removed Compatibility directories by @jdchang1 in #98
- Prompts per iteration by @bcui-db in #94
- Allow reference checkpoint to be both callback and load_path by @dakinggg in #99
- Added MessagesDataloader so we can just use `messages` in our datasets rather than tokenized inputs by @SeanKski in #92
- Add logs around reward process pool recreation by @dakinggg in #101
- Revert timeout by @dakinggg in #104
- Added proper temperature scaling of logits by @jdchang1 in #105
- Wensun/apo by @wensun in #96
- vLLM Chat Conversion by @jdchang1 in #102
- hotfix by @jdchang1 in #108
- STEM Benchmarks and verifiers by @gupta-abhay in #95
- Add prefix caching by @gupta-abhay in #107
- Update codeowners by @dakinggg in #113
- Changes for accumulate flag by @gupta-abhay in #111
- Adding Token Counter for Online RL by @rithwik-db in #110
- [Experimental] Use Single Controller design with unit test by @bowenyang008 in #114
- Update Code Owners by @gupta-abhay in #116
- [Experimental] Refactor PPO callback to move its logic to single controller by @bowenyang008 in #115
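For context on the temperature-scaling fix above (#105): sampling temperature divides the logits before the softmax, so lower temperatures sharpen the distribution toward the argmax and higher ones flatten it. A minimal sketch of the math, not the code from that PR:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: divide logits by T *before* normalizing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Lower T concentrates probability mass on the top logit;
# higher T flattens the distribution toward uniform.
p_sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)
p_flat = softmax([2.0, 1.0, 0.0], temperature=2.0)
```

Applying the temperature to the training-side logits the same way keeps the log-probabilities used for the RL loss consistent with how the samples were actually drawn.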
New Contributors
- @SeanKski made their first contribution in #92
- @wensun made their first contribution in #96
- @rithwik-db made their first contribution in #110
Full Changelog: v0.7.0...v0.8.0
v0.7.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
- Allow DPO reference model to be loaded from LoadCheckpoint callback by @dakinggg in #80
- Set default value as this is only used for local debugging by @gupta-abhay in #84
- Add More Codeowners by @bcui-db in #86
- Fix reward timeouts by @dakinggg in #87
- Remove llama models as defaults by @gupta-abhay in #88
- Skip initial vLLM weight load. by @dakinggg in #89
- Fix memory leak by @dakinggg in #90
- Renaming and Organization of RL algorithms in preparation for Development by @jdchang1 in #83
- Causal classifier by @alextrott16 in #8
- Vllm import Hotfix by @jdchang1 in #91
- Fixing entropy calculation by @abaheti95 in #85
Full Changelog: v0.5.0...v0.7.0
v0.6.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
New Contributors
- @bowenyang008 made their first contribution in #76
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's new
- Online RL Algorithms: We now support PPO and GRPO for online RL training
- RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training.
- Registries for extensible and composable design
- Robust vLLM support for efficient inference during online RL training
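The PPO and GRPO support above can be illustrated with GRPO's core idea: rather than a learned value baseline, each response's advantage is its reward normalized against the other generations sampled for the same prompt (which is why more than one generation per prompt is required). A minimal sketch of that computation, illustrative only and not compose-rl's implementation:

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / group std.

    `group_rewards` holds the rewards of all generations sampled
    for a single prompt; GRPO needs at least two of them.
    """
    assert len(group_rewards) > 1, "GRPO needs >1 generation per prompt"
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Verifiable rewards (e.g. a math checker returning 0/1) plug in directly:
# correct generations get positive advantages, incorrect ones negative.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

With verifiable rewards the group baseline is what carries the learning signal: a prompt where every generation scores the same contributes no gradient.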
What's Changed
- Update version to match latest release by @dakinggg in #25
- attach vllm engines to state by @vchiley in #20
- Adding warning for truncating preferences by @bcui-db in #27
- Add load planner for PPO by @bcui-db in #18
- Auto set TP size by @vchiley in #29
- Enable Masking of EOS tokens list by @bcui-db in #31
- Accommodate typing changes for transformers 4.51 by @dakinggg in #33
- Dataloader changes for RLVR by @gupta-abhay in #21
- Moved the long seq fix on top of main by @abaheti95 in #34
- Changes for better reward validation by @gupta-abhay in #35
- Inheritance fix by @gupta-abhay in #37
- Simple change by @gupta-abhay in #40
- K generation per prompt by @abaheti95 in #36
- Merge ReadMEs for easier parsing by @gupta-abhay in #41
- Enable hf token for restricted data access by @gupta-abhay in #42
- Enable different KL estimators for training by @gupta-abhay in #44
- update readme by @bcui-db in #45
- Upgrade yapf version by @gupta-abhay in #46
- Fast inference w/ single vllm generate call per PPO iter by @abaheti95 in #43
- Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
- Improving online RL logging by @abaheti95 in #50
- Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
- Unified kl estimators by @gupta-abhay in #53
- Add codeowners by @gupta-abhay in #54
- Add `chat` functionality to vLLM actor by @bcui-db in #55
- Exposing average log prob flag by @abaheti95 in #56
- Modifying codeowners by @gupta-abhay in #57
- GRPO implementation by @abaheti95 in #51
- Registries for extending compose-rl by @gupta-abhay in #47
- Simple tests for new registries by @gupta-abhay in #58
- Timeout change by @gupta-abhay in #59
- Fix label generation for MATH to match verification by @gupta-abhay in #60
- Changes for optional tokens list by @gupta-abhay in #61
- Minor changes for dtype and docstrings by @gupta-abhay in #62
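The KL-estimator bullets (#44, #53) refer to per-token Monte Carlo estimators of the KL penalty between the policy and the reference model. Two standard choices, using Schulman's k1/k3 naming, can be sketched as follows; this is illustrative, not compose-rl's code:

```python
import math

def k1(logp_policy, logp_ref):
    """Naive estimator of KL(policy || ref): the log-ratio.

    Unbiased under samples from the policy, but high variance and
    can go negative on individual tokens.
    """
    return logp_policy - logp_ref

def k3(logp_policy, logp_ref):
    """Lower-variance unbiased estimator: (r - 1) - log r, r = ref/policy.

    Non-negative for every sample (since e^x - 1 - x >= 0), which makes
    the per-token KL penalty better behaved.
    """
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - 1.0 - log_ratio
```

Both have the same expectation under samples from the policy; swapping between them trades per-token variance against bias-free simplicity, which is presumably what making the estimator configurable is for.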
New Contributors
- @vchiley made their first contribution in #20
- @gupta-abhay made their first contribution in #21
Full Changelog: v0.4.0...v0.5.0
v0.4.0
v0.3.0
What's Changed
- Force float32 when loading transformers configs by @dakinggg in #11
- Torch 2.6 Version Bump by @abaheti95 in #13
- Preference RL refactor by @abaheti95 in #12
- Standardized the `sequence_id` batch variable to match llm-foundry by @abaheti95 in #14
- Standardized attention mask field in DPO, RM and fine-grained preferences by @abaheti95 in #15
- Updating sequence length usage by @bcui-db in #17
- Separate inference engine by @bcui-db in #16
- Upper bound vllm by @dakinggg in #19
- Update setuptools version by @irenedea in #22
New Contributors
- @dakinggg made their first contribution in #11
- @abaheti95 made their first contribution in #13
- @irenedea made their first contribution in #22
Full Changelog: v0.2.1...v0.3.0