Releases · databricks/compose-rl
v0.8.0
What's Changed
- Removed Compatibility directories by @jdchang1 in #98
- Prompts per iteration by @bcui-db in #94
- Allow reference checkpoint to be both callback and load_path by @dakinggg in #99
- Added MessagesDataloader so we can just use `messages` in our datasets rather than tokenized inputs by @SeanKski in #92
- Add logs around reward process pool recreation by @dakinggg in #101
- Revert timeout by @dakinggg in #104
- Added proper temperature scaling of logits by @jdchang1 in #105
- Wensun/apo by @wensun in #96
- vLLM Chat Conversion by @jdchang1 in #102
- hotfix by @jdchang1 in #108
- STEM Benchmarks and verifiers by @gupta-abhay in #95
- Add prefix caching by @gupta-abhay in #107
- Update codeowners by @dakinggg in #113
- Changes for accumulate flag by @gupta-abhay in #111
- Adding Token Counter for Online RL by @rithwik-db in #110
- [Experimental] Use Single Controller design with unit test by @bowenyang008 in #114
- Update Code Owners by @gupta-abhay in #116
- [Experimental] Refactor PPO callback to move its logic to single controller by @bowenyang008 in #115
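For context on the temperature-scaling fix above (#105): sampling temperature divides the logits before the softmax, so lower temperatures sharpen the distribution toward the argmax and higher ones flatten it. A minimal sketch of the math, not the code from that PR:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: divide logits by T *before* normalizing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Lower T concentrates probability mass on the top logit;
# higher T flattens the distribution toward uniform.
p_sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)
p_flat = softmax([2.0, 1.0, 0.0], temperature=2.0)
```

Applying the temperature to the training-side logits the same way keeps the log-probabilities used for the RL loss consistent with how the samples were actually drawn.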
New Contributors
- @SeanKski made their first contribution in #92
- @wensun made their first contribution in #96
- @rithwik-db made their first contribution in #110
Full Changelog: v0.7.0...v0.8.0
v0.7.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
- Allow DPO reference model to be loaded from LoadCheckpoint callback by @dakinggg in #80
- Set default value as this is only used for local debugging by @gupta-abhay in #84
- Add More Codeowners by @bcui-db in #86
- Fix reward timeouts by @dakinggg in #87
- Remove llama models as defaults by @gupta-abhay in #88
- Skip initial vLLM weight load. by @dakinggg in #89
- Fix memory leak by @dakinggg in #90
- Renaming and Organization of RL algorithms in preparation for Development by @jdchang1 in #83
- Causal classifier by @alextrott16 in #8
- Vllm import Hotfix by @jdchang1 in #91
- Fixing entropy calculation by @abaheti95 in #85
Full Changelog: v0.5.0...v0.7.0
v0.6.0
What's Changed
- Added verified answers to the logging by @abaheti95 in #63
- Adding GPU CI back by @dakinggg in #64
- Fix args propagation by @dakinggg in #65
- Fix weight propagation by @bcui-db in #66
- Microbatching fixes by @dakinggg in #71
- Make myself admin by @gupta-abhay in #72
- Update ci-testing to latest version by @dakinggg in #70
- Move generate to be done via `prompt_token_ids` by @bcui-db in #73
- Add GRPO assert that we need more than one generation by @bcui-db in #74
- Adding a Math format verifier by @gupta-abhay in #75
- Pin foundry version and hash to prepare foundry upgrade by @bowenyang008 in #76
- Bump to torch 2.7 by @bowenyang008 in #77
New Contributors
- @bowenyang008 made their first contribution in #76
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's new
- Online RL Algorithms: We now support PPO and GRPO for online RL training
- RL with Verifiable Rewards: We've added support for verifiable rewards with online RL algorithms, along with evaluations during training.
- Registries for extensible and composable design
- Robust vLLM support for efficient inference during online RL training
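The PPO and GRPO support above can be illustrated with GRPO's core idea: rather than a learned value baseline, each response's advantage is its reward normalized against the other generations sampled for the same prompt (which is why more than one generation per prompt is required). A minimal sketch of that computation, illustrative only and not compose-rl's implementation:

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / group std.

    `group_rewards` holds the rewards of all generations sampled
    for a single prompt; GRPO needs at least two of them.
    """
    assert len(group_rewards) > 1, "GRPO needs >1 generation per prompt"
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Verifiable rewards (e.g. a math checker returning 0/1) plug in directly:
# correct generations get positive advantages, incorrect ones negative.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

With verifiable rewards the group baseline is what carries the learning signal: a prompt where every generation scores the same contributes no gradient.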
What's Changed
- Update version to match latest release by @dakinggg in #25
- attach vllm engines to state by @vchiley in #20
- Adding warning for truncating preferences by @bcui-db in #27
- Add load planner for PPO by @bcui-db in #18
- Auto set TP size by @vchiley in #29
- Enable Masking of EOS tokens list by @bcui-db in #31
- Accommodate typing changes for transformers 4.51 by @dakinggg in #33
- Dataloader changes for RLVR by @gupta-abhay in #21
- Moved the long seq fix on top of main by @abaheti95 in #34
- Changes for better reward validation by @gupta-abhay in #35
- Inheritance fix by @gupta-abhay in #37
- Simple change by @gupta-abhay in #40
- K generation per prompt by @abaheti95 in #36
- Merge ReadMEs for easier parsing by @gupta-abhay in #41
- Enable hf token for restricted data access by @gupta-abhay in #42
- Enable different KL estimators for training by @gupta-abhay in #44
- update readme by @bcui-db in #45
- Upgrade yapf version by @gupta-abhay in #46
- Fast inference w/ single vllm generate call per PPO iter by @abaheti95 in #43
- Addressing cleanup comments on fast vLLM PR by @abaheti95 in #49
- Improving online RL logging by @abaheti95 in #50
- Update vLLM, enables single node Tensor parallel sizes (1, 2, 4, 8) by @bcui-db in #48
- Unified kl estimators by @gupta-abhay in #53
- Add codeowners by @gupta-abhay in #54
- Add `chat` functionality to vLLM actor by @bcui-db in #55
- Exposing average log prob flag by @abaheti95 in #56
- Modifying codeowners by @gupta-abhay in #57
- GRPO implementation by @abaheti95 in #51
- Registries for extending compose-rl by @gupta-abhay in #47
- Simple tests for new registries by @gupta-abhay in #58
- Timeout change by @gupta-abhay in #59
- Fix label generation for MATH to match verification by @gupta-abhay in #60
- Changes for optional tokens list by @gupta-abhay in #61
- Minor changes for dtype and docstrings by @gupta-abhay in #62
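The KL-estimator bullets (#44, #53) refer to per-token Monte Carlo estimators of the KL penalty between the policy and the reference model. Two standard choices, using Schulman's k1/k3 naming, can be sketched as follows; this is illustrative, not compose-rl's code:

```python
import math

def k1(logp_policy, logp_ref):
    """Naive estimator of KL(policy || ref): the log-ratio.

    Unbiased under samples from the policy, but high variance and
    can go negative on individual tokens.
    """
    return logp_policy - logp_ref

def k3(logp_policy, logp_ref):
    """Lower-variance unbiased estimator: (r - 1) - log r, r = ref/policy.

    Non-negative for every sample (since e^x - 1 - x >= 0), which makes
    the per-token KL penalty better behaved.
    """
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - 1.0 - log_ratio
```

Both have the same expectation under samples from the policy; swapping between them trades per-token variance against bias-free simplicity, which is presumably what making the estimator configurable is for.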
New Contributors
- @vchiley made their first contribution in #20
- @gupta-abhay made their first contribution in #21
Full Changelog: v0.4.0...v0.5.0
v0.4.0
v0.3.0
What's Changed
- Force float32 when loading transformers configs by @dakinggg in #11
- Torch 2.6 Version Bump by @abaheti95 in #13
- Preference RL refactor by @abaheti95 in #12
- Standardized the `sequence_id` batch variable to match llm-foundry by @abaheti95 in #14
- Standardized attention mask field in DPO, RM and fine-grained preferences by @abaheti95 in #15
- Updating sequence length usage by @bcui-db in #17
- Separate inference engine by @bcui-db in #16
- Upper bound vllm by @dakinggg in #19
- Update setuptools version by @irenedea in #22
New Contributors
- @dakinggg made their first contribution in #11
- @abaheti95 made their first contribution in #13
- @irenedea made their first contribution in #22
Full Changelog: v0.2.1...v0.3.0