The current plan is that, for each model variant, the following runs will be executed in order:
- 3b run with 300b tokens (cc only, using FineWeb-Edu)
- 3b run with 1T tokens (mixed data, using FineWeb-Edu + Dolma v1.7)
- 7b run with 2T-4T tokens (TBD)
Models will be evaluated and the results reported back in this repo.
We will keep a running log of the experiments and rough schedules (which depend on GPU availability).