
The current plan is that, for each model variant, the following runs will be executed in order:

  1. 3B run with 300B tokens (Common Crawl only, using FineWeb-Edu)
  2. 3B run with 1T tokens (mixed data, using FineWeb-Edu + Dolma v1.7)
  3. 7B run with 2T-4T tokens (TBD)

Models will be evaluated, and results will be reported back in this repo.
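As an illustration only, the plan above can be summarized in a small config-style structure like the sketch below. The `PlannedRun` dataclass and its fields are hypothetical and do not correspond to any actual training configuration in this repo.

```python
from dataclasses import dataclass

# Hypothetical summary of the planned runs; names and fields are
# illustrative only, not part of the real training setup.
@dataclass
class PlannedRun:
    model_size: str      # e.g. "3B", "7B"
    token_budget: str    # total training tokens
    datasets: list[str]  # data sources for the run

PLANNED_RUNS = [
    PlannedRun("3B", "300B",  ["FineWeb-Edu"]),                # Common Crawl only
    PlannedRun("3B", "1T",    ["FineWeb-Edu", "Dolma v1.7"]),  # mixed data
    PlannedRun("7B", "2T-4T", ["TBD"]),                        # data mix to be decided
]

for run in PLANNED_RUNS:
    print(f"{run.model_size} model, {run.token_budget} tokens, data: {', '.join(run.datasets)}")
```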

We will keep a running log of the experiments and rough schedules here (subject to GPU availability).