diff --git a/projects/thinking_midtraining/README.md b/projects/thinking_midtraining/README.md
index 82a41f3..6f62edf 100644
--- a/projects/thinking_midtraining/README.md
+++ b/projects/thinking_midtraining/README.md
@@ -58,7 +58,7 @@ $\tilde{\mathcal{D}} = \{\tilde{c}^1, \tilde{c}^2, \ldots, \tilde{c}^N\}$.
 
 ### 2) Thinking SFT Mid-training
 
-We perform supervised fine-tuning (SFT) mid-training on half of the augmented corpus, which we call $$\tilde{\mathcal{D}}\_{\text{SFT}}$$, using standard next-token prediction. Given a base model $$\mathcal{M}\_{\text{base}}$$ parameterized by $\theta$, we optimize the following objective: $\mathcal{L}\_{\text{SFT}}(\theta) = -\mathbb{E}\_{\tilde{c}^i \sim \tilde{\mathcal{D}}\_{\text{SFT}}} \left[ \sum_{j=1}^{|\tilde{c}^i|} \log P_\theta(\tilde{c}^i_j \mid \tilde{c}^i_{<j}) \right]$
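
The objective in the removed line is standard next-token prediction: each token of an augmented document $\tilde{c}^i$ is predicted from its prefix. A minimal PyTorch sketch of that loss is below; the function name and tensor shapes are illustrative, not taken from this repository.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Next-token prediction loss, i.e. -sum_j log P(c_j | c_<j).

    logits:    (batch, seq_len, vocab) model outputs for a batch of documents
    input_ids: (batch, seq_len) token ids of the same documents
    """
    # Shift by one position: the logits at position j score the token at j+1,
    # so position j's prediction is conditioned only on the prefix c_<(j+1).
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    # cross_entropy applies log-softmax and averages the negative log-likelihood.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```

In practice this is the same computation a causal LM's built-in loss performs when labels are the input ids shifted by one; writing it out makes the $P_\theta(\tilde{c}^i_j \mid \tilde{c}^i_{<j})$ term in the objective explicit.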