diff --git a/projects/thinking_midtraining/README.md b/projects/thinking_midtraining/README.md
index 6f62edf..d6ac919 100644
--- a/projects/thinking_midtraining/README.md
+++ b/projects/thinking_midtraining/README.md
@@ -58,7 +58,11 @@
 $\tilde{\mathcal{D}} = \{\tilde{c}^1, \tilde{c}^2, \ldots, \tilde{c}^N\}$.

 ### 2) Thinking SFT Mid-training

-We perform supervised fine-tuning (SFT) mid-training on half of the augmented corpus, which we call $\tilde{\mathcal{D}}_{\text{SFT}}$, using standard next-token prediction. Given a base model $\mathcal{M}_{\text{base}}$ parameterized by $\theta$, we optimize the following objective: $\mathcal{L}_{\text{SFT}}(\theta) = -\mathbb{E}_{\tilde{c}^i \sim \tilde{\mathcal{D}}_{\text{SFT}}} \left[ \sum_{j=1}^{|\tilde{c}^i|} \log P_\theta(\tilde{c}^i_j \mid \tilde{c}^i_{<j}) \right]$
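The next-token prediction objective in the hunk above can be sketched numerically. The snippet below is a minimal, framework-free illustration (NumPy instead of the training stack this repo actually uses); the function name `sft_nll` and the toy shapes are assumptions, not part of the project's code. It computes $-\sum_j \log P_\theta(\tilde{c}^i_j \mid \tilde{c}^i_{<j})$, averaged over positions, for one tokenized document.

```python
import numpy as np

def sft_nll(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Average next-token negative log-likelihood for one document.

    logits: (T, V) array; logits[j] is the model's prediction for the
            token at position j+1 (i.e. after consuming tokens[:j+1]).
    tokens: (T+1,) int array of token ids for one augmented document.
    """
    # numerically stable log-softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # gather log P(c_j | c_<j): position j's logits score token j+1
    target_logp = log_probs[np.arange(len(tokens) - 1), tokens[1:]]
    return float(-target_logp.mean())

# Toy check: with all-zero logits the model is uniform over V tokens,
# so the per-token NLL equals log(V).
vocab_size = 5
logits = np.zeros((3, vocab_size))
tokens = np.array([0, 1, 2, 3])
loss = sft_nll(logits, tokens)
print(loss)  # log(5) ≈ 1.609
```

In a real run the per-document sums would be accumulated over mini-batches drawn from $\tilde{\mathcal{D}}_{\text{SFT}}$ and minimized with respect to $\theta$; the sketch only shows the loss for a single sequence.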