Replicating 'Reinforcement Learning from Language Feedback' (Klissarov et al., 2026) — Gemma 3 12B, GRPO, multi-turn teacher-student training on Omni-MATH
Updated Apr 5, 2026 · Python