[FEAT][TRAINING] Add fine-tuning support for EAGLE3 from HF Hub #268

VincentG1234 wants to merge 12 commits into vllm-project:main
Conversation
force-pushed from ae09b3b to a36d8e9
Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)! My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensor files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding). Additionally, we will want at least one test that loads an existing checkpoint from HF before merging. Let me know if I can help!
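For illustration, a hypothetical sketch of the kind of single-tensor helper the existing utils lean toward, which the new loading code could share a core with (the name, body, and file layout are assumptions, not the actual speculators API):

```python
from pathlib import Path

from safetensors import safe_open


def load_single_tensor(checkpoint_dir: str, tensor_name: str):
    """Read one tensor (e.g. 'd2t', 't2d', or an lm_head weight) from a
    safetensors checkpoint without materializing the whole state dict."""
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        with safe_open(str(shard), framework="pt") as f:
            if tensor_name in f.keys():
                return f.get_tensor(tensor_name)
    raise KeyError(f"{tensor_name!r} not found in {checkpoint_dir}")
```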
dsikka left a comment
Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the llama3 draft model?
I haven't conducted a full fine-tuning yet. My aim is to enhance the model in French while maintaining performance in English. I will share my results!
Thank you for the feedback! I'll fix this very soon.

This pull request has merge conflicts that must be resolved before it can be merged.
force-pushed from 4c5de19 to f09a235
Hello guys @dsikka @fynnsu, just to keep you in the loop. I come with good news:

Have a nice weekend!
Pull request overview
Adds fine-tuning (“warm start”) support for EAGLE3 training by allowing initialization from a pretrained EAGLE3 checkpoint (local dir or HuggingFace Hub), including automatic loading of safetensors weights and extraction of d2t/t2d vocab mappings.
Changes:
- Add `--pretrained-model-path` to `scripts/train.py` and load pretrained config/weights when provided.
- Implement `load_full_state_dict()` and `extract_vocab_mappings()` utilities for single/sharded safetensors and Hub download.
- Add unit + integration tests covering pretrained loading, mapping extraction, and weight loading behavior.
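As a rough illustration of what these two helpers cover, a minimal sketch (only the function names come from the PR; the bodies, the `d2t`/`t2d` key names, and the Hub-download behavior are assumptions):

```python
from pathlib import Path

import torch
from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def load_full_state_dict(model_path: str) -> dict[str, torch.Tensor]:
    """Load all safetensors weights from a local dir or an HF Hub repo id,
    handling both single-file and sharded checkpoints."""
    local_dir = Path(model_path)
    if not local_dir.exists():
        # Treat the path as an HF Hub repo id and fetch only the weights.
        local_dir = Path(snapshot_download(model_path, allow_patterns=["*.safetensors"]))
    state_dict: dict[str, torch.Tensor] = {}
    for shard in sorted(local_dir.glob("*.safetensors")):
        state_dict.update(load_file(shard))
    return state_dict


def extract_vocab_mappings(state_dict: dict[str, torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
    """Pop the d2t/t2d draft<->target vocab mappings out of the state dict
    so they can be passed to the draft model constructor separately."""
    return state_dict.pop("d2t"), state_dict.pop("t2d")
```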
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `scripts/train.py` | Adds pretrained initialization flow, mapping/config initialization helpers, and weight-loading logging/validation. |
| `src/speculators/utils/loading.py` | Adds full safetensors state_dict loading (single/sharded) + mapping extraction helpers. |
| `tests/unit/utils/test_loading.py` | Expands loading tests to cover new helpers and safetensors loading. |
| `tests/unit/train/test_pretrained_loading.py` | Adds unit tests for the new training-time pretrained loading helpers. |
| `tests/integration/test_pretrained_loading.py` | Adds integration tests that load a real EAGLE3 model and validate end-to-end loading. |
This pull request has merge conflicts that must be resolved before it can be merged.
force-pushed from 78339ca to 856483c
force-pushed from 856483c to 8d4adcc
force-pushed from 6b0df63 to 9ce938c
force-pushed from acf83bb to 9272916
The CI quality checks failed, despite `make quality` reporting 0 errors locally. The problem appears to be that `ruff format` is not included in the local quality checks. It seems there might be a misalignment between the local and CI environments. Or maybe there's a misuse on my part 👀
Hi @VincentG1234, as you said, there could be an environment mismatch between your local environment and our CI one. Could you try resetting your virtual environment and re-running the checks? If you're using uv:

```bash
rm -r .venv/
uv venv -p 3.10
source .venv/bin/activate
uv pip install -e .[dev]
make style    # apply auto fixes if possible
make quality  # confirm passes
```
```python
assert config.draft_vocab_size == d2t.shape[0]
model_class = SpeculatorModel.registered_model_class_from_config(config)
model = model_class(  # type: ignore[call-arg]
```
That line caused the last CI failure.

Context:
`SpeculatorModel.registered_model_class_from_config(config)` is typed as returning `type[SpeculatorModel]`. The base class `SpeculatorModel.__init__` (in src/speculators/model.py) requires `verifier` and `verifier_attachment_mode`. For EAGLE3 configs, the actual class returned is `Eagle3DraftModel` (in src/speculators/models/eagle3/core.py), whose `__init__` only takes `config`, `t2d`, and `d2t`, and passes fixed values to `super().__init__(config=..., verifier=None, verifier_attachment_mode="train_only")`.

So: calling without `verifier`/`verifier_attachment_mode` triggers mypy `[call-arg]`; calling with them triggers a runtime `TypeError` from `Eagle3DraftModel.__init__`.

Fix:
To keep it simple and avoid modifying another file, I fixed it by adding `# type: ignore[call-arg]` to this call so mypy skips the check.

Follow-up:
It would be good to keep this mismatch in mind and consider aligning the constructors in a later change, so that code using `registered_model_class_from_config()` can call the constructor in a type-safe way without ignores.
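A minimal self-contained sketch of the mismatch (classes simplified and hypothetical, not the real speculators definitions):

```python
from typing import Any


class SpeculatorModel:
    # The signature mypy sees when calling through type[SpeculatorModel].
    def __init__(self, config: Any, verifier: Any, verifier_attachment_mode: str):
        self.config = config


class Eagle3DraftModel(SpeculatorModel):
    # The concrete class takes different args and fixes the verifier ones.
    def __init__(self, config: Any, t2d: Any, d2t: Any):
        super().__init__(config=config, verifier=None,
                         verifier_attachment_mode="train_only")


def registered_model_class_from_config(config: Any) -> type[SpeculatorModel]:
    # The annotation promises only the base type, hiding the narrower
    # Eagle3DraftModel signature from callers.
    return Eagle3DraftModel


config, t2d, d2t = object(), object(), object()
model_class = registered_model_class_from_config(config)
# mypy checks this call against SpeculatorModel.__init__ and reports
# [call-arg]; at runtime it succeeds because Eagle3DraftModel is returned.
model = model_class(config=config, t2d=t2d, d2t=d2t)  # type: ignore[call-arg]
```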
Hoping we are finally good 😄
This pull request has merge conflicts that must be resolved before it can be merged.
force-pushed from 4d5abff to 0368384
force-pushed from 0368384 to a0848d6
I ran a small experiment to better understand the French performance gap and wanted to share some preliminary results. I used MT-Bench French as a benchmark. To isolate the language effect, I ran the exact same prompts in English and in French.

**Baseline**
- EN weighted acceptance
- FR weighted acceptance

This reveals a large gap between English and French speculative acceptance.

**Small targeted fine-tuning**

I fine-tuned Eagle3 on ~5k French samples (angeluriot/french_instruct), using verifier-generated assistant responses. Training took ~40 minutes on an A100 40GB.

**After fine-tuning**
- EN (no English samples seen)
- FR

French acceptance improves across all positions, especially deeper ones, while English performance remains strong despite seeing no English data.

This is still a small and somewhat superficial experiment, but the results are encouraging and suggest that light language-specific fine-tuning can noticeably improve speculative decoding efficiency and reduce the language gap. I can share a Colab notebook if that would be useful.

EDIT: here is the Colab: Link NB Colab

If helpful, I can also iterate on the PR, for example by reducing diffs, simplifying parts of the logic, or improving tests, whatever makes review easier and the code more robust.
This commit introduces the ability to fine-tune pretrained EAGLE3 models from HuggingFace Hub or local checkpoints using the --pretrained-model-path argument in train.py.

Key features:
- Automatic vocabulary mapping (d2t/t2d) extraction from pretrained models
- Automatic draft vocabulary size detection
- Support for both HuggingFace Hub and local model paths
- Improved error handling and validation for vocabulary mappings
- Network-independent unit tests

Testing:
- Add comprehensive unit tests for pretrained loading utilities
- Add integration tests for end-to-end pretrained model loading
- Add tests for vocabulary mapping validation

Documentation:
- Update scripts/README.md with fine-tuning examples and usage
- Update train.py docstrings and argument descriptions

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Move load_model_weights from train.py to loading.py as load_pretrained_weights
- Make integration tests algorithm-agnostic (use registry instead of Eagle3DraftModel)
- Add TYPE_CHECKING imports to avoid circular dependencies
- Update unit tests to reflect the new function location
- Add type hints and assertions for better type safety

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Fix missing closing parenthesis in train.py parser
- Remove trailing whitespace in loading.py
- Remove unused imports in tests
- Reorganize import statements

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…E2E weight sanity
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…ained loading
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
force-pushed from a0848d6 to cefe679
Hi @VincentG1234, sorry it's been a hectic couple of weeks working on today's release and I haven't had a good opportunity to review this. I appreciate you working on this, but looking into it further, I realized there is some additional complexity around weight initialization: ensuring that weights are distributed correctly when loading from a checkpoint vs. loading from pretrained vs. setting up from scratch. In addition, we previously supported transformers; I think that will largely cover the scope of this PR, and it also allows us to handle loading models from HF and disk without needing to directly manage the model repo downloads. If you have a moment, please take a look at #333 and let me know if it's missing anything.
Feature implemented in PR #333.
Summary
Adds support for fine-tuning EAGLE3 models from pretrained checkpoints, enabling users to initialize training from existing models stored locally or on HuggingFace Hub.
Changes
- Add `--pretrained-model-path` CLI argument to `scripts/train.py`
- Add `load_safetensors_state_dict()` function supporting single and sharded safetensors
- Extract `d2t`/`t2d` vocab mappings from pretrained models
- Detect `draft_vocab_size` automatically from the loaded mappings
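For the `draft_vocab_size` detection, a minimal sketch consistent with the `assert config.draft_vocab_size == d2t.shape[0]` visible in the review snippet above (the helper name and the boolean-mask reading of `t2d` are assumptions):

```python
import torch


def infer_draft_vocab_size(d2t: torch.Tensor, t2d: torch.Tensor | None = None) -> int:
    """d2t has one entry per draft-vocab token, so its length is the size."""
    draft_vocab_size = int(d2t.shape[0])
    # If t2d is a boolean membership mask over the target vocab, the number
    # of True entries should match the draft vocab size.
    if t2d is not None and t2d.dtype == torch.bool:
        assert int(t2d.sum().item()) == draft_vocab_size
    return draft_vocab_size
```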
Usage

Fine-tune from HuggingFace Hub:

```bash
python scripts/train.py \
  --verifier-name-or-path meta-llama/Llama-3.1-8B-Instruct \
  --pretrained-model-path RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 \
  --data-path ./new_data \
  --epochs 3 \
  --lr 5e-5
```
Related issue
issue #269