
[FEAT][TRAINING] Add fine-tuning support for EAGLE3 from HF Hub #268

Closed
VincentG1234 wants to merge 12 commits into vllm-project:main from VincentG1234:eagle-finetuning

Conversation

VincentG1234 (Contributor) commented Jan 31, 2026

Summary

Adds support for fine-tuning EAGLE3 models from pretrained checkpoints, enabling users to initialize training from existing models stored locally or on HuggingFace Hub.

Changes

  • Added --pretrained-model-path CLI argument to scripts/train.py
  • Implemented load_safetensors_state_dict() function (sketched below) supporting:
    • Local single/sharded safetensors files
    • Automatic download from HuggingFace Hub
  • Automatic extraction of d2t/t2d vocab mappings from pretrained models
  • Automatic derivation of draft_vocab_size from loaded mappings
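
For context, a minimal sketch of the loading path described above (the helper name is taken from this PR; the exact signature, shard handling, and download filters are simplified assumptions, not the final implementation):

```python
import json
import os

from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def load_safetensors_state_dict(model_path: str) -> dict:
    """Load a full state dict from a local directory or a HuggingFace Hub repo id."""
    if not os.path.isdir(model_path):
        # Not a local directory: treat it as a Hub repo id and download the weights.
        model_path = snapshot_download(
            model_path, allow_patterns=["*.safetensors", "*.safetensors.index.json"]
        )

    index_file = os.path.join(model_path, "model.safetensors.index.json")
    if os.path.exists(index_file):
        # Sharded checkpoint: merge every shard listed in the index.
        with open(index_file) as f:
            shards = sorted(set(json.load(f)["weight_map"].values()))
        state_dict = {}
        for shard in shards:
            state_dict.update(load_file(os.path.join(model_path, shard)))
        return state_dict

    # Single-file checkpoint.
    return load_file(os.path.join(model_path, "model.safetensors"))
```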

Usage

Fine-tune from HuggingFace Hub:

python scripts/train.py \
  --verifier-name-or-path meta-llama/Llama-3.1-8B-Instruct \
  --pretrained-model-path RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 \
  --data-path ./new_data \
  --epochs 3 \
  --lr 5e-5

Related issue

issue #269

fynnsu (Collaborator) commented Feb 2, 2026

Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!

My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensors files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
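
For illustration, the extraction part could become a small utility along these lines (a rough sketch only; the function name and the exact checkpoint key names are assumptions, not our existing API):

```python
import torch


def extract_vocab_mappings(
    state_dict: dict[str, torch.Tensor],
) -> tuple[torch.Tensor, torch.Tensor]:
    """Pull the d2t/t2d vocab mappings out of a pretrained EAGLE3 state dict."""
    # Assumes the mappings are stored under keys ending in "d2t" / "t2d";
    # real checkpoints may prefix them (e.g. with a module path).
    d2t = next(v for k, v in state_dict.items() if k.endswith("d2t"))
    t2d = next(v for k, v in state_dict.items() if k.endswith("t2d"))
    # The draft vocab size can then be derived directly from the mapping,
    # e.g. draft_vocab_size = d2t.shape[0].
    return d2t, t2d
```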

Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.

Let me know if I can help!

dsikka (Collaborator) left a comment


Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

@dsikka dsikka added the enhancement and eagle3 labels Feb 2, 2026
VincentG1234 (Contributor, Author) commented

> Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

I haven't conducted a full fine-tuning yet. My aim is to enhance the model in French while maintaining performance in English. I will share my results!

> Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!
>
> My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensors files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
>
> Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.
>
> Let me know if I can help!

Thank you for the feedback! I'll fix this very soon.

mergify Bot commented Feb 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Feb 5, 2026
@dsikka dsikka added the training label Feb 6, 2026
@VincentG1234 VincentG1234 force-pushed the eagle-finetuning branch 2 times, most recently from 4c5de19 to f09a235 on February 6, 2026 17:17
@mergify mergify Bot removed the needs-rebase label Feb 6, 2026
VincentG1234 (Contributor, Author) commented Feb 6, 2026

Hello @dsikka @fynnsu, just to keep you in the loop. I come with good news:

  1. I investigated the acceptance drop after a light fine-tuning of an EAGLE-3 1B draft model (I tested with this one: RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3). Although the weights were nearly identical to the HF checkpoint (only fp-level differences), the issue was caused by a config.json mismatch between my local setup and the HF model. In particular, rope_theta (and related RoPE scaling parameters) differed significantly, which changed the verifier’s hidden states and broke draft–verifier alignment, leading to a large drop in mean acceptance length (from 2.6 to 1.9).
    After aligning the RoPE configuration with the HF config, acceptance metrics returned to expected values (see the quick config check sketched below). I will fix that in the code and I think we will be good!

  2. I moved the weight-loading functions into the right module as suggested.

  3. I’m now validating the full set of changes in my PR that enable fine-tuning EAGLE(-3) models end-to-end (the earlier config.json/RoPE mismatch should be the last issue). I could use guidance on which tests you’d consider sufficient/most relevant for this PR. By the way, I don't know why, but running tox locally is currently difficult because downloading model weights takes hours on my setup...
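
For anyone who hits the same issue, the kind of quick sanity check I had in mind looks roughly like this (hypothetical snippet, not part of this PR; the local path is a placeholder):

```python
from transformers import AutoConfig

# Compare the locally stored verifier config against the reference HF config
# that the pretrained speculator was trained against.
local_cfg = AutoConfig.from_pretrained("./local-llama-3.1-8b-instruct")  # placeholder path
hub_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

for key in ("rope_theta", "rope_scaling"):
    local_val = getattr(local_cfg, key, None)
    hub_val = getattr(hub_cfg, key, None)
    if local_val != hub_val:
        print(f"RoPE mismatch on {key}: local={local_val!r} vs hub={hub_val!r}")
```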

Have a nice weekend!

@VincentG1234 VincentG1234 marked this pull request as ready for review February 8, 2026 22:48
Copilot AI review requested due to automatic review settings February 8, 2026 22:48
Copilot AI (Contributor) left a comment


Pull request overview

Adds fine-tuning (“warm start”) support for EAGLE3 training by allowing initialization from a pretrained EAGLE3 checkpoint (local dir or HuggingFace Hub), including automatic loading of safetensors weights and extraction of d2t/t2d vocab mappings.

Changes:

  • Add --pretrained-model-path to scripts/train.py and load pretrained config/weights when provided.
  • Implement load_full_state_dict() and extract_vocab_mappings() utilities for single/sharded safetensors and Hub download.
  • Add unit + integration tests covering pretrained loading, mapping extraction, and weight loading behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| scripts/train.py | Adds pretrained initialization flow, mapping/config initialization helpers, and weight-loading logging/validation. |
| src/speculators/utils/loading.py | Adds full safetensors state_dict loading (single/sharded) + mapping extraction helpers. |
| tests/unit/utils/test_loading.py | Expands loading tests to cover new helpers and safetensors loading. |
| tests/unit/train/test_pretrained_loading.py | Adds unit tests for the new training-time pretrained loading helpers. |
| tests/integration/test_pretrained_loading.py | Adds integration tests that load a real EAGLE3 model and validate end-to-end loading. |


mergify Bot commented Feb 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Copilot AI (Contributor) left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@VincentG1234 VincentG1234 marked this pull request as draft February 14, 2026 21:58
@VincentG1234 VincentG1234 marked this pull request as ready for review February 18, 2026 09:30
VincentG1234 (Contributor, Author) commented

The CI quality checks failed despite 'make quality' reporting 0 errors locally. The problem appears to be related to 'ruff format' not being included in the local quality checks. It seems there might be a misalignment between the local and CI environments. Or maybe there's a misuse on my part 👀

fynnsu (Collaborator) commented Feb 18, 2026

> The CI quality checks failed despite 'make quality' reporting 0 errors locally. The problem appears to be related to 'ruff format' not being included in the local quality checks. It seems there might be a misalignment between the local and CI environments. Or maybe there's a misuse on my part 👀

Hi @VincentG1234, as you said, there could be an environment mismatch between your local environment and our CI one. Could you try resetting your virtual environment and then running pip install -e .[dev]? That should install the package versions we use during testing. Then run the make quality command again.

If you're using uv this would look like:

rm -r .venv/
uv venv -p 3.10
source .venv/bin/activate
uv pip install -e .[dev]
make style           # apply auto fixes if possible
make quality         # confirm passes


assert config.draft_vocab_size == d2t.shape[0]
model_class = SpeculatorModel.registered_model_class_from_config(config)
model = model_class( # type: ignore[call-arg]
VincentG1234 (Contributor, Author) commented


That line caused the last CI failure.

Context:
SpeculatorModel.registered_model_class_from_config(config) is typed as returning type[SpeculatorModel]. The base class SpeculatorModel.__init__ (in src/speculators/model.py) requires verifier and verifier_attachment_mode. For EAGLE3 configs, the actual class returned is Eagle3DraftModel (in src/speculators/models/eagle3/core.py), whose __init__ only takes config, t2d, and d2t and passes fixed values to super().__init__(config=..., verifier=None, verifier_attachment_mode="train_only").
So: calling without verifier/verifier_attachment_mode triggers mypy [call-arg]; calling with them triggers a runtime TypeError from Eagle3DraftModel.__init__.

Fix:
To keep it simple and avoid modifying another file, I fixed it by adding # type: ignore[call-arg] to this call so mypy skips the check.

Follow-up:
It would be good to keep this mismatch in mind and consider aligning constructors in a later change, so that code using registered_model_class_from_config() can call the constructor in a type-safe way without ignores.
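
For reference, the kind of type-safe shape I have in mind would look something like this (a sketch only, not what the PR currently does; import paths are inferred from the file locations above, and config/t2d/d2t come from the pretrained loading step):

```python
from speculators.model import SpeculatorModel
from speculators.models.eagle3.core import Eagle3DraftModel

model_class = SpeculatorModel.registered_model_class_from_config(config)
if issubclass(model_class, Eagle3DraftModel):
    # Eagle3DraftModel.__init__ only takes (config, t2d, d2t), per the note above,
    # so mypy can check this call once the class has been narrowed.
    model = model_class(config=config, t2d=t2d, d2t=d2t)
else:
    raise TypeError(f"Unexpected speculator class for warm start: {model_class.__name__}")
```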

Hoping we are finally good 😄

mergify Bot commented Feb 19, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

VincentG1234 (Contributor, Author) commented Mar 2, 2026

I ran a small experiment to better understand the French performance gap and wanted to share some preliminary results.

I used MT-Bench French as a benchmark:
https://huggingface.co/datasets/bofenghuang/mt-bench-french

To isolate the language effect, I ran the exact same prompts in English and in French.

Baseline

EN weighted acceptance

[0.768 0.557 0.376]

FR weighted acceptance

[0.586 0.349 0.188]

This reveals a large gap between English and French speculative acceptance.

Small targeted fine-tuning

I fine-tuned Eagle3 on ~5k French samples (angeluriot/french_instruct), using verifier-generated assistant responses.

  • verifier: meta-llama/Llama-3.1-8B-Instruct
  • base speculator: RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3
  • 2 epochs, lr 6e-5, seq len 8192
  • 1 layer, TTT=3, cosine scheduler

Training took ~40 minutes on an A100 40GB.

After fine-tuning

EN (no English samples seen)

[0.73  0.524 0.354]

FR

[0.659 0.432 0.268]   (around +0.08 at each position, so a 20-30% speed-up!)

French acceptance improves across all positions, especially deeper ones, while English performance remains strong despite seeing no English data.

This is still a small and somewhat superficial experiment, but the results are encouraging and suggest that light language-specific fine-tuning can noticeably improve speculative decoding efficiency and reduce the language gap.

I can share a Colab notebook if that would be useful.

EDIT: here is the Colab: Link NB Colab
It should be fully reproducible on an A100, and at the very least you have the logs.


If helpful, I can also iterate on the PR, for example by reducing diffs, simplifying parts of the logic, or improving tests, whatever makes review easier and the code more robust.

VincentG1234 and others added 12 commits March 3, 2026 13:12
This commit introduces the ability to fine-tune pretrained EAGLE3 models
from HuggingFace Hub or local checkpoints using the --pretrained-model-path
argument in train.py.

Key features:
- Automatic vocabulary mapping (d2t/t2d) extraction from pretrained models
- Automatic draft vocabulary size detection
- Support for both HuggingFace Hub and local model paths
- Improved error handling and validation for vocabulary mappings
- Network-independent unit tests

Testing:
- Add comprehensive unit tests for pretrained loading utilities
- Add integration tests for end-to-end pretrained model loading
- Add tests for vocabulary mapping validation

Documentation:
- Update scripts/README.md with fine-tuning examples and usage
- Update train.py docstrings and argument descriptions

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Move load_model_weights from train.py to loading.py as load_pretrained_weights
- Make integration tests algorithm-agnostic (use registry instead of Eagle3DraftModel)
- Add TYPE_CHECKING imports to avoid circular dependencies
- Update unit tests to reflect the new function location
- Add type hints and assertions for better type safety

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Fix missing closing parenthesis in train.py parser
- Remove trailing whitespace in loading.py
- Remove unused imports in tests
- Reorganize import statements

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…E2E weight sanity

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…ained loading

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
fynnsu (Collaborator) commented Mar 5, 2026

Hi @VincentG1234, sorry, it's been a hectic couple of weeks working on today's release and I haven't had a good opportunity to review this.

I appreciate you working on this, but looking into it further I realized there is some additional complexity around weight initialization: ensuring that weights are distributed correctly when loading from a checkpoint vs loading from pretrained vs setting up from scratch. In addition, we previously supported the transformers .from_pretrained method via the PreTrainedModel superclass that Eagle3DraftModel subclasses (indirectly). I decided to refactor a lot of the model setup/distributed code and in the process also fixed from_pretrained for Eagle3DraftModel.

I think that will largely cover the scope of this PR, and it also allows us to handle loading models from HF and disk without needing to directly manage the model repo downloads. If you have a moment, please take a look at #333 and let me know if it's missing anything.
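
For reference, with that refactor the warm-start path should reduce to something along these lines (sketch only; the exact import path and call are inferred from the file locations mentioned above, not verified against #333):

```python
from speculators.models.eagle3.core import Eagle3DraftModel

# Load the pretrained speculator from the Hub (or a local directory) via the
# standard transformers PreTrainedModel.from_pretrained machinery.
draft_model = Eagle3DraftModel.from_pretrained(
    "RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3"
)
```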

VincentG1234 (Contributor, Author) commented Mar 6, 2026

@fynnsu

Thanks for the explanation and for taking the time to refactor this.

It makes sense to rely on the transformers from_pretrained() pathway rather than re-implementing model loading logic from scratch.
I'll take a look at #333 and test the workflow on my side.

I can close this PR once #333 is merged.

VincentG1234 (Contributor, Author) commented

Feature implemented in PR #333.

