
[FEAT][TRAINING] Add fine-tuning support for EAGLE3 from HF Hub #268

Closed
VincentG1234 wants to merge 12 commits into vllm-project:main from VincentG1234:eagle-finetuning

Conversation

VincentG1234 (Contributor) commented Jan 31, 2026

Summary

Adds support for fine-tuning EAGLE3 models from pretrained checkpoints, enabling users to initialize training from existing models stored locally or on HuggingFace Hub.

Changes

  • Added --pretrained-model-path CLI argument to scripts/train.py
  • Implemented load_safetensors_state_dict() function (sketched below) supporting:
    • Local single/sharded safetensors files
    • Automatic download from HuggingFace Hub
  • Automatic extraction of d2t/t2d vocab mappings from pretrained models
  • Automatic derivation of draft_vocab_size from loaded mappings
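
For context, a minimal sketch of the loading path described above (the helper name is taken from this PR; the exact signature, shard handling, and download filters are simplified assumptions, not the final implementation):

```python
import json
import os

from huggingface_hub import snapshot_download
from safetensors.torch import load_file


def load_safetensors_state_dict(model_path: str) -> dict:
    """Load a full state dict from a local directory or a HuggingFace Hub repo id."""
    if not os.path.isdir(model_path):
        # Not a local directory: treat it as a Hub repo id and download the weights.
        model_path = snapshot_download(
            model_path, allow_patterns=["*.safetensors", "*.safetensors.index.json"]
        )

    index_file = os.path.join(model_path, "model.safetensors.index.json")
    if os.path.exists(index_file):
        # Sharded checkpoint: merge every shard listed in the index.
        with open(index_file) as f:
            shards = sorted(set(json.load(f)["weight_map"].values()))
        state_dict = {}
        for shard in shards:
            state_dict.update(load_file(os.path.join(model_path, shard)))
        return state_dict

    # Single-file checkpoint.
    return load_file(os.path.join(model_path, "model.safetensors"))
```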

Usage

Fine-tune from HuggingFace Hub:

python scripts/train.py \
  --verifier-name-or-path meta-llama/Llama-3.1-8B-Instruct \
  --pretrained-model-path RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 \
  --data-path ./new_data \
  --epochs 3 \
  --lr 5e-5

Related issue

issue #269

fynnsu (Collaborator) commented Feb 2, 2026

Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!

My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensors files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
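
For illustration, the extraction part could become a small utility along these lines (a rough sketch only; the function name and the exact checkpoint key names are assumptions, not our existing API):

```python
import torch


def extract_vocab_mappings(
    state_dict: dict[str, torch.Tensor],
) -> tuple[torch.Tensor, torch.Tensor]:
    """Pull the d2t/t2d vocab mappings out of a pretrained EAGLE3 state dict."""
    # Assumes the mappings are stored under keys ending in "d2t" / "t2d";
    # real checkpoints may prefix them (e.g. with a module path).
    d2t = next(v for k, v in state_dict.items() if k.endswith("d2t"))
    t2d = next(v for k, v in state_dict.items() if k.endswith("t2d"))
    # The draft vocab size can then be derived directly from the mapping,
    # e.g. draft_vocab_size = d2t.shape[0].
    return d2t, t2d
```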

Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.

Let me know if I can help!

dsikka (Collaborator) left a comment


Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

@dsikka dsikka added the enhancement and eagle3 labels Feb 2, 2026
VincentG1234 (Contributor, Author) commented

> Thank you for your contribution! Just out of curiosity, how much data was needed to fine-tune the Llama 3 draft model?

I haven't conducted a full fine-tuning yet. My aim is to enhance the model in French while maintaining performance in English. I will share my results!

> Hi @VincentG1234, this is great, thank you for working on this (and opening the related issue)!
>
> My main feedback at this stage is that it would be good to move the loading logic + t2d/d2t extraction logic into utility functions inside the speculators package. We actually have some similar code for loading from hf/local safetensors files already in src/speculators/utils/loading.py, so maybe this could be consolidated with what you've added (although our existing loading utils are mostly focused on extracting a single tensor, like the lm head or token embedding).
>
> Additionally, we will want at least one test that loads an existing checkpoint from HF before merging.
>
> Let me know if I can help!

Thank you for the feedback! I'll fix this very soon.

mergify Bot commented Feb 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Feb 5, 2026
@dsikka dsikka added the training label Feb 6, 2026
@VincentG1234 VincentG1234 force-pushed the eagle-finetuning branch 2 times, most recently from 4c5de19 to f09a235 on February 6, 2026 17:17
@mergify mergify Bot removed the needs-rebase label Feb 6, 2026
VincentG1234 (Contributor, Author) commented Feb 6, 2026

Hello @dsikka @fynnsu, just to keep you in the loop. I come with good news:

  1. I investigated the acceptance drop after a light fine-tuning of an EAGLE-3 1B draft model (I tested with this one: RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3). Although the weights were nearly identical to the HF checkpoint (only fp-level differences), the issue was caused by a config.json mismatch between my local setup and the HF model. In particular, rope_theta (and related RoPE scaling parameters) differed significantly, which changed the verifier’s hidden states and broke draft–verifier alignment, leading to a large drop in mean acceptance length (from 2.6 to 1.9).
    After aligning the RoPE configuration with the HF config, acceptance metrics returned to expected values (see the quick config check sketched below). I will fix that in the code and I think we will be good!

  2. I moved the weight-loading functions into the right module as suggested.

  3. I’m now validating the full set of changes in my PR that enable fine-tuning EAGLE(-3) models end-to-end (the earlier config.json/RoPE mismatch should be the last issue). I could use guidance on which tests you’d consider sufficient/most relevant for this PR. By the way, I don't know why, but running tox locally is currently difficult because downloading model weights takes hours on my setup...
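
For anyone who hits the same issue, the kind of quick sanity check I had in mind looks roughly like this (hypothetical snippet, not part of this PR; the local path is a placeholder):

```python
from transformers import AutoConfig

# Compare the locally stored verifier config against the reference HF config
# that the pretrained speculator was trained against.
local_cfg = AutoConfig.from_pretrained("./local-llama-3.1-8b-instruct")  # placeholder path
hub_cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

for key in ("rope_theta", "rope_scaling"):
    local_val = getattr(local_cfg, key, None)
    hub_val = getattr(hub_cfg, key, None)
    if local_val != hub_val:
        print(f"RoPE mismatch on {key}: local={local_val!r} vs hub={hub_val!r}")
```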

Have a nice weekend!

@VincentG1234 VincentG1234 marked this pull request as ready for review February 8, 2026 22:48
Copilot AI review requested due to automatic review settings February 8, 2026 22:48
Copilot AI (Contributor) left a comment


Pull request overview

Adds fine-tuning (“warm start”) support for EAGLE3 training by allowing initialization from a pretrained EAGLE3 checkpoint (local dir or HuggingFace Hub), including automatic loading of safetensors weights and extraction of d2t/t2d vocab mappings.

Changes:

  • Add --pretrained-model-path to scripts/train.py and load pretrained config/weights when provided.
  • Implement load_full_state_dict() and extract_vocab_mappings() utilities for single/sharded safetensors and Hub download.
  • Add unit + integration tests covering pretrained loading, mapping extraction, and weight loading behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| scripts/train.py | Adds pretrained initialization flow, mapping/config initialization helpers, and weight-loading logging/validation. |
| src/speculators/utils/loading.py | Adds full safetensors state_dict loading (single/sharded) + mapping extraction helpers. |
| tests/unit/utils/test_loading.py | Expands loading tests to cover new helpers and safetensors loading. |
| tests/unit/train/test_pretrained_loading.py | Adds unit tests for the new training-time pretrained loading helpers. |
| tests/integration/test_pretrained_loading.py | Adds integration tests that load a real EAGLE3 model and validate end-to-end loading. |


mergify Bot commented Feb 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Copilot AI (Contributor) left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@VincentG1234 VincentG1234 marked this pull request as draft February 14, 2026 21:58
@VincentG1234 VincentG1234 marked this pull request as ready for review February 18, 2026 09:30
VincentG1234 (Contributor, Author) commented

The CI quality checks failed despite 'make quality' reporting 0 errors locally. The problem appears to be related to 'ruff format' not being included in the local quality checks. It seems there might be a misalignment between the local and CI environments. Or maybe there's a misuse on my part 👀

fynnsu (Collaborator) commented Feb 18, 2026

> The CI quality checks failed despite 'make quality' reporting 0 errors locally. The problem appears to be related to 'ruff format' not being included in the local quality checks. It seems there might be a misalignment between the local and CI environments. Or maybe there's a misuse on my part 👀

Hi @VincentG1234, as you said, there could be an environment mismatch between your local environment and our CI one. Could you try resetting your virtual environment and then running pip install -e .[dev]? That should install the package versions we use during testing. Then run the make quality command again.

If you're using uv this would look like:

rm -r .venv/
uv venv -p 3.10
source .venv/bin/activate
uv pip install -e .[dev]
make style           # apply auto fixes if possible
make quality         # confirm passes


assert config.draft_vocab_size == d2t.shape[0]
model_class = SpeculatorModel.registered_model_class_from_config(config)
model = model_class( # type: ignore[call-arg]
VincentG1234 (Contributor, Author) commented


That line caused the last CI failure.

Context:
SpeculatorModel.registered_model_class_from_config(config) is typed as returning type[SpeculatorModel]. The base class SpeculatorModel.__init__ (in src/speculators/model.py) requires verifier and verifier_attachment_mode. For EAGLE3 configs, the actual class returned is Eagle3DraftModel (in src/speculators/models/eagle3/core.py), whose __init__ only takes config, t2d, and d2t and passes fixed values to super().__init__(config=..., verifier=None, verifier_attachment_mode="train_only").
So: calling without verifier/verifier_attachment_mode triggers mypy [call-arg]; calling with them triggers a runtime TypeError from Eagle3DraftModel.__init__.

Fix:
To keep it simple and avoid modifying another file, I fixed it by adding # type: ignore[call-arg] to this call so mypy skips the check.

Follow-up:
It would be good to keep this mismatch in mind and consider aligning constructors in a later change, so that code using registered_model_class_from_config() can call the constructor in a type-safe way without ignores.
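
For reference, the kind of type-safe shape I have in mind would look something like this (a sketch only, not what the PR currently does; import paths are inferred from the file locations above, and config/t2d/d2t come from the pretrained loading step):

```python
from speculators.model import SpeculatorModel
from speculators.models.eagle3.core import Eagle3DraftModel

model_class = SpeculatorModel.registered_model_class_from_config(config)
if issubclass(model_class, Eagle3DraftModel):
    # Eagle3DraftModel.__init__ only takes (config, t2d, d2t), per the note above,
    # so mypy can check this call once the class has been narrowed.
    model = model_class(config=config, t2d=t2d, d2t=d2t)
else:
    raise TypeError(f"Unexpected speculator class for warm start: {model_class.__name__}")
```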

Hoping we are finally good 😄

mergify Bot commented Feb 19, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @VincentG1234.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

VincentG1234 (Contributor, Author) commented Mar 2, 2026

I ran a small experiment to better understand the French performance gap and wanted to share some preliminary results.

I used MT-Bench French as a benchmark:
https://huggingface.co/datasets/bofenghuang/mt-bench-french

To isolate the language effect, I ran the exact same prompts in English and in French.

Baseline

EN weighted acceptance

[0.768 0.557 0.376]

FR weighted acceptance

[0.586 0.349 0.188]

This reveals a large gap between English and French speculative acceptance.

Small targeted fine-tuning

I fine-tuned Eagle3 on ~5k French samples (angeluriot/french_instruct), using verifier-generated assistant responses.

  • verifier: meta-llama/Llama-3.1-8B-Instruct
  • base speculator: RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3
  • 2 epochs, lr 6e-5, seq len 8192
  • 1 layer, TTT=3, cosine scheduler

Training took ~40 minutes on an A100 40GB.

After fine-tuning

EN (no English samples seen)

[0.73  0.524 0.354]

FR

[0.659 0.432 0.268]   (around +0.08 at each position, so a 20-30% speed-up!)

French acceptance improves across all positions, especially deeper ones, while English performance remains strong despite seeing no English data.

This is still a small and somewhat superficial experiment, but the results are encouraging and suggest that light language-specific fine-tuning can noticeably improve speculative decoding efficiency and reduce the language gap.

I can share a Colab notebook if that would be useful.

EDIT: here is the Colab: Link NB Colab
It should be fully reproducible on an A100, and at the very least you have the logs.


If helpful, I can also iterate on the PR, for example by reducing diffs, simplifying parts of the logic, or improving tests, whatever makes review easier and the code more robust.

VincentG1234 and others added 12 commits March 3, 2026 13:12
This commit introduces the ability to fine-tune pretrained EAGLE3 models
from HuggingFace Hub or local checkpoints using the --pretrained-model-path
argument in train.py.

Key features:
- Automatic vocabulary mapping (d2t/t2d) extraction from pretrained models
- Automatic draft vocabulary size detection
- Support for both HuggingFace Hub and local model paths
- Improved error handling and validation for vocabulary mappings
- Network-independent unit tests

Testing:
- Add comprehensive unit tests for pretrained loading utilities
- Add integration tests for end-to-end pretrained model loading
- Add tests for vocabulary mapping validation

Documentation:
- Update scripts/README.md with fine-tuning examples and usage
- Update train.py docstrings and argument descriptions

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Move load_model_weights from train.py to loading.py as load_pretrained_weights
- Make integration tests algorithm-agnostic (use registry instead of Eagle3DraftModel)
- Add TYPE_CHECKING imports to avoid circular dependencies
- Update unit tests to reflect the new function location
- Add type hints and assertions for better type safety

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
- Fix missing closing parenthesis in train.py parser
- Remove trailing whitespace in loading.py
- Remove unused imports in tests
- Reorganize import statements

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…E2E weight sanity

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
…ained loading

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
fynnsu (Collaborator) commented Mar 5, 2026

Hi @VincentG1234, sorry, it's been a hectic couple of weeks working on today's release and I haven't had a good opportunity to review this.

I appreciate you working on this, but looking into it further I realized there is some additional complexity around weight initialization: ensuring that weights are distributed correctly when loading from a checkpoint vs loading from pretrained vs setting up from scratch. In addition, we previously supported the transformers .from_pretrained method via the PreTrainedModel superclass that Eagle3DraftModel subclasses (indirectly). I decided to refactor a lot of the model setup/distributed code and in the process also fixed from_pretrained for Eagle3DraftModel.

I think that will largely cover the scope of this PR, and it also allows us to handle loading models from HF and disk without needing to directly manage the model repo downloads. If you have a moment, please take a look at #333 and let me know if it's missing anything.
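
For reference, with that refactor the warm-start path should reduce to something along these lines (sketch only; the exact import path and call are inferred from the file locations mentioned above, not verified against #333):

```python
from speculators.models.eagle3.core import Eagle3DraftModel

# Load the pretrained speculator from the Hub (or a local directory) via the
# standard transformers PreTrainedModel.from_pretrained machinery.
draft_model = Eagle3DraftModel.from_pretrained(
    "RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3"
)
```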

VincentG1234 (Contributor, Author) commented Mar 6, 2026

@fynnsu

Thanks for the explanation and for taking the time to refactor this.

It makes sense to rely on the transformers from_pretrained() pathway rather than re-implementing model loading logic from scratch.
I'll take a look at #333 and test the workflow on my side.

I can close this PR once #333 is merged.

VincentG1234 (Contributor, Author) commented

Feature implemented in PR #333.

