
Cleanup old implementations/features #332

Merged
rahul-tuli merged 8 commits into main from cleanup_repo on Mar 6, 2026
Conversation

Collaborator

@fynnsu fynnsu commented Mar 5, 2026

This PR aims to clean up some tech debt in the repo.

In particular:

  • Remove the Independent and MLP speculators. These were never functional implementations and are unlikely to be completed anytime soon due to low priority and a lack of vLLM support. They also don't conform to the structure of other implementations (Eagle3), which is confusing for new adopters.
  • Move the base TokenProposalConfig into the proposals folder
  • Remove the auto importer. This logic was unnecessarily complicated, since the required modules can be imported trivially. Because the auto importer was a superclass of all models, that complexity also leaked into the concrete model classes (a minimal illustration follows this list).
  • Remove attach verifier related logic. Since speculators doesn't target/support generation, we remove any logic related to attaching the full verifier model. Supporting generation would require significantly better support for distributed inference to handle the full size of the verifier models and is out of scope for this library. vLLM should be used for model inference instead.
  • Move the legacy EagleSpeculator model (which only supports conversion, not training) to a folder under convert. Since this model doesn't support training and uses legacy structures (like using attach_verifier logic), its inclusion in the models/ folder is confusing.
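
For the auto-importer point above, a minimal illustration of the replacement pattern: the concrete implementations are imported explicitly in the package `__init__` instead of being discovered by an auto-importer superclass. The module path and class name are assumptions, not the actual repo layout.

```python
# Illustrative only: the module path and class name are assumptions.
# speculators/models/__init__.py
from speculators.models.eagle3 import Eagle3Speculator  # explicit, trivial import

__all__ = ["Eagle3Speculator"]
```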

fynnsu added 6 commits March 3, 2026 19:28
These were never fully implemented and didn't support training or conversion.
Removing for clarity

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
This speculator only supports conversion. We move it here to clean up
the repo and better organize the classes.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
@github-actions

github-actions Bot commented Mar 5, 2026

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/22762040296/artifacts/5796878279.
They will be retained for up to 30 days.
Commit: 3d46f46

@github-actions

This comment was marked as resolved.

Collaborator

@rahul-tuli rahul-tuli left a comment


LGTM pending nits! Great Job!

Comment thread src/speculators/utils/auto_importer.py
Comment thread src/speculators/convert/eagle/eagle_legacy_model.py Outdated
Comment thread src/speculators/convert/eagle/eagle_legacy_model.py Outdated
Comment thread tests/unit/test_config.py Outdated
Comment thread src/speculators/convert/eagle/eagle_converter.py
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Collaborator

@shanjiaz shanjiaz left a comment


Looks good! Out of scope for this PR but I think we should consider removing proposals

@rahul-tuli rahul-tuli enabled auto-merge (squash) March 6, 2026 11:44
@github-actions

github-actions Bot commented Mar 6, 2026

Summary

| Status | Count |
| --- | --- |
| 🔍 Total | 93 |
| ✅ Successful | 90 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 2 |
| 👻 Excluded | 0 |
| ❓ Unknown | 0 |
| 🚫 Errors | 1 |
| ⛔ Unsupported | 0 |

Errors per input

Errors in docs/developer/contributing.md

Redirects per input

Redirects in docs/developer/contributing.md

Redirects in docs/train.md

Full Github Actions output

@rahul-tuli rahul-tuli merged commit 792b9e6 into main Mar 6, 2026
11 of 12 checks passed
@rahul-tuli rahul-tuli deleted the cleanup_repo branch March 6, 2026 11:48
rahul-tuli pushed a commit that referenced this pull request Mar 11, 2026
## Prereq

Depends on #332, which should be merged first. 

## Purpose
We'd like to support finetuning existing Eagle3 models. This PR adds the
`--from-pretrained` option, which loads a pretrained model from the
Hugging Face Hub or a local path.

### Setup flow
#### Fresh model setup, i.e. `from_training_args` pathway
1. model = Eagle3DraftModel.__init__
    - Initialize all modules/parameters/buffers
    - Use `torch.zeros` (w/ correct shape + dtype) as placeholders for t2d/d2t
    - Use `torch.nan` (w/ correct shape + dtype) as placeholders for lm_head, embed_tokens, and verifier_lm_head
2. model.load_vocab_mappings(t2d, d2t)
    - t2d/d2t can be `None`, which will cause an early return
    - Verify shapes match and then load t2d/d2t
3. model.load_verifier_weights()
    - Loads embed_tokens, lm_head (into both lm_head and verifier_lm_head), and verifier_norm from the verifier model
    - Checks whether embed_tokens and lm_head are still `NaN` (from init) before loading; otherwise they are skipped
    - verifier_lm_head is always loaded
    - verifier_norm is skipped if the weight isn't found (with a warning). Note we don't use NaN init for this module
 
Continues below
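
For concreteness, a minimal sketch of the placeholder-then-overwrite idea in the fresh-setup steps above; the class name, shapes, and the `load_verifier_weights` signature are illustrative assumptions rather than the actual `Eagle3DraftModel` code.

```python
# Minimal sketch of the fresh-setup pathway (not the actual Eagle3DraftModel code).
# Shapes, parameter names, and the load_verifier_weights signature are assumptions;
# only the zeros/NaN-placeholder idea mirrors the PR description.
import torch
import torch.nn as nn


class DraftModelSketch(nn.Module):
    def __init__(self, hidden_size: int, draft_vocab: int, verifier_vocab: int):
        super().__init__()
        # Step 1: vocab-mapping placeholders, zeros with the correct shape/dtype.
        self.register_buffer("t2d", torch.zeros(verifier_vocab, dtype=torch.long))
        self.register_buffer("d2t", torch.zeros(draft_vocab, dtype=torch.long))
        # Verifier-derived weights start as NaN so a missed load shows up immediately.
        self.embed_tokens = nn.Embedding(verifier_vocab, hidden_size)
        self.lm_head = nn.Linear(hidden_size, draft_vocab, bias=False)
        self.verifier_lm_head = nn.Linear(hidden_size, verifier_vocab, bias=False)
        for module in (self.embed_tokens, self.lm_head, self.verifier_lm_head):
            nn.init.constant_(module.weight, float("nan"))

    def load_vocab_mappings(self, t2d, d2t):
        # Step 2: early return when no mappings are provided.
        if t2d is None or d2t is None:
            return
        assert t2d.shape == self.t2d.shape and d2t.shape == self.d2t.shape
        self.t2d.copy_(t2d)
        self.d2t.copy_(d2t)

    def load_verifier_weights(self, embed_w, lm_head_w, verifier_lm_head_w):
        # Step 3: only overwrite weights that are still NaN placeholders.
        if torch.isnan(self.embed_tokens.weight).all():
            self.embed_tokens.weight.data.copy_(embed_w)
        if torch.isnan(self.lm_head.weight).all():
            self.lm_head.weight.data.copy_(lm_head_w)
        # verifier_lm_head is always (re)loaded from the verifier.
        self.verifier_lm_head.weight.data.copy_(verifier_lm_head_w)
```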

#### Finetuning setup, i.e. `from_pretrained` pathway
1. model = Eagle3DraftModel.__init__ called internally by `PreTrainedModel.from_pretrained` under a meta device context
2. `PreTrainedModel.from_pretrained` also loads model weights for us
3. model.load_vocab_mappings(t2d, d2t) called, same as above
4. model.load_verifier_weights() called
    - lm_head and embed_tokens loading should be skipped because the values have already been set by `PreTrainedModel.from_pretrained`
    - verifier_lm_head and verifier_norm are loaded (overriding existing values, but they should be the same)
 
Continues below
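
A hedged usage sketch of this pathway; the import path, checkpoint id, and the source of `t2d`/`d2t` are assumptions rather than the actual CLI wiring.

```python
# Illustrative only: the import path and checkpoint id are assumptions.
from speculators.train.eagle3.model import Eagle3DraftModel  # hypothetical path

# Steps 1-2: HF's PreTrainedModel.from_pretrained resolves local vs. Hub
# checkpoints and loads the draft weights under a meta-device context.
model = Eagle3DraftModel.from_pretrained("org/eagle3-draft-checkpoint")

# Steps 3-4: vocab mappings, then verifier-derived weights. lm_head/embed_tokens
# are already populated from the checkpoint, so only verifier_lm_head and
# verifier_norm get (re)loaded.
t2d = d2t = None  # or tensors produced during data prep (assumption)
model.load_vocab_mappings(t2d, d2t)
model.load_verifier_weights()
```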

#### Joint (Fresh + Finetuning) next steps (in `Trainer.setup_model`)
```
If distributed:
    cache state dict on rank0
    Apply `fully_shard` to model layers and model (don't move to meta device, don't re-init weights)
    
    if checkpoint exists:
        load checkpoint (distributed)
    else:
        broadcast cached state dict from rank0 to all ranks
else:
    move model to local device
    if checkpoint exists:
        load checkpoint (single device)
```
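
A hedged Python sketch of this joint setup logic; the helper functions marked hypothetical are placeholders, and the actual `Trainer.setup_model` may differ.

```python
# Hedged sketch of the joint setup above. The two helpers marked "hypothetical"
# stand in for distributed-checkpoint loading and rank-0 broadcast; the FSDP2
# import path and the .layers attribute are also assumptions.
import os

import torch
import torch.distributed as dist


def setup_model(model: torch.nn.Module, checkpoint_path: str | None) -> torch.nn.Module:
    if dist.is_available() and dist.is_initialized():
        # Cache a full state dict on rank 0 before sharding, so fresh weights can
        # still be pushed to every rank when no checkpoint exists.
        cached_state = model.state_dict() if dist.get_rank() == 0 else None

        from torch.distributed.fsdp import fully_shard  # FSDP2; path varies by torch version
        for layer in model.layers:  # assumes the draft model exposes .layers
            fully_shard(layer)
        fully_shard(model)

        if checkpoint_path and os.path.exists(checkpoint_path):
            load_distributed_checkpoint(model, checkpoint_path)  # hypothetical helper
        else:
            broadcast_state_from_rank0(model, cached_state)  # hypothetical helper
    else:
        model = model.to("cuda" if torch.cuda.is_available() else "cpu")
        if checkpoint_path and os.path.exists(checkpoint_path):
            model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    return model
```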

## Implementation
This uses the `.from_pretrained` method from the base `PreTrainedModel`
class to resolve local vs remote model and handle downloading
checkpoints.

As part of this PR, I also clean up the model initialization process, which
previously also included loading verifier weights and t2d/d2t tensors.
These are now loaded after the init function. This was required to
support `.from_pretrained`, since the model init is run under a meta device
context.

This means the init instead sets up placeholder buffers for verifier
parameters/vocab mapping. These are being set intentionally in a way
that makes it easy to confirm they are overwritten when training starts
(e.g. by initializing some values to `NaN` so that failing to overwrite
them will result in immediate NaN outputs from the model).

I also updated the fully shard handling to ensure values are set on all
ranks correctly when using distributed training.

## Testing

Added comprehensive test coverage for loading pathway combinations (e.g.
loading from checkpoint, pretrained, fresh init, w/ and w/o vocab
mappings, single gpu and distributed, etc.).

Note that some tests require a single GPU or multiple GPUs. These are skipped
if the requirements are not met. It would be good to ensure they are run
correctly at regular intervals.

---------

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
YzTongNiar pushed a commit to YzTongNiar/speculators that referenced this pull request Apr 10, 2026