
Cleanup old implementations/features #332

Merged
rahul-tuli merged 8 commits into main from cleanup_repo on Mar 6, 2026
Conversation

Collaborator

@fynnsu fynnsu commented Mar 5, 2026

This PR aims to clean up some tech debt in the repo.

In particular:

  • Remove the Independent and MLP speculators. These were never functional implementations and are unlikely to be completed anytime soon due to low priority and a lack of vLLM support. They also don't conform to the structure of other implementations (Eagle3), which is confusing for new adopters.
  • Move the base TokenProposalConfig into the proposals folder
  • Remove the auto importer. This logic was unnecessarily complicated, since the required modules can be imported trivially. Because the auto importer was a superclass of all models, that complexity also leaked into the concrete model classes (a minimal illustration follows this list).
  • Remove attach verifier related logic. Since speculators doesn't target/support generation, we remove any logic related to attaching the full verifier model. Supporting generation would require significantly better support for distributed inference to handle the full size of the verifier models and is out of scope for this library. vLLM should be used for model inference instead.
  • Move the legacy EagleSpeculator model (which only supports conversion, not training) to a folder under convert. Since this model doesn't support training and uses legacy structures (like using attach_verifier logic), its inclusion in the models/ folder is confusing.
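
For the auto-importer point above, a minimal illustration of the replacement pattern: the concrete implementations are imported explicitly in the package `__init__` instead of being discovered by an auto-importer superclass. The module path and class name are assumptions, not the actual repo layout.

```python
# Illustrative only: the module path and class name are assumptions.
# speculators/models/__init__.py
from speculators.models.eagle3 import Eagle3Speculator  # explicit, trivial import

__all__ = ["Eagle3Speculator"]
```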

fynnsu added 6 commits March 3, 2026 19:28
These were never fully implemented and didn't support training or conversion.
Removing for clarity

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
This speculator only supports conversion. We move it here to clean up
the repo and better organize the classes.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
@github-actions

github-actions Bot commented Mar 5, 2026

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/22762040296/artifacts/5796878279.
They will be retained for up to 30 days.
Commit: 3d46f46

@github-actions

This comment was marked as resolved.

Collaborator

@rahul-tuli rahul-tuli left a comment


LGTM pending nits! Great Job!

Comment thread src/speculators/utils/auto_importer.py
Comment thread src/speculators/convert/eagle/eagle_legacy_model.py Outdated
Comment thread src/speculators/convert/eagle/eagle_legacy_model.py Outdated
Comment thread tests/unit/test_config.py Outdated
Comment thread src/speculators/convert/eagle/eagle_converter.py
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Collaborator

@shanjiaz shanjiaz left a comment


Looks good! Out of scope for this PR but I think we should consider removing proposals

@rahul-tuli rahul-tuli enabled auto-merge (squash) March 6, 2026 11:44
@github-actions

github-actions Bot commented Mar 6, 2026

Summary

| Status | Count |
| --- | --- |
| 🔍 Total | 93 |
| ✅ Successful | 90 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 2 |
| 👻 Excluded | 0 |
| ❓ Unknown | 0 |
| 🚫 Errors | 1 |
| ⛔ Unsupported | 0 |

Errors per input

Errors in docs/developer/contributing.md

Redirects per input

Redirects in docs/developer/contributing.md

Redirects in docs/train.md

Full Github Actions output

@rahul-tuli rahul-tuli merged commit 792b9e6 into main Mar 6, 2026
11 of 12 checks passed
@rahul-tuli rahul-tuli deleted the cleanup_repo branch March 6, 2026 11:48
rahul-tuli pushed a commit that referenced this pull request Mar 11, 2026
## Prereq

Depends on #332, which should be merged first. 

## Purpose
We'd like to support finetuning existing Eagle3 models. This PR adds the
`--from-pretrained` option, which loads a pretrained model from the
Hugging Face Hub or a local path.

### Setup flow
#### Fresh model setup, i.e. `from_training_args` pathway
1. model = Eagle3DraftModel.__init__
    - Initialize all modules/parameters/buffers
    - Use `torch.zeros` (w/ correct shape + dtype) as placeholders for t2d/d2t
    - Use `torch.nan` (w/ correct shape + dtype) as placeholders for lm_head, embed_tokens, and verifier_lm_head
2. model.load_vocab_mappings(t2d, d2t)
    - t2d/d2t can be `None`, which will cause an early return
    - Verify shapes match and then load t2d/d2t
3. model.load_verifier_weights()
    - Loads embed_tokens, lm_head (into both lm_head and verifier_lm_head), and verifier_norm from the verifier model
    - Checks whether embed_tokens and lm_head are still `NaN` (from init) before loading; otherwise they are skipped
    - verifier_lm_head is always loaded
    - verifier_norm is skipped if the weight isn't found (with a warning). Note we don't use NaN init for this module
 
Continues below
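
For concreteness, a minimal sketch of the placeholder-then-overwrite idea in the fresh-setup steps above; the class name, shapes, and the `load_verifier_weights` signature are illustrative assumptions rather than the actual `Eagle3DraftModel` code.

```python
# Minimal sketch of the fresh-setup pathway (not the actual Eagle3DraftModel code).
# Shapes, parameter names, and the load_verifier_weights signature are assumptions;
# only the zeros/NaN-placeholder idea mirrors the PR description.
import torch
import torch.nn as nn


class DraftModelSketch(nn.Module):
    def __init__(self, hidden_size: int, draft_vocab: int, verifier_vocab: int):
        super().__init__()
        # Step 1: vocab-mapping placeholders, zeros with the correct shape/dtype.
        self.register_buffer("t2d", torch.zeros(verifier_vocab, dtype=torch.long))
        self.register_buffer("d2t", torch.zeros(draft_vocab, dtype=torch.long))
        # Verifier-derived weights start as NaN so a missed load shows up immediately.
        self.embed_tokens = nn.Embedding(verifier_vocab, hidden_size)
        self.lm_head = nn.Linear(hidden_size, draft_vocab, bias=False)
        self.verifier_lm_head = nn.Linear(hidden_size, verifier_vocab, bias=False)
        for module in (self.embed_tokens, self.lm_head, self.verifier_lm_head):
            nn.init.constant_(module.weight, float("nan"))

    def load_vocab_mappings(self, t2d, d2t):
        # Step 2: early return when no mappings are provided.
        if t2d is None or d2t is None:
            return
        assert t2d.shape == self.t2d.shape and d2t.shape == self.d2t.shape
        self.t2d.copy_(t2d)
        self.d2t.copy_(d2t)

    def load_verifier_weights(self, embed_w, lm_head_w, verifier_lm_head_w):
        # Step 3: only overwrite weights that are still NaN placeholders.
        if torch.isnan(self.embed_tokens.weight).all():
            self.embed_tokens.weight.data.copy_(embed_w)
        if torch.isnan(self.lm_head.weight).all():
            self.lm_head.weight.data.copy_(lm_head_w)
        # verifier_lm_head is always (re)loaded from the verifier.
        self.verifier_lm_head.weight.data.copy_(verifier_lm_head_w)
```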

#### Finetuning setup, i.e. `from_pretrained` pathway
1. model = Eagle3DraftModel.__init__ called internally by `PreTrainedModel.from_pretrained` under a meta device context
2. `PreTrainedModel.from_pretrained` also loads model weights for us
3. model.load_vocab_mappings(t2d, d2t) called, same as above
4. model.load_verifier_weights() called
    - lm_head and embed_tokens loading should be skipped because the values have already been set by `PreTrainedModel.from_pretrained`
    - verifier_lm_head and verifier_norm are loaded (overriding existing values, but they should be the same)
 
Continues below
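
A hedged usage sketch of this pathway; the import path, checkpoint id, and the source of `t2d`/`d2t` are assumptions rather than the actual CLI wiring.

```python
# Illustrative only: the import path and checkpoint id are assumptions.
from speculators.train.eagle3.model import Eagle3DraftModel  # hypothetical path

# Steps 1-2: HF's PreTrainedModel.from_pretrained resolves local vs. Hub
# checkpoints and loads the draft weights under a meta-device context.
model = Eagle3DraftModel.from_pretrained("org/eagle3-draft-checkpoint")

# Steps 3-4: vocab mappings, then verifier-derived weights. lm_head/embed_tokens
# are already populated from the checkpoint, so only verifier_lm_head and
# verifier_norm get (re)loaded.
t2d = d2t = None  # or tensors produced during data prep (assumption)
model.load_vocab_mappings(t2d, d2t)
model.load_verifier_weights()
```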

#### Joint (Fresh + Finetuning) next steps (in `Trainer.setup_model`)
```
If distributed:
    cache state dict on rank0
    Apply `fully_shard` to model layers and model (don't move to meta device, don't re-init weights)
    
    if checkpoint exists:
        load checkpoint (distributed)
    else:
        broadcast cached state dict from rank0 to all ranks
else:
    move model to local device
    if checkpoint exists:
        load checkpoint (single device)
```
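
A hedged Python sketch of this joint setup logic; the helper functions marked hypothetical are placeholders, and the actual `Trainer.setup_model` may differ.

```python
# Hedged sketch of the joint setup above. The two helpers marked "hypothetical"
# stand in for distributed-checkpoint loading and rank-0 broadcast; the FSDP2
# import path and the .layers attribute are also assumptions.
import os

import torch
import torch.distributed as dist


def setup_model(model: torch.nn.Module, checkpoint_path: str | None) -> torch.nn.Module:
    if dist.is_available() and dist.is_initialized():
        # Cache a full state dict on rank 0 before sharding, so fresh weights can
        # still be pushed to every rank when no checkpoint exists.
        cached_state = model.state_dict() if dist.get_rank() == 0 else None

        from torch.distributed.fsdp import fully_shard  # FSDP2; path varies by torch version
        for layer in model.layers:  # assumes the draft model exposes .layers
            fully_shard(layer)
        fully_shard(model)

        if checkpoint_path and os.path.exists(checkpoint_path):
            load_distributed_checkpoint(model, checkpoint_path)  # hypothetical helper
        else:
            broadcast_state_from_rank0(model, cached_state)  # hypothetical helper
    else:
        model = model.to("cuda" if torch.cuda.is_available() else "cpu")
        if checkpoint_path and os.path.exists(checkpoint_path):
            model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    return model
```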

## Implementation
This uses the `.from_pretrained` method from the base `PreTrainedModel`
class to resolve local vs remote model and handle downloading
checkpoints.

As part of this PR, I also clean up the model initialization process, which
previously also included loading verifier weights and t2d/d2t tensors.
These are now loaded after the init function. This was required to
support `.from_pretrained`, since the model init is run under a meta device
context.

This means the init instead sets up placeholder buffers for verifier
parameters/vocab mapping. These are being set intentionally in a way
that makes it easy to confirm they are overwritten when training starts
(e.g. by initializing some values to `NaN` so that failing to overwrite
them will result in immediate NaN outputs from the model).

I also updated the fully shard handling to ensure values are set on all
ranks correctly when using distributed training.

## Testing

Added comprehensive test coverage for loading pathway combinations (e.g.
loading from checkpoint, pretrained, fresh init, w/ and w/o vocab
mappings, single gpu and distributed, etc.).

Note that some tests require a single GPU or multiple GPUs. These are skipped
if the requirements are not met. It would be good to ensure they are run
correctly at regular intervals.

---------

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
YzTongNiar pushed a commit to YzTongNiar/speculators that referenced this pull request Apr 10, 2026