Cleanup old implementations/features #332
Merged
rahul-tuli merged 8 commits into main on Mar 6, 2026
Conversation
These were never fully implemented and didn't support training or conversion. Removing for clarity.
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
This speculator only supports conversion. We move it here to clean up the repo and better organize the classes.
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
📦 Build Artifacts Available
rahul-tuli reviewed on Mar 5, 2026
shanjiaz approved these changes on Mar 5, 2026
shanjiaz (Collaborator) left a comment:
Looks good! Out of scope for this PR, but I think we should consider removing proposals.
rahul-tuli approved these changes on Mar 6, 2026
Summary
- Errors per input: errors in docs/developer/contributing.md
- Redirects per input: redirects in docs/developer/contributing.md, redirects in docs/train.md
rahul-tuli pushed a commit that referenced this pull request on Mar 11, 2026
## Prereq

Depends on #332, which should be merged first.

## Purpose

We'd like to support finetuning existing Eagle3 models. This PR adds the `--from-pretrained` option, which loads a pretrained model from HF or a local path.

### Setup flow

#### Fresh model setup, i.e. the `from_training_args` pathway

1. `model = Eagle3DraftModel.__init__`
   - Initialize all modules/parameters/buffers
   - Use `torch.zeros` (with the correct shape and dtype) as placeholders for t2d/d2t
   - Use `torch.nan` (with the correct shape and dtype) as placeholders for lm_head, embed_tokens, and verifier_lm_head
2. `model.load_vocab_mappings(t2d, d2t)`
   - t2d/d2t can be `None`, which causes an early return
   - Verify the shapes match and then load t2d/d2t
3. `model.load_verifier_weights()`
   - Loads embed_tokens, lm_head (into both lm_head and verifier_lm_head), and verifier_norm from the verifier model
   - Checks whether embed_tokens and lm_head are still `NaN` (from init) before loading; otherwise they are skipped
   - verifier_lm_head is always loaded
   - verifier_norm is skipped if the weight isn't found (with a warning); note we don't use NaN init for this module

Continues below.

#### Finetuning setup, i.e. the `from_pretrained` pathway

1. `model = Eagle3DraftModel.__init__` is called internally by `PreTrainedModel.from_pretrained` under a meta device context
2. `PreTrainedModel.from_pretrained` also loads model weights for us
3. `model.load_vocab_mappings(t2d, d2t)` is called, same as above
4. `model.load_verifier_weights()` is called
   - lm_head and embed_tokens loading should be skipped because the values have already been set by `PreTrainedModel.from_pretrained`
   - verifier_lm_head and verifier_norm are loaded (overriding existing values, which should be the same)

Continues below.

#### Joint (Fresh + Finetuning) next steps (in `Trainer.setup_model`)

```
If distributed:
    cache state dict on rank0
    apply `fully_shard` to model layers and model (don't move to meta device, don't re-init weights)
    if checkpoint exists:
        load checkpoint (distributed)
    else:
        broadcast cached state dict from rank0 to all ranks
else:
    move model to local device
    if checkpoint exists:
        load checkpoint (single device)
```

## Implementation

This uses the `.from_pretrained` method from the base `PreTrainedModel` class to resolve local vs. remote models and handle downloading checkpoints.

As part of this PR, I also clean up the model initialization process, which previously also included loading the verifier weights and t2d/d2t tensors. These are now loaded after the init function. This was required to support `.from_pretrained`, since the model init is run under a meta device context. The init instead sets up placeholder buffers for the verifier parameters and vocab mapping. These are set intentionally in a way that makes it easy to confirm they are overwritten when training starts (e.g. by initializing some values to `NaN`, so that failing to overwrite them results in immediate NaN outputs from the model).

I also updated the fully-shard handling to ensure values are set correctly on all ranks when using distributed training.

## Testing

Added comprehensive test coverage for the loading pathway combinations (e.g. loading from checkpoint, pretrained, fresh init, with and without vocab mappings, single GPU and distributed, etc.). Note that some tests require a single GPU or multiple GPUs; these are skipped if the requirements are not met. It would be good to ensure these are being run correctly at regular intervals.

---------

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
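To make the placeholder-init pattern above concrete, here is a minimal, hypothetical sketch (not the actual `Eagle3DraftModel` code): the class name, tensor shapes, and state-dict keys are assumptions, but it illustrates the zero placeholders for `t2d`/`d2t`, the NaN placeholders for verifier-derived weights, and the NaN check that lets loading skip weights `from_pretrained` has already populated.

```python
import torch
from torch import nn


class DraftModelSketch(nn.Module):
    """Illustrative placeholder-init pattern; not the actual Eagle3DraftModel code."""

    def __init__(self, hidden_size: int, draft_vocab_size: int, verifier_vocab_size: int):
        super().__init__()
        # Vocab-mapping buffers: zeros of the right shape/dtype until load_vocab_mappings runs.
        self.register_buffer("t2d", torch.zeros(verifier_vocab_size, dtype=torch.bool))
        self.register_buffer("d2t", torch.zeros(draft_vocab_size, dtype=torch.long))
        # Verifier-derived weights: NaN so a missed load shows up immediately in model outputs.
        self.embed_tokens = nn.Embedding(verifier_vocab_size, hidden_size)
        self.verifier_lm_head = nn.Linear(hidden_size, verifier_vocab_size, bias=False)
        with torch.no_grad():
            self.embed_tokens.weight.fill_(float("nan"))
            self.verifier_lm_head.weight.fill_(float("nan"))

    def load_vocab_mappings(self, t2d=None, d2t=None):
        if t2d is None or d2t is None:
            return  # nothing to load; buffers keep their zero placeholders
        if t2d.shape != self.t2d.shape or d2t.shape != self.d2t.shape:
            raise ValueError("vocab mapping shapes do not match the draft model buffers")
        self.t2d.copy_(t2d)
        self.d2t.copy_(d2t)

    def load_verifier_weights(self, verifier_state_dict):
        with torch.no_grad():
            # Skip weights already populated (e.g. by from_pretrained); NaN placeholders
            # mark the ones that still need the verifier's values.
            if torch.isnan(self.embed_tokens.weight).any():
                self.embed_tokens.weight.copy_(verifier_state_dict["model.embed_tokens.weight"])
            # The verifier lm_head copy is always refreshed from the verifier checkpoint.
            self.verifier_lm_head.weight.copy_(verifier_state_dict["lm_head.weight"])
```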
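A rough usage of the sketch above, mirroring the fresh-init pathway described in the commit message (illustrative only; sizes and tensors are made up):

```python
import torch

# Fresh setup: every placeholder gets filled after init.
model = DraftModelSketch(hidden_size=64, draft_vocab_size=32, verifier_vocab_size=128)
verifier_state = {
    "model.embed_tokens.weight": torch.randn(128, 64),
    "lm_head.weight": torch.randn(128, 64),
}
model.load_vocab_mappings(t2d=torch.zeros(128, dtype=torch.bool),
                          d2t=torch.zeros(32, dtype=torch.long))
model.load_verifier_weights(verifier_state)
assert not torch.isnan(model.embed_tokens.weight).any()  # placeholders were overwritten
```

In the `from_pretrained` pathway described above, the weights arrive already populated, so the NaN guard causes `load_verifier_weights` to leave `embed_tokens`/`lm_head` untouched.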
YzTongNiar pushed a commit to YzTongNiar/speculators that referenced this pull request on Apr 10, 2026

This PR aims to clean up some tech debt in the repo.
In particular:
- Moves the `EagleSpeculator` model (which only supports conversion, not training) to a folder under `convert/`. Since this model doesn't support training and uses legacy structures (like the `attach_verifier` logic), its inclusion in the `models/` folder is confusing.