
Commit fddee27

feat(example): align server MTP support with llama.cpp (abetlen#2283)
* feat(example): align server MTP support with llama.cpp
* docs: update changelog for server MTP alignment
* fix: remove stale pre-norm extension bindings
* fix(example): disable sampled MTP for shared memory contexts
1 parent db66da3 commit fddee27

4 files changed

Lines changed: 238 additions & 241 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+- feat(example): align server MTP support with llama.cpp by @abetlen in #2283
 - feat: update llama.cpp to ggml-org/llama.cpp@9e3b928fd
 - feat(example): add OpenAI-compatible embeddings endpoint by @abetlen in #2281

examples/server/README.md

Lines changed: 16 additions & 0 deletions
@@ -434,6 +434,22 @@ Use MTP when the loaded model and llama.cpp build expose the required draft stat
 }
 ```

+By default `draft-mtp` creates the MTP context from the target model.
+Set `draft_model_path` or `draft_model_from_pretrained` when the model uses a separate assistant GGUF.
+
+```json
+{
+  "model": {
+    "draft_model": "draft-mtp",
+    "draft_model_num_pred_tokens": 2,
+    "draft_model_from_pretrained": {
+      "repo_id": "example/gemma-assistant-GGUF",
+      "filename": "assistant.gguf"
+    }
+  }
+}
+```
+
 MTP currently applies to text-only requests.

 ## Disk Sequence Cache
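
The added README text names two ways to point `draft-mtp` at a separate assistant GGUF, but the diff only shows the `draft_model_from_pretrained` form. As a rough sketch, the Python below builds the `"model"` section for the default case and for a local-file case. The helper `mtp_model_config` is hypothetical and not part of the repository. The sketch also assumes that `draft_model_path` takes a plain string and sits at the same level as `draft_model_from_pretrained`, which the diff does not confirm. The path `models/assistant.gguf` is a placeholder.

```python
import json


def mtp_model_config(num_pred_tokens=2,
                     draft_model_path=None,
                     draft_model_from_pretrained=None):
    """Hypothetical helper: build the "model" section of a draft-mtp config."""
    model = {
        "draft_model": "draft-mtp",
        "draft_model_num_pred_tokens": num_pred_tokens,
    }
    # With neither key set, the README says the MTP context is created
    # from the target model itself.
    if draft_model_path is not None:
        # Assumption: a string path at the same level as the other keys.
        model["draft_model_path"] = draft_model_path
    if draft_model_from_pretrained is not None:
        model["draft_model_from_pretrained"] = draft_model_from_pretrained
    return {"model": model}


# Default: no separate assistant GGUF.
default_cfg = mtp_model_config()

# Separate assistant GGUF loaded from a local file (placeholder path).
local_cfg = mtp_model_config(draft_model_path="models/assistant.gguf")

print(json.dumps(local_cfg, indent=2))
```

Leaving both keys out keeps the default behaviour described in the README. Whether the server accepts both keys at once is not stated in the diff, so the sketch does not enforce either choice.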
