Commit 66635a0
feat(example): Updated server example (batch processing, /v1/responses api, response parsing) (abetlen#2174)
* Add batch processing server
* Improve response parser streaming performance
* Add /v1/models endpoint
* Support custom tools in responses API
* Improve responses api compatibility for codex
* Improve batch server prompt and context config
* Improve batch server scheduling and prompt handling
* fix apply_patch tool
* Improve batch server schemas and metrics
* Refactor sequence cache helpers
* Fix server type diagnostics
* feat: add llama.cpp extension bindings
* feat: add MTP support to batch server
* feat: improve draft-mtp handling in batch server
* feat: cap MTP draft context outputs
* fix: preserve held streaming tokens
* feat: add load-time LoRA support to batch server
* feat: add multimodal support to batch server
* refactor: rename batch item kinds
* refactor: type sampled mtp updates
* refactor: structure sampled mtp batch processing
* refactor: clarify batch item construction
* refactor: type batch item kind
* refactor: clarify sampled pending index
* refactor: clarify output index naming
* refactor: rename logits index resolver
* refactor: colocate sampled mtp state
* refactor: inline sampled mtp helpers
* refactor: use row-expanded multimodal prompt identity
* test: remove multimodal prompt plan tests
* refactor: narrow mtmd processor dependencies
* refactor: group prompt segment media fields
* refactor: centralize sequence state copy
* refactor: keep disk cache storage only
* refactor: split batch item payloads
* refactor: centralize pending request failure cleanup
* refactor: centralize sequence claiming
* refactor: key sequence disk cache compatibility
* refactor: decouple completion request preparation
* refactor: name prepared completion parts
* refactor: return prepared completion parts
* refactor: localize media cache key building
* refactor: remove unused request id override
* refactor: simplify prompt segment row capacity
* refactor: inline prompt row clamp
* refactor: inline disconnect cancellation response
* refactor: simplify recurrent draft capacity
* refactor: define builtin grammar rule as dataclass
* refactor: type chat template conversions
* docs: mark llama_cpp_ext experimental
* feat: restrict multimodal media sources
* docs: add server example README and config
* docs: document server example configuration
* docs: update server README
* docs: document server wheel setup and clients
* docs: add server model configs
* docs: add server chat templates and response schemas
* docs: keep batch processing server example
* docs: add server example changelog entry
* docs: mention multi-token prediction in changelog
1 parent ed83366 commit 66635a0
9 files changed
Lines changed: 17517 additions & 0 deletions
File tree
- examples/server
- configs
- llama_cpp