[pull] main from abetlen:main #115

Merged: pull[bot] merged 3 commits into MZWNET:main from abetlen:main on Jun 7, 2026

pull[bot] commented Jun 7, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

abetlen added 3 commits June 7, 2026 03:04
…es` api, response parsing) (#2174)

* Add batch processing server

* Improve response parser streaming performance

* Add /v1/models endpoint

* Support custom tools in responses API

* Improve responses api compatibility for codex

* Improve batch server prompt and context config

* Improve batch server scheduling and prompt handling

* fix apply_patch tool

* Improve batch server schemas and metrics

* Refactor sequence cache helpers

* Fix server type diagnostics

* feat: add llama.cpp extension bindings

* feat: add MTP support to batch server

* feat: improve draft-mtp handling in batch server

* feat: cap MTP draft context outputs

* fix: preserve held streaming tokens

* feat: add load-time LoRA support to batch server

* feat: add multimodal support to batch server

* refactor: rename batch item kinds

* refactor: type sampled mtp updates

* refactor: structure sampled mtp batch processing

* refactor: clarify batch item construction

* refactor: type batch item kind

* refactor: clarify sampled pending index

* refactor: clarify output index naming

* refactor: rename logits index resolver

* refactor: colocate sampled mtp state

* refactor: inline sampled mtp helpers

* refactor: use row-expanded multimodal prompt identity

* test: remove multimodal prompt plan tests

* refactor: narrow mtmd processor dependencies

* refactor: group prompt segment media fields

* refactor: centralize sequence state copy

* refactor: keep disk cache storage only

* refactor: split batch item payloads

* refactor: centralize pending request failure cleanup

* refactor: centralize sequence claiming

* refactor: key sequence disk cache compatibility

* refactor: decouple completion request preparation

* refactor: name prepared completion parts

* refactor: return prepared completion parts

* refactor: localize media cache key building

* refactor: remove unused request id override

* refactor: simplify prompt segment row capacity

* refactor: inline prompt row clamp

* refactor: inline disconnect cancellation response

* refactor: simplify recurrent draft capacity

* refactor: define builtin grammar rule as dataclass

* refactor: type chat template conversions

* docs: mark llama_cpp_ext experimental

* feat: restrict multimodal media sources

* docs: add server example README and config

* docs: document server example configuration

* docs: update server README

* docs: document server wheel setup and clients

* docs: add server model configs

* docs: add server chat templates and response schemas

* docs: keep batch processing server example

* docs: add server example changelog entry

* docs: mention multi-token prediction in changelog
pull[bot] locked and limited conversation to collaborators Jun 7, 2026
pull[bot] added the ⤵️ pull label Jun 7, 2026
pull[bot] merged commit 380177b into MZWNET:main Jun 7, 2026