RelayServe is a minimal LLM inference server that adapts to heterogeneous devices.
Install from PyPI:
```bash
pip install relayserve
relayserve
```

Or install for development in editable mode:

```bash
pip install -e .
relayserve
```

To run llama.cpp as a backend, use the scripts in the class1_resources root (the parent of this RelayServe repo). From the class1_resources directory:
```bash
export LLAMA_SERVER_PATH=/path/to/llama.cpp/build/bin/llama-server
export LLAMA_MODEL_PATH=/path/to/models/phi-3-mini.gguf
export LLAMA_PORTS=8081,8082
python scripts/spawn_backends.py
```

In a second terminal, from this RelayServe directory:
```bash
export RELAYSERVE_BACKENDS=http://localhost:8081,http://localhost:8082
relayserve
```

See Serve_local_model.md in the class1_resources root for the full step-by-step guide (model download, setup verification, backend spawning, and testing).
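Before starting RelayServe, you can check that each backend is ready. A minimal sketch (assuming llama.cpp's llama-server exposes a `GET /health` endpoint and that the third-party `requests` library is installed; this script is illustrative and not part of RelayServe):

```python
# check_backends.py -- illustrative sketch, not shipped with RelayServe.
# Polls each llama-server backend's /health endpoint until it reports ready.
import time

import requests

BACKENDS = ["http://localhost:8081", "http://localhost:8082"]

for base in BACKENDS:
    for attempt in range(30):
        try:
            resp = requests.get(f"{base}/health", timeout=2)
            if resp.status_code == 200:
                print(f"{base} is ready")
                break
        except requests.ConnectionError:
            pass  # backend not listening yet
        time.sleep(1)
    else:
        # for/else: runs only if the loop never hit `break`
        print(f"{base} did not become ready")
```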
RelayServe supports OpenAI-compatible streaming for `POST /v1/chat/completions`: send `"stream": true` in the request body to receive Server-Sent Events (SSE) until a final `data: [DONE]` line.
Example (curl, non-streaming):
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-request-123" \
  -d '{"model":"relay-gguf","messages":[{"role":"user","content":"Hello"}]}'
```

Example (curl, streaming):
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Request-ID: my-request-456" \
  -d '{"model":"relay-gguf","messages":[{"role":"user","content":"Hello"}],"stream":true}'
```

Request-ID: send `X-Request-ID` or `Request-Id` in the request; the same value is echoed in the response header and in the JSON `id` field (or in the `id` of each streamed chunk). If omitted, the server generates a UUID.
Python example: from the class1_resources root, run scripts/streaming_chat.py (RelayServe must be running on port 8080):

```bash
cd /path/to/class1_resources
python scripts/streaming_chat.py "Your prompt" "optional-request-id"
```
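For a non-streaming call without the helper script, a minimal sketch in plain Python (again assuming RelayServe on port 8080 and the third-party `requests` library; the response shape follows the OpenAI chat-completion format the server advertises):

```python
# Non-streaming chat completion -- illustrative sketch.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "relay-gguf",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
# OpenAI-compatible responses put the reply under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```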