Streaming Inference Gateway

A small Rust service that simulates token streaming over Server-Sent Events (SSE).

What it does

Accepts a prompt with POST /runs
Streams the run's tokens with GET /runs/:id/stream (SSE)
Sends tokens progressively instead of as one large response
Stops safely on timeout, disconnect, or slow consumers
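
The three stop conditions above can be sketched with a bounded channel. This is a minimal illustration using only the standard library, not the service's actual code: `StopReason` and `stream_tokens` are hypothetical names, and the real service may well use an async runtime instead of `std::sync::mpsc`.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::time::Instant;

/// Why a stream ended. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum StopReason {
    Finished,
    Timeout,
    Disconnected,
    SlowConsumer,
}

/// Push tokens into a bounded channel, stopping early when the deadline
/// passes, the receiver is dropped, or the buffer is full.
pub fn stream_tokens(
    tokens: &[&str],
    tx: &SyncSender<String>,
    deadline: Instant,
) -> StopReason {
    for token in tokens {
        // Timeout: give up once the run has used its time budget.
        if Instant::now() >= deadline {
            return StopReason::Timeout;
        }
        // try_send never blocks, so a stalled client cannot stall the producer.
        match tx.try_send((*token).to_owned()) {
            Ok(()) => {}
            // Buffer full: the consumer is not keeping up.
            Err(TrySendError::Full(_)) => return StopReason::SlowConsumer,
            // Receiver dropped: the client went away.
            Err(TrySendError::Disconnected(_)) => return StopReason::Disconnected,
        }
    }
    StopReason::Finished
}
```

In this sketch the capacity passed to `std::sync::mpsc::sync_channel` determines how far a consumer may fall behind before the stream is cut off. Stopping immediately on a full buffer is one possible policy; waiting briefly before giving up would be another.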

Run it

cargo run

Open http://localhost:8080 to use the simple test UI.

API

Create run

POST /runs
Content-Type: application/json

{ "prompt": "Explain async streaming" }

Response:

{ "id": "<uuid>" }

Stream run

GET /runs/<uuid>/stream
Accept: text/event-stream

SSE event types:

token
done
error
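
As a rough illustration of how these event types could appear on the wire, the sketch below serializes each one into the standard SSE text format (an `event:` line, one or more `data:` lines, then a blank line). The `RunEvent` type and the payload shapes are assumptions; this README does not specify what each event carries.

```rust
/// The three event types listed above. Payloads are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum RunEvent {
    Token(String),
    Done,
    Error(String),
}

impl RunEvent {
    /// Name written to the `event:` field.
    pub fn name(&self) -> &'static str {
        match self {
            RunEvent::Token(_) => "token",
            RunEvent::Done => "done",
            RunEvent::Error(_) => "error",
        }
    }

    /// Serialize to the SSE wire format.
    pub fn to_sse(&self) -> String {
        let payload = match self {
            RunEvent::Token(text) => text.as_str(),
            RunEvent::Done => "",
            RunEvent::Error(message) => message.as_str(),
        };
        let mut out = format!("event: {}\n", self.name());
        // A payload containing newlines is split across several data: lines.
        // An empty payload still gets one data: line so clients dispatch it.
        for line in payload.split('\n') {
            out.push_str("data: ");
            out.push_str(line);
            out.push('\n');
        }
        // A blank line terminates the event.
        out.push('\n');
        out
    }
}
```

For example, `RunEvent::Token("hi".to_string()).to_sse()` yields `event: token`, then `data: hi`, then a blank line.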

Notes

This is a demo generator, not a real LLM backend.
Runs are in-memory and removed when finished.
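
An in-memory run registry of this kind might look like the sketch below. `RunStore` and its methods are hypothetical names, and the ids are plain strings supplied by the caller here, whereas the service returns a UUID.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Shared in-memory map from run id to prompt. Nothing is persisted,
/// so all runs are lost when the process exits.
#[derive(Clone, Default)]
pub struct RunStore {
    runs: Arc<Mutex<HashMap<String, String>>>,
}

impl RunStore {
    /// Register a new run under the given id.
    pub fn insert(&self, id: &str, prompt: &str) {
        self.runs
            .lock()
            .unwrap()
            .insert(id.to_owned(), prompt.to_owned());
    }

    /// Look up the prompt for a run that has not finished yet.
    pub fn prompt(&self, id: &str) -> Option<String> {
        self.runs.lock().unwrap().get(id).cloned()
    }

    /// Remove a run once its stream ends. Returns false if the id is unknown.
    pub fn finish(&self, id: &str) -> bool {
        self.runs.lock().unwrap().remove(id).is_some()
    }

    /// Number of runs currently held in memory.
    pub fn active_runs(&self) -> usize {
        self.runs.lock().unwrap().len()
    }
}
```

Cloning a `RunStore` only clones the `Arc`, so a handler that creates runs and a handler that streams them would share the same map.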
