KVScope

KVScope is a lightweight LLM inference observability and diagnostics tool. It is vLLM-first and reconstructs operational behavior from live serving telemetry.

KVScope focuses on:

KV cache pressure
scheduler pressure and queueing
prefill pressure
TTFT spikes and latency instability
decode throughput collapse
preemption behavior
runtime phase transitions
sustained condition episodes

KVScope is not a benchmark harness, not a Grafana replacement, and not a generic metrics dashboard.

Core Invariant

KV cache usage is always stored internally as percent units from 0.0 to 100.0.

Raw vLLM Prometheus metrics may expose vllm:kv_cache_usage_perc as a fraction such as 0.734. KVScope normalizes that to 73.4 at the adapter boundary.
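
The normalization step can be sketched as follows. This is a minimal illustration, not KVScope's actual adapter code: the function name `normalize_kv_usage` is hypothetical, and it assumes the raw reading is always a fraction between 0.0 and 1.0.

```python
def normalize_kv_usage(raw_fraction: float) -> float:
    """Convert a raw fractional KV cache reading to percent units (0.0 to 100.0).

    Assumption: the upstream metric is a fraction such as 0.734.
    The function name is hypothetical and not taken from KVScope itself.
    """
    percent = raw_fraction * 100.0
    # Clamp so downstream code can rely on the 0.0 to 100.0 invariant
    # even if the server briefly reports an out-of-range value.
    return max(0.0, min(100.0, percent))


print(round(normalize_kv_usage(0.734), 1))  # prints 73.4
```

Converting once at the boundary means thresholds and displays elsewhere never need to guess which unit a value is in.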

CLI Commands

kvscope record --server http://127.0.0.1:8080 --duration 90
kvscope analyze results/sessions/<session-id>
kvscope timeline results/sessions/<session-id> --verbose
kvscope doctor results/sessions/<session-id> --verbose
kvscope dashboard --server http://127.0.0.1:8080 --density compact

The same commands can be run from source:

python -m kvscope.cli record --server http://127.0.0.1:8080 --duration 90
python -m kvscope.cli analyze results/sessions/<session-id>
python -m kvscope.cli timeline results/sessions/<session-id> --verbose
python -m kvscope.cli doctor results/sessions/<session-id> --verbose
python -m kvscope.cli dashboard --server http://127.0.0.1:8080 --density full

Dashboard density modes:

compact: default mode for normal terminal heights
full: includes detailed latency and cumulative counter panels

Session Outputs

kvscope record writes a session directory under results/sessions/ by default.

Core files:

metadata.json: session metadata, schema fields, source URL, and recorder settings
metrics.csv: normalized samples as CSV
samples.jsonl: normalized KVScopeSample records
events.jsonl: phase transitions, condition changes, condition episodes, and transient operational events

Analysis files:

summary.json: aggregate session metrics from kvscope analyze
doctor.json: ranked diagnostic hypotheses from kvscope doctor
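
For ad hoc inspection, the core files listed above can be read with the standard library. The sketch below assumes only that metadata.json is a JSON object and that the .jsonl files hold one JSON object per line. The helper names and the keys in the demo records (`source_url`, `event_type`) are hypothetical placeholders, not the real schema.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def load_session(session_dir):
    """Load the core files of a session directory into plain Python objects."""
    session = Path(session_dir)
    metadata_text = (session / "metadata.json").read_text(encoding="utf-8")
    return {
        "metadata": json.loads(metadata_text),
        "samples": load_jsonl(session / "samples.jsonl"),
        "events": load_jsonl(session / "events.jsonl"),
    }


# Demo against a throwaway directory so the sketch runs without a real session.
# The keys written here are placeholders, not KVScope's schema.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "metadata.json").write_text('{"source_url": "http://127.0.0.1:8080"}', encoding="utf-8")
    (root / "samples.jsonl").write_text('{"kv_usage_percent": 73.4}\n', encoding="utf-8")
    (root / "events.jsonl").write_text('{"event_type": "phase_transition"}\n\n', encoding="utf-8")
    demo = load_session(root)

print(len(demo["samples"]), len(demo["events"]))
```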

Runtime Phases

Runtime phases are emitted as phase_transition records in events.jsonl.

Current phases:

IDLE: no active workload
HEALTHY: active workload without visible pressure
QUEUE_PRESSURE: requests are waiting in the scheduler
KV_PRESSURE_RISING: KV usage is high but not saturated
SATURATED: KV usage is in the saturation range

Classification priority:

IDLE
SATURATED
QUEUE_PRESSURE
KV_PRESSURE_RISING
HEALTHY
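
The priority order above amounts to a first-match classifier. The sketch below shows one way to express it. The `Snapshot` fields and both thresholds are illustrative assumptions, since KVScope's real cut-offs are not stated here.

```python
from dataclasses import dataclass

# Hypothetical thresholds in percent units; not KVScope's actual values.
KV_HIGH_PERCENT = 80.0
KV_SATURATED_PERCENT = 95.0


@dataclass
class Snapshot:
    """Illustrative inputs; field names are assumptions."""
    running: int              # requests currently being served
    waiting: int              # requests queued in the scheduler
    kv_usage_percent: float   # normalized to 0.0-100.0


def classify_phase(s: Snapshot) -> str:
    """Return the first phase, in priority order, whose test matches."""
    if s.running == 0 and s.waiting == 0:
        return "IDLE"
    if s.kv_usage_percent >= KV_SATURATED_PERCENT:
        return "SATURATED"
    if s.waiting > 0:
        return "QUEUE_PRESSURE"
    if s.kv_usage_percent >= KV_HIGH_PERCENT:
        return "KV_PRESSURE_RISING"
    return "HEALTHY"


# Queueing and saturation are both present; SATURATED wins on priority.
print(classify_phase(Snapshot(running=4, waiting=3, kv_usage_percent=97.0)))
```

Because the checks run in priority order, a sample that satisfies several tests is always reported under the highest-priority phase.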

Historical sessions may contain PREFILL_PRESSURE as a phase. KVScope readers continue to support those records for backward compatibility.

Runtime Conditions

Runtime conditions are concurrent observations that can overlap with the primary phase.

Current conditions:

PREFILL_PRESSURE: prefill-side pressure during active serving
HIGH_PROMPT_INGESTION: high running request count plus high prompt token ingestion
TTFT_ELEVATED: elevated time to first token
QUEUE_DELAY: waiting requests or queue-time latency

Example:

State: KV_PRESSURE_RISING
Conditions:
  PREFILL_PRESSURE
  HIGH_PROMPT_INGESTION

Conditions are grouped into episodes when they are sustained. Episode records include:

start_time
end_time
duration_sec
peak_evidence
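
As an illustration of how sustained conditions could be grouped into episodes with these fields, the sketch below merges consecutive active observations and keeps only runs that last long enough. The minimum duration, the input tuple shape, and the use of a single number for `peak_evidence` are all assumptions; the real grouping rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    start_time: float
    end_time: float
    duration_sec: float
    peak_evidence: float  # assumption: evidence is a single number


def _append_if_sustained(episodes, run, min_duration_sec):
    """Store the run as an Episode only if it lasted long enough."""
    start, end, peak = run
    duration = end - start
    if duration >= min_duration_sec:
        episodes.append(Episode(start, end, duration, peak))


def group_episodes(observations, min_duration_sec=5.0):
    """Group (timestamp, active, evidence) tuples, sorted by time, into episodes."""
    episodes = []
    current = None  # [start_time, end_time, peak_evidence] of the open run
    for timestamp, active, evidence in observations:
        if active:
            if current is None:
                current = [timestamp, timestamp, evidence]
            else:
                current[1] = timestamp
                current[2] = max(current[2], evidence)
        elif current is not None:
            _append_if_sustained(episodes, current, min_duration_sec)
            current = None
    if current is not None:  # condition still active at end of data
        _append_if_sustained(episodes, current, min_duration_sec)
    return episodes


observations = [
    (0.0, False, 0.0),
    (1.0, True, 2.0),
    (2.0, True, 5.0),
    (3.0, True, 3.0),
    (4.0, False, 0.0),
    (5.0, True, 9.0),   # a single-sample blip, too short to count
    (6.0, False, 0.0),
]
episodes = group_episodes(observations, min_duration_sec=2.0)
print(episodes)
```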

Event Timeline

kvscope timeline reads events.jsonl and renders operational behavior over time. It separates phase transitions from transient events such as TTFT_SPIKE, E2E_LATENCY_SPIKE, KV_SATURATION, and DECODE_THROUGHPUT_DROP. It also summarizes sustained condition episodes with duration and peak evidence.

Example:

python -m kvscope.cli timeline results/sessions/<session-id> --verbose
python -m kvscope.cli timeline results/sessions/<session-id> --json
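
The separation between phase transitions and transient events can be approximated when post-processing events.jsonl by hand. In the sketch below, the `event_type` key is a hypothetical field name chosen for illustration. Only the event names themselves come from the description above.

```python
# Transient event names as listed in this section.
TRANSIENT_EVENTS = {
    "TTFT_SPIKE",
    "E2E_LATENCY_SPIKE",
    "KV_SATURATION",
    "DECODE_THROUGHPUT_DROP",
}


def split_timeline(events):
    """Partition event records into (transitions, transients, other).

    Assumption: each record carries its kind under an "event_type" key.
    """
    transitions, transients, other = [], [], []
    for record in events:
        kind = record.get("event_type")
        if kind == "phase_transition":
            transitions.append(record)
        elif kind in TRANSIENT_EVENTS:
            transients.append(record)
        else:
            other.append(record)  # e.g. condition changes and episodes
    return transitions, transients, other


sample_events = [
    {"event_type": "phase_transition"},
    {"event_type": "TTFT_SPIKE"},
    {"event_type": "condition_episode"},
]
transitions, transients, other = split_timeline(sample_events)
print(len(transitions), len(transients), len(other))  # prints 1 1 1
```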

Example Workflow

Start a vLLM server:

vllm serve /path/to/model \
  --host 127.0.0.1 \
  --port 8080 \
  --dtype float16 \
  --max-model-len 2048 \
  --gpu-memory-utilization 0.90

Record telemetry:

python -m kvscope.cli record --server http://127.0.0.1:8080 --duration 90

While the recorder is running, generate traffic with vllm bench serve or another OpenAI-compatible client.

Inspect the run:

python -m kvscope.cli timeline results/sessions/<session-id> --verbose
python -m kvscope.cli doctor results/sessions/<session-id> --verbose

Workflow summary:

vLLM server -> kvscope dashboard -> kvscope record -> vllm bench serve -> kvscope timeline -> kvscope doctor

Conference-readiness docs:

docs/architecture.md
docs/demo_walkthrough.md
docs/conference_story.md
docs/evidence_pack.md
docs/cfp_notes.md

Tests

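KVScope's tests currently use the standard-library unittest module. The compileall command byte-compiles the source directories as a quick syntax check.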

python -m unittest discover -v
python -m compileall -q kvscope collectors analyzers visualizers tests
