# Worker-Reported Perf Metrics (v1)

`gen-worker` can optionally report best-effort performance/debug metrics to `gen-orchestrator` via the existing gRPC stream `WorkerSchedulerMessage.worker_event`.

These metrics are:

- Best-effort: metrics emission must never fail a run.
- Safe: numbers and small strings only. No URLs, secrets, or file paths.
- Optional: only emit keys when known; omit unknown fields entirely.
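Under these constraints, a minimal emission helper might look like the following sketch. This is an assumption, not the actual worker code: `send` stands in for whatever callable writes a `worker_event` onto the gRPC stream, and the logger name is made up.

```python
import json
import logging

log = logging.getLogger("gen_worker.metrics")


def emit_metric(send, event_type, payload):
    """Best-effort emit: serialize and send, but never let a metrics
    failure propagate into the run. `send` is a hypothetical callable
    that writes one worker_event onto the gRPC stream."""
    try:
        # Omit unknown fields entirely rather than sending nulls.
        clean = {k: v for k, v in payload.items() if v is not None}
        send(event_type, json.dumps(clean))
    except Exception:
        # Metrics must never fail a run; log at debug and move on.
        log.debug("metric emission failed", exc_info=True)
```

The `try`/`except Exception` around the whole body is what enforces the "never fail a run" rule: a serialization bug or a broken stream degrades to a debug log line.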

## Canonical Events

These event types are designed to be stable and low-cardinality, so `gen-orchestrator` can persist them into dedicated columns.

- `metrics.compute.started` payload: `{ "at": "<rfc3339>" }`
- `metrics.compute.completed` payload: `{ "at": "<rfc3339>" }`
- `metrics.fetch` payload: `{ "ms": <int> }` (use `0` for warm disk hits)
- `metrics.gpu_load` payload: `{ "ms": <int> }`
- `metrics.inference` payload: `{ "ms": <int> }`
- `metrics.tokens` payload: `{ "output_tokens": <int> }` (only when applicable)

All times are reported as integer milliseconds.
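A single run's event sequence could be sketched as follows. The `emit` callable and the specific millisecond values are hypothetical; only the event types and payload shapes come from the list above.

```python
from datetime import datetime, timezone


def rfc3339_now():
    # RFC 3339-compatible UTC timestamp, e.g. "2024-01-01T12:00:00.000000+00:00"
    return datetime.now(timezone.utc).isoformat()


# Typical per-run sequence (hypothetical `emit` callable):
#
#   emit("metrics.compute.started", {"at": rfc3339_now()})
#   emit("metrics.fetch", {"ms": 0})                  # warm disk hit
#   emit("metrics.gpu_load", {"ms": 850})
#   emit("metrics.inference", {"ms": 4200})
#   emit("metrics.tokens", {"output_tokens": 128})    # only when applicable
#   emit("metrics.compute.completed", {"at": rfc3339_now()})
```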

## Extended Debug Event

In addition to the canonical events, the worker emits one extended event at the end of each run:

- `metrics.run` payload: JSON object (schema versioned)

### `metrics.run` payload (schema_version=1)

Top-level keys (all optional unless noted):

- `schema_version` (required): `1`
- `function_name`: string
- `cache_state`: `hot_vram | warm_disk | cold_remote`
- `models`: array of objects (best-effort, one per required model)
- `pipeline_init_ms`: int
- `gpu_load_ms`: int
- `warmup_ms`: int (only for the first warmup run; otherwise omit)
- `inference_ms`: int
- diffusion-ish extras (optional): `steps`, `iters_per_s`, `width`, `height`, `guidance`
- post-processing (optional): `png_encode_ms`, `upload_ms`
- resources (optional): `peak_vram_bytes`, `peak_ram_bytes`

Per-model object keys (all optional unless noted):

- `model_id` (required): canonical model id used by worker/scheduler
- `variant_label`: string
- `snapshot_digest`: string
- `cache_state`: `hot_vram | warm_disk | cold_remote`
- `bytes_downloaded`: int (`0` if none)
- `download_ms`: int (`0` if warm disk hit)
- `bytes_read_disk`: int

## Notes

- `metrics.fetch` is primarily the time spent ensuring required model blobs are present on disk (remote download vs. warm disk hit).
- `metrics.gpu_load` is best-effort and currently reflects time spent moving injected model objects to the worker device when supported.
- `metrics.inference` is best-effort and currently reflects time spent executing the user function body (not including scheduler queueing).