feat(metrics): add use_legacy_loadgen_qps_metrics for LoadGen-parity QPS/TPS#372
feat(metrics): add use_legacy_loadgen_qps_metrics for LoadGen-parity QPS/TPS#372viraatc wants to merge 1 commit into
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces support for MLPerf LoadGen Server 'completed' window throughput metrics, including a configuration toggle (legacy_streaming_tps) to switch between legacy and LoadGen-aligned throughput calculations. However, several configuration templates were updated to use agentic_inference and agentic_inference_inline which are not yet supported by the schema in schema.py, leading to validation failures. Additionally, a potential state inconsistency was identified in metrics_table.py when handling out-of-bounds tracking block indices.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
0751e69 to
40dbf89
Compare
Review Council — Multi-AI Code ReviewReviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: standard | Commit: Found 4 findings (1 already fixed in this push). No critical/high blockers — the core numerator/denominator logic faithfully matches
Disposition: #1 fixed. #2 (Poisson start bias) and #3 (failed-last-issued philosophy + help wording) are author calls — see PR thread. #4 is a nice-to-have follow-up. |
40dbf89 to
d44bb21
Compare
Review Council — all findings addressed ✅Rebased onto current
Branch is rebased on top of (Findings were posted as a single summary comment, so there are no inline threads to resolve via the UI — this comment records the resolution.) |
d44bb21 to
fec276f
Compare
Update — finding #3 (error handling) revisedPer maintainer direction, LoadGen mode now handles errors exactly as endpoints already does — no special-casing:
This also matches LoadGen itself — verified against loadgen source: Branch rebased on |
8bacc8a to
3f5e8b5
Compare
|
Redesigned per review: the boolean |
3f5e8b5 to
39a99b2
Compare
|
Simplified to a single boolean |
75132b0 to
8c322de
Compare
…QPS/TPS Adds `use_legacy_loadgen_qps_metrics` on the poisson load pattern (default True). When on, QPS/TPS use the legacy MLPerf LoadGen Server "completed" definition: (completed-1)/T and tokens/T, where T spans from the first issued request to the completion of the last-issued request (see mlcommons/inference loadgen/results.cc). Disable via --no-use-legacy-loadgen-qps-metrics for endpoints-native completed/duration over the full run. Scoped to poisson (serialized out of the offline/concurrency templates) and gated to poisson in execute.py, so other patterns stay native. The metrics-aggregator subprocess emits one extra counter (legacy_loadgen_window_duration_ns); the timed load-generator path is unchanged. Falls back to the native window when the legacy window is unavailable (snapshots predating this change, or <2 completions). TODO(vir) to deprecate once a formal tail-cutting mechanism lands. Ref: mlcommons/inference loadgen/results.cc (Server scenario, completed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8c322de to
53566b6
Compare
53566b6 to
4a6ceda
Compare
What
Adds
use_legacy_loadgen_qps_metrics(bool, defaultTrue). When on, QPS/TPS use the MLPerf LoadGen Server "completed" definition:QPS = (completed - 1) / T,TPS = tokens / TT= first issued request -> completion of the last-issued request (LoadGenfinal_query_all_samples_done_time; mlcommons/inferenceloadgen/results.cc).Disable with
--no-use-legacy-loadgen-qps-metricsfor endpoints-nativecompleted/durationandtokens/duration(window to the last request to finish) -- byte-identical to prior behavior.Notes
loadgen_window_duration_ns); the timed load-generator path is unchanged (no new events, no per-query work).~100 lines of source + tests; 261 unit tests pass, mypy/ruff/pre-commit clean.