Feat: Support creation of run_metadata.json at the end of benchmark run#374
Feat: Support creation of run_metadata.json at the end of benchmark run#374anandhu-eng wants to merge 6 commits into
Conversation
Captures qps, request/TTFT/TPOT latency percentiles, and scalar metrics into a run_metadata.json artifact in the report directory. Infrastructure fields (tensor_parallel, config_summary, etc.) are left as None for a separate sysinfo population step. Failure to write run_metadata.json is isolated so results.json is never suppressed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces datetime.now(UTC) (finalization time) with monotime_to_datetime(bench.session.start_time_ns).date().isoformat(), recording the date the benchmark actually started as a YYYY-MM-DD string. Also threads start_time_ns through _build_run_metadata as a keyword arg and updates all test call sites to pass _START_NS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
datetime.fromtimestamp() in monotime_to_datetime() returns naive local time. .astimezone(UTC) before .date() ensures run_date is always the UTC calendar date, making the value consistent across machines in different timezones. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add concurrency field (from load_pattern.target_concurrency) - Rename measured_latency_*_avg → measured_latency_*_average - Reorder all fields to match the canonical submission field order Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces the generation of a run_metadata.json file containing benchmark run metrics during finalization, along with comprehensive unit tests. The review feedback highlights a potential ZeroDivisionError if concurrency is zero, compatibility concerns with Python versions older than 3.11 due to the direct import of UTC from datetime, and a recommendation to explicitly specify UTF-8 encoding when writing the metadata file.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| from dataclasses import dataclass, field | ||
| from dataclasses import replace as dataclass_replace | ||
| from datetime import datetime | ||
| from datetime import UTC, datetime |
There was a problem hiding this comment.
Importing UTC directly from datetime was introduced in Python 3.11. If this project is run on Python 3.10 or older, this import will raise an ImportError. To ensure backward compatibility with older Python versions, it is safer to import timezone and use timezone.utc instead.
| from datetime import UTC, datetime | |
| from datetime import datetime, timezone |
There was a problem hiding this comment.
I think this is fine as the project uses python>3.12 as detailed here
| "run_date": monotime_to_datetime(start_time_ns) | ||
| .astimezone(UTC) | ||
| .date() | ||
| .isoformat(), |
There was a problem hiding this comment.
There was a problem hiding this comment.
I think this is fine as the project uses python>3.12 as detailed here
- Guard tps_per_user division against concurrency=0 (falsy check instead of `is not None`) to prevent ZeroDivisionError - Add encoding="utf-8" to run_metadata.json open() for platform safety Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| ), | ||
| "link_config": None, | ||
| "link_logs": None, | ||
| "measured_latency_ttft_min": _metric_stat(ttft, "min"), |
There was a problem hiding this comment.
Hi Anandu, I would suggest that we have submission checker or other means to auto-generate the run_metadata.json by reading the results_summary.json in post-processing, i.e. make results_summary.json self complete and don't create more output file that duplicate contents.
By creating more separate files with a similar set of metrics, we are increasing the entropy of the system and decreasing the maintainability of the module. Not all scenarios will have the TTFT/TPOT in endpoints (e.g. T2V, VLM).
There was a problem hiding this comment.
We can discuss in slack or iterate on the exact requirements. Right now we have points.yaml and results_summary.json which should have all the information
What does this PR do?
Type of change
Related issues
Testing
Checklist