Skip to content

Feat: Support creation of run_metadata.json at the end of benchmark run#374

Open
anandhu-eng wants to merge 6 commits into
mainfrom
feat/runmetadata
Open

Feat: Support creation of run_metadata.json at the end of benchmark run#374
anandhu-eng wants to merge 6 commits into
mainfrom
feat/runmetadata

Conversation

@anandhu-eng

@anandhu-eng anandhu-eng commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

anandhu-eng and others added 4 commits June 23, 2026 19:35
Captures qps, request/TTFT/TPOT latency percentiles, and scalar metrics
into a run_metadata.json artifact in the report directory. Infrastructure
fields (tensor_parallel, config_summary, etc.) are left as None for a
separate sysinfo population step. Failure to write run_metadata.json is
isolated so results.json is never suppressed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces datetime.now(UTC) (finalization time) with
monotime_to_datetime(bench.session.start_time_ns).date().isoformat(),
recording the date the benchmark actually started as a YYYY-MM-DD string.
Also threads start_time_ns through _build_run_metadata as a keyword arg
and updates all test call sites to pass _START_NS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
datetime.fromtimestamp() in monotime_to_datetime() returns naive local
time. .astimezone(UTC) before .date() ensures run_date is always the
UTC calendar date, making the value consistent across machines in
different timezones.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add concurrency field (from load_pattern.target_concurrency)
- Rename measured_latency_*_avg → measured_latency_*_average
- Reorder all fields to match the canonical submission field order

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@anandhu-eng anandhu-eng requested a review from a team June 23, 2026 20:29
@github-actions

Copy link
Copy Markdown

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested review from arekay-nv and nvzhihanj June 23, 2026 20:29

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the generation of a run_metadata.json file containing benchmark run metrics during finalization, along with comprehensive unit tests. The review feedback highlights a potential ZeroDivisionError if concurrency is zero, compatibility concerns with Python versions older than 3.11 due to the direct import of UTC from datetime, and a recommendation to explicitly specify UTF-8 encoding when writing the metadata file.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/inference_endpoint/commands/benchmark/execute.py
from dataclasses import dataclass, field
from dataclasses import replace as dataclass_replace
from datetime import datetime
from datetime import UTC, datetime

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Importing UTC directly from datetime was introduced in Python 3.11. If this project is run on Python 3.10 or older, this import will raise an ImportError. To ensure backward compatibility with older Python versions, it is safer to import timezone and use timezone.utc instead.

Suggested change
from datetime import UTC, datetime
from datetime import datetime, timezone

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as the project uses python>3.12 as detailed here

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
Comment on lines +1032 to +1035
"run_date": monotime_to_datetime(start_time_ns)
.astimezone(UTC)
.date()
.isoformat(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use timezone.utc instead of UTC to maintain compatibility with Python versions older than 3.11.

Suggested change
"run_date": monotime_to_datetime(start_time_ns)
.astimezone(UTC)
.date()
.isoformat(),
"run_date": monotime_to_datetime(start_time_ns)
.astimezone(timezone.utc)
.date()
.isoformat(),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as the project uses python>3.12 as detailed here

- Guard tps_per_user division against concurrency=0 (falsy check instead
  of `is not None`) to prevent ZeroDivisionError
- Add encoding="utf-8" to run_metadata.json open() for platform safety

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
),
"link_config": None,
"link_logs": None,
"measured_latency_ttft_min": _metric_stat(ttft, "min"),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Anandu, I would suggest that we have submission checker or other means to auto-generate the run_metadata.json by reading the results_summary.json in post-processing, i.e. make results_summary.json self complete and don't create more output file that duplicate contents.

By creating more separate files with a similar set of metrics, we are increasing the entropy of the system and decreasing the maintainability of the module. Not all scenarios will have the TTFT/TPOT in endpoints (e.g. T2V, VLM).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss in slack or iterate on the exact requirements. Right now we have points.yaml and results_summary.json which should have all the information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants