feat: add configurable live metrics publishing for CPU servers by attafosu · Pull Request #363 · mlcommons/endpoints

attafosu · 2026-06-22T01:56:30Z

What does this PR do?

Summary

This PR adds a runtime toggle to control live metrics publishing in the metrics aggregator:

New config field: settings.runtime.enable_live_metrics (default: true)
New CLI flag: --enable-live-metrics
When disabled, the aggregator uses publish_interval=0, which skips periodic live snapshot ticks while preserving final snapshot/report generation.
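A minimal asyncio sketch of the behaviour described above. It assumes the publisher runs its periodic snapshots as an asyncio task. `LivePublisherSketch`, `_tick_loop`, and the `run` helper are hypothetical names, not the project's MetricsPublisher API. The point is only that an interval of 0 means no tick task is ever created, while `stop()` still writes the final snapshot.

```python
from __future__ import annotations

import asyncio


class LivePublisherSketch:
    """Hypothetical stand-in for MetricsPublisher; not the project's real class."""

    def __init__(self, publish_interval_s: float) -> None:
        self.publish_interval_s = publish_interval_s
        self.snapshots: list[str] = []  # records what was "published"
        self._tick_task: asyncio.Task | None = None

    def _build_snapshot(self, kind: str) -> None:
        # Stand-in for the real snapshot build (e.g. registry.build_snapshot()).
        self.snapshots.append(kind)

    async def _tick_loop(self) -> None:
        # Periodic live snapshots: this is the work that competes for CPU.
        while True:
            await asyncio.sleep(self.publish_interval_s)
            self._build_snapshot("live")

    def start(self) -> None:
        if self.publish_interval_s <= 0:
            return  # live publishing disabled: no tick task is created
        self._tick_task = asyncio.create_task(self._tick_loop())

    async def stop(self) -> None:
        if self._tick_task is not None:
            self._tick_task.cancel()
            try:
                await self._tick_task
            except asyncio.CancelledError:
                pass
        # The final snapshot is written regardless of the live setting.
        self._build_snapshot("final")


async def run(publish_interval_s: float, duration_s: float) -> list[str]:
    pub = LivePublisherSketch(publish_interval_s)
    pub.start()
    await asyncio.sleep(duration_s)  # stand-in for the benchmark run
    await pub.stop()
    return pub.snapshots
```

With `asyncio.run(run(0.0, 0.05))` only the final snapshot is recorded; with a positive interval, several live snapshots precede it.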

Why

On CPU-only server deployments, periodic live snapshot building can compete with inference workloads for CPU cycles.
This change allows disabling live ticks to reduce contention while keeping end-of-run reporting intact.

What Changed

Added enable_live_metrics to runtime schema and runtime settings (default true)
Updated benchmark execution path to pass publish interval based on enable_live_metrics
Updated metrics publisher to treat publish_interval <= 0 as live publishing disabled
Regenerated full config templates with the new field
Added unit coverage for disabled live publishing behavior
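A hypothetical sketch of why the `default true` keeps older configs working. It assumes the settings object is a frozen dataclass populated from the `runtime` section of the YAML; the real RuntimeSettings fields and loader are not shown in this PR page, and `RuntimeSettingsSketch`/`from_runtime_section` are illustrative names. A config that predates the field simply falls back to the default.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RuntimeSettingsSketch:
    """Hypothetical, trimmed-down stand-in for RuntimeSettings."""

    # Defaults to True so configs written before this field existed keep
    # the previous behaviour (live metrics on).
    enable_live_metrics: bool = True

    @classmethod
    def from_runtime_section(cls, runtime: Mapping[str, Any]) -> "RuntimeSettingsSketch":
        # Only the new key is read here; a YAML loader already yields real
        # booleans for `true`/`false`, so the value is passed through as-is.
        return cls(enable_live_metrics=runtime.get("enable_live_metrics", True))
```

Because the dataclass is frozen, the toggle cannot be flipped mid-run once settings are built.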

Validation

Pre-commit/type checks pass
Unit test added for disabled live publish path
Functional behavior verified in both modes:
enable_live_metrics=true (existing behavior)
enable_live_metrics=false (no periodic live ticks, final report still generated)

Snapshots

Performance snapshots:

Enabled (existing behaviour)

2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- Summary -----------------
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Version: 0.1.0
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Git SHA: 5825f83
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples issued: 2048
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples completed: 2048
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples failed: 0
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Duration: 371.66 seconds
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - QPS: 5.51
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - TPS: 705.33
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- End of Summary -----------------

Disabled (enable_live_metrics: false)

2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- Summary -----------------
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Version: 0.1.0
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Git SHA: 5825f83
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples issued: 2048
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples completed: 2048
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples failed: 0
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Duration: 265.30 seconds
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - QPS: 7.72
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - TPS: 988.12
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- End of Summary -----------------

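In these single-run snapshots, disabling live metrics shortened the 2048-sample run from 371.66 s to 265.30 s and raised throughput by about 40% (QPS 5.51 to 7.72, TPS 705.33 to 988.12), consistent with reduced CPU contention on CPU servers.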
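The relative change can be checked directly from the two summaries. This is plain arithmetic on the logged numbers; with one run per mode it says nothing about run-to-run variance.

```python
# Figures copied from the two summaries above (2048 samples each).
SAMPLES = 2048
enabled = {"duration_s": 371.66, "qps": 5.51, "tps": 705.33}
disabled = {"duration_s": 265.30, "qps": 7.72, "tps": 988.12}

qps_gain = disabled["qps"] / enabled["qps"] - 1
tps_gain = disabled["tps"] / enabled["tps"] - 1
duration_cut = 1 - disabled["duration_s"] / enabled["duration_s"]

print(f"QPS: +{qps_gain:.1%}, TPS: +{tps_gain:.1%}, duration: -{duration_cut:.1%}")
# prints: QPS: +40.1%, TPS: +40.1%, duration: -28.6%

for name, stats in (("enabled", enabled), ("disabled", disabled)):
    # Sanity check: the logged QPS should equal samples / duration.
    print(name, round(SAMPLES / stats["duration_s"], 2))
```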

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

Add a new configuration parameter 'enable_live_metrics' (default=True) that controls periodic live metrics publishing. This addresses resource contention on CPU servers, where the metrics aggregator subprocess competes with the inference workload for CPU cycles.

When disabled (--enable-live-metrics=false or runtime.enable_live_metrics=false in YAML), the aggregator skips the live tick task, eliminating the periodic registry.build_snapshot() calls that cause the contention. Final snapshots (used by Report) are unaffected and continue to provide exact metrics.

Changes:
- RuntimeConfig: Add 'enable_live_metrics: bool = True' parameter (CLI/YAML)
- RuntimeSettings: Add field with default=True for backward compatibility
- MetricsPublisher: Treat 'publish_interval_s <= 0' as disabled state (log and skip tick task)
- execute.py: Create the metrics subscriber only when enabled, pass the publish interval to the aggregator (0.25 if enabled, 0 if disabled), and add None checks wherever the subscriber is used
- Config templates: Regenerated with enable_live_metrics field
- test_publisher.py: Add unit test for disabled publish_interval_s path

Backward compatible: default=True maintains existing behavior. Fixes CPU contention on CPU-only server deployments where the metrics aggregator competes with the inference workload for shared L3/LLC resources.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
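The "create the metrics subscriber only when enabled ... add None checks" item follows a common Optional pattern. A minimal sketch, with `SubscriberSketch`, `make_subscriber`, `deliver`, and `shutdown` as hypothetical names rather than the actual code in execute.py:

```python
from __future__ import annotations

from typing import Optional


class SubscriberSketch:
    """Hypothetical stand-in for the live-metrics subscriber in execute.py."""

    def __init__(self) -> None:
        self.closed = False
        self.updates: list[dict] = []

    def on_snapshot(self, snapshot: dict) -> None:
        self.updates.append(snapshot)

    def close(self) -> None:
        self.closed = True


def make_subscriber(enable_live_metrics: bool) -> Optional[SubscriberSketch]:
    # Only create the subscriber when live metrics are enabled.
    return SubscriberSketch() if enable_live_metrics else None


def deliver(subscriber: Optional[SubscriberSketch], snapshot: dict) -> None:
    # Every use site guards against the disabled (None) case.
    if subscriber is not None:
        subscriber.on_snapshot(snapshot)


def shutdown(subscriber: Optional[SubscriberSketch]) -> None:
    if subscriber is not None:
        subscriber.close()
```

Keeping the None checks at each call site means the disabled path allocates nothing and needs no dummy subscriber object.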


github-actions · 2026-06-22T01:56:39Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request introduces the ability to disable live metrics publishing via a new configuration option enable_live_metrics. When disabled, the live metrics tick task is skipped, while final snapshots are still written. The review feedback suggests two improvements: first, reordering the checks in MetricsPublisher.start to ensure that duplicate calls still trigger a warning even when live publishing is disabled; second, adding a negative CLI alias (--no-live-metrics) to the configuration schema for a more intuitive user experience.
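For the suggested negative alias, one way to accept both spellings with the standard library's argparse is sketched below. This is only an illustration: the project's actual CLI layer in schema.py is not shown here and may use a different library, and `build_parser`/`parse_bool` are hypothetical helpers. Both options write to the same destination, so `--no-live-metrics` and `--enable-live-metrics=false` end up equivalent.

```python
import argparse


def parse_bool(value: str) -> bool:
    # Accept the usual spellings so "--enable-live-metrics=false" works.
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="benchmark-sketch")
    parser.set_defaults(enable_live_metrics=True)  # backward-compatible default
    parser.add_argument(
        "--enable-live-metrics",
        dest="enable_live_metrics",
        nargs="?",
        const=True,  # bare flag means "enable"
        type=parse_bool,
    )
    parser.add_argument(
        "--no-live-metrics",
        dest="enable_live_metrics",
        action="store_false",  # negative alias: same destination, sets False
    )
    return parser
```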


Copilot

Pull request overview

Adds a runtime-configurable switch to disable periodic live metrics snapshot publishing (while preserving final snapshot/report generation) to reduce CPU contention on CPU-only deployments.

Changes:

Introduces settings.runtime.enable_live_metrics (default true) and plumbs it through RuntimeSettings.
Passes --publish-interval 0 to the metrics aggregator when live publishing is disabled, and updates MetricsPublisher.start() to treat publish_interval_s <= 0 as “no live tick task”.
Updates full config templates and adds a unit test covering the disabled-live-publishing path.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


File	Description
tests/unit/async_utils/services/metrics_aggregator/test_publisher.py	Adds unit coverage ensuring `publish_interval_s <= 0` skips the tick task but still produces a final snapshot.
src/inference_endpoint/config/templates/online_template_full.yaml	Adds `settings.runtime.enable_live_metrics` to the full online template.
src/inference_endpoint/config/templates/offline_template_full.yaml	Adds `settings.runtime.enable_live_metrics` to the full offline template.
src/inference_endpoint/config/templates/concurrency_template_full.yaml	Adds `settings.runtime.enable_live_metrics` to the full concurrency template.
src/inference_endpoint/config/schema.py	Adds the new runtime config field and CLI alias `--enable-live-metrics`.
src/inference_endpoint/config/runtime_settings.py	Carries `enable_live_metrics` into immutable `RuntimeSettings`.
src/inference_endpoint/commands/benchmark/execute.py	Plumbs the toggle into metrics-aggregator launch args via `--publish-interval`.
src/inference_endpoint/async_utils/services/metrics_aggregator/publisher.py	Treats `publish_interval_s <= 0` as “live publishing disabled” (no tick task).


+        if publish_interval_s <= 0:
+            logger.info(
+                "Live metrics publishing disabled "
+                "(publish_interval_s=%s, skipping tick task)",
+                publish_interval_s,
+            )
+            return
        if self._tick_task is not None:
            logger.warning(
                "MetricsPublisher.start called again while tick task is "
                "still running (id=%r); ignoring the second start.",
                id(self._tick_task),
            )
            return
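Based only on the hunk above, moving the `_tick_task` check ahead of the interval check would not by itself catch a second `start()` in disabled mode, because no tick task is ever created there. A sketch of one way to make the duplicate-start warning fire in both modes, using a hypothetical `_started` flag (not necessarily what the follow-up commit does); `StartGuardSketch` is likewise an illustrative name:

```python
import logging

logger = logging.getLogger("publisher_start_sketch")


class StartGuardSketch:
    """Hypothetical stand-in showing only the check order in start()."""

    def __init__(self, publish_interval_s: float) -> None:
        self.publish_interval_s = publish_interval_s
        self._started = False  # hypothetical flag, set in both modes
        self._tick_task = None  # would hold an asyncio.Task in a real publisher

    def start(self) -> bool:
        # 1) Duplicate-start check first, so it fires even when disabled.
        if self._started:
            logger.warning("start() called again; ignoring the second start.")
            return False
        self._started = True
        # 2) Only then decide whether a live tick task is needed.
        if self.publish_interval_s <= 0:
            logger.info(
                "Live metrics publishing disabled (publish_interval_s=%s)",
                self.publish_interval_s,
            )
            return True
        self._tick_task = "tick-task-placeholder"  # stand-in for asyncio.create_task(...)
        return True
```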



Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

+        # Control live metrics publishing via config parameter
+        # publish_interval_s = 0 disables live tick task in publisher.start()
+        publish_interval_s = 0.25 if ctx.rt_settings.enable_live_metrics else 0.0
+        aggregator_args.extend(["--publish-interval", str(publish_interval_s)])
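A small, testable sketch of the same mapping and of the aggregator-side check it relies on. `aggregator_publish_args` and `live_publishing_enabled` are hypothetical helper names; only the `--publish-interval` flag, the 0.25 s value, and the `<= 0` rule come from the PR.

```python
from __future__ import annotations

LIVE_PUBLISH_INTERVAL_S = 0.25  # value hard-coded in the hunk above


def aggregator_publish_args(enable_live_metrics: bool) -> list[str]:
    # 0.0 tells the publisher to skip its live tick task entirely.
    publish_interval_s = LIVE_PUBLISH_INTERVAL_S if enable_live_metrics else 0.0
    return ["--publish-interval", str(publish_interval_s)]


def live_publishing_enabled(argv: list[str]) -> bool:
    # Aggregator side: parse the interval back and apply "<= 0 means off".
    value = float(argv[argv.index("--publish-interval") + 1])
    return value > 0
```

Because the interval crosses a process boundary as a string, any spelling that parses to zero ("0", "0.0") disables live publishing.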


attafosu · 2026-06-22T06:00:34Z

@viraatc @arekay-nv @nvzhihanj Can you please take a look? It fixes a performance regression on CPU servers introduced by #306 (periodic and live metric snapshots).

attafosu added 2 commits June 21, 2026 17:39

Apply precommit recs

5825f83

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

attafosu requested review from a team and Copilot June 22, 2026 01:56

github-actions Bot requested review from arekay-nv and nvzhihanj June 22, 2026 01:56

Copilot started reviewing on behalf of attafosu June 22, 2026 01:56

gemini-code-assist Bot reviewed Jun 22, 2026


Comment thread src/inference_endpoint/async_utils/services/metrics_aggregator/publisher.py Outdated

Comment thread src/inference_endpoint/config/schema.py

Copilot AI reviewed Jun 22, 2026


attafosu and others added 2 commits June 21, 2026 19:00

Merge branch 'main' into feat/attafosu/optional-live-metrics

f6a95b7

Update src/inference_endpoint/async_utils/services/metrics_aggregator…

35d4ffe

…/publisher.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 22, 2026 02:15

Copilot started reviewing on behalf of attafosu June 22, 2026 02:15

Copilot AI reviewed Jun 22, 2026


Comment thread src/inference_endpoint/config/schema.py

Update src/inference_endpoint/config/schema.py

6a48d4f

Add negative alias to `enable-live-metrics` Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 22, 2026 02:19

Copilot started reviewing on behalf of attafosu June 22, 2026 02:20

Copilot AI reviewed Jun 22, 2026


Merge branch 'main' into feat/attafosu/optional-live-metrics

58753f5


attafosu wants to merge 6 commits into main from feat/attafosu/optional-live-metrics
