feat: add configurable live metrics publishing for CPU servers #363

Open
attafosu wants to merge 6 commits into main from feat/attafosu/optional-live-metrics

@attafosu

Collaborator

What does this PR do?

Summary

This PR adds a runtime toggle to control live metrics publishing in the metrics aggregator:

  • New config field: settings.runtime.enable_live_metrics (default: true)
  • New CLI flag: --enable-live-metrics
  • When disabled, the aggregator uses publish_interval=0, which skips periodic live snapshot ticks while preserving final snapshot/report generation.

Why

On CPU-only server deployments, periodic live snapshot building can compete with inference workloads for CPU cycles.
This change allows disabling live ticks to reduce contention while keeping end-of-run reporting intact.

What Changed

  • Added enable_live_metrics to runtime schema and runtime settings (default true)
  • Updated benchmark execution path to pass publish interval based on enable_live_metrics
  • Updated metrics publisher to treat publish_interval <= 0 as live publishing disabled
  • Regenerated full config templates with the new field
  • Added unit coverage for disabled live publishing behavior
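A minimal sketch of the behavior listed above, assuming an asyncio-based publisher. `SketchPublisher`, `run`, and the 0.01 s interval are hypothetical names and values chosen for illustration; they are not the project's implementation. The sketch only shows the intended contract: a non-positive interval starts no tick task, yet the final snapshot is still produced.

```python
import asyncio
from typing import Optional


class SketchPublisher:
    """Hypothetical stand-in for the metrics publisher, not the project's class."""

    def __init__(self) -> None:
        self.live_ticks = 0
        self._tick_task: Optional[asyncio.Task] = None

    def start(self, publish_interval_s: float) -> None:
        # A non-positive interval means live publishing is disabled.
        if publish_interval_s <= 0:
            return
        self._tick_task = asyncio.create_task(self._tick(publish_interval_s))

    async def _tick(self, interval_s: float) -> None:
        while True:
            await asyncio.sleep(interval_s)
            self.live_ticks += 1  # stands in for building a live snapshot

    async def close(self) -> dict:
        if self._tick_task is not None:
            self._tick_task.cancel()
            try:
                await self._tick_task
            except asyncio.CancelledError:
                pass
            self._tick_task = None
        # The final snapshot is produced regardless of the live setting.
        return {"final": True, "live_ticks": self.live_ticks}


async def run(enable_live_metrics: bool) -> dict:
    publisher = SketchPublisher()
    publisher.start(0.01 if enable_live_metrics else 0.0)
    await asyncio.sleep(0.05)  # stands in for the benchmark run
    return await publisher.close()


disabled = asyncio.run(run(enable_live_metrics=False))
enabled = asyncio.run(run(enable_live_metrics=True))
```

In the disabled run no tick ever fires, so `live_ticks` stays at zero while the final snapshot dictionary is still returned.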

Validation

  • Pre-commit/type checks pass
  • Unit test added for disabled live publish path
  • Functional behavior verified in both modes:
    enable_live_metrics=true (existing behavior)
    enable_live_metrics=false (no periodic live ticks, final report still generated)

Snapshots

Performance snapshots:

  • Enabled (existing behavior)
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- Summary -----------------
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Version: 0.1.0
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Git SHA: 5825f83
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples issued: 2048
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples completed: 2048
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Total samples failed: 0
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - Duration: 371.66 seconds
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - QPS: 5.51
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - TPS: 705.33
2026-06-21 18:35:31,496 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- End of Summary -----------------
  • Disabled (enable_live_metrics: false)
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- Summary -----------------
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Version: 0.1.0
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Git SHA: 5825f83
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples issued: 2048
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples completed: 2048
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Total samples failed: 0
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - Duration: 265.30 seconds
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - QPS: 7.72
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - TPS: 988.12
2026-06-21 18:22:16,876 - inference_endpoint.commands.benchmark.execute - INFO - ----------------- End of Summary -----------------

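In this pair of runs, disabling live metrics shortened the run from 371.66 s to 265.30 s and raised QPS from 5.51 to 7.72 (TPS from 705.33 to 988.12) for the same 2048 samples, which is consistent with reduced CPU contention on CPU servers.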
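As a quick check on the two summaries, the relative change can be computed directly from the logged figures. This is only arithmetic on one run per mode, so it says nothing about run-to-run variance; `relative_change` is a helper name introduced here for illustration.

```python
# Figures copied from the two summary logs above.
enabled = {"duration_s": 371.66, "qps": 5.51, "tps": 705.33}
disabled = {"duration_s": 265.30, "qps": 7.72, "tps": 988.12}


def relative_change(before: float, after: float) -> float:
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


changes = {
    key: round(relative_change(enabled[key], disabled[key]), 1)
    for key in enabled
}
print(changes)  # {'duration_s': -28.6, 'qps': 40.1, 'tps': 40.1}
```

That is roughly a 29% shorter run and about 40% higher throughput with live metrics disabled.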

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

attafosu added 2 commits June 21, 2026 17:39
Add a new configuration parameter 'enable_live_metrics' (default=True) that
controls periodic live metrics publishing. This addresses resource contention
in which the metrics aggregator subprocess competes for CPU cycles and starves CPU servers.

When disabled (--enable-live-metrics=false or runtime.enable_live_metrics=false
in YAML), the aggregator skips the live tick task, eliminating the periodic
registry.build_snapshot() calls that cause CPU contention. Final snapshots
(used by Report) are unaffected and continue to provide exact metrics.

Changes:
- RuntimeConfig: Add 'enable_live_metrics: bool = True' parameter (CLI/YAML)
- RuntimeSettings: Add field with default=True for backward compatibility
- MetricsPublisher: Treat 'publish_interval_s <= 0' as disabled state (log and skip tick task)
- execute.py: Conditionally create metrics subscriber only when enabled,
  pass publish-interval to aggregator (0.25 if enabled, 0 if disabled),
  add None checks for conditional subscriber usage
- Config templates: Regenerated with enable_live_metrics field
- test_publisher.py: Add unit test for disabled publish_interval_s path

Backward compatible: Default=True maintains existing behavior.
Fixes CPU contention on CPU-only server deployments where the metrics aggregator
competes with the inference workload for shared L3/LLC resources.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@attafosu attafosu requested review from a team and Copilot June 22, 2026 01:56
@github-actions

github-actions Bot commented Jun 22, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces the ability to disable live metrics publishing via a new configuration option enable_live_metrics. When disabled, the live metrics tick task is skipped, while final snapshots are still written. The review feedback suggests two improvements: first, reordering the checks in MetricsPublisher.start to ensure that duplicate calls still trigger a warning even when live publishing is disabled; second, adding a negative CLI alias (--no-live-metrics) to the configuration schema for a more intuitive user experience.
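One way the suggested negative alias could look, sketched with the standard-library `argparse` module. The PR does not show which CLI framework the project uses, so treat this purely as an assumption-laden illustration: only the flag names `--enable-live-metrics` and `--no-live-metrics` come from the discussion, while the program name and help strings are made up here.

```python
import argparse

parser = argparse.ArgumentParser(prog="benchmark-sketch")  # hypothetical program name
parser.add_argument(
    "--enable-live-metrics",
    dest="enable_live_metrics",
    action="store_true",
    default=True,
    help="Publish periodic live metrics snapshots (default).",
)
parser.add_argument(
    "--no-live-metrics",
    dest="enable_live_metrics",  # both flags write to the same setting
    action="store_false",
    help="Skip periodic live snapshots; the final report is still generated.",
)

default_args = parser.parse_args([])
disabled_args = parser.parse_args(["--no-live-metrics"])
```

Sharing one `dest` keeps a single source of truth for the setting, so downstream code only ever reads `enable_live_metrics`.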

Comment thread src/inference_endpoint/async_utils/services/metrics_aggregator/publisher.py Outdated
Comment thread src/inference_endpoint/config/schema.py

Copilot AI left a comment

Pull request overview

Adds a runtime-configurable switch to disable periodic live metrics snapshot publishing (while preserving final snapshot/report generation) to reduce CPU contention on CPU-only deployments.

Changes:

  • Introduces settings.runtime.enable_live_metrics (default true) and plumbs it through RuntimeSettings.
  • Passes --publish-interval 0 to the metrics aggregator when live publishing is disabled, and updates MetricsPublisher.start() to treat publish_interval_s <= 0 as “no live tick task”.
  • Updates full config templates and adds a unit test covering the disabled-live-publishing path.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/unit/async_utils/services/metrics_aggregator/test_publisher.py | Adds unit coverage ensuring publish_interval_s <= 0 skips the tick task but still produces a final snapshot. |
| src/inference_endpoint/config/templates/online_template_full.yaml | Adds settings.runtime.enable_live_metrics to the full online template. |
| src/inference_endpoint/config/templates/offline_template_full.yaml | Adds settings.runtime.enable_live_metrics to the full offline template. |
| src/inference_endpoint/config/templates/concurrency_template_full.yaml | Adds settings.runtime.enable_live_metrics to the full concurrency template. |
| src/inference_endpoint/config/schema.py | Adds the new runtime config field and CLI alias --enable-live-metrics. |
| src/inference_endpoint/config/runtime_settings.py | Carries enable_live_metrics into immutable RuntimeSettings. |
| src/inference_endpoint/commands/benchmark/execute.py | Plumbs the toggle into metrics-aggregator launch args via --publish-interval. |
| src/inference_endpoint/async_utils/services/metrics_aggregator/publisher.py | Treats publish_interval_s <= 0 as “live publishing disabled” (no tick task). |

Comment thread src/inference_endpoint/config/schema.py
Comment on lines 124 to 137
if publish_interval_s <= 0:
    logger.info(
        "Live metrics publishing disabled "
        "(publish_interval_s=%s, skipping tick task)",
        publish_interval_s,
    )
    return
if self._tick_task is not None:
    logger.warning(
        "MetricsPublisher.start called again while tick task is "
        "still running (id=%r); ignoring the second start.",
        id(self._tick_task),
    )
    return
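The review feedback asked for these two guards to be swapped so that a duplicate start is still reported when the interval is non-positive. A rough sketch of that ordering follows; `GuardOrderSketch` and its `events` list are hypothetical, and a plain object stands in for the real tick task, so this models only the control flow, not the actual publisher.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("publisher_sketch")


class GuardOrderSketch:
    """Hypothetical model of the start() guards; the tick task is a plain object."""

    def __init__(self) -> None:
        self._tick_task: Optional[object] = None
        self.events: List[str] = []

    def start(self, publish_interval_s: float) -> None:
        # Duplicate-start check first, so it fires for any interval value.
        if self._tick_task is not None:
            logger.warning("start called again; ignoring the second start.")
            self.events.append("duplicate")
            return
        if publish_interval_s <= 0:
            logger.info("Live metrics publishing disabled; skipping tick task.")
            self.events.append("disabled")
            return
        self._tick_task = object()  # stands in for the real asyncio task
        self.events.append("started")


publisher = GuardOrderSketch()
publisher.start(0.25)
publisher.start(0.0)  # reported as a duplicate rather than as "disabled"

fresh = GuardOrderSketch()
fresh.start(0.0)  # nothing running yet, so this takes the disabled path
```

With the original ordering, the second call on `publisher` would have logged the "disabled" message and returned without the warning.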
attafosu and others added 2 commits June 21, 2026 19:00
…/publisher.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 22, 2026 02:15

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread src/inference_endpoint/config/schema.py
Add negative alias to `enable-live-metrics`

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 22, 2026 02:19

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +621 to +624
# Control live metrics publishing via config parameter
# publish_interval_s = 0 disables live tick task in publisher.start()
publish_interval_s = 0.25 if ctx.rt_settings.enable_live_metrics else 0.0
aggregator_args.extend(["--publish-interval", str(publish_interval_s)])
@attafosu

attafosu commented Jun 22, 2026

Collaborator Author

@viraatc @arekay-nv @nvzhihanj Can you please take a look? This fixes a perf regression on CPU servers introduced by #306 (periodic and live metric snapshots).