
SAI API Performance Monitoring #2279

Open
JaiOCP wants to merge 3 commits into opencomputeproject:master from JaiOCP:perfmon

Conversation

@JaiOCP
Contributor

@JaiOCP JaiOCP commented Apr 21, 2026

This PR brings in support for measuring SAI API performance. It is based on the presentation given at OCP 2023.

Signed-off-by: JaiOCP <jai.kumar@broadcom.com>
Contributor

@rck-innovium rck-innovium left a comment


While most of the measurements can be done at the application level, this proposal provides a way to measure the metrics per object operation inside bulk APIs, which cannot be done by application-level performance monitoring.

Comment thread doc/perfmon/SAI-perfmon-Spec.md
Comment thread doc/perfmon/SAI-perfmon-Spec.md Outdated
@rck-innovium
Contributor

As discussed, the community concluded that we should not preserve this perfmon data across warmboot (especially since we thought it does not make sense for warm upgrades/downgrades).

Comment thread inc/saiswitch.h
Comment thread doc/perfmon/SAI-perfmon-Spec.md Outdated
@tjchadaga tjchadaga added the reviewed PR is discussed in SAI Meeting label Apr 28, 2026
@deepak-singhal0408

@JaiOCP, could you address the comments? thanks,

@JaiOCP
Contributor Author

JaiOCP commented May 8, 2026

Review comments addressed. Please take a look, @j-bos @rck-innovium

@JaiOCP
Contributor Author

JaiOCP commented May 8, 2026

@JaiOCP, could you address the comments? thanks,

Comments addressed. Please review.

@deepak-singhal0408

Question: Per-object latency with aggregated reads across variable batch sizes

The spec describes PERFDATA as clear-on-read, with AVG_LATENCY computed across multiple API invocations between reads. In a typical route convergence scenario, orchagent may issue several bulk_create calls with varying batch sizes (e.g., 50, 3000, 200) before reading PERFDATA.

The returned average latency is per-call — but each call processes a different number of objects. Without knowing the total object count across those calls, the consumer cannot derive per-object latency:

Call 1: bulk_create(50 routes)   → 50µs
Call 2: bulk_create(3000 routes) → 900µs  
Call 3: bulk_create(200 routes)  → 100µs
Read AVG_LATENCY → (50+900+100)/3 = 350µs per-call avg

But per-route avg = (50+900+100)/(50+3000+200) = 0.32µs
    ← cannot be derived without total object count

The previous revision (#2265) addressed this with num_objects in sai_perfdata_t. Would it make sense to add an object count attribute (e.g., SAI_PERFMON_ATTR_OBJECT_COUNT, also READ_ONLY + clear-on-read) so consumers can compute per-object metrics from aggregated reads?

Comment thread inc/saiperfmon.h
Comment on lines +178 to +180
/**
* @brief SAI Performance Monitoring API set
*/
Collaborator


this comment is not needed

@kcudnik
Collaborator

kcudnik commented May 14, 2026

Question: Per-object latency with aggregated reads across variable batch sizes


I would assume that bulk route create is one of the heaviest APIs to call, and I would guess that other bulk APIs may be faster. Reading the performance data should also be fast; internally it should just be reading/copying a table.

@JaiOCP
Contributor Author

JaiOCP commented May 14, 2026

Question: Per-object latency with aggregated reads across variable batch sizes


Hi Deepak,

As we discussed, computing it this way would be a wrong implementation in the SAI adapter.
The SAI adapter MUST do as follows:
Call 1: bulk_create(50 routes) → 50µs
Call 2: bulk_create(3000 routes) → 900µs
Call 3: bulk_create(200 routes) → 100µs
Read AVG_LATENCY → (50+900+100)/(50+3000+200) = 0.32µs
Essentially, the SAI adapter needs to maintain the object count for computing the average until the next clear-on-read.

@JaiOCP
Contributor Author

JaiOCP commented May 15, 2026

@rck-innovium @j-bos Please approve the PR

Labels

reviewed PR is discussed in SAI Meeting
