SAI API Performance Monitoring #2279
Open
JaiOCP wants to merge 3 commits into opencomputeproject:master from JaiOCP:perfmon
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,238 @@ | ||
| # Performance Monitoring SAI Specification | ||
| ------------------------------------------------------------------------------- | ||
| Title | SAI support for Performance Monitoring | ||
| :-------------|:----------------------------------------------------------------- | ||
| Authors | Jai Kumar, Broadcom Inc | ||
| Status | In review | ||
| Type | Standards track | ||
| Created | 03/18/2026: Initial Draft | ||
| SAI-Version | 1.19 | ||
| ------------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| ## 1.0 Introduction | ||
| As network fabric scale increases and data centers require regional spine connectivity, the number of downlinks for cluster connectivity is growing. This leads to more LAGs, more prefixes, and larger ECMP groups. The same holds for large scale-up and scale-across fabrics for AI/ML. | ||
|
|
||
| This increasing scale mandates that SAI be scalable, reliable, and high-performance. This specification addresses the performance component of SAI by introducing a new set of metrics to accurately measure the performance of various components within the SAI layer and below, such as SDK and hardware updates. | ||
|
|
||
| Using these metrics, deployments can isolate components impacting performance and focus on their optimization. | ||
|
|
||
|
|
||
|
|
||
| ## 2.0 Terms and Acronyms | ||
|
|
||
| | Term| Description | | ||
| |:---|:---| | ||
| | perfmon | Performance Monitoring | |
|
|
||
| ## 3.0 Overview | ||
| The SAI infrastructure exposes a set of APIs as a standard interface to the upper layer. | ||
|
|
||
| These APIs are synchronous and blocking, making the completion time of any given API a critical performance measure. Note that application-specific callbacks are not addressed by this specification. | ||
|
|
||
| ``` | ||
| /** | ||
| * @brief SAI common API type | ||
| */ | ||
| typedef enum _sai_common_api_t | ||
| { | ||
| SAI_COMMON_API_CREATE = 0, | ||
| SAI_COMMON_API_REMOVE = 1, | ||
| SAI_COMMON_API_SET = 2, | ||
| SAI_COMMON_API_GET = 3, | ||
| SAI_COMMON_API_BULK_CREATE = 4, | ||
| SAI_COMMON_API_BULK_REMOVE = 5, | ||
| SAI_COMMON_API_BULK_SET = 6, | ||
| SAI_COMMON_API_BULK_GET = 7, | ||
| SAI_COMMON_API_MAX = 8, | ||
| } sai_common_api_t; | ||
|
|
||
| ``` | ||
|
|
||
| This specification proposes API performance measures for the following metrics: | ||
| 1. Average Latency | ||
| 2. Instantaneous Latency | ||
| 3. Maximum Latency | ||
|
|
||
| ### 3.1 Average, Instantaneous, and Maximum Latency | ||
| API completion time consists of the time spent in the SAI adapter and the SDK, including hardware update or query time. Time is measured irrespective of the status of the API call: if the call completes with an error status, the adapter still includes the measured latency in the metrics computed for that interval. The NOS tracks the return status of API calls and can account for errors as needed. Discounting latency for specific error statuses would result in inconsistent measurements, requiring metric subscribers to implement manual workarounds for those cases. | ||
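
The measurement rule above could be sketched as a wrapper that reads a clock before and after a blocking call and reports the elapsed time whatever the call returns. This is only an illustration, not the adapter's implementation: `demo_status_t`, `demo_now_ns`, `demo_timed_call`, and `demo_failing_api` are hypothetical names, and the use of POSIX `clock_gettime` with nanosecond units is an assumption, since the specification does not define a clock source or a unit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes clock_gettime under strict -std=c11 */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins; a real adapter would use the SAI status type. */
typedef int32_t demo_status_t;
#define DEMO_STATUS_SUCCESS 0
#define DEMO_STATUS_FAILURE (-1)

/* Any blocking API call, reduced to a single-argument callback. */
typedef demo_status_t (*demo_api_fn)(void *ctx);

/* Monotonic clock reading in nanoseconds (unit is an assumption). */
static uint64_t demo_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Run a blocking call and report its completion time.
 * The latency is written regardless of the returned status,
 * so failed calls are accounted for as well.
 */
static demo_status_t demo_timed_call(demo_api_fn fn, void *ctx,
                                     uint64_t *latency_ns)
{
    assert(fn != NULL && latency_ns != NULL);
    uint64_t start = demo_now_ns();
    demo_status_t status = fn(ctx);
    *latency_ns = demo_now_ns() - start;
    return status;
}

/* Example callee that always fails and counts its invocations. */
static demo_status_t demo_failing_api(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return DEMO_STATUS_FAILURE;
}
```

Calling `demo_timed_call(demo_failing_api, &calls, &latency)` returns the failure status unchanged and still fills in `latency`, which mirrors the rule that error statuses do not exclude a call from the measurement.
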
|
|
||
| These metrics can be used to: | ||
| - Improve SAI adapter and SDK implementations | ||
| - Provide a baseline for comparing different hardware | ||
|
|
||
| The metrics are defined as follows: | ||
| - Instantaneous: The last observed latency for the API call. | ||
| - Maximum: The highest value observed across the last n invocations. | ||
| - Average: The average value over the last n invocations. | ||
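
The three definitions above can be illustrated with a fixed-size ring buffer that holds the last n latencies. This is a minimal sketch under stated assumptions: the window size `DEMO_WINDOW`, the type `demo_latency_window_t`, and the `demo_window_*` helpers are hypothetical names, the value of n is not defined by the specification, and an empty window is assumed to report 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WINDOW 4 /* "n": hypothetical window size, not set by the spec */

typedef struct demo_latency_window {
    uint64_t samples[DEMO_WINDOW]; /* ring buffer of the last n latencies */
    size_t next;                   /* slot the next sample overwrites */
    size_t count;                  /* valid samples, at most DEMO_WINDOW */
} demo_latency_window_t;

/* Record one latency, evicting the oldest sample once the window is full. */
static void demo_window_record(demo_latency_window_t *w, uint64_t latency)
{
    assert(w != NULL);
    w->samples[w->next] = latency;
    w->next = (w->next + 1) % DEMO_WINDOW;
    if (w->count < DEMO_WINDOW) {
        w->count++;
    }
}

/* Instantaneous: the most recently recorded latency (0 if empty). */
static uint64_t demo_window_inst(const demo_latency_window_t *w)
{
    if (w->count == 0) {
        return 0;
    }
    return w->samples[(w->next + DEMO_WINDOW - 1) % DEMO_WINDOW];
}

/* Maximum: the highest latency among the retained samples. */
static uint64_t demo_window_max(const demo_latency_window_t *w)
{
    uint64_t max = 0;
    for (size_t i = 0; i < w->count; i++) {
        if (w->samples[i] > max) {
            max = w->samples[i];
        }
    }
    return max;
}

/* Average: integer mean of the retained samples (0 if empty). */
static uint64_t demo_window_avg(const demo_latency_window_t *w)
{
    if (w->count == 0) {
        return 0;
    }
    uint64_t sum = 0;
    for (size_t i = 0; i < w->count; i++) {
        sum += w->samples[i];
    }
    return sum / w->count;
}
```

With a window of 4, recording 10, 20, 30, 40, 50 evicts the first sample, so the maximum and average are taken over 20, 30, 40, 50 only.
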
|
|
||
|
|
||
| ## 4.0 SAI Specification | ||
| A new perfmon object is introduced. Each perfmon object specifies the object of interest, the set of APIs, and the metrics to be measured for each API. | ||
|
|
||
|
|
||
| Each perfmon object created has a binding to the switch object. | ||
|
|
||
| ### 4.2 Perfmon Object | ||
| A new perfmon object is introduced that specifies the API and metrics of interest. | ||
|
|
||
| #### 4.3.1 Metrics | ||
| Each API can be measured for a specific performance metric, as specified in sai_perfmon_metrics_t. | ||
|
|
||
| ``` | ||
| /** | ||
| * @brief Performance Monitoring Metrics | ||
| */ | ||
| typedef enum _sai_perfmon_metrics_t | ||
| { | ||
| /** | ||
| * @brief None | ||
| */ | ||
| SAI_PERFMON_METRICS_NONE, | ||
|
|
||
| /** | ||
| * @brief Maximum latency observed | ||
| */ | ||
| SAI_PERFMON_METRICS_MAX_LATENCY, | ||
|
|
||
| /** | ||
| * @brief Average latency observed | ||
| */ | ||
| SAI_PERFMON_METRICS_AVERAGE_LATENCY, | ||
|
|
||
| /** | ||
| * @brief Instantaneous latency observed | ||
| */ | ||
| SAI_PERFMON_METRICS_INST_LATENCY, | ||
|
|
||
| } sai_perfmon_metrics_t; | ||
|
|
||
| ``` | ||
|
|
||
| #### 4.3.2 Perfmon Object Attributes | ||
| The type of API to be monitored and its associated attributes are specified in the perfmon object attributes. | ||
|
|
||
| ``` | ||
| /** | ||
| * @brief Performance Monitoring Attributes | ||
| */ | ||
| typedef enum _sai_perfmon_attr_t | ||
| { | ||
| /** | ||
| * @brief Start of Attributes | ||
| */ | ||
| SAI_PERFMON_ATTR_START, | ||
|
|
||
| /** | ||
| * @brief Object to be monitored | ||
| * | ||
| * @type sai_object_type_t | ||
| * @flags MANDATORY_ON_CREATE | CREATE_ONLY | ||
| */ | ||
| SAI_PERFMON_ATTR_OBJECT_TYPE = SAI_PERFMON_ATTR_START, | ||
|
|
||
| /** | ||
| * @brief API to be monitored | ||
| * | ||
| * @type sai_common_api_t | ||
| * @flags CREATE_AND_SET | ||
| */ | ||
| SAI_PERFMON_ATTR_COMMON_API, | ||
|
|
||
| /** | ||
| * @brief Performance metrics to be collected | ||
| * | ||
| * @type sai_perfmon_metrics_t | ||
| * @flags CREATE_AND_SET | ||
| * @default SAI_PERFMON_METRICS_NONE | ||
| */ | ||
| SAI_PERFMON_ATTR_PERFMON_METRICS, | ||
|
|
||
| /** | ||
| * @brief Performance data as collected. This is clear on read. | ||
| * Performance data is computed once enabled and is cleared once read. | ||
| * | ||
| * @type sai_uint64_t | ||
| * @flags READ_ONLY | ||
| */ | ||
| SAI_PERFMON_ATTR_PERFDATA, | ||
|
|
||
| /** | ||
| * @brief End of Performance Monitoring attributes | ||
| */ | ||
| SAI_PERFMON_ATTR_END, | ||
|
|
||
| /** Custom range base value */ | ||
| SAI_PERFMON_ATTR_CUSTOM_RANGE_START = 0x10000000, | ||
|
|
||
| /** End of custom range base */ | ||
| SAI_PERFMON_ATTR_CUSTOM_RANGE_END | ||
|
|
||
| } sai_perfmon_attr_t; | ||
|
|
||
| ``` | ||
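
The clear-on-read behavior described for `SAI_PERFMON_ATTR_PERFDATA` can be pictured as a read that returns the stored value and resets it in the same step. The sketch below is an assumption-laden illustration: `demo_perfdata_t` and its helpers are hypothetical, and treating "cleared" as "reset to 0" is a guess, since the specification does not say what a cleared value reads as.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the value behind SAI_PERFMON_ATTR_PERFDATA. */
typedef struct demo_perfdata {
    uint64_t value;
} demo_perfdata_t;

/* Adapter side: store the most recently computed metric value. */
static void demo_perfdata_update(demo_perfdata_t *d, uint64_t value)
{
    assert(d != NULL);
    d->value = value;
}

/*
 * Reader side: return the stored value and clear it.
 * Assumption: a cleared value reads back as 0 until the next update.
 */
static uint64_t demo_perfdata_read_and_clear(demo_perfdata_t *d)
{
    assert(d != NULL);
    uint64_t value = d->value;
    d->value = 0;
    return value;
}
```

One consequence of this pattern is that a subscriber that wants a history has to store each reading itself, because a second read without an intervening update returns the cleared value.
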
|
|
||
| #### 4.3.3 Perfmon Object Switch Binding | ||
| A list of perfmon objects can be bound to the switch object. The binding is done with a SET operation on the switch object after the perfmon object is created. | ||
|
|
||
| ``` | ||
| /** | ||
| * @brief Performance Monitoring enabled on the switch | ||
| * | ||
| * @type sai_object_list_t | ||
| * @flags CREATE_AND_SET | ||
| * @objects SAI_OBJECT_TYPE_PERFMON | ||
| * @default empty | ||
| */ | ||
| SAI_SWITCH_ATTR_PERFMON_LIST, | ||
| ``` | ||
|
|
||
|
|
||
| ## 5.0 Sample Workflow | ||
|
|
||
| This section shows how to enable performance monitoring for a given API and metric. | ||
|
|
||
| ### 5.1 Create perfmon object | ||
| - Each perfmon object supports a single API and a single set of metrics. To monitor additional metrics for the same API or to monitor a different API, a new perfmon object must be created. | ||
| - Monitoring in the SAI adapter will only begin once the perfmon object is bound to the switch object. | ||
|
|
||
| ``` | ||
| /* | ||
| * Create a perfmon object that measures the average latency of the bulk set API for route entries | ||
| */ | ||
|
|
||
| // Specify the object of interest | ||
| sai_attr_list[0].id = SAI_PERFMON_ATTR_OBJECT_TYPE; | ||
| sai_attr_list[0].value.s32 = SAI_OBJECT_TYPE_ROUTE_ENTRY; | ||
|
|
||
| // Specify the API of interest | ||
| sai_attr_list[1].id = SAI_PERFMON_ATTR_COMMON_API; | ||
| sai_attr_list[1].value.s32 = SAI_COMMON_API_BULK_SET; | ||
|
|
||
| // Configure metrics to be measured | ||
| sai_attr_list[2].id = SAI_PERFMON_ATTR_PERFMON_METRICS; | ||
| sai_attr_list[2].value.s32 = SAI_PERFMON_METRICS_AVERAGE_LATENCY; | ||
|
|
||
| // Create perfmon object | ||
| attr_count = 3; | ||
| create_perfmon( | ||
| &sai_perfmon_object, | ||
| switch_id, | ||
| attr_count, | ||
| sai_attr_list); | ||
| ``` | ||
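
Since monitoring only begins once the perfmon object is bound to the switch, the workflow needs one more step that the example above does not show: setting `SAI_SWITCH_ATTR_PERFMON_LIST` on the switch. The sketch below shows the shape of that step with local stand-ins so that it compiles on its own. All `demo_*` names, the placeholder attribute ID, and the stub `demo_set_switch_attribute` are hypothetical; real code would use the SAI attribute types and the switch API's set function instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins shaped like an object ID, object list, and attribute. */
typedef uint64_t demo_object_id_t;

typedef struct demo_object_list {
    uint32_t count;
    demo_object_id_t *list;
} demo_object_list_t;

typedef struct demo_attribute {
    int32_t id;
    union {
        int32_t s32;
        demo_object_list_t objlist;
    } value;
} demo_attribute_t;

/* Placeholder ID standing in for SAI_SWITCH_ATTR_PERFMON_LIST. */
#define DEMO_SWITCH_ATTR_PERFMON_LIST 1

/* What the stub adapter currently considers bound to the switch. */
static demo_object_list_t demo_bound_perfmons;

/* Stub for the switch SET call; it only records the perfmon list. */
static int demo_set_switch_attribute(demo_object_id_t switch_id,
                                     const demo_attribute_t *attr)
{
    (void)switch_id;
    if (attr == NULL || attr->id != DEMO_SWITCH_ATTR_PERFMON_LIST) {
        return -1;
    }
    demo_bound_perfmons = attr->value.objlist;
    return 0;
}

/* Bind a list of perfmon objects to the switch with one SET operation. */
static int demo_bind_perfmon(demo_object_id_t switch_id,
                             demo_object_id_t *perfmon_ids, uint32_t count)
{
    demo_attribute_t attr;
    attr.id = DEMO_SWITCH_ATTR_PERFMON_LIST;
    attr.value.objlist.count = count;
    attr.value.objlist.list = perfmon_ids;
    return demo_set_switch_attribute(switch_id, &attr);
}
```

Because the attribute carries a list, binding a second perfmon object presumably means setting the full list again with both object IDs, rather than setting the new ID alone.
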
|
|
||
| ### 5.2 Read perfmon Metrics | ||
|
|
||
| Read the SAI_PERFMON_ATTR_PERFDATA attribute to retrieve the metrics collected for the API. | ||
|
|
||
| ``` | ||
| // Specify the read attribute | ||
| sai_attr_list[0].id = SAI_PERFMON_ATTR_PERFDATA; | ||
|
|
||
| // Read perfmon metrics | ||
| attr_count = 1; | ||
| get_perfmon_attribute( | ||
| sai_perfmon_object, | ||
| attr_count, | ||
| sai_attr_list); | ||
| ... | ||
| ``` | ||
|
|
||