75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,81 @@ Each release lists pull requests grouped by category, with the most recent versi

---

## [v1.3.0](https://github.com/NVIDIA/ncx-infra-controller-rest/releases/tag/v1.3.0)

### Features

- **Add system job scheduler in RLA with trigger and overlap policies** ([#352](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/352))
Replaces ad-hoc inventory sync and leak detection goroutines with a structured scheduling framework. Each job is defined with a configurable trigger (timer, cron, trigger-once, or event-driven), an overlap policy, and a worker, providing graceful and forceful shutdown support.

- **Add support for updating InfiniBand Partition data on Site** ([#334](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/334))
Implements end-to-end InfiniBand Partition update propagation to the Site Controller. The API handler now starts a site workflow after a successful update to REST DB cache, wiring through proto definitions, Temporal workflows, and activities consistent with the existing create/delete patterns.

- **Add net.HardwareAddr wrapper for BMC MAC JSON marshaling** ([#369](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/369))
Introduces a `net.HardwareAddr` wrapper type that provides proper JSON marshaling and unmarshaling for BMC MAC addresses, replacing raw byte-slice serialization with human-readable colon-separated format.

### Bug Fixes

- **Include name in update request of NVLink Partition Update** ([#373](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/373))
Ensures new or existing partition name is included in NVLink Logical Partition update requests to Site, since Site expects the update request to reflect the full data.

- **Require TLS certs by default for RLA/PSM/NVSM and IPAM server** ([#333](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/333))
RLA, PSM, and NSM now refuse to start without TLS certificates unless `ALLOW_INSECURE_GRPC=true` is explicitly set, hardening the default security posture. The IPAM gRPC server now also supports/requires TLS specification.
Comment on lines +26 to +27

⚠️ Potential issue | 🟡 Minor

Resolve service name inconsistency: NVSM vs NSM.

Line 26 says NVSM while Line 27 says NSM. Please use one canonical service name in both title and description to avoid ambiguity in release communication.

As per coding guidelines, "**/*.md: Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 26 - 27, The changelog entry uses inconsistent
service names: "NVSM" in the title and "NSM" in the description; update the
title to use the canonical "NSM" so both the heading and body consistently refer
to NSM (leave RLA, PSM, and IPAM unchanged), ensuring the phrase becomes
"**Require TLS certs by default for RLA/PSM/NSM and IPAM server**" and the
description aligns with that name.


- **Update default firmware update sequence for NSM to only include BMC and BIOS updates** ([#376](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/376))
Narrows the default NSM firmware update sequence to BMC and BIOS components only, excluding unnecessary sub-component updates that could cause longer maintenance windows.

- **Prepare for Machine/InstanceType Association ID deprecation** ([#367](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/367))
Adds Machine ID as a replacement for Instance Type/Machine Association ID for removal of assignment, introduces a dated deprecation window for association IDs and enabling clients to migrate smoothly.

Comment on lines +32 to +34

⚠️ Potential issue | 🟡 Minor

Tighten grammar in the association ID deprecation description.

Line 33 reads awkwardly (for removal of assignment...). Reword to a direct form (e.g., “for assignment removal” / “to support assignment removal”) to improve clarity.

As per coding guidelines, "**/*.md: Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 32 - 34, The CHANGELOG entry for "**Prepare for
Machine/InstanceType Association ID deprecation**" contains an awkward phrase
"for removal of assignment"; update the sentence to use a direct phrasing such
as "to support assignment removal" or "for assignment removal" so it reads
clearly (e.g., replace "for removal of assignment" with "to support assignment
removal") in the paragraph that mentions adding Machine ID as a replacement for
Instance Type/Machine Association ID and introducing the deprecation window.

- **Include NVLink and InfiniBand Interfaces while cleaning up Instance resources** ([#366](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/366))
Fixes instance termination cleanup to also delete associated NVLink and InfiniBand interfaces, preventing orphaned network interface records. Includes a DB migration to remove previously orphaned interfaces.

- **Harden scheduler dispatcher correctness and unit tests of RLA** ([#364](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/364))
Eliminates shared-state race conditions in the scheduler dispatcher by using `forceCtx` as the parent for all job contexts, fixes event-draining on queue exhaustion, and replaces timing-sensitive tests with deterministic assertions.

- **Added status field in NVLink Interface summary API model** ([#363](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/363))
Adds the missing `status` field to the NVLink Interface summary API model, allowing consumers to view interface status when listing NVLink Interfaces within an NVLink Partition.

- **Fix bringup sequence, NSM stale records, and unify tray/rack type enums** ([#377](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/377))
Addresses several issues found during rack bring-up and firmware update testing: replaces the default BringUp rule's ingestion-based power-on with standard `PowerControl` to avoid BMC MAC lookup failures, restructures firmware upgrade sequencing from parallel to staged execution (compute then NVLSwitch then power recycle), fixes Temporal serialization loss of `FirmwareControlTaskInfo` across child workflow boundaries, filters stale firmware update records in NSM's `GetUpdatesForSwitch` to prevent old failures from masking current successes, and unifies component type enum naming across Tray and Rack API endpoints to PascalCase (`Compute`, `NVLSwitch`, `PowerShelf`, etc.).

- **Add dev mode for RLA service** ([#360](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/360))
Introduces an `RLA_ENV` environment variable that gates development-only features: gRPC reflection is enabled only in dev mode, and the log level defaults to debug in dev mode versus info in production, preventing accidental exposure of diagnostic interfaces in deployed environments.

- **Skip config filter in DB if no config query params are set when retrieving all Sites** ([#379](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/379))
Fixes a bug where the Site list handler unconditionally applied an empty JSONB config filter, causing sites with a NULL config column to be silently excluded from results. Site listing now only applies config filtering when at least one config query parameter is explicitly provided.

- **Maintain association record when Instance Type is updated in Machine inventory** ([#383](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/383))
When a Machine's Instance Type changes during inventory sync, the Machine/InstanceType association record is now updated alongside the Machine attribute itself, keeping both representations consistent until the association ID is fully deprecated.

- **Have PSM read firmware files at startup time rather than using an embedded filesystem** ([#385](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/385))
Switches PowerShelf Manager from compile-time embedded firmware binaries to runtime file loading at startup, allowing firmware images to be updated by replacing files on disk without recompiling the service.

### Refactoring

- **Require Ready status for targeted machine instance creation** ([#357](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/357))
Targeted instance creation now requires the specified machine to be in `Ready` status or, when the `allowUnhealthyMachine` flag is set, in `Error` (health alerts) or `Maintenance` status with the Core state being `Ready`.
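
The eligibility rule in the entry above (#357) can be read as a small predicate. The sketch below assumes that the `allowUnhealthyMachine` flag applies to both the `Error` and `Maintenance` cases, and that statuses are plain strings; the function and constant names are hypothetical and not taken from the codebase.

```go
package main

import "fmt"

// Hypothetical status values; the real constants may be typed differently.
const (
	statusReady       = "Ready"
	statusError       = "Error"
	statusMaintenance = "Maintenance"
)

// canTargetMachine reports whether a machine may be used for targeted
// instance creation under the assumed reading of the rule:
//   - Ready machines are always eligible.
//   - Error or Maintenance machines are eligible only when the caller set
//     allowUnhealthyMachine and the Core state is Ready.
func canTargetMachine(status, coreState string, allowUnhealthyMachine bool) bool {
	if status == statusReady {
		return true
	}
	if !allowUnhealthyMachine {
		return false
	}
	unhealthy := status == statusError || status == statusMaintenance
	return unhealthy && coreState == statusReady
}

func main() {
	fmt.Println(canTargetMachine("Ready", "Ready", false))       // true
	fmt.Println(canTargetMachine("Error", "Ready", false))       // false
	fmt.Println(canTargetMachine("Maintenance", "Ready", true))  // true
	fmt.Println(canTargetMachine("Error", "Provisioning", true)) // false
}
```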

### Chores

- **Replace hardcoded API name in path in TUI using helper** ([#356](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/356))
Replaces all 86 hardcoded `/v2/org/{org}/carbide/...` path strings in the TUI with calls to a new `apiPath` helper, making path construction consistent with the SDK's configurable API name support.

- **Rename Site Agent and mock Core/RLA server binary** ([#365](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/365))
Renames Site Agent and mock server binaries as part of the Site Agent v2 preparation, and removes residual database references from the stateless agent.

- **Update Core proto and improve firmware update sequencing in RLA** ([#361](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/361))
Aligns RLA snapshot of Core proto, improves firmware version matching between input requests and observed state, and enables RLA to update the `firmware_autoupdate` flag for machines.

- **Update Core proto snapshot for REST components** ([#251](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/251))
Introduces an idempotent `make core-proto` script that automates Core proto file snapshotting with handling for backwards-incompatible changes and REST-specific additions. Also removes deprecated non-paginated object retrieval fallback methods from Site Agent.

- **Add changelog with detailed history of released tags up to v1.2.1** ([#359](https://github.com/NVIDIA/ncx-infra-controller-rest/pull/359))
Adds a comprehensive CHANGELOG.md with professional descriptions for every pull request across all 12 released versions.
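
As a rough illustration of the path-helper change in the first Chores entry (#356), the sketch below shows one way a helper could assemble `/v2/org/{org}/{apiName}/...` paths from a configurable API name. The signature shown here is an assumption; the TUI's actual `apiPath` helper may take different parameters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// apiPath builds "/v2/org/{org}/{apiName}/{segments...}".
// The parameter list is hypothetical and chosen only for illustration.
func apiPath(apiName, org string, segments ...string) string {
	parts := make([]string, 0, len(segments)+4)
	parts = append(parts, "v2", "org", url.PathEscape(org), url.PathEscape(apiName))
	for _, s := range segments {
		// Escape each segment so IDs containing "/" or spaces cannot alter the route.
		parts = append(parts, url.PathEscape(s))
	}
	return "/" + strings.Join(parts, "/")
}

func main() {
	fmt.Println(apiPath("carbide", "acme", "instance", "i-123", "status-history"))
	// prints /v2/org/acme/carbide/instance/i-123/status-history
}
```

Centralizing construction this way means a change to the API name touches one function rather than every call site.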

---

## [v1.2.1](https://github.com/NVIDIA/ncx-infra-controller-rest/releases/tag/v1.2.1) — 2026-04-07

### Features
2 changes: 1 addition & 1 deletion VERSION
@@ -13,4 +13,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-1.2.1
+1.3.0
8 changes: 4 additions & 4 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion helm/charts/carbide-rest/Chart.yaml
@@ -3,7 +3,7 @@ name: carbide-rest
description: Umbrella chart for the Carbide REST API platform
type: application
version: 0.1.5
-appVersion: "1.2.1"
+appVersion: "1.3.0"
keywords:
- carbide
- forge
2 changes: 1 addition & 1 deletion openapi/spec.yaml
@@ -16,7 +16,7 @@
openapi: 3.1.0
info:
title: NCX Infra Controller REST API
-version: 1.2.1
+version: 1.3.0
summary: NCX Infra Controller REST API is the centralized RESTful gateway to access NCX Infra Controller service
description: 'NCX Infra Controller REST API allows users to create and manage resources e.g. VPC, Subnets, Instances across all connected NCX Infra Controller datacenters, also referred to as Sites.'
license:
@@ -876,7 +876,7 @@
patch:
summary: Update Site
operationId: update-site
responses:

Check warning on line 879 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `operation-4xx-response`: Operation must have at least one `4XX` response.
'200':
description: OK
content:
@@ -922,7 +922,7 @@
delete:
summary: Delete Site
operationId: delete-site
responses:

Check warning on line 925 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `operation-4xx-response`: Operation must have at least one `4XX` response.
'202':
description: Deletion request was accepted
tags:
@@ -1345,7 +1345,7 @@
delete:
summary: Delete Allocation
operationId: delete-allocation
responses:

Check warning on line 1348 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `operation-4xx-response`: Operation must have at least one `4XX` response.
'202':
description: Accepted
description: |-
@@ -1359,7 +1359,7 @@
patch:
summary: Update Allocation
operationId: update-allocation
responses:

Check warning on line 1362 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `operation-4xx-response`: Operation must have at least one `4XX` response.
'200':
description: OK
content:
@@ -1661,7 +1661,7 @@
delete:
summary: Delete Allocation Constraint
operationId: delete-allocation-constraint
responses:

Check warning on line 1664 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `operation-4xx-response`: Operation must have at least one `4XX` response.
'202':
description: Accepted
description: |
@@ -6848,7 +6848,7 @@
description: ''
tags:
- Instance
'/v2/org/{org}/carbide/instance/{instanceId}/status-history':

Check warning on line 6851 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `no-ambiguous-paths`: Paths should resolve unambiguously. Found two ambiguous paths: `/v2/org/{org}/carbide/instance/type/{instanceTypeId}` and `/v2/org/{org}/carbide/instance/{instanceId}/status-history`.
parameters:
- schema:
type: string
@@ -6910,7 +6910,7 @@
in: query
name: orderBy
description: Ordering for pagination query
'/v2/org/{org}/carbide/instance/{instanceId}/interface':

Check warning on line 6913 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `no-ambiguous-paths`: Paths should resolve unambiguously. Found two ambiguous paths: `/v2/org/{org}/carbide/instance/type/{instanceTypeId}` and `/v2/org/{org}/carbide/instance/{instanceId}/interface`.
parameters:
- schema:
type: string
@@ -7017,7 +7017,7 @@
in: query
name: orderBy
description: Ordering for pagination query
'/v2/org/{org}/carbide/instance/{instanceId}/infiniband-interface':

Check warning on line 7020 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `no-ambiguous-paths`: Paths should resolve unambiguously. Found two ambiguous paths: `/v2/org/{org}/carbide/instance/type/{instanceTypeId}` and `/v2/org/{org}/carbide/instance/{instanceId}/infiniband-interface`.
parameters:
- schema:
type: string
@@ -7124,7 +7124,7 @@
in: query
name: orderBy
description: Ordering for pagination query
'/v2/org/{org}/carbide/instance/{instanceId}/nvlink-interface':

Check warning on line 7127 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `no-ambiguous-paths`: Paths should resolve unambiguously. Found two ambiguous paths: `/v2/org/{org}/carbide/instance/type/{instanceTypeId}` and `/v2/org/{org}/carbide/instance/{instanceId}/nvlink-interface`.
parameters:
- schema:
type: string
@@ -9815,7 +9815,7 @@
$ref: '#/components/responses/ForbiddenError'
tags:
- Rack
'/v2/org/{org}/carbide/rack/task/{id}':

Check warning on line 9818 in openapi/spec.yaml (GitHub Actions / Lint and Test / Lint OpenAPI): `no-ambiguous-paths`: Paths should resolve unambiguously. Found two ambiguous paths: `/v2/org/{org}/carbide/rack/{id}/validation` and `/v2/org/{org}/carbide/rack/task/{id}`.
parameters:
- schema:
type: string
2 changes: 1 addition & 1 deletion sdk/standard/api_allocation.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion sdk/standard/api_audit.go
2 changes: 1 addition & 1 deletion sdk/standard/api_dpu_extension_service.go
2 changes: 1 addition & 1 deletion sdk/standard/api_expected_machine.go
2 changes: 1 addition & 1 deletion sdk/standard/api_expected_power_shelf.go
2 changes: 1 addition & 1 deletion sdk/standard/api_expected_switch.go
2 changes: 1 addition & 1 deletion sdk/standard/api_infini_band_partition.go
2 changes: 1 addition & 1 deletion sdk/standard/api_infrastructure_provider.go
2 changes: 1 addition & 1 deletion sdk/standard/api_instance.go
2 changes: 1 addition & 1 deletion sdk/standard/api_instance_type.go
2 changes: 1 addition & 1 deletion sdk/standard/api_ip_block.go
2 changes: 1 addition & 1 deletion sdk/standard/api_machine.go
2 changes: 1 addition & 1 deletion sdk/standard/api_metadata.go
2 changes: 1 addition & 1 deletion sdk/standard/api_network_security_group.go
2 changes: 1 addition & 1 deletion sdk/standard/api_nv_link_logical_partition.go
2 changes: 1 addition & 1 deletion sdk/standard/api_operating_system.go
2 changes: 1 addition & 1 deletion sdk/standard/api_rack.go
2 changes: 1 addition & 1 deletion sdk/standard/api_service_account.go
2 changes: 1 addition & 1 deletion sdk/standard/api_site.go
2 changes: 1 addition & 1 deletion sdk/standard/api_sku.go
2 changes: 1 addition & 1 deletion sdk/standard/api_ssh_key.go
2 changes: 1 addition & 1 deletion sdk/standard/api_ssh_key_group.go
2 changes: 1 addition & 1 deletion sdk/standard/api_subnet.go
2 changes: 1 addition & 1 deletion sdk/standard/api_tenant.go
2 changes: 1 addition & 1 deletion sdk/standard/api_tenant_account.go
2 changes: 1 addition & 1 deletion sdk/standard/api_tray.go
2 changes: 1 addition & 1 deletion sdk/standard/api_user.go
2 changes: 1 addition & 1 deletion sdk/standard/api_vpc.go
2 changes: 1 addition & 1 deletion sdk/standard/api_vpc_peering.go
2 changes: 1 addition & 1 deletion sdk/standard/api_vpc_prefix.go