docs(npu-operator): align installation page with 1.2.0 form + namespace layout #790

Open: luohua13 wants to merge 1 commit into master from docs/npu-operator-1.2-form-update



@luohua13 (Contributor) commented May 14, 2026

Summary

Adapt the NPU Operator installation page to match the 1.2.0 release:

  • Namespace consolidation — every component (driver, device-plugin, npu-exporter, ascend-operator, noded, clusterd, resilience-controller, mindio-tft, mindio-acp, NFD, oci-runtime) now lives in the operator's own namespace (default npu-operator). Previously they were spread across kube-system, mindx-dl, volcano-system, and default.
  • Driver Version field — surfaced as a top-level form input (label Driver Version, default 25.5.0, hidden when Driver is disabled). The table row is renamed from Version and the default bumped accordingly.
  • Volcano hidden: vccontroller / vcscheduler toggles are intentionally absent from the form; install the platform's Volcano cluster plugin separately when enabling ClusterD.
  • Verification step — the kubectl wait for npu-driver switches from kube-system to npu-operator with a note about non-default namespaces.
  • Installing Monitor — the operator now ships an npu-exporter-servicemonitor ServiceMonitor in its own namespace, so the manual ServiceMonitor YAML (which created it in monitoring namespace selecting the npu-exporter namespace) is removed.

Verified against an actual 1.2.0 install on the g1-c2-arm cluster:

  • kubectl get pod -A | grep -E "npu|ascend|noded|clusterd|exporter|mindio" — all in npu-operator.
  • kubectl get servicemonitor -A | grep npu-exporter — auto-created in npu-operator.
  • kubectl -n npu-operator get ds npu-driver -o jsonpath='{.spec.template.spec.containers[0].env}' shows DRIVER_VERSION=25.5.0, HOST_DRIVER_SOURCE_PATH=/tmp/driver_pkg.
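
The namespace check in the first bullet can be made mechanical. The following is a minimal sketch, not part of the PR: the helper name pods_outside_namespace is hypothetical, and it assumes the default `kubectl get pod -A` column layout (NAMESPACE first, NAME second). Unlike the grep above, it matches the pattern against the pod name only.

```shell
#!/bin/sh
# Hypothetical helper: list operator-related pods that are NOT in the
# expected namespace. Reads `kubectl get pod -A` output on stdin and
# assumes column 1 is NAMESPACE and column 2 is NAME.
pods_outside_namespace() {
  expected_ns="${1:-npu-operator}"
  awk -v ns="$expected_ns" '
    # Same name pattern as the grep in the PR description, applied to the pod name.
    $2 ~ /npu|ascend|noded|clusterd|exporter|mindio/ && $1 != ns {
      print $1 "/" $2
    }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pod -A | pods_outside_namespace npu-operator
```

Empty output means every matching pod sits in the expected namespace; any printed namespace/name pair is a pod that was not consolidated.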

Test plan

  • Render the page locally; confirm the form table and notes display correctly.
  • Cross-check the table rows against the actual deployment form on a 1.2.0 OperatorHub install.
  • Verify the marketplace ServiceMonitor command finds the auto-created object on a fresh install with NPU Exporter enabled.
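
For the last test-plan item, a small wrapper can turn the ServiceMonitor lookup into a pass/fail check. This is a sketch under assumptions: the function name servicemonitor_exists is hypothetical, the object name npu-exporter-servicemonitor and the default namespace npu-operator come from the summary above, and the cluster must have the ServiceMonitor CRD installed for the resource name to resolve.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the operator-created
# ServiceMonitor exists in the given namespace, fail otherwise.
servicemonitor_exists() {
  ns="${1:-npu-operator}"
  # Discard output; only the exit status of kubectl matters here.
  kubectl -n "$ns" get servicemonitor npu-exporter-servicemonitor >/dev/null 2>&1
}

# Usage (requires kubectl access to the cluster):
#   if servicemonitor_exists npu-operator; then
#     echo "ServiceMonitor found"
#   else
#     echo "ServiceMonitor missing"
#   fi
```

Pass the actual operator namespace as the first argument when the operator was installed somewhere other than the default.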

Summary by CodeRabbit

  • Documentation

    • Clarified operator-managed pods deploy to the operator namespace (default npu-operator); Volcano components excluded and provided via a separate Volcano cluster plugin
    • Updated verification instructions to reference the operator namespace
    • Monitoring: ServiceMonitor is now auto-created for npu-exporter
    • Default driver/firmware version bumped to 25.5.0
  • Release Notes

    • Added v1.2.0 and v1.1.3 notes; packaging switches to an OLM operator bundle (OperatorHub)
    • Upgrade warning: in-place upgrades from v1.1.3 or earlier are not supported—uninstall/reinstall required
  • New Features

    • MindCluster/Ascend stack upgraded to v7.3.0
  • Bug Fixes

    • Fixed Prometheus ServiceMonitor scraping for npu-exporter and other installation/detection fixes


@coderabbitai (Bot, Contributor) commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 19138ec2-22f4-4619-9cb0-b4aef235ca66

📥 Commits

Reviewing files that changed from the base of the PR and between 49fb5c8 and aec4888.

📒 Files selected for processing (2)
  • docs/en/hardware_accelerator/npu/npu_operator/installation.mdx
  • docs/en/hardware_accelerator/npu/npu_operator/release_notes.mdx
✅ Files skipped from review due to trivial changes (1)
  • docs/en/hardware_accelerator/npu/npu_operator/release_notes.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/en/hardware_accelerator/npu/npu_operator/installation.mdx

Walkthrough

The NPU Operator docs clarify that operator-managed pods deploy into the operator namespace (default npu-operator), update component defaults (driver/firmware 25.5.0), change verification to use the operator namespace, automate ServiceMonitor creation for NPU Exporter, and add v1.2.0/v1.1.3 release notes.
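
The driver default mentioned here can be read back from a running install. A minimal sketch, assuming (as in the PR description) that the DaemonSet is named npu-driver and that its first container carries a DRIVER_VERSION environment variable; the function name driver_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the DRIVER_VERSION value set on the first
# container of the npu-driver DaemonSet in the given namespace.
driver_version() {
  ns="${1:-npu-operator}"
  # The JSONPath filter selects the env entry whose name is DRIVER_VERSION.
  kubectl -n "$ns" get ds npu-driver \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="DRIVER_VERSION")].value}'
}

# Usage (requires kubectl access to the cluster):
#   driver_version npu-operator
```

Filtering on the variable name avoids dumping the whole env array, which makes the value easier to compare against the documented default.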

Changes

NPU Operator Installation Instructions

| Layer / File(s) | Summary |
|---|---|
| Deployment namespace, component defaults, and verification (docs/en/hardware_accelerator/npu/npu_operator/installation.mdx) | Clarifies operator-managed pods are deployed into the operator namespace (default npu-operator), excludes Volcano components from the deployment form (use separate Volcano plugin when ClusterD is enabled), updates deployment form text and default driver/firmware to 25.5.0, and updates verification steps to watch npu-driver in the chosen operator namespace. |
| Operator-managed ServiceMonitor for NPU Exporter (docs/en/hardware_accelerator/npu/npu_operator/installation.mdx) | Replaces manual ServiceMonitor manifest/apply instructions with documentation that the operator auto-creates a ServiceMonitor named npu-exporter-servicemonitor in the operator namespace when NPU Exporter is enabled, and includes a verification command. |

Release Notes

| Layer / File(s) | Summary |
|---|---|
| v1.2.0 and v1.1.3 release notes (docs/en/hardware_accelerator/npu/npu_operator/release_notes.mdx) | Adds v1.2.0 release notes (breaking change to OLM operator bundle, no in-place upgrades from v1.1.3, uninstall/reinstall steps, component stack upgrades including default driver/firmware 25.5.0, and bug fixes) and a v1.1.3 reference section. |

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 In operator fields where documents grow,
Namespace paths and versions gently show,
Monitors spring when exporters wake,
Clear steps now guide each install we make,
A tiny hop for docs—joy in the flow.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: updating NPU Operator installation documentation to align with version 1.2.0 release and reflect the new namespace layout where all operator components run in the operator's namespace. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/hardware_accelerator/npu/npu_operator/installation.mdx`:
- Line 105: The command example uses kubectl with the watch flag piped to grep
which causes buffering and prevents live updates; replace that pipeline with one
of the recommended alternatives: use a label selector with watch (e.g., use
kubectl get pod -l <label>=npu-driver -w if pods have a label), remove -w for a
one-time filtered listing (kubectl get pod | grep npu-driver), or use kubectl
wait to block until the npu-driver pods are Ready (kubectl wait
--for=condition=Ready pod -l <label>=npu-driver --timeout=600s); update the
example line that currently shows "kubectl -n npu-operator get pod -w | grep
npu-driver" to one of these alternatives and mention replacing <label> with the
actual pod label if applicable.
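
The kubectl wait alternative suggested above can be sketched as follows. The function name wait_for_pods_ready is hypothetical, and the label selector is an assumption: the review does not state which labels the npu-driver pods carry, so it is passed in as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: block until all pods matching a label selector
# are Ready, or until the timeout expires.
#   $1 = namespace, $2 = label selector, $3 = timeout (default 600s)
wait_for_pods_ready() {
  ns="$1"
  selector="$2"
  timeout="${3:-600s}"
  kubectl -n "$ns" wait --for=condition=Ready pod -l "$selector" --timeout="$timeout"
}

# Usage. The label app=npu-driver is a placeholder, not taken from the
# source; list the real labels first with:
#   kubectl -n npu-operator get pod --show-labels
# then call, for example:
#   wait_for_pods_ready npu-operator app=npu-driver 600s
```

Because kubectl wait exits non-zero on timeout, the call can gate later verification steps in a script instead of relying on someone watching the pod list.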

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 84ef781e-4410-423f-9241-402abb20a6ca

📥 Commits

Reviewing files that changed from the base of the PR and between 41cb8cd and 49fb5c8.

📒 Files selected for processing (1)
  • docs/en/hardware_accelerator/npu/npu_operator/installation.mdx

Comment thread docs/en/hardware_accelerator/npu/npu_operator/installation.mdx
…elease notes

The 1.2.0 release of Alauda Build of NPU Operator changes the delivery
model from cluster plugin (`Marketplace > Cluster Plugins`) to OLM
operator (`Marketplace > OperatorHub`). Adapt the installation page
and add a release notes page mirroring the per-product layout used
by hami-docs.

Installation page:
- Rename Downloading/Uploading sections from "Cluster plugin" to
  "Packages" and enumerate both the operator package (npu-operator)
  and the cluster plugin packages (NFD required, Volcano optional).
- Split installation into two subsections: NFD as a Cluster Plugin
  and NPU Operator via OperatorHub (Install dialog, namespace
  selector, deployment form).
- Bump default Driver Version to 25.5.0 and rename the row to match
  the form label; add a note that all operator-managed pods land in
  the operator namespace and that Volcano components are absent.
- Verification step 1 switches from the Cluster plugin page to the
  OperatorHub details page / Installed Operators view.
- Verification step 2 watches the npu-driver pod in the operator
  namespace (was kube-system) with a note for non-default namespaces.
- Installing Monitor: drop the obsolete manual ServiceMonitor snippet
  (which targeted the wrong namespaces); the operator now auto-
  installs npu-exporter-servicemonitor in its own namespace.

Release notes:
- v1.2.0 mapped to openFuyao npu-operator 1.2.0; headline is the
  cluster-plugin-to-operator delivery change (no in-place upgrade
  from v1.1.3) and the MindCluster/Ascend v7.3.0 stack bump.
- Downstream bug fix highlighted is the npu-exporter ServiceMonitor
  not taking effect; plus the two community 1.1.1 -> 1.2.0 fixes.
- v1.1.3 mapped to openFuyao npu-operator 1.1.1 (MindCluster v7.2.RC1,
  cluster-plugin delivery).
@JounQin JounQin force-pushed the docs/npu-operator-1.2-form-update branch from 47e94eb to aec4888 Compare May 14, 2026 11:06
@cloudflare-workers-and-pages

Deploying alauda-container-platform with Cloudflare Pages

Latest commit: aec4888
Status: ✅  Deploy successful!
Preview URL: https://56806856.alauda-container-platform.pages.dev
Branch Preview URL: https://docs-npu-operator-1-2-form-u-6gq3.alauda-container-platform.pages.dev
