OCPEDGE-2280: mutable topology#2008
Conversation
|
@jeff-roche: This pull request references OCPEDGE-2280 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
438e03c to
98a1ba4
Compare
brandisher
left a comment
There was a problem hiding this comment.
I'm missing a "why" statement covering why a day 2, out-of-payload operator is the right choice for this. The CVO section towards the bottom hints at the why a bit but more explicit detail is needed.
With that in mind, I haven't reviewed the EP fully because I don't understand why this is the approach we're taking. The assessment of CVO seems very light and not enough to exclude that as a potential option to meet the goals.
| - The CLI would need direct access to operator internals, violating separation of concerns | ||
| - Error recovery and retry logic is better suited to an operator's reconciliation loop | ||
|
|
||
| ### Controller in CVO |
There was a problem hiding this comment.
Is CVO the only option in the core operators where this might make sense?
There was a problem hiding this comment.
I expanded to include some other operators, none of which fit the bill in my opinion. This is an entirely new process and shoehorning it into another operator that wasn't designed for tackling this type of procedure seems irresponsible to me
There was a problem hiding this comment.
Which operator handles adding nodes to clusters?
There was a problem hiding this comment.
Expanded to include other operators in the Alternatives section. The controller now lives in CCO, which was agreed on in the JoelSpeed/jaypoulz thread.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
@brandisher I've added a new paragraph under the |
JoelSpeed
left a comment
There was a problem hiding this comment.
🤖 Generated with Claude Code
There are significant portions of this proposal that assume behaviour of OpenShift that either doesn't exist, or doesn't work in the way proposed. I'm assuming here that this is hallucination of Claude?
The EP as it stands today doesn't actually make sense for implementation. It also doesn't align with what I thought we had agreed on the architecture call.
Has anyone tried to manually take a cluster and scale up and manually transition from a single replica to multiple replicas? IMO this is the most important next step for this project
What I thought we had agreed:
- To scale from SNO to HA, the user must create two new control plane nodes and join them to the cluster
- On HighlyAvailable topology - KAS, KCM, etcd, etc all get scheduled automatically as static pods on these nodes - I don't see anything that prevents this based on if it's a SNO cluster today, this needs to be checked (it probably should)
- MCO still serves ignition for control plane nodes on SNO, so user needs to create the control plane nodes somehow to ignite from here
- New fields are added to the infrastructure spec to allow the user to say "I intend for this cluster to be HA going forward"
- A controller is added to cluster config operator
- This checks that the precondition of having additional control plane nodes in the cluster is met
- Once the precondition is met, it updates the status to reflect spec
- Operators now react to the change in status and transition from single to HA
- etcd operator promotes learners to full members, quorum goes from 1->3 (I don't know if this guard is in place today, we should add if not)
- KAS/KCM - no change, it already scheduled new KAS/kCM pods
- Others - Those that previously deploy a single replica of their operand now move to 2 replicas, other changes might be needed on a per operator basis, I was expecting those details in the EP but don't see them yet
patrickdillon
left a comment
There was a problem hiding this comment.
I know the scope is limited to baremetal/platform:none, but I know there is interest for mutable topologies in cloud platforms as well so as much as appropriate I would to ensure the design leaves a path forward for those cloud platforms.
Also, like the other enhancement I don't see any mention of mastersSchedulable which affects the calculation for infrastructureTopology. How is the mastersSchedulable field handled/taken into account for this solution?
zaneb
left a comment
There was a problem hiding this comment.
This one looks directionally correct 👍
| OTTO maintains a directed graph of supported transitions. For the initial implementation: | ||
|
|
||
| ```text | ||
| SingleReplica (SNO, platform: none) → HighlyAvailable (3-node compact) |
There was a problem hiding this comment.
I think it's a mistake to define the supported topologies in terms of the controlPlaneTopology field. There are at least 6 use cases I can think of that users have articulated:
- single-node (1 schedulable control plane, 0+ workers, no load balancer)
- compact (3 schedulable control plane, 0+ workers)
- standby (3 non-schedulable control plane, 0 workers)
- HA (3 non-schedulable control plane, 2+ workers)
- TNA (2 non-schedulable control plane, 1 arbiter, 2+ workers)
- TNF (2 schedulable control plane w/ STONITH, 0 workers)
There was a problem hiding this comment.
I've expanded the detail around CP and infra topology, as well as some validation rules around number of workers. For the first pass, we will report an error prior to transitioning if there are any worker nodes.
There was a problem hiding this comment.
Can you point me to this expansion? I have the same question as Zane still having re-read the EP. This IMO needs more expansion unless I missed a section
There was a problem hiding this comment.
The Terms section now acknowledges the broader topology landscape (TNA, TNF, compact, etc.) and explicitly states this enhancement targets controlPlaneTopology transitions only. The architecture doesn't preclude future expansion to other configurations. The field is named desiredControlPlaneTopology to make this scope explicit.
| 10. OTTO updates the Infrastructure status fields: | ||
| - `controlPlaneTopology` transitions from `SingleReplica` to `HighlyAvailable` | ||
| - `infrastructureTopology` transitions from `SingleReplica` to `HighlyAvailable` | ||
| 11. Operators reconcile against the new topology values and adjust their deployment strategies, replica counts, and placement policies |
There was a problem hiding this comment.
Are we going to try to e.g. restart OLM operators (which previously have treated the topology as fixed)?
There was a problem hiding this comment.
Do you have a view into how many/which olm operators are reading this value? Are they reading it at startup, or watching the resource? The expected pattern would be that the operator sees the change, and then reacts by updating the operand (e.g. scaling from 1 to 2 replicas now that it's been told the cluster is HA)
There was a problem hiding this comment.
OLM operator behavior is listed as an open question. Operators that watch the infrastructure CR should react automatically. Those that read at startup may need a restart. The scope of affected operators needs investigation.
|
Big update coming next week to realign this with CCO instead of a dedicated operator, add some more technical detail around the flow, and address masters schedulable. Thank you everyone for the quick and thorough reviews, I believe we are rapidly converging on a solid solution! |
Introduce the Mutable Topology enhancement, which replaces the previous Adaptable Topology proposal. Instead of a new topology enum that all operators must interpret, this approach uses a dedicated operator (OTTO) to orchestrate transitions between existing fixed topology modes. Initial scope: SNO to HA compact on platform: none. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the topology transition controller from a standalone operator (OTTO) into cluster-config-operator. CCO owns the config.openshift.io API group and infrastructure CR lifecycle, making it the natural home. Key design decisions: - desiredTopology initialized by installer to match controlPlaneTopology (no kubebuilder default — value is cluster-specific) - Controller triggers on desiredTopology != status.controlPlaneTopology - On failure, controller resets desiredTopology to current topology - Upgrade blocked via Upgradeable=False during transitions - Condition types: TopologyTransitionProgressing, Completed, Failed - Per-operator topology audit required for Dev Preview entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a8d48b3 to
22b3682
Compare
|
Are there limitations for a SNO to TNF transition ? TNF requires BMC/Redfish so if the SNO bare metal hardware does not have it, does it block the transition? I could see this being a problem trying to match hardware in general (BMC firmware versions, vendor types, etc. ). |
JoelSpeed
left a comment
There was a problem hiding this comment.
This is much better than the previous iteration. I still fee like there's some disconnect between the new and old stuff, some stuff may still be hanging over from the previous iteration that doesn't quite make sense now, PTAL at my comments
| This enhancement enables OpenShift clusters to transition between topology modes as a Day 2 operation. This changes the existing OpenShift assumption that topologies are immutable after installation. | ||
|
|
||
| A new `desiredTopology` field in the infrastructure spec expresses the administrator's intent to transition. A topology transition controller in cluster-config-operator watches for changes to this field, validates preconditions, coordinates the transition, and updates the existing topology status fields when the cluster is ready. | ||
| A new `oc adm transition topology` CLI command provides an interface for cluster administrators to initiate transitions. |
There was a problem hiding this comment.
Is this a common addition to the CLI? I have nothing against extending the CLI, but do question if it is strictly required
There was a problem hiding this comment.
I think it is not strictly required, this is more of a usability thing. In theory a cluster admin could go in and update the desired topology and manually monitor progress but that might feel disconnected. Through the CLI we could give some structure to the process
There was a problem hiding this comment.
The CLI is a convenience, not strictly required. An administrator could patch spec.desiredControlPlaneTopology directly. The CLI follows the oc adm upgrade pattern — validates preconditions client-side and patches the CR.
Address review feedback from brandisher, JoelSpeed, zaneb, patrickdillon, DanielFroehlich, and dhensel-rh across 6 categories: API design: - Rename desiredTopology to desiredControlPlaneTopology - Make field empty by default (installer does not populate) - Replace CEL validation with DesiredTopologyMode named type - Drop ValidatingAdmissionPolicy for status field protection - Document spec-to-status mapping (CP topology, infra topology, mastersSchedulable) - Add worker node precondition check (compact clusters only) Workflow: - Clarify node-driven vs topology-driven operator reactions - CLI returns immediately; monitoring is separate - Failure handling uses standard K8s retry pattern (no spec reset) - Transition conditions on CCO ClusterOperator status (not infra CR) - Cancel semantics: only before status update (step 9) - CEO etcd scaling is independent (unsafe scaling path), not orchestrated by transition controller Accuracy: - CEO rollback replaced with manual quorum-restore.sh throughout - "type alias" corrected to "named type" (Go terminology) - Upgradeable=False blocks upgrades, not all version changes - "30+ operators" claim removed (uncited) - mastersSchedulable clarified as unchanged for SNO to HA compact Scope: - IBI clusters excluded (non-goal + topology considerations) - Platform:none rationale expanded (edge computing context) - Baremetal risk reframed as future scope - Backup compatibility added as open question Content: - Concrete failure examples (quorum loss, node readiness, operator reconciliation) - SLO dimensions defined in GA graduation criteria - Operational guidance updated (no availability guarantee, backup recommendation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@jeff-roche: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
|
||
| 2. **Topology transition controller in cluster-config-operator** — A new controller in CCO that watches the infrastructure CR for `desiredControlPlaneTopology` changes, validates preconditions, coordinates the transition, and updates the status topology fields when the cluster is ready for the new mode. | ||
|
|
||
| 3. **`oc adm transition topology` CLI command** — A command that validates preconditions, patches `spec.desiredControlPlaneTopology` on the infrastructure CR, and returns immediately. |
There was a problem hiding this comment.
As of now this is mainly focused on control plane topology, but as this becomes more developed would we want to have this be under an umbrella term to allow easier additions and more logical groupings, wdyt?
spec:
topology:
desiredControlPlaneTopology: "HighlyAvailable"
desiredInfraTopology: "Single" # Possible future thingsThere was a problem hiding this comment.
Would we need desired or would it being located under spec imply desired? I'm wondering if we could just match the status field names and let spec imply desired. In terms of adding it to a topology object, I'm fine with it but I'm also curious @JoelSpeed's thoughts
Summary
Introduces the Mutable Topology enhancement proposal, which enables OpenShift clusters to transition between topology modes as a Day 2 operation. This replaces the previous Adaptable Topology proposal.
Key Design Decisions
spec.desiredTopologyon the Infrastructure CR, validates preconditions, coordinates the transition across operators, and updates topology status fields when complete. CCO was chosen over CVO, CEO, and MCO (and over a standalone operator) because it owns theconfig.openshift.ioAPI group and the Infrastructure CR lifecycle. See Alternatives in the proposal for the full placement analysis.TopologyModevalues (SingleReplica,HighlyAvailable, etc.). Operators continue reacting to fixed topology values they already understand. Transition complexity is concentrated in a single controller rather than distributed across 30+ operators.spec.desiredTopologyexpresses administrator intent;status.controlPlaneTopologyreflects observed state. Mirrors theoc adm upgradepattern (patch spec, controller does the work).MutableTopologygate progresses through DevPreview → TechPreview → GA. Controller is not registered when the gate is disabled (zero runtime overhead).Scope
platform: noneoc adm transition topology HighlyAvailabledesiredTopology; ValidatingAdmissionPolicy (fail-closed) protects topology status fields from direct edits outside CCOdesiredTopologyon failure (deliberate spec mutation to prevent infinite retry loops); CEO attempts etcd rollbackUpgradeable=Falsewhile a transition is in progressWhat Changed (Revision History)
The proposal was revised to base the controller in CCO rather than proposing a dedicated standalone operator (OTTO). Key changes from the prior revision:
desiredTopologyon failure with rationale for the spec-mutation deviationUpgradeable=Falseenforcement during transitions to prevent concurrent upgradesOut of Scope
platform: baremetal— pending keepalived resolution🤖 Generated with Claude Code