Skip to content

OCPEDGE-2280: mutable topology#2008

Open
jeff-roche wants to merge 3 commits into
openshift:masterfrom
jeff-roche:mutable-topology
Open

OCPEDGE-2280: mutable topology#2008
jeff-roche wants to merge 3 commits into
openshift:masterfrom
jeff-roche:mutable-topology

Conversation

@jeff-roche
Copy link
Copy Markdown

@jeff-roche jeff-roche commented May 11, 2026

Summary

Introduces the Mutable Topology enhancement proposal, which enables OpenShift clusters to transition between topology modes as a Day 2 operation. This replaces the previous Adaptable Topology proposal.

Key Design Decisions

  • Controller in cluster-config-operator (CCO) — A new topology transition controller in CCO watches spec.desiredTopology on the Infrastructure CR, validates preconditions, coordinates the transition across operators, and updates topology status fields when complete. CCO was chosen over CVO, CEO, and MCO (and over a standalone operator) because it owns the config.openshift.io API group and the Infrastructure CR lifecycle. See Alternatives in the proposal for the full placement analysis.
  • No new topology enum values — Transitions move between existing TopologyMode values (SingleReplica, HighlyAvailable, etc.). Operators continue reacting to fixed topology values they already understand. Transition complexity is concentrated in a single controller rather than distributed across 30+ operators.
  • Spec/status contract — Follows the standard Kubernetes pattern: spec.desiredTopology expresses administrator intent; status.controlPlaneTopology reflects observed state. Mirrors the oc adm upgrade pattern (patch spec, controller does the work).
  • Feature-gatedMutableTopology gate progresses through DevPreview → TechPreview → GA. Controller is not registered when the gate is disabled (zero runtime overhead).

Scope

  • Initial transition: SNO → HA compact (3-node) on platform: none
  • CLI: oc adm transition topology HighlyAvailable
  • Admission control: CEL validation on desiredTopology; ValidatingAdmissionPolicy (fail-closed) protects topology status fields from direct edits outside CCO
  • etcd scaling: CEO handles sequential 1→2→3 member scaling via existing learner-to-voter promotion
  • Failure handling: Controller resets desiredTopology on failure (deliberate spec mutation to prevent infinite retry loops); CEO attempts etcd rollback
  • Upgrade safety: CCO sets Upgradeable=False while a transition is in progress

What Changed (Revision History)

The proposal was revised to base the controller in CCO rather than proposing a dedicated standalone operator (OTTO). Key changes from the prior revision:

  • Controller placement moved from a standalone operator to CCO, with full alternatives analysis (CVO, CEO, MCO, standalone operator, CLI-only)
  • Added ValidatingAdmissionPolicy for topology status field protection (fail-closed)
  • Added detailed failure handling: controller resets desiredTopology on failure with rationale for the spec-mutation deviation
  • Expanded graduation criteria with per-operator topology dependency matrix requirement
  • Added monitoring/telemetry requirements (Prometheus metrics, alerts) for GA graduation
  • Added Support Procedures section with team ownership, detection, and recovery procedures
  • Clarified etcd scaling risks: the 2-voter intermediate state is unique to Day 2 transitions (does not occur during bootstrapping)
  • Added Upgradeable=False enforcement during transitions to prevent concurrent upgrades

Out of Scope

  • Bidirectional transitions (HA → SNO)
  • HyperShift / hosted control planes
  • MicroShift
  • Automatic node provisioning
  • Cloud platforms (AWS, Azure, GCP) — design does not preclude future support
  • platform: baremetal — pending keepalived resolution

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from bn222 and cooktheryan May 11, 2026 19:46
@jeff-roche jeff-roche changed the title enhancements/topologies: mutable topology enhancement proposal OCPEDGE-2280: mutable topology enhancement proposal May 11, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 11, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 11, 2026

@jeff-roche: This pull request references OCPEDGE-2280 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

Summary

  • Introduces the Mutable Topology enhancement, replacing the previous Adaptable Topology proposal
  • Proposes a new optional payload operator (OTTO) to orchestrate topology transitions between existing fixed topology modes, rather than adding a new topology enum
  • Initial scope: SNO to HA compact (3-node) on platform: none

Test plan

  • markdownlint passes (markdownlint-cli2)
  • Reviewer feedback from control plane, API, and architecture teams
  • Template structure validated against guidelines/enhancement_template.md

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jeff-roche jeff-roche changed the title OCPEDGE-2280: mutable topology enhancement proposal OCPEDGE-2280: mutable topology May 11, 2026
@jeff-roche
Copy link
Copy Markdown
Author

Copy link
Copy Markdown
Contributor

@brandisher brandisher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a "why" statement covering why a day 2, out-of-payload operator is the right choice for this. The CVO section towards the bottom hints at the why a bit but more explicit detail is needed.

With that in mind, I haven't reviewed the EP fully because I don't understand why this is the approach we're taking. The assessment of CVO seems very light and not enough to exclude that as a potential option to meet the goals.

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
- The CLI would need direct access to operator internals, violating separation of concerns
- Error recovery and retry logic is better suited to an operator's reconciliation loop

### Controller in CVO
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CVO the only option in the core operators where this might make sense?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded to include some other operators, none of which fit the bill in my opinion. This is an entirely new process and shoehorning it into another operator that wasn't designed for tackling this type of procedure seems irresponsible to me

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which operator handles adding nodes to clusters?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded to include other operators in the Alternatives section. The controller now lives in CCO, which was agreed on in the JoelSpeed/jaypoulz thread.

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from dgoodwin. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jeff-roche
Copy link
Copy Markdown
Author

jeff-roche commented May 12, 2026

I'm missing a "why" statement covering why a day 2, out-of-payload operator is the right choice for this. The CVO section towards the bottom hints at the why a bit but more explicit detail is needed.

With that in mind, I haven't reviewed the EP fully because I don't understand why this is the approach we're taking. The assessment of CVO seems very light and not enough to exclude that as a potential option to meet the goals.

@brandisher I've added a new paragraph under the ## Proposal header that explains the why. If you're looking for something specifically beyond what I added, I'd be happy to add some more detail

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Copy link
Copy Markdown
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Generated with Claude Code

There are significant portions of this proposal that assume behaviour of OpenShift that either doesn't exist, or doesn't work in the way proposed. I'm assuming here that this is hallucination of Claude?

The EP as it stands today doesn't actually make sense for implementation. It also doesn't align with what I thought we had agreed on the architecture call.

Has anyone tried to manually take a cluster and scale up and manually transition from a single replica to multiple replicas? IMO this is the most important next step for this project

What I thought we had agreed:

  • To scale from SNO to HA, the user must create two new control plane nodes and join them to the cluster
    • On HighlyAvailable topology - KAS, KCM, etcd, etc all get scheduled automatically as static pods on these nodes - I don't see anything that prevents this based on if it's a SNO cluster today, this needs to be checked (it probably should)
    • MCO still serves ignition for control plane nodes on SNO, so user needs to create the control plane nodes somehow to ignite from here
  • New fields are added to the infrastructure spec to allow the user to say "I intend for this cluster to be HA going forward"
  • A controller is added to cluster config operator
    • This checks that the precondition of having additional control plane nodes in the cluster is met
    • Once the precondition is met, it updates the status to reflect spec
  • Operators now react to the change in status and transition from single to HA
    • etcd operator promotes learners to full members, quorum goes from 1->3 (I don't know if this guard is in place today, we should add if not)
    • KAS/KCM - no change, it already scheduled new KAS/kCM pods
    • Others - Those that previously deploy a single replica of their operand now move to 2 replicas, other changes might be needed on a per operator basis, I was expecting those details in the EP but don't see them yet

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Copy link
Copy Markdown
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the scope is limited to baremetal/platform:none, but I know there is interest for mutable topologies in cloud platforms as well so as much as appropriate I would to ensure the design leaves a path forward for those cloud platforms.

Also, like the other enhancement I don't see any mention of mastersSchedulable which affects the calculation for infrastructureTopology. How is the mastersSchedulable field handled/taken into account for this solution?

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Copy link
Copy Markdown
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks directionally correct 👍

Comment thread enhancements/topologies/mutable-topology.md Outdated
OTTO maintains a directed graph of supported transitions. For the initial implementation:

```text
SingleReplica (SNO, platform: none) → HighlyAvailable (3-node compact)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a mistake to define the supported topologies in terms of the controlPlaneTopology field. There are at least 6 use cases I can think of that users have articulated:

  • single-node (1 schedulable control plane, 0+ workers, no load balancer)
  • compact (3 schedulable control plane, 0+ workers)
  • standby (3 non-schedulable control plane, 0 workers)
  • HA (3 non-schedulable control plane, 2+ workers)
  • TNA (2 non-schedulable control plane, 1 arbiter, 2+ workers)
  • TNF (2 schedulable control plane w/ STONITH, 0 workers)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expanded the detail around CP and infra topology, as well as some validation rules around number of workers. For the first pass, we will report an error prior to transitioning if there are any worker nodes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to this expansion? I have the same question as Zane still having re-read the EP. This IMO needs more expansion unless I missed a section

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Terms section now acknowledges the broader topology landscape (TNA, TNF, compact, etc.) and explicitly states this enhancement targets controlPlaneTopology transitions only. The architecture doesn't preclude future expansion to other configurations. The field is named desiredControlPlaneTopology to make this scope explicit.

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md
10. OTTO updates the Infrastructure status fields:
- `controlPlaneTopology` transitions from `SingleReplica` to `HighlyAvailable`
- `infrastructureTopology` transitions from `SingleReplica` to `HighlyAvailable`
11. Operators reconcile against the new topology values and adjust their deployment strategies, replica counts, and placement policies
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to try to e.g. restart OLM operators (which previously have treated the topology as fixed)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a view into how many/which olm operators are reading this value? Are they reading it at startup, or watching the resource? The expected pattern would be that the operator sees the change, and then reacts by updating the operand (e.g. scaling from 1 to 2 replicas now that it's been told the cluster is HA)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OLM operator behavior is listed as an open question. Operators that watch the infrastructure CR should react automatically. Those that read at startup may need a restart. The scope of affected operators needs investigation.

Comment thread enhancements/topologies/mutable-topology.md
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
@jeff-roche
Copy link
Copy Markdown
Author

jeff-roche commented May 15, 2026

Big update coming next week to realign this with CCO instead of a dedicated operator, add some more technical detail around the flow, and address masters schedulable. Thank you everyone for the quick and thorough reviews, I believe we are rapidly converging on a solid solution!

Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
jeff-roche and others added 2 commits May 18, 2026 12:15
Introduce the Mutable Topology enhancement, which replaces the
previous Adaptable Topology proposal. Instead of a new topology
enum that all operators must interpret, this approach uses a
dedicated operator (OTTO) to orchestrate transitions between
existing fixed topology modes. Initial scope: SNO to HA compact
on platform: none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the topology transition controller from a standalone operator
(OTTO) into cluster-config-operator. CCO owns the config.openshift.io
API group and infrastructure CR lifecycle, making it the natural home.

Key design decisions:
- desiredTopology initialized by installer to match controlPlaneTopology
  (no kubebuilder default — value is cluster-specific)
- Controller triggers on desiredTopology != status.controlPlaneTopology
- On failure, controller resets desiredTopology to current topology
- Upgrade blocked via Upgradeable=False during transitions
- Condition types: TopologyTransitionProgressing, Completed, Failed
- Per-operator topology audit required for Dev Preview entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread enhancements/topologies/mutable-topology.md
@dhensel-rh
Copy link
Copy Markdown
Contributor

Are there limitations for a SNO to TNF transition ? TNF requires BMC/Redfish so if the SNO bare metal hardware does not have it, does it block the transition? I could see this being a problem trying to match hardware in general (BMC firmware versions, vendor types, etc. ).

Copy link
Copy Markdown
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better than the previous iteration. I still fee like there's some disconnect between the new and old stuff, some stuff may still be hanging over from the previous iteration that doesn't quite make sense now, PTAL at my comments

Comment thread enhancements/topologies/mutable-topology.md Outdated
This enhancement enables OpenShift clusters to transition between topology modes as a Day 2 operation. This changes the existing OpenShift assumption that topologies are immutable after installation.

A new `desiredTopology` field in the infrastructure spec expresses the administrator's intent to transition. A topology transition controller in cluster-config-operator watches for changes to this field, validates preconditions, coordinates the transition, and updates the existing topology status fields when the cluster is ready.
A new `oc adm transition topology` CLI command provides an interface for cluster administrators to initiate transitions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a common addition to the CLI? I have nothing against extending the CLI, but do question if it is strictly required

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not strictly required, this is more of a usability thing. In theory a cluster admin could go in and update the desired topology and manually monitor progress but that might feel disconnected. Through the CLI we could give some structure to the process

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI is a convenience, not strictly required. An administrator could patch spec.desiredControlPlaneTopology directly. The CLI follows the oc adm upgrade pattern — validates preconditions client-side and patches the CR.

Comment thread enhancements/topologies/mutable-topology.md
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Comment thread enhancements/topologies/mutable-topology.md Outdated
Address review feedback from brandisher, JoelSpeed, zaneb,
patrickdillon, DanielFroehlich, and dhensel-rh across 6 categories:

API design:
- Rename desiredTopology to desiredControlPlaneTopology
- Make field empty by default (installer does not populate)
- Replace CEL validation with DesiredTopologyMode named type
- Drop ValidatingAdmissionPolicy for status field protection
- Document spec-to-status mapping (CP topology, infra topology,
  mastersSchedulable)
- Add worker node precondition check (compact clusters only)

Workflow:
- Clarify node-driven vs topology-driven operator reactions
- CLI returns immediately; monitoring is separate
- Failure handling uses standard K8s retry pattern (no spec reset)
- Transition conditions on CCO ClusterOperator status (not infra CR)
- Cancel semantics: only before status update (step 9)
- CEO etcd scaling is independent (unsafe scaling path), not
  orchestrated by transition controller

Accuracy:
- CEO rollback replaced with manual quorum-restore.sh throughout
- "type alias" corrected to "named type" (Go terminology)
- Upgradeable=False blocks upgrades, not all version changes
- "30+ operators" claim removed (uncited)
- mastersSchedulable clarified as unchanged for SNO to HA compact

Scope:
- IBI clusters excluded (non-goal + topology considerations)
- Platform:none rationale expanded (edge computing context)
- Baremetal risk reframed as future scope
- Backup compatibility added as open question

Content:
- Concrete failure examples (quorum loss, node readiness, operator
  reconciliation)
- SLO dimensions defined in GA graduation criteria
- Operational guidance updated (no availability guarantee, backup
  recommendation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@jeff-roche: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


2. **Topology transition controller in cluster-config-operator** — A new controller in CCO that watches the infrastructure CR for `desiredControlPlaneTopology` changes, validates preconditions, coordinates the transition, and updates the status topology fields when the cluster is ready for the new mode.

3. **`oc adm transition topology` CLI command** — A command that validates preconditions, patches `spec.desiredControlPlaneTopology` on the infrastructure CR, and returns immediately.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now this is mainly focused on control plane topology, but as this becomes more developed would we want to have this be under an umbrella term to allow easier additions and more logical groupings, wdyt?

spec:
   topology:
        desiredControlPlaneTopology: "HighlyAvailable"
        desiredInfraTopology: "Single" # Possible future things

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we need desired or would it being located under spec imply desired? I'm wondering if we could just match the status field names and let spec imply desired. In terms of adding it to a topology object, I'm fine with it but I'm also curious @JoelSpeed's thoughts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.