CORENET-6665: Virtual Private Clouds #1967

Draft
tssurya wants to merge 3 commits into openshift:master from tssurya:vpcs

@tssurya tssurya commented Apr 4, 2026

This PR adds the design/architecture for modelling VPCs in OpenShift

tssurya added 2 commits March 8, 2026 17:43
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

openshift-ci bot commented Apr 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from abhat and danwinship April 4, 2026 12:30
@tssurya tssurya changed the title Virtual Private Clouds CORENET-6665: Virtual Private Clouds Apr 4, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 4, 2026

openshift-ci-robot commented Apr 4, 2026

@tssurya: This pull request references CORENET-6665 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the design/architecture for modelling VPCs in OpenShift

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tssurya tssurya marked this pull request as draft April 4, 2026 12:34
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2026
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>

The VPC controller's lifecycle management will be done through Cluster Network Operator
on OpenShift. [TBD] The vpc controller pods will just run on the control plane
of the cluster (management clustr in case of multiple clusters) and create the relevant

management cluster

Does not create VPCs directly — uses the namespaces provisioned by the
network administrator.
- **VPC controller**: the automated reconciler that translates VPC intent
into OVN-Kubernetes resources.

Should we implement a 'pause' or 'skip-reconcile' mechanism (just like PausedUntil)? This would be invaluable for troubleshooting or handling manual overrides in specific edge cases.


Does this proposal implement a behavior that's new/unique/novel? Is it poorly
aligned with existing user expectations? Will it be a significant maintenance
burden? Is it likely to be superseded by something else in the near future?

A VPC serves more as a logical business boundary than as a flat platform capability. This abstraction suggests that we should decouple network services like NLB/SLB and FloatingIPs. Should we consider these as part of our future roadmap to complete the networking ecosystem?

## Motivation

Today, achieving VPC-like isolation on OpenShift requires manually creating and
wiring together multiple low-level networking primitives (C(UDN)s, CNCs,

I think everywhere you say "C(UDN)s" you mean "(C)UDNs"?

private, isolated) it had in VMware, so that the migration is transparent
to applications.

![VMware Migration: NSX to OpenShift VPC mapping](images/vmware-migration.png)

(This is nitpicky, but if you end up having to go back to change any of the images, right now several of them have boxes inside boxes, where the inner and outer boxes both have the same fill pattern, and the inner boxes are transparent, and not all aligned the same way to the outer box, so you get different interference patterns in different places. Eg, in the left green box in this image, VM2 appears to be shaded darker than VM1, even though it's not supposed to be.)

remain in `v1beta1` — the primary user-facing interfaces for the
initial delivery are the **CLI plugin** and **OpenShift Console plugin**.
Users should not need to hand-craft VPC YAML; the CLI and console
are the intended entry points.

"users should not need to hand-craft YAML" does not explain why "the API will remain v1beta1"... unless you are suggesting that we will intentionally try to keep users from using the API directly so that we have more flexibility to make incompatible changes to it later, in which case you should say that.

(But based on "Workflow 3: Direct API (YAML / GitOps)" below it seems like you don't mean that?)


N/A

## Introduction

This needs to come MUCH sooner. Definitely before the User Stories. Possibly between Summary and Motivation.


### Internet Gateway / NAT Gateway

An **internet gateway** enables communication between VPC resources and the

Suggested change:
-An **internet gateway** enables communication between VPC resources and the
+An **internet gateway** enables bidirectional communication between VPC resources and the

(or "inbound and outbound" if you want more parallelism with the NAT Gateway definition)


// AvailabilityZone selects the failure domain for a VPC subnet.
// It follows the nested selector pattern used by AdminNetworkPolicy
// (e.g. NamespacedPod groups namespaceSelector + podSelector).

The person reading the API docs does not need to know that the API pattern came from ANP

// the selector receive the namespace + UDN for this subnet.
//
// +kubebuilder:validation:Required
// +required

if AvailabilityZone is multi-cluster-only then the docs above in VPCSubnet should say that

// plugin) so that all pods in the subnet schedule only on matching
// nodes.
//
// The typical use is AZ pinning — e.g.

This is inside a struct called AvailabilityZone... AZ pinning is the only use, right?
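
For context on what "schedule only on matching nodes" amounts to, here is a small sketch of AZ pinning expressed as a label match. It assumes the selector ultimately compares against the well-known `topology.kubernetes.io/zone` node label; the helper `matchesLabels`, the node names, and the zone names are made up for illustration and are not taken from the proposal.

```go
package main

import "fmt"

// zoneLabel is the well-known Kubernetes topology label for a node's zone.
const zoneLabel = "topology.kubernetes.io/zone"

// matchesLabels reports whether every key/value pair in want is present in
// have, which mirrors the matchLabels semantics of a label selector.
func matchesLabels(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical selector pinning a subnet to a single zone.
	azSelector := map[string]string{zoneLabel: "us-east-1a"}

	// Hypothetical nodes and their labels.
	nodes := map[string]map[string]string{
		"node-a": {zoneLabel: "us-east-1a"},
		"node-b": {zoneLabel: "us-east-1b"},
	}

	for _, name := range []string{"node-a", "node-b"} {
		fmt.Printf("%s eligible: %v\n", name, matchesLabels(nodes[name], azSelector))
	}
}
```

If zone pinning really is the only intended use, the API could arguably take a zone name directly instead of a general selector, which would keep the field honest about its purpose.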

- **production-app-a**, **production-app-b** (Private): node IP SNAT for outbound traffic
- **production-db-a**, **production-db-b** (Isolated): no external routing, no intra-VPC routing

#### Subnet Immutability

need validation in the CRD
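
One way this could look, as a sketch only: a CEL transition rule attached through the kubebuilder `XValidation` marker. The type `VPCSubnetSpec` and its `CIDR` field are hypothetical stand-ins here, since the actual immutable fields are defined in the proposal and not shown in this thread. The plain Go function mirrors the rule so the intended behavior can be exercised without an API server.

```go
package main

import "fmt"

// VPCSubnetSpec is a hypothetical, trimmed-down stand-in for the proposal's
// subnet type, used only to show where the validation marker would go.
type VPCSubnetSpec struct {
	// CIDR is assumed here to be one of the immutable fields.
	//
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="cidr is immutable"
	CIDR string `json:"cidr"`
}

// validateSubnetUpdate mirrors the CEL transition rule above in plain Go:
// an update is rejected if it changes the CIDR of an existing subnet.
func validateSubnetUpdate(oldSpec, newSpec VPCSubnetSpec) error {
	if oldSpec.CIDR != newSpec.CIDR {
		return fmt.Errorf("cidr is immutable: %q -> %q", oldSpec.CIDR, newSpec.CIDR)
	}
	return nil
}

func main() {
	old := VPCSubnetSpec{CIDR: "10.0.1.0/24"}

	fmt.Println(validateSubnetUpdate(old, VPCSubnetSpec{CIDR: "10.0.1.0/24"}))
	fmt.Println(validateSubnetUpdate(old, VPCSubnetSpec{CIDR: "10.0.2.0/24"}))
}
```

Enforcing this in the CRD schema means the API server rejects the update up front, rather than the controller having to detect and report the change after it has been persisted.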


#### DNS

In the OpenStack world, each tenant network has its own DNS. The VPC model

Do you mean VMware? You don't mention OpenStack anywhere else...


tssurya commented Apr 13, 2026

thanks @danwinship / @lance5890 for the reviews! addressing them..
