173 changes: 173 additions & 0 deletions cilium/CFP-42453-oci-cloud-provider-design.md
# CFP-42453: OCI Cloud Provider Design

**SIG: SIG-COMMUNITY**

**Begin Design Discussion:** 2025-10-15

**Cilium Release:** X.XX

**Authors:** Trung Nguyen <trung.tn.nguyen@oracle.com>

**Status:** Draft

## Summary

This document describes options for integrating Cilium with Oracle Cloud Infrastructure (OCI) Oracle Kubernetes Engine (OKE) to implement a Cilium Direct Routing solution.

## Motivation

Many customers are requesting support for Cilium on OCI. This feature proposal will provide guidance on various implementation options.

## Goals

* Discuss possible solutions for integrating OCI with Cilium IPAM solutions for Direct Routing
* Determine short-term solutions for providing an integration with Cilium
* Determine long-term solutions for providing an integration with Cilium where users can specify which VNIC a pod can route out of (see examples below for details)

## Non-Goals

* Timelines of integrations

## Proposal

### Background

OCI Oracle Kubernetes Engine (OKE) provides controls for the user to determine how many VNICs should be attached to a node and which VNIC a pod should route out of.

*Note: The existing OCI OKE solution uses a pre-attach model: the Cloud Controller Manager preallocates all VNICs/IPs and never detaches them (detachment happens only upon node termination), and the CNI only sets up OS-level network routing to the pods.*

#### Example 1

A node has 1 VNIC with 256 IPs attached to it. Host and Pod traffic will route out of this VNIC.

```
+--------------------------------------+
| Kubernetes Node |
| |
| +--------+ |
| | VNIC 1 | |
| +--------+ |
| | |
| +--------+ |
| | Pod A | |
| +--------+ |
| |
| Pod A routes traffic via VNIC1 |
+--------------------------------------+
```

#### Example 2

A node has 3 VNICs. Each VNIC has 256 IPs attached. Host traffic routes out of the Primary VNIC (via the default route), and a user can use Multus to specify that Pod B routes out of VNIC 2 or VNIC 3, depending on host routing rules.

```
+--------------------------------------------------+
| Kubernetes Node |
| |
| +--------+ +--------+ +--------+ |
| | VNIC 1 | | VNIC 2 | | VNIC 3 | |
| +--------+ +--------+ +--------+ |
| | | | |
| | +------+ | |
| | |Pod B |---------- |
| (Node Traffic) +------+ |
| |
| - Pod B can route traffic via VNIC 2 or VNIC 3 |
| - Node-level traffic exits via VNIC 1 |
+--------------------------------------------------+
```
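
To make the "host routing rules" in Example 2 concrete, the following is a minimal, hypothetical sketch (not taken from the existing OKE CNI) of the kind of per-VNIC policy routing a CNI could set up on Linux, assuming the `github.com/vishvananda/netlink` library. The device name, routing table number, and gateway address are placeholders that a real implementation would discover from OCI instance metadata.

```go
// Sketch: route a pod's egress traffic via VNIC 2 using a dedicated routing
// table plus a source-based policy rule, leaving node traffic on the main table.
package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

func routePodViaVNIC2(podIP net.IP) error {
	const vnic2Table = 2 // hypothetical routing table dedicated to VNIC 2

	vnic2, err := netlink.LinkByName("ens4") // hypothetical device name for VNIC 2
	if err != nil {
		return err
	}

	// Default route in table 2 that egresses via VNIC 2's subnet gateway.
	defaultRoute := &netlink.Route{
		LinkIndex: vnic2.Attrs().Index,
		Gw:        net.ParseIP("10.0.2.1"), // placeholder gateway for VNIC 2's subnet
		Table:     vnic2Table,
	}
	if err := netlink.RouteReplace(defaultRoute); err != nil {
		return err
	}

	// Policy rule: traffic sourced from the pod IP consults table 2,
	// so Pod B egresses via VNIC 2 while node traffic keeps the main table.
	rule := netlink.NewRule()
	rule.Src = &net.IPNet{IP: podIP, Mask: net.CIDRMask(32, 32)}
	rule.Table = vnic2Table
	return netlink.RuleAdd(rule)
}
```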

### Overview

Cilium documents several existing [IPAM solutions](https://docs.cilium.io/en/stable/network/kubernetes/ipam/):

* Out-of-tree solution that attaches an IP CIDR block and sets `v1.Node.spec.podCIDR`. Cilium's `kubernetes` IPAM configuration knows how to process `v1.Node.spec.podCIDR`
* In-tree solution that extends the Cilium controller to make OCI calls to attach VNICs/IPs and populate IPAM
* Out-of-tree solution that populates the Cilium IPAM Custom Resource
* Out-of-tree Delegated IPAM binary that is compatible with Cilium

#### Kubernetes IPAM solution

There will be a component outside the scope of Cilium (e.g. a leader elected component like Cloud Controller Manager or a per-node daemonset) that attaches a CIDR block to the Primary VNIC of the node and populates the `v1.Node.spec.podCIDR` field.
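
As a minimal sketch of that component's Kubernetes-facing side (assuming `k8s.io/client-go`; the `setPodCIDR` helper is hypothetical, and the OCI call that actually attaches the CIDR block to the primary VNIC is omitted):

```go
// Hypothetical out-of-tree component: after attaching a CIDR block to the
// node's primary VNIC, publish that CIDR on the Node object so Cilium's
// "kubernetes" IPAM mode can consume it.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func setPodCIDR(ctx context.Context, client kubernetes.Interface, nodeName, cidr string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// spec.podCIDR is immutable once set, so only populate it on new nodes.
	if node.Spec.PodCIDR != "" {
		return nil
	}
	patch := []byte(fmt.Sprintf(`{"spec":{"podCIDR":%q,"podCIDRs":[%q]}}`, cidr, cidr))
	_, err = client.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```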

Cilium has a built-in `kubernetes` IPAM mode, which provides a simple, cloud-agnostic solution (implementation-wise, the cloud-provider component only populates the v1.Node object, so it makes no assumptions about which CNI is being used).

However, only one CIDR block can be attached (`v1.Node.spec.podCIDRs` does not allow multiple blocks, except for one IPv4 block and one IPv6 block). This solution will not be compatible with the requirements for using multiple VNICs (OCI has a basic, non-Cilium offering that supports separating node traffic from pod traffic onto separate VNICs).

Pros:
- Very Simple
- All Cloud Provider changes are CNI Agnostic

Cons:
- Only one CIDR block can be attached
- Will not fit the multi-VNIC model

#### In-tree solution extending the Cilium Operator

The CiliumNode CRD will be updated with Oracle-related fields, and the Cilium Operator will be extended to use the OCI Go SDK. The operator will perform the VNIC/IP attachments, and the CNI IPAM will be updated to route pods out of specific VNICs.
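
As a rough illustration only (none of these fields exist today and the real schema would be decided during implementation), the CiliumNode spec could grow an `oci` section analogous to the existing `eni` and `azure` fields:

```go
// Hypothetical additions to the CiliumNode CRD spec; field names are illustrative.
package v2

// OCISpec would carry the OCI-specific IPAM configuration for a node.
type OCISpec struct {
	// CompartmentID is the OCID of the compartment the node lives in.
	CompartmentID string `json:"compartment-id,omitempty"`

	// VCNID is the OCID of the Virtual Cloud Network used for pod VNICs.
	VCNID string `json:"vcn-id,omitempty"`

	// SubnetIDs lists the subnets from which secondary VNICs/IPs may be allocated.
	SubnetIDs []string `json:"subnet-ids,omitempty"`

	// MaxVNICs caps how many VNICs the operator may attach to this node.
	MaxVNICs int `json:"max-vnics,omitempty"`
}
```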

Pros:
- can support a multi-VNIC model
- requires change in a single component (Cilium Operator)

Cons:
- requires changes/coordination between Cloud Provider and Cilium

#### Out-of-tree solution to populate Cilium IPAM Custom Resource

*Note: This is similar to the previous solution. However, since OKE already has a process to attach VNICs/IPs, we can use this existing functionality.*

The Cilium Agent will generate the Custom Resource object (via `--auto-create-cilium-node-resource`), and a component outside the scope of Cilium (e.g. the Cloud Controller Manager) will attach IP Addresses and populate the Custom Resource objects.
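
A minimal sketch of the Cloud Controller Manager side, assuming Cilium's generated clientset and IPAM types (import paths and field names are approximate and should be verified against the Cilium tree):

```go
// Sketch: after attaching secondary IPs to a VNIC, publish them in the
// CiliumNode's spec.ipam.pool so the Cilium agent can hand them out. The
// Resource field records which VNIC (OCID) each IP came from, which a
// modified CNI could later use for per-VNIC routing.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	ciliumv2 "github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2"
	ciliumclient "github.com/cilium/cilium/pkg/k8s/client/clientset/versioned"
	ipamTypes "github.com/cilium/cilium/pkg/ipam/types"
)

func publishIPs(ctx context.Context, client ciliumclient.Interface, cn *ciliumv2.CiliumNode, vnicID string, ips []string) error {
	if cn.Spec.IPAM.Pool == nil {
		cn.Spec.IPAM.Pool = ipamTypes.AllocationMap{}
	}
	for _, ip := range ips {
		cn.Spec.IPAM.Pool[ip] = ipamTypes.AllocationIP{Resource: vnicID}
	}
	_, err := client.CiliumV2().CiliumNodes().Update(ctx, cn, metav1.UpdateOptions{})
	return err
}
```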

*Note: This solution will require a CNI change to add the option to choose which VNIC to route out of.*

Pros:
- with CNI changes, this solution can support a multi-VNIC model

Cons:
- requires changes in two separate components (Cloud Controller Manager and CNI)

#### Out-of-tree Delegated IPAM binary

OCI OKE already has a built-in process (in the Cloud Controller Manager) to attach the appropriate VNICs/IPs. OCI OKE's existing CNI IPAM plugin has the capability to choose a VNIC to route out of and would be modified to become compatible with Cilium as a delegated IPAM plugin.
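
For illustration, here is a skeleton of a standalone CNI IPAM plugin of the kind that could be delegated to from Cilium's `ipam.mode=delegated-plugin`, assuming the `containernetworking/cni` libraries; the plugin name, addresses, and VNIC selection logic are placeholders, not the existing OKE plugin.

```go
// Skeleton of a delegated CNI IPAM plugin. The real plugin would consult the
// VNICs/IPs pre-attached by the Cloud Controller Manager (and, later, a
// per-pod VNIC selection) before returning an address.
package main

import (
	"net"

	"github.com/containernetworking/cni/pkg/skel"
	cniTypes "github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/100"
	"github.com/containernetworking/cni/pkg/version"
)

func cmdAdd(args *skel.CmdArgs) error {
	result := &current.Result{
		CNIVersion: current.ImplementedSpecVersion,
		IPs: []*current.IPConfig{{
			// Placeholder address/gateway; a real plugin picks these from the chosen VNIC.
			Address: net.IPNet{IP: net.ParseIP("10.0.2.15"), Mask: net.CIDRMask(24, 32)},
			Gateway: net.ParseIP("10.0.2.1"),
		}},
	}
	return cniTypes.PrintResult(result, result.CNIVersion)
}

func cmdDel(args *skel.CmdArgs) error { return nil }

func cmdCheck(args *skel.CmdArgs) error { return nil }

func main() {
	skel.PluginMainFuncs(skel.CNIFuncs{Add: cmdAdd, Del: cmdDel, Check: cmdCheck},
		version.All, "oci-ipam (sketch)")
}
```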

Pros:
- can support a multi-VNIC model
- requires change in a single component (CNI binary)

Cons:
- requires changes/coordination between Cloud Provider and Cilium to test IPAM

## Impacts / Key Questions

_List crucial impacts and key questions. They likely require discussion and are required to understand the trade-offs of the CFP. During the lifecycle of a CFP, discussion on design aspects can be moved into this section. After reading through this section, it should be possible to understand any potentially negative or controversial impact of this CFP. It should also be possible to derive the key design questions: X vs Y._

### Impact: Integration with OCI

### Impact: Increased Maintenance

Depending on the chosen solution, there may be additional maintenance and coordination required.

For in-tree solutions, the controller will need to be updated.

For out-of-tree solutions, there may be additional integration testing.

### Key Question: Cilium Testing

How does Cilium perform CNI testing? What solution will require the least amount of maintenance/coordination?

### Key Question: Cloud Provider Support Model

What does the support model look like for Cloud Providers for integrated pieces, like the Cilium Operator?

I will let Cilium maintainers correct me, but from what I've seen running Cilium on multiple Cloud Providers over the years is the following:

- For the Cilium AWS Operator, AWS itself doesn't seem to be involved (I believe they prefer to invest into https://github.com/aws/amazon-vpc-cni-k8s). So the burden of maintaining the AWS Operator falls on Isovalent + the users. We (Datadog) have been very active in maintaining the Operator because we heavily rely on it. This seems to be the case for other big users as well, such as Palantir.
- For Azure, I believe Azure used to maintain the Operator but then decided to move to the Delegated IPAM model so that they can maintain their code out of tree: https://github.com/Azure/azure-container-networking. We still use the Azure Operator and are not planning to switch to the Delegated IPAM model, so we actively maintain it in Cilium. I don't really know if there are any other big users.
- For GCP, they have implemented the first option you listed in this doc ("Kubernetes IPAM solution"), so they maintain their code in CCM: https://github.com/kubernetes/cloud-provider-gcp/tree/master/cmd/cloud-controller-manager

There's also the Alibaba Cloud integration but I don't really have context around it

Member

I don't recall any Azure engineers being involved in the Azure IPAM implementation in Cilium. I believe the story was similar for both AWS and Azure with a combination of Isovalent and community user participants in developing the implementation.


On the topic of the highlighted text from this thread, what do you mean by "support model"?

I don't think Azure contributed to the Operator, our integrations started with delegated IPAM (I have worked on this from the Azure side since it was just an idea we had).

@antonipp have you had problems with delegated IPAM? Can you elaborate on your experience with delegated IPAM wrt this:

> However, as a user, it really makes life much harder: the delegated IPAM model adds way more complexity.

Azure delegated IPAM is only available in AKS where we manage everything. I expect OCI OKE would offer a similar managed experience. It's not intended for a self-managed cluster, but only because it requires special access to the networking controlplane only available in AKS so that we can offer advanced fabric integration (eg for Azure Overlay networking).

However, I built lots of the delegated IPAM implementation for AKS and I don't think it's more complex than operating Cilium already is. We run one additional daemonset that drops the azure-ipam binary and the CNI conflist that plugs it in. It would be trivial for you to install, operate, and debug, if you can do these for Cilium already.

For me, there's a clear separation of responsibilities - Cilium does the node-local CNI things it's good at, and the cloud provider owns the tight native integration. Doing this through standard interfaces is the only way that makes sense - Cilium talking to the cloud provider(s) directly is never going to be maintained in a way that makes everyone happy.

For context, we have not tried running delegated IPAM in our infra because as you mentioned it looks like it wasn't designed for self-managed clusters but only for AKS. I thought about trying it out because at one point there were talks about deprecating the Azure Operator IPAM model which we were using. The only reason why delegated IPAM looked more complex to me is that it indeed requires more moving parts, i.e. the deamonset + the binary and it makes things way different from what we already have in AWS and Azure, where it's exactly the same model with the Cilium Operator managing everything.

Author

> Cilium does the node-local CNI things it's good at, and the cloud provider owns the tight native integration.

This was my original thought as well, and it is roughly how the initial (non-multi-VNIC) solution will work: OCI OKE will attach the IPs, give them to Cilium, and let Cilium do all of its "node-local CNI things". The delegated IPAM approach would have a similar frame of mind, where the IPAM plugin would have the ability to select which IP to use (e.g. which VNIC to pick an IP from) and pass that on to Cilium.

Although I think DRA (mentioned in the thread) may provide a way for users to specify a multi-NIC approach once it supports consumable capacity (using DRA, a pod could request two different VNIC devices).


### Key Question: Multi-VNIC Solution

Is there any negative impact from implementing a Multi-VNIC solution?

## Future Milestones

### Multi-VNIC Solution

Long term, OKE will provide a multi-VNIC solution, where users can route pods out of specific VNICs.

TODO: Provide more details on this