NE-2732: Design proposal for Gateway API CRD management mode#2023
NE-2732: Design proposal for Gateway API CRD management mode#2023rikatz wants to merge 7 commits into
Conversation
|
I will help review |
| There is a desire to backport this feature all the way back to | ||
| OpenShift OCP 4.19. The motivation is to allow customers who are |
There was a problem hiding this comment.
Or even OCP 4.18, which would enable a smooth upgrade from 4.18 to 4.19.
There was a problem hiding this comment.
can we go that back on an API backport?
| This enhancement is fully applicable to OKE. Since the | ||
| [gateway-api-without-olm](gateway-api-without-olm.md) enhancement | ||
| enables Gateway API on OKE by eliminating OSSM licensing concerns, | ||
| all three CRD management modes are available on OKE clusters. |
There was a problem hiding this comment.
However, as noted in the "Backport to OCP 4.19" section, we might backport the API to versions of OKE that do rely on an OLM subscription for OSSM.
There was a problem hiding this comment.
but this is not a problem, right?
There was a problem hiding this comment.
The fact that we install OSSM through an OLM subscription shouldn't be a problem. My point is that "the
gateway-api-without-olm enhancement enables Gateway API on OKE by eliminating OSSM licensing concerns" is not relevant when we backport this feature to OCP versions that still depend on OLM for the gateway controller.
There was a problem hiding this comment.
I see. Well, I will add some better wording here given we are again discussing about backporting noOLM
|
|
||
| ## Proposal | ||
|
|
||
| Add a new `gatewayAPI` struct field to the `IngressControllerSpec` |
There was a problem hiding this comment.
A better place for the gatewayAPI field would be ingresses.config.openshift.io or even a new ingresses.operator.openshift.io resource. We previously proposed adding a ingresses.operator.openshift.io resource as a home for OperatorLogLevel in #1314. Whether we can backport the addition of a new resource is an open question.
There was a problem hiding this comment.
I prefer going to what is less friction at this point. If adding a new resource and backporting is a huge friction, I would not do it. As this subject is already "hot" whatever we can make to get less friction I prefer :)
There was a problem hiding this comment.
If we are going for least friction - should we do just do it as an annotation or magical GatewayClass name rather than putting it into an API that isn't quite right? 😄 Playing devils advocate
I wonder if we should have a hybrid approach. A "not so good way, but backports nicely" like tell users to create a "openshift-opt-out-crds" controller-name gatewayclass or an annotation on the the ingresses.config.openshift.io object (yea that's ugly I know). We'd start with and backport this solution.
Then a "proper api" for 5.0 moving forward: ingress.operator.openshift.io. We could delay that design for later, and possibly implement some migration logic, or at least a forced migration (Upgradeable=false) in 5.0 to make sure we can deprecate the "not so good way, but backports nicely".
Definitely more work for us, but does simplify the backporting.
There was a problem hiding this comment.
IIUC API reviewers are fine with us moving with a new API definition, I will go with this path
|
/assign |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@rikatz: This pull request references NE-2732 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
everettraven
left a comment
There was a problem hiding this comment.
Overall, this enhancement looks pretty good. I've only got a few comments on the API side of things.
| ### API Extensions | ||
|
|
||
| This enhancement introduces a **new CRD** in the | ||
| `operator.openshift.io/v1` group. The new resource follows the |
There was a problem hiding this comment.
Just a note, net-new APIs must be introduced in the v1alpha1 group so that we never accidentally ship them by default until the feature is GA.
There was a problem hiding this comment.
@Miciah you told me we could do v1 directly, so leaving the discussion here for you :)
happy to update the EP with the conclusion
There was a problem hiding this comment.
There was a problem hiding this comment.
sounds good! I will change here
| // crdManagementMode specifies how the Cluster Ingress | ||
| // Operator manages Gateway API Custom Resource Definitions | ||
| // (CRDs) and the associated Gateway controller stack. | ||
| // | ||
| // When set to "Managed" (the default), CIO installs, owns, | ||
| // and upgrades the Gateway API CRDs, protects them with a | ||
| // Validating Admission Policy, and deploys the full Gateway | ||
| // controller stack (the Istio instance deployed by CIO, | ||
| // GatewayClass, Gateway resources). This is the only fully | ||
| // supported configuration. | ||
| // | ||
| // When set to "Unmanaged", CIO does not install or manage | ||
| // Gateway API CRDs and does not deploy the Gateway controller | ||
| // stack. The cluster administrator or a third-party product | ||
| // is responsible for providing their own CRDs and Gateway | ||
| // controller. CIO reports observational status only. | ||
| // | ||
| // When set to "ExternalCRDs", CIO does not manage Gateway | ||
| // API CRDs but does deploy the OpenShift Gateway controller | ||
| // stack. The cluster administrator brings their own CRDs. | ||
| // This is an unsupported configuration intended for | ||
| // development and testing. |
There was a problem hiding this comment.
My understanding of the Gateway API is very rudimentary because that isn't my wheelhouse, but the scope of management that the field name implies and what these settings actually align with seems to be a bit at odds.
I wonder if a more generic managementMode is more appropriate here? something like:
apiVersion: operator.openshift.io/v1
kind: Ingress
metadata:
name: cluster
spec:
gatewayAPI:
managementMode: Managed | Unmanaged | ControllersOnly(naming is arbitrary here, feel free to use different mode names).
There was a problem hiding this comment.
no, I actually liked the proposal! Will change here, thanks!
| // When set to "ExternalCRDs", CIO does not manage Gateway | ||
| // API CRDs but does deploy the OpenShift Gateway controller | ||
| // stack. The cluster administrator brings their own CRDs. | ||
| // This is an unsupported configuration intended for | ||
| // development and testing. |
There was a problem hiding this comment.
I'm a bit hesitant to include this mode of operation. It isn't really clear to me what the use case for an end-user setting this mode would be such that they would get some kind of value out of it.
The only thing I can think of is to test if an existing version of the default OpenShift controllers can handle a changed API, but if we are documenting what the supported pairing is by shipping, testing, and vetting a fully-managed option I'm not sure why this would be something an end-user of OpenShift would care to put their cluster in an unsupported state for?
If you folks feel strongly that this should be an option for configuration, I'd prefer we make it abundantly clear via the enum value that this is unsupported with a naming pattern like UnsupportedCRDsOnlyUnmanaged
There was a problem hiding this comment.
this one is mostly about:
- layered products (MCP Gateway, RHOAI) wanting to test new CRDs with the current Istio version - eg.: some fields are promoted on a new version of Gateway API that we didn't vetted yet, and users want to test it
- users that previously deployed their CRD and are willing to upgrade and move toward ours but the process of upgrade means going through the upgrade path
btw on the original proposal from @knobunc this is exactly how it is worded and I am fine making the "Unsupported" word explicit part of the name:
[Possibly] No CRDs, but unsupported Gateway: If you install GWAPI and configure the OpenShift GatewayClass and Gateway, we would run our gateway, but this would be an unsupported configuration
| // deploys the full Gateway controller stack (the Istio | ||
| // instance deployed by CIO, GatewayClass, Gateway). This is | ||
| // the default mode and the only fully supported configuration. | ||
| ManagedGatewayAPICRDs GatewayAPIManagementMode = "Managed" |
There was a problem hiding this comment.
General convention is to use the pattern {alias}{representation}, so:
| ManagedGatewayAPICRDs GatewayAPIManagementMode = "Managed" | |
| GatewayAPIManagementModeManaged GatewayAPIManagementMode = "Managed" |
| // GatewayClass, Gateway). The customer brings their own | ||
| // CRDs. This is an UNSUPPORTED configuration useful for | ||
| // development, testing, or advanced users. | ||
| ControllersOnlyGatewayAPICRDs GatewayAPIManagementMode = "ControllersOnly" |
There was a problem hiding this comment.
If we are going to stick with allowing this mode to be configured, and it is truly an unsupported configuration, please make it abundantly clear by prefixing this mode with Unsupported.
It makes it abundantly clear to any future readers of the configuration that they are in an unsupported configuration mode and it makes users think twice because they have to type out the word Unsupported before flipping to that mode.
| ControllersOnlyGatewayAPICRDs GatewayAPIManagementMode = "ControllersOnly" | |
| GatewayAPIManagementModeUnsupportedControllersOnly GatewayAPIManagementMode = "UnsupportedControllersOnly" |
| The following conditions are set within | ||
| `status.gatewayAPI.conditions`: | ||
|
|
||
| | Condition Type | Status | Reason | Description | | ||
| |---|---|---|---| | ||
| | `CRDsManaged` | `True` | `ManagedByCIO` | CIO is actively managing CRDs | | ||
| | `CRDsManaged` | `False` | `Unmanaged` | Administrator chose Unmanaged mode | | ||
| | `CRDsManaged` | `False` | `ControllersOnly` | Administrator chose ControllersOnly mode (unsupported) | | ||
| | `CRDsPresent` | `True` | `CRDsFound` | Gateway API CRDs are present on the cluster | | ||
| | `CRDsPresent` | `False` | `CRDsNotFound` | Gateway API CRDs are not present on the cluster | | ||
| | `CRDsCompliant` | `True` | `VersionMatch` | Installed CRDs match the expected version | | ||
| | `CRDsCompliant` | `False` | `VersionMismatch` | Installed CRDs do not match the expected version. Message includes the expected and actual versions. | | ||
| | `CRDsCompliant` | `Unknown` | `NotApplicable` | Compliance check is not applicable (e.g., Unmanaged mode with no CRDs present) | |
There was a problem hiding this comment.
Just a note, it would be helpful to include this information in some form in the GoDoc for the conditions field of the status so that end-users can have this same information even when using something like kubectl explain ... to get the field documentation of the CR
|
@rikatz: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This proposal establishes a new knob to allow users to manage their Gateway API CRDs within an OCP cluster, establishing 3 modes: Managed, Unmanaged and ExternalCRDs
This proposal intends to provide users with a knob that allows them to manage the Gateway API CRDs without fully losing support of an OCP cluster, and to rollback on their decision in case they want back OCP Gateway API, and also allow the usage of external third party controllers.