README.md (+8 −1)
@@ -8,5 +8,12 @@ It exists as a way for us to facilitate two things:

1. To turn the OLM v1 feature flags off and on, so we can ship it in the OpenShift payload without it being enabled by default
2. To handle the `clusterstatus` resource for the v1 components

This is because OCP has an API, part of the `cluster-version-operator` and not present in plain Kubernetes, that tracks the state of all OCP components; if you're in the payload, you are required to write status to it.

## Features

- **Standalone Mode**: Manages OLMv1 components (catalogd, operator-controller) in standard OpenShift clusters
- **HyperShift Mode**: Supports HyperShift hosted clusters where OLMv1 components run in the management cluster but watch hosted cluster API servers

For more information on HyperShift support, see [docs/hypershift.md](docs/hypershift.md).
docs/examples/hypershift-deployment.yaml (+179 −0)
@@ -0,0 +1,179 @@
---
# Example HyperShift deployment for cluster-olm-operator
# This shows how to configure cluster-olm-operator to manage OLMv1 components
# for a hosted cluster.
#
# In this example:
# - Management cluster runs in namespace: clusters-customer1
# - Hosted cluster name: customer1
# - Admin kubeconfig secret: admin-kubeconfig

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-olm-operator
  namespace: clusters-customer1
  labels:
    app: cluster-olm-operator
    hypershift.openshift.io/control-plane-component: cluster-olm-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-olm-operator
  template:
    metadata:
      labels:
        app: cluster-olm-operator
        hypershift.openshift.io/control-plane-component: cluster-olm-operator
    spec:
      serviceAccountName: cluster-olm-operator
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      initContainers:
      - name: copy-catalogd-manifests
        image: quay.io/openshift/origin-olm-catalogd:latest
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        args:
        - -c
        - /cp-manifests /operand-assets
        volumeMounts:
        - mountPath: /operand-assets
          name: operand-assets
        securityContext:
          readOnlyRootFilesystem: true
        terminationMessagePolicy: FallbackToLogsOnError
      - name: copy-operator-controller-manifests
        image: quay.io/openshift/origin-olm-operator-controller:latest
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        args:
        - -c
        - /cp-manifests /operand-assets
        volumeMounts:
        - mountPath: /operand-assets
          name: operand-assets
        securityContext:
          readOnlyRootFilesystem: true
        terminationMessagePolicy: FallbackToLogsOnError
Comment on lines +36 to +63
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Init containers are missing allowPrivilegeEscalation: false (Checkov CKV_K8S_20).

Both copy-catalogd-manifests and copy-operator-controller-manifests set readOnlyRootFilesystem: true but omit allowPrivilegeEscalation: false and capabilities.drop: [ALL], which the main container already has. Users copy-pasting this example will deploy with insecure init container defaults.

🔒 Proposed fix for both init containers
         securityContext:
           readOnlyRootFilesystem: true
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL

      containers:
      - name: cluster-olm-operator
        image: quay.io/openshift/origin-cluster-olm-operator:latest
        terminationMessagePolicy: FallbackToLogsOnError
        command:
        - /cluster-olm-operator
        args:
        - start
        imagePullPolicy: IfNotPresent
        env:
        # Standard environment variables
        - name: OPERATOR_NAME
          value: cluster-olm-operator
        - name: OPERATOR_IMAGE_VERSION
          value: 4.16.0
        - name: KUBE_RBAC_PROXY_IMAGE
          value: quay.io/openshift/origin-kube-rbac-proxy:latest
        - name: CATALOGD_IMAGE
          value: quay.io/openshift/origin-olm-catalogd:latest
        - name: OPERATOR_CONTROLLER_IMAGE
          value: quay.io/openshift/origin-olm-operator-controller:latest

        # HyperShift mode configuration
        # Setting these enables HyperShift mode
        - name: HOSTED_KUBECONFIG_SECRET
          value: admin-kubeconfig
        - name: HOSTED_NAMESPACE
          value: clusters-customer1
Comment on lines +86 to +91
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Example deployment is missing the env vars required for clients.go dual-API routing.

HOSTED_KUBECONFIG_SECRET and HOSTED_NAMESPACE activate builder.go's hook but clients.go checks HYPERSHIFT_MODE=true and HOSTED_KUBECONFIG (a file path). Without those, the operator's DynamicClient/RESTMapper will watch the management cluster's API server, not the hosted cluster's. The operator pod also needs a volume mounting the admin-kubeconfig secret at the path specified by HOSTED_KUBECONFIG.

This is a downstream consequence of the root-cause mismatch documented in pkg/clients/clients.go lines 81–91. Fix the env var inconsistency there first; this example will need updating accordingly.



        resources:
          requests:
            cpu: 10m
            memory: 20Mi
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - mountPath: /operand-assets
          name: operand-assets
        - mountPath: /tmp
          name: tmp
      volumes:
      - name: operand-assets
        emptyDir: {}
      - name: tmp
        emptyDir: {}

---
# ServiceAccount for cluster-olm-operator
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-olm-operator
  namespace: clusters-customer1

---
# RBAC for cluster-olm-operator in management cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-olm-operator-management
rules:
# Management cluster permissions
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["services", "serviceaccounts", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["config.openshift.io"]
  resources: ["proxies"]
  verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-olm-operator-management
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-olm-operator-management
subjects:
- kind: ServiceAccount
  name: cluster-olm-operator
  namespace: clusters-customer1

---
# Example: admin-kubeconfig secret
# This secret contains the kubeconfig for the hosted cluster's API server
# In a real HyperShift deployment, this is created automatically by HyperShift
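# For testing outside of HyperShift, an equivalent secret can be created by
# hand from a kubeconfig file (the path below is illustrative):
#   kubectl create secret generic admin-kubeconfig \
#     --from-file=kubeconfig=/path/to/hosted-cluster.kubeconfig \
#     -n clusters-customer1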
apiVersion: v1
kind: Secret
metadata:
  name: admin-kubeconfig
  namespace: clusters-customer1
type: Opaque
data:
  # Base64-encoded kubeconfig for the hosted cluster
  # This would be generated by the HyperShift control-plane-operator
  kubeconfig: |
    YXBpVmVyc2lvbjogdjEKY2x1c3RlcnM6Ci0gY2x1c3RlcjoKICAgIGNlcnRpZmljYXRl
    LWF1dGhvcml0eS1kYXRhOiA8YmFzZTY0LWNhLWNlcnQ+CiAgICBzZXJ2ZXI6IGh0dHBz
    Oi8vYXBpLmN1c3RvbWVyMS5leGFtcGxlLmNvbTo2NDQzCiAgbmFtZTogY3VzdG9tZXIx
    CmNvbnRleHRzOgotIGNvbnRleHQ6CiAgICBjbHVzdGVyOiBjdXN0b21lcjEKICAgIHVz
    ZXI6IGFkbWluCiAgbmFtZTogYWRtaW5AY3VzdG9tZXIxCmN1cnJlbnQtY29udGV4dDog
    YWRtaW5AY3VzdG9tZXIxCmtpbmQ6IENvbmZpZwpwcmVmZXJlbmNlczoge30KdXNlcnM6
    Ci0gbmFtZTogYWRtaW4KICB1c2VyOgogICAgY2xpZW50LWNlcnRpZmljYXRlLWRhdGE6
    IDxiYXNlNjQtY2xpZW50LWNlcnQ+CiAgICBjbGllbnQta2V5LWRhdGE6IDxiYXNlNjQt
    Y2xpZW50LWtleT4K
docs/hypershift.md (+182 −0)
@@ -0,0 +1,182 @@
# HyperShift Support

cluster-olm-operator supports running in HyperShift mode, where it manages OLMv1 components (catalogd and operator-controller) for hosted clusters.

## Overview

In HyperShift deployments, cluster-olm-operator can run in the management cluster and manage OLMv1 components that watch hosted cluster API servers. This enables:

- catalogd to serve catalogs from the management cluster while watching ClusterCatalog resources in the hosted cluster's API server
- operator-controller to install operators into hosted cluster worker nodes while watching ClusterExtension resources in the hosted cluster's API server

This corresponds to **Approach 1: Control Plane Placement** as described in the [HyperShift OLMv1 design document](https://github.com/openshift/enhancements/blob/master/enhancements/olm/hypershift-olmv1.md).

## Architecture

### Standalone Mode (Default)

In standalone OpenShift clusters:
- cluster-olm-operator runs in `openshift-cluster-olm-operator` namespace
- catalogd and operator-controller watch the local cluster's API server using in-cluster config
- Components run in `olmv1-system` namespace

### HyperShift Mode

In HyperShift deployments:
- cluster-olm-operator runs in the management cluster (in the hosted control plane namespace, e.g., `clusters-customer1`)
- catalogd and operator-controller watch the **hosted cluster's** API server using a mounted kubeconfig
- Components are configured with `--kubeconfig` and `--system-namespace` flags
- The `admin-kubeconfig` secret provides connectivity to the hosted cluster's API server

## Configuration

HyperShift mode is enabled by setting environment variables on the cluster-olm-operator deployment:

### Required Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `HOSTED_KUBECONFIG_SECRET` | Name of the secret containing the hosted cluster's kubeconfig | `admin-kubeconfig` |
| `HOSTED_NAMESPACE` | The hosted control plane namespace in the management cluster | `clusters-customer1` |

### Example Deployment Configuration

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-olm-operator
  namespace: clusters-customer1  # Hosted control plane namespace
spec:
  template:
    spec:
      containers:
      - name: cluster-olm-operator
        image: quay.io/openshift/origin-cluster-olm-operator:latest
        env:
        - name: HOSTED_KUBECONFIG_SECRET
          value: admin-kubeconfig
        - name: HOSTED_NAMESPACE
          value: clusters-customer1
        # ... other environment variables ...
```

## How It Works

When HyperShift mode is detected (via the `HOSTED_KUBECONFIG_SECRET` environment variable):

1. **Kubeconfig Injection Hook**: The `InjectHostedClusterKubeconfigHook` deployment hook is automatically applied to catalogd and operator-controller deployments

2. **Volume Mounting**: The hook adds a volume referencing the kubeconfig secret:
```yaml
volumes:
- name: hosted-kubeconfig
  secret:
    secretName: admin-kubeconfig  # Value from HOSTED_KUBECONFIG_SECRET
```

3. **Volume Mounts**: The kubeconfig is mounted into all containers:
```yaml
volumeMounts:
- name: hosted-kubeconfig
  mountPath: /var/run/secrets/kubeconfig
  readOnly: true
```

4. **Command-line Flags**: Additional arguments are added to containers:
```yaml
args:
- --kubeconfig=/var/run/secrets/kubeconfig/kubeconfig
- --system-namespace=clusters-customer1  # Value from HOSTED_NAMESPACE
```
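Taken together, the hook's effect on a rendered deployment can be spot-checked directly; a minimal check, using the namespace and deployment names from this example:

```bash
# Show the args the hook injected into the catalogd containers
kubectl get deployment catalogd -n clusters-customer1 \
  -o jsonpath='{.spec.template.spec.containers[*].args}'
```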

## Components Affected

The HyperShift configuration is automatically applied to:

- **catalogd**: Watches ClusterCatalog resources in the hosted cluster's API server
- **operator-controller**: Watches ClusterExtension resources in the hosted cluster's API server and installs operators into hosted cluster worker nodes

Both components continue to serve their control plane functions from the management cluster while interacting with hosted cluster API resources.

## Upstream Requirements

For HyperShift mode to work, the upstream components must support:

- **catalogd**: `--kubeconfig` flag support ([catalogd PR #xyz](https://github.com/operator-framework/catalogd/pull/xyz))
- **operator-controller**: `--kubeconfig` flag support ([operator-controller PR #xyz](https://github.com/operator-framework/operator-controller/pull/xyz))
Comment on lines +106 to +107

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace placeholder PR links before merging.

#xyz are non-functional placeholders; readers won't be able to verify upstream support status.


- Both components: `--system-namespace` flag to specify the namespace context

## Detection and Logging

When cluster-olm-operator starts in HyperShift mode:

```
I0312 10:15:23.123456 1 builder.go:150] HyperShift mode detected, injecting kubeconfig configuration deployment="catalogd" kubeconfigSecret="admin-kubeconfig" hostedNamespace="clusters-customer1"
I0312 10:15:23.234567 1 builder.go:150] HyperShift mode detected, injecting kubeconfig configuration deployment="operator-controller" kubeconfigSecret="admin-kubeconfig" hostedNamespace="clusters-customer1"
```
Comment on lines +114 to +117
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language specifiers to fenced code blocks (markdownlint MD040).

Both log-output blocks at Lines 114 and 121 are missing a language identifier. Use text or log to suppress the warning and improve rendering.

📝 Proposed fix
-```
+```text
 I0312 10:15:23.123456 ...
-```
+```text
 I0312 10:15:23.345678 ...

Also applies to: 121-124




Individual deployment hooks also log their actions:

```
I0312 10:15:23.345678 1 builder.go:354] Injecting hosted cluster kubeconfig configuration deployment="catalogd" kubeconfigSecret="admin-kubeconfig" hostedNamespace="clusters-customer1"
I0312 10:15:23.456789 1 builder.go:380] Configured container container="catalogd" kubeconfigPath="/var/run/secrets/kubeconfig/kubeconfig" systemNamespace="clusters-customer1"
```

## Verification

To verify cluster-olm-operator is running in HyperShift mode:

1. Check environment variables:
```bash
kubectl get deployment cluster-olm-operator -n clusters-customer1 -o yaml | grep -A2 HOSTED_
```

2. Check catalogd/operator-controller deployments for kubeconfig configuration:
```bash
kubectl get deployment catalogd -n clusters-customer1 -o yaml | grep -A5 "hosted-kubeconfig"
kubectl get deployment operator-controller -n clusters-customer1 -o yaml | grep "kubeconfig"
```

3. Verify components are watching the hosted cluster API:
```bash
# Check catalogd logs
kubectl logs -n clusters-customer1 deployment/catalogd | grep "kubeconfig"

# Check operator-controller logs
kubectl logs -n clusters-customer1 deployment/operator-controller | grep "kubeconfig"
```

## Troubleshooting

### Components not connecting to hosted cluster

**Symptoms**: catalogd or operator-controller cannot list resources, API connection errors

**Checks**:
1. Verify the `admin-kubeconfig` secret exists and is properly mounted
2. Check the secret contains a valid kubeconfig
3. Verify network connectivity from management cluster to hosted cluster API server
4. Check RBAC permissions in the kubeconfig
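
Checks 1–3 can be run by hand; a minimal sketch, assuming the secret stores its payload under the `kubeconfig` key as in the example manifest:

```bash
# Extract the hosted cluster kubeconfig from the secret
kubectl get secret admin-kubeconfig -n clusters-customer1 \
  -o jsonpath='{.data.kubeconfig}' | base64 -d > /tmp/hosted.kubeconfig

# Confirm the kubeconfig parses and the hosted API server is reachable
kubectl --kubeconfig=/tmp/hosted.kubeconfig cluster-info

# Spot-check the RBAC granted by the kubeconfig's credentials
kubectl --kubeconfig=/tmp/hosted.kubeconfig auth can-i list clustercatalogs
```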

### Missing environment variables

**Symptoms**: Components use in-cluster config instead of hosted cluster kubeconfig

**Solution**: Ensure both `HOSTED_KUBECONFIG_SECRET` and `HOSTED_NAMESPACE` environment variables are set on the cluster-olm-operator deployment
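
If they are missing, they can be added in place; `kubectl set env` triggers a new rollout (values here are from the example deployment above):

```bash
kubectl set env deployment/cluster-olm-operator -n clusters-customer1 \
  HOSTED_KUBECONFIG_SECRET=admin-kubeconfig \
  HOSTED_NAMESPACE=clusters-customer1
```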

### Hook not applied

**Symptoms**: Deployments don't have kubeconfig volumes or --kubeconfig flags

**Checks**:
1. Verify environment variables are set before cluster-olm-operator starts
2. Check cluster-olm-operator logs for "HyperShift mode detected" messages
3. Verify the deployment controller is processing deployments correctly
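
The second check can be scripted against the log line shown in the Detection and Logging section:

```bash
kubectl logs -n clusters-customer1 deployment/cluster-olm-operator \
  | grep "HyperShift mode detected"
```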

## References

- [HyperShift OLMv1 Design Proposal](https://github.com/openshift/enhancements/blob/master/enhancements/olm/hypershift-olmv1.md)
- [catalogd Documentation](https://github.com/operator-framework/catalogd)
- [operator-controller Documentation](https://github.com/operator-framework/operator-controller)
- [HyperShift Documentation](https://hypershift-docs.netlify.app/)