Merged
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -14,6 +14,7 @@ on:

env:
GO_VERSION: '1.24.9'
CERT_MANAGER_VERSION: 'v1.16.2'

jobs:
detect-noop:
@@ -143,6 +144,7 @@ jobs:
PROPERTY_PROVIDER: 'azure'
RESOURCE_SNAPSHOT_CREATION_MINIMUM_INTERVAL: ${{ matrix.resource-snapshot-creation-minimum-interval }}
RESOURCE_CHANGES_COLLECTION_DURATION: ${{ matrix.resource-changes-collection-duration }}
CERT_MANAGER_VERSION: ${{ env.CERT_MANAGER_VERSION }}

- name: Collect logs
if: always()
4 changes: 2 additions & 2 deletions README.md
@@ -10,9 +10,9 @@ Azure Fleet repo contains the code that the [Azure Kubernetes Fleet Manager](htt
It follows the CNCF sandbox project [KubeFleet](https://github.com/kubefleet-dev/) and most of the development is done in the [KubeFleet](https://github.com/kubefleet-dev/).

## Get Involved
For any questions, please see the [KubeFleet discussion board](https://github.com/kubefleet-dev/kubefleet/discussions).
For any questions, please see the [KubeFleet discussion board](https://github.com/Azure/fleet/discussions).

For any issues, please open an issue in the [KubeFleet](https://github.com/kubefleet-dev/kubefleet/issues)
For any issues, please open an issue in the [KubeFleet](https://github.com/Azure/fleet/issues)


## Quickstart
2 changes: 1 addition & 1 deletion SUPPORT.md
@@ -8,7 +8,7 @@ feature request as a new Issue.

For help and questions about using this project, please

* start the conversation in the [GitHub Discussions](https://github.com/kubefleet-dev/kubefleet/discussions/).
* start the conversation in the [GitHub Discussions](https://github.com/Azure/fleet/discussions/).

We are actively exploring other means for developers, system admins, and anyone who has an interest
in the multi-cluster domain to engage with us. Please stay tuned.
81 changes: 80 additions & 1 deletion charts/hub-agent/README.md
@@ -2,11 +2,33 @@

## Install Chart

### Default Installation (Self-Signed Certificates)

```console
# Helm install with fleet-system namespace already created
helm install hub-agent ./charts/hub-agent/
```

### Installation with cert-manager

When using cert-manager for certificate management, install cert-manager first:

```console
# Install cert-manager (omit --version to get latest, or specify a version like --version v1.16.2)
# Note: See CERT_MANAGER_VERSION in .github/workflows/ci.yml for the version tested in CI
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
```

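With `useCertManager=true`, the chart creates a cert-manager `Certificate` and a self-signed `Issuer` in the hub-agent namespace, and cert-manager issues and renews the webhook serving certificate.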

## Upgrade Chart

```console
@@ -32,6 +54,12 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `affinity` | Node affinity for hub-agent pods | `{}` |
| `tolerations` | Tolerations for hub-agent pods | `[]` |
| `logVerbosity` | Log level (klog V logs) | `5` |
| `enableWebhook` | Enable webhook server | `true` |
| `webhookServiceName` | Webhook service name | `fleetwebhook` |
| `enableGuardRail` | Enable guard rail webhook configurations | `true` |
| `webhookClientConnectionType` | Connection type for webhook client (service or url) | `service` |
| `useCertManager` | Use cert-manager for webhook certificate management (requires `enableWorkload=true`) | `false` |
| `webhookCertSecretName` | Name of the Secret where cert-manager stores the certificate | `fleet-webhook-server-cert` |
| `enableV1Beta1APIs` | Watch for v1beta1 APIs | `true` |
| `hubAPIQPS` | QPS for fleet-apiserver (not including events/node heartbeat) | `250` |
| `hubAPIBurst` | Burst for fleet-apiserver (not including events/node heartbeat) | `1000` |
@@ -41,4 +69,55 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `MaxFleetSizeSupported` | Max number of member clusters supported | `100` |
| `resourceSnapshotCreationMinimumInterval` | The minimum interval at which resource snapshots could be created. | `30s` |
| `resourceChangesCollectionDuration` | The duration for collecting resource changes into one snapshot. | `15s` |
| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |
| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |

## Certificate Management

The hub-agent supports two modes for webhook certificate management:

### Automatic Certificate Generation (Default)

By default, the hub-agent generates certificates automatically at startup. This mode:
- Requires no external dependencies
- Works out of the box
- Issues certificates that are valid for 10 years
- **Limitation: supports only a single-replica deployment** (`replicaCount` must be 1)

### cert-manager (Optional)

When `useCertManager=true`, certificates are managed by cert-manager. This mode:
- Requires cert-manager to be installed as a prerequisite
- Requires `enableWorkload=true` to allow cert-manager pods to run in the hub cluster (without this, pod creation would be blocked by the webhook)
- Requires `enableWebhook=true` because cert-manager is only used for webhook certificate management
- Handles certificate rotation automatically (90-day certificates, renewed 30 days before expiry)
- Follows industry-standard certificate management practices
- **Supports high availability with multiple replicas** (replicaCount > 1)
- Suitable for production environments

To switch to cert-manager mode:
```console
# Install cert-manager first (omit --version to get latest, or specify a version like --version v1.16.2)
# Note: See CERT_MANAGER_VERSION in .github/workflows/ci.yml for the version tested in CI
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true --set enableWorkload=true --set enableWebhook=true
```

The `webhookCertSecretName` parameter specifies the Secret name for the certificate:
- Default: `fleet-webhook-server-cert`
- When using cert-manager, this is where cert-manager stores the certificate
- The chart uses this value for both the `Certificate`'s `secretName` and the deployment's `webhook-cert` volume, so the two always reference the same Secret

Example with custom secret name:
```console
helm install hub-agent ./charts/hub-agent \
--set useCertManager=true \
--set enableWorkload=true \
--set webhookCertSecretName=my-webhook-secret
```
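The rules of the two certificate modes above can be condensed into a single check. The following Go sketch is illustrative only: `chartValues` and `certModeErrors` are hypothetical names, not part of the chart or hub-agent. It mirrors the `replicaCount` guard added to `deployment.yaml` and the `UseCertManager`/`EnableWorkload` rule added to `options.Validate`; the `enableWebhook` rule comes from this README's text rather than from code in this diff.

```go
package main

import "fmt"

// chartValues is a hypothetical subset of the hub-agent Helm values that
// affect webhook certificate management.
type chartValues struct {
	UseCertManager bool
	EnableWorkload bool
	EnableWebhook  bool
	ReplicaCount   int
}

// certModeErrors returns one message per violated rule; an empty result
// means the combination of values is allowed.
func certModeErrors(v chartValues) []string {
	var errs []string
	// Self-signed certificates are generated per pod at startup and cannot be
	// shared across replicas (mirrors the fail in deployment.yaml).
	if !v.UseCertManager && v.ReplicaCount > 1 {
		errs = append(errs, "replicaCount > 1 requires useCertManager=true")
	}
	if v.UseCertManager {
		// cert-manager pods must be allowed to run in the hub cluster
		// (mirrors the UseCertManager check in options.Validate).
		if !v.EnableWorkload {
			errs = append(errs, "useCertManager=true requires enableWorkload=true")
		}
		// cert-manager only manages webhook certificates (README requirement).
		if !v.EnableWebhook {
			errs = append(errs, "useCertManager=true requires enableWebhook=true")
		}
	}
	return errs
}

func main() {
	// Default installation: self-signed certificates, single replica.
	fmt.Println(certModeErrors(chartValues{EnableWebhook: true, ReplicaCount: 1}))
	// Multiple replicas without cert-manager are rejected.
	fmt.Println(certModeErrors(chartValues{EnableWebhook: true, ReplicaCount: 3}))
	// High-availability setup with cert-manager.
	fmt.Println(certModeErrors(chartValues{UseCertManager: true, EnableWorkload: true, EnableWebhook: true, ReplicaCount: 3}))
}
```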
63 changes: 63 additions & 0 deletions charts/hub-agent/templates/certificate.yaml
@@ -0,0 +1,63 @@
{{- if and .Values.enableWebhook .Values.useCertManager }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
# This name must match FleetWebhookCertName in pkg/webhook/webhook.go
name: fleet-webhook-certificate
namespace: {{ .Values.namespace }}
labels:
{{- include "hub-agent.labels" . | nindent 4 }}
spec:
# Secret name where cert-manager will store the certificate
secretName: {{ .Values.webhookCertSecretName }}

# Certificate duration (90 days is cert-manager's default and recommended)
duration: 2160h # 90 days

# Renew certificate 30 days before expiry
renewBefore: 720h # 30 days

# Subject configuration
subject:
organizations:
- KubeFleet

# Common name
commonName: fleet-webhook.{{ .Values.namespace }}.svc

# DNS names for the certificate
dnsNames:
- {{ .Values.webhookServiceName }}
- {{ .Values.webhookServiceName }}.{{ .Values.namespace }}
- {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc
- {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc.cluster.local

# Issuer reference - using self-signed issuer
issuerRef:
name: fleet-selfsigned-issuer
kind: Issuer
group: cert-manager.io

# Private key configuration
privateKey:
algorithm: ECDSA
size: 256

# Key usages
usages:
- digital signature
- key encipherment
- server auth
---
# Self-signed issuer for generating the certificate
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: fleet-selfsigned-issuer
namespace: {{ .Values.namespace }}
labels:
{{- include "hub-agent.labels" . | nindent 4 }}
spec:
selfSigned: {}
{{- end }}
21 changes: 21 additions & 0 deletions charts/hub-agent/templates/deployment.yaml
@@ -1,3 +1,6 @@
{{- if and (not .Values.useCertManager) (gt (.Values.replicaCount | int) 1) }}
{{- fail "ERROR: replicaCount > 1 requires useCertManager=true (self-signed certificates cannot be shared across replicas)" }}
{{- end }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -6,6 +9,7 @@ metadata:
labels:
{{- include "hub-agent.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "hub-agent.selectorLabels" . | nindent 6 }}
@@ -34,6 +38,7 @@ spec:
- --webhook-service-name={{ .Values.webhookServiceName }}
- --enable-guard-rail={{ .Values.enableGuardRail }}
- --enable-workload={{ .Values.enableWorkload }}
- --use-cert-manager={{ .Values.useCertManager }}
- --whitelisted-users=system:serviceaccount:fleet-system:hub-agent-sa
- --webhook-client-connection-type={{.Values.webhookClientConnectionType}}
- --v={{ .Values.logVerbosity }}
@@ -82,6 +87,22 @@ spec:
fieldPath: metadata.namespace
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- if .Values.useCertManager }}
volumeMounts:
- name: webhook-cert
# This path must match FleetWebhookCertDir in pkg/webhook/webhook.go
mountPath: /tmp/k8s-webhook-server/serving-certs
readOnly: true
{{- end }}
{{- if .Values.useCertManager }}
volumes:
- name: webhook-cert
secret:
secretName: {{ .Values.webhookCertSecretName }}
# defaultMode 0444 (read for all) allows the container process to read the certs
# regardless of the user/group it runs as
defaultMode: 0444
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
11 changes: 9 additions & 2 deletions charts/hub-agent/values.yaml
@@ -26,13 +26,20 @@ webhookServiceName: fleetwebhook
enableGuardRail: true
webhookClientConnectionType: service
enableWorkload: false
# useCertManager enables cert-manager for webhook certificate management
# When enabled, cert-manager must be installed as a prerequisite (it is not installed automatically by this chart)
# and a Certificate resource will be created
useCertManager: false
# webhookCertSecretName is ONLY used when useCertManager=true
# It specifies the name of the Secret where cert-manager stores the certificate
webhookCertSecretName: fleet-webhook-server-cert

forceDeleteWaitTime: 15m0s
clusterUnhealthyThreshold: 3m0s
resourceSnapshotCreationMinimumInterval: 30s
resourceChangesCollectionDuration: 15s

namespace:
fleet-system
namespace: fleet-system

resources:
limits:
47 changes: 29 additions & 18 deletions cmd/hubagent/main.go
@@ -20,8 +20,8 @@ import (
"flag"
"fmt"
"math"
"net/http"
"os"
"strings"
"sync"

apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
@@ -66,8 +66,7 @@ var (
)

const (
FleetWebhookCertDir = "/tmp/k8s-webhook-server/serving-certs"
FleetWebhookPort = 9443
FleetWebhookPort = 9443
)

func init() {
@@ -122,7 +121,7 @@ func main() {
},
WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
Port: FleetWebhookPort,
CertDir: FleetWebhookCertDir,
CertDir: webhook.FleetWebhookCertDir,
}),
}
if opts.EnablePprof {
@@ -158,12 +157,31 @@ func main() {
}

if opts.EnableWebhook {
whiteListedUsers := strings.Split(opts.WhiteListedUsers, ",")
if err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers,
opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload, opts.NetworkingAgentsEnabled); err != nil {
// Generate webhook configuration with certificates
webhookConfig, err := webhook.NewWebhookConfigFromOptions(mgr, opts, FleetWebhookPort)
if err != nil {
klog.ErrorS(err, "unable to create webhook config")
exitWithErrorFunc()
}

// Setup webhooks with the manager
if err := SetupWebhook(mgr, webhookConfig); err != nil {
klog.ErrorS(err, "unable to set up webhook")
exitWithErrorFunc()
}

// When using cert-manager, add a readiness check to ensure CA bundles are injected before marking ready.
// This prevents the pod from accepting traffic before cert-manager has populated the webhook CA bundles,
// which would cause webhook calls to fail.
if opts.UseCertManager {
if err := mgr.AddReadyzCheck("cert-manager-ca-injection", func(req *http.Request) error {
return webhookConfig.CheckCAInjection(req.Context())
}); err != nil {
klog.ErrorS(err, "unable to set up cert-manager CA injection readiness check")
exitWithErrorFunc()
}
klog.V(2).InfoS("Added cert-manager CA injection readiness check")
}
}

ctx := ctrl.SetupSignalHandler()
@@ -213,20 +231,13 @@ func main() {
wg.Wait()
}

// SetupWebhook generates the webhook cert and then set up the webhook configurator.
func SetupWebhook(mgr manager.Manager, webhookClientConnectionType options.WebhookClientConnectionType, webhookServiceName string,
whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool, networkingAgentsEnabled bool) error {
// Generate self-signed key and crt files in FleetWebhookCertDir for the webhook server to start.
w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, FleetWebhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload)
if err != nil {
klog.ErrorS(err, "fail to generate WebhookConfig")
return err
}
if err = mgr.Add(w); err != nil {
// SetupWebhook registers the webhook config and webhook handlers with the manager.
func SetupWebhook(mgr manager.Manager, webhookConfig *webhook.Config) error {
if err := mgr.Add(webhookConfig); err != nil {
klog.ErrorS(err, "unable to add WebhookConfig")
return err
}
if err = webhook.AddToManager(mgr, whiteListedUsers, denyModifyMemberClusterLabels, networkingAgentsEnabled); err != nil {
if err := webhook.AddToManager(mgr, webhookConfig); err != nil {
klog.ErrorS(err, "unable to register webhooks to the manager")
return err
}
4 changes: 4 additions & 0 deletions cmd/hubagent/options/options.go
@@ -110,6 +110,9 @@ type Options struct {
// EnableWorkload enables workload resources (pods and replicasets) to be created in the hub cluster.
// When set to true, the pod and replicaset validating webhooks are disabled.
EnableWorkload bool
// UseCertManager indicates whether to use cert-manager for webhook certificate management.
// When enabled, webhook certificates are managed by cert-manager instead of self-signed generation.
UseCertManager bool
// ResourceSnapshotCreationMinimumInterval is the minimum interval at which resource snapshots could be created.
// Whether the resource snapshot is created or not depends on the both ResourceSnapshotCreationMinimumInterval and ResourceChangesCollectionDuration.
ResourceSnapshotCreationMinimumInterval time.Duration
@@ -187,6 +190,7 @@ func (o *Options) AddFlags(flags *flag.FlagSet) {
flags.IntVar(&o.PprofPort, "pprof-port", 6065, "The port for pprof profiling.")
flags.BoolVar(&o.DenyModifyMemberClusterLabels, "deny-modify-member-cluster-labels", false, "If set, users not in the system:masters cannot modify member cluster labels.")
flags.BoolVar(&o.EnableWorkload, "enable-workload", false, "If set, workloads (pods and replicasets) can be created in the hub cluster. This disables the pod and replicaset validating webhooks.")
flags.BoolVar(&o.UseCertManager, "use-cert-manager", false, "If set, cert-manager will be used for webhook certificate management instead of self-signed certificates.")
flags.DurationVar(&o.ResourceSnapshotCreationMinimumInterval, "resource-snapshot-creation-minimum-interval", 30*time.Second, "The minimum interval at which resource snapshots could be created.")
flags.DurationVar(&o.ResourceChangesCollectionDuration, "resource-changes-collection-duration", 15*time.Second,
"The duration for collecting resource changes into one snapshot. The default is 15 seconds, which means that the controller will collect resource changes for 15 seconds before creating a resource snapshot.")
4 changes: 4 additions & 0 deletions cmd/hubagent/options/validation.go
@@ -52,6 +52,10 @@ func (o *Options) Validate() field.ErrorList {
errs = append(errs, field.Invalid(newPath.Child("WebhookServiceName"), o.WebhookServiceName, "Webhook service name is required when webhook is enabled"))
}

if o.UseCertManager && !o.EnableWorkload {
errs = append(errs, field.Invalid(newPath.Child("UseCertManager"), o.UseCertManager, "UseCertManager requires EnableWorkload to be true (when EnableWorkload is false, a validating webhook blocks pod creation except for certain system pods; cert-manager controller pods must be allowed to run in the hub cluster)"))
}

connectionType := o.WebhookClientConnectionType
if _, err := parseWebhookClientConnectionString(connectionType); err != nil {
errs = append(errs, field.Invalid(newPath.Child("WebhookClientConnectionType"), o.WebhookClientConnectionType, err.Error()))