Operator for stateful service migration
This guide explains how to deploy both CheckpointBackup and MigrationBackup controllers using the automated deploy.sh script.
The deployment script (deploy.sh) handles:
- CheckpointBackup Controller: Deployed as DaemonSet to member clusters via Karmada PropagationPolicies
- MigrationBackup Controller: Deployed to management/control plane cluster
- Automatic RBAC: Creates necessary service accounts, roles, and bindings
- Namespace Management: Creates and propagates namespaces
- CRD Propagation: Ensures CRDs are available on member clusters
- kubectl installed and configured
- Access to Karmada control plane
- Access to management cluster
- Docker images built and pushed to registry
- Karmada control plane running and accessible
- Member clusters registered with Karmada
- Kubeconfig file for Karmada control plane
- Kubernetes cluster for running MigrationBackup controller
- Kubeconfig file for management cluster
- Container registry access
- Registry credentials secret configured
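
The prerequisites above can be checked before running the script. The following is a minimal sketch, not part of deploy.sh: the helper names `require_cmd`, `require_file`, and `preflight` are hypothetical, and the sketch only verifies that kubectl is on the PATH and that the two kubeconfig files are readable. It does not test cluster connectivity or registry access.

```shell
# Hypothetical preflight helpers (illustrative only; not part of deploy.sh).

# require_cmd NAME: succeed if NAME resolves to a command on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

# require_file PATH: succeed if PATH is a readable regular file.
require_file() {
  [ -r "$1" ] && [ -f "$1" ] || { echo "missing file: $1" >&2; return 1; }
}

# preflight KARMADA_KUBECONFIG MGMT_KUBECONFIG
# Runs every check (instead of stopping at the first failure) so that
# all missing prerequisites are reported in one pass.
preflight() {
  preflight_rc=0
  require_cmd kubectl || preflight_rc=1
  require_file "$1" || preflight_rc=1
  require_file "$2" || preflight_rc=1
  return "$preflight_rc"
}

# Example:
#   preflight ~/.kube/karmada ~/.kube/config && ./deploy.sh --all ...
```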
./deploy.sh [options]

| Option | Description | Required |
|---|---|---|
| -c, --checkpoint | Deploy CheckpointBackup controller | Choice |
| -m, --migration | Deploy MigrationBackup controller | Choice |
| -a, --all | Deploy all controllers | Choice |
| -v, --version VERSION | Version tag for images (default: v1.16) | No |
| -k, --karmada-config PATH | Path to Karmada kubeconfig | For checkpoint |
| -g, --mgmt-config PATH | Path to management cluster kubeconfig | For migration |
| -l, --clusters LIST | Comma-separated member cluster names | For checkpoint |
| -d, --dry-run | Show what would be deployed | No |
| -h, --help | Show help message | No |
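
The "Required" column implies a few dependencies between options. The sketch below spells them out as a validation function. It is an illustration of the table, not the script's actual parsing code: the function name `validate_options` is hypothetical, and it assumes that --all needs everything that --checkpoint and --migration need individually.

```shell
# validate_options MODE KARMADA_CONFIG MGMT_CONFIG CLUSTERS
# MODE is one of: checkpoint, migration, all (hypothetical helper).
validate_options() {
  mode=$1
  karmada=$2
  mgmt=$3
  clusters=$4

  # Exactly one of --checkpoint, --migration, --all selects the mode.
  case "$mode" in
    checkpoint|migration|all) ;;
    *) echo "choose one of --checkpoint, --migration, --all" >&2; return 1 ;;
  esac

  # The CheckpointBackup controller needs the Karmada kubeconfig and cluster list.
  if [ "$mode" != migration ]; then
    [ -n "$karmada" ] || { echo "--karmada-config is required" >&2; return 1; }
    [ -n "$clusters" ] || { echo "--clusters is required" >&2; return 1; }
  fi

  # The MigrationBackup controller needs the management cluster kubeconfig.
  if [ "$mode" != checkpoint ]; then
    [ -n "$mgmt" ] || { echo "--mgmt-config is required" >&2; return 1; }
  fi
}
```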
Deploy all controllers:
./deploy.sh --all \
  --karmada-config ~/.kube/karmada \
  --mgmt-config ~/.kube/config \
  --clusters cluster1,cluster2,cluster3 \
  --version v2.0

Deploy only the CheckpointBackup controller:
./deploy.sh --checkpoint \
  --karmada-config ~/.kube/karmada \
  --clusters cluster1,cluster2 \
  --version v2.0

Deploy only the MigrationBackup controller:
./deploy.sh --migration \
  --mgmt-config ~/.kube/config \
  --version v2.0

Dry run:
./deploy.sh --all \
  --karmada-config ~/.kube/karmada \
  --mgmt-config ~/.kube/config \
  --clusters cluster1,cluster2 \
  --dry-run

Resources deployed for the CheckpointBackup controller:
- Namespace: stateful-migration
- CRD: checkpointbackups.migration.dcnlab.com
- RBAC: Service account, ClusterRole, ClusterRoleBinding
- DaemonSet: CheckpointBackup controller with buildah
- PropagationPolicies: For namespace, CRD, RBAC, DaemonSet

Image: lehuannhatrang/stateful-migration-operator:checkpointBackup_<VERSION>
- Includes buildah and container tools
- Size: ~120MB
- Privileged container with SYS_ADMIN, SYS_PTRACE capabilities
- Host network and PID access
- Volume mounts for kubelet checkpoints and buildah storage
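
Each PropagationPolicy listed above selects one resource and names the member clusters that should receive it. The sketch below renders such a policy from the comma-separated value passed to --clusters. It mirrors the shape of the manual registry-credentials policy in this guide, but the function name `render_propagation_policy` and the policy name in the example are hypothetical, and the policies that deploy.sh actually creates may differ.

```shell
# render_propagation_policy NAME API_VERSION KIND RESOURCE_NAME CLUSTERS
# Prints a PropagationPolicy manifest for one resource in the
# stateful-migration namespace. CLUSTERS is comma-separated.
render_propagation_policy() {
  cat <<EOF
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: $1
  namespace: stateful-migration
spec:
  resourceSelectors:
    - apiVersion: $2
      kind: $3
      name: $4
  placement:
    clusterAffinity:
      clusterNames:
EOF
  # Emit one YAML list item per cluster name, skipping empty entries.
  printf '%s\n' "$5" | tr ',' '\n' | while IFS= read -r cluster; do
    if [ -n "$cluster" ]; then
      printf '        - %s\n' "$cluster"
    fi
  done
}

# Example (the policy name is an assumption):
#   render_propagation_policy checkpoint-backup-daemonset apps/v1 DaemonSet \
#     checkpoint-backup-controller cluster1,cluster2 |
#     kubectl --kubeconfig ~/.kube/karmada apply -f -
```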

Resources deployed for the MigrationBackup controller:
- Namespace: stateful-migration
- CRDs: All migration-related CRDs
- RBAC: Service account and permissions (follows deploy/all-in-one.yaml pattern)
- Deployment: MigrationBackup controller
- Service: Metrics and health endpoints

Image: lehuannhatrang/stateful-migration-operator:migrationBackup_<VERSION>
- Minimal distroless image
- Size: ~15MB
- Non-privileged container
- Leader election enabled
- Metrics and health endpoints
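
Karmada propagates resources to member clusters asynchronously, so a check run immediately after deployment may fail even when nothing is wrong. A small retry wrapper, sketched below, can make such checks less brittle. The `retry` function is a hypothetical helper, not something the deployment script provides.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY_SECONDS between tries. Returns non-zero on give-up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
}

# Example: wait up to about 5 minutes for the DaemonSet to appear on a member cluster.
#   retry 30 10 kubectl get daemonset checkpoint-backup-controller -n stateful-migration
```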
# Check PropagationPolicies
kubectl --kubeconfig ~/.kube/karmada get propagationpolicy -n stateful-migration
# Check DaemonSet on member clusters
kubectl get daemonset checkpoint-backup-controller -n stateful-migration
# Check pods on member clusters
kubectl get pods -n stateful-migration -l app.kubernetes.io/name=checkpoint-backup-controller

The deployment script automatically prompts for and configures registry credentials:
# Registry credentials are configured automatically during deployment
# The script will prompt for:
# - Registry username
# - Registry password
# - Registry URL (optional, defaults to Docker Hub)
# Verify registry credentials were created and propagated
kubectl --kubeconfig ~/.kube/karmada get secret registry-credentials -n stateful-migration
kubectl get secret registry-credentials -n stateful-migration # On member clusters

Manual Registry Configuration (if needed):
# Only if you need to update credentials manually
kubectl --kubeconfig ~/.kube/karmada apply -f config/checkpoint-backup/registry-credentials-secret.yaml
# Create PropagationPolicy manually
kubectl --kubeconfig ~/.kube/karmada apply -f - <<EOF
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: registry-credentials-propagation
  namespace: stateful-migration
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: Secret
      name: registry-credentials
  placement:
    clusterAffinity:
      clusterNames:
        - cluster1
        - cluster2
EOF

# Check deployment
kubectl --kubeconfig ~/.kube/config get deployment migration-backup-controller -n stateful-migration
# Check pods
kubectl --kubeconfig ~/.kube/config get pods -n stateful-migration -l app.kubernetes.io/name=migration-backup-controller
# Check logs
kubectl --kubeconfig ~/.kube/config logs -n stateful-migration deployment/migration-backup-controller -f
# Check service
kubectl --kubeconfig ~/.kube/config get svc -n stateful-migration migration-backup-controller-metrics

# Create a test StatefulMigration resource
kubectl --kubeconfig ~/.kube/config apply -f - <<EOF
apiVersion: migration.dcnlab.com/v1
kind: StatefulMigration
metadata:
  name: test-migration
  namespace: default
spec:
  resourceRef:
    kind: StatefulSet
    name: my-statefulset
    namespace: default
  schedule: "0 2 * * *"
  sourceClusters:
    - cluster1
  registry:
    server: "your-registry.com"
    repository: "your-repo/checkpoints"
EOF

# Check PropagationPolicy status
kubectl --kubeconfig ~/.kube/karmada get propagationpolicy -n stateful-migration -o wide
# Check cluster registration
kubectl --kubeconfig ~/.kube/karmada get clusters

# Check node selectors and tolerations
kubectl describe daemonset checkpoint-backup-controller -n stateful-migration
# Check node conditions
kubectl get nodes -o wide

# Check pod logs
kubectl logs -n stateful-migration -l app.kubernetes.io/name=checkpoint-backup-controller
# Check events
kubectl get events -n stateful-migration --sort-by='.lastTimestamp'

# Test registry connectivity from pod
kubectl exec -n stateful-migration <pod-name> -- buildah login your-registry.com
# Check secret propagation
kubectl get secret registry-credentials -n stateful-migration

# Check if RBAC resources are propagated to member clusters
kubectl --kubeconfig ~/.kube/karmada get clusterpropagationpolicy checkpoint-backup-cluster-rbac
# Verify ClusterRole exists on member cluster
kubectl get clusterrole checkpoint-backup-role
# Verify ClusterRoleBinding exists on member cluster
kubectl get clusterrolebinding checkpoint-backup-rolebinding
# Verify ServiceAccount exists on member cluster
kubectl get serviceaccount checkpoint-backup-sa -n stateful-migration
# Check if controller can access CheckpointBackup CRD
kubectl auth can-i list checkpointbackups.migration.dcnlab.com --as=system:serviceaccount:stateful-migration:checkpoint-backup-sa
# Check if controller can access kubelet checkpoint API
kubectl auth can-i create nodes/checkpoint --as=system:serviceaccount:stateful-migration:checkpoint-backup-sa
# If RBAC is missing, manually apply and propagate
kubectl --kubeconfig ~/.kube/karmada apply -f config/rbac/checkpoint_backup_rbac.yaml

# Common error: "kubelet checkpoint API returned status 403: Forbidden"
# This indicates missing permissions for nodes/checkpoint
# Check if controller has node checkpoint permissions
kubectl auth can-i create nodes/checkpoint --as=system:serviceaccount:stateful-migration:checkpoint-backup-sa
kubectl auth can-i get nodes --as=system:serviceaccount:stateful-migration:checkpoint-backup-sa
# Test kubelet checkpoint API directly from controller pod
kubectl exec -n stateful-migration <checkpoint-backup-pod> -- curl -k -X POST \
-H "Authorization: Bearer $(kubectl exec -n stateful-migration <checkpoint-backup-pod> -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
https://localhost:10250/checkpoint/test-namespace/test-pod/test-container
# Check if kubelet checkpoint feature is enabled on nodes
kubectl get nodes -o jsonpath='{.items[*].status.features.checkpointContainer}'

# Common error: "json: cannot unmarshal string into Go struct field CheckpointResponse.items"
# This indicates the kubelet checkpoint API returns a different format than expected
# Check controller logs for DEBUG messages showing actual API responses
kubectl logs -n stateful-migration -l app.kubernetes.io/name=checkpoint-backup-controller | grep "DEBUG:"
# Check what checkpoint files are actually created
kubectl exec -n stateful-migration <checkpoint-backup-pod> -- ls -la /var/lib/kubelet/checkpoints/
# Test the actual kubelet response format
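# The token is read from inside the pod; a bare $(cat ...) would be expanded by the local shell instead
kubectl exec -n stateful-migration <checkpoint-backup-pod> -- curl -v -k -X POST \
  -H "Authorization: Bearer $(kubectl exec -n stateful-migration <checkpoint-backup-pod> -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  "https://localhost:10250/checkpoint/test-namespace/test-pod/test-container?timeout=60"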
# The controller now includes fallback handling for different response formats
# Check KUBELET_CHECKPOINT_API_DEBUG.md for detailed troubleshooting

# Check buildah functionality
kubectl exec -n stateful-migration <pod-name> -- buildah version
# Check storage configuration
kubectl exec -n stateful-migration <pod-name> -- buildah info
# Check kubelet checkpoint API
kubectl exec -n stateful-migration <pod-name> -- curl -k https://localhost:10250/healthz

# Delete PropagationPolicies
kubectl --kubeconfig ~/.kube/karmada delete propagationpolicy -n stateful-migration --all
kubectl --kubeconfig ~/.kube/karmada delete clusterpropagationpolicy checkpoint-backup-cluster-rbac
# Delete DaemonSet
kubectl --kubeconfig ~/.kube/karmada delete daemonset checkpoint-backup-controller -n stateful-migration
# Delete RBAC from Karmada
kubectl --kubeconfig ~/.kube/karmada delete -f config/rbac/checkpoint_backup_rbac.yaml
# Delete namespace
kubectl --kubeconfig ~/.kube/karmada delete namespace stateful-migration

# Delete all resources from all-in-one manifest (recommended)
kubectl --kubeconfig ~/.kube/config delete -f deploy/all-in-one.yaml
# Or delete individual components:
# Delete deployment and service
kubectl --kubeconfig ~/.kube/config delete deployment migration-backup-controller -n stateful-migration
kubectl --kubeconfig ~/.kube/config delete svc migration-backup-controller-metrics -n stateful-migration
# Delete RBAC
kubectl --kubeconfig ~/.kube/config delete clusterrole migration-backup-controller-role
kubectl --kubeconfig ~/.kube/config delete clusterrolebinding migration-backup-controller-rolebinding
kubectl --kubeconfig ~/.kube/config delete role migration-backup-leader-election-role -n stateful-migration
kubectl --kubeconfig ~/.kube/config delete rolebinding migration-backup-leader-election-rolebinding -n stateful-migration
kubectl --kubeconfig ~/.kube/config delete serviceaccount migration-backup-controller -n stateful-migration
# Delete CRDs (only if not used by other controllers)
kubectl --kubeconfig ~/.kube/config delete -f config/crd/bases/
# Delete namespace (only if not used by CheckpointBackup controllers)
kubectl --kubeconfig ~/.kube/config delete namespace stateful-migration

To use a custom namespace, edit the script variables:
NAMESPACE="custom-migration-namespace"
OPERATOR_NAMESPACE="custom-operator-namespace"

To use a custom registry, edit the script variables:
DOCKERHUB_USERNAME="your-registry.com/your-org"
REPOSITORY_NAME="custom-operator"

To deploy to additional member clusters, add them to the list:
./deploy.sh --checkpoint \
--clusters cluster1,cluster2,cluster3,cluster4 \
--karmada-config ~/.kube/karmada

Resource requirements:
- CheckpointBackup Controller: 100m CPU, 128Mi memory (requests), 500m CPU, 512Mi memory (limits)
- MigrationBackup Controller: 10m CPU, 64Mi memory (requests), 500m CPU, 128Mi memory (limits)

Storage and bandwidth:
- Kubelet checkpoints: 50MB-500MB per container
- Buildah storage: 1GB-10GB per node
- Registry bandwidth: Consider checkpoint image sizes

Scaling:
- CheckpointBackup controllers scale with cluster nodes (DaemonSet)
- MigrationBackup controller typically runs as single replica with leader election
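
As a rough worked example of the storage figures above, the sketch below estimates per-node disk usage for a given number of checkpointed containers. It is back-of-the-envelope arithmetic only: the function name `estimate_node_storage_mb` is hypothetical, and it assumes one retained checkpoint per container, 1GB = 1024MB, and that checkpoint and buildah storage simply add up.

```shell
# estimate_node_storage_mb CONTAINERS
# Prints "<min_mb> <max_mb>" for one node, using the ranges quoted in this guide:
#   50MB-500MB per container checkpoint, plus 1GB-10GB of buildah storage.
estimate_node_storage_mb() {
  n=$1
  min=$(( n * 50 + 1024 ))
  max=$(( n * 500 + 10240 ))
  echo "$min $max"
}

# Example: a node with 4 checkpointed containers
#   estimate_node_storage_mb 4    # prints "1224 12240"
```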
This deployment script provides a complete, production-ready solution for deploying the Stateful Migration Operator across your Karmada-managed clusters! 🚀
Build and push controller images with build-and-push.sh:

./build-and-push.sh
Builds:
- lehuannhatrang/stateful-migration-operator:checkpointBackup_v1.16
- lehuannhatrang/stateful-migration-operator:migrationBackup_v1.16

./build-and-push.sh all v1.17
Builds:
- lehuannhatrang/stateful-migration-operator:checkpointBackup_v1.17
- lehuannhatrang/stateful-migration-operator:migrationBackup_v1.17

./build-and-push.sh checkpoint
Builds:
- lehuannhatrang/stateful-migration-operator:checkpointBackup_v1.16

./build-and-push.sh checkpoint v2.0
Builds:
- lehuannhatrang/stateful-migration-operator:checkpointBackup_v2.0 (includes buildah and container tools)

./build-and-push.sh migration
Builds:
- lehuannhatrang/stateful-migration-operator:migrationBackup_v1.16

./build-and-push.sh migration v1.18
Builds:
- lehuannhatrang/stateful-migration-operator:migrationBackup_v1.18

# Build development version
./build-and-push.sh all dev-$(date +%Y%m%d)
# Build feature branch version
./build-and-push.sh all feature-auth-v1.0
# Build release candidate
./build-and-push.sh all v2.0-rc1

# Production release
./build-and-push.sh all v1.19
# Staging deployment
./build-and-push.sh all staging-v1.19
# Hotfix release
./build-and-push.sh all v1.18.1

| Parameter | Description | Default | Examples |
|---|---|---|---|
| controller-type | Type of controller to build | all | all, checkpoint, migration |
| version | Version tag for images | v1.16 | v1.17, v2.0, dev-20241215 |

Image name format: lehuannhatrang/stateful-migration-operator:<controller-type>_<version>

Where:
- <controller-type> is either checkpointBackup or migrationBackup
- <version> is the version parameter you provide
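
The naming rule can be expressed as a small helper that maps the script's controller-type argument to the image reference. This is a sketch for illustration: the function name `image_ref` is hypothetical and is not taken from build-and-push.sh. The default version v1.16 is the one documented in this guide.

```shell
# image_ref CONTROLLER_TYPE [VERSION]
# CONTROLLER_TYPE is "checkpoint" or "migration"; VERSION defaults to v1.16.
image_ref() {
  case "$1" in
    checkpoint) component=checkpointBackup ;;
    migration)  component=migrationBackup ;;
    *) echo "unknown controller type: $1" >&2; return 1 ;;
  esac
  echo "lehuannhatrang/stateful-migration-operator:${component}_${2:-v1.16}"
}

# Example:
#   image_ref checkpoint v2.0
#   # prints lehuannhatrang/stateful-migration-operator:checkpointBackup_v2.0
```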