WPB-21820: Enable HA k8s cluster #829
Open
sghosh23 wants to merge 23 commits into master from enable-ha-k8s-cluster
Commits
a51a99f Enable HA cluster with kube-vip (sghosh23)
8d7eb8d Add changelog (sghosh23)
db35ec3 clean up and fix doc (sghosh23)
e7ec736 update doc (sghosh23)
7f9ef2d keep inventory doc simple (sghosh23)
ae5e636 Add necessary vars to and image to deploy HA k8s cluster on CI based … (sghosh23)
c019073 Deploy kube-vip when the cluster is already up (sghosh23)
23f14be Disable kube-vip for CI deployment (sghosh23)
ebce003 Try fix CI deployment (sghosh23)
3bc2c20 fix ther bootstrapping by provding empty dic fo loadbalancer_apiserver (sghosh23)
6724e56 Use two phase approach (sghosh23)
a8eefba fix k8s_cluster inventroy path logic (sghosh23)
25fecf9 remove fixed interface name (sghosh23)
16fe249 try with availability check (sghosh23)
3a5518c try with fixed alias IP (sghosh23)
ec4c1a0 drop alias (sghosh23)
ff1c347 Add the right interface (sghosh23)
94c3b51 try alias_ip with dterministic ips for the servers (sghosh23)
909533d fix variable precedence issue and add extended timeout on leader sele… (sghosh23)
56bea84 clean up checks which was blocking the deployment (sghosh23)
61484ff Route the VIP address to the kubenode from adminhost (sghosh23)
bea385f Disable kube-vip for CI deployment (sghosh23)
59d71ac Fix sonarCloud complains (sghosh23)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| # Offline Inventory Configuration | ||
|
|
||
| Ansible inventory for offline/air-gapped deployments of Wire infrastructure. | ||
|
|
||
| ## Directory Structure | ||
|
|
||
| ``` | ||
| offline/ | ||
| ├── 99-static # Main inventory file | ||
| ├── group_vars/ | ||
| │ ├── all/offline.yml # Base settings (k8s version, etc.) | ||
| │ ├── k8s-cluster/k8s-cluster.yml # kube-vip HA configuration | ||
| │ ├── postgresql/postgresql.yml # PostgreSQL settings | ||
| │ └── demo/offline.yml # Demo overrides | ||
| └── artifacts/ # Generated (kubeconfig, etc.) | ||
| ``` | ||
|
|
||
| ## Configuration Files | ||
|
|
||
| | File | Purpose | | ||
| |------|---------| | ||
| | `99-static` | Define hosts and group memberships | | ||
| | `group_vars/all/offline.yml` | Base settings (k8s version, container runtime) | | ||
| | `group_vars/k8s-cluster/k8s-cluster.yml` | kube-vip HA, API server, networking | | ||
| | `group_vars/postgresql/postgresql.yml` | PostgreSQL configuration | | ||
|
|
||
| ## Key Variables to Customize | ||
|
|
||
| **In `99-static`:** | ||
| - Host IP addresses (`ansible_host` and `ip`) | ||
| - Node assignments to groups (`[kube-master]`, `[kube-node]`, `[etcd]`) | ||
|
|
||
| **In `group_vars/k8s-cluster/k8s-cluster.yml`:** | ||
| - `kube_vip_address` - Virtual IP for HA (e.g., `192.168.122.100`) | ||
| - `kube_vip_interface` - Network interface (e.g., `enp1s0`) | ||
|
|
||
| **In `group_vars/all/offline.yml`:** | ||
| - `kube_version` - Kubernetes version | ||
| - Network settings (the defaults are usually fine) | ||
|
|
||
|
|
||
| ## Documentation | ||
|
|
||
| - **Kubespray**: https://github.com/kubernetes-sigs/kubespray | ||
| - **Wire Docs**: https://docs.wire.com/ | ||
|
|
||
| ## Important Notes | ||
|
|
||
| - The VIP must be in the same subnet as the control plane nodes | ||
| - The VIP must be unused and outside any DHCP range | ||
| - etcd should have an odd number of members (3, 5, 7) so it keeps quorum when a member fails | ||
| - Keep the `artifacts/` directory secure (it contains the admin kubeconfig) | ||
| - For production, encrypt sensitive files with SOPS | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| **Inventory not found:** | ||
| ```bash | ||
| ansible-inventory -i ansible/inventory/offline --list | ||
| ``` | ||
|
|
||
| **Can't SSH to nodes:** | ||
| ```bash | ||
| ansible -i ansible/inventory/offline/hosts.ini all -m ping | ||
| ``` |
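The VIP constraints listed under "Important Notes" in the README can be checked before a deployment. The following is a minimal sketch, not part of this PR: the helper name `check_vip`, the `/24` prefix length, and the DHCP range are assumptions made for illustration, while the node and VIP addresses are the example values from `k8s-cluster.yml`.

```python
import ipaddress


def check_vip(vip, node_cidrs, dhcp_range=None):
    """Return a list of problems with the proposed VIP (empty if none).

    vip        -- candidate kube_vip_address, e.g. "192.168.122.100"
    node_cidrs -- control plane addresses with prefix, e.g. "192.168.122.21/24"
    dhcp_range -- optional (first, last) tuple describing the DHCP pool
    """
    problems = []
    vip_addr = ipaddress.ip_address(vip)
    for cidr in node_cidrs:
        iface = ipaddress.ip_interface(cidr)
        # The VIP must not collide with an address a node already holds.
        if vip_addr == iface.ip:
            problems.append(f"{vip} is already assigned to a node")
        # ARP mode only works when the VIP shares the nodes' subnet.
        if vip_addr not in iface.network:
            problems.append(f"{vip} is outside {iface.network}")
    if dhcp_range is not None:
        first, last = (ipaddress.ip_address(a) for a in dhcp_range)
        if first <= vip_addr <= last:
            problems.append(f"{vip} falls inside the DHCP range {first}-{last}")
    return problems


if __name__ == "__main__":
    # Example values; the /24 prefix and the DHCP pool are assumed.
    nodes = ["192.168.122.21/24", "192.168.122.22/24", "192.168.122.23/24"]
    pool = ("192.168.122.150", "192.168.122.254")
    print(check_vip("192.168.122.100", nodes, dhcp_range=pool))
```

This check cannot tell whether another host on the network already uses the address; that still has to be confirmed on the network itself.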
103 changes: 103 additions & 0 deletions
ansible/inventory/offline/group_vars/k8s-cluster/k8s-cluster.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| --- | ||
| # Kubernetes cluster configuration for offline deployment | ||
| # | ||
| # This file contains configuration overrides for the Kubernetes cluster | ||
| # deployed via Kubespray. These settings override the defaults in | ||
| # ansible/roles-external/kubespray/roles/kubespray-defaults/defaults/main/ | ||
|
|
||
| # ============================================================================== | ||
| # kube-vip Configuration for High Availability Control Plane | ||
| # ============================================================================== | ||
| # | ||
| # kube-vip provides a Virtual IP (VIP) for the Kubernetes API server, | ||
| # enabling automatic failover between control plane nodes without requiring | ||
| # external load balancers. Perfect for bare-metal and air-gapped deployments. | ||
| # | ||
| # Reference: https://kube-vip.io/ | ||
|
|
||
| # Enable kube-vip (off by default; set to true if you need an HA control plane) | ||
| kube_vip_enabled: false | ||
|
|
||
| # Enable control plane VIP (required for HA) | ||
| kube_vip_controlplane_enabled: true | ||
|
|
||
| # Virtual IP address for the Kubernetes API server | ||
| # IMPORTANT: This must be: | ||
| # - In the same subnet as your control plane nodes | ||
| # - Unused and not in DHCP range | ||
| # - Accessible from all nodes and external clients | ||
| # | ||
| # Set the appropriate VIP address for your environment | ||
| # Example: If control plane nodes are 192.168.122.21-23, use 192.168.122.100 | ||
| kube_vip_address: "192.168.122.100" | ||
|
|
||
| # Network interface to bind the VIP to | ||
| # Find this by running: ssh kubenode1 "ip -br addr show" | ||
| # | ||
| # For Hetzner Cloud: Use "enp7s0" (private network interface) | ||
| # For other environments: Check with "ip -br addr show" | ||
| # Common values: eth0, enp1s0, enp7s0 | ||
| kube_vip_interface: "enp1s0" | ||
|
|
||
| # Use ARP for Layer 2 VIP management (recommended for most deployments) | ||
| # Set to false only if using BGP for Layer 3 routing | ||
| kube_vip_arp_enabled: true | ||
|
|
||
| # Enable kube-vip for LoadBalancer services (optional) | ||
| # Set to true if you want kube-vip to also handle LoadBalancer service IPs | ||
| # For control plane HA only, keep this false | ||
| kube_vip_services_enabled: false | ||
|
|
||
| # Required when kube-vip runs in ARP mode and kube-proxy uses IPVS | ||
| # Strict ARP keeps nodes from answering ARP requests for addresses they hold only on the kube-ipvs0 dummy interface | ||
| kube_proxy_strict_arp: true | ||
|
|
||
| # Leader election timing (fix for kube-vip GitHub issue #453) | ||
| # Increased timeouts prevent "context deadline exceeded" errors during lease acquisition | ||
| # Default values are too aggressive for cloud environments with slower etcd/API responses | ||
| # These settings are particularly important for Hetzner Cloud and similar providers | ||
| kube_vip_leader_election_enabled: true | ||
| kube_vip_leaseduration: 30 # seconds (default: 15) | ||
| kube_vip_renewdeadline: 20 # seconds (default: 10) | ||
| kube_vip_retryperiod: 4 # seconds (default: 2) | ||
|
|
||
| # ============================================================================== | ||
| # Bootstrap Strategy for kube-vip HA | ||
| # ============================================================================== | ||
| # | ||
| # IMPORTANT: The following settings are COMMENTED OUT to avoid a chicken-and-egg | ||
| # problem when bootstrapping the cluster during automated deployment. | ||
| # | ||
| # For NEW cluster deployment via bin/offline-cluster.sh: | ||
| # - These remain commented out | ||
| # - Phase 1 bootstraps without loadbalancer_apiserver (kubeadm uses node IP) | ||
| # - Phase 2 passes loadbalancer_apiserver dynamically via -e flag | ||
| # | ||
| # For MANUAL kube-vip setup on EXISTING cluster: | ||
| # - Uncomment the sections below | ||
| # - Update the IP addresses to match your VIP | ||
| # - Run: ansible-playbook -i inventory/offline/hosts.ini kubernetes.yml --tags=node,kube-vip,master,client | ||
| # | ||
| # See: offline/kube-vip-ha-setup.md for detailed documentation | ||
| # | ||
| # Reference: kubespray's test approach in | ||
| # ansible/roles-external/kubespray/tests/files/packet_centos7-flannel-addons-ha.yml | ||
|
|
||
| # API server load balancer name (set to the VIP for consistency) | ||
| # This is the name under which clients and cluster components reach the API server | ||
| # apiserver_loadbalancer_domain_name: "192.168.122.100" | ||
|
|
||
| # Configure API server endpoint to use VIP | ||
| # This tells all Kubernetes components to connect via the VIP | ||
| # loadbalancer_apiserver: | ||
| # address: "192.168.122.100" | ||
| # port: 6443 | ||
|
|
||
| # Disable localhost load balancer since we have VIP | ||
| # When using kube-vip, we don't need the nginx localhost proxy | ||
| # loadbalancer_apiserver_localhost: false | ||
|
|
||
| # Add VIP to API server SSL certificates | ||
| # This ensures the API server certificate is valid for the VIP address | ||
| # supplementary_addresses_in_ssl_keys: | ||
| # - "192.168.122.100" |
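The bootstrap comments in `k8s-cluster.yml` say that phase 2 passes `loadbalancer_apiserver` dynamically via the `-e` flag. A minimal sketch of what such a call could look like is shown below. It is an illustration, not the contents of `bin/offline-cluster.sh`: the helper name `phase2_command` is hypothetical, the inventory path and playbook name are taken from the manual command in the comments, and passing the other three variables alongside `loadbalancer_apiserver` is an assumption that mirrors the commented-out block.

```python
import json


def phase2_command(vip, port=6443,
                   inventory="inventory/offline/hosts.ini",
                   playbook="kubernetes.yml"):
    """Build an ansible-playbook argv that injects the VIP settings via -e."""
    # Mirrors the commented-out settings; only loadbalancer_apiserver is
    # confirmed by the source, the rest is assumed for illustration.
    extra_vars = {
        "loadbalancer_apiserver": {"address": vip, "port": port},
        "loadbalancer_apiserver_localhost": False,
        "apiserver_loadbalancer_domain_name": vip,
        "supplementary_addresses_in_ssl_keys": [vip],
    }
    # A JSON string keeps the nested dict, the list and the boolean typed;
    # key=value pairs on the command line would be read as plain strings.
    return ["ansible-playbook", "-i", inventory, playbook,
            "-e", json.dumps(extra_vars)]


if __name__ == "__main__":
    print(" ".join(phase2_command("192.168.122.100")))
```

Extra vars take precedence over group_vars in Ansible, so the file can keep these settings commented out while the second run still applies them.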
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Added: kube-vip v0.8.0 for high-availability Kubernetes control plane with automatic failover, including comprehensive documentation and offline build configuration (enabled in CI to validate production deployment path) |
why is this somewhere else? right now, a user 'just' edits this one file.