206 changes: 206 additions & 0 deletions etcd/README.md
# etcd.openshift.io API Group

This API group contains CRDs related to etcd cluster management in Two Node OpenShift with Fencing deployments.

## API Versions

### v1alpha1

Contains the `PacemakerCluster` custom resource for monitoring Pacemaker cluster health in Two Node OpenShift with Fencing deployments.

#### PacemakerCluster

- **Feature Gate**: `DualReplica`
- **Component**: `two-node-fencing`
- **Scope**: Cluster-scoped singleton resource (must be named "cluster")
- **Resource Path**: `pacemakerclusters.etcd.openshift.io`

The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster.
It is periodically updated by the cluster-etcd-operator's status collector.

### Status Subresource Design

This resource uses the standard Kubernetes status subresource pattern (`+kubebuilder:subresource:status`).
The status collector creates the resource without status, then immediately populates it via the `/status` endpoint.

**Why not atomic create-with-status?**

We initially explored removing the status subresource to allow creating the resource with status in a single
atomic operation. This would ensure the resource is never observed in an incomplete state. However:

1. The Kubernetes API server strips the `status` field from create requests when a status subresource is enabled
2. Without the subresource, we cannot use separate RBAC for spec vs status updates
3. The OpenShift API test framework assumes status subresource exists for status update tests

The status collector performs a two-step operation: create resource, then immediately update status.
The brief window where status is empty is acceptable since the healthcheck controller handles missing status gracefully.
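
A minimal sketch of that two-step flow, assuming a controller-runtime client and a `PacemakerClusterStatus` shaped like the structure documented below (the helper name and exact fields are illustrative, not the operator's actual code):

```go
package collector

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

// publishStatus creates the singleton if needed, then writes status via the
// /status subresource. Any status set on the create call would be stripped
// by the API server, which is why two steps are required.
func publishStatus(ctx context.Context, c client.Client, status *v1alpha1.PacemakerClusterStatus) error {
	pc := &v1alpha1.PacemakerCluster{ObjectMeta: metav1.ObjectMeta{Name: "cluster"}}
	if err := c.Create(ctx, pc); err != nil {
		if !apierrors.IsAlreadyExists(err) {
			return err
		}
		// Created on an earlier pass; fetch the live object before updating.
		if err := c.Get(ctx, client.ObjectKey{Name: "cluster"}, pc); err != nil {
			return err
		}
	}
	pc.Status = status // pointer-typed: nil until populated here
	return c.Status().Update(ctx, pc)
}
```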

### Pacemaker Resources

A **pacemaker resource** is a unit of work managed by pacemaker. In pacemaker terminology, resources are services
or applications that pacemaker monitors, starts, stops, and moves between nodes to maintain high availability.

For Two Node OpenShift with Fencing, we manage three resource types:
- **Kubelet**: The Kubernetes node agent and a prerequisite for etcd
- **Etcd**: The distributed key-value store
- **FencingAgent**: Used to isolate failed nodes during a quorum loss event (tracked separately)

### Status Structure

```yaml
status:                                # Optional on creation, populated via status subresource
  conditions:                          # Required when status present (min 3 items)
    - type: Healthy
    - type: InService
    - type: NodeCountAsExpected
  lastUpdated: <timestamp>             # Required when status present, cannot decrease
  nodes:                               # Control-plane nodes (0-5, expects 2 for TNF)
    - name: <hostname>                 # RFC 1123 subdomain name
      addresses:                       # Required: List of node addresses (1-8 items)
        - type: InternalIP             # Currently only InternalIP is supported
          address: <ip>                # First address used for etcd peer URLs
      conditions:                      # Required: Node-level conditions (min 9 items)
        - type: Healthy
        - type: Online
        - type: InService
        - type: Active
        - type: Ready
        - type: Clean
        - type: Member
        - type: FencingAvailable
        - type: FencingHealthy
      resources:                       # Required: Pacemaker resources on this node (min 2)
        - name: Kubelet                # Both Kubelet and Etcd must be present
          conditions:                  # Required: Resource-level conditions (min 8 items)
            - type: Healthy
            - type: InService
            - type: Managed
            - type: Enabled
            - type: Operational
            - type: Active
            - type: Started
            - type: Schedulable
        - name: Etcd
          conditions: [...]            # Same 8 conditions as Kubelet (abbreviated)
      fencingAgents:                   # Required: Fencing agents for THIS node (1-8)
        - name: <nodename>_<method>    # e.g., "master-0_redfish"
          method: <method>           # Fencing method: redfish, ipmi, fence_aws, etc.
          conditions: [...]            # Same 8 conditions as resources (abbreviated)
```

Comment on lines +86 to +88

⚠️ Potential issue | 🟡 Minor

**Align fencing method example with current validation (redfish-only).**

The README lists multiple methods, but the CRD validation currently enforces `_redfish` naming. Either update the example to redfish only, or relax the validation if multiple methods are intended.

📝 Suggested doc tweak:

```diff
-          method: <method>           # Fencing method: redfish, ipmi, fence_aws, etc.
+          method: redfish            # Currently only redfish is supported (aligns with CRD validation)
```

### Fencing Agents

Fencing agents are STONITH (Shoot The Other Node In The Head) devices used to isolate failed nodes.
Unlike regular pacemaker resources (Kubelet, Etcd), fencing agents are tracked separately because:

1. **Mapping by target, not schedule**: Resources are mapped to the node where they are scheduled to run.
Fencing agents are mapped to the node they can *fence* (their target), regardless of which node
their monitoring operations are scheduled on.

2. **Multiple agents per node**: A node can have multiple fencing agents for redundancy
(e.g., both Redfish and IPMI). Expected: 1 per node, supported: up to 8.

3. **Health tracking via two node-level conditions**:
- **FencingAvailable**: True if at least one agent is healthy (fencing works), False if all agents unhealthy (degrades operator)
- **FencingHealthy**: True if all agents are healthy (ideal state), False if any agent is unhealthy (emits warning events)
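
A minimal sketch of that any-vs-all aggregation (`fencingAgent` here is an illustrative stand-in, not the real API type):

```go
package main

import "fmt"

// fencingAgent is a stand-in for the per-agent status in this API.
type fencingAgent struct {
	name    string
	healthy bool
}

// aggregate derives the two node-level conditions: FencingAvailable is true
// if at least one agent is healthy; FencingHealthy only if all of them are.
func aggregate(agents []fencingAgent) (available, healthy bool) {
	healthy = len(agents) > 0
	for _, a := range agents {
		if a.healthy {
			available = true
		} else {
			healthy = false
		}
	}
	return available, healthy
}

func main() {
	agents := []fencingAgent{
		{name: "master-0_redfish", healthy: true},
		{name: "master-0_backup", healthy: false}, // hypothetical second agent
	}
	available, healthy := aggregate(agents)
	fmt.Printf("FencingAvailable=%v FencingHealthy=%v\n", available, healthy)
	// Prints: FencingAvailable=true FencingHealthy=false
}
```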

### Cluster-Level Conditions

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Cluster is healthy (`ClusterHealthy`) | Cluster has issues (`ClusterUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `NodeCountAsExpected` | Node count is as expected (`AsExpected`) | Wrong count (`InsufficientNodes`, `ExcessiveNodes`) |

### Node-Level Conditions

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Node is healthy (`NodeHealthy`) | Node has issues (`NodeUnhealthy`) |
| `Online` | Node is online (`Online`) | Node is offline (`Offline`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Active` | Node is active (`Active`) | Node is in standby (`Standby`) |
| `Ready` | Node is ready (`Ready`) | Node is pending (`Pending`) |
| `Clean` | Node is clean (`Clean`) | Node is unclean (`Unclean`) |
| `Member` | Node is a member (`Member`) | Not a member (`NotMember`) |
| `FencingAvailable` | At least one agent healthy (`FencingAvailable`) | All agents unhealthy (`FencingUnavailable`) - degrades operator |
| `FencingHealthy` | All agents healthy (`FencingHealthy`) | Some agents unhealthy (`FencingUnhealthy`) - emits warnings |

### Resource-Level Conditions

Each resource in the `resources` array and each fencing agent in the `fencingAgents` array has its own conditions.

| Condition | True | False |
|-----------|------|-------|
| `Healthy` | Resource is healthy (`ResourceHealthy`) | Resource has issues (`ResourceUnhealthy`) |
| `InService` | In service (`InService`) | In maintenance (`InMaintenance`) |
| `Managed` | Managed by pacemaker (`Managed`) | Not managed (`Unmanaged`) |
| `Enabled` | Resource is enabled (`Enabled`) | Resource is disabled (`Disabled`) |
| `Operational` | Resource is operational (`Operational`) | Resource has failed (`Failed`) |
| `Active` | Resource is active (`Active`) | Resource is not active (`Inactive`) |
| `Started` | Resource is started (`Started`) | Resource is stopped (`Stopped`) |
| `Schedulable` | Resource is schedulable (`Schedulable`) | Resource is not schedulable (`Unschedulable`) |

### Validation Rules

**Resource naming:**
- Resource name must be "cluster" (singleton)

**Node name validation:**
- Must be a lowercase RFC 1123 subdomain name
- Consists of lowercase alphanumeric characters, '-' or '.'
- Must start and end with an alphanumeric character
- Maximum 253 characters
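
The CRD enforces these rules declaratively; for illustration, the same check exists as an upstream apimachinery helper:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// IsDNS1123Subdomain returns human-readable violations; an empty
	// slice means the name is a valid lowercase RFC 1123 subdomain.
	for _, name := range []string{"master-0.example.com", "Master_0"} {
		fmt.Println(name, validation.IsDNS1123Subdomain(name))
	}
}
```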

**Node addresses:**
- Uses `PacemakerNodeAddress` type (similar to `corev1.NodeAddress` but with IP validation)
- Currently only `InternalIP` type is supported
- Pacemaker allows multiple addresses for Corosync communication between nodes (1-8 addresses)
- The first address in the list is used for IP-based peer URLs for etcd membership
- IP validation:
- Must be a valid global unicast IPv4 or IPv6 address
- Must be in canonical form (e.g., `192.168.1.1` not `192.168.001.001`, or `2001:db8::1` not `2001:0db8::1`)
- Excludes loopback, link-local, and multicast addresses
- Maximum length is 39 characters (full IPv6 address)
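
A sketch of those address rules in plain Go (`net/netip`); the CRD expresses them in CEL, so this illustrates the intent rather than the exact rule text:

```go
package main

import "net/netip"

// isValidPeerAddress mirrors the documented rules: a parseable IP in
// canonical form that is a global unicast address.
func isValidPeerAddress(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	// Canonical form: "192.168.1.1" not "192.168.001.001",
	// "2001:db8::1" not "2001:0db8::1".
	if addr.String() != s {
		return false
	}
	// IsGlobalUnicast excludes loopback, link-local, and multicast.
	return addr.IsGlobalUnicast()
}
```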

**Timestamp validation:**
- `lastUpdated` is required when status is present
- Once set, cannot be set to an earlier timestamp (validation uses `!has(oldSelf.lastUpdated)` to handle initial creation)
- Timestamps must never decrease (prevents stale updates from overwriting newer data)
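
In kubebuilder terms this is a transition rule comparing `self` to `oldSelf`; a sketch of how such a marker could be spelled on the status type (the `!has(...)` guard is quoted from above, the rest is illustrative):

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Sketch of the status type; field set abbreviated.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.lastUpdated) || self.lastUpdated >= oldSelf.lastUpdated",message="lastUpdated may not decrease"
type PacemakerClusterStatus struct {
	// lastUpdated is compared against the prior value on every update.
	LastUpdated metav1.Time `json:"lastUpdated"`
	// conditions and nodes elided
}
```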

**Status fields:**
- `status` - Optional on creation (pointer type), populated via status subresource
- When status is present, all fields within are required:
- `conditions` - Required array of cluster conditions (min 3 items)
- `lastUpdated` - Required timestamp for staleness detection
- `nodes` - Required array of control-plane node statuses (min 0, max 5; empty allowed for catastrophic failures)

**Node fields (when node present):**
- `name` - Required, RFC 1123 subdomain
- `addresses` - Required (min 1, max 8 items)
- `conditions` - Required (min 9 items with specific types enforced via XValidation)
- `resources` - Required (min 2 items: Kubelet and Etcd)
- `fencingAgents` - Required (min 1, max 8 items)

**Conditions validation:**
- Cluster-level: MinItems=3 (Healthy, InService, NodeCountAsExpected)
- Node-level: MinItems=9 (Healthy, Online, InService, Active, Ready, Clean, Member, FencingAvailable, FencingHealthy)
- Resource-level: MinItems=8 (Healthy, InService, Managed, Enabled, Operational, Active, Started, Schedulable)
- Fencing agent-level: MinItems=8 (same conditions as resources)

All condition arrays have XValidation rules to ensure specific condition types are present.
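
For example, a cluster-level marker of roughly this shape would enforce the three required types (a sketch, not copied verbatim from the CRD):

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type clusterStatusSketch struct {
	// +kubebuilder:validation:MinItems=3
	// +kubebuilder:validation:XValidation:rule="self.exists(c, c.type == 'Healthy') && self.exists(c, c.type == 'InService') && self.exists(c, c.type == 'NodeCountAsExpected')",message="required condition types must be present"
	Conditions []metav1.Condition `json:"conditions"`
}
```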

**Resource names:**
- Valid values are: `Kubelet`, `Etcd`
- Both resources must be present in each node's `resources` array

**Fencing agent fields:**
- `name`: The pacemaker resource name (e.g., "master-0_redfish"), max 253 characters
- `method`: The fencing method (e.g., "redfish", "ipmi", "fence_aws"), max 63 characters
- `conditions`: Required, same 8 conditions as resources

### Usage

The cluster-etcd-operator healthcheck controller watches this resource and updates operator conditions based on
the cluster state. The aggregate `Healthy` conditions at each level (cluster, node, resource) provide a quick
way to determine overall health.
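
A sketch of how such a watch might be wired with controller-runtime (the reconciler name and body are illustrative; the real controller lives in cluster-etcd-operator):

```go
package healthcheck

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

// healthcheckReconciler is a hypothetical stand-in for the operator's controller.
type healthcheckReconciler struct {
	client client.Client
}

func (r *healthcheckReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var pc v1alpha1.PacemakerCluster
	if err := r.client.Get(ctx, req.NamespacedName, &pc); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Inspect the aggregate Healthy conditions at each level and update
	// operator conditions accordingly (logic elided).
	return ctrl.Result{}, nil
}

func setup(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&v1alpha1.PacemakerCluster{}).
		Complete(&healthcheckReconciler{client: mgr.GetClient()})
}
```
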
26 changes: 26 additions & 0 deletions etcd/install.go
```go
package etcd

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"

	v1alpha1 "github.com/openshift/api/etcd/v1alpha1"
)

const (
	GroupName = "etcd.openshift.io"
)

var (
	schemeBuilder = runtime.NewSchemeBuilder(v1alpha1.Install)
	// Install is a function which adds every version of this group to a scheme
	Install = schemeBuilder.AddToScheme
)

func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func Kind(kind string) schema.GroupKind {
	return schema.GroupKind{Group: GroupName, Kind: kind}
}
```
3 changes: 3 additions & 0 deletions etcd/v1alpha1/Makefile
```make
.PHONY: test
test:
	make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1"
```
Comment on lines +1 to +3

⚠️ Potential issue | 🟡 Minor

**Add minimal all/clean targets to satisfy checkmake.**

Static analysis flags missing required phony targets; this can break linting if enforced.

🧹 Suggested Makefile tweaks:

```diff
-.PHONY: test
+.PHONY: all clean test
+all: test
+clean:
+	@true
 test:
 	make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1"
```

🧰 Tools: 🪛 checkmake (0.2.2)

- [warning] 1-1: Missing required phony target "all" (minphony)
- [warning] 1-1: Missing required phony target "clean" (minphony)

6 changes: 6 additions & 0 deletions etcd/v1alpha1/doc.go
```go
// +k8s:deepcopy-gen=package,register
// +k8s:defaulter-gen=TypeMeta
// +k8s:openapi-gen=true
// +openshift:featuregated-schema-gen=true
// +groupName=etcd.openshift.io
package v1alpha1
```
39 changes: 39 additions & 0 deletions etcd/v1alpha1/register.go
```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

var (
	GroupName     = "etcd.openshift.io"
	GroupVersion  = schema.GroupVersion{Group: GroupName, Version: "v1alpha1"}
	schemeBuilder = runtime.NewSchemeBuilder(addKnownTypes)
	// Install is a function which adds this version to a scheme
	Install = schemeBuilder.AddToScheme

	// SchemeGroupVersion generated code relies on this name
	// Deprecated
	SchemeGroupVersion = GroupVersion
	// AddToScheme exists solely to keep the old generators creating valid code
	// DEPRECATED
	AddToScheme = schemeBuilder.AddToScheme
)

// Resource generated code relies on this being here, but it logically belongs to the group
// DEPRECATED
func Resource(resource string) schema.GroupResource {
	return schema.GroupResource{Group: GroupName, Resource: resource}
}

func addKnownTypes(scheme *runtime.Scheme) error {
	metav1.AddToGroupVersion(scheme, GroupVersion)

	scheme.AddKnownTypes(GroupVersion,
		&PacemakerCluster{},
		&PacemakerClusterList{},
	)

	return nil
}
```