diff --git a/.tide/designs/blackbox-endpoint-monitoring.md b/.tide/designs/blackbox-endpoint-monitoring.md new file mode 100644 index 0000000..1640761 --- /dev/null +++ b/.tide/designs/blackbox-endpoint-monitoring.md @@ -0,0 +1,339 @@ +# Blackbox Endpoint Monitoring + +**Status:** Draft +**Date:** 2026-04-12 +**Scope:** Platform -- Monitoring, Alerting, PagerDuty + +--- + +## Problem + +`grafana.prod.platform.sei.io` went unreachable due to a DNS delegation issue. We had no alert for it. Our monitoring stack (kube-prometheus-stack) watches internal service health -- it does not probe the public endpoints that users actually hit. DNS resolution failures, TLS expiry, and HTTP routing misconfigurations are invisible until someone notices manually. + +## Solution + +Deploy `prometheus-blackbox-exporter` as a separate HelmRelease in the monitoring namespace. Use the Prometheus Operator `Probe` CRD to define target endpoints, and a `PrometheusRule` to alert when probes fail. Routes to `pagerduty-platform` via the existing `team: platform` label matcher. + +--- + +## Architecture + +``` +Prometheus (kube-prometheus-stack) + | + | scrapes Probe targets via blackbox exporter + | +blackbox-exporter (monitoring namespace, port 9115) + | + | HTTP requests to public endpoints + | +sei-gateway (NLB, TLS termination) + | + +-- grafana.prod.platform.sei.io --> Grafana + +-- grafana.pacific-1.seinetwork.io --> 301 redirect + +-- grafana.atlantic-2.seinetwork.io --> 301 redirect + +-- grafana.arctic-1.seinetwork.io --> 301 redirect +``` + +The blackbox exporter is a stateless HTTP probe runner. Prometheus scrapes it by passing the target URL as a query parameter. The `Probe` CRD automates this -- prometheus-operator translates `Probe` resources into the correct scrape config targeting the blackbox exporter's `/probe` endpoint. + +No new Prometheus instance. No new ServiceMonitor plumbing. 
The existing kube-prometheus-stack Prometheus discovers `Probe` and `PrometheusRule` resources via label selectors already configured (it uses `release: sei-prod` or equivalent; the new resources carry matching labels). + +--- + +## Probe Configuration + +### Blackbox Exporter Modules + +Configured in the HelmRelease values. Two modules are sufficient: + +| Module | Purpose | Config | +|--------|---------|--------| +| `http_2xx` | Validates endpoint returns a 2xx status, TLS is valid, DNS resolves | `prober: http`, `method: GET`, `fail_if_ssl: false`, `preferred_ip_protocol: ip4` | +| `http_301` | Validates legacy domains redirect (301/302) | `prober: http`, `method: GET`, `valid_status_codes: [301, 302]`, `follow_redirects: false` | + +The `http_2xx` module performs the full chain: DNS resolution, TCP connect, TLS handshake, HTTP request, status code validation. The `probe_dns_lookup_time_seconds`, `probe_ssl_earliest_cert_expiry`, and `probe_success` metrics are exported automatically. + +### Probe Resources + +**Primary endpoint probe:** + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: Probe +metadata: + name: grafana-prod + namespace: monitoring + labels: + release: sei-prod +spec: + interval: 60s + module: http_2xx + prober: + url: blackbox-exporter-prometheus-blackbox-exporter.monitoring:9115 + targets: + staticConfig: + static: + - https://grafana.prod.platform.sei.io + labels: + probe_group: platform-endpoints +``` + +**Legacy redirect probe:** + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: Probe +metadata: + name: grafana-legacy-redirects + namespace: monitoring + labels: + release: sei-prod +spec: + interval: 120s + module: http_301 + prober: + url: blackbox-exporter-prometheus-blackbox-exporter.monitoring:9115 + targets: + staticConfig: + static: + - https://grafana.pacific-1.seinetwork.io + - https://grafana.atlantic-2.seinetwork.io + - https://grafana.arctic-1.seinetwork.io + labels: + probe_group: legacy-redirects +``` + +### Scrape Intervals + +- 
Primary endpoints: 60s. Frequent enough to catch issues within 2-3 minutes (with a `for: 2m` alert), infrequent enough to not be noisy. +- Legacy redirects: 120s. These are lower priority -- a broken redirect is annoying but not an outage. + +--- + +## Alert Rules + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + name: blackbox-endpoint-alerts + namespace: monitoring + labels: + release: sei-prod + team: platform +spec: + groups: + - name: blackbox-endpoints + rules: + - alert: EndpointDown + expr: probe_success{probe_group="platform-endpoints"} == 0 + for: 2m + labels: + severity: critical + team: platform + annotations: + summary: "{{ $labels.instance }} is unreachable" + description: >- + Blackbox probe to {{ $labels.instance }} has been failing for 2 minutes. + This could indicate DNS, TLS, or HTTP routing failure. + runbook_url: https://wiki.sei.io/platform/runbooks/endpoint-down + + - alert: EndpointSSLCertExpiringSoon + expr: probe_ssl_earliest_cert_expiry{probe_group="platform-endpoints"} - time() < 7 * 24 * 3600 + for: 10m + labels: + severity: warning + team: platform + annotations: + summary: "TLS cert for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}" + description: >- + The TLS certificate for {{ $labels.instance }} expires within 7 days. + cert-manager should have renewed it automatically -- investigate why it did not. + + - alert: EndpointHighLatency + expr: probe_duration_seconds{probe_group="platform-endpoints"} > 5 + for: 5m + labels: + severity: warning + team: platform + annotations: + summary: "{{ $labels.instance }} probe latency > 5s" + description: >- + Blackbox probe to {{ $labels.instance }} is taking over 5 seconds. + Check DNS resolution time (probe_dns_lookup_time_seconds) and + HTTP response time for the slow component. 
+ + - alert: LegacyRedirectDown + expr: probe_success{probe_group="legacy-redirects"} == 0 + for: 5m + labels: + severity: warning + team: platform + annotations: + summary: "Legacy redirect {{ $labels.instance }} is failing" + description: >- + Legacy Grafana redirect at {{ $labels.instance }} has been failing for 5 minutes. + Expected a 301/302 response. +``` + +### Alert Routing + +All rules carry `team: platform`. The existing Alertmanager config matches `team: platform` to the `pagerduty-platform` receiver. No Alertmanager changes needed. + +Severity mapping: +- `critical` (EndpointDown) -- pages on-call immediately via PagerDuty +- `warning` (SSL expiry, latency, legacy redirects) -- PagerDuty low-urgency or Slack, depending on existing routing rules + +--- + +## File Layout + +All files live in the platform repo. Base manifests with kustomize overlays per environment. + +``` +clusters/prod/monitoring/ + kustomization.yaml # add blackbox-exporter.yaml, probes, alerts + blackbox-exporter.yaml # HelmRelease + probe-grafana-prod.yaml # Probe CRD for primary endpoint + probe-grafana-legacy-redirects.yaml # Probe CRD for legacy redirects + prometheusrule-blackbox.yaml # PrometheusRule with alert definitions + +clusters/dev/monitoring/ + kustomization.yaml # add blackbox-exporter.yaml (no probes -- see below) + blackbox-exporter.yaml # HelmRelease (same chart, minimal config) +``` + +### Why No Base + +There are only 4 files and the probe targets differ between environments. A `manifests/base/monitoring/blackbox/` with kustomize patches would add indirection without reducing duplication. If we add a third environment or probe targets grow significantly, extract a base then. + +### Dev Environment + +Dev does not expose Grafana publicly -- it is accessed via port-forward. There are no public endpoints to probe. 
The blackbox exporter HelmRelease is deployed in dev for consistency (so the chart upgrade path is tested in dev before prod), but no `Probe` or `PrometheusRule` resources are created. When dev gains public endpoints, add probes then. + +--- + +## Deployment + +### HelmRelease + +```yaml +apiVersion: helm.toolkit.fluxcd.io/v2 +kind: HelmRelease +metadata: + name: blackbox-exporter + namespace: monitoring +spec: + interval: 1h + chart: + spec: + chart: prometheus-blackbox-exporter + version: "9.x" # pin to latest 9.x + sourceRef: + kind: HelmRepository + name: prometheus-community + namespace: flux-system + values: + config: + modules: + http_2xx: + prober: http + timeout: 10s + http: + method: GET + preferred_ip_protocol: ip4 + follow_redirects: true + fail_if_ssl: false + http_301: + prober: http + timeout: 10s + http: + method: GET + preferred_ip_protocol: ip4 + follow_redirects: false + valid_status_codes: + - 301 + - 302 + replicas: 1 + resources: + requests: + cpu: 10m + memory: 32Mi + limits: + memory: 64Mi + serviceMonitor: + enabled: true + labels: + release: sei-prod +``` + +### Flux Reconciliation + +No ordering dependencies. The blackbox exporter HelmRelease, Probe resources, and PrometheusRule can all be applied in any order: + +- The HelmRelease deploys the exporter pod and service. +- The Probe resources are picked up by the existing Prometheus instance on its next reconciliation cycle (typically within 60s). +- If the exporter pod is not yet running when Prometheus first scrapes, the scrape itself fails: `up` is 0 for the probe target and no `probe_success` sample is recorded, so `EndpointDown` cannot fire. Once the exporter is serving, the `for: 2m` duration on the alert absorbs transient probe failures during rollout. + +Add all four files to `clusters/prod/monitoring/kustomization.yaml`: + +```yaml +resources: + # ... 
existing resources + - blackbox-exporter.yaml + - probe-grafana-prod.yaml + - probe-grafana-legacy-redirects.yaml + - prometheusrule-blackbox.yaml +``` + +--- + +## Limitations + +### In-Cluster DNS Resolution (Critical Tradeoff) + +The blackbox exporter runs inside the cluster. Its DNS queries go through CoreDNS, which forwards external names to upstream resolvers (typically the VPC DNS resolver in AWS). This means: + +**What it catches:** +- TLS certificate expiry or misconfiguration +- HTTP routing errors (wrong backend, 502/503 from Gateway) +- Application-level failures (Grafana itself is down) +- DNS record deletion or misconfiguration (CNAME/A record pointing to wrong target) +- Gateway or NLB failures + +**What it does NOT catch:** +- DNS delegation failures between public resolvers and Route53 (the exact incident that prompted this work). The VPC resolver can resolve Route53-hosted zones directly without traversing the public delegation chain. +- Public DNS propagation delays +- ISP-level DNS issues + +This is a real gap. The incident that motivated this design -- a DNS delegation issue -- would likely not have been caught by an in-cluster probe. + +### Mitigation Options + +1. **Accept the limitation for now.** The in-cluster probe still catches 4 out of 5 failure modes (TLS, routing, application, NLB). DNS delegation changes are rare events (this was the first one). Ship this, then evaluate whether an external probe is worth the operational cost. + +2. **Force public DNS resolution.** Configure the blackbox exporter to use a public DNS resolver (e.g., `8.8.8.8`) instead of CoreDNS. This is possible via a custom `resolv.conf` on the pod (`dnsPolicy: None`, `dnsConfig.nameservers: [8.8.8.8]`). However, this breaks resolution for any in-cluster targets we might add later, and it adds an external dependency (Google DNS) to our monitoring path. + +3. 
**External synthetic monitoring.** Use a SaaS probe (Datadog Synthetic, Uptime Robot, AWS Route53 Health Checks, Grafana Cloud Synthetic Monitoring). These run from outside the cluster and exercise the full public DNS chain. This is the correct long-term answer but adds a vendor dependency and is out of scope for this design. + +**Recommendation:** Option 1 (ship it with the known limitation) plus a Route53 Health Check for `grafana.prod.platform.sei.io` as a cheap external signal. Route53 Health Checks cost ~$0.75/month per endpoint, run from multiple AWS regions, and can trigger CloudWatch alarms that feed into PagerDuty. This gives us external DNS validation without a full synthetic monitoring vendor. The Route53 Health Check is a separate task -- it lives in Terraform, not in the platform repo's monitoring stack. + +### Other Limitations + +- **Probing from a single location.** The exporter runs in one cluster in one region. It cannot detect region-specific routing issues. Acceptable for our single-cluster setup. +- **No content validation.** The probe checks HTTP status codes, not response body content. A Grafana login page returning 200 with an error message would not trigger an alert. Blackbox exporter supports `fail_if_body_not_matches_regexp`, but this is fragile and not worth the maintenance burden for a login page. +- **No WebSocket probing.** The blackbox exporter does not support WebSocket health checks. If we add EVM WebSocket endpoints to the probe list later, we would need a TCP probe (which validates connectivity but not protocol correctness). + +--- + +## Future Expansion + +When `platform.sei.io` public RPC endpoints go live (per the public DNS design), add probes for: + +``` +https://pacific-1-rpc-rpc.pacific-1.platform.sei.io/status +https://pacific-1-rpc-evm.pacific-1.platform.sei.io (POST with eth_blockNumber) +``` + +These would use a new `http_2xx_post` module for JSON-RPC endpoints. 
Keep them in a separate `Probe` resource with a `probe_group: rpc-endpoints` label for distinct alerting rules (these are protocol-team endpoints, not platform-team). diff --git a/.tide/designs/inplace-update-strategy.md b/.tide/designs/inplace-update-strategy.md index fae5af8..0073e04 100644 --- a/.tide/designs/inplace-update-strategy.md +++ b/.tide/designs/inplace-update-strategy.md @@ -2,11 +2,11 @@ ## Summary -The InPlace update strategy is a lightweight, operator-driven deployment mode that propagates spec changes (image, sidecar image) directly to existing SeiNode resources without creating entrant nodes or performing blue-green traffic switching. The controller's role is confined to change propagation, health monitoring, and status reporting. +The InPlace update strategy is a lightweight, operator-driven deployment mode that propagates spec changes (image, sidecar image) directly to existing SeiNode resources without creating entrant nodes or performing blue-green traffic switching. Like BlueGreen and HardFork, InPlace uses the deployment plan/task machinery — but with a minimal plan that approves readiness and monitors convergence. This design also formalizes `updateStrategy` as a required field on SeiNodeDeployment. The previous implicit nil path (fire-and-forget in-place updates with no tracking) is removed -- all deployments must declare an explicit strategy: `InPlace`, `BlueGreen`, or `HardFork`. -The critical technical challenge is the sidecar mark-ready gate: when a pod restarts after an image update, the sidecar starts fresh and returns 503 from `/v0/healthz`, blocking seid startup indefinitely. This design solves that by having the SeiNode controller's Running phase reconciler submit `mark-ready` unconditionally on every reconcile. +The critical technical challenge is the sidecar mark-ready gate: when a pod restarts after an image update, the sidecar starts fresh and returns 503 from `/v0/healthz`, blocking seid startup indefinitely. 
This design solves that via a signal-and-react pattern: the SeiNodeDeployment's InPlace plan writes a readiness approval onto each SeiNode, and the SeiNode controller reacts by submitting `mark-ready` to its own sidecar. This preserves the single-writer invariant — only the SeiNode controller talks to the sidecar. ## Motivation @@ -62,17 +62,26 @@ This pattern extends naturally to BlueGreen and HardFork. Today those strategies 1. **Engineer updates manifests.** The engineer changes `spec.template.spec.image` (or sidecar image) on the SeiNodeDeployment and applies the change (via GitOps push, `kubectl apply`, etc.). The engineer is responsible for timing -- the controller does not validate block height. -2. **templateHash diverges; condition set.** The SeiNodeDeployment controller's `reconcileSeiNodes` detects that the current `templateHash` differs from `status.templateHash`. With `updateStrategy.type == InPlace`, it sets the `RolloutInProgress` condition and writes a `RolloutStatus` to `status.rollout`. +2. **templateHash diverges; condition set; plan created.** The SeiNodeDeployment controller's `reconcileSeiNodes` detects that the current `templateHash` differs from `status.templateHash`. With `updateStrategy.type == InPlace`, it sets the `RolloutInProgress` condition, writes a `RolloutStatus` to `status.rollout`, and generates an InPlace deployment plan. -3. **ensureSeiNode propagates changes.** The `ensureSeiNode` loop patches each child SeiNode's image. All nodes are updated simultaneously -- chain upgrades are coordinated halts where sequential rollout provides no safety benefit. +3. **Plan step: UpdateNodeSpecs.** The plan's first task patches each child SeiNode's image via the kube client. All nodes are updated simultaneously -- chain upgrades are coordinated halts where sequential rollout provides no safety benefit. The SeiNode controller converges StatefulSets in its `reconcileRunning` via SSA. 
Kubernetes detects the pod template change and terminates the old pod, scheduling a new one with the updated image. -4. **SeiNode controller converges StatefulSets.** Each SeiNode's `reconcileRunning` calls `reconcileNodeStatefulSet`, which applies the updated StatefulSet spec via SSA. Kubernetes detects the pod template change and terminates the old pod, scheduling a new one with the updated image. +4. **Plan step: MarkReady.** The sidecar starts fresh on the new pod and returns 503 from `/v0/healthz`, blocking seid. The `MarkReady` task polls each node's sidecar via `sidecarClientForNode` (the same pattern used by `awaitNodesCaughtUpExecution`). Once the sidecar is reachable, it submits `mark-ready`. Once the sidecar reports `Ready`, the task completes. The sidecar flips `/v0/healthz` to 200, and seid starts via the wait wrapper. -5. **Sidecar restarts fresh; controller submits mark-ready.** The new pod's sidecar starts clean. The `reconcileRunning` method submits `mark-ready` unconditionally (it is idempotent). The sidecar flips to ready, `/v0/healthz` returns 200, and seid starts via the wait wrapper. +5. **Plan completes.** The plan is marked complete, the controller clears `RolloutInProgress`, clears `status.rollout`, and updates `status.templateHash`. -6. **SeiNodeDeployment controller monitors convergence.** On each reconcile, the controller checks the rollout: for each node, it reads the child SeiNode's phase and pod readiness. When all nodes are Running with ready pods, the rollout is complete. The controller clears `RolloutInProgress`, clears `status.rollout`, and updates `status.templateHash`. +6. **Failure detection.** If a node's pod enters CrashLoopBackOff or the sidecar never becomes reachable, the plan step stays in `ExecutionRunning`. The rollout status shows per-node state. The controller does NOT auto-rollback -- blockchain rollback after a chain upgrade would leave the node unable to process new blocks. 
The engineer inspects the status and decides: push a fix, revert the image, or investigate. -7. **Failure detection.** If a node's pod enters CrashLoopBackOff or the SeiNode transitions to Failed, the rollout status reflects this per-node. The controller does NOT auto-rollback -- blockchain rollback after a chain upgrade would leave the node unable to process new blocks. The engineer inspects the status and decides: push a fix, revert the image, or investigate. +### InPlace Plan + +``` +UpdateNodeSpecs → MarkReady +``` + +| Task | Executor | Description | +|------|----------|-------------| +| `UpdateNodeSpecs` | SeiNodeDeployment | Patches each child SeiNode's image via kube client. The SeiNode controller converges StatefulSets. | +| `MarkReady` | SeiNodeDeployment | Polls each node's sidecar via `sidecarClientForNode`. Once reachable, submits `mark-ready`. Completes when all sidecars report `Ready`. | ## CRD Changes @@ -203,15 +212,25 @@ The existing `DeploymentStatus` type and `Deployment` field are removed. BlueGre ### Conditions -New condition type: +New condition types: ```go +// SeiNodeDeployment conditions: const ( // ConditionRolloutInProgress indicates a rollout is active. // Set when a templateHash divergence is detected. Cleared when // all nodes converge or the rollout is superseded. ConditionRolloutInProgress = "RolloutInProgress" ) + +// SeiNode conditions: +const ( + // ConditionReadinessApproved is set by the SeiNodeDeployment + // controller to signal that the SeiNode controller should submit + // mark-ready to the sidecar. Cleared by the SeiNode controller + // after successful submission. + ConditionReadinessApproved = "ReadinessApproved" +) ``` ## Controller Changes @@ -261,7 +280,72 @@ func (r *SeiNodeDeploymentReconciler) detectDeploymentNeeded(group *seiv1alpha1. ### `ensureSeiNode` (nodedeployment/nodes.go) -No changes needed. The existing in-place propagation already handles image and sidecar updates. 
When `RolloutInProgress` is true and the strategy is InPlace, `ensureSeiNode` runs normally (no plan blocks it). For BlueGreen/HardFork, the existing plan machinery takes over. +No changes needed to the core logic. The existing in-place propagation already handles image and sidecar updates. The InPlace plan's `UpdateNodeSpecs` task calls `ensureSeiNode` and additionally sets the `ReadinessApproved` condition on each child SeiNode. + +### InPlace deployment planner (planner/deployment.go) + +Add an `inPlaceDeploymentPlanner` that generates the minimal two-task plan: + +```go +func (p *inPlaceDeploymentPlanner) BuildPlan( + group *seiv1alpha1.SeiNodeDeployment, +) (*seiv1alpha1.TaskPlan, error) { + planID := uuid.New().String() + nodeNames := group.Status.IncumbentNodes + ns := group.Namespace + + prog := []struct { + taskType string + params any + }{ + {task.TaskTypeUpdateNodeSpecs, &task.UpdateNodeSpecsParams{ + GroupName: group.Name, + Namespace: ns, + NodeNames: nodeNames, + }}, + {task.TaskTypeAwaitNodesRunning, &task.AwaitNodesRunningParams{ + GroupName: group.Name, + Namespace: ns, + Expected: len(nodeNames), + NodeNames: nodeNames, + }}, + } + + tasks := make([]seiv1alpha1.PlannedTask, len(prog)) + // Loop variable is named step so it does not shadow the receiver p. + for i, step := range prog { + t, err := buildPlannedTask(planID, step.taskType, i, step.params) + if err != nil { + return nil, err + } + tasks[i] = t + } + return &seiv1alpha1.TaskPlan{ID: planID, Phase: seiv1alpha1.TaskPlanActive, Tasks: tasks}, nil +} +``` + +### UpdateNodeSpecs task (task/deployment_update.go) + +New task type. Patches each child SeiNode's image via the existing `ensureSeiNode` mechanism and sets the `ReadinessApproved` condition: + +```go +const TaskTypeUpdateNodeSpecs = "update-node-specs" + +type UpdateNodeSpecsParams struct { + GroupName string `json:"groupName"` + Namespace string `json:"namespace"` + NodeNames []string `json:"nodeNames"` +} +``` + +The task execution: +1. Lists child SeiNodes by name +2. 
For each node, updates spec fields from the parent deployment template (image, sidecar image) +3. Sets the `ReadinessApproved` condition on the SeiNode's status +4. Completes synchronously + +### SeiNode controller changes (node/controller.go) + +The `reconcileRunning` method gains a `shouldMarkReady` check that reacts to the `ReadinessApproved` condition. See the code in the Sidecar Mark-Ready Resolution section below. ### Rollout status reconciliation (nodedeployment/status.go) @@ -337,11 +421,17 @@ The sidecar starts fresh on every pod restart. Its `/v0/healthz` endpoint return After an in-place image update, the StatefulSet rolls the pod. The new sidecar starts, binds its port, and returns 503 from `/v0/healthz`. The seid container's wait wrapper polls `/v0/healthz` and blocks forever. -### Solution: Controller Re-submits Mark-Ready +### Solution: Signal-and-React via ReadinessApproved -The `reconcileRunning` method submits `mark-ready` unconditionally on every reconcile when the sidecar is reachable. The `mark-ready` task is fire-and-forget and idempotent -- submitting it to an already-ready sidecar is a no-op. +The solution preserves a clean separation of concerns: + +- **SeiNodeDeployment controller** (orchestrator) writes intent onto child SeiNodes via a `ReadinessApproved` condition as part of the InPlace deployment plan. +- **SeiNode controller** (executor) observes the condition, submits `mark-ready` to the sidecar through its existing client, and clears the condition. + +This maintains the **single-writer invariant**: only the SeiNode controller holds a sidecar client and submits tasks. The SeiNodeDeployment controller never talks to the sidecar directly. The Kubernetes resource (SeiNode) is the communication channel between the two controllers. 
```go +// In reconcileRunning (SeiNode controller): func (r *SeiNodeReconciler) reconcileRunning(ctx context.Context, node *seiv1alpha1.SeiNode) (ctrl.Result, error) { if err := r.reconcileNodeStatefulSet(ctx, node); err != nil { return ctrl.Result{}, fmt.Errorf("reconciling statefulset: %w", err) @@ -357,24 +447,46 @@ func (r *SeiNodeReconciler) reconcileRunning(ctx context.Context, node *seiv1alp return ctrl.Result{RequeueAfter: statusPollInterval}, nil } - r.ensureMarkReady(ctx, node, sc) + // React to readiness approval from the deployment controller, + // or self-approve if standalone (no owner reference). + if r.shouldMarkReady(node) { + r.submitMarkReady(ctx, node, sc) + } return r.reconcileRuntimeTasks(ctx, node, sc) } -func (r *SeiNodeReconciler) ensureMarkReady(ctx context.Context, node *seiv1alpha1.SeiNode, sc task.SidecarClient) { +func (r *SeiNodeReconciler) shouldMarkReady(node *seiv1alpha1.SeiNode) bool { + if hasCondition(node, seiv1alpha1.ConditionReadinessApproved) { + return true + } + // Standalone fallback: no parent deployment, self-approve + if !hasOwnerOfKind(node, "SeiNodeDeployment") { + return true + } + return false +} + +func (r *SeiNodeReconciler) submitMarkReady(ctx context.Context, node *seiv1alpha1.SeiNode, sc task.SidecarClient) { req := sidecar.TaskRequest{Type: sidecar.TaskTypeMarkReady} if _, err := sc.SubmitTask(ctx, req); err != nil { log.FromContext(ctx).V(1).Info("mark-ready submission failed", "error", err) + return + } + // Clear the condition after successful submission + removeCondition(node, seiv1alpha1.ConditionReadinessApproved) + if err := r.Status().Update(ctx, node); err != nil { + log.FromContext(ctx).Info("failed to clear ReadinessApproved", "error", err) } } ``` This approach: -- Requires no sidecar changes -- Solves the problem for ALL strategies, not just InPlace -- Keeps the sidecar stateless by design -- Is safe because mark-ready is idempotent +- Keeps sidecar interaction exclusively in the SeiNode 
controller +- Uses Kubernetes resources as the communication channel (idiomatic) +- Provides exactly-once mark-ready via the plan (not fire-and-forget on every reconcile) +- Handles standalone SeiNodes via owner reference check +- Aligns with Cluster API (Cluster orchestrates, Machine executes) and Crossplane (Composite drives, Managed actuates) patterns ### Open Question: Block Height Sourcing @@ -474,14 +586,16 @@ If the operator changes the image again while a rollout is active, `RolloutInPro | File | Change | |------|--------| | `api/v1alpha1/seinodedeployment_types.go` | Add `UpdateStrategyInPlace` to enum. Replace `DeploymentStatus` with unified `RolloutStatus`. Make `UpdateStrategy` required. Add `ConditionRolloutInProgress`. Remove `Deployment` field, add `Rollout` field. | +| `api/v1alpha1/seinode_types.go` | Add `ConditionReadinessApproved` constant. | | `api/v1alpha1/zz_generated.deepcopy.go` | Regenerated | | `manifests/crd/bases/sei.io_seinodedeployments.yaml` | Regenerated | | `internal/controller/nodedeployment/nodes.go` | `detectDeploymentNeeded`: set `RolloutInProgress` condition, create unified `RolloutStatus`. Migrate BlueGreen/HardFork to use `RolloutStatus`. | | `internal/controller/nodedeployment/status.go` | Add `reconcileRolloutStatus`. Extend `computeGroupPhase` for `RolloutInProgress` condition. | | `internal/controller/nodedeployment/plan.go` | Read `RolloutStatus` instead of `DeploymentStatus` for BlueGreen/HardFork plan generation. | -| `internal/planner/deployment.go` | Read entrant/incumbent from `RolloutStatus` instead of `DeploymentStatus`. | -| `internal/controller/node/controller.go` | `reconcileRunning`: add `ensureMarkReady` call before `reconcileRuntimeTasks`. | -| `internal/controller/node/plan_execution.go` | Add `ensureMarkReady` method. | +| `internal/planner/deployment.go` | Add `inPlaceDeploymentPlanner` with `UpdateNodeSpecs → AwaitRunning` plan. Read entrant/incumbent from `RolloutStatus` instead of `DeploymentStatus`. 
| +| `internal/task/deployment_update.go` | New file. `UpdateNodeSpecs` task: patches child SeiNode specs and sets `ReadinessApproved` condition. | +| `internal/task/deployment.go` | Register `TaskTypeUpdateNodeSpecs`. | +| `internal/controller/node/controller.go` | `reconcileRunning`: add `shouldMarkReady` / `submitMarkReady` for signal-and-react pattern. | ## Test Plan @@ -493,20 +607,26 @@ If the operator changes the image again while a rollout is active, `RolloutInPro | `TestDetectDeploymentNeeded_InPlace_AlreadyActive` | `nodes_test.go` | `RolloutInProgress=True` prevents duplicate detection | | `TestDetectDeploymentNeeded_BlueGreen_MigratedToRollout` | `nodes_test.go` | BlueGreen creates `RolloutStatus` with entrant/incumbent fields | | `TestBuildRolloutNodes` | `nodes_test.go` | Creates entries for each incumbent node | +| `TestInPlacePlan_TwoTasks` | `deployment_test.go` | InPlace plan has exactly: UpdateNodeSpecs, AwaitRunning | +| `TestUpdateNodeSpecs_SetsReadinessApproved` | `deployment_update_test.go` | Task patches SeiNode image and sets `ReadinessApproved` condition | | `TestReconcileRolloutStatus_AllReady` | `status_test.go` | Rollout cleared, templateHash updated, `RolloutInProgress` set to False | | `TestReconcileRolloutStatus_Partial` | `status_test.go` | Rollout persists with mixed ready/not-ready | | `TestReconcileRolloutStatus_WithFailedNode` | `status_test.go` | Shows `ready: false, phase: Failed` | | `TestComputeGroupPhase_RolloutInProgress` | `status_test.go` | Returns `Upgrading` when `RolloutInProgress` is True | -| `TestEnsureMarkReady` | `reconciler_test.go` | Submits mark-ready task via sidecar client | -| `TestReconcileRunning_SubmitsMarkReady` | `reconciler_test.go` | mark-ready called before runtime tasks | +| `TestShouldMarkReady_WithApproval` | `reconciler_test.go` | Returns true when `ReadinessApproved` condition is present | +| `TestShouldMarkReady_Standalone` | `reconciler_test.go` | Returns true when no SeiNodeDeployment owner 
reference | +| `TestShouldMarkReady_ManagedNoApproval` | `reconciler_test.go` | Returns false when owned by SeiNodeDeployment but no approval | +| `TestSubmitMarkReady_ClearsCondition` | `reconciler_test.go` | Submits mark-ready and removes `ReadinessApproved` condition | ## Implementation Order -1. **ensureMarkReady.** Add to SeiNode controller's `reconcileRunning`. Ships independently -- unblocks pod restarts for all strategies. Highest priority. -2. **CRD types.** Add `InPlace` enum, unified `RolloutStatus`, `ConditionRolloutInProgress`. Make `updateStrategy` required. Remove `DeploymentStatus`. `make manifests generate`. -3. **detectDeploymentNeeded refactor.** Set `RolloutInProgress` condition, write unified `RolloutStatus`. Migrate BlueGreen/HardFork. -4. **Rollout status reconciliation.** Add `reconcileRolloutStatus`, extend `computeGroupPhase`. -5. **Events.** Emit `RolloutStarted`, `RolloutComplete`. -6. **Tests.** Unit tests for each step. +1. **CRD types.** Add `InPlace` enum, `ConditionReadinessApproved` on SeiNode, unified `RolloutStatus`, `ConditionRolloutInProgress`. Make `updateStrategy` required. Remove `DeploymentStatus`. `make manifests generate`. +2. **SeiNode mark-ready signal-and-react.** Add `shouldMarkReady` / `submitMarkReady` to the SeiNode controller's `reconcileRunning`. Reacts to `ReadinessApproved` condition. Standalone fallback via owner reference check. +3. **InPlace deployment planner.** Add `inPlaceDeploymentPlanner` with `UpdateNodeSpecs → AwaitRunning` plan. Register in `ForDeployment`. +4. **UpdateNodeSpecs task.** New task type that patches child SeiNode specs and sets `ReadinessApproved`. +5. **detectDeploymentNeeded refactor.** Set `RolloutInProgress` condition, write unified `RolloutStatus`. Migrate BlueGreen/HardFork. Add zero-value migration handling. +6. **Rollout status reconciliation.** Add `reconcileRolloutStatus`, extend `computeGroupPhase`. Add stalled escalation. +7. 
**Events.** Emit `RolloutStarted`, `RolloutComplete`, `RolloutStalled`. +8. **Tests.** Unit tests for each step. -Step 1 is the highest-priority standalone fix -- it resolves the sidecar restart problem that currently blocks all image updates on Running nodes. +Steps 1-2 can ship independently as they unblock pod restarts for standalone SeiNodes. Steps 3-7 complete the InPlace strategy with full plan-based orchestration. diff --git a/.tide/designs/internal-networking-use-cases.md b/.tide/designs/internal-networking-use-cases.md new file mode 100644 index 0000000..ccfa56f --- /dev/null +++ b/.tide/designs/internal-networking-use-cases.md @@ -0,0 +1,512 @@ +# Internal Networking Use Cases for sei-k8s-controller + +Design exploration: how the operator's networking primitives serve internal development, testing, and load testing workflows where the topology does not match production. + +--- + +## Networking Primitives Available Today + +The operator exposes two layers of networking: + +**Per-node (SeiNode controller):** +- Headless Service (`ClusterIP: None`) with `PublishNotReadyAddresses: true`, named identically to the SeiNode. Provides stable DNS at `{node-name}-0.{node-name}.{namespace}.svc.cluster.local` for every port seid exposes. This exists unconditionally for every node. + +**Per-group (SeiNodeGroup controller):** +- External Service (`{group-name}-external`) -- ClusterIP, LoadBalancer, or NodePort. Selector targets `sei.io/nodegroup: {name}` (plus `sei.io/revision` during deployments). Ports derived from node mode via `seiconfig.NodePortsForMode()`. +- HTTPRoute (Gateway API) -- routes baseDomain subdomains (`rpc.*`, `rest.*`, `grpc.*`, `evm-rpc.*`, `evm-ws.*`) to the external Service. Requires Gateway CRDs and a Gateway implementation. +- AuthorizationPolicy (Istio) -- ALLOW policy restricting which identities can reach pods. Auto-injects the controller SA. +- ServiceMonitor (Prometheus Operator) -- scrapes the `metrics` port. 
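The naming conventions above can be captured in two small helpers. This is a minimal sketch, not the operator's code: `HeadlessPodFQDN` and `ExternalServiceFQDN` are hypothetical names, the formats are copied from the patterns described in this section (`{node-name}-0.{node-name}.{namespace}.svc.cluster.local` and `{group-name}-external`), and the `cluster.local` suffix assumes the default cluster domain.

```go
package main

import "fmt"

// HeadlessPodFQDN returns the stable DNS name of the single pod behind a
// SeiNode's headless Service. Hypothetical helper; the format follows the
// pattern documented above and assumes the default cluster domain.
func HeadlessPodFQDN(nodeName, namespace string) string {
	return fmt.Sprintf("%s-0.%s.%s.svc.cluster.local", nodeName, nodeName, namespace)
}

// ExternalServiceFQDN returns the DNS name of a group's external Service,
// which is named "{group-name}-external". Hypothetical helper.
func ExternalServiceFQDN(groupName, namespace string) string {
	return fmt.Sprintf("%s-external.%s.svc.cluster.local", groupName, namespace)
}

func main() {
	// Example names are illustrative only.
	fmt.Println(HeadlessPodFQDN("rpc-0", "dev"))
	fmt.Println(ExternalServiceFQDN("rpc", "dev"))
}
```

Keeping the name construction in one place avoids hand-typed FQDNs drifting between test manifests and load test configs.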
+ +**Key insight for internal use:** The headless Services and the external ClusterIP Service are plain Kubernetes networking. No Gateway, no Istio, no external DNS required. Any pod in the cluster can reach them directly via Kubernetes DNS. This is the foundation for all internal workflows. + +--- + +## Use Case 1: Compose a Test Network + +**Goal:** Spin up a complete chain (validators + RPCs + state syncers) in a single namespace for integration testing. + +### Architecture + +Use one SeiNodeGroup per role. The genesis ceremony group creates the chain; additional groups join it as full nodes. All groups live in the same namespace for DNS simplicity. + +``` +Namespace: integration-test + | + +-- SeiNodeGroup "testnet-validators" (genesis ceremony, 4 validators) + | +-- SeiNode "testnet-validators-0" + | +-- SeiNode "testnet-validators-1" + | +-- SeiNode "testnet-validators-2" + | +-- SeiNode "testnet-validators-3" + | + +-- SeiNodeGroup "testnet-rpc" (3 full nodes, ClusterIP service) + | +-- SeiNode "testnet-rpc-0" + | +-- SeiNode "testnet-rpc-1" + | +-- SeiNode "testnet-rpc-2" + | + +-- SeiNodeGroup "testnet-state-syncer" (1 state syncer) + +-- SeiNode "testnet-state-syncer-0" +``` + +### Manifests + +**Step 1: Validator group with genesis ceremony** + +```yaml +apiVersion: sei.io/v1alpha1 +kind: SeiNodeGroup +metadata: + name: testnet-validators + namespace: integration-test +spec: + replicas: 4 + + genesis: + chainId: integration-test-1 + stakingAmount: "10000000usei" + accountBalance: "1000000000000000000000usei,1000000000000000000000uusdc" + # Fund test accounts for load testing + accounts: + - address: "sei1testaccount..." + balance: "1000000000000000000000usei" + + template: + metadata: + labels: + sei.io/role: validator + spec: + chainId: integration-test-1 + image: "ghcr.io/sei-protocol/sei:v6.3.0" + entrypoint: + command: ["seid"] + args: ["start", "--home", "/sei"] + sidecar: + image: ghcr.io/sei-protocol/seictl@sha256:2cb320dd... 
+ validator: {} + + # No networking section -- validators do not need external traffic. + # The per-node headless Services are sufficient for P2P communication. +``` + +**Step 2: RPC group that peers with the validators** + +```yaml +apiVersion: sei.io/v1alpha1 +kind: SeiNodeGroup +metadata: + name: testnet-rpc + namespace: integration-test +spec: + replicas: 3 + + template: + metadata: + labels: + sei.io/role: rpc + spec: + chainId: integration-test-1 + image: "ghcr.io/sei-protocol/sei:v6.3.0" + entrypoint: + command: ["seid"] + args: ["start", "--home", "/sei"] + sidecar: + image: ghcr.io/sei-protocol/seictl@sha256:2cb320dd... + + # Discover validator nodes by label -- no EC2, no static IPs + peers: + - label: + selector: + sei.io/nodegroup: testnet-validators + + fullNode: {} + + networking: + service: + type: ClusterIP + # No gateway -- internal only + # No isolation -- test namespace is trusted +``` + +This creates `testnet-rpc-external.integration-test.svc.cluster.local` as a ClusterIP Service load-balancing across all 3 RPC nodes. + +### DNS Topology + +| DNS Name | Type | What it reaches | +|----------|------|-----------------| +| `testnet-rpc-external.integration-test.svc.cluster.local` | ClusterIP | Round-robin across all RPC pods | +| `testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local` | Headless | Specific RPC pod (ordinal 0) | +| `testnet-validators-0-0.testnet-validators-0.integration-test.svc.cluster.local` | Headless | Specific validator pod (ordinal 0) | + +### What Works Today + +- Genesis ceremony orchestration is fully automated -- the group controller handles identity generation, gentx collection, genesis assembly, and peer discovery. +- Label-based peer discovery (`peers[].label.selector`) resolves to headless DNS names via the node controller's `reconcilePeers()`. The RPC group automatically discovers validator nodes without hardcoded addresses. 
+- The ClusterIP external Service is created with correct ports derived from the node mode. + +### What Is Missing or Could Be Improved + +1. **Cross-group genesis awareness.** The RPC group has no way to know when the genesis ceremony is complete. Today you must either wait for the validators to reach `Ready` phase before applying the RPC group, or the RPC nodes will fail their `configure-genesis` task and retry for up to 30 minutes. An explicit `dependsOn` or genesis-readiness gate on the SeiNodeGroup would make this deterministic. + +2. **Namespace-scoped genesis sharing.** The genesis ceremony uploads artifacts to S3. The RPC group's sidecar downloads genesis from S3 using the chain ID. This works but requires S3 access from the test cluster. For fully local test networks, an in-cluster genesis distribution mechanism (ConfigMap or PVC-based) would remove the S3 dependency. + +3. **Test accounts in genesis.** The `genesis.accounts` field supports funded accounts, but there is no built-in way to generate deterministic test keys. Users must bring their own mnemonics or addresses. + +--- + +## Use Case 2: Load Test an RPC Fleet + +**Goal:** Run synthetic load against a group of RPC nodes behind a single endpoint, measure throughput and latency. + +### Architecture + +The existing load test pattern (from `manifests/samples/jobs/loadtest-job.yaml`) targets individual node headless Services. For fleet-level load testing, target the group's ClusterIP external Service instead -- Kubernetes distributes connections across all backends. 
+ +### Manifest: Load Test Job Targeting the External Service + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: rpc-loadtest-config + namespace: integration-test +data: + profile.json: | + { + "chainId": 713715, + "seiChainId": "integration-test-1", + "endpoints": [ + "http://testnet-rpc-external.integration-test.svc.cluster.local:8545" + ], + "accounts": { + "count": 200, + "newAccountRate": 0.0 + }, + "scenarios": [ + { "name": "EVMTransfer", "weight": 7 }, + { "name": "EVMContractDeploy", "weight": 1 }, + { "name": "EVMContractCall", "weight": 2 } + ], + "settings": { + "workers": 200, + "tps": 500, + "statsInterval": "5s", + "bufferSize": 500, + "trackBlocks": true, + "prewarm": true, + "rampUp": true + } + } +--- +apiVersion: batch/v1 +kind: Job +metadata: + name: rpc-fleet-loadtest + namespace: integration-test +spec: + ttlSecondsAfterFinished: 3600 + parallelism: 4 + completions: 4 + template: + spec: + restartPolicy: Never + containers: + - name: seiload + image: ghcr.io/sei-protocol/sei-load:latest + args: + - --config + - /etc/seiload/profile.json + resources: + requests: + cpu: "2" + memory: 2Gi + volumeMounts: + - name: config + mountPath: /etc/seiload + volumes: + - name: config + configMap: + name: rpc-loadtest-config +``` + +### Load Test Targeting Strategy Comparison + +| Strategy | Endpoint | When to use | +|----------|----------|-------------| +| **Fleet (ClusterIP)** | `testnet-rpc-external:8545` | Measure aggregate fleet throughput, realistic production-like routing | +| **Per-node (headless)** | `testnet-rpc-0-0.testnet-rpc-0:8545` | Stress a single node, identify per-node bottlenecks | +| **Fan-out (all headless)** | List all node endpoints | Saturate every node simultaneously, find the weakest link | + +The existing load test sample uses the fan-out strategy (listing every `genesis-test-N-0.genesis-test-N` headless address). 
For production-representative load testing, the fleet strategy via ClusterIP is preferred -- it exercises the same connection distribution that real clients experience behind a Gateway. + +### Cross-Namespace Load Testing + +If the load test infrastructure lives in a shared `loadtest` namespace while the RPC fleet is in `integration-test`: + +```yaml +# Endpoint in the load test config: +"endpoints": [ + "http://testnet-rpc-external.integration-test.svc.cluster.local:8545" +] +``` + +Kubernetes DNS resolves cross-namespace service names with the full FQDN. No special configuration required. The only blocker would be Istio AuthorizationPolicy -- but for test environments, omit the `isolation` section entirely. + +### What Is Missing or Could Be Improved + +1. **No Service-level metrics from the operator.** The external Service distributes load, but there is no built-in way to see per-node request distribution, error rates, or latency percentiles from the operator's perspective. The ServiceMonitor scrapes seid Prometheus metrics (block height, consensus state), not HTTP request metrics. For load testing visibility, you need either Istio telemetry (if in the mesh) or a sidecar metrics proxy. + +2. **No built-in load test integration.** The load test job is a standalone manifest. A `loadTest` field on SeiNodeGroup (or a separate `SeiLoadTest` CRD) could automate the lifecycle: wait for Ready, run load, collect results, tear down. + +3. **Connection pooling awareness.** kube-proxy balances ClusterIP traffic per new connection, not per request (random backend selection in iptables mode, round-robin by default in IPVS mode). For HTTP/2 or other long-lived connections, a single connection pins to one backend. The operator does not configure session affinity or connection balancing -- users must ensure their load generator opens many short-lived connections or uses HTTP/1.1 for even distribution. + +--- + +## Use Case 3: Test Upgrade Flows (HardFork Deployments) + +**Goal:** Validate a HardFork deployment in staging before running it in production. 
+ +### Architecture + +Create a genesis network, let it produce blocks, then apply a HardFork update strategy with a new image. The operator orchestrates the halt-height signaling, entrant node creation, binary switch, and teardown. + +### Manifests + +**Initial deployment:** + +```yaml +apiVersion: sei.io/v1alpha1 +kind: SeiNodeGroup +metadata: + name: upgrade-test + namespace: staging +spec: + replicas: 4 + + genesis: + chainId: upgrade-test-1 + stakingAmount: "10000000usei" + accountBalance: "1000000000000000000000usei" + overrides: + # Set a low halt height so the upgrade triggers quickly + # (in practice, use a realistic height for staging) + + template: + metadata: + labels: + sei.io/role: validator + spec: + chainId: upgrade-test-1 + image: "ghcr.io/sei-protocol/sei:v6.3.0" + entrypoint: + command: ["seid"] + args: ["start", "--home", "/sei"] + sidecar: + image: ghcr.io/sei-protocol/seictl@sha256:2cb320dd... + validator: {} + + updateStrategy: + type: HardFork + + networking: + service: + type: ClusterIP +``` + +**Trigger the upgrade (edit the image and set halt height):** + +```yaml +spec: + template: + spec: + image: "ghcr.io/sei-protocol/sei:v7.0.0" + updateStrategy: + type: HardFork + hardFork: + haltHeight: 1000 +``` + +The operator detects the templateHash change, creates entrant nodes with the new image, signals the incumbent nodes to halt at the specified height via `await-condition` with SIGTERM, waits for the entrants to catch up, switches the external Service selector to the new revision, and tears down the incumbents. + +### Networking During Upgrades + +During a HardFork deployment, the external Service selector adds `sei.io/revision: {incumbentRevision}` to pin traffic to the active set. After the switch, the selector updates to the entrant revision. For internal testing, this means: + +- Any pod targeting the ClusterIP external Service sees zero-downtime if the halt and switch complete within the Service's readiness probe window. 
+- You can observe both sets simultaneously by targeting headless Services directly: + - Incumbent: `upgrade-test-0-0.upgrade-test-0.staging.svc.cluster.local` + - Entrant: `upgrade-test-g2-0-0.upgrade-test-g2-0.staging.svc.cluster.local` + +### What Is Missing or Could Be Improved + +1. **No staging-specific halt height automation.** In production, the halt height is coordinated across the network. In staging, you want the chain to produce enough blocks to exercise the upgrade handler, then halt. There is no built-in "halt after N blocks" semantic -- you must calculate and set the height manually. + +2. **No upgrade dry-run mode.** A mode that creates entrant nodes, verifies they sync past the upgrade height, but does NOT switch traffic or tear down incumbents would let teams validate upgrade compatibility without committing to the switch. + +3. **No rollback.** HardFork deployments are one-way. If the new binary fails, the plan enters Failed state. The incumbents are already halted. Recovery requires manual intervention (new group, restore from snapshot). For staging this is acceptable, but documenting the recovery path would help. + +--- + +## Use Case 4: Debug Individual Node Behavior + +**Goal:** Connect directly to a specific node in a group for debugging -- inspect RPC responses, check sync status, query state. + +### How It Works Today + +Every SeiNode gets a headless Service with all ports exposed and `PublishNotReadyAddresses: true`. This means the node is DNS-reachable even during initialization, before it passes readiness probes. 
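Because a node is reachable before it is ready, a client can poll its CometBFT RPC directly to see how far initialization has progressed. The sketch below only decodes a `/status` response body. `ParseSyncInfo` is a hypothetical helper, the sample payload is illustrative rather than captured from a real node, and the field names assume the standard CometBFT JSON-RPC shape (`result.sync_info.latest_block_height` as a string, `result.sync_info.catching_up` as a boolean).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusResponse mirrors only the /status fields this sketch reads.
// Assumption: standard CometBFT JSON-RPC response layout.
type statusResponse struct {
	Result struct {
		SyncInfo struct {
			LatestBlockHeight string `json:"latest_block_height"`
			CatchingUp        bool   `json:"catching_up"`
		} `json:"sync_info"`
	} `json:"result"`
}

// ParseSyncInfo extracts the latest block height and the catching-up flag
// from a /status response body. Hypothetical helper for illustration.
func ParseSyncInfo(body []byte) (string, bool, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return "", false, fmt.Errorf("decode status: %w", err)
	}
	return s.Result.SyncInfo.LatestBlockHeight, s.Result.SyncInfo.CatchingUp, nil
}

func main() {
	// Illustrative payload; a real body would come from an HTTP GET
	// against the node's headless DNS name.
	sample := []byte(`{"result":{"sync_info":{"latest_block_height":"1234","catching_up":true}}}`)
	height, catchingUp, err := ParseSyncInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("height=%s catching_up=%t\n", height, catchingUp)
}
```

Decoding into a narrow struct keeps the check independent of the rest of the response, which varies between node versions.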
+ +**Direct node access:** + +```bash +# From any pod in the cluster (or via kubectl port-forward from your workstation): + +# CometBFT RPC (status, net_info, consensus_state) +curl http://testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:26657/status + +# EVM JSON-RPC +curl -X POST http://testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:8545 \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' + +# gRPC reflection +grpcurl -plaintext testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:9090 list + +# REST API +curl http://testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:1317/cosmos/base/tendermint/v1beta1/syncing + +# Sidecar API (task status, health, diagnostics) +curl http://testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:7777/v0/healthz +curl http://testnet-rpc-0-0.testnet-rpc-0.integration-test.svc.cluster.local:7777/v0/tasks +``` + +**From your workstation via port-forward:** + +```bash +# Forward a specific node's RPC port +kubectl port-forward -n integration-test svc/testnet-rpc-0 26657:26657 + +# Or forward the sidecar for task diagnostics +kubectl port-forward -n integration-test svc/testnet-rpc-0 7777:7777 +``` + +### Comparing Nodes Side-by-Side + +Because headless Services give per-node addressability, you can compare responses across nodes in a group: + +```bash +# Check sync status across all nodes in a group +for i in 0 1 2; do + echo "--- testnet-rpc-$i ---" + curl -s "http://testnet-rpc-$i-0.testnet-rpc-$i.integration-test.svc.cluster.local:26657/status" \ + | jq '.result.sync_info.latest_block_height' +done +``` + +### What Is Missing or Could Be Improved + +1. **No debug port aggregation.** To inspect all nodes in a group, you must iterate over headless Services manually. A debug endpoint on the external Service that fans out queries to all backends and aggregates responses would simplify multi-node debugging. 
+ +2. **No built-in exec/attach shortcut.** Debugging often requires `kubectl exec` into the seid or sidecar container. The operator could provide a `kubectl sei debug ` plugin that resolves the pod and container automatically. + +3. **Sidecar task history is ephemeral.** The sidecar's `/v0/tasks` endpoint shows current task state, but task history is lost on pod restart. For post-mortem debugging, task history should be persisted (in the SeiNode status or an external store). + +--- + +## Use Case 5: Mirror Production Traffic + +**Goal:** Replay production RPC traffic against a test fleet to validate correctness before cutover. + +### How It Works Today + +The `manifests/samples/istio/pacific-1-rpc-mirror/` directory contains a complete Istio traffic mirroring setup: + +1. **ServiceEntry** registers the EC2 RPC ALB as `ec2-rpc.pacific-1.internal` (mesh-external). +2. **DestinationRule** disables mTLS for the EC2 backend, prevents HTTP/2 upgrade (CometBFT is HTTP/1.1), configures outlier detection. +3. **VirtualService** routes 100% of HTTP RPC traffic to EC2 with 100% mirror to `pacific-1-rpc-external.default.svc.cluster.local`. WebSocket traffic goes to EC2 only (Istio cannot mirror WebSocket). +4. **Telemetry** logs requests with response code >= 400 or latency > 5s. Mirrored requests arrive with Host header suffixed `-shadow`, making them filterable. +5. **PeerAuthentication** enforces STRICT mTLS on the K8s pods. + +The cutover VirtualService (`virtual-service-cutover.yaml`) shows the progressive weight shifting from EC2 to K8s. 
+ +### Networking Configuration for Mirroring + +The K8s RPC fleet needs: +- An external ClusterIP Service (created by the SeiNodeGroup networking config) +- Istio sidecar injection on the pods (the operator does not inject this -- the namespace must have `istio-injection: enabled`) +- The VirtualService, ServiceEntry, DestinationRule, and PeerAuthentication are manually managed (not created by the operator) + +```yaml +# The SeiNodeGroup for the mirror target -- same as production +apiVersion: sei.io/v1alpha1 +kind: SeiNodeGroup +metadata: + name: pacific-1-rpc + namespace: default +spec: + replicas: 3 + template: + metadata: + labels: + sei.io/role: rpc + spec: + chainId: pacific-1 + image: "ghcr.io/sei-protocol/sei:v6.3.0" + # ... (same as production RPC spec) + fullNode: + snapshot: + s3: + targetHeight: 198740000 + trustPeriod: "9999h0m0s" + + networking: + service: + type: ClusterIP + + # AuthorizationPolicy allows the Istio gateway SA + isolation: + authorizationPolicy: + allowedSources: + - principals: + - "cluster.local/ns/istio-system/sa/sei-gateway-istio" + - namespaces: + - default +``` + +The mirror VirtualService references `pacific-1-rpc-external.default.svc.cluster.local` -- the external Service name is deterministic (`{group-name}-external`). + +### What Is Missing or Could Be Improved + +1. **No operator-managed mirroring.** The Istio resources (VirtualService, ServiceEntry, DestinationRule) are manually applied. The operator could support a `networking.mirror` field that generates these resources, or at least a `networking.istio` section for VirtualService management. + +2. **No result comparison integration.** The mirrored responses are discarded by Envoy. The comment in the VirtualService references a "result-compare task" on the sidecar, but this relies on the replayer node type. There is no built-in way to compare mirrored responses to the primary responses at the fleet level. + +3. 
**WebSocket gap.** Istio cannot mirror WebSocket connections. During the mirror phase, WebSocket clients only hit the EC2 backend. The cutover VirtualService handles this with weighted routing, but there is no intermediate validation step for WebSocket traffic. + +4. **No traffic recording/replay.** True replay (deterministic request replay from recorded production traffic) would require a request capture layer. The current mirroring is live -- it only works while production traffic is flowing. + +--- + +## Summary: CRD Fields Mapped to Internal Use Cases + +| CRD Field | Production Use | Internal Use | +|-----------|---------------|--------------| +| `networking.service.type: ClusterIP` | Backend for Gateway HTTPRoute | **Direct endpoint for load test jobs, integration tests, cross-namespace access** | +| `networking.service.type: LoadBalancer` | External IP for production traffic | Not needed internally -- ClusterIP suffices | +| `networking.gateway.baseDomain` | External DNS routing | **Omit entirely** -- no Gateway needed for internal workflows | +| `networking.isolation.authorizationPolicy` | Lock down pod access to Gateway SA | **Omit entirely** -- test namespaces are trusted | +| `monitoring.serviceMonitor` | Production Prometheus scraping | **Keep** -- useful for load test observability | +| Headless Service (per-node, automatic) | P2P networking, sidecar communication | **Direct node debugging, per-node load testing, side-by-side comparison** | +| `peers[].label.selector` | Cross-group peer discovery | **Compose multi-role test networks without hardcoded addresses** | +| `genesis` | N/A (production chains have existing genesis) | **Bootstrap private test chains from scratch** | +| `updateStrategy.type: HardFork` | Coordinate binary upgrades across production fleet | **Validate upgrade handlers in staging before production** | + +## Summary: What Is Missing for Internal Developer Workflows + +| Gap | Impact | Suggested Direction | 
+|-----|--------|-------------------| +| Cross-group dependency ordering | RPC groups applied before genesis completes must retry for up to 30 min | `dependsOn` field on SeiNodeGroup, or a Condition-based readiness gate | +| In-cluster genesis distribution | Test networks require S3 access for genesis sharing | ConfigMap or PVC-based genesis source as alternative to S3 | +| Halt-after-N-blocks for staging | Must manually calculate halt height for upgrade testing | `updateStrategy.hardFork.haltAfterBlocks` relative offset | +| Upgrade dry-run mode | Cannot validate upgrade compatibility without committing to the switch | `updateStrategy.dryRun: true` that creates entrants but skips switch | +| Service-level request metrics | No HTTP request metrics from the operator for load test analysis | Istio telemetry in the mesh, or a metrics sidecar on the external Service | +| Operator-managed Istio resources | VirtualService/DestinationRule for mirroring are manually applied | `networking.istio` section or separate CRD for traffic management | +| Load test lifecycle | Load test jobs are standalone, no coordination with group readiness | `SeiLoadTest` CRD or `loadTest` field that gates on group Ready | +| Debug aggregation | Must iterate headless Services manually to compare nodes | Debug endpoint or CLI plugin for multi-node queries | diff --git a/api/v1alpha1/seinode_types.go b/api/v1alpha1/seinode_types.go index 5e5b144..e178652 100644 --- a/api/v1alpha1/seinode_types.go +++ b/api/v1alpha1/seinode_types.go @@ -221,6 +221,14 @@ type SeiNodeStatus struct { // Phase is the high-level lifecycle state. Phase SeiNodePhase `json:"phase,omitempty"` + // CurrentImage is the seid container image observed running on the + // owned StatefulSet. Updated by the SeiNode controller when the + // StatefulSet rollout completes (currentRevision == updateRevision). + // Parent controllers compare this against spec.image to determine + // whether a spec change has been fully actuated. 
+ // +optional + CurrentImage string `json:"currentImage,omitempty"` + // +listType=map // +listMapKey=type // +optional diff --git a/api/v1alpha1/seinodedeployment_types.go b/api/v1alpha1/seinodedeployment_types.go index fb36542..b939f2b 100644 --- a/api/v1alpha1/seinodedeployment_types.go +++ b/api/v1alpha1/seinodedeployment_types.go @@ -43,26 +43,18 @@ type SeiNodeDeploymentSpec struct { Monitoring *MonitoringConfig `json:"monitoring,omitempty"` // UpdateStrategy controls how changes to the template are rolled out - // to child SeiNodes. When set, the controller uses blue-green - // deployment orchestration instead of in-place updates. - // When not set, template changes are applied in-place via ensureSeiNode. - // +optional - UpdateStrategy *UpdateStrategy `json:"updateStrategy,omitempty"` + // to child SeiNodes. Every deployment must declare an explicit strategy. + UpdateStrategy UpdateStrategy `json:"updateStrategy"` } // UpdateStrategyType identifies the deployment strategy. -// +kubebuilder:validation:Enum=BlueGreen;HardFork +// +kubebuilder:validation:Enum=InPlace;BlueGreen;HardFork type UpdateStrategyType string const ( - // UpdateStrategyBlueGreen performs a blue-green deployment once the - // green nodes have caught up to the chain tip (catching_up == false). + UpdateStrategyInPlace UpdateStrategyType = "InPlace" UpdateStrategyBlueGreen UpdateStrategyType = "BlueGreen" - - // UpdateStrategyHardFork performs a blue-green deployment at a specific - // block height. The old binary halts via sidecar SIGTERM at the - // configured halt-height, and the new binary continues from that height. - UpdateStrategyHardFork UpdateStrategyType = "HardFork" + UpdateStrategyHardFork UpdateStrategyType = "HardFork" ) // UpdateStrategy controls how spec changes propagate to child SeiNodes. @@ -247,10 +239,10 @@ type SeiNodeDeploymentStatus struct { // +optional IncumbentNodes []string `json:"incumbentNodes,omitempty"` - // Deployment tracks an in-progress deployment. 
- // Nil when no deployment is active. + // Rollout tracks an in-progress rollout across all strategy types. + // Nil when no rollout is active. // +optional - Deployment *DeploymentStatus `json:"deployment,omitempty"` + Rollout *RolloutStatus `json:"rollout,omitempty"` // NetworkingStatus reports the observed state of networking resources. // +optional @@ -288,18 +280,54 @@ type RouteStatus struct { Protocol string `json:"protocol,omitempty"` } -// DeploymentStatus tracks metadata for an in-progress deployment. -// The task plan itself lives in SeiNodeDeploymentStatus.Plan. -type DeploymentStatus struct { - // IncumbentRevision identifies the generation of the currently live nodes. - IncumbentRevision string `json:"incumbentRevision"` +// RolloutStatus tracks an in-progress rollout. Used by all strategies +// to report per-node convergence state. +type RolloutStatus struct { + // Strategy is the strategy type driving this rollout. + Strategy UpdateStrategyType `json:"strategy"` + + // TargetHash is the templateHash being rolled out to. + TargetHash string `json:"targetHash"` + + // StartedAt is when the rollout was first detected. + StartedAt metav1.Time `json:"startedAt"` + + // Nodes reports per-node rollout state. + // +listType=map + // +listMapKey=name + Nodes []RolloutNodeStatus `json:"nodes"` - // EntrantRevision identifies the generation of the new nodes being deployed. - EntrantRevision string `json:"entrantRevision"` + // IncumbentNodes lists the names of the currently active SeiNode + // resources. Only populated for BlueGreen and HardFork strategies. + // +optional + IncumbentNodes []string `json:"incumbentNodes,omitempty"` - // EntrantNodes lists the names of the new SeiNode resources. + // EntrantNodes lists the names of the new SeiNode resources being + // created. Only populated for BlueGreen and HardFork strategies. 
// +optional EntrantNodes []string `json:"entrantNodes,omitempty"` + + // IncumbentRevision identifies the generation of the currently live nodes. + // Only populated for BlueGreen and HardFork strategies. + // +optional + IncumbentRevision string `json:"incumbentRevision,omitempty"` + + // EntrantRevision identifies the generation of the new nodes. + // Only populated for BlueGreen and HardFork strategies. + // +optional + EntrantRevision string `json:"entrantRevision,omitempty"` +} + +// RolloutNodeStatus tracks a single node's convergence during a rollout. +type RolloutNodeStatus struct { + // Name is the SeiNode resource name. + Name string `json:"name"` + + // Ready is true when the node is Running with a ready pod. + Ready bool `json:"ready"` + + // Phase is the SeiNode's current phase. + Phase SeiNodePhase `json:"phase,omitempty"` } // Status condition types for SeiNodeDeployment. @@ -311,6 +339,7 @@ const ( ConditionPlanInProgress = "PlanInProgress" ConditionGenesisCeremonyNeeded = "GenesisCeremonyNeeded" ConditionForkGenesisCeremonyNeeded = "ForkGenesisCeremonyNeeded" + ConditionRolloutInProgress = "RolloutInProgress" ) // +kubebuilder:object:root=true @@ -319,7 +348,7 @@ const ( // +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas` // +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.status.replicas` // +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase` -// +kubebuilder:printcolumn:name="Revision",type=string,JSONPath=`.status.deployment.entrantRevision`,priority=1 +// +kubebuilder:printcolumn:name="Revision",type=string,JSONPath=`.status.rollout.entrantRevision`,priority=1 // +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp` // SeiNodeDeployment is the Schema for the seinodedeployments API. 
diff --git a/api/v1alpha1/zz_generated.deepcopy.go b/api/v1alpha1/zz_generated.deepcopy.go index 5583dc9..b1be2c6 100644 --- a/api/v1alpha1/zz_generated.deepcopy.go +++ b/api/v1alpha1/zz_generated.deepcopy.go @@ -31,26 +31,6 @@ func (in *ArchiveSpec) DeepCopy() *ArchiveSpec { return out } -// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil. -func (in *DeploymentStatus) DeepCopyInto(out *DeploymentStatus) { - *out = *in - if in.EntrantNodes != nil { - in, out := &in.EntrantNodes, &out.EntrantNodes - *out = make([]string, len(*in)) - copy(*out, *in) - } -} - -// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DeploymentStatus. -func (in *DeploymentStatus) DeepCopy() *DeploymentStatus { - if in == nil { - return nil - } - out := new(DeploymentStatus) - in.DeepCopyInto(out) - return out -} - // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil. func (in *EC2TagsPeerSource) DeepCopyInto(out *EC2TagsPeerSource) { *out = *in @@ -437,6 +417,52 @@ func (in *ResultExportConfig) DeepCopy() *ResultExportConfig { return out } +// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil. +func (in *RolloutNodeStatus) DeepCopyInto(out *RolloutNodeStatus) { + *out = *in +} + +// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new RolloutNodeStatus. +func (in *RolloutNodeStatus) DeepCopy() *RolloutNodeStatus { + if in == nil { + return nil + } + out := new(RolloutNodeStatus) + in.DeepCopyInto(out) + return out +} + +// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil. 
+func (in *RolloutStatus) DeepCopyInto(out *RolloutStatus) { + *out = *in + in.StartedAt.DeepCopyInto(&out.StartedAt) + if in.Nodes != nil { + in, out := &in.Nodes, &out.Nodes + *out = make([]RolloutNodeStatus, len(*in)) + copy(*out, *in) + } + if in.IncumbentNodes != nil { + in, out := &in.IncumbentNodes, &out.IncumbentNodes + *out = make([]string, len(*in)) + copy(*out, *in) + } + if in.EntrantNodes != nil { + in, out := &in.EntrantNodes, &out.EntrantNodes + *out = make([]string, len(*in)) + copy(*out, *in) + } +} + +// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new RolloutStatus. +func (in *RolloutStatus) DeepCopy() *RolloutStatus { + if in == nil { + return nil + } + out := new(RolloutStatus) + in.DeepCopyInto(out) + return out +} + // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil. func (in *RouteStatus) DeepCopyInto(out *RouteStatus) { *out = *in @@ -572,11 +598,7 @@ func (in *SeiNodeDeploymentSpec) DeepCopyInto(out *SeiNodeDeploymentSpec) { *out = new(MonitoringConfig) (*in).DeepCopyInto(*out) } - if in.UpdateStrategy != nil { - in, out := &in.UpdateStrategy, &out.UpdateStrategy - *out = new(UpdateStrategy) - (*in).DeepCopyInto(*out) - } + in.UpdateStrategy.DeepCopyInto(&out.UpdateStrategy) } // DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SeiNodeDeploymentSpec. 
@@ -607,9 +629,9 @@ func (in *SeiNodeDeploymentStatus) DeepCopyInto(out *SeiNodeDeploymentStatus) { *out = make([]string, len(*in)) copy(*out, *in) } - if in.Deployment != nil { - in, out := &in.Deployment, &out.Deployment - *out = new(DeploymentStatus) + if in.Rollout != nil { + in, out := &in.Rollout, &out.Rollout + *out = new(RolloutStatus) (*in).DeepCopyInto(*out) } if in.NetworkingStatus != nil { diff --git a/config/crd/sei.io_seinodedeployments.yaml b/config/crd/sei.io_seinodedeployments.yaml index 4766576..6c64768 100644 --- a/config/crd/sei.io_seinodedeployments.yaml +++ b/config/crd/sei.io_seinodedeployments.yaml @@ -26,7 +26,7 @@ spec: - jsonPath: .status.phase name: Phase type: string - - jsonPath: .status.deployment.entrantRevision + - jsonPath: .status.rollout.entrantRevision name: Revision priority: 1 type: string @@ -681,9 +681,7 @@ spec: updateStrategy: description: |- UpdateStrategy controls how changes to the template are rolled out - to child SeiNodes. When set, the controller uses blue-green - deployment orchestration instead of in-place updates. - When not set, template changes are applied in-place via ensureSeiNode. + to child SeiNodes. Every deployment must declare an explicit strategy. properties: hardFork: description: |- @@ -704,6 +702,7 @@ spec: type: description: Type selects the deployment strategy. enum: + - InPlace - BlueGreen - HardFork type: string @@ -717,6 +716,7 @@ spec: required: - replicas - template + - updateStrategy type: object status: description: SeiNodeDeploymentStatus defines the observed state of a SeiNodeDeployment. @@ -780,28 +780,6 @@ spec: x-kubernetes-list-map-keys: - type x-kubernetes-list-type: map - deployment: - description: |- - Deployment tracks an in-progress deployment. - Nil when no deployment is active. - properties: - entrantNodes: - description: EntrantNodes lists the names of the new SeiNode resources. 
- items: - type: string - type: array - entrantRevision: - description: EntrantRevision identifies the generation of the - new nodes being deployed. - type: string - incumbentRevision: - description: IncumbentRevision identifies the generation of the - currently live nodes. - type: string - required: - - entrantRevision - - incumbentRevision - type: object genesisHash: description: GenesisHash is the SHA-256 hex digest of the assembled genesis.json. @@ -992,6 +970,85 @@ spec: description: Replicas is the desired number of SeiNodes. format: int32 type: integer + rollout: + description: |- + Rollout tracks an in-progress rollout across all strategy types. + Nil when no rollout is active. + properties: + entrantNodes: + description: |- + EntrantNodes lists the names of the new SeiNode resources being + created. Only populated for BlueGreen and HardFork strategies. + items: + type: string + type: array + entrantRevision: + description: |- + EntrantRevision identifies the generation of the new nodes. + Only populated for BlueGreen and HardFork strategies. + type: string + incumbentNodes: + description: |- + IncumbentNodes lists the names of the currently active SeiNode + resources. Only populated for BlueGreen and HardFork strategies. + items: + type: string + type: array + incumbentRevision: + description: |- + IncumbentRevision identifies the generation of the currently live nodes. + Only populated for BlueGreen and HardFork strategies. + type: string + nodes: + description: Nodes reports per-node rollout state. + items: + description: RolloutNodeStatus tracks a single node's convergence + during a rollout. + properties: + name: + description: Name is the SeiNode resource name. + type: string + phase: + description: Phase is the SeiNode's current phase. + enum: + - Pending + - Initializing + - Running + - Failed + - Terminating + type: string + ready: + description: Ready is true when the node is Running with + a ready pod. 
+ type: boolean + required: + - name + - ready + type: object + type: array + x-kubernetes-list-map-keys: + - name + x-kubernetes-list-type: map + startedAt: + description: StartedAt is when the rollout was first detected. + format: date-time + type: string + strategy: + description: Strategy is the strategy type driving this rollout. + enum: + - InPlace + - BlueGreen + - HardFork + type: string + targetHash: + description: TargetHash is the templateHash being rolled out to. + type: string + required: + - nodes + - startedAt + - strategy + - targetHash + type: object templateHash: description: |- TemplateHash is a hash of the spec fields that require deployment diff --git a/config/crd/sei.io_seinodes.yaml b/config/crd/sei.io_seinodes.yaml index 7e8784d..0714938 100644 --- a/config/crd/sei.io_seinodes.yaml +++ b/config/crd/sei.io_seinodes.yaml @@ -558,6 +558,14 @@ spec: x-kubernetes-list-map-keys: - type x-kubernetes-list-type: map + currentImage: + description: |- + CurrentImage is the seid container image observed running on the + owned StatefulSet. Updated by the SeiNode controller when the + StatefulSet rollout completes (currentRevision == updateRevision). + Parent controllers compare this against spec.image to determine + whether a spec change has been fully actuated. 
+ type: string externalAddress: description: |- ExternalAddress is the routable P2P address (host:port) for this node, diff --git a/internal/controller/node/controller.go b/internal/controller/node/controller.go index 4b2327b..335343b 100644 --- a/internal/controller/node/controller.go +++ b/internal/controller/node/controller.go @@ -175,6 +175,10 @@ func (r *SeiNodeReconciler) reconcileRunning(ctx context.Context, node *seiv1alp return ctrl.Result{}, fmt.Errorf("reconciling service: %w", err) } + if err := r.observeCurrentImage(ctx, node); err != nil { + return ctrl.Result{}, fmt.Errorf("observing current image: %w", err) + } + sc := r.buildSidecarClient(node) if sc == nil { sidecarUnreachableTotal.WithLabelValues(node.Namespace, node.Name).Inc() @@ -184,6 +188,30 @@ func (r *SeiNodeReconciler) reconcileRunning(ctx context.Context, node *seiv1alp return r.reconcileRuntimeTasks(ctx, node, sc) } +func (r *SeiNodeReconciler) observeCurrentImage(ctx context.Context, node *seiv1alpha1.SeiNode) error { + sts := &appsv1.StatefulSet{} + if err := r.Get(ctx, types.NamespacedName{Name: node.Name, Namespace: node.Namespace}, sts); err != nil { + if apierrors.IsNotFound(err) { + return nil + } + return err + } + + if sts.Status.CurrentRevision == "" || sts.Status.CurrentRevision != sts.Status.UpdateRevision { + return nil + } + if sts.Status.ReadyReplicas < 1 { + return nil + } + + if node.Status.CurrentImage != node.Spec.Image { + patch := client.MergeFromWithOptions(node.DeepCopy(), client.MergeFromWithOptimisticLock{}) + node.Status.CurrentImage = node.Spec.Image + return r.Status().Patch(ctx, node, patch) + } + return nil +} + // transitionPhase transitions the node to a new phase and emits the associated // metric counter, phase gauge, and Kubernetes event. 
func (r *SeiNodeReconciler) transitionPhase(ctx context.Context, node *seiv1alpha1.SeiNode, phase seiv1alpha1.SeiNodePhase) (ctrl.Result, error) { diff --git a/internal/controller/node/reconciler_test.go b/internal/controller/node/reconciler_test.go index 46f6930..ede1609 100644 --- a/internal/controller/node/reconciler_test.go +++ b/internal/controller/node/reconciler_test.go @@ -80,6 +80,11 @@ func getSeiNode(t *testing.T, ctx context.Context, c client.Client, name, namesp return node } +const ( + testImageV2 = "ghcr.io/sei-protocol/seid:v2.0.0" + testRevision = "rev-2" +) + func TestNodeReconcile_NotFound(t *testing.T) { g := NewWithT(t) r, _ := newNodeReconciler(t) @@ -202,7 +207,7 @@ func TestNodeReconcile_RunningPhase_UpdatesStatefulSetImage(t *testing.T) { // Update the image on the SeiNode spec. node = getSeiNode(t, ctx, c, "mynet-0", "default") - node.Spec.Image = "ghcr.io/sei-protocol/seid:v2.0.0" + node.Spec.Image = testImageV2 g.Expect(c.Update(ctx, node)).To(Succeed()) // Reconcile — this enters reconcileRunning which should update the StatefulSet. @@ -212,9 +217,239 @@ func TestNodeReconcile_RunningPhase_UpdatesStatefulSetImage(t *testing.T) { // StatefulSet should now reflect the new image. 
g.Expect(c.Get(ctx, types.NamespacedName{Name: "mynet-0", Namespace: "default"}, sts)).To(Succeed()) seid = findContainer(sts.Spec.Template.Spec.Containers, "seid") - g.Expect(seid.Image).To(Equal("ghcr.io/sei-protocol/seid:v2.0.0")) + g.Expect(seid.Image).To(Equal(testImageV2)) +} + +func TestObserveCurrentImage_UpdatesWhenConverged(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + + sts := &appsv1.StatefulSet{ + ObjectMeta: metav1.ObjectMeta{Name: "mynet-0", Namespace: "default"}, + Spec: appsv1.StatefulSetSpec{ + Replicas: ptrInt32(1), + ServiceName: "mynet-0", + Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"sei.io/node": "mynet-0"}}, + Template: corev1.PodTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"sei.io/node": "mynet-0"}}, + Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "seid", Image: testImageV2}}}, + }, + }, + } + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node, sts). + WithStatusSubresource(&seiv1alpha1.SeiNode{}, &appsv1.StatefulSet{}). 
+ Build() + + sts.Status.CurrentRevision = testRevision + sts.Status.UpdateRevision = testRevision + sts.Status.ReadyReplicas = 1 + g.Expect(c.Status().Update(ctx, sts)).To(Succeed()) + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(Equal(testImageV2)) +} + +func TestObserveCurrentImage_SkipsWhenRolling(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + + sts := &appsv1.StatefulSet{ + ObjectMeta: metav1.ObjectMeta{Name: "mynet-0", Namespace: "default"}, + Spec: appsv1.StatefulSetSpec{ + Replicas: ptrInt32(1), + ServiceName: "mynet-0", + Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"sei.io/node": "mynet-0"}}, + Template: corev1.PodTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"sei.io/node": "mynet-0"}}, + Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "seid", Image: testImageV2}}}, + }, + }, + } + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node, sts). + WithStatusSubresource(&seiv1alpha1.SeiNode{}, &appsv1.StatefulSet{}). 
+ Build() + + sts.Status.CurrentRevision = "rev-1" + sts.Status.UpdateRevision = testRevision + g.Expect(c.Status().Update(ctx, sts)).To(Succeed()) + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(BeEmpty()) +} + +func TestObserveCurrentImage_SkipsWhenReadyReplicasZero(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + + sts := &appsv1.StatefulSet{ + ObjectMeta: metav1.ObjectMeta{Name: "mynet-0", Namespace: "default"}, + Spec: appsv1.StatefulSetSpec{ + Replicas: ptrInt32(1), + ServiceName: "mynet-0", + Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"sei.io/node": "mynet-0"}}, + Template: corev1.PodTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"sei.io/node": "mynet-0"}}, + Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "seid", Image: testImageV2}}}, + }, + }, + } + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node, sts). + WithStatusSubresource(&seiv1alpha1.SeiNode{}, &appsv1.StatefulSet{}). 
+ Build() + + sts.Status.CurrentRevision = testRevision + sts.Status.UpdateRevision = testRevision + sts.Status.ReadyReplicas = 0 + g.Expect(c.Status().Update(ctx, sts)).To(Succeed()) + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(BeEmpty()) } +func TestObserveCurrentImage_SkipsWhenEmptyRevision(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + + sts := &appsv1.StatefulSet{ + ObjectMeta: metav1.ObjectMeta{Name: "mynet-0", Namespace: "default"}, + Spec: appsv1.StatefulSetSpec{ + Replicas: ptrInt32(1), + ServiceName: "mynet-0", + Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"sei.io/node": "mynet-0"}}, + Template: corev1.PodTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"sei.io/node": "mynet-0"}}, + Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "seid", Image: testImageV2}}}, + }, + }, + } + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node, sts). + WithStatusSubresource(&seiv1alpha1.SeiNode{}, &appsv1.StatefulSet{}). 
+ Build() + + sts.Status.CurrentRevision = "" + sts.Status.UpdateRevision = testRevision + sts.Status.ReadyReplicas = 1 + g.Expect(c.Status().Update(ctx, sts)).To(Succeed()) + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(BeEmpty()) +} + +func TestObserveCurrentImage_NoopWhenAlreadyCurrent(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + node.Status.CurrentImage = testImageV2 + + sts := &appsv1.StatefulSet{ + ObjectMeta: metav1.ObjectMeta{Name: "mynet-0", Namespace: "default"}, + Spec: appsv1.StatefulSetSpec{ + Replicas: ptrInt32(1), + ServiceName: "mynet-0", + Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"sei.io/node": "mynet-0"}}, + Template: corev1.PodTemplateSpec{ + ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"sei.io/node": "mynet-0"}}, + Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "seid", Image: testImageV2}}}, + }, + }, + } + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node, sts). + WithStatusSubresource(&seiv1alpha1.SeiNode{}, &appsv1.StatefulSet{}). 
+ Build() + + sts.Status.CurrentRevision = testRevision + sts.Status.UpdateRevision = testRevision + sts.Status.ReadyReplicas = 1 + g.Expect(c.Status().Update(ctx, sts)).To(Succeed()) + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(Equal(testImageV2)) +} + +func TestObserveCurrentImage_StatefulSetNotFound(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + node := newGenesisNode("mynet-0", "default") + node.Finalizers = []string{nodeFinalizerName} + node.Status.Phase = seiv1alpha1.PhaseRunning + node.Spec.Image = testImageV2 + + s := newNodeTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(node). + WithStatusSubresource(&seiv1alpha1.SeiNode{}). + Build() + + r := &SeiNodeReconciler{Client: c, Scheme: s} + g.Expect(r.observeCurrentImage(ctx, node)).To(Succeed()) + + fetched := getSeiNode(t, ctx, c, "mynet-0", "default") + g.Expect(fetched.Status.CurrentImage).To(BeEmpty()) +} + +func ptrInt32(v int32) *int32 { return &v } + func TestNodeDeletion_SnapshotNode_WithoutRetain_DeletesPVC(t *testing.T) { g := NewWithT(t) ctx := context.Background() diff --git a/internal/controller/nodedeployment/labels.go b/internal/controller/nodedeployment/labels.go index 69737bf..253ce57 100644 --- a/internal/controller/nodedeployment/labels.go +++ b/internal/controller/nodedeployment/labels.go @@ -26,8 +26,8 @@ func seiNodeName(group *seiv1alpha1.SeiNodeDeployment, ordinal int) string { // node set. Used for traffic routing during deployments and for // observability labels on pods. 
func activeRevision(group *seiv1alpha1.SeiNodeDeployment) string { - if group.Status.Deployment != nil && group.Status.Deployment.IncumbentRevision != "" { - return group.Status.Deployment.IncumbentRevision + if group.Status.Rollout != nil && group.Status.Rollout.IncumbentRevision != "" { + return group.Status.Rollout.IncumbentRevision } return strconv.FormatInt(group.Generation, 10) } @@ -41,7 +41,7 @@ func externalServiceName(group *seiv1alpha1.SeiNodeDeployment) string { // deployment, it includes the revision label to pin traffic to the // active set. At steady state, it selects by group membership only. func groupSelector(group *seiv1alpha1.SeiNodeDeployment) map[string]string { - if group.Status.Deployment != nil { + if group.Status.Rollout != nil { return map[string]string{ groupLabel: group.Name, revisionLabel: activeRevision(group), @@ -96,9 +96,10 @@ func managedByAnnotations() map[string]string { return map[string]string{managedByAnnotation: controllerName} } -// templateHash computes a hash over spec fields that require new nodes -// when changed. Any container image change triggers a full pod restart, -// so both the chain binary and sidecar images are included. +// templateHash computes a hash over spec fields that trigger a deployment +// plan when changed. Currently tracked: chainId, image, entrypoint, and +// sidecar image. Fields like overrides, peers, and replica count propagate +// in-place via ensureSeiNode without requiring a deployment plan. 
func templateHash(spec *seiv1alpha1.SeiNodeSpec) string { h := sha256.New() h.Write([]byte(spec.ChainID)) diff --git a/internal/controller/nodedeployment/nodes.go b/internal/controller/nodedeployment/nodes.go index 0659448..1ff6714 100644 --- a/internal/controller/nodedeployment/nodes.go +++ b/internal/controller/nodedeployment/nodes.go @@ -82,29 +82,52 @@ func (r *SeiNodeDeploymentReconciler) detectGenesisCeremonyNeeded(group *seiv1al // fields that require new nodes (image, entrypoint, chainId) are hashed; // sidecar, overrides, and replica changes propagate in-place. func (r *SeiNodeDeploymentReconciler) detectDeploymentNeeded(group *seiv1alpha1.SeiNodeDeployment) { - if group.Spec.UpdateStrategy == nil { - return - } if group.Status.TemplateHash == "" { return // first reconcile, no baseline to compare against } - if group.Status.Deployment != nil { - return - } - if hasConditionTrue(group, seiv1alpha1.ConditionPlanInProgress) { - return - } currentHash := templateHash(&group.Spec.Template.Spec) if currentHash == group.Status.TemplateHash { return // no deployment-worthy fields changed } - group.Status.Deployment = &seiv1alpha1.DeploymentStatus{ + // Supersession: if the spec moved since the active rollout was created, + // replace the stale plan so the controller converges on the latest spec. + if hasConditionTrue(group, seiv1alpha1.ConditionRolloutInProgress) { + if group.Status.Rollout != nil && group.Status.Rollout.TargetHash == currentHash { + return // rollout already targets the current spec + } + group.Status.Plan = nil + r.Recorder.Eventf(group, corev1.EventTypeNormal, "RolloutSuperseded", + "Spec changed during active rollout, replacing plan (old target: %s)", group.Status.Rollout.TargetHash) + } + + if !hasConditionTrue(group, seiv1alpha1.ConditionRolloutInProgress) && + hasConditionTrue(group, seiv1alpha1.ConditionPlanInProgress) { + return // non-deployment plan in progress (e.g. 
genesis) + } + + strategyType := group.Spec.UpdateStrategy.Type + if strategyType == "" { + log.Log.Info("updateStrategy.type is empty, treating as InPlace — update the manifest", + "group", group.Name, "namespace", group.Namespace) + strategyType = seiv1alpha1.UpdateStrategyInPlace + } + + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: strategyType, + TargetHash: currentHash, + StartedAt: metav1.Now(), IncumbentRevision: planner.IncumbentRevision(group), EntrantRevision: planner.EntrantRevision(group), - EntrantNodes: planner.EntrantNodeNames(group), + IncumbentNodes: group.Status.IncumbentNodes, } + + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", fmt.Sprintf("templateHash changed from %s to %s", group.Status.TemplateHash, currentHash)) + + r.Recorder.Eventf(group, corev1.EventTypeNormal, "RolloutStarted", + "Rollout started (strategy: %s, target: %s)", strategyType, currentHash[:8]) } // populateIncumbentNodes lists child SeiNodes and records their names diff --git a/internal/controller/nodedeployment/nodes_test.go b/internal/controller/nodedeployment/nodes_test.go index 250b2a2..97d6586 100644 --- a/internal/controller/nodedeployment/nodes_test.go +++ b/internal/controller/nodedeployment/nodes_test.go @@ -4,7 +4,9 @@ import ( "testing" .
"github.com/onsi/gomega" + apimeta "k8s.io/apimachinery/pkg/api/meta" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/client-go/tools/record" seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" ) @@ -141,6 +143,92 @@ func TestGenerateSeiNode_NoAnnotationsWhenNil(t *testing.T) { g.Expect(node.Annotations).To(BeNil()) } +func TestDetectDeploymentNeeded_InPlace_SetsRolloutInProgress(t *testing.T) { + g := NewWithT(t) + group := newTestGroup("archive-rpc", "sei") + group.Spec.UpdateStrategy = seiv1alpha1.UpdateStrategy{Type: seiv1alpha1.UpdateStrategyInPlace} + group.Status.TemplateHash = testOldHash + group.Status.IncumbentNodes = []string{"archive-rpc-0", "archive-rpc-1", "archive-rpc-2"} + + r := &SeiNodeDeploymentReconciler{Recorder: record.NewFakeRecorder(10)} + r.detectDeploymentNeeded(group) + + cond := apimeta.FindStatusCondition(group.Status.Conditions, seiv1alpha1.ConditionRolloutInProgress) + g.Expect(cond).NotTo(BeNil()) + g.Expect(cond.Status).To(Equal(metav1.ConditionTrue)) + g.Expect(cond.Reason).To(Equal("TemplateChanged")) + + g.Expect(group.Status.Rollout).NotTo(BeNil()) + g.Expect(group.Status.Rollout.Strategy).To(Equal(seiv1alpha1.UpdateStrategyInPlace)) + g.Expect(group.Status.Rollout.TargetHash).NotTo(BeEmpty()) + g.Expect(group.Status.Rollout.IncumbentNodes).To(ConsistOf("archive-rpc-0", "archive-rpc-1", "archive-rpc-2")) +} + +func TestDetectDeploymentNeeded_InPlace_AlreadyActive_SameTarget(t *testing.T) { + g := NewWithT(t) + group := newTestGroup("archive-rpc", "sei") + group.Spec.UpdateStrategy = seiv1alpha1.UpdateStrategy{Type: seiv1alpha1.UpdateStrategyInPlace} + group.Status.TemplateHash = testOldHash + group.Status.IncumbentNodes = []string{"archive-rpc-0"} + + currentHash := templateHash(&group.Spec.Template.Spec) + + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", "already rolling") + + existingRollout := &seiv1alpha1.RolloutStatus{ + Strategy: 
seiv1alpha1.UpdateStrategyInPlace, + TargetHash: currentHash, + } + group.Status.Rollout = existingRollout + + r := &SeiNodeDeploymentReconciler{Recorder: record.NewFakeRecorder(10)} + r.detectDeploymentNeeded(group) + + g.Expect(group.Status.Rollout).To(Equal(existingRollout)) +} + +func TestDetectDeploymentNeeded_InPlace_Supersedes_StaleRollout(t *testing.T) { + g := NewWithT(t) + group := newTestGroup("archive-rpc", "sei") + group.Spec.UpdateStrategy = seiv1alpha1.UpdateStrategy{Type: seiv1alpha1.UpdateStrategyInPlace} + group.Status.TemplateHash = testOldHash + group.Status.IncumbentNodes = []string{"archive-rpc-0"} + + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", "already rolling") + + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "stale-hash", + } + group.Status.Plan = &seiv1alpha1.TaskPlan{Phase: seiv1alpha1.TaskPlanActive} + + r := &SeiNodeDeploymentReconciler{Recorder: record.NewFakeRecorder(10)} + r.detectDeploymentNeeded(group) + + g.Expect(group.Status.Rollout.TargetHash).NotTo(Equal("stale-hash")) + g.Expect(group.Status.Plan).To(BeNil()) +} + +func TestDetectDeploymentNeeded_EmptyType_TreatedAsInPlace(t *testing.T) { + g := NewWithT(t) + group := newTestGroup("archive-rpc", "sei") + group.Spec.UpdateStrategy = seiv1alpha1.UpdateStrategy{Type: ""} + group.Status.TemplateHash = testOldHash + group.Status.IncumbentNodes = []string{"archive-rpc-0"} + + r := &SeiNodeDeploymentReconciler{Recorder: record.NewFakeRecorder(10)} + r.detectDeploymentNeeded(group) + + g.Expect(group.Status.Rollout).NotTo(BeNil()) + g.Expect(group.Status.Rollout.Strategy).To(Equal(seiv1alpha1.UpdateStrategyInPlace)) + + cond := apimeta.FindStatusCondition(group.Status.Conditions, seiv1alpha1.ConditionRolloutInProgress) + g.Expect(cond).NotTo(BeNil()) + g.Expect(cond.Status).To(Equal(metav1.ConditionTrue)) +} + func TestGenerateSeiNode_DeepCopiesTemplate(t 
*testing.T) { g := NewWithT(t) group := newTestGroup("archive-rpc", "sei") diff --git a/internal/controller/nodedeployment/plan.go b/internal/controller/nodedeployment/plan.go index a34db8e..4a11fc1 100644 --- a/internal/controller/nodedeployment/plan.go +++ b/internal/controller/nodedeployment/plan.go @@ -73,14 +73,17 @@ func (r *SeiNodeDeploymentReconciler) startPlan(ctx context.Context, group *seiv func (r *SeiNodeDeploymentReconciler) completePlan(ctx context.Context, group *seiv1alpha1.SeiNodeDeployment, statusBase client.Patch) (ctrl.Result, error) { logger := log.FromContext(ctx) - isDeploymentPlan := group.Status.Deployment != nil + isDeploymentPlan := group.Status.Rollout != nil if isDeploymentPlan { group.Status.ObservedGeneration = group.Generation if err := r.reconcileNetworking(ctx, group); err != nil { return ctrl.Result{}, fmt.Errorf("reconciling networking after deployment: %w", err) } - group.Status.Deployment = nil + group.Status.Rollout = nil + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionFalse, + "RolloutComplete", "Deployment completed successfully") + r.Recorder.Event(group, corev1.EventTypeNormal, "RolloutComplete", "Deployment rollout completed successfully") } if group.Spec.Genesis != nil && !isDeploymentPlan { @@ -105,7 +108,11 @@ func (r *SeiNodeDeploymentReconciler) failPlan(ctx context.Context, group *seiv1 group.Status.Phase = seiv1alpha1.GroupPhaseDegraded group.Status.Plan = nil - group.Status.Deployment = nil + if group.Status.Rollout != nil { + group.Status.Rollout = nil + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionFalse, + "RolloutFailed", "Deployment plan failed") + } clearPlanInProgress(group, "PlanFailed", "Plan failed") r.Recorder.Event(group, corev1.EventTypeWarning, "PlanFailed", "Plan failed") diff --git a/internal/controller/nodedeployment/plan_test.go b/internal/controller/nodedeployment/plan_test.go new file mode 100644 index 0000000..0dfa2bd --- /dev/null 
+++ b/internal/controller/nodedeployment/plan_test.go @@ -0,0 +1,180 @@ +package nodedeployment + +import ( + "context" + "testing" + + . "github.com/onsi/gomega" + apimeta "k8s.io/apimachinery/pkg/api/meta" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + k8sruntime "k8s.io/apimachinery/pkg/runtime" + clientgoscheme "k8s.io/client-go/kubernetes/scheme" + "k8s.io/client-go/tools/record" + "sigs.k8s.io/controller-runtime/pkg/client" + "sigs.k8s.io/controller-runtime/pkg/client/fake" + + seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" +) + +func newPlanTestScheme(t *testing.T) *k8sruntime.Scheme { + t.Helper() + s := k8sruntime.NewScheme() + if err := clientgoscheme.AddToScheme(s); err != nil { + t.Fatal(err) + } + if err := seiv1alpha1.AddToScheme(s); err != nil { + t.Fatal(err) + } + return s +} + +func newPlanTestReconciler(t *testing.T, objs ...client.Object) (*SeiNodeDeploymentReconciler, client.Client) { + t.Helper() + s := newPlanTestScheme(t) + c := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(objs...). + WithStatusSubresource(&seiv1alpha1.SeiNodeDeployment{}). 
+ Build() + r := &SeiNodeDeploymentReconciler{ + Client: c, + Scheme: s, + Recorder: record.NewFakeRecorder(100), + } + return r, c +} + +func TestCompletePlan_ClearsRolloutInProgress(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + group := newTestGroup("archive-rpc", "sei") + group.Generation = 3 + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "newhash1234", + StartedAt: metav1.Now(), + Nodes: []seiv1alpha1.RolloutNodeStatus{ + {Name: "archive-rpc-0", Ready: true, Phase: seiv1alpha1.PhaseRunning}, + }, + } + group.Status.Plan = &seiv1alpha1.TaskPlan{Phase: seiv1alpha1.TaskPlanComplete} + setPlanInProgress(group, "Deployment", "deploying") + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", "hash changed") + + childNode := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{ + Name: "archive-rpc-0", + Namespace: "sei", + Labels: map[string]string{groupLabel: "archive-rpc"}, + OwnerReferences: []metav1.OwnerReference{{ + APIVersion: "sei.io/v1alpha1", + Kind: "SeiNodeDeployment", + Name: "archive-rpc", + UID: group.UID, + Controller: boolPtr(true), + }}, + }, + Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}, + } + + r, c := newPlanTestReconciler(t, group, childNode) + + statusBase := client.MergeFromWithOptions(group.DeepCopy(), client.MergeFromWithOptimisticLock{}) + _, err := r.completePlan(ctx, group, statusBase) + g.Expect(err).NotTo(HaveOccurred()) + + fetched := &seiv1alpha1.SeiNodeDeployment{} + g.Expect(c.Get(ctx, client.ObjectKeyFromObject(group), fetched)).To(Succeed()) + + g.Expect(fetched.Status.Rollout).To(BeNil()) + g.Expect(fetched.Status.Plan).To(BeNil()) + + rolloutCond := apimeta.FindStatusCondition(fetched.Status.Conditions, seiv1alpha1.ConditionRolloutInProgress) + g.Expect(rolloutCond).NotTo(BeNil()) + g.Expect(rolloutCond.Status).To(Equal(metav1.ConditionFalse)) + 
g.Expect(rolloutCond.Reason).To(Equal("RolloutComplete")) + + planCond := apimeta.FindStatusCondition(fetched.Status.Conditions, seiv1alpha1.ConditionPlanInProgress) + g.Expect(planCond).NotTo(BeNil()) + g.Expect(planCond.Status).To(Equal(metav1.ConditionFalse)) +} + +func TestFailPlan_ClearsRolloutInProgress(t *testing.T) { + g := NewWithT(t) + ctx := context.Background() + + group := newTestGroup("archive-rpc", "sei") + group.Generation = 3 + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "newhash1234", + StartedAt: metav1.Now(), + Nodes: []seiv1alpha1.RolloutNodeStatus{ + {Name: "archive-rpc-0"}, + {Name: "archive-rpc-1"}, + {Name: "archive-rpc-2"}, + }, + } + group.Status.Plan = &seiv1alpha1.TaskPlan{Phase: seiv1alpha1.TaskPlanFailed} + setPlanInProgress(group, "Deployment", "deploying") + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", "hash changed") + + ownerRef := metav1.OwnerReference{ + APIVersion: "sei.io/v1alpha1", + Kind: "SeiNodeDeployment", + Name: "archive-rpc", + UID: group.UID, + Controller: boolPtr(true), + } + childRunning := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{ + Name: "archive-rpc-0", Namespace: "sei", + Labels: map[string]string{groupLabel: "archive-rpc"}, + OwnerReferences: []metav1.OwnerReference{ownerRef}, + }, + Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}, + } + childFailed := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{ + Name: "archive-rpc-1", Namespace: "sei", + Labels: map[string]string{groupLabel: "archive-rpc"}, + OwnerReferences: []metav1.OwnerReference{ownerRef}, + }, + Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseFailed}, + } + childFailed2 := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{ + Name: "archive-rpc-2", Namespace: "sei", + Labels: map[string]string{groupLabel: "archive-rpc"}, + OwnerReferences: []metav1.OwnerReference{ownerRef}, + 
}, + Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseFailed}, + } + + r, c := newPlanTestReconciler(t, group, childRunning, childFailed, childFailed2) + + statusBase := client.MergeFromWithOptions(group.DeepCopy(), client.MergeFromWithOptimisticLock{}) + _, err := r.failPlan(ctx, group, statusBase) + g.Expect(err).NotTo(HaveOccurred()) + + fetched := &seiv1alpha1.SeiNodeDeployment{} + g.Expect(c.Get(ctx, client.ObjectKeyFromObject(group), fetched)).To(Succeed()) + + g.Expect(fetched.Status.Rollout).To(BeNil()) + g.Expect(fetched.Status.Plan).To(BeNil()) + g.Expect(fetched.Status.Phase).To(Equal(seiv1alpha1.GroupPhaseDegraded)) + + rolloutCond := apimeta.FindStatusCondition(fetched.Status.Conditions, seiv1alpha1.ConditionRolloutInProgress) + g.Expect(rolloutCond).NotTo(BeNil()) + g.Expect(rolloutCond.Status).To(Equal(metav1.ConditionFalse)) + g.Expect(rolloutCond.Reason).To(Equal("RolloutFailed")) + + planCond := apimeta.FindStatusCondition(fetched.Status.Conditions, seiv1alpha1.ConditionPlanInProgress) + g.Expect(planCond).NotTo(BeNil()) + g.Expect(planCond.Status).To(Equal(metav1.ConditionFalse)) +} + +func boolPtr(b bool) *bool { return &b } diff --git a/internal/controller/nodedeployment/status.go b/internal/controller/nodedeployment/status.go index 7ab3c2f..9cc6495 100644 --- a/internal/controller/nodedeployment/status.go +++ b/internal/controller/nodedeployment/status.go @@ -40,6 +40,8 @@ func (r *SeiNodeDeploymentReconciler) updateStatus(ctx context.Context, group *s group.Status.ReadyReplicas = readyReplicas group.Status.Nodes = nodeStatuses + reconcileRolloutStatus(group, nodes) + group.Status.Phase = computeGroupPhase(group, readyReplicas, group.Spec.Replicas, nodes) group.Status.NetworkingStatus = r.buildNetworkingStatus(group) @@ -49,9 +51,43 @@ func (r *SeiNodeDeploymentReconciler) updateStatus(ctx context.Context, group *s return r.Status().Patch(ctx, group, statusBase) } +func reconcileRolloutStatus(group *seiv1alpha1.SeiNodeDeployment, 
nodes []seiv1alpha1.SeiNode) { + if group.Status.Rollout == nil || group.Status.Rollout.Strategy != seiv1alpha1.UpdateStrategyInPlace { + return + } + + nodePhaseMap := make(map[string]seiv1alpha1.SeiNodePhase, len(nodes)) + for i := range nodes { + nodePhaseMap[nodes[i].Name] = nodes[i].Status.Phase + } + + allReady := true + for i := range group.Status.Rollout.Nodes { + rn := &group.Status.Rollout.Nodes[i] + phase := nodePhaseMap[rn.Name] + rn.Phase = phase + rn.Ready = phase == seiv1alpha1.PhaseRunning + if !rn.Ready { + allReady = false + } + } + + if allReady && !hasConditionTrue(group, seiv1alpha1.ConditionPlanInProgress) { + group.Status.TemplateHash = group.Status.Rollout.TargetHash + group.Status.ObservedGeneration = group.Generation + group.Status.Rollout = nil + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionFalse, + "RolloutComplete", "All nodes converged") + return + } +} + func computeGroupPhase(group *seiv1alpha1.SeiNodeDeployment, ready, desired int32, nodes []seiv1alpha1.SeiNode) seiv1alpha1.SeiNodeDeploymentPhase { + if hasConditionTrue(group, seiv1alpha1.ConditionRolloutInProgress) { + return seiv1alpha1.GroupPhaseUpgrading + } if hasConditionTrue(group, seiv1alpha1.ConditionPlanInProgress) { - if group.Status.Deployment != nil { + if group.Status.Rollout != nil { return seiv1alpha1.GroupPhaseUpgrading } return seiv1alpha1.GroupPhaseInitializing diff --git a/internal/controller/nodedeployment/status_test.go b/internal/controller/nodedeployment/status_test.go index 03b46ec..17162e6 100644 --- a/internal/controller/nodedeployment/status_test.go +++ b/internal/controller/nodedeployment/status_test.go @@ -4,10 +4,14 @@ import ( "testing" . 
"github.com/onsi/gomega" + apimeta "k8s.io/apimachinery/pkg/api/meta" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" ) +const testOldHash = "oldhash1234" + func emptyGroup() *seiv1alpha1.SeiNodeDeployment { return &seiv1alpha1.SeiNodeDeployment{} } @@ -69,7 +73,9 @@ func TestComputeGroupPhase_AllFailed(t *testing.T) { func TestComputeGroupPhase_Upgrading(t *testing.T) { g := NewWithT(t) group := emptyGroup() - group.Status.Deployment = &seiv1alpha1.DeploymentStatus{ + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyBlueGreen, + TargetHash: "abc123", IncumbentRevision: "1", EntrantRevision: "2", } @@ -98,6 +104,104 @@ func makeNodes(n int, phase seiv1alpha1.SeiNodePhase) []seiv1alpha1.SeiNode { // --- NetworkingStatus --- +func TestReconcileRolloutStatus_InPlace_AllReady(t *testing.T) { + g := NewWithT(t) + group := emptyGroup() + group.Generation = 2 + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "newhash1234", + StartedAt: metav1.Now(), + Nodes: []seiv1alpha1.RolloutNodeStatus{ + {Name: "node-0"}, + {Name: "node-1"}, + }, + } + group.Status.TemplateHash = testOldHash + + nodes := []seiv1alpha1.SeiNode{ + {ObjectMeta: metav1.ObjectMeta{Name: "node-0"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}}, + {ObjectMeta: metav1.ObjectMeta{Name: "node-1"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}}, + } + + reconcileRolloutStatus(group, nodes) + + g.Expect(group.Status.Rollout).To(BeNil()) + g.Expect(group.Status.TemplateHash).To(Equal("newhash1234")) + g.Expect(group.Status.ObservedGeneration).To(Equal(int64(2))) + + cond := apimeta.FindStatusCondition(group.Status.Conditions, seiv1alpha1.ConditionRolloutInProgress) + g.Expect(cond).NotTo(BeNil()) + g.Expect(cond.Status).To(Equal(metav1.ConditionFalse)) + 
g.Expect(cond.Reason).To(Equal("RolloutComplete")) +} + +func TestReconcileRolloutStatus_InPlace_DoesNotClearWhilePlanActive(t *testing.T) { + g := NewWithT(t) + group := emptyGroup() + group.Generation = 2 + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "newhash1234", + StartedAt: metav1.Now(), + Nodes: []seiv1alpha1.RolloutNodeStatus{ + {Name: "node-0"}, + {Name: "node-1"}, + }, + } + group.Status.TemplateHash = testOldHash + setPlanInProgress(group, "Deployment", "deploying") + + nodes := []seiv1alpha1.SeiNode{ + {ObjectMeta: metav1.ObjectMeta{Name: "node-0"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}}, + {ObjectMeta: metav1.ObjectMeta{Name: "node-1"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}}, + } + + reconcileRolloutStatus(group, nodes) + + g.Expect(group.Status.Rollout).NotTo(BeNil(), "rollout should not be cleared while PlanInProgress is true") + g.Expect(group.Status.TemplateHash).To(Equal(testOldHash), "templateHash should not change while plan is active") + g.Expect(group.Status.Rollout.Nodes[0].Ready).To(BeTrue()) + g.Expect(group.Status.Rollout.Nodes[1].Ready).To(BeTrue()) +} + +func TestReconcileRolloutStatus_InPlace_Partial(t *testing.T) { + g := NewWithT(t) + group := emptyGroup() + group.Status.Rollout = &seiv1alpha1.RolloutStatus{ + Strategy: seiv1alpha1.UpdateStrategyInPlace, + TargetHash: "newhash1234", + StartedAt: metav1.Now(), + Nodes: []seiv1alpha1.RolloutNodeStatus{ + {Name: "node-0"}, + {Name: "node-1"}, + }, + } + group.Status.TemplateHash = testOldHash + + nodes := []seiv1alpha1.SeiNode{ + {ObjectMeta: metav1.ObjectMeta{Name: "node-0"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseRunning}}, + {ObjectMeta: metav1.ObjectMeta{Name: "node-1"}, Status: seiv1alpha1.SeiNodeStatus{Phase: seiv1alpha1.PhaseInitializing}}, + } + + reconcileRolloutStatus(group, nodes) + + g.Expect(group.Status.Rollout).NotTo(BeNil()) + 
g.Expect(group.Status.TemplateHash).To(Equal(testOldHash)) + g.Expect(group.Status.Rollout.Nodes[0].Ready).To(BeTrue()) + g.Expect(group.Status.Rollout.Nodes[1].Ready).To(BeFalse()) +} + +func TestComputeGroupPhase_RolloutInProgress(t *testing.T) { + g := NewWithT(t) + group := emptyGroup() + setCondition(group, seiv1alpha1.ConditionRolloutInProgress, metav1.ConditionTrue, + "TemplateChanged", "hash changed") + nodes := makeNodes(3, seiv1alpha1.PhaseRunning) + phase := computeGroupPhase(group, 3, 3, nodes) + g.Expect(phase).To(Equal(seiv1alpha1.GroupPhaseUpgrading)) +} + func TestBuildNetworkingStatus_FullMode_DualDomain(t *testing.T) { g := NewWithT(t) group := newTestGroup("pacific-1-wave", "pacific-1") diff --git a/internal/planner/deployment.go b/internal/planner/deployment.go index b47511e..4c6a38b 100644 --- a/internal/planner/deployment.go +++ b/internal/planner/deployment.go @@ -13,10 +13,9 @@ import ( // ForDeployment returns the appropriate GroupPlanner for the group's // configured update strategy. func ForDeployment(group *seiv1alpha1.SeiNodeDeployment) (GroupPlanner, error) { - if group.Spec.UpdateStrategy == nil { - return nil, fmt.Errorf("no update strategy on %s/%s", group.Namespace, group.Name) - } switch group.Spec.UpdateStrategy.Type { + case seiv1alpha1.UpdateStrategyInPlace: + return &inPlaceDeploymentPlanner{}, nil case seiv1alpha1.UpdateStrategyHardFork: return &hardForkDeploymentPlanner{}, nil case seiv1alpha1.UpdateStrategyBlueGreen: @@ -44,8 +43,8 @@ func EntrantRevision(group *seiv1alpha1.SeiNodeDeployment) string { // IncumbentRevision returns the revision string for the incumbent set, // derived from the last successfully reconciled generation. 
func IncumbentRevision(group *seiv1alpha1.SeiNodeDeployment) string { - if group.Status.Deployment != nil && group.Status.Deployment.IncumbentRevision != "" { - return group.Status.Deployment.IncumbentRevision + if group.Status.Rollout != nil && group.Status.Rollout.IncumbentRevision != "" { + return group.Status.Rollout.IncumbentRevision } return strconv.FormatInt(group.Status.ObservedGeneration, 10) } @@ -111,6 +110,46 @@ func (p *hardForkDeploymentPlanner) BuildPlan( return &seiv1alpha1.TaskPlan{ID: planID, Phase: seiv1alpha1.TaskPlanActive, Tasks: tasks}, nil } +// inPlaceDeploymentPlanner builds a deployment plan for the InPlace strategy. +type inPlaceDeploymentPlanner struct{} + +func (p *inPlaceDeploymentPlanner) BuildPlan( + group *seiv1alpha1.SeiNodeDeployment, +) (*seiv1alpha1.TaskPlan, error) { + planID := uuid.New().String() + nodeNames := group.Status.IncumbentNodes + ns := group.Namespace + + prog := []struct { + taskType string + params any + }{ + {task.TaskTypeUpdateNodeSpecs, &task.UpdateNodeSpecsParams{ + GroupName: group.Name, + Namespace: ns, + NodeNames: nodeNames, + }}, + {task.TaskTypeAwaitSpecUpdate, &task.AwaitSpecUpdateParams{ + Namespace: ns, + NodeNames: nodeNames, + }}, + {task.TaskTypeMarkNodesReady, &task.MarkNodesReadyParams{ + Namespace: ns, + NodeNames: nodeNames, + }}, + } + + tasks := make([]seiv1alpha1.PlannedTask, len(prog)) + for i, p := range prog { + t, err := buildPlannedTask(planID, p.taskType, i, p.params) + if err != nil { + return nil, err + } + tasks[i] = t + } + return &seiv1alpha1.TaskPlan{ID: planID, Phase: seiv1alpha1.TaskPlanActive, Tasks: tasks}, nil +} + // blueGreenDeploymentPlanner builds a deployment plan for the BlueGreen strategy. 
type blueGreenDeploymentPlanner struct{} diff --git a/internal/planner/deployment_test.go b/internal/planner/deployment_test.go new file mode 100644 index 0000000..8688c9c --- /dev/null +++ b/internal/planner/deployment_test.go @@ -0,0 +1,60 @@ +package planner + +import ( + "encoding/json" + "testing" + + . "github.com/onsi/gomega" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + + seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" + "github.com/sei-protocol/sei-k8s-controller/internal/task" +) + +func TestInPlacePlan_ThreeTasks(t *testing.T) { + g := NewWithT(t) + + group := &seiv1alpha1.SeiNodeDeployment{ + ObjectMeta: metav1.ObjectMeta{Name: "wave-group", Namespace: "pacific-1", Generation: 2}, + Spec: seiv1alpha1.SeiNodeDeploymentSpec{ + Replicas: 3, + UpdateStrategy: seiv1alpha1.UpdateStrategy{Type: seiv1alpha1.UpdateStrategyInPlace}, + }, + Status: seiv1alpha1.SeiNodeDeploymentStatus{ + IncumbentNodes: []string{"wave-group-0", "wave-group-1", "wave-group-2"}, + }, + } + + planner := &inPlaceDeploymentPlanner{} + plan, err := planner.BuildPlan(group) + g.Expect(err).NotTo(HaveOccurred()) + + g.Expect(plan.Phase).To(Equal(seiv1alpha1.TaskPlanActive)) + g.Expect(plan.Tasks).To(HaveLen(3)) + + g.Expect(plan.Tasks[0].Type).To(Equal(task.TaskTypeUpdateNodeSpecs)) + g.Expect(plan.Tasks[1].Type).To(Equal(task.TaskTypeAwaitSpecUpdate)) + g.Expect(plan.Tasks[2].Type).To(Equal(task.TaskTypeMarkNodesReady)) + + for i, pt := range plan.Tasks { + g.Expect(pt.Status).To(Equal(seiv1alpha1.TaskPending), "task[%d] should be Pending", i) + g.Expect(pt.ID).NotTo(BeEmpty(), "task[%d] should have an ID", i) + g.Expect(pt.Params).NotTo(BeNil(), "task[%d] should have params", i) + } + + var updateParams task.UpdateNodeSpecsParams + g.Expect(json.Unmarshal(plan.Tasks[0].Params.Raw, &updateParams)).To(Succeed()) + g.Expect(updateParams.GroupName).To(Equal("wave-group")) + g.Expect(updateParams.Namespace).To(Equal("pacific-1")) + 
g.Expect(updateParams.NodeNames).To(Equal([]string{"wave-group-0", "wave-group-1", "wave-group-2"})) + + var awaitParams task.AwaitSpecUpdateParams + g.Expect(json.Unmarshal(plan.Tasks[1].Params.Raw, &awaitParams)).To(Succeed()) + g.Expect(awaitParams.Namespace).To(Equal("pacific-1")) + g.Expect(awaitParams.NodeNames).To(Equal([]string{"wave-group-0", "wave-group-1", "wave-group-2"})) + + var markParams task.MarkNodesReadyParams + g.Expect(json.Unmarshal(plan.Tasks[2].Params.Raw, &markParams)).To(Succeed()) + g.Expect(markParams.Namespace).To(Equal("pacific-1")) + g.Expect(markParams.NodeNames).To(Equal([]string{"wave-group-0", "wave-group-1", "wave-group-2"})) +} diff --git a/internal/planner/planner.go b/internal/planner/planner.go index 793e85a..776dd0d 100644 --- a/internal/planner/planner.go +++ b/internal/planner/planner.go @@ -63,9 +63,9 @@ func ForGroup(group *seiv1alpha1.SeiNodeDeployment) (GroupPlanner, error) { return &genesisGroupPlanner{}, nil } - // Deployment: reconcileSeiNodes sets Deployment metadata when it + // Deployment: reconcileSeiNodes sets Rollout metadata when it // detects a spec change requiring deployment orchestration. - if group.Status.Deployment != nil && group.Status.Plan == nil { + if group.Status.Rollout != nil && group.Status.Plan == nil { return ForDeployment(group) } diff --git a/internal/task/deployment.go b/internal/task/deployment.go index 5955517..daf1622 100644 --- a/internal/task/deployment.go +++ b/internal/task/deployment.go @@ -5,6 +5,9 @@ import "github.com/google/uuid" // Controller-managed deployment task types. 
const ( TaskTypeCreateEntrantNodes = "create-entrant-nodes" + TaskTypeUpdateNodeSpecs = "update-node-specs" + TaskTypeAwaitSpecUpdate = "await-spec-update" + TaskTypeMarkNodesReady = "mark-nodes-ready" TaskTypeSubmitHaltSignal = "submit-halt-signal" TaskTypeAwaitNodesAtHeight = "await-nodes-at-height" TaskTypeAwaitNodesCaughtUp = "await-nodes-caught-up" @@ -56,6 +59,28 @@ type SwitchTrafficParams struct { EntrantRevision string `json:"entrantRevision"` } +// UpdateNodeSpecsParams holds parameters for patching child SeiNode specs +// during an InPlace deployment. +type UpdateNodeSpecsParams struct { + GroupName string `json:"groupName"` + Namespace string `json:"namespace"` + NodeNames []string `json:"nodeNames"` +} + +// AwaitSpecUpdateParams holds parameters for waiting until all nodes +// have converged to the desired image (status.currentImage == spec.image). +type AwaitSpecUpdateParams struct { + Namespace string `json:"namespace"` + NodeNames []string `json:"nodeNames"` +} + +// MarkNodesReadyParams holds parameters for submitting mark-ready to +// each node's sidecar after an InPlace rollout completes. +type MarkNodesReadyParams struct { + Namespace string `json:"namespace"` + NodeNames []string `json:"nodeNames"` +} + // TeardownNodesParams holds parameters for deleting incumbent SeiNode resources. 
type TeardownNodesParams struct { Namespace string `json:"namespace"` diff --git a/internal/task/deployment_switch.go b/internal/task/deployment_switch.go index ffdbe50..443a8b7 100644 --- a/internal/task/deployment_switch.go +++ b/internal/task/deployment_switch.go @@ -41,14 +41,14 @@ func (e *switchTrafficExecution) Execute(ctx context.Context) error { return Terminal(err) } - if group.Status.Deployment == nil { - return Terminal(fmt.Errorf("no deployment status on group %s", e.params.GroupName)) + if group.Status.Rollout == nil { + return Terminal(fmt.Errorf("no rollout status on group %s", e.params.GroupName)) } patch := client.MergeFrom(group.DeepCopy()) - group.Status.Deployment.IncumbentRevision = e.params.EntrantRevision + group.Status.Rollout.IncumbentRevision = e.params.EntrantRevision if err := e.cfg.KubeClient.Status().Patch(ctx, group, patch); err != nil { - return fmt.Errorf("patching deployment revision: %w", err) // transient + return fmt.Errorf("patching rollout revision: %w", err) // transient } log.FromContext(ctx).Info("traffic switched to entrant revision", diff --git a/internal/task/deployment_update.go b/internal/task/deployment_update.go new file mode 100644 index 0000000..7ca6483 --- /dev/null +++ b/internal/task/deployment_update.go @@ -0,0 +1,185 @@ +package task + +import ( + "context" + "encoding/json" + "fmt" + + sidecar "github.com/sei-protocol/seictl/sidecar/client" + "k8s.io/apimachinery/pkg/types" + "sigs.k8s.io/controller-runtime/pkg/log" + + seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" +) + +// --- UpdateNodeSpecs: patches child SeiNode specs (image) --- + +type updateNodeSpecsExecution struct { + taskBase + params UpdateNodeSpecsParams + cfg ExecutionConfig +} + +func deserializeUpdateNodeSpecs(id string, params json.RawMessage, cfg ExecutionConfig) (TaskExecution, error) { + var p UpdateNodeSpecsParams + if len(params) > 0 { + if err := json.Unmarshal(params, &p); err != nil { + return nil, 
fmt.Errorf("deserializing update-node-specs params: %w", err) + } + } + return &updateNodeSpecsExecution{ + taskBase: taskBase{id: id, status: ExecutionRunning}, + params: p, + cfg: cfg, + }, nil +} + +func (e *updateNodeSpecsExecution) Execute(ctx context.Context) error { + logger := log.FromContext(ctx) + + group, err := ResourceAs[*seiv1alpha1.SeiNodeDeployment](e.cfg) + if err != nil { + return Terminal(err) + } + + desiredImage := group.Spec.Template.Spec.Image + + for _, name := range e.params.NodeNames { + node := &seiv1alpha1.SeiNode{} + if err := e.cfg.KubeClient.Get(ctx, types.NamespacedName{Name: name, Namespace: e.params.Namespace}, node); err != nil { + return fmt.Errorf("getting node %s: %w", name, err) + } + if node.Spec.Image == desiredImage { + continue + } + node.Spec.Image = desiredImage + if sc := group.Spec.Template.Spec.Sidecar; sc != nil && node.Spec.Sidecar != nil { + node.Spec.Sidecar.Image = sc.Image + } + if err := e.cfg.KubeClient.Update(ctx, node); err != nil { + return fmt.Errorf("updating node %s spec: %w", name, err) + } + logger.Info("updated node spec", "node", name, "image", desiredImage) + } + + e.complete() + return nil +} + +func (e *updateNodeSpecsExecution) Status(_ context.Context) ExecutionStatus { + return e.status +} + +// --- AwaitSpecUpdate: waits for StatefulSet rollout to complete --- + +type awaitSpecUpdateExecution struct { + taskBase + params AwaitSpecUpdateParams + cfg ExecutionConfig +} + +func deserializeAwaitSpecUpdate(id string, params json.RawMessage, cfg ExecutionConfig) (TaskExecution, error) { + var p AwaitSpecUpdateParams + if len(params) > 0 { + if err := json.Unmarshal(params, &p); err != nil { + return nil, fmt.Errorf("deserializing await-spec-update params: %w", err) + } + } + return &awaitSpecUpdateExecution{ + taskBase: taskBase{id: id, status: ExecutionRunning}, + params: p, + cfg: cfg, + }, nil +} + +func (e *awaitSpecUpdateExecution) Execute(_ context.Context) error { return nil } + +func (e 
*awaitSpecUpdateExecution) Status(ctx context.Context) ExecutionStatus { + if s, done := e.isTerminal(); done { + return s + } + // TODO: detect terminal pod failures (ImagePullBackOff, ErrImagePull) and + // fail the task instead of polling indefinitely. The kubelet waiting reason + // strings are not exported as constants in k8s.io/api — needs either + // hardcoded reason matching or a duration-based heuristic. + for _, name := range e.params.NodeNames { + node := &seiv1alpha1.SeiNode{} + if err := e.cfg.KubeClient.Get(ctx, types.NamespacedName{Name: name, Namespace: e.params.Namespace}, node); err != nil { + return ExecutionRunning + } + if node.Status.CurrentImage != node.Spec.Image { + return ExecutionRunning + } + } + e.complete() + return ExecutionComplete +} + +// --- MarkNodesReady: submits mark-ready to each node's sidecar --- + +type markNodesReadyExecution struct { + taskBase + params MarkNodesReadyParams + cfg ExecutionConfig + marked map[string]bool +} + +func deserializeMarkNodesReady(id string, params json.RawMessage, cfg ExecutionConfig) (TaskExecution, error) { + var p MarkNodesReadyParams + if len(params) > 0 { + if err := json.Unmarshal(params, &p); err != nil { + return nil, fmt.Errorf("deserializing mark-nodes-ready params: %w", err) + } + } + return &markNodesReadyExecution{ + taskBase: taskBase{id: id, status: ExecutionRunning}, + params: p, + cfg: cfg, + marked: make(map[string]bool, len(p.NodeNames)), + }, nil +} + +func (e *markNodesReadyExecution) Execute(_ context.Context) error { return nil } + +func (e *markNodesReadyExecution) Status(ctx context.Context) ExecutionStatus { + if s, done := e.isTerminal(); done { + return s + } + logger := log.FromContext(ctx) + + allReady := true + for _, name := range e.params.NodeNames { + if e.marked[name] { + continue + } + node := &seiv1alpha1.SeiNode{} + if err := e.cfg.KubeClient.Get(ctx, types.NamespacedName{Name: name, Namespace: e.params.Namespace}, node); err != nil { + allReady = false + 
continue + } + sc, err := sidecarClientForNode(node) + if err != nil { + allReady = false + continue + } + resp, err := sc.Status(ctx) + if err != nil { + allReady = false + continue + } + if resp.Status == sidecar.Ready { + e.marked[name] = true + continue + } + if _, err := sc.SubmitTask(ctx, sidecar.TaskRequest{Type: sidecar.TaskTypeMarkReady}); err != nil { + logger.V(1).Info("mark-ready submission failed", "node", name, "error", err) + } + allReady = false + } + + if allReady { + e.complete() + return ExecutionComplete + } + return ExecutionRunning +} diff --git a/internal/task/deployment_update_test.go b/internal/task/deployment_update_test.go new file mode 100644 index 0000000..987766c --- /dev/null +++ b/internal/task/deployment_update_test.go @@ -0,0 +1,286 @@ +package task + +import ( + "context" + "encoding/json" + "testing" + + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/types" + "sigs.k8s.io/controller-runtime/pkg/client/fake" + + seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1" + "github.com/sei-protocol/sei-k8s-controller/internal/platform/platformtest" +) + +func testDeploymentGroup() *seiv1alpha1.SeiNodeDeployment { + return &seiv1alpha1.SeiNodeDeployment{ + ObjectMeta: metav1.ObjectMeta{Name: "wave", Namespace: "sei", UID: "uid-wave"}, + Spec: seiv1alpha1.SeiNodeDeploymentSpec{ + Replicas: 2, + Template: seiv1alpha1.SeiNodeTemplate{ + Spec: seiv1alpha1.SeiNodeSpec{ + ChainID: "pacific-1", + Image: "sei:v2.0.0", + FullNode: &seiv1alpha1.FullNodeSpec{ + Snapshot: &seiv1alpha1.SnapshotSource{ + S3: &seiv1alpha1.S3SnapshotSource{TargetHeight: 100}, + }, + }, + Sidecar: &seiv1alpha1.SidecarConfig{ + Image: "seictl:v2", + Port: 7777, + }, + }, + }, + }, + } +} + +func testDeploymentCfg(t *testing.T, group *seiv1alpha1.SeiNodeDeployment, nodes ...*seiv1alpha1.SeiNode) ExecutionConfig { + t.Helper() + s := testScheme(t) + builder := fake.NewClientBuilder(). + WithScheme(s). + WithObjects(group). 
+ WithStatusSubresource(&seiv1alpha1.SeiNode{}) + for _, n := range nodes { + builder = builder.WithObjects(n) + } + c := builder.Build() + return ExecutionConfig{ + KubeClient: c, + Scheme: s, + Resource: group, + Platform: platformtest.Config(), + } +} + +// --- UpdateNodeSpecs --- + +func TestUpdateNodeSpecs_PatchesImage(t *testing.T) { + group := testDeploymentGroup() + node := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{Name: "wave-0", Namespace: "sei"}, + Spec: seiv1alpha1.SeiNodeSpec{ + ChainID: "pacific-1", + Image: "sei:v1.0.0", + FullNode: &seiv1alpha1.FullNodeSpec{ + Snapshot: &seiv1alpha1.SnapshotSource{ + S3: &seiv1alpha1.S3SnapshotSource{TargetHeight: 100}, + }, + }, + Sidecar: &seiv1alpha1.SidecarConfig{ + Image: "seictl:v1", + Port: 7777, + }, + }, + } + cfg := testDeploymentCfg(t, group, node) + + params := UpdateNodeSpecsParams{ + GroupName: "wave", + Namespace: "sei", + NodeNames: []string{"wave-0"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeUpdateNodeSpecs("id-1", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + ctx := context.Background() + if err := exec.Execute(ctx); err != nil { + t.Fatalf("Execute: %v", err) + } + if exec.Status(ctx) != ExecutionComplete { + t.Fatalf("expected Complete, got %s", exec.Status(ctx)) + } + + fetched := &seiv1alpha1.SeiNode{} + if err := cfg.KubeClient.Get(ctx, types.NamespacedName{Name: "wave-0", Namespace: "sei"}, fetched); err != nil { + t.Fatalf("get node: %v", err) + } + if fetched.Spec.Image != "sei:v2.0.0" { + t.Errorf("image = %q, want %q", fetched.Spec.Image, "sei:v2.0.0") + } + if fetched.Spec.Sidecar.Image != "seictl:v2" { + t.Errorf("sidecar image = %q, want %q", fetched.Spec.Sidecar.Image, "seictl:v2") + } +} + +func TestUpdateNodeSpecs_SkipsCurrentImage(t *testing.T) { + group := testDeploymentGroup() + node := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{Name: "wave-0", Namespace: "sei"}, + Spec: seiv1alpha1.SeiNodeSpec{ + ChainID: 
"pacific-1", + Image: "sei:v2.0.0", + FullNode: &seiv1alpha1.FullNodeSpec{ + Snapshot: &seiv1alpha1.SnapshotSource{ + S3: &seiv1alpha1.S3SnapshotSource{TargetHeight: 100}, + }, + }, + Sidecar: &seiv1alpha1.SidecarConfig{ + Image: "seictl:v2", + Port: 7777, + }, + }, + } + cfg := testDeploymentCfg(t, group, node) + + params := UpdateNodeSpecsParams{ + GroupName: "wave", + Namespace: "sei", + NodeNames: []string{"wave-0"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeUpdateNodeSpecs("id-1", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + ctx := context.Background() + if err := exec.Execute(ctx); err != nil { + t.Fatalf("Execute: %v", err) + } + if exec.Status(ctx) != ExecutionComplete { + t.Fatalf("expected Complete, got %s", exec.Status(ctx)) + } + + fetched := &seiv1alpha1.SeiNode{} + if err := cfg.KubeClient.Get(ctx, types.NamespacedName{Name: "wave-0", Namespace: "sei"}, fetched); err != nil { + t.Fatalf("get node: %v", err) + } + if fetched.Spec.Image != "sei:v2.0.0" { + t.Errorf("image should remain %q, got %q", "sei:v2.0.0", fetched.Spec.Image) + } +} + +// --- AwaitSpecUpdate --- + +func TestAwaitSpecUpdate_CompletesWhenConverged(t *testing.T) { + group := testDeploymentGroup() + node := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{Name: "wave-0", Namespace: "sei"}, + Spec: seiv1alpha1.SeiNodeSpec{ + ChainID: "pacific-1", + Image: "sei:v2.0.0", + FullNode: &seiv1alpha1.FullNodeSpec{}, + }, + Status: seiv1alpha1.SeiNodeStatus{ + CurrentImage: "sei:v2.0.0", + }, + } + cfg := testDeploymentCfg(t, group, node) + + params := AwaitSpecUpdateParams{ + Namespace: "sei", + NodeNames: []string{"wave-0"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeAwaitSpecUpdate("id-2", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + if exec.Status(context.Background()) != ExecutionComplete { + t.Fatal("expected Complete when currentImage == spec.image") + } +} + +func 
TestAwaitSpecUpdate_RunningWhenNotConverged(t *testing.T) { + group := testDeploymentGroup() + node := &seiv1alpha1.SeiNode{ + ObjectMeta: metav1.ObjectMeta{Name: "wave-0", Namespace: "sei"}, + Spec: seiv1alpha1.SeiNodeSpec{ + ChainID: "pacific-1", + Image: "sei:v2.0.0", + FullNode: &seiv1alpha1.FullNodeSpec{}, + }, + Status: seiv1alpha1.SeiNodeStatus{ + CurrentImage: "sei:v1.0.0", + }, + } + cfg := testDeploymentCfg(t, group, node) + + params := AwaitSpecUpdateParams{ + Namespace: "sei", + NodeNames: []string{"wave-0"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeAwaitSpecUpdate("id-2", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + if exec.Status(context.Background()) != ExecutionRunning { + t.Fatalf("expected Running when not converged, got %s", exec.Status(context.Background())) + } +} + +func TestAwaitSpecUpdate_RunningWhenNodeNotFound(t *testing.T) { + group := testDeploymentGroup() + cfg := testDeploymentCfg(t, group) + + params := AwaitSpecUpdateParams{ + Namespace: "sei", + NodeNames: []string{"wave-nonexistent"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeAwaitSpecUpdate("id-2", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + if exec.Status(context.Background()) != ExecutionRunning { + t.Fatalf("expected Running for missing node, got %s", exec.Status(context.Background())) + } +} + +// --- MarkNodesReady --- + +func TestMarkNodesReady_Deserializes(t *testing.T) { + group := testDeploymentGroup() + cfg := testDeploymentCfg(t, group) + + params := MarkNodesReadyParams{ + Namespace: "sei", + NodeNames: []string{"wave-0", "wave-1"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeMarkNodesReady("id-3", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + mnr, ok := exec.(*markNodesReadyExecution) + if !ok { + t.Fatal("expected *markNodesReadyExecution") + } + if len(mnr.params.NodeNames) != 2 { + t.Errorf("nodeNames len = %d, want 2", 
len(mnr.params.NodeNames)) + } + if mnr.params.Namespace != "sei" { + t.Errorf("namespace = %q, want %q", mnr.params.Namespace, "sei") + } +} + +func TestMarkNodesReady_StartsRunning(t *testing.T) { + group := testDeploymentGroup() + cfg := testDeploymentCfg(t, group) + + params := MarkNodesReadyParams{ + Namespace: "sei", + NodeNames: []string{"wave-0"}, + } + raw, _ := json.Marshal(params) + exec, err := deserializeMarkNodesReady("id-3", raw, cfg) + if err != nil { + t.Fatalf("deserialize: %v", err) + } + + if exec.Status(context.Background()) != ExecutionRunning { + t.Fatalf("expected initial status Running, got %s", exec.Status(context.Background())) + } +} diff --git a/internal/task/task.go b/internal/task/task.go index aae1c5b..053cbd3 100644 --- a/internal/task/task.go +++ b/internal/task/task.go @@ -191,6 +191,9 @@ var registry = map[string]taskDeserializer{ TaskTypeTeardownBootstrap: deserializeBootstrapTeardown, // Controller-side deployment tasks + TaskTypeUpdateNodeSpecs: deserializeUpdateNodeSpecs, + TaskTypeAwaitSpecUpdate: deserializeAwaitSpecUpdate, + TaskTypeMarkNodesReady: deserializeMarkNodesReady, TaskTypeCreateEntrantNodes: deserializeCreateEntrantNodes, TaskTypeSubmitHaltSignal: deserializeSubmitHaltSignal, TaskTypeAwaitNodesAtHeight: deserializeAwaitNodesAtHeight, diff --git a/manifests/sei.io_seinodedeployments.yaml b/manifests/sei.io_seinodedeployments.yaml index 4766576..6c64768 100644 --- a/manifests/sei.io_seinodedeployments.yaml +++ b/manifests/sei.io_seinodedeployments.yaml @@ -26,7 +26,7 @@ spec: - jsonPath: .status.phase name: Phase type: string - - jsonPath: .status.deployment.entrantRevision + - jsonPath: .status.rollout.entrantRevision name: Revision priority: 1 type: string @@ -681,9 +681,7 @@ spec: updateStrategy: description: |- UpdateStrategy controls how changes to the template are rolled out - to child SeiNodes. When set, the controller uses blue-green - deployment orchestration instead of in-place updates. 
- When not set, template changes are applied in-place via ensureSeiNode. + to child SeiNodes. Every deployment must declare an explicit strategy. properties: hardFork: description: |- @@ -704,6 +702,7 @@ spec: type: description: Type selects the deployment strategy. enum: + - InPlace - BlueGreen - HardFork type: string @@ -717,6 +716,7 @@ spec: required: - replicas - template + - updateStrategy type: object status: description: SeiNodeDeploymentStatus defines the observed state of a SeiNodeDeployment. @@ -780,28 +780,6 @@ spec: x-kubernetes-list-map-keys: - type x-kubernetes-list-type: map - deployment: - description: |- - Deployment tracks an in-progress deployment. - Nil when no deployment is active. - properties: - entrantNodes: - description: EntrantNodes lists the names of the new SeiNode resources. - items: - type: string - type: array - entrantRevision: - description: EntrantRevision identifies the generation of the - new nodes being deployed. - type: string - incumbentRevision: - description: IncumbentRevision identifies the generation of the - currently live nodes. - type: string - required: - - entrantRevision - - incumbentRevision - type: object genesisHash: description: GenesisHash is the SHA-256 hex digest of the assembled genesis.json. @@ -992,6 +970,85 @@ spec: description: Replicas is the desired number of SeiNodes. format: int32 type: integer + rollout: + description: |- + Rollout tracks an in-progress rollout across all strategy types. + Nil when no rollout is active. + properties: + entrantNodes: + description: |- + EntrantNodes lists the names of the new SeiNode resources being + created. Only populated for BlueGreen and HardFork strategies. + items: + type: string + type: array + entrantRevision: + description: |- + EntrantRevision identifies the generation of the new nodes. + Only populated for BlueGreen and HardFork strategies. 
+ type: string + incumbentNodes: + description: |- + IncumbentNodes lists the names of the currently active SeiNode + resources. Only populated for BlueGreen and HardFork strategies. + items: + type: string + type: array + incumbentRevision: + description: |- + IncumbentRevision identifies the generation of the currently live nodes. + Only populated for BlueGreen and HardFork strategies. + type: string + nodes: + description: Nodes reports per-node rollout state. + items: + description: RolloutNodeStatus tracks a single node's convergence + during a rollout. + properties: + name: + description: Name is the SeiNode resource name. + type: string + phase: + description: Phase is the SeiNode's current phase. + enum: + - Pending + - Initializing + - Running + - Failed + - Terminating + type: string + ready: + description: Ready is true when the node is Running with + a ready pod. + type: boolean + required: + - name + - ready + type: object + type: array + x-kubernetes-list-map-keys: + - name + x-kubernetes-list-type: map + startedAt: + description: StartedAt is when the rollout was first detected. + format: date-time + type: string + strategy: + description: Strategy is the strategy type driving this rollout. + enum: + - InPlace + - BlueGreen + - HardFork + type: string + targetHash: + description: TargetHash is the templateHash being rolled out to. + type: string + required: + - nodes + - startedAt + - strategy + - targetHash + type: object templateHash: description: |- TemplateHash is a hash of the spec fields that require deployment diff --git a/manifests/sei.io_seinodes.yaml b/manifests/sei.io_seinodes.yaml index 7e8784d..0714938 100644 --- a/manifests/sei.io_seinodes.yaml +++ b/manifests/sei.io_seinodes.yaml @@ -558,6 +558,14 @@ spec: x-kubernetes-list-map-keys: - type x-kubernetes-list-type: map + currentImage: + description: |- + CurrentImage is the seid container image observed running on the + owned StatefulSet. 
Updated by the SeiNode controller when the + StatefulSet rollout completes (currentRevision == updateRevision). + Parent controllers compare this against spec.image to determine + whether a spec change has been fully actuated. + type: string externalAddress: description: |- ExternalAddress is the routable P2P address (host:port) for this node,