Skip to content

Add secure file settings support for stateless Elasticsearch#9309

Merged
barkbay merged 11 commits intomainfrom
stateless/secure-file-settings
Apr 20, 2026
Merged

Add secure file settings support for stateless Elasticsearch#9309
barkbay merged 11 commits intomainfrom
stateless/secure-file-settings

Conversation

@barkbay
Copy link
Copy Markdown
Contributor

@barkbay barkbay commented Apr 3, 2026

Implements secure settings delivery for stateless Elasticsearch via file-based settings (cluster_secrets), replacing the keystore init container used by stateful clusters.

  • Adds ClusterSecrets field to SettingsState and IsStateless flag to Settings
  • Preserves cluster_secrets when the SCP controller rebuilds the file settings Secret
  • Adds BuildSecureSettingsData to deliver keystore entries via cluster_secrets (dotted keys like s3.client.default.access_key are passed through as-is; Elasticsearch's SecureClusterStateSettings accepts flat dotted keys via Settings.fromXContent)
  • Adds ReconcileClusterSecrets to manage cluster_secrets independently with hash-based change detection
  • Introduces a filesettings.Secret abstraction (Load/modify/Save) that unifies the reconciliation paths
  • Moves keystore and file-settings reconciliation from shared code back into the stateful driver (stateless has its own path via ReconcileClusterSecrets)
  • Rejects indexLifecyclePolicies in StackConfigPolicies targeting stateless clusters (ILM is managed server-side on stateless)
  • Excludes ILM policies from empty stateless settings state

Controller conflict handling

The file settings Secret (settings.json) is shared between two controllers, each managing disjoint fields:

┌───────────────────────┐       ┌────────────────────────┐
│    ES Controller      │       │    SCP Controller      │
│                       │       │                        │
│ ReconcileCluster-     │       │ reconcileElasticsearch-│
│ Secrets()             │       │ Resources()            │
│                       │       │                        │
│ 1. Load (parse        │       │ 1. Load (parse         │
│    current Secret)    │       │    current Secret)     │
│ 2. SetClusterSecrets  │       │ 2. ApplyPolicy (replace│
│    (patch one field)  │       │    state, preserve     │
│ 3. Save with          │       │    cluster_secrets)    │
│    WithAdditive-      │       │ 3. Save with mutator   │
│    Metadata()         │       │    (set soft owners)   │
│    (never remove      │       │    (managed cleanup    │
│     SCP metadata)     │       │     of stale keys)     │
└──────────┬────────────┘       └──────────┬─────────────┘
           │                               │
           ▼                               ▼
┌─────────────────────────────────────────────────────────┐
│                  File Settings Secret                   │
│                                                         │
│  data:                                                  │
│    settings.json:                                       │
│    ┌───────────────────────────────────────────────┐    │
│    │ {                                             │    │
│    │   "metadata": { "version": "..." },           │    │
│    │   "state": {                                  │    │
│    │     "cluster_settings": { ... },     ◄── SCP  │    │
│    │     "snapshot_repositories": {},     ◄── SCP  │    │
│    │     "security_role_mappings": {},    ◄── SCP  │    │
│    │     "cluster_secrets": {             ◄── ES   │    │
│    │       "string_secrets": {                     │    │
│    │         "s3.client.default.access_key": "..." │    │
│    │       }                                       │    │
│    │     }                                         │    │
│    │   }                                           │    │
│    │ }                                             │    │
│    └───────────────────────────────────────────────┘    │
│                                                         │
│  labels:      ◄── SCP manages (soft-owner refs)         │
│  annotations: ◄── SCP manages (hash, secure-settings)   │
│               ◄── ES merges additively (never removes)  │
│                                                         │
│  Concurrency: retry on Conflict/AlreadyExists           │
│  Each retry re-reads → no stale overwrites              │
└─────────────────────────────────────────────────────────┘

Both the ES controller (cluster_secrets) and the SCP controller (cluster_settings, etc.) write to the same file settings Secret. Each controller manages disjoint fields and preserves the other's work. Kubernetes optimistic concurrency (resourceVersion) handles races — the loser retries on the next reconciliation loop.

Concern ES Controller SCP Controller
Writes cluster_secrets only All other settings fields
Reads Parses current settings via filesettings.Load Parses current settings via filesettings.Load
Modification SetClusterSecrets (patches one field) ApplyPolicy (replaces state, preserves cluster_secrets)
Save mode WithAdditiveMetadata (never remove SCP labels/annotations) Default (managed cleanup removes stale SCP keys)
Retry retry.OnError on Conflict / AlreadyExists retry.OnError on Conflict / AlreadyExists

Required for #9204
Depends on #9290

🤖 Generated with Claude Code

@botelastic botelastic Bot added the triage label Apr 3, 2026
@barkbay barkbay added >feature Adds or discusses adding a feature to the product exclude-from-release-notes Exclude this PR from appearing in the release notes and removed triage labels Apr 3, 2026
@prodsecmachine
Copy link
Copy Markdown
Collaborator

prodsecmachine commented Apr 3, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@barkbay barkbay force-pushed the stateless/secure-file-settings branch 3 times, most recently from 8f549b5 to 17817f9 Compare April 3, 2026 15:22
@barkbay barkbay requested a review from Copilot April 3, 2026 15:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for delivering secure settings to stateless Elasticsearch via file-based settings (cluster_secrets) instead of the keystore init container used by stateful clusters, while ensuring the StackConfigPolicy (SCP) controller and ES controller can safely co-manage the same file-settings Secret.

Changes:

  • Extend file-based settings state with cluster_secrets and add stateless-aware empty settings behavior (exclude ILM by default).
  • Add secure-settings expansion (BuildSecureSettingsData) to convert dotted keys into nested cluster_secrets maps.
  • Introduce ReconcileClusterSecrets to manage cluster_secrets independently with hash-based change detection and preservation during SCP rebuild/reset flows.
  • Split reconciliation responsibility: keystore stays in the stateful driver; secure settings move to the stateless driver.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/controller/stackconfigpolicy/controller.go Passes stateless flag into file-settings secret generation; adjusts reset log message.
pkg/controller/remotecluster/keystore/keystore.go Adds reusable helper to expose remote API key secret as a secure-settings source.
pkg/controller/elasticsearch/filesettings/secret.go Adds stateless-aware secret generation and preserves cluster_secrets during SCP rebuilds.
pkg/controller/elasticsearch/filesettings/secret_test.go Updates tests for new constructor signature; adds stateless preservation test.
pkg/controller/elasticsearch/filesettings/reconciler.go Passes current secret into empty-secret reconcile to preserve cluster_secrets for stateless.
pkg/controller/elasticsearch/filesettings/reconciler_test.go Fixes expectation: no update when secret content is unchanged.
pkg/controller/elasticsearch/filesettings/file_settings.go Adds ClusterSecrets field; adds stateless flag; excludes ILM from empty stateless state.
pkg/controller/elasticsearch/filesettings/file_settings_test.go Updates expectations for new empty-state signature; adds stateless ILM exclusion test.
pkg/controller/elasticsearch/filesettings/cluster_secrets.go New reconciler for cluster_secrets field with hash/version handling.
pkg/controller/elasticsearch/filesettings/cluster_secrets_test.go Tests for create/update/no-op/preserve/clear behaviors of ReconcileClusterSecrets.
pkg/controller/elasticsearch/elasticsearch_controller.go Selects stateless vs stateful driver at runtime via es.IsStateless().
pkg/controller/elasticsearch/driver/stateless/driver.go Implements stateless secure settings delivery via cluster_secrets and secret watches.
pkg/controller/elasticsearch/driver/stateful/driver.go Moves file-settings empty-secret logic + keystore reconciliation into stateful driver.
pkg/controller/elasticsearch/driver/shared/results.go Removes keystore resources from shared reconcile state.
pkg/controller/elasticsearch/driver/shared/reconcile.go Removes keystore + file-settings reconciliation from shared path; exports MaybeReconcileEmptyFileSettingsSecret.
pkg/controller/elasticsearch/driver/shared/reconcile_test.go Renames test to match exported MaybeReconcileEmptyFileSettingsSecret.
pkg/controller/elasticsearch/driver/shared/reconcile_integration_test.go Updates integration expectations now that file-settings secret isn’t created in shared reconcile.
pkg/controller/common/keystore/user_secret.go Adds BuildSecureSettingsData helper to expand dotted keys into nested cluster_secrets structure.
pkg/controller/common/keystore/user_secret_test.go Adds unit tests for BuildSecureSettingsData behavior (flat, dotted, merge, empty).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/controller/elasticsearch/filesettings/secret.go Outdated
Comment thread pkg/controller/elasticsearch/driver/stateless/driver.go
@barkbay barkbay force-pushed the stateless/secure-file-settings branch 2 times, most recently from 0235d8a to 5af6685 Compare April 7, 2026 07:15

if err := filesettings.ReconcileSecret(ctx, r.Client, expectedSecret, &es); err != nil {
expectedVersion = candidateExpectedVersion
return filesettings.ReconcileSecret(ctx, r.Client, expectedSecret, &es)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bug here, ReconcileResource does a fresh Get (so it has the latest resourceVersion), but the expected payload can be computed from older state.

@barkbay barkbay force-pushed the stateless/secure-file-settings branch 4 times, most recently from 6b8812d to dc834b0 Compare April 8, 2026 08:37
@barkbay barkbay marked this pull request as ready for review April 8, 2026 08:38
@barkbay barkbay requested a review from a team as a code owner April 8, 2026 08:38
@barkbay barkbay force-pushed the stateless/secure-file-settings branch from dc834b0 to f265b37 Compare April 8, 2026 10:09
@barkbay barkbay requested a review from Copilot April 8, 2026 10:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/controller/elasticsearch/filesettings/reconciler.go Outdated
Comment thread pkg/controller/elasticsearch/filesettings/secret.go Outdated
Comment thread pkg/controller/elasticsearch/elasticsearch_controller.go
@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 8, 2026

buildkite test this -f p=gke,t=TestStackConfigPolicy*

@barkbay barkbay force-pushed the stateless/secure-file-settings branch from f265b37 to 66bc613 Compare April 8, 2026 12:23
@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 8, 2026

buildkite test this -f p=gke,t=TestStackConfigPolicy*

@barkbay barkbay changed the base branch from stateless/api-implementation to main April 9, 2026 08:30
Stateless Elasticsearch delivers secure settings (object store
credentials, API keys) via cluster_secrets in file-based settings,
instead of the keystore init container used by stateful clusters.

- Add ClusterSecrets field to SettingsState and IsStateless to Settings
- Preserve cluster_secrets when SCP controller rebuilds the Secret
- Add BuildSecureSettingsData to expand dotted keys into nested maps
- Add ReconcileClusterSecrets to manage cluster_secrets independently
- Wire buildClusterSecrets into shared reconciliation for stateless
- Guard stateful keystore reconciliation with !IsStateless()
- Exclude ILM policies from empty stateless settings state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
barkbay and others added 2 commits April 9, 2026 13:11
…fixes

- Use MergeFrom patch with retry for soft owner removal (metadata-only,
  avoids sending full Data payload and handles conflicts)
- Handle AlreadyExists in MaybeReconcileEmptyFileSettingsSecret (the goal
  is to ensure the Secret exists, not to own its creation)
- Fix applyExpectedSecret comment: Data is replaced, not merged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/controller/stackconfigpolicy/controller.go
Comment thread pkg/controller/stackconfigpolicy/controller.go Outdated
Comment thread pkg/controller/elasticsearch/filesettings/file_settings_secret.go
…gs key

- Reset() now delegates to ApplyPolicy(empty) which preserves
  cluster_secrets for stateless clusters instead of wiping them
- Mark missing SettingsSecretKey as corrupted to prevent destructive
  overwrites from SetClusterSecrets
- Remove unreachable IsNotFound check after Load (Load never returns
  NotFound)
- Add tests for Reset behavior and missing settings key

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@barkbay barkbay force-pushed the stateless/secure-file-settings branch from d7980b8 to 2a55aa4 Compare April 9, 2026 11:44
@barkbay barkbay requested a review from Copilot April 9, 2026 11:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +691 to +711
if remainingOwners == 0 {
log.Info("Resetting file settings Secret to empty, last policy owner removed",
"es_namespace", namespacedName.Namespace, "es_name", namespacedName.Name,
"owner_namespace", softOwner.Namespace, "owner_name", softOwner.Name)

esMeta := metadata.Propagate(&es, metadata.Metadata{Labels: eslabel.NewLabels(namespacedName)})
fs, err := filesettings.Load(ctx, c, namespacedName, es.IsStateless(), esMeta)
if err != nil {
return err
}
if err := fs.Reset().Save(ctx, c, &es); err != nil && !apierrors.IsNotFound(err) {
return err
}
return nil
}

// Metadata-only patch: persist the soft owner removal without sending Secret data.
log.Info("Removing policy soft owner from file settings Secret",
"es_namespace", namespacedName.Namespace, "es_name", namespacedName.Name,
"owner_namespace", softOwner.Namespace, "owner_name", softOwner.Name,
"remaining_owners", remainingOwners)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These orphan-reset log lines use log.Info(...), while the rest of this controller uses log.V(1).Info(...) for routine reconciliation messages. This will increase log volume during normal StackConfigPolicy churn (policy add/remove) and may be noisy in production.

Consider downgrading these to V(1) (or matching whatever log level is used for similar reconciliation-path events in this controller).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will increase log volume during normal StackConfigPolicy churn (policy add/remove) and may be noisy in production.

I don't think SCPs are added or removed frequently enough to generate much noise. I thought this kind of log was useful?

@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 9, 2026

buildkite test this -f p=gke,t=TestStackConfigPolicy*

Copy link
Copy Markdown
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean abstraction for the shared file-settings Secret. The Load → Mutate → Save pattern with WithAdditiveMetadata for the ES controller vs managed cleanup for SCP is well-designed and the concurrent access model is sound.

Two inline comments — the ILM guard for stateless and the unnecessary dotted-key expansion in BuildSecureSettingsData. Neither is blocking.

Comment thread pkg/controller/elasticsearch/filesettings/file_settings.go
Comment thread pkg/controller/common/keystore/user_secret.go Outdated
…-settings

# Conflicts:
#	pkg/controller/elasticsearch/driver/shared/reconcile.go
@barkbay barkbay force-pushed the stateless/secure-file-settings branch from ee4fc74 to 4e47ac9 Compare April 20, 2026 06:58
Elasticsearch's SecureClusterStateSettings parses the inner map via
Settings.fromXContent which accepts both flat dotted keys and nested
objects. Expanding to nested maps added complexity and a failure mode
when keys conflict at intermediate paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 20, 2026

Merge with #9287 is not trivial, still working on it...

…river

Stateless clusters don't use the init container keystore (they deliver
secure settings via cluster_secrets in file-based settings) and manage
the file-settings Secret themselves via filesettings.ReconcileClusterSecrets.
Running either in ReconcileSharedResources is useless work for stateless
and, for file-settings, conflicts with SCP-deferral logic.

Restore the original per-driver approach: the stateful driver explicitly
calls MaybeReconcileEmptyFileSettingsSecret (version-gated) and its own
reconcileKeystore (with managed keystore password and remote API keys).
Both are stateful-only now. The stateless driver is untouched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@barkbay barkbay force-pushed the stateless/secure-file-settings branch from dcddadd to 7100bef Compare April 20, 2026 08:16
ILM is managed server-side on stateless clusters; applying ILM via file
settings has no effect and is confusing. When a StackConfigPolicy
targeting a stateless cluster includes indexLifecyclePolicies, mark the
ES resource status as Error and emit a validation event instead of
silently writing ignored settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 20, 2026

buildkite test this -f p=gke,E2E_TAGS=es,t=TestFIPS -m s=9.4.0-SNAPSHOT

[build : ✅ ]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/controller/common/keystore/user_secret.go
Comment thread pkg/controller/stackconfigpolicy/controller.go
Comment thread pkg/controller/stackconfigpolicy/controller.go
@barkbay
Copy link
Copy Markdown
Contributor Author

barkbay commented Apr 20, 2026

buildkite test this -f p=gke,t=TestStackConfigPolicy*

[build: ✅ ]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/controller/elasticsearch/filesettings/cluster_secrets.go
@barkbay barkbay merged commit 1679aab into main Apr 20, 2026
17 checks passed
@barkbay barkbay deleted the stateless/secure-file-settings branch April 20, 2026 11:54
@barkbay barkbay mentioned this pull request Apr 21, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

exclude-from-release-notes Exclude this PR from appearing in the release notes >feature Adds or discusses adding a feature to the product stateless

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants