
[WIP] Ensure fleet outputs block is added when fleet is enabled.#9127

Draft
naemono wants to merge 9 commits into elastic:main from naemono:fix-fleet-agent-non-root-ca-issue

Conversation

@naemono
Contributor

@naemono naemono commented Feb 10, 2026

Resolves: #9112

Summary

Configuring Fleet-Server with ECK has historically been difficult for end users: several edge cases cause the deployment to fail unless the user has read through all of the documentation. In particular, trusting the necessary Certificate Authorities when using self-signed certificates (which ECK does by default) has not been simple, because Agent/Fleet-Server can be configured in several different ways (root, non-root, etc.).

Current State

If Agent/Fleet-Server runs as root, we inject the CA directly into the container's OS-level CA trust store, which allows the CA to be trusted explicitly. When running as non-root, which is required in many secure environments, the following configuration block must be added in Kibana, or Fleet-Server will fail (docs reference):

xpack.fleet.outputs:
- id: eck-fleet-agent-output-elasticsearch
  is_default: true
  name: eck-elasticsearch
  type: elasticsearch
  hosts:
  - "https://elasticsearch-es-http.default.svc:9200"
  ssl:
    certificate_authorities: ["/mnt/elastic-internal/elasticsearch-association/default/elasticsearch/certs/ca.crt"]

In addition, the following field must not exist in the Kibana configuration when this outputs block exists:

xpack.fleet.agents.elasticsearch.hosts

Future State

This change is an attempt to do the following:

  1. Remove the complexity of dealing with an init container modifying the root CA store of the Fleet-Server container.
  2. Automatically inject the xpack.fleet.outputs block when appropriate
  3. Ensure the xpack.fleet.agents.elasticsearch.hosts field is missing when the outputs block exists.

This will allow the following Kibana configuration block to "just work":

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 9.3.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-agent-http.default.svc:8220"]
    xpack.fleet.packages:
... cut ...
    xpack.fleet.agentPolicies:
... cut ...
---
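
The three goals listed under Future State can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Config` type, `defaultOutput`, and `applyFleetOutputs` are hypothetical names, and the flattened dotted-key map only stands in for whatever configuration type ECK really uses.

```go
package main

import "fmt"

// Config is a simplified, hypothetical stand-in for Kibana settings
// keyed by their flattened dotted names.
type Config map[string]any

const (
	outputsKey = "xpack.fleet.outputs"
	esHostsKey = "xpack.fleet.agents.elasticsearch.hosts"
)

// defaultOutput builds an Elasticsearch output shaped like the block
// shown in the PR description.
func defaultOutput(esHost, caPath string) map[string]any {
	return map[string]any{
		"id":         "eck-fleet-agent-output-elasticsearch",
		"is_default": true,
		"name":       "eck-elasticsearch",
		"type":       "elasticsearch",
		"hosts":      []string{esHost},
		"ssl": map[string]any{
			"certificate_authorities": []string{caPath},
		},
	}
}

// applyFleetOutputs returns a copy of cfg in which an outputs block is
// present (injected if missing) and the conflicting hosts key is removed.
func applyFleetOutputs(cfg Config, esHost, caPath string) Config {
	out := Config{}
	for k, v := range cfg {
		out[k] = v
	}
	if _, ok := out[outputsKey]; !ok {
		out[outputsKey] = []map[string]any{defaultOutput(esHost, caPath)}
	}
	delete(out, esHostsKey)
	return out
}

func main() {
	cfg := Config{
		esHostsKey: []string{"https://elasticsearch-es-http.default.svc:9200"},
		"xpack.fleet.agents.fleet_server.hosts": []string{"https://fleet-server-agent-http.default.svc:8220"},
	}
	got := applyFleetOutputs(cfg,
		"https://elasticsearch-es-http.default.svc:9200",
		"/mnt/elastic-internal/elasticsearch-association/default/elasticsearch/certs/ca.crt")
	_, hasHosts := got[esHostsKey]
	_, hasOutputs := got[outputsKey]
	fmt.Println("hosts key present:", hasHosts, "outputs present:", hasOutputs)
}
```

The sketch copies the input rather than mutating it, which keeps the user-supplied configuration available for comparison or logging.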

Testing

Manual

  • Manual testing - in progress
# Kibana spec.config
# 1. xpack.fleet.agents.elasticsearch.hosts is defined
# 2. no xpack.fleet.outputs block exists

xpack.fleet.agentPolicies:
  - id: eck-fleet-server
    is_managed: true
    monitoring_enabled:
      - logs
      - metrics
    name: Fleet Server on ECK policy
    namespace: default
    package_policies:
      - id: fleet_server-1
        name: fleet_server-1
        package:
          name: fleet_server
    unenroll_timeout: 900
  - id: eck-agent
    is_managed: true
    monitoring_enabled:
      - logs
      - metrics
    name: Elastic Agent on ECK policy
    namespace: default
    package_policies:
      - name: system-1
        package:
          name: system
      - name: kubernetes-1
        package:
          name: kubernetes
    unenroll_timeout: 900
xpack.fleet.agents.elasticsearch.hosts:
  - https://elasticsearch-es-http.default.svc:9200
xpack.fleet.agents.fleet_server.hosts:
  - https://fleet-server-agent-http.default.svc:8220
xpack.fleet.packages:
  - name: system
    version: latest
  - name: elastic_agent
    version: latest
  - name: fleet_server
    version: latest
  - name: kubernetes
    version: latest

# Fleet-Server/Agent spec.config
# 1. runAsUser: 0 is not configured

deployment:
  podTemplate:
    metadata: {}
    spec:
      automountServiceAccountToken: true
      containers: null
      serviceAccountName: fleet-server
  replicas: 1
  strategy: {}
elasticsearchRefs:
  - name: elasticsearch
fleetServerEnabled: true
fleetServerRef: {}
http:
  service:
    metadata: {}
    spec: {}
  tls:
    certificate: {}
kibanaRef:
  name: kibana
mode: fleet
policyID: eck-fleet-server
version: 9.3.0

Resulting Kibana configuration

# ❯ kc view-secret -n default kibana-kb-config kibana.yml  | yq '.xpack.fleet'
# 1. xpack.fleet.agents.elasticsearch.hosts is removed
# 2. xpack.fleet.outputs block now exists

agentPolicies:
---cut---
agents:
  fleet_server:
    hosts:
      - https://fleet-server-agent-http.default.svc:8220
outputs:
  - hosts:
      - https://elasticsearch-es-http.default.svc:9200
    id: eck-fleet-agent-output-elasticsearch
    is_default: true
    name: eck-elasticsearch
    ssl:
      certificate_authorities:
        - /mnt/elastic-internal/elasticsearch-association/default/elasticsearch/certs/ca.crt
    type: elasticsearch
packages:
---cut---

E2E

  • e2e testing - ensure an e2e test exists, and run the tests across multiple stack versions to catch edge cases.

Ensure fleet agent ES hosts block is removed when fleet is enabled.
Tests
Cleanup

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@prodsecmachine
Collaborator

prodsecmachine commented Feb 10, 2026

Snyk checks have passed. No issues have been found so far.

Status  Scanner               Critical  High  Medium  Low  Total (0)
        Open Source Security  0         0     0       0    0 issues
        Licenses              0         0     0       0    0 issues


@botelastic botelastic Bot added the triage label Feb 10, 2026
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@pebrc
Collaborator

pebrc commented Feb 19, 2026

Review: Breaking use-case analysis

I looked at this PR with a focus on use cases that could break. The overall direction of removing the init-container CA trust hack and moving to Kibana-side Fleet outputs configuration is sound, but there are several scenarios where the current implementation would cause regressions.

Critical

1. hasFleetConfigured false-positives on Package Registry associations

hasFleetConfigured checks for any child key under xpack.fleet, but packageRegistrySettings sets xpack.fleet.registryUrl when a Package Registry association exists. A Kibana instance with an EPR association but no Fleet usage would trigger fleet output injection because:

  • hasFleetConfigured(cfg) returns true (due to xpack.fleet.registryUrl)
  • esAssocConf.IsConfigured() returns true (Kibana almost always has an ES association)
  • shouldRemoveXPackFleetAgentsES is false (no outputs yet)

This injects a meaningless xpack.fleet.outputs block into non-Fleet Kibana instances.

Suggestion: Check for Fleet-specific keys (xpack.fleet.agents, xpack.fleet.agentPolicies, xpack.fleet.packages) rather than the broad xpack.fleet.*.
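
A narrower check along the lines of this suggestion might look like the sketch below. It is illustrative only: `hasFleetSpecificKeys` is a hypothetical name, and the flattened `map[string]any` stands in for the real config type passed to `hasFleetConfigured`, which is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// fleetKeys are the Fleet-specific settings named in the suggestion.
var fleetKeys = []string{
	"xpack.fleet.agents",
	"xpack.fleet.agentPolicies",
	"xpack.fleet.packages",
}

// hasFleetSpecificKeys reports whether any flattened setting name equals,
// or is nested under, one of the Fleet-specific keys. A setting such as
// xpack.fleet.registryUrl alone does not count.
func hasFleetSpecificKeys(settings map[string]any) bool {
	for name := range settings {
		for _, key := range fleetKeys {
			if name == key || strings.HasPrefix(name, key+".") {
				return true
			}
		}
	}
	return false
}

func main() {
	eprOnly := map[string]any{"xpack.fleet.registryUrl": "https://epr.example"}
	fleet := map[string]any{
		"xpack.fleet.agents.fleet_server.hosts": []string{"https://fleet-server-agent-http.default.svc:8220"},
	}
	fmt.Println(hasFleetSpecificKeys(eprOnly), hasFleetSpecificKeys(fleet))
}
```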

2. Removing root CA trust breaks existing root-running Agent deployments

The trustCAScript/runningAsRoot/runningContainerAsRoot functions are removed entirely. These were the only working mechanism for Fleet agents running as root to trust the ES CA in fleet mode (the code's own comments noted that FLEET_CA / ELASTICSEARCH_CA env vars are not respected by Agent in fleet mode).

The replacement (Kibana-side xpack.fleet.outputs) only works when:

  • The Agent has a kibanaRef so Kibana manages its Fleet configuration
  • The Kibana version supports xpack.fleet.outputs

Use cases that break:

  • Existing root-running agents that relied on system trust store injection. After upgrade, pods restart without the CA trust script and data shipping to ES fails.
  • Agents managed by Fleet without ECK-managed Kibana (external Kibana, or outputs configured via Fleet UI/API).

Suggestion: Gate the removal on a version or keep it as a deprecated fallback with a clear migration timeline.

High

3. CA certificate path assumes matching ES references between Kibana and Agent

defaultFleetOutputsConfig derives the cert path from Kibana's ES association, but this path needs to be valid inside the Fleet Agent container. The Agent mounts its certs based on its own ES association ref. The paths only match when Kibana and Agent reference the same ES cluster with the same name and namespace. They diverge when:

  • Kibana and Agent reference ES in different namespaces
  • The ES ObjectSelector uses secretName instead of name
  • The Agent sends data to a different ES cluster than Kibana
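
To make the divergence concrete, here is a small sketch of the path comparison. It assumes the mount path has the layout `<base>/<namespace>/<name>/certs/ca.crt`, which is inferred only from the single example path in the PR description; `esRef`, `caPath`, and `pathsMatch` are hypothetical names, and the `secretName` case is not modeled.

```go
package main

import (
	"fmt"
	"path"
)

// esRef is a hypothetical, reduced form of an Elasticsearch reference.
type esRef struct {
	Namespace string
	Name      string
}

// caPath derives the CA mount path for a reference.
// ASSUMPTION: the <namespace>/<name> layout is inferred from the example
// path in the PR description, not taken from ECK's code.
func caPath(ref esRef) string {
	return path.Join("/mnt/elastic-internal/elasticsearch-association",
		ref.Namespace, ref.Name, "certs", "ca.crt")
}

// pathsMatch reports whether a path derived from Kibana's reference would
// also be the path derived from the Agent's reference.
func pathsMatch(kibana, agent esRef) bool {
	return caPath(kibana) == caPath(agent)
}

func main() {
	kb := esRef{Namespace: "default", Name: "elasticsearch"}
	sameCluster := esRef{Namespace: "default", Name: "elasticsearch"}
	otherNamespace := esRef{Namespace: "logging", Name: "elasticsearch"}
	fmt.Println(caPath(kb))
	fmt.Println(pathsMatch(kb, sameCluster), pathsMatch(kb, otherNamespace))
}
```

A check of this kind could let the operator skip injection, or emit a warning, when the two references do not resolve to the same path.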

4. No Kibana version check before injecting xpack.fleet.outputs

The rest of config_settings.go is careful about version gating (e.g., filterConfigSettings strips xpack.encryptedSavedObjects for < 7.6.0). This new feature should follow the same pattern.
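
A version gate in that spirit could be sketched as below. This is a self-contained illustration using only the standard library; `parseVersion` and `atLeast` are hypothetical helpers, and the minimum version used in `main` is a placeholder, since the Kibana version that first supports xpack.fleet.outputs is not stated in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a plain "major.minor.patch" string into integers.
// Pre-release suffixes such as "-SNAPSHOT" are rejected in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(v, ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to minimum.
func atLeast(v, minimum string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	// PLACEHOLDER: not the real minimum version for xpack.fleet.outputs.
	const minOutputsVersion = "8.0.0"
	ok, err := atLeast("9.3.0", minOutputsVersion)
	fmt.Println(ok, err)
}
```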

5. Non-ES outputs (logstash, kafka) get the ECK default appended silently

When a user defines only non-ES outputs (e.g., logstash), hasElasticsearchFleetOutput returns false, so ECK appends its default ES output via MergeWith + ucfg.AppendValues. The user ends up with their logstash output plus an ECK-injected elasticsearch output they didn't ask for. There is no opt-out mechanism.
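
The behavior described here can be reproduced with a reduced model. The sketch below does not use ucfg or the PR's types: `output`, `hasElasticsearchOutput`, and `mergeOutputs` are hypothetical stand-ins that only mimic the append-unless-an-ES-output-exists logic.

```go
package main

import "fmt"

// output is a hypothetical, reduced form of a Fleet output entry.
type output struct {
	ID   string
	Type string
}

// hasElasticsearchOutput reports whether any output is of type elasticsearch.
func hasElasticsearchOutput(outputs []output) bool {
	for _, o := range outputs {
		if o.Type == "elasticsearch" {
			return true
		}
	}
	return false
}

// mergeOutputs appends the ECK default unless the user already defined an
// elasticsearch output, mirroring the behavior described in the review.
func mergeOutputs(user []output, eckDefault output) []output {
	if hasElasticsearchOutput(user) {
		return user
	}
	merged := append([]output{}, user...)
	return append(merged, eckDefault)
}

func main() {
	eckDefault := output{ID: "eck-fleet-agent-output-elasticsearch", Type: "elasticsearch"}
	user := []output{{ID: "my-logstash", Type: "logstash"}}
	for _, o := range mergeOutputs(user, eckDefault) {
		fmt.Println(o.ID, o.Type)
	}
}
```

With a logstash-only input, the result contains both the user's output and the ECK default, which is the surprise the review points out.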

Medium

6. Silent removal of xpack.fleet.agents.elasticsearch.hosts

When a user defines their own elasticsearch-type output AND explicitly sets xpack.fleet.agents.elasticsearch.hosts, the PR removes the hosts entry. The Kibana docs say "We recommend not enabling..." — it's a recommendation, not a hard requirement. Silently removing a user-provided key is surprising and hard to debug.

7. Open design question left in production code

The inline comment asks whether hasElasticsearchFleetOutput should instead be len(fleetCfg.Outputs) > 0, signaling the logic isn't settled.

8. No opt-out mechanism

There's no way for a user to signal "don't manage fleet outputs for me" (e.g., via an annotation or sentinel value). Every Kibana instance that triggers hasFleetConfigured gets the injection.
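
One possible shape for such an opt-out is an annotation check, sketched below. The annotation name is purely hypothetical and does not exist in ECK; `shouldManageFleetOutputs` is likewise an illustrative name.

```go
package main

import "fmt"

// HYPOTHETICAL annotation name, used only for illustration.
const manageFleetOutputsAnnotation = "eck.k8s.elastic.co/manage-fleet-outputs"

// shouldManageFleetOutputs returns false only when the annotation is
// explicitly set to "false", so existing resources keep the default behavior.
func shouldManageFleetOutputs(annotations map[string]string) bool {
	return annotations[manageFleetOutputsAnnotation] != "false"
}

func main() {
	fmt.Println(shouldManageFleetOutputs(nil))
	fmt.Println(shouldManageFleetOutputs(map[string]string{
		manageFleetOutputsAnnotation: "false",
	}))
}
```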


Labels

>enhancement Enhancement of existing functionality

Development

Successfully merging this pull request may close these issues.

Fleet Server running as non-root in recent stack versions - certificate issues

3 participants