Allow namespace exclusion on cluster-wide watch via label selector#9123

Open
BobVanB wants to merge 12 commits into elastic:main from BobVanB:exclude_on_selector

Conversation

Contributor

@BobVanB BobVanB commented Feb 10, 2026

Problem

Currently, an operator or controller can only:

  • Watch a single namespace (limited scope)
  • Watch all namespaces (no way to exclude specific namespaces)

This makes it difficult to run a cluster-wide watch while ignoring certain namespaces (e.g., kube-system or internal namespaces).

Solution

This change introduces:

  • An option to let one service account watch cluster-wide.
  • The ability to exclude namespaces using a selector (such as label or name).

This provides more flexibility and prevents unnecessary events from non-relevant namespaces.

Implementation

  • Added: support for namespace exclusion via selector.
  • Updated documentation to show new configuration options.
  • Added unit tests for namespace filtering.

Example

Start the operator with a namespace exclusion selector:

manager --namespace-label-selector='environment!=internal'

Namespaces with the label environment=internal will be ignored by the operator.
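The semantics of that selector can be illustrated with a small stand-alone sketch. Note that `matchesExclusion` is a hypothetical helper written here only to show how a `key!=value` requirement evaluates; the operator itself would rely on the Kubernetes label-selector machinery, not this code:

```go
package main

import "fmt"

// matchesExclusion reports whether a namespace with the given labels
// satisfies the selector environment!=internal, i.e. whether the
// operator would still manage it. Per Kubernetes selector semantics,
// a != requirement also matches objects that lack the key entirely.
func matchesExclusion(labels map[string]string) bool {
	return labels["environment"] != "internal"
}

func main() {
	fmt.Println(matchesExclusion(map[string]string{"environment": "internal"})) // excluded
	fmt.Println(matchesExclusion(map[string]string{"environment": "prod"}))     // managed
	fmt.Println(matchesExclusion(nil))                                          // unlabeled namespaces are managed too
}
```

One consequence worth noting: because `!=` matches namespaces that have no `environment` label at all, newly created, unlabeled namespaces are watched by default.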

Benefits

  • Less noise in event streams.
  • Better performance by ignoring irrelevant namespaces.
  • Easier configuration for cluster-wide operators.

Testing

  • Deploy the operator with cluster-wide permissions.
  • Configure namespace exclusion using a selector.
  • Verify that resources in excluded namespaces are not processed.

Breaking Changes?

No, existing configurations continue to work. The new functionality is optional.

@BobVanB BobVanB requested a review from a team as a code owner February 10, 2026 06:49
Contributor Author

BobVanB commented Feb 10, 2026

@pebrc continuation of #8893

Collaborator

prodsecmachine commented Feb 10, 2026

Snyk checks have passed. No issues have been found so far.

Scanner                 Critical  High  Medium  Low  Total
Open Source Security    0         0     0       0    0 issues
Licenses                0         0     0       0    0 issues



github-actions Bot commented Feb 10, 2026

🔍 Preview links for changed docs

@BobVanB BobVanB changed the title from "Exclude on selector" to "Allow namespace exclusion on cluster-wide watch" Feb 10, 2026
@BobVanB BobVanB changed the title from "Allow namespace exclusion on cluster-wide watch" to "Allow namespace exclusion on cluster-wide watch via label selector" Feb 10, 2026
Collaborator

pebrc commented Feb 24, 2026

Thanks for this contribution and for picking up from #8893! The idea of making namespace scoping more dynamic is something we've heard interest in.

However, after reviewing the implementation, I don't think we can move forward with this approach. Let me explain why.

The core problem: filtering at the wrong layer

This implementation filters at the reconciler level — each controller fetches the resource, then fetches the Namespace object, checks labels, and returns early if they don't match. But the operator still watches all namespaces at the informer/cache level. This means:

  • Events from all namespaces are still received, enqueued, and trigger reconciliation work.
  • Each reconciliation now does more work (an extra Namespace GET + label parsing + matching) before deciding to skip.
  • The PR description mentions "less noise in event streams" and "better performance by ignoring irrelevant namespaces" — unfortunately neither is true with this design. The event streams are identical; only the reconciler exits earlier.

For context, the existing --namespaces flag provides genuine filtering: informers are only created for the listed namespaces, so events from other namespaces never even reach the operator. This new flag looks similar from the outside but works fundamentally differently.

Resource lifecycle risks

Reconciler-level filtering creates real operational hazards:

  • Label removal orphans resources: If a namespace loses its matching label, resources in it stop being reconciled mid-lifecycle. No cleanup, no finalizer removal, no status update — they're abandoned in whatever state they happen to be in.
  • Label addition doesn't trigger catch-up: If a namespace gains the label, existing resources aren't reconciled until the next event for that specific resource happens to arrive.
  • No RBAC benefit: The operator still needs cluster-wide permissions to watch all namespaces, so there's no security isolation gain.

Functionally, this ends up being equivalent to a namespace-scoped version of the eck.k8s.elastic.co/managed=false annotation, but more expensive to evaluate and harder to reason about.

What we'd recommend instead

If you need label-based namespace selection today, there are approaches that work with the existing operator:

  1. Resolve labels to names externally: Use a script, init container, or sidecar that lists namespaces matching a label selector and passes the resulting names to --namespaces. This gives real cache-level filtering with label-based selection, at the cost of an operator restart when namespaces change. This is a well-established operational pattern.

  2. Use --namespaces directly: For environments where the set of managed namespaces is relatively stable, the explicit namespace list is the simplest and most efficient option.

If we were to build label-based namespace filtering into the operator itself, there are two approaches worth considering:

  1. Resolve at startup: Resolve the label selector to namespace names at startup and feed them into the existing DefaultNamespaces cache mechanism. Essentially option 1 built into the operator. Simple and uses the proven existing mechanism, but requires an operator restart when namespaces change.

  2. Namespace controller: Add a dedicated controller that watches Namespace objects and reacts to label changes. When a namespace gains or loses a matching label, it would dynamically update the set of watched namespaces and trigger reconciliation of resources in affected namespaces. This is the most complete solution — it handles label changes gracefully and provides real cache-level filtering — but it's significantly more complex and requires careful handling of informer lifecycle (starting/stopping watches dynamically), which isn't well-supported by controller-runtime today.

Code-level notes

In case it's useful context, there are also some implementation issues that would need addressing regardless of the architectural direction:

  • Missed controllers: autoops/controller.go and packageregistry/controller.go still use IsUnmanaged and weren't updated.
  • Error handling: When ShouldManageNamespace errors (e.g., transient API issue), IsUnmanagedOrFiltered returns true and the reconciler returns nil — silently dropping the resource with no requeue.
  • Performance: LabelSelectorAsSelector is re-computed on every reconciliation rather than cached once at startup.
  • License controller: This controller didn't previously have an IsUnmanaged check — adding both unmanaged support and namespace filtering in one change mixes concerns.
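The error-handling point above can be sketched in plain Go. Here `Result`, `reconcile`, and `shouldManage` are simplified stand-ins for controller-runtime's `reconcile.Result`, the controller's Reconcile method, and `ShouldManageNamespace`; the point is only the control flow, not the real signatures:

```go
package main

import (
	"errors"
	"fmt"
)

// Result mirrors the shape of controller-runtime's reconcile.Result.
type Result struct{ Requeue bool }

var errTransient = errors.New("transient API error")

// reconcile shows the fixed control flow: a transient failure of the
// namespace check is returned as an error (so the request is requeued
// with backoff) instead of being treated as "unmanaged".
func reconcile(shouldManage func() (bool, error)) (Result, error) {
	managed, err := shouldManage()
	if err != nil {
		return Result{}, err // propagate: controller-runtime will requeue
	}
	if !managed {
		return Result{}, nil // genuinely filtered out: stop without requeue
	}
	return Result{}, nil // ...real reconciliation work would follow here
}

func main() {
	_, err := reconcile(func() (bool, error) { return false, errTransient })
	fmt.Println(err != nil) // the transient failure surfaces instead of being swallowed
}
```

The key distinction is between "this namespace is filtered" (a final answer, return nil) and "I could not find out" (a transient condition, return the error).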

I hope this context is helpful. We appreciate you investing time in this — the underlying need is real, and I'd encourage exploring the external resolution approach (option 1 above) as a solution that works today without operator changes.


github-actions Bot commented Feb 25, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

Contributor Author

BobVanB commented Feb 26, 2026

Thanks for the detailed review and for the clear guidance.

You are right about the architectural concern: reconciler-level filtering alone does not reduce informer/event load. We agree that true filtering should happen at cache/watch scope where possible.

On the code-level notes:

  • Missed controllers: autoops and packageregistry were not present in the branch state this work was originally based on, so there was nothing to update at the time. I have since rebased onto main and added the filtering to both controllers.
  • Error handling: fixed. Namespace-filter evaluation errors are now returned so the request is requeued, rather than silently dropping the resource.
  • Performance: fixed. Selector parsing/matcher creation is no longer repeated per reconcile.
  • License controller concern: agreed this mixed concerns; we kept changes minimal and can split that part further if you prefer.

Direction we took after your feedback:

  • We aligned toward the startup-resolution path and cleaned up behavior/logging.
  • We also added a namespace-filter controller path for label change handling, while keeping the change as contained as possible in this PR.
  • We removed the design doc from this PR and adjusted misleading log wording.

Final note: this new code is not running in our production environment yet, so it has not been live-tested in prod.

@BobVanB BobVanB force-pushed the exclude_on_selector branch from 1d09217 to 75ec3f5 Compare February 26, 2026 05:45
@pebrc pebrc added the >enhancement Enhancement of existing functionality label Mar 31, 2026
@botelastic botelastic Bot removed the triage label Mar 31, 2026