Allow namespace exclusion on cluster-wide watch via label selector#9123

Open
BobVanB wants to merge 12 commits into elastic:main from BobVanB:exclude_on_selector

Conversation

Contributor

@BobVanB BobVanB commented Feb 10, 2026

Problem

Currently, an operator or controller can only:

  • Watch a single namespace (limited scope)
  • Watch all namespaces (no way to exclude specific namespaces)

This makes it difficult to run a cluster-wide watch while ignoring certain namespaces (e.g., kube-system or internal namespaces).

Solution

This change introduces:

  • An option to let one service account watch cluster-wide.
  • The ability to exclude namespaces using a selector (such as label or name).

This provides more flexibility and prevents unnecessary events from non-relevant namespaces.

Implementation

  • Added: support for namespace exclusion via selector.
  • Updated documentation to show new configuration options.
  • Added unit tests for namespace filtering.

Example

Start the operator with a namespace exclusion selector:

manager --namespace-label-selector='environment!=internal'

Namespaces with the label environment=internal will be ignored by the operator.
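The semantics of that selector can be illustrated with a small stand-alone sketch. Note that `matchesExclusion` is a hypothetical helper written here only to show how a `key!=value` requirement evaluates; the operator itself would rely on the Kubernetes label-selector machinery, not this code:

```go
package main

import "fmt"

// matchesExclusion reports whether a namespace with the given labels
// satisfies the selector environment!=internal, i.e. whether the
// operator would still manage it. Per Kubernetes selector semantics,
// a != requirement also matches objects that lack the key entirely.
func matchesExclusion(labels map[string]string) bool {
	return labels["environment"] != "internal"
}

func main() {
	fmt.Println(matchesExclusion(map[string]string{"environment": "internal"})) // excluded
	fmt.Println(matchesExclusion(map[string]string{"environment": "prod"}))     // managed
	fmt.Println(matchesExclusion(nil))                                          // unlabeled namespaces are managed too
}
```

One consequence worth noting: because `!=` matches namespaces that have no `environment` label at all, newly created, unlabeled namespaces are watched by default.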

Benefits

  • Less noise in event streams.
  • Better performance by ignoring irrelevant namespaces.
  • Easier configuration for cluster-wide operators.

Testing

  • Deploy the operator with cluster-wide permissions.
  • Configure namespace exclusion using a selector.
  • Verify that resources in excluded namespaces are not processed.

Breaking Changes?

No, existing configurations continue to work. The new functionality is optional.

@BobVanB BobVanB requested a review from a team as a code owner February 10, 2026 06:49
Contributor Author

BobVanB commented Feb 10, 2026

@pebrc continuation of #8893

Collaborator

prodsecmachine commented Feb 10, 2026

Snyk checks have passed. No issues have been found so far.

Scanner                 Critical  High  Medium  Low  Total
Open Source Security    0         0     0       0    0 issues
Licenses                0         0     0       0    0 issues



github-actions Bot commented Feb 10, 2026

🔍 Preview links for changed docs

@BobVanB BobVanB changed the title from "Exclude on selector" to "Allow namespace exclusion on cluster-wide watch" Feb 10, 2026
@BobVanB BobVanB changed the title from "Allow namespace exclusion on cluster-wide watch" to "Allow namespace exclusion on cluster-wide watch via label selector" Feb 10, 2026
Collaborator

pebrc commented Feb 24, 2026

Thanks for this contribution and for picking up from #8893! The idea of making namespace scoping more dynamic is something we've heard interest in.

However, after reviewing the implementation, I don't think we can move forward with this approach. Let me explain why.

The core problem: filtering at the wrong layer

This implementation filters at the reconciler level — each controller fetches the resource, then fetches the Namespace object, checks labels, and returns early if they don't match. But the operator still watches all namespaces at the informer/cache level. This means:

  • Events from all namespaces are still received, enqueued, and trigger reconciliation work.
  • Each reconciliation now does more work (an extra Namespace GET + label parsing + matching) before deciding to skip.
  • The PR description mentions "less noise in event streams" and "better performance by ignoring irrelevant namespaces" — unfortunately neither is true with this design. The event streams are identical; only the reconciler exits earlier.

For context, the existing --namespaces flag provides genuine filtering: informers are only created for the listed namespaces, so events from other namespaces never even reach the operator. This new flag looks similar from the outside but works fundamentally differently.

Resource lifecycle risks

Reconciler-level filtering creates real operational hazards:

  • Label removal orphans resources: If a namespace loses its matching label, resources in it stop being reconciled mid-lifecycle. No cleanup, no finalizer removal, no status update — they're abandoned in whatever state they happen to be in.
  • Label addition doesn't trigger catch-up: If a namespace gains the label, existing resources aren't reconciled until the next event for that specific resource happens to arrive.
  • No RBAC benefit: The operator still needs cluster-wide permissions to watch all namespaces, so there's no security isolation gain.

Functionally, this ends up being equivalent to a namespace-scoped version of the eck.k8s.elastic.co/managed=false annotation, but more expensive to evaluate and harder to reason about.

What we'd recommend instead

If you need label-based namespace selection today, there are approaches that work with the existing operator:

  1. Resolve labels to names externally: Use a script, init container, or sidecar that lists namespaces matching a label selector and passes the resulting names to --namespaces. This gives real cache-level filtering with label-based selection, at the cost of an operator restart when namespaces change. This is a well-established operational pattern.

  2. Use --namespaces directly: For environments where the set of managed namespaces is relatively stable, the explicit namespace list is the simplest and most efficient option.

If we were to build label-based namespace filtering into the operator itself, there are two approaches worth considering:

  1. Resolve at startup: Resolve the label selector to namespace names at startup and feed them into the existing DefaultNamespaces cache mechanism. Essentially option 1 built into the operator. Simple and uses the proven existing mechanism, but requires an operator restart when namespaces change.

  2. Namespace controller: Add a dedicated controller that watches Namespace objects and reacts to label changes. When a namespace gains or loses a matching label, it would dynamically update the set of watched namespaces and trigger reconciliation of resources in affected namespaces. This is the most complete solution — it handles label changes gracefully and provides real cache-level filtering — but it's significantly more complex and requires careful handling of informer lifecycle (starting/stopping watches dynamically), which isn't well-supported by controller-runtime today.

Code-level notes

In case it's useful context, there are also some implementation issues that would need addressing regardless of the architectural direction:

  • Missed controllers: autoops/controller.go and packageregistry/controller.go still use IsUnmanaged and weren't updated.
  • Error handling: When ShouldManageNamespace errors (e.g., transient API issue), IsUnmanagedOrFiltered returns true and the reconciler returns nil — silently dropping the resource with no requeue.
  • Performance: LabelSelectorAsSelector is re-computed on every reconciliation rather than cached once at startup.
  • License controller: This controller didn't previously have an IsUnmanaged check — adding both unmanaged support and namespace filtering in one change mixes concerns.
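The error-handling point above can be sketched in plain Go. Here `Result`, `reconcile`, and `shouldManage` are simplified stand-ins for controller-runtime's `reconcile.Result`, the controller's Reconcile method, and `ShouldManageNamespace`; the point is only the control flow, not the real signatures:

```go
package main

import (
	"errors"
	"fmt"
)

// Result mirrors the shape of controller-runtime's reconcile.Result.
type Result struct{ Requeue bool }

var errTransient = errors.New("transient API error")

// reconcile shows the fixed control flow: a transient failure of the
// namespace check is returned as an error (so the request is requeued
// with backoff) instead of being treated as "unmanaged".
func reconcile(shouldManage func() (bool, error)) (Result, error) {
	managed, err := shouldManage()
	if err != nil {
		return Result{}, err // propagate: controller-runtime will requeue
	}
	if !managed {
		return Result{}, nil // genuinely filtered out: stop without requeue
	}
	return Result{}, nil // ...real reconciliation work would follow here
}

func main() {
	_, err := reconcile(func() (bool, error) { return false, errTransient })
	fmt.Println(err != nil) // the transient failure surfaces instead of being swallowed
}
```

The key distinction is between "this namespace is filtered" (a final answer, return nil) and "I could not find out" (a transient condition, return the error).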

I hope this context is helpful. We appreciate you investing time in this — the underlying need is real, and I'd encourage exploring the external resolution approach (option 1 above) as a solution that works today without operator changes.


github-actions Bot commented Feb 25, 2026

✅ Vale Linting Results

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

Contributor Author

BobVanB commented Feb 26, 2026

Thanks for the detailed review and for the clear guidance.

You are right about the architectural concern: reconciler-level filtering alone does not reduce informer/event load. We agree that true filtering should happen at cache/watch scope where possible.

On the code-level notes:

  • Missed controllers: autoops and packageregistry were not present in the branch state this work was originally based on, so there was nothing to update at the time. I have since rebased onto main and added the filtering to both controllers.
  • Error handling: fixed. Namespace-filter evaluation errors are now returned so the request is requeued, rather than silently dropping the resource.
  • Performance: fixed. Selector parsing/matcher creation is no longer repeated per reconcile.
  • License controller concern: agreed this mixed concerns; we kept changes minimal and can split that part further if you prefer.

Direction we took after your feedback:

  • We aligned toward the startup-resolution path and cleaned up behavior/logging.
  • We also added a namespace-filter controller path for label change handling, while keeping the change as contained as possible in this PR.
  • We removed the design doc from this PR and adjusted misleading log wording.

Final note: this new code is not running in our production environment yet, so it has not been live-tested in prod.

@BobVanB BobVanB force-pushed the exclude_on_selector branch from 1d09217 to 75ec3f5 Compare February 26, 2026 05:45
@pebrc pebrc added the >enhancement Enhancement of existing functionality label Mar 31, 2026
@botelastic botelastic Bot removed the triage label Mar 31, 2026