Skip to content

Add CFP for ordered policy#81

Draft
squeed wants to merge 1 commit intocilium:mainfrom
squeed:ordered-policy
Draft

Add CFP for ordered policy#81
squeed wants to merge 1 commit intocilium:mainfrom
squeed:ordered-policy

Conversation

@squeed
Copy link
Contributor

@squeed squeed commented Nov 5, 2025

This demonstrates a possible implementation of ordered policy.

@squeed
Copy link
Contributor Author

squeed commented Nov 5, 2025

@TheBeeZee, @jrajahalme PTAL

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some clarifications.


### MapState conflict resolution

The key observation is this: **we cannot insert a lower-priority, higher-precedence entry in to the MapState**. Since MapState precedence mirrors datapath precedence, determining this is relatively straightforward: to insert a given key, we scan all ancestor keys, checking if the same identity is selected. If a lower-precedence, higher-priority entry is found, we cannot insert the higher-precedence, lower-priority entry. Conversely, if any descendants of the newly proposed entry are higher-precedence and lower-priority, they must be deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should define what we mean with priority and precedence. In ClusterNetworkPolicy lower numeric value "takes precedence" (is considered before any rules with higher numeric "priority"). So "high priority" rules have lower numeric priority value. This is in agreement with language like "this is priority 1 task" (i.e., the highest priority
task in general language).

Furthermore, "higher-precedence" reads as "taking precedence", i.e., being more important. So in written language "higher-priority" has the same meaning as "higher-precedence".

This is hard get right, but we should aim for clarity here by e.g., using the general sense that "higher-priority" == "higher-precedence", and refer to a "numerically low priority value" explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

There were many discussions in the network-policy-sig about this and in the end it was decided that going with "priority" with the associated semantics (lower numeric value is higher priority) is the most intuitive approach (despite the inversion of numeric value and the real meaning of "priority").

It is of course possible to invert the k8s Cluster Network Policy priorities when translating into an internal representation, but it seems like keeping Cilium internals in sync with what the public APIs do is a net-win.

Explicitly defining what "precedence" and "priority" mean in this context would help the readability of the document (and code). Also, if the choice will be for priority to be "numerically higher means higher priority", then I would recommend against using the name "priority", as this will conflict with Cluster Network Policy semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, based on the existing PoC implementation, priority is an unsigned integer value, where a higher number overrides a lower number. This is unrelated to any particular API convention, where, generally, a lower number overrides.

While this inversion is a bit confusing, it does ultimately simplify the low-level core. I'll add some commentary about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified the wording, PTAL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any changes, did you forget to push?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did. Sorry, PTAL.


The key observation is this: **we cannot insert a lower-priority, higher-precedence entry in to the MapState**. Since MapState precedence mirrors datapath precedence, determining this is relatively straightforward: to insert a given key, we scan all ancestor keys, checking if the same identity is selected. If a lower-precedence, higher-priority entry is found, we cannot insert the higher-precedence, lower-priority entry. Conversely, if any descendants of the newly proposed entry are higher-precedence and lower-priority, they must be deleted.

Note: higher numerical priority overrides lower numerical priority.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref above. In my PoC implementation I have adopted the k8c CNP view on the API side, where lower priority values override higher ones, but on the datapath I have expanded the notion that higher numerical values override lower numerical values from ProxyPortPriority when generalizing it to a single Precedence field. The reason for the latter is the clarity in code and comments that comes from the ability to have "higher precedence" mean "numerically greater" and using the greater than operator (>) for comparisons in straightforward manner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a distinction between user API ordering and datapath priority, and made it clear that an ordering API is not part of this CFP.


Incremental updates are aggregated by selectors and then inserted in to the MapState. The MapState is responsible for deconflicting overlapping selectors. This remains the same in the ordered version.

It is safe to insert incremental updates in this manner because identities and selectors are immutable. As a direct consequence, two selectors S1 and S2 might overlap, a new identity I1 can never change state from non-overlapping to overlapping.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S1 and S2 might be non-overlapping to begin with, e.g., by selecting nothing. Then I1 appears and is selected by both S1 and S2, making them overlapping. The key for correct incremental updates is that, for the purposes of mapstate updates, S1 and S2 both select any overlapping identities (e.g, I1). This would naturally be the case if we evaluated the selections from ground up each time, but with our use of selector cache we must make sure that both cached selectors are updated at the same transaction, so that the readers (the mapstate update procedure) observe the new, overlapping identity in both S1 and S2. Historically this was not the case which made the mapstate update much more complicated than it is now.


### MapState conflict resolution

The key observation is this: **we cannot insert a lower-priority, higher-precedence entry in to the MapState**. Since MapState precedence mirrors datapath precedence, determining this is relatively straightforward: to insert a given key, we scan all ancestor keys, checking if the same identity is selected. If a lower-precedence, higher-priority entry is found, we cannot insert the higher-precedence, lower-priority entry. Conversely, if any descendants of the newly proposed entry are higher-precedence and lower-priority, they must be deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

There were many discussions in the network-policy-sig about this and in the end it was decided that going with "priority" with the associated semantics (lower numeric value is higher priority) is the most intuitive approach (despite the inversion of numeric value and the real meaning of "priority").

It is of course possible to invert the k8s Cluster Network Policy priorities when translating into an internal representation, but it seems like keeping Cilium internals in sync with what the public APIs do is a net-win.

Explicitly defining what "precedence" and "priority" mean in this context would help the readability of the document (and code). Also, if the choice will be for priority to be "numerically higher means higher priority", then I would recommend against using the name "priority", as this will conflict with Cluster Network Policy semantics.


If a lower-priority non-wildcard PerSelectorPolicy would override an existing wildcard, higher-priority, lower-precedence selector, adding it is blocked.

If a lower-priority wildcard PerSelectorPolicy would override an existing non-wildcard, higher-priority, lower-precedence selector, then it must be inserted, but this will have to be separately resolved via the datapath. Specifically, the datapath must look up both wildcarded and non-wildcarded entries, and select the one with the highest priority, using precedence length as a tiebreaker.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be my lack of understanding of the Cilium datapath, but is this currently handled or is this CFP proposing to implement it?

Also, how about non-wildcard overlaps, for example with CIDR rules? Those are already handled by LPF matching in the dataplane? For example, low priority deny for 10.0.0.0/8 and high priority allow for 10.10.10.0/24?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cilium datapath does this currently for deny and proxy port precedence, the proposal is to expand this to a more generic notion of "precedence".

Cilium policy engine operates on "security identities", not on CIDRs directly. Each unique CIDR (or a group of them via CIliumCIDRGroup) will get mapped to a unique security identity, in the datapath before policy enforcement. In your example 10.0.0.0/8 would have a different security identity than 10.10.10.0/24, and if a packet would be covered by both of them, the ipcache will assign the security identity of 10.10.10.0/24 to the packet for policy enforcement.

On the policy computation side, the selector for 10.0.0.0/8 will select both of these security identities since it covers also the 10.10.10.0/24 CIDR, so the low priority deny entry is inserted for both of these identities. For the identity of 10.10.10.0/24 that low priority deny entry would be overridden by the high priority allow entry, so the bpf datapath policy map will only have the high priority allow entry for the 10.10.10.0/24 CIDR identity.

This also expands to port (range) policies due to the LPF policy map.

1. `(priority 2, id 5, tcp, port *)`
2. `(priority 1, id * tcp, port 80)`

Rule 2 has a longer match. For ID 5, rule 2 overrides rule 1, which is an error. However, rule 2's wildcard selector selects more identities, so it cannot be elided. Rather, at policy resolution time, the numeric priorities will be compared.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an error? Again, might be my lack of understanding of the current semantics. Seems like something a user might configure. Also, this says "at policy resolution time, the numeric priorities will be compared". Is this "policy resolution time" the control plane or dataplane?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"policy resolution time" here is at the dataplane (cilium bpf datapath). Maybe it would be clearer to write "policy enforcement time" instead? "policy resolution" could be understood to mean "policy map computation time".

I think the "is an error" here should be maybe written as "would be an error if the numeric identities were not compared at the policy enforcement time" etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified a bit.

@squeed squeed requested a review from joestringer November 7, 2025 21:00
This demonstrates a possible implementation of ordered policy.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
title: "Example 1d: valid"
---
graph TD
rule1[prio 1, id *, tcp, 0-1023]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the priorities are swapped here, rule1 should be prio2 and rule2 should be prio1, no?

@joestringer
Copy link
Member

Was this synced to the latest design? Are you still looking for feedback?

@joestringer joestringer marked this pull request as draft March 4, 2026 21:07
@joestringer
Copy link
Member

Moved to draft due to lack of activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants