Skip to content

[caclmgrd] Fix EXTERNAL_CLIENT ACL: duplicate rules, multi-port, and stale state#378

Open
littlespace wants to merge 1 commit into
sonic-net:masterfrom
littlespace:master
Open

[caclmgrd] Fix EXTERNAL_CLIENT ACL: duplicate rules, multi-port, and stale state#378
littlespace wants to merge 1 commit into
sonic-net:masterfrom
littlespace:master

Conversation

@littlespace
Copy link
Copy Markdown

Description

This PR fixes multiple bugs in caclmgrd affecting Control Plane ACL for the EXTERNAL_CLIENT service.


Fix 1: Duplicate iptables/ip6tables rules

Root cause: The rule-emission loop called iptables_cmds.append(ns_prefix + rule_cmd) unconditionally. When multiple ACL rules produced the same command (e.g. a DEFAULT_RULE iterated over multiple source IPs), identical full commands were appended multiple times.

Fix: Build the full command first and use a per-table shadow set (acl_cmds_seen) for O(1) duplicate detection. iptables_cmds stays a list to preserve rule ordering, which is required for correct iptables rule application.


Fix 2: Only the last L4_DST_PORT rule takes effect

Root cause: The L4_DST_PORT accumulation block overwrote dst_ports on every rule iteration:

dst_ports = [rule_props["L4_DST_PORT"]]

When multiple ACL_RULE entries each carried a distinct L4_DST_PORT, only the last one survived into the iptables command generation loop.

Fix: Change overwrite to append-if-not-present so all configured ports generate iptables rules.


Fix 3: ACL_SERVICES class dict polluted by dst_ports writes

Root cause: dst_ports was written back into the class-level ACL_SERVICES dict on every call. On the next reload, the stale port list from the previous run seeded dst_ports before the new rules were scanned. Deleted ports were never removed until process restart.

Fix: Remove the self.ACL_SERVICES[acl_service]["dst_ports"] = dst_ports writes. dst_ports is now kept strictly local to each call, built fresh from the current Config DB snapshot.


Fix 4: int/str type mismatch in dst_ports

Root cause: L4_DST_PORT_RANGE expansion appended int ports while L4_DST_PORT appended str ports (from Redis). The mixed-type list caused the duplicate check ("8083" not in [8083]) to always be True, silently producing duplicate --dport rules when a port appeared in both a range and an individual rule.

Fix: L4_DST_PORT_RANGE expansion now appends str(port) to stay type-consistent.


Fix 5: acl_cmds_seen reset per service instead of per table

Root cause: The dedup shadow set was initialised inside the for acl_service loop, so it was reset for each service. Duplicate iptables commands generated by two services on the same ACL table were not caught.

Fix: Moved acl_cmds_seen initialisation to per-table scope (outside the for acl_service loop).


Tests

  • Added dedup assertion to existing parameterized test harness
  • Added vector: IPv4 multi-port + src-ip (Fix 2)
  • Added vector: L4_DST_PORT_RANGE + overlapping L4_DST_PORT — no duplicate rules (Fix 4)
  • Added vector: service listed twice on same ACL table — no duplicate rules (Fix 5)
  • Added standalone two-call reload test: port deleted between reloads must not appear in second call's rules (Fix 3)

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

…stale state

Fix duplicate iptables/ip6tables rules for EXTERNAL_CLIENT Control Plane
ACL. The rule-emission loop could append identical full commands
(ns-prefix + rule) multiple times. Fix builds the full command first and
uses a per-table shadow set (acl_cmds_seen) for O(1) duplicate detection.
iptables_cmds stays a list to preserve rule ordering.

Fix single-port limitation for EXTERNAL_CLIENT ACL: previously each rule
overwrote dst_ports with a single-element list, so only the last rule's
port was used. Fix accumulates ports via append-if-not-present so all
configured ports generate iptables rules.

Fix ACL_SERVICES class dict pollution: dst_ports was written back into the
class-level ACL_SERVICES dict on every call, causing stale ports from
deleted rules to persist across reloads until process restart. dst_ports
is now kept strictly local to each call.

Fix int/str type mismatch in dst_ports: L4_DST_PORT_RANGE expansion now
appends str(port) to stay type-consistent with L4_DST_PORT (which is a
string from Redis). The mixed-type list silently broke the duplicate check
producing duplicate --dport rules when a port appeared in both a range
and an individual rule.

Fix acl_cmds_seen scope: the dedup shadow set was reset per service, so
duplicate commands generated by two services on the same ACL table were
not caught. Moved to per-table scope.

Fix ip2me block rules using network address instead of interface IP for
non-/32 non-VLAN interfaces. ip_ntwrk.network_address returned the
network base address (e.g. 10.0.1.0 for 10.0.1.2/24) instead of the
interface's own IP. Fix uses ipaddress.ip_interface(iface_cidr).ip.

Fix empty management IP string passed to iptables -s/-d arguments in
generate_allow_internal_docker_ip_traffic_commands when
get_namespace_mgmt_ip fails. Guard added to return early with a warning
when the management IP is unavailable.

Tests:
- Add dedup assertion to existing parameterized test harness
- Add vector: IPv4 multi-port + src-ip
- Add vector: L4_DST_PORT_RANGE + overlapping L4_DST_PORT
- Add vector: service listed twice on same table
- Add standalone two-call reload test: port deleted between reloads must
  not leak stale rules
- Add vector: non-/32 non-VLAN interfaces block interface IP not network
  address
- Add standalone test: empty management IP returns empty list without crash

Signed-off-by: littlespace <littlespace@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@littlespace
Copy link
Copy Markdown
Author

@saiarcot895 @qiluo-msft @bingwang-ms @xumia @lguohan can you please check this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants