Skip to content

[caclmgrd] Skip empty ACL tables#370

Merged
liat-grozovik merged 2 commits into
sonic-net:masterfrom
nazariig:master-caclmgrd-fix
May 6, 2026
Merged

[caclmgrd] Skip empty ACL tables#370
liat-grozovik merged 2 commits into
sonic-net:masterfrom
nazariig:master-caclmgrd-fix

Conversation

@nazariig
Copy link
Copy Markdown
Contributor

@nazariig nazariig commented Apr 1, 2026

This PR propagates a fix for timing issue caused by caclmgrd processing of ACL tables during Config DB record removal operation.

Evidence from code

  • What caclmgrd does
self._tables_db_info = config_db_connector.get_table(self.ACL_TABLE)
...
if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
  • ConfigDBConnector.get_table() is implemented as
const auto& keys = client.keys(pattern);
for (auto& key: keys)
{
    auto const& entry = client.hgetall<map<string, string>>(key);

This is a two-step read:

  1. list keys (KEYS ACL_TABLE|*)
  2. per key HGETALL

If a key is deleted/changed between step 1 and step 2, hgetall can return {}.
Then table_data exists as an empty map in _tables_db_info, and table_data["type"] throws KeyError.

So missing type does NOT require field-only deletion; plain key deletion racing with get_table() is enough.

Also field-only deletion is possible in the stack:

client.hmset(_hash, data.begin(), data.end());
...
if (!found)
{
    client.hdel(_hash, k);
}

set_entry() can remove extra fields with HDEL while keeping the record key alive.

Evidence from pytest

The log shows rapid table deletes:

sudo config acl remove table DATA_INGRESS_L3TEST
...
sudo config acl remove table DATA_EGRESS_L3V6TEST

That high churn exactly matches a race window for the non-atomic get_table() read in caclmgrd.

So missing type can happen via:

  • field-level HDEL updates
  • key-level delete race during KEYS + HGETALL snapshot collection

SYSLOG:

syslog:2026 Mar 27 03:57:03.486679 sonic INFO start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:08.378308 sonic ERR caclmgrd: Exception occured at Thread-2 (check_and_update_control_plane_acls) thread for namespace '' due to KeyErro
r('type')
syslog:2026 Mar 27 03:57:08.874919 sonic ERR caclmgrd: Main thread detected exception in child thread for namespace '': KeyError('type')
syslog:2026 Mar 27 03:57:08.875092 sonic ERR caclmgrd: Full traceback from child thread:
syslog:2026 Mar 27 03:57:08.875239 sonic ERR caclmgrd: Traceback (most recent call last):
syslog:2026 Mar 27 03:57:08.875386 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 970, in check_and_update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875532 sonic ERR caclmgrd:     self.update_control_plane_acls(namespace, new_config_db_connector)
syslog:2026 Mar 27 03:57:08.875668 sonic ERR caclmgrd:     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.875845 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 907, in update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875984 sonic ERR caclmgrd:     iptables_cmds, service_to_source_ip_map  = self.get_acl_rules_and_translate_to_iptables_commands(names
pace, config_db_connector)
syslog:2026 Mar 27 03:57:08.876104 sonic ERR caclmgrd:                                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.876176 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 740, in get_acl_rules_and_translate_to_iptables_commands
syslog:2026 Mar 27 03:57:08.876239 sonic ERR caclmgrd:     if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
syslog:2026 Mar 27 03:57:08.876302 sonic ERR caclmgrd:        ~~~~~~~~~~^^^^^^^^
syslog:2026 Mar 27 03:57:08.876362 sonic ERR caclmgrd: KeyError: 'type'
syslog:2026 Mar 27 03:57:08.876420 sonic ERR caclmgrd: Detect exception in Child thread, generating SIGKILL for main thread
syslog:2026 Mar 27 03:57:21.252620 sonic INFO end-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:22.329663 sonic INFO python3.13[156612]: ansible-extract_log Invoked with directory=/var/log file_prefix=syslog start_string=start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02 target_filename=/tmp/syslog
syslog:2026 Mar 27 03:57:22.329874 sonic DEBUG extract_log for start string LogAnalyzer-test_acl_config.2026-03-27-00:57:02

A picture of a cute animal (not mandatory but encouraged)

      .---.        .-----------
     /     \  __  /    ------
    / /     \(  )/    -----
   //////   ' \/ `   ---
  //// / // :    : ---
 // /   /  /`    '--
//          //..\\
       ====UU====UU====
           '//||\\`
             ''``

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
@nazariig nazariig added bug Something isn't working Request for 202511 Branch labels Apr 1, 2026
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@nazariig
Copy link
Copy Markdown
Contributor Author

@vmittal-msft this is a simple fix. Can you please help to merge?

@liat-grozovik liat-grozovik merged commit 2feaf80 into sonic-net:master May 6, 2026
6 checks passed
@liat-grozovik liat-grozovik changed the title [caclmgrd]: Skip empty ACL tables [caclmgrd] Skip empty ACL tables May 6, 2026
@mssonicbld
Copy link
Copy Markdown

Cherry-pick PR to 202511: #381

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants