Skip to content

[action] [PR:370] [caclmgrd]: Skip empty ACL tables#381

Merged
mssonicbld merged 1 commit into
sonic-net:202511from
mssonicbld:cherry/202511/370
May 6, 2026
Merged

[action] [PR:370] [caclmgrd]: Skip empty ACL tables#381
mssonicbld merged 1 commit into
sonic-net:202511from
mssonicbld:cherry/202511/370

Conversation

@mssonicbld
Copy link
Copy Markdown

This PR propagates a fix for timing issue caused by caclmgrd processing of ACL tables during Config DB record removal operation.

Evidence from code

  • What caclmgrd does
self._tables_db_info = config_db_connector.get_table(self.ACL_TABLE)
...
if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
  • ConfigDBConnector.get_table() is implemented as
const auto& keys = client.keys(pattern);
for (auto& key: keys)
{
    auto const& entry = client.hgetall<map<string, string>>(key);

This is a two-step read:

  1. list keys (KEYS ACL_TABLE|*)
  2. per key HGETALL

If a key is deleted/changed between step 1 and step 2, hgetall can return {}.
Then table_data exists as an empty map in _tables_db_info, and table_data["type"] throws KeyError.

So missing type does NOT require field-only deletion; plain key deletion racing with get_table() is enough.

Also field-only deletion is possible in the stack:

client.hmset(_hash, data.begin(), data.end());
...
if (!found)
{
    client.hdel(_hash, k);
}

set_entry() can remove extra fields with HDEL while keeping the record key alive.

Evidence from pytest

The log shows rapid table deletes:

sudo config acl remove table DATA_INGRESS_L3TEST
...
sudo config acl remove table DATA_EGRESS_L3V6TEST

That high churn exactly matches a race window for the non-atomic get_table() read in caclmgrd.

So missing type can happen via:

  • field-level HDEL updates
  • key-level delete race during KEYS + HGETALL snapshot collection

SYSLOG:

syslog:2026 Mar 27 03:57:03.486679 sonic INFO start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:08.378308 sonic ERR caclmgrd: Exception occured at Thread-2 (check_and_update_control_plane_acls) thread for namespace '' due to KeyErro
r('type')
syslog:2026 Mar 27 03:57:08.874919 sonic ERR caclmgrd: Main thread detected exception in child thread for namespace '': KeyError('type')
syslog:2026 Mar 27 03:57:08.875092 sonic ERR caclmgrd: Full traceback from child thread:
syslog:2026 Mar 27 03:57:08.875239 sonic ERR caclmgrd: Traceback (most recent call last):
syslog:2026 Mar 27 03:57:08.875386 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 970, in check_and_update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875532 sonic ERR caclmgrd:     self.update_control_plane_acls(namespace, new_config_db_connector)
syslog:2026 Mar 27 03:57:08.875668 sonic ERR caclmgrd:     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.875845 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 907, in update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875984 sonic ERR caclmgrd:     iptables_cmds, service_to_source_ip_map  = self.get_acl_rules_and_translate_to_iptables_commands(names
pace, config_db_connector)
syslog:2026 Mar 27 03:57:08.876104 sonic ERR caclmgrd:                                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.876176 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 740, in get_acl_rules_and_translate_to_iptables_commands
syslog:2026 Mar 27 03:57:08.876239 sonic ERR caclmgrd:     if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
syslog:2026 Mar 27 03:57:08.876302 sonic ERR caclmgrd:        ~~~~~~~~~~^^^^^^^^
syslog:2026 Mar 27 03:57:08.876362 sonic ERR caclmgrd: KeyError: 'type'
syslog:2026 Mar 27 03:57:08.876420 sonic ERR caclmgrd: Detect exception in Child thread, generating SIGKILL for main thread
syslog:2026 Mar 27 03:57:21.252620 sonic INFO end-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:22.329663 sonic INFO python3.13[156612]: ansible-extract_log Invoked with directory=/var/log file_prefix=syslog start_string=start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02 target_filename=/tmp/syslog
syslog:2026 Mar 27 03:57:22.329874 sonic DEBUG extract_log for start string LogAnalyzer-test_acl_config.2026-03-27-00:57:02

A picture of a cute animal (not mandatory but encouraged)

      .---.        .-----------
     /     \  __  /    ------
    / /     \(  )/    -----
   //////   ' \/ `   ---
  //// / // :    : ---
 // /   /  /`    '--
//          //..\\
       ====UU====UU====
           '//||\\`
             ''``

Signed-off-by: Sonic Build Admin sonicbld@microsoft.com

This PR propagates a fix for timing issue caused by `caclmgrd` processing of ACL tables during Config DB record removal operation.

#### Evidence from code

* What `caclmgrd` does
```
self._tables_db_info = config_db_connector.get_table(self.ACL_TABLE)
...
if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
```

* `ConfigDBConnector.get_table()` is implemented as
```
const auto& keys = client.keys(pattern);
for (auto& key: keys)
{
    auto const& entry = client.hgetall<map<string, string>>(key);
```

This is a two-step read:
1. list keys (`KEYS ACL_TABLE|*`)
2. per key `HGETALL`

If a key is deleted/changed between step 1 and step 2, `hgetall` can return `{}`.
Then `table_data` exists as an empty map in `_tables_db_info`, and `table_data["type"]` throws `KeyError`.

So missing `type` does NOT require field-only deletion; plain key deletion racing with `get_table()` is enough.

Also field-only deletion is possible in the stack:
```
client.hmset(_hash, data.begin(), data.end());
...
if (!found)
{
    client.hdel(_hash, k);
}
```

`set_entry()` can remove extra fields with `HDEL` while keeping the record key alive.

#### Evidence from pytest

The log shows rapid table deletes:
```
sudo config acl remove table DATA_INGRESS_L3TEST
...
sudo config acl remove table DATA_EGRESS_L3V6TEST
```

That high churn exactly matches a race window for the non-atomic `get_table()` read in `caclmgrd`.

So missing `type` can happen via:
- field-level `HDEL` updates
- key-level delete race during `KEYS` + `HGETALL` snapshot collection

#### SYSLOG:
```
syslog:2026 Mar 27 03:57:03.486679 sonic INFO start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:08.378308 sonic ERR caclmgrd: Exception occured at Thread-2 (check_and_update_control_plane_acls) thread for namespace '' due to KeyErro
r('type')
syslog:2026 Mar 27 03:57:08.874919 sonic ERR caclmgrd: Main thread detected exception in child thread for namespace '': KeyError('type')
syslog:2026 Mar 27 03:57:08.875092 sonic ERR caclmgrd: Full traceback from child thread:
syslog:2026 Mar 27 03:57:08.875239 sonic ERR caclmgrd: Traceback (most recent call last):
syslog:2026 Mar 27 03:57:08.875386 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 970, in check_and_update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875532 sonic ERR caclmgrd:     self.update_control_plane_acls(namespace, new_config_db_connector)
syslog:2026 Mar 27 03:57:08.875668 sonic ERR caclmgrd:     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.875845 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 907, in update_control_plane_acls
syslog:2026 Mar 27 03:57:08.875984 sonic ERR caclmgrd:     iptables_cmds, service_to_source_ip_map  = self.get_acl_rules_and_translate_to_iptables_commands(names
pace, config_db_connector)
syslog:2026 Mar 27 03:57:08.876104 sonic ERR caclmgrd:                                                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^
syslog:2026 Mar 27 03:57:08.876176 sonic ERR caclmgrd:   File "/usr/local/bin/caclmgrd", line 740, in get_acl_rules_and_translate_to_iptables_commands
syslog:2026 Mar 27 03:57:08.876239 sonic ERR caclmgrd:     if table_data["type"] != self.ACL_TABLE_TYPE_CTRLPLANE:
syslog:2026 Mar 27 03:57:08.876302 sonic ERR caclmgrd:        ~~~~~~~~~~^^^^^^^^
syslog:2026 Mar 27 03:57:08.876362 sonic ERR caclmgrd: KeyError: 'type'
syslog:2026 Mar 27 03:57:08.876420 sonic ERR caclmgrd: Detect exception in Child thread, generating SIGKILL for main thread
syslog:2026 Mar 27 03:57:21.252620 sonic INFO end-LogAnalyzer-test_acl_config.2026-03-27-00:57:02
syslog:2026 Mar 27 03:57:22.329663 sonic INFO python3.13[156612]: ansible-extract_log Invoked with directory=/var/log file_prefix=syslog start_string=start-LogAnalyzer-test_acl_config.2026-03-27-00:57:02 target_filename=/tmp/syslog
syslog:2026 Mar 27 03:57:22.329874 sonic DEBUG extract_log for start string LogAnalyzer-test_acl_config.2026-03-27-00:57:02
```

#### A picture of a cute animal (not mandatory but encouraged)
```
      .---.        .-----------
     /     \  __  /    ------
    / /     \(  )/    -----
   //////   ' \/ `   ---
  //// / // :    : ---
 // /   /  /`    '--
//          //..\\
       ====UU====UU====
           '//||\\`
             ''``
```

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Author

Original PR: #370

@mssonicbld
Copy link
Copy Markdown
Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit 918b502 into sonic-net:202511 May 6, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant