[action] [PR:368] [featured]: Skip redundant CONFIG_DB writes in sync_feature_scope#382
Merged
Merged
Conversation
#### Why I did it Fix sonic-net#369 Partially address sonic-net/sonic-buildimage#26350 `sync_feature_scope()` unconditionally writes to CONFIG_DB for every feature at startup and on feature state changes at runtime without checking whether the values have actually changed. We should avoid redundant writes that does not update CONFIG_DB. #### How I did it * Added a read-before-write check in `sync_feature_scope()`: read the current `FEATURE|<name>` entry from CONFIG_DB via `get_entry()`, and only call `mod_entry()` when `has_per_asic_scope` or `has_global_scope` actually differs from the intended value. * Guarded the `get_entry()` return with `or {}` to safely handle `None` returns, consistent with the defensive pattern already used in `resync_feature_state()` and `sync_feature_delay_state()` in the same file. * The conditional write applies to both the host CONFIG_DB and per-namespace CONFIG_DB on multi-ASIC platforms. * Added unit tests covering three cases: values unchanged (no write), values changed (write), and missing/empty entry (write). #### How to verify it I used the following script to test the update on a virtual switch in GNS3. We can iterate quickly since the modified Python file is at `/usr/local/bin/featured` and can be updated on the fly. The virtual switch image is downloaded from [official pipeline build on 2026/03/30](https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_build/results?buildId=1074536) on commit [ab081ee](sonic-net/sonic-buildimage@ab081ee). <details> <summary>Bash Script & Setup</summary> I had three files on the switch: ``` admin@sonic:~$ ls new.py old.py test_featured.sh ``` `old.py` is copied from `/usr/local/bin/featured` on the switch. It is almost identical to the [latest previous commit that modified the script](sonic-net@2c5bf36), except due to compilation the first line changed from `#!/usr/bin/env python3` to `#!/usr/bin/python3`. `new.py` is almost identical to the script in PR, except the first line also changed from `#!/usr/bin/env python3` to `#!/usr/bin/python3`. The bash script `test_featured.sh` use either `old.py` or `new.py` to overwrite `/usr/local/bin/featured`, use systemctl to reload featured, then run `config reload` and see how many feature writes there are & how long did it take until the last write. ```bash #!/bin/bash # test_featured.sh - Test featured daemon CONFIG_DB write behavior # Usage: ./test_featured.sh old|new # # Measures how many redundant FEATURE scope writes (has_per_asic_scope, # has_global_scope) the featured daemon makes after config reload, and # how long the CONFIG_DB mutation window stays open. set -e if [ -z "$1" ] || [[ "$1" != "old" && "$1" != "new" ]]; then echo "Usage: $0 old|new" exit 1 fi VERSION="$1" LOG="/tmp/monitor_${VERSION}.log" # Clean up any leftover redis-cli MONITOR processes from previous runs sudo pkill -f "redis-cli.*MONITOR" 2>/dev/null || true sudo pkill -f "docker exec.*redis-cli.*MONITOR" 2>/dev/null || true sleep 1 rm -f "$LOG" # Deploy the selected version of featured echo "==> Installing ${VERSION} version of featured..." sudo cp ~/${VERSION}.py /usr/local/bin/featured sudo systemctl restart featured echo "==> Waiting for featured to settle..." sleep 5 # Start capturing all CONFIG_DB operations echo "==> Starting CONFIG_DB monitor..." redis-cli -n 4 MONITOR < /dev/null > "$LOG" 2>&1 & MONITOR_PID=$! sleep 2 # Run config reload and measure wall-clock time echo "==> Running config reload..." RELOAD_START=$(date +%s.%N) sudo config reload -y -f RELOAD_END=$(date +%s.%N) RELOAD_DURATION=$(python3 -c "print(f'{$RELOAD_END - $RELOAD_START:.2f}')") # Wait for any background daemon writes to finish echo "==> Waiting 90 seconds for background writes to complete..." sleep 90 # Stop monitoring kill $MONITOR_PID 2>/dev/null || true sudo pkill -f "redis-cli.*MONITOR" 2>/dev/null || true sleep 1 # --- Analysis --- # Count: how many times featured wrote scope fields to FEATURE entries SCOPE_WRITE_COUNT=$(grep --text -c 'HSET.*FEATURE.*\(has_per_asic_scope\|has_global_scope\)' "$LOG" || true) # Timestamp of the first CONFIG_DB write (sonic-cfggen bulk load start) FIRST_DB_WRITE=$(grep --text '"HSET"' "$LOG" | head -1 | awk '{print $1}') # Timestamp of the last FEATURE scope field write by featured daemon LAST_SCOPE_WRITE=$(grep --text 'HSET.*FEATURE.*\(has_per_asic_scope\|has_global_scope\)' "$LOG" | tail -1 | awk '{print $1}') # Timestamp of the very last CONFIG_DB write of any kind LAST_ANY_WRITE=$(grep --text '"HSET"' "$LOG" | tail -1 | awk '{print $1}') # Calculate how long after config reload returned the scope writes continued if [ -n "$LAST_SCOPE_WRITE" ] && [ -n "$FIRST_DB_WRITE" ]; then SCOPE_WINDOW=$(python3 -c "print(f'{$LAST_SCOPE_WRITE - $RELOAD_END:.2f}')") SCOPE_DURATION=$(python3 -c "print(f'{$LAST_SCOPE_WRITE - $FIRST_DB_WRITE:.2f}')") else SCOPE_WINDOW="N/A (no scope writes detected)" SCOPE_DURATION="N/A" fi echo "" echo "============================================================" echo " Featured Daemon CONFIG_DB Write Test Results" echo " Version: ${VERSION}" echo "============================================================" echo "" echo " Config reload wall-clock duration: ${RELOAD_DURATION}s" echo "" echo " FEATURE scope writes (has_per_asic_scope / has_global_scope):" echo " Total write count: ${SCOPE_WRITE_COUNT}" echo " First CONFIG_DB write: ${FIRST_DB_WRITE:-N/A}" echo " Last scope write by featured: ${LAST_SCOPE_WRITE:-none}" echo " Last CONFIG_DB write (any): ${LAST_ANY_WRITE:-N/A}" echo "" echo " Seconds from first DB write to last scope write:" echo " ${SCOPE_DURATION}" echo "" echo " Seconds from config reload RETURNING to last scope write:" echo " ${SCOPE_WINDOW}" echo "" echo " Full log: ${LOG}" echo "============================================================" ``` </details> <details> <summary>Full Logs Running the Script on VS</summary> ``` admin@sonic:~$ chmod +x test_featured.sh # Run the old and new version alternatively, test each multiple times # This output is caputred after running `sudo config save-y` and running the script multiple times previously admin@sonic:~$ ./test_featured.sh old && \ sleep 20 && \ ./test_featured.sh new && \ sleep 20 && \ ./test_featured.sh old && \ sleep 20 && \ ./test_featured.sh new ==> Installing old version of featured... ==> Waiting for featured to settle... ==> Starting CONFIG_DB monitor... ==> Running config reload... Acquired lock on /etc/sonic/reload.lock Disabling container and routeCheck monitoring ... Running command: sudo systemctl stop aaastatsd.timer Running command: sudo systemctl stop featured.timer Running command: sudo systemctl stop hostcfgd.timer Running command: sudo systemctl stop tacacs-config.timer Stopping SONiC target ... Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db Running command: /usr/local/bin/db_migrator.py -o migrate Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment Restarting SONiC target ... Enabling container and routeCheck monitoring ... Reloading Monit configuration ... Reinitializing monit daemon Released lock on /etc/sonic/reload.lock ==> Waiting 90 seconds for background writes to complete... ============================================================ Featured Daemon CONFIG_DB Write Test Results Version: old ============================================================ Config reload wall-clock duration: 30.63s FEATURE scope writes (has_per_asic_scope / has_global_scope): Total write count: 84 First CONFIG_DB write: 1774906714.590470 Last scope write by featured: 1774906746.737287 Last CONFIG_DB write (any): 1774906836.590132 Seconds from first DB write to last scope write: 32.15 Seconds from config reload RETURNING to last scope write: -0.30 Full log: /tmp/monitor_old.log ============================================================ ==> Installing new version of featured... ==> Waiting for featured to settle... ==> Starting CONFIG_DB monitor... ==> Running config reload... Acquired lock on /etc/sonic/reload.lock Disabling container and routeCheck monitoring ... Running command: sudo systemctl stop aaastatsd.timer Running command: sudo systemctl stop featured.timer Running command: sudo systemctl stop hostcfgd.timer Running command: sudo systemctl stop tacacs-config.timer Stopping SONiC target ... Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db Running command: /usr/local/bin/db_migrator.py -o migrate Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment Restarting SONiC target ... Enabling container and routeCheck monitoring ... Reloading Monit configuration ... Reinitializing monit daemon Released lock on /etc/sonic/reload.lock ==> Waiting 90 seconds for background writes to complete... ============================================================ Featured Daemon CONFIG_DB Write Test Results Version: new ============================================================ Config reload wall-clock duration: 30.62s FEATURE scope writes (has_per_asic_scope / has_global_scope): Total write count: 38 First CONFIG_DB write: 1774906865.598661 Last scope write by featured: 1774906886.589453 Last CONFIG_DB write (any): 1774906987.320508 Seconds from first DB write to last scope write: 20.99 Seconds from config reload RETURNING to last scope write: -11.33 Full log: /tmp/monitor_new.log ============================================================ ==> Installing old version of featured... ==> Waiting for featured to settle... ==> Starting CONFIG_DB monitor... ==> Running config reload... Acquired lock on /etc/sonic/reload.lock Disabling container and routeCheck monitoring ... Running command: sudo systemctl stop aaastatsd.timer Running command: sudo systemctl stop featured.timer Running command: sudo systemctl stop hostcfgd.timer Running command: sudo systemctl stop tacacs-config.timer Stopping SONiC target ... Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db Running command: /usr/local/bin/db_migrator.py -o migrate Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment Restarting SONiC target ... Enabling container and routeCheck monitoring ... Reloading Monit configuration ... Reinitializing monit daemon Released lock on /etc/sonic/reload.lock ==> Waiting 90 seconds for background writes to complete... ============================================================ Featured Daemon CONFIG_DB Write Test Results Version: old ============================================================ Config reload wall-clock duration: 30.14s FEATURE scope writes (has_per_asic_scope / has_global_scope): Total write count: 84 First CONFIG_DB write: 1774907016.330388 Last scope write by featured: 1774907046.978066 Last CONFIG_DB write (any): 1774907137.935647 Seconds from first DB write to last scope write: 30.65 Seconds from config reload RETURNING to last scope write: -1.36 Full log: /tmp/monitor_old.log ============================================================ ==> Installing new version of featured... ==> Waiting for featured to settle... ==> Starting CONFIG_DB monitor... ==> Running config reload... Acquired lock on /etc/sonic/reload.lock Disabling container and routeCheck monitoring ... Running command: sudo systemctl stop aaastatsd.timer Running command: sudo systemctl stop featured.timer Running command: sudo systemctl stop hostcfgd.timer Running command: sudo systemctl stop tacacs-config.timer Stopping SONiC target ... Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db Running command: /usr/local/bin/db_migrator.py -o migrate Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment Restarting SONiC target ... Enabling container and routeCheck monitoring ... Reloading Monit configuration ... Reinitializing monit daemon Released lock on /etc/sonic/reload.lock ==> Waiting 90 seconds for background writes to complete... ============================================================ Featured Daemon CONFIG_DB Write Test Results Version: new ============================================================ Config reload wall-clock duration: 30.02s FEATURE scope writes (has_per_asic_scope / has_global_scope): Total write count: 38 First CONFIG_DB write: 1774907166.717689 Last scope write by featured: 1774907187.528028 Last CONFIG_DB write (any): 1774907288.738037 Seconds from first DB write to last scope write: 20.81 Seconds from config reload RETURNING to last scope write: -11.11 Full log: /tmp/monitor_new.log ============================================================ ``` </details> TLDR: The number of scope writes dropped from 84 to 38. The time from first CONFIG_DB write to last scope write dropped from ~30–32s to ~21s, so scope synchronization completes about 10s earlier, shortening the potential race-condition window. #### Description for the changelog Skip redundant CONFIG_DB writes in featured daemon's sync_feature_scope to speed up config reload. Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
Author
|
Original PR: #368 |
Author
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why I did it
Fix #369
Partially address sonic-net/sonic-buildimage#26350
sync_feature_scope()unconditionally writes to CONFIG_DB for every feature at startup and on feature state changes at runtime without checking whether the values have actually changed.We should avoid redundant writes that does not update CONFIG_DB.
How I did it
sync_feature_scope(): read the currentFEATURE|<name>entry from CONFIG_DB viaget_entry(), and only callmod_entry()whenhas_per_asic_scopeorhas_global_scopeactually differs from the intended value.get_entry()return withor {}to safely handleNonereturns, consistent with the defensive pattern already used inresync_feature_state()andsync_feature_delay_state()in the same file.How to verify it
I used the following script to test the update on a virtual switch in GNS3. We can iterate quickly since the modified Python file is at
/usr/local/bin/featuredand can be updated on the fly.The virtual switch image is downloaded from official pipeline build on 2026/03/30 on commit ab081ee.
Bash Script & Setup
I had three files on the switch:
old.pyis copied from/usr/local/bin/featuredon the switch. It is almost identical to the latest previous commit that modified the script, except due to compilation the first line changed from#!/usr/bin/env python3to#!/usr/bin/python3.new.pyis almost identical to the script in PR, except the first line also changed from#!/usr/bin/env python3to#!/usr/bin/python3.The bash script
test_featured.shuse eitherold.pyornew.pyto overwrite/usr/local/bin/featured, use systemctl to reload featured, then runconfig reloadand see how many feature writes there are & how long did it take until the last write.Full Logs Running the Script on VS
TLDR: The number of scope writes dropped from 84 to 38. The time from first CONFIG_DB write to last scope write dropped from ~30–32s to ~21s, so scope synchronization completes about 10s earlier, shortening the potential race-condition window.
Description for the changelog
Skip redundant CONFIG_DB writes in featured daemon's sync_feature_scope to speed up config reload.
Signed-off-by: Sonic Build Admin sonicbld@microsoft.com