[monitor-link-group] Add sonic-mgmt tests for Monitor Link Group feature #24555
Open
srodd-nexthop wants to merge 1 commit into
Adds tests/monitor-link-group/ covering the Monitor Link Group (MLG) feature added in:
- HLD: sonic-net/SONiC#2308
- swss: sonic-net/sonic-swss#4523
- YANG: sonic-net/sonic-buildimage#27004
- swss-common: sonic-net/sonic-swss-common#1181
- utilities: sonic-net/sonic-utilities#4497

Registers the suite under t0 and t1-lag in .azure-pipelines/pr_test_scripts.yaml.

Coverage:
- HLD scenarios 01, 04, 06, 07, 08, 14, 15
- Corner cases: chained groups, link-up-delay PENDING/flap/zero, min-monitored boundaries, config rollback
- Runtime config-change paths (add/remove monitored, add managed to DOWN group, raise min-monitored, description-only update)
- link-up-delay edge cases (reduce past elapsed, increase while pending, delete during pending)
- Group lifecycle (delete UP, delete-and-recreate-same-name)
- Boundary configs (min-monitored=0, no managed-link)
- PortChannel coverage (as monitored, as managed)
- Multi-group / multi-role fan-out (three roles, 8-group apply)
- YANG validation negatives (same intf as monitored+managed, non-Ethernet member)
- Resilience (swss restart, config save+reload; marked skip, disruptive)
- Cycle detection (R-6): reject cyclic groups, accept after delete
- PR-A transition tracking: last_state_change_*, pending_start_time, total_transitions counter
- Stress / timing: rapid monitored-link flap convergence, concurrent shared pending
- CLI / observability: show monitor-link-group output, PR-B transition lines, PR-C error-down (mlg) admin column

Helpers in monitor_link_helpers.py centralize CONFIG_DB shape construction, state-DB polling, oper-state waits, group/member verification, and YANG-aware apply paths.

conftest.py provides the interface pool fixture (mlg) that allocates real DUT ports / PortChannels, applies and rolls back CONFIG_DB mutations per-test, and skips on platforms without enough usable interfaces.

Signed-off-by: satishkumar <srodd@nexthop.ai>
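The state-DB polling mentioned above is the kind of helper that is easy to sketch in isolation. The following is only an illustration and is not taken from monitor_link_helpers.py: the function name `wait_for_field`, the `read_entry` callback, and the `state` field name used in the usage note are all hypothetical. It polls a reader until one field has the expected value or a timeout expires.

```python
import time
from typing import Callable, Dict


def wait_for_field(read_entry: Callable[[], Dict[str, str]],
                   field: str,
                   expected: str,
                   timeout: float = 5.0,
                   interval: float = 0.1) -> bool:
    """Poll read_entry() until entry[field] == expected or the timeout expires.

    read_entry is a hypothetical callback that returns the current STATE_DB
    entry for one group as a dict (empty dict if the entry does not exist).
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first so that an already-correct entry returns immediately.
        if read_entry().get(field) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test could call it as `wait_for_field(reader, "state", "UP", timeout=30)` and assert on the boolean result; a negative test can use the same helper with a short timeout and assert that it returns False.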
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
srodd-nexthop added a commit to nexthop-ai/SONiC that referenced this pull request May 12, 2026
…ing, cycle detection, and show CLI
Major updates to reflect the in-flight implementation:
R-4: rename uplinks/downlinks to monitored-links/managed-links throughout
the document (definitions, schema tables, JSON examples, state machine,
sequence diagrams, ASCII topology, requirements list, restrictions,
and the YANG block).
R-6: add a 'Dependency-cycle rejection' subsection under multi-group/cross-role
support. Describes the directed dependency graph the daemon builds at SET
time, the strongly-connected-component check, and the observable signal
(no STATE_DB entry plus SWSS_LOG_ERROR) for cycle-forming configurations.
R-7: drop the empty-string defaults on the monitored-links and managed-links
leaf-lists. Updated YANG block reflects the cleaner schema.
R-10: add the second YANG 'must' constraint bounding min-monitored-links by
count(monitored-links). Restrictions section updated accordingly.
PR-A: extend the STATE_DB schema table with last_state_change_{from,to,time},
pending_start_time (set on entry to PENDING, cleared on entry to UP),
and total_transitions. All transition-tracking fields are optional so
legacy consumers ignore them safely.
PR-B: show CLI sample now renders 'Last change:', 'Transitions:', and
'(elapsed: Xs, remaining: Ys)' for PENDING groups. Documented field
semantics and the OVERDUE fallback when the timer overshoots.
PR-C: new paragraph documenting 'error-down (mlg)' rendering in
'show interface status' and 'show interface description' for
MLG-held managed interfaces, plus the per-source tag convention.
Section 11 Testing Requirements rewritten as a structured plan (unit tests +
system tests + negative tests) referencing the parallel sonic-mgmt PR
sonic-net/sonic-mgmt#24555, using Step/Goal/Expected-results tables aligned
with the Overlay-ECMP HLD format.
Revision bumped to 0.3.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
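The dependency-cycle rejection described under R-6 can be illustrated with a small sketch. This is not the daemon's code. It assumes that group A depends on group B whenever one of A's managed links is also one of B's monitored links, and it uses a plain depth-first search for back edges instead of a strongly-connected-component decomposition (for a yes/no answer the two agree). All names here are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical in-memory shape: {"monitored": {...}, "managed": {...}}
Group = Dict[str, Set[str]]


def build_edges(groups: Dict[str, Group]) -> Dict[str, List[str]]:
    """Add edge A -> B when a managed link of A is a monitored link of B."""
    edges: Dict[str, List[str]] = {name: [] for name in groups}
    for a, group_a in groups.items():
        for b, group_b in groups.items():
            if group_a["managed"] & group_b["monitored"]:
                edges[a].append(b)
    return edges


def has_cycle(groups: Dict[str, Group]) -> bool:
    """Return True if the group dependency graph contains a directed cycle."""
    edges = build_edges(groups)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {name: WHITE for name in groups}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY:  # back edge: nxt is still on the stack
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[name] == WHITE and visit(name) for name in groups)
```

In this model a chained configuration (g1 manages a port that g2 monitors) is accepted, while adding a third group that closes the loop back to g1 is flagged. Removing that group makes the configuration acyclic again, which mirrors the "reject cyclic groups, accept after delete" test case.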
Description of PR
Adds the sonic-mgmt test suite for the Monitor Link Group (MLG) feature in tests/monitor-link-group/, registered for PR CI.
MLG tracks a set of monitored-links (uplinks or PortChannels) and brings managed-link interfaces admin-down via force_down when the count of operationally-up monitored-links falls below a configurable threshold. Use cases: dual-homed servers, and leaf-spine fabrics where downlinks should not forward when upstream connectivity is lost.
Summary
Approach:
- sonic-cfggen -j -w or config apply-patch (no JSON files loaded from images)
- STATE_DB:MONITOR_LINK_GROUP_STATE_TABLE and STATE_DB:MONITOR_LINK_GROUP_MEMBER_TABLE
Coverage:
- last_state_change_*, pending_start_time, total_transitions counter
- show monitor-link-group output, PR-B transition lines, PR-C error-down (mlg) admin column
Related PRs
Type of change
Approach
What is the motivation for this PR?
End-to-end validation for the MLG feature across the state machine (DOWN / PENDING / UP), refcount semantics, persistence, and CLI surface.
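As a reading aid for the DOWN / PENDING / UP state machine named above, here is a minimal model. It is a sketch under assumptions, not the swss implementation: the class `GroupModel`, its method names, and the choices that a zero link-up-delay skips PENDING and that pending_start_time is also cleared on entry to DOWN are illustrative guesses. Only the field names pending_start_time and total_transitions come from the PR text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupModel:
    """Toy model of one monitor link group (hypothetical, for illustration)."""
    min_monitored: int
    link_up_delay: float  # seconds
    state: str = "DOWN"
    pending_start_time: Optional[float] = None
    total_transitions: int = 0

    def _enter(self, new_state: str, now: float) -> None:
        if new_state == self.state:
            return
        self.state = new_state
        self.total_transitions += 1
        # Set on entry to PENDING, cleared on entry to UP; clearing on
        # entry to DOWN as well is an assumption of this sketch.
        self.pending_start_time = now if new_state == "PENDING" else None

    def update(self, up_count: int, now: float) -> str:
        """Feed the current number of oper-up monitored links."""
        if up_count < self.min_monitored:
            self._enter("DOWN", now)
        elif self.state == "DOWN":
            # Assumption: a zero delay goes straight to UP.
            self._enter("UP" if self.link_up_delay == 0 else "PENDING", now)
        elif (self.state == "PENDING"
              and now - self.pending_start_time >= self.link_up_delay):
            self._enter("UP", now)
        return self.state

    @property
    def managed_forced_down(self) -> bool:
        # Assumption: managed links stay held down until the group is UP.
        return self.state != "UP"
```

Driving the model with a fake clock, as the tests do with real link flaps, makes the expected STATE_DB fields easy to reason about: each state change increments total_transitions, and pending_start_time is only populated while the group is waiting out its delay.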
How did you do it?
conftest.py provides an mlg fixture wrapping an interface allocator. Each test allocates the ports it needs, applies a small CONFIG_DB delta via mlg.apply(...), and verifies STATE_DB / oper-state via helpers in monitor_link_helpers.py. Negative tests use apply_config_raw to exercise YANG rejection paths and assert STATE_DB stays empty for the bad group.
How did you verify/test it?
The full suite runs cleanly on a multi-port DUT. Disruptive tests (test_swss_restart_recovers_state, test_config_save_then_reload_persists) are marked pytest.mark.skip because they drop BGP sessions and exceed the post-test environment-check budget; they can be unmarked for manual runs.
Any platform specific information?
Platform-neutral. Tests require enough usable Ethernet ports (and optionally PortChannels) per topology; the fixture skips when too few interfaces are available.
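The skip-on-shortage behaviour described above can be pictured with a small allocator. This is a hypothetical sketch, not the code in conftest.py: the class `InterfacePool`, its methods, and the `NotEnoughInterfaces` exception are invented for illustration. A real fixture would presumably translate the shortage into `pytest.skip(...)` instead of raising.

```python
from typing import List


class NotEnoughInterfaces(Exception):
    """Raised when a test asks for more interfaces than the pool has free."""


class InterfacePool:
    """Hands out interface names to a test and returns them afterwards."""

    def __init__(self, usable: List[str]) -> None:
        self._free = list(usable)
        self._taken: List[str] = []

    def take(self, count: int) -> List[str]:
        if count > len(self._free):
            # In a pytest fixture this is where a skip would be issued.
            raise NotEnoughInterfaces(
                f"need {count}, only {len(self._free)} free")
        picked, self._free = self._free[:count], self._free[count:]
        self._taken.extend(picked)
        return picked

    def release_all(self) -> None:
        # Put allocated interfaces back in front, restoring original order.
        self._free = self._taken + self._free
        self._taken = []
```

Keeping allocation and release in one object makes per-test cleanup a single call in the fixture's teardown, so a failed test does not leak ports into the next one.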