
[monitor-link-group] Add sonic-mgmt tests for Monitor Link Group feature#24555

Open
srodd-nexthop wants to merge 1 commit into sonic-net:master from nexthop-ai:srodd.monitor-link-group

@srodd-nexthop

Description of PR

Adds the sonic-mgmt test suite for the Monitor Link Group (MLG) feature in tests/monitor-link-group/, registered for PR CI.

MLG tracks a set of monitored-links (uplinks or PortChannels) and brings managed-link interfaces admin-down via force_down when the number of operationally-up monitored-links falls below a configurable threshold. Typical use cases are dual-homed servers and leaf-spine fabrics, where downlinks should stop forwarding once upstream connectivity is lost.
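As a rough illustration of the threshold rule (not the swss implementation), the sketch below computes which managed links would be held down across several groups. It assumes a plain dict per group with hypothetical keys `monitored`, `managed` and `min_monitored`, treats an oper status of "up" as the only up value, and reads the "refcount semantics" mentioned in this description as: an interface stays forced down while at least one below-threshold group holds it. link-up-delay and chained propagation (a forced-down managed link going oper-down as another group's monitored link) are deliberately left out.

```python
from collections import Counter
from typing import Any, Dict, List, Set

# Hypothetical group shape: {"monitored": [...], "managed": [...], "min_monitored": int}
Group = Dict[str, Any]


def count_oper_up(monitored: List[str], oper_status: Dict[str, str]) -> int:
    """Number of monitored links whose oper status is "up"."""
    return sum(1 for intf in monitored if oper_status.get(intf) == "up")


def group_below_threshold(group: Group, oper_status: Dict[str, str]) -> bool:
    return count_oper_up(group["monitored"], oper_status) < group["min_monitored"]


def force_down_refcounts(groups: Dict[str, Group], oper_status: Dict[str, str]) -> Counter:
    """Per managed interface, how many below-threshold groups currently hold it down."""
    refs: Counter = Counter()
    for group in groups.values():
        if group_below_threshold(group, oper_status):
            refs.update(set(group["managed"]))
    return refs


def forced_down(groups: Dict[str, Group], oper_status: Dict[str, str]) -> Set[str]:
    # Assumption: any non-zero refcount keeps the interface force_down.
    return set(force_down_refcounts(groups, oper_status))
```

With two groups sharing a managed link, the shared interface is released only when both groups are back at or above their thresholds.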

Summary

Approach:

  • All CONFIG_DB writes go through sonic-cfggen -j -w or config apply-patch (no JSON files loaded from images)
  • Group state and per-member force/allow state verified by polling STATE_DB:MONITOR_LINK_GROUP_STATE_TABLE and STATE_DB:MONITOR_LINK_GROUP_MEMBER_TABLE
  • Per-test fixture allocates real DUT ports and PortChannels from an interface pool, applies/rolls back CONFIG_DB mutations, and skips on platforms without enough usable interfaces
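The first two bullets can be sketched as small helpers. This is a hedged illustration only: the CONFIG_DB table name `MONITOR_LINK_GROUP` and the field names are guesses based on the YANG leaf names quoted in this PR (monitored-links, managed-links, min-monitored-links, link-up-delay), and `build_group_delta`, `delta_json`, `cfggen_write_cmd`, `state_key` and `wait_for` are hypothetical helpers, not functions from monitor_link_helpers.py. The `clock`/`sleep` parameters exist only so the poller can be exercised without a DUT.

```python
import json
import time
from typing import Callable, Dict, List, Optional

# Hypothetical table name; confirm against the merged sonic-buildimage YANG model.
CFG_TABLE = "MONITOR_LINK_GROUP"


def build_group_delta(name: str, monitored: List[str], managed: List[str],
                      min_monitored: int,
                      link_up_delay: Optional[int] = None) -> Dict[str, dict]:
    """CONFIG_DB delta for one group, in the JSON shape sonic-cfggen -j reads."""
    entry = {
        "monitored-links": list(monitored),
        "managed-links": list(managed),
        "min-monitored-links": str(min_monitored),  # CONFIG_DB scalars are strings
    }
    if link_up_delay is not None:
        entry["link-up-delay"] = str(link_up_delay)
    return {CFG_TABLE: {name: entry}}


def delta_json(delta: Dict[str, dict]) -> str:
    """Serialize a delta for copying to the DUT as a temporary file."""
    return json.dumps(delta, indent=2, sort_keys=True)


def cfggen_write_cmd(json_path: str) -> List[str]:
    """Command that writes the file's tables into CONFIG_DB (-w = --write-to-db)."""
    return ["sonic-cfggen", "-j", json_path, "-w"]


def state_key(group: str) -> str:
    # STATE_DB keys use "|" between table name and key.
    return f"MONITOR_LINK_GROUP_STATE_TABLE|{group}"


def wait_for(predicate: Callable[[], bool], timeout: float, interval: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll predicate until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a real test the predicate would read `state_key(name)` from STATE_DB on the DUT and compare the group state field; that reader is left out because its exact shape depends on the suite's helpers.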

Coverage:

| Area | Tests |
|---|---|
| HLD numbered scenarios | 01, 04, 06, 07, 08, 14, 15 |
| Corner cases | chained cross-role groups, link-up-delay PENDING/flap/zero, min-monitored boundaries, config rollback |
| Runtime config-change | add/remove monitored, add managed to DOWN group, raise min-monitored, description-only update |
| link-up-delay edge cases | reduce past elapsed, increase while pending, delete during pending |
| Group lifecycle | delete UP, delete-and-recreate-same-name |
| Boundary configs | min-monitored=0, no managed-link |
| PortChannel coverage | as monitored, as managed |
| Multi-group / multi-role fan-out | three roles per interface, 8-group simultaneous apply |
| YANG negatives | same intf as monitored+managed, non-Ethernet member |
| Resilience | swss restart, config save+reload (skip-marked, disruptive) |
| Cycle detection (R-6) | reject cyclic groups, accept after delete |
| PR-A transition tracking | last_state_change_*, pending_start_time, total_transitions counter |
| Stress / timing | rapid monitored-link flap convergence, concurrent shared pending |
| CLI / observability | show monitor-link-group output, PR-B transition lines, PR-C error-down (mlg) admin column |
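The link-up-delay rows above imply a small DOWN / PENDING / UP state machine. The sketch below is one hedged reading of it, not the swss implementation: it assumes managed links are still held down while PENDING, models a deleted link-up-delay as 0, and clears pending_start_time whenever the group leaves PENDING (the HLD only states that it is cleared on entry to UP). The bookkeeping mirrors the PR-A STATE_DB fields; `GroupState`, `step` and `managed_links_held_down` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DOWN, PENDING, UP = "DOWN", "PENDING", "UP"


@dataclass
class GroupState:
    state: str = DOWN
    pending_start_time: Optional[float] = None
    total_transitions: int = 0
    # (from, to, time), mirroring last_state_change_{from,to,time}
    last_state_change: Optional[Tuple[str, str, float]] = None


def step(gs: GroupState, monitored: List[str], oper: Dict[str, str],
         min_monitored: int, link_up_delay: float, now: float) -> GroupState:
    """Run one evaluation; link_up_delay == 0 models a zero or deleted delay."""
    up_count = sum(1 for intf in monitored if oper.get(intf) == "up")
    if up_count < min_monitored:
        target = DOWN
    elif gs.state == UP or link_up_delay <= 0:
        target = UP
    elif gs.state == PENDING and now - gs.pending_start_time >= link_up_delay:
        # Also covers reducing link_up_delay below the time already elapsed.
        target = UP
    else:
        # Entering PENDING, or still waiting (e.g. after the delay was raised).
        target = PENDING
    if target != gs.state:
        gs.last_state_change = (gs.state, target, now)
        gs.total_transitions += 1
        # Set on entry to PENDING, cleared on leaving it (assumption for PENDING -> DOWN).
        gs.pending_start_time = now if target == PENDING else None
        gs.state = target
    return gs


def managed_links_held_down(gs: GroupState) -> bool:
    # Assumption: managed links stay force_down in both DOWN and PENDING.
    return gs.state != UP
```

A flap during PENDING drops the group to DOWN, and the next recovery starts a fresh pending window with a new pending_start_time.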

Related PRs

Type of change

  • Test case
  • Bug fix
  • Test case enhancement
  • Add new feature
  • Documentation
  • Other (please describe)

Approach

What is the motivation for this PR?

End-to-end validation for the MLG feature across the state machine (DOWN / PENDING / UP), refcount semantics, persistence, and CLI surface.

How did you do it?

conftest.py provides an mlg fixture wrapping an interface allocator. Each test allocates the ports it needs, applies a small CONFIG_DB delta via mlg.apply(...), and verifies STATE_DB / oper-state via helpers in monitor_link_helpers.py. Negative tests use apply_config_raw to exercise YANG rejection paths and assert STATE_DB stays empty for the bad group.
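A minimal sketch of the allocate/apply/rollback pattern described above, assuming nothing about the real conftest.py: `InterfacePool`, `InsufficientInterfaces` and the do/undo callables are hypothetical names. Undo actions live on one stack and are replayed newest first, so later CONFIG_DB mutations are reverted before the ports they used are returned to the pool. A pytest fixture would wrap this, call `pytest.skip()` when `InsufficientInterfaces` is raised, and call `rollback()` in its teardown.

```python
from typing import Callable, Dict, Iterable, List


class InsufficientInterfaces(Exception):
    """The pool cannot satisfy a request; a fixture would turn this into a skip."""


class InterfacePool:
    def __init__(self, ethernet: Iterable[str], portchannels: Iterable[str] = ()):
        self._free: Dict[str, List[str]] = {
            "Ethernet": list(ethernet),
            "PortChannel": list(portchannels),
        }
        self._undo: List[Callable[[], None]] = []

    def allocate(self, kind: str, count: int) -> List[str]:
        free = self._free[kind]
        if len(free) < count:
            raise InsufficientInterfaces(f"need {count} {kind}, only {len(free)} free")
        taken = free[:count]
        del free[:count]

        def give_back() -> None:
            # Put the ports back at the front so the original order is restored.
            free[0:0] = taken

        self._undo.append(give_back)
        return taken

    def apply(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        """Run a CONFIG_DB mutation; its undo is recorded only if it succeeds."""
        do()
        self._undo.append(undo)

    def rollback(self) -> None:
        """Revert every mutation and allocation, newest first."""
        while self._undo:
            self._undo.pop()()
```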

How did you verify/test it?

The full suite runs cleanly on a multi-port DUT. The disruptive tests (test_swss_restart_recovers_state, test_config_save_then_reload_persists) are marked with pytest.mark.skip because they drop BGP sessions and exceed the post-test environment-check time budget; they can be unmarked for manual runs.

Any platform specific information?

Platform-neutral. Tests require enough usable Ethernet ports (and optionally PortChannels) per topology — the fixture skips when insufficient interfaces are available.

Adds tests/monitor-link-group/ covering the Monitor Link Group (MLG)
feature added in:
  - HLD:           sonic-net/SONiC#2308
  - swss:          sonic-net/sonic-swss#4523
  - YANG:          sonic-net/sonic-buildimage#27004
  - swss-common:   sonic-net/sonic-swss-common#1181
  - utilities:     sonic-net/sonic-utilities#4497

Registers the suite under t0 and t1-lag in
.azure-pipelines/pr_test_scripts.yaml.

Coverage:
  - HLD scenarios 01, 04, 06, 07, 08, 14, 15
  - Corner cases: chained groups, link-up-delay PENDING/flap/zero,
    min-monitored boundaries, config rollback
  - Runtime config-change paths (add/remove monitored, add managed to
    DOWN group, raise min-monitored, description-only update)
  - link-up-delay edge cases (reduce past elapsed, increase while
    pending, delete during pending)
  - Group lifecycle (delete UP, delete-and-recreate-same-name)
  - Boundary configs (min-monitored=0, no managed-link)
  - PortChannel coverage (as monitored, as managed)
  - Multi-group / multi-role fan-out (three roles, 8-group apply)
  - YANG validation negatives (same intf as monitored+managed,
    non-Ethernet member)
  - Resilience (swss restart, config save+reload — marked skip,
    disruptive)
  - Cycle detection (R-6): reject cyclic groups, accept after delete
  - PR-A transition tracking: last_state_change_*, pending_start_time,
    total_transitions counter
  - Stress / timing: rapid monitored-link flap convergence,
    concurrent shared pending
  - CLI / observability: show monitor-link-group output, PR-B
    transition lines, PR-C error-down (mlg) admin column
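For the YANG validation negatives listed above, a hypothetical pre-check that mirrors the same constraints can make the expected rejections explicit when writing negative tests. It is not the sonic-buildimage YANG model: the accepted member-name pattern (Ethernet or PortChannel followed by digits) is an assumption chosen so that PortChannels remain valid members, and the min-monitored-links bound follows the R-10 'must' constraint in the HLD commit (min-monitored-links no greater than count(monitored-links)).

```python
import re
from typing import List

# Assumed member-name pattern: physical Ethernet ports and PortChannels only.
_MEMBER_RE = re.compile(r"^(Ethernet|PortChannel)\d+$")


def validate_group(monitored: List[str], managed: List[str], min_monitored: int) -> List[str]:
    """Return human-readable violations; an empty list means the group looks valid."""
    errors: List[str] = []
    overlap = sorted(set(monitored) & set(managed))
    if overlap:
        errors.append(f"interface(s) in both roles: {', '.join(overlap)}")
    bad = sorted(m for m in set(monitored) | set(managed) if not _MEMBER_RE.match(m))
    if bad:
        errors.append(f"unsupported member name(s): {', '.join(bad)}")
    if not 0 <= min_monitored <= len(monitored):
        errors.append(f"min-monitored-links {min_monitored} outside 0..{len(monitored)}")
    return errors
```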

Helpers in monitor_link_helpers.py centralize CONFIG_DB shape
construction, state-DB polling, oper-state waits, group/member
verification, and YANG-aware apply paths.

conftest.py provides the interface pool fixture (mlg) that allocates
real DUT ports / PortChannels, applies and rolls back CONFIG_DB
mutations per-test, and skips on platforms without enough usable
interfaces.

Signed-off-by: satishkumar <srodd@nexthop.ai>
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

srodd-nexthop added a commit to nexthop-ai/SONiC that referenced this pull request May 12, 2026
…ing, cycle detection, and show CLI

Major updates to reflect the in-flight implementation:

  R-4: rename uplinks/downlinks to monitored-links/managed-links throughout
       the document (definitions, schema tables, JSON examples, state machine,
       sequence diagrams, ASCII topology, requirements list, restrictions,
       and the YANG block).

  R-6: add a 'Dependency-cycle rejection' subsection under multi-group/cross-role
       support. Describes the directed dependency graph the daemon builds at SET
       time, the strongly-connected-component check, and the observable signal
       (no STATE_DB entry plus SWSS_LOG_ERROR) for cycle-forming configurations.

  R-7: drop the empty-string defaults on the monitored-links and managed-links
       leaf-lists. Updated YANG block reflects the cleaner schema.

  R-10: add the second YANG 'must' constraint bounding min-monitored-links by
        count(monitored-links). Restrictions section updated accordingly.

  PR-A: extend the STATE_DB schema table with last_state_change_{from,to,time},
        pending_start_time (set on entry to PENDING, cleared on entry to UP),
        and total_transitions. All transition-tracking fields are optional so
        legacy consumers ignore them safely.

  PR-B: show CLI sample now renders 'Last change:', 'Transitions:', and
        '(elapsed: Xs, remaining: Ys)' for PENDING groups. Documented field
        semantics and the OVERDUE fallback when the timer overshoots.

  PR-C: new paragraph documenting 'error-down (mlg)' rendering in
        'show interface status' and 'show interface description' for
        MLG-held managed interfaces, plus the per-source tag convention.
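The R-6 dependency-cycle check can be sketched with Tarjan's strongly-connected-components algorithm. Two assumptions are made here and may not match the daemon: an edge A -> B is drawn when a managed link of group A is also a monitored link of group B (A forcing its managed link down changes B's input), and every group in a multi-node SCC or with a self-edge is treated as cyclic. `build_dependency_graph`, `strongly_connected_components` and `cyclic_groups` are hypothetical names.

```python
from typing import Any, Dict, List, Set


def build_dependency_graph(groups: Dict[str, Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Edge A -> B when some managed link of A is a monitored link of B."""
    graph: Dict[str, Set[str]] = {name: set() for name in groups}
    for a, ga in groups.items():
        managed = set(ga["managed"])
        for b, gb in groups.items():
            if managed & set(gb["monitored"]):
                graph[a].add(b)
    return graph


def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tarjan's algorithm (recursive; fine for a handful of groups)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    sccs: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop it off the stack
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return sccs


def cyclic_groups(groups: Dict[str, Dict[str, Any]]) -> Set[str]:
    graph = build_dependency_graph(groups)
    bad: Set[str] = set()
    for comp in strongly_connected_components(graph):
        if len(comp) > 1 or comp[0] in graph[comp[0]]:
            bad.update(comp)
    return bad
```

Deleting either group of a cyclic pair leaves an acyclic graph, which matches the "accept after delete" case, while a plain chain (A feeds B, B feeds nothing) is never flagged.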

Section 11 Testing Requirements rewritten as a structured plan (unit tests +
system tests + negative tests) referencing the parallel sonic-mgmt PR
sonic-net/sonic-mgmt#24555, using Step/Goal/Expected-results tables aligned
with the Overlay-ECMP HLD format.

Revision bumped to 0.3.

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
srodd-nexthop marked this pull request as ready for review May 12, 2026 12:18