# [show] Add 'show monitor-link' command #4497

Open
srodd-nexthop wants to merge 6 commits into sonic-net:master from nexthop-ai:srodd.monitor-link-show

srodd-nexthop commented Apr 27, 2026

What I did

Add `show monitor-link` and `show monitor-link <group>` CLI commands for the Monitor Link Group feature. Also include `show monitor-link` output in techsupport dumps.

Files changed:

| File | Change |
| --- | --- |
| show/monitor_link.py | New: CLI implementation |
| show/main.py | Register monitor-link subcommand group |
| utilities_common/cli.py | Helper for table formatting |
| scripts/generate_dump | Add show monitor-link to techsupport collection |
| tests/monitor_link_test.py | New: unit tests |
| tests/mock_tables/state_db.json | State DB fixtures for tests |

Why I did it

Provides operational visibility into monitor link group state for troubleshooting and post-incident debugging. Without this, group state is only visible by querying STATE_DB directly with redis-cli.

Design

`show monitor-link` reads the following STATE_DB tables:

  • MONITOR_LINK_GROUP_STATE — group config and current state (UP / DOWN / PENDING)
  • PORT_TABLE / LAG_TABLE — to derive live operational uplink count
  • MONITOR_LINK_GROUP_MEMBER — to show which groups are forcing each downlink down (down_due_to)

Output columns: Group Name, State, Uplinks Up/Total, Downlinks, Min-Uplinks, Link-Up-Delay, Description.

show monitor-link <group> shows per-interface detail for that group's downlinks, including the down_due_to reason when an interface is forced down by multiple groups.

generate_dump includes show monitor-link output alongside existing port and LAG state in techsupport.
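
A minimal sketch of how the `Uplinks Up/Total` column could be computed, assuming the group's uplink names are already known and that some lookup returns each interface's operational status. The function name `uplinks_up_total` and the `oper_status_of` callable are hypothetical and are not taken from the PR.

```python
def uplinks_up_total(uplinks, oper_status_of):
    """Return the 'up/total' cell for one group.

    `oper_status_of` is a hypothetical callable that maps an interface
    name to "up" or "down" (for example by reading PORT_TABLE / LAG_TABLE).
    """
    up = sum(1 for name in uplinks if oper_status_of(name) == "up")
    return "{}/{}".format(up, len(uplinks))


# Illustrative data only: two physical ports and one PortChannel.
status = {"Ethernet0": "up", "Ethernet4": "down", "PortChannel1": "up"}
cell = uplinks_up_total(["Ethernet0", "Ethernet4", "PortChannel1"],
                        lambda name: status.get(name, "down"))
print(cell)
```

Passing the status lookup in as a callable keeps the counting logic independent of how STATE_DB is queried.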

How I verified it

  • tests/monitor_link_test.py covers: table output across UP, DOWN, and PENDING states; multi-group downlink attribution; mixed Ethernet / PortChannel interfaces; single-group detail view.

Companion PRs

  • sonic-swss-common: STATE_DB table name macros
  • sonic-swss: MonitorLinkGroupMgr implementation
  • sonic-yang-models: YANG model
  • SONiC/SONiC: HLD

HLD: sonic-net/SONiC#2308
Yang Changes: sonic-net/sonic-buildimage#27004
swss: sonic-net/sonic-swss#4523
swss-common: sonic-net/sonic-swss-common#1181

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

srodd-nexthop force-pushed the srodd.monitor-link-show branch from 133bcb3 to a339fd8 on April 28, 2026 05:34
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Add CLI show command for Monitor Link Group feature.
Reads MONITOR_LINK_GROUP_STATE and MONITOR_LINK_GROUP_MEMBER from STATE_DB
and displays group state, interface membership, uplink counts, and
force-down reasons for downlinks.

Adds get_interface_operational_status() helper to utilities_common/cli.py;
probes PORT_TABLE then LAG_TABLE to determine oper state without relying
on interface name prefix.
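
A rough sketch of the probe order described above, written against an in-memory stand-in rather than a real STATE_DB connection. The `FakeStateDb` class, the `oper_status` field name, and the `"unknown"` fallback are assumptions made for illustration; the helper in the PR may use a different signature and different fields.

```python
class FakeStateDb:
    """In-memory stand-in for a STATE_DB connection (illustrative only)."""

    def __init__(self, data):
        self._data = data

    def get_all(self, db_name, key):
        # Return the field dict for `key`, or an empty dict when absent.
        return self._data.get(key, {})


def get_interface_operational_status(db, intf_name):
    """Probe PORT_TABLE, then LAG_TABLE, without inspecting the name prefix.

    The 'oper_status' field name is an assumption of this sketch.
    """
    for table in ("PORT_TABLE", "LAG_TABLE"):
        entry = db.get_all("STATE_DB", "{}|{}".format(table, intf_name))
        if entry and "oper_status" in entry:
            return entry["oper_status"]
    return "unknown"


db = FakeStateDb({
    "PORT_TABLE|Ethernet0": {"oper_status": "up"},
    "LAG_TABLE|PortChannel1": {"oper_status": "down"},
})
print(get_interface_operational_status(db, "Ethernet0"))
print(get_interface_operational_status(db, "PortChannel1"))
```

Probing both tables means an interface with a non-standard name is still resolved, which is the stated reason for not relying on the name prefix.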

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
srodd-nexthop force-pushed the srodd.monitor-link-show branch from a339fd8 to 90f35a7 on April 28, 2026 08:00
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Companion to the swss daemon rename. Consumes the renamed STATE_DB
MONITOR_LINK_GROUP_STATE fields (monitored-links, managed-links) and renames
the parallel CLI labels and variables.

Display changes:
  "Uplinks Up:"       -> "Monitored Up:"
  "Min-uplinks:"      -> "Min-monitored-links:"
  "(N uplinks, M downlinks)" -> "(N monitored, M managed)"

link_type internal values: 'uplink'->'monitored', 'downlink'->'managed'.

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
…onitor-link

Two upstream-port pieces wrapped together because they sit in the same
sonic-utilities branch (srodd.monitor-link-show) and share test surface.

PR-C: `show interface status` -- error-down rendering for MLG-held interfaces

Today both user-shut and MLG-policy-shut interfaces render plain "down" in
the Admin column, so operators chase port/transceiver issues for what is
actually policy enforcement.

New helper _is_held_down_by_monitor_link_group(db, intf) reads STATE_DB.
MONITOR_LINK_GROUP_MEMBER|<intf>.state:
  state == "allow_up"   -> group is healthy; admin_status reflects user intent.
  anything else         -> MLG is holding the port down; return "error-down (mlg)".

Both appl_db_port_status_get (physical ports) and
appl_db_portchannel_status_get (LAGs) consult the helper when status_type
is admin_status and APPL_DB says "down". Sub-interfaces are not tracked
by MLG and pass through unchanged.

The "(mlg)" tag names the source so a future feature driving admin-down
through the same code path can take a distinct tag rather than collapsing
into a generic "error-down".
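
The decision described above can be sketched as follows. This is an approximation under stated assumptions: the helper is modelled as returning a boolean, a missing db, key, or field is treated as "not held down", and `FakeStateDb` plus `render_admin_status` are hypothetical names that do not come from the PR.

```python
class FakeStateDb:
    """In-memory stand-in for a STATE_DB connection (illustrative only)."""

    def __init__(self, data):
        self._data = data

    def get_all(self, db_name, key):
        return self._data.get(key, {})


def is_held_down_by_monitor_link_group(db, intf):
    """Return True when the member entry says MLG is holding `intf` down."""
    if db is None:
        return False
    entry = db.get_all("STATE_DB", "MONITOR_LINK_GROUP_MEMBER|{}".format(intf))
    state = entry.get("state") if entry else None
    if state is None:
        # Assumption: no entry or no state field means MLG does not track it.
        return False
    return state != "allow_up"


def render_admin_status(db, intf, appl_db_admin_status):
    """Only a 'down' admin status is ever rewritten to the MLG tag."""
    if appl_db_admin_status == "down" and is_held_down_by_monitor_link_group(db, intf):
        return "error-down (mlg)"
    return appl_db_admin_status


db = FakeStateDb({
    "MONITOR_LINK_GROUP_MEMBER|Ethernet8": {"state": "force_down"},
    "MONITOR_LINK_GROUP_MEMBER|Ethernet12": {"state": "allow_up"},
})
print(render_admin_status(db, "Ethernet8", "down"))
print(render_admin_status(db, "Ethernet12", "down"))
```

In this sketch a user-initiated shutdown on a healthy group still renders as plain `down`, matching the intent that `allow_up` leaves admin status reflecting user configuration.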

12 unit tests in tests/intfutil_monitor_link_test.py exercise the helper
(db None, missing key, missing field, allow_up, force_down) and both
getters in admin-up / user-shutdown / MLG-held flavors. Tests slice the
helper + two getters out of the intfutil script via regex into a fresh
namespace, so they don't need the SONiC dev container.
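
One way the regex slicing approach could look, shown on a made-up script string rather than the real intfutil source. The pattern, `slice_function`, and `SCRIPT_SOURCE` are all illustrative assumptions; the actual tests may slice differently.

```python
import re

# Stand-in for a script whose top-level import only works in the dev container.
SCRIPT_SOURCE = '''
import some_sonic_only_module

def helper(x):
    return x + 1

def getter(x):
    return helper(x) * 2

def main():
    some_sonic_only_module.run()
'''


def slice_function(source, name):
    """Extract one top-level function, up to the next unindented line."""
    pattern = r"^def {}\(.*?(?=^\S|\Z)".format(re.escape(name))
    match = re.search(pattern, source, re.MULTILINE | re.DOTALL)
    if match is None:
        raise LookupError(name)
    return match.group(0)


# Execute only the sliced functions in a fresh namespace; the failing
# import and main() are never evaluated.
namespace = {}
for fn_name in ("helper", "getter"):
    exec(slice_function(SCRIPT_SOURCE, fn_name), namespace)

print(namespace["getter"](3))
```

Because the sliced functions share one namespace, `getter` can still resolve `helper` at call time even though the rest of the script was never loaded.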

PR-B: `show monitor-link` -- consume PR-A's transition history

Three new helpers in show/monitor_link.py:
  format_last_change(group_data)
    Renders "YYYY-MM-DD HH:MM:SS UTC (FROM -> TO, reason: ...)" or None.
  format_linkup_delay(group_data)
    Baseline "N seconds". When state == PENDING and pending_start_time is
    set, appends "(elapsed: Xs, remaining: Ys)" inside the window or
    "(elapsed: Xs, OVERDUE by Ys)" once raw_elapsed > delay -- the OVERDUE
    branch surfaces a stuck daemon instead of clamping the display to 100%.
  format_transitions(group_data)
    "UP->DOWN=N, DOWN->UP=M" counter line.
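
A sketch of the PENDING progress and OVERDUE branches described for format_linkup_delay. It assumes `pending_start_time` is an epoch timestamp in seconds and that the delay lives in a field called `link_up_delay`; that field name and the injectable `now` parameter are additions made for this example and are not confirmed by the PR.

```python
import time


def format_linkup_delay(group_data, now=None):
    """Render 'N seconds', plus progress while the group is PENDING."""
    delay = int(group_data.get("link_up_delay", 0))  # hypothetical field name
    text = "{} seconds".format(delay)
    start = group_data.get("pending_start_time")
    if group_data.get("state") != "PENDING" or not start:
        return text
    now = time.time() if now is None else now
    elapsed = max(0, int(now - float(start)))
    if elapsed > delay:
        # Do not clamp: an overshoot points at a stuck daemon.
        return "{} (elapsed: {}s, OVERDUE by {}s)".format(text, elapsed, elapsed - delay)
    return "{} (elapsed: {}s, remaining: {}s)".format(text, elapsed, delay - elapsed)


group = {"state": "PENDING", "link_up_delay": "30", "pending_start_time": "1000"}
print(format_linkup_delay(group, now=1010))
print(format_linkup_delay(group, now=1045))
```

Accepting `now` as a parameter is a design choice of this sketch so that it can be exercised without patching `time.time()`, which is what the PR's tests are described as doing.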

get_monitor_link_groups extracts the new STATE_DB fields
(last_state_change_*, pending_start_time, *_count); all default to empty/0
so legacy groups without these fields render unchanged.

The existing 13 tests in monitor_link_test.py and the mock_tables/state_db.json
fixtures are updated for R-4 (monitored-links / managed-links / Min-monitored-links).
Five new tests under TestMonitorLinkTransitionTracking cover the new lines:
last-change rendering, transition counter line, PENDING progress with
mocked time.time(), OVERDUE display when timer has overshot, and the
backward-compat case with transition fields absent.

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
Mirror of the swss-side simplification (single counter, no reason field).

show/monitor_link.py:
  * get_monitor_link_groups: read total_transitions instead of
    up_to_down_count + down_to_up_count; drop last_state_change_reason.
  * format_transitions returns just the number; the show CLI's "Transitions:"
    label line becomes `Transitions:           N` (vs. the previous
    `UP->DOWN=N, DOWN->UP=M`).
  * format_last_change drops the `, reason: ...` segment; output is now
    `YYYY-MM-DD HH:MM:SS UTC (FROM -> TO)`.
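
A sketch of the simplified last-change rendering. The commit only names the fields as `last_state_change_*`, so the three suffixes used here (`_time`, `_from`, `_to`) and the epoch-seconds timestamp format are assumptions made for illustration.

```python
from datetime import datetime, timezone


def format_last_change(group_data):
    """Return 'YYYY-MM-DD HH:MM:SS UTC (FROM -> TO)', or None if fields are missing."""
    ts = group_data.get("last_state_change_time")      # hypothetical suffix
    old = group_data.get("last_state_change_from")     # hypothetical suffix
    new = group_data.get("last_state_change_to")       # hypothetical suffix
    if not (ts and old and new):
        # Legacy groups without transition fields render nothing.
        return None
    when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return "{} UTC ({} -> {})".format(when.strftime("%Y-%m-%d %H:%M:%S"), old, new)


line = format_last_change({
    "last_state_change_time": "86400",
    "last_state_change_from": "UP",
    "last_state_change_to": "DOWN",
})
print(line)
```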

tests/mock_tables/state_db.json: replace directional counter fixtures with
a single total_transitions on all three groups; drop last_state_change_reason.

tests/monitor_link_test.py:
  * Update MOCK_STATE_DB_DATA dict keys.
  * test_last_change_line_shown drops the reason-text assertion.
  * test_transitions_counter_line asserts `Transitions:           5` for the
    test_group fixture instead of the directional pair.
  * test_no_last_change_when_fields_missing asserts the counter line
    defaults to `0` instead of the directional pair.

PR-C tests (intfutil_monitor_link_test.py) are unaffected; 12/12 still pass.

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
The R-4 rename made 'Min-monitored-links:' (20 chars) the widest label, so
all rows now align to column 23. The 'Link-up-delay:' line gets 9 trailing
spaces instead of the 4 the test was hardcoded against.

Update the literal to the new alignment so the assertion matches the
renderer's output.
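
The alignment arithmetic can be checked with a small sketch. It assumes the renderer pads every label to the widest label plus a fixed gap of 3 characters, which is consistent with the numbers above (20 + 3 = 23, and 23 - 14 = 9); the label list and `render_line` are illustrative and not the PR's code.

```python
# Illustrative label set; only the last three appear in the PR text.
LABELS = ["State:", "Min-monitored-links:", "Link-up-delay:", "Transitions:"]
GAP = 3  # assumed spacing between the widest label and the value column


def render_line(label, value, labels=LABELS):
    """Left-justify the label so every value starts in the same column."""
    width = max(len(item) for item in labels) + GAP
    return label.ljust(width) + str(value)


delay_line = render_line("Link-up-delay:", "30 seconds")
print(repr(delay_line))
print(repr(render_line("Transitions:", 5)))
```

Deriving the width from the label list, rather than hardcoding it, means a future label rename shifts all rows together.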

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
srodd-nexthop force-pushed the srodd.monitor-link-show branch from 22d7f95 to 8b75990 on May 14, 2026 04:00
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Unrelated flake in flow_counter_stats_test.py::test_add_pattern_repeatly
(test relies on a fresh mock CONFIG_DB but pytest-xdist scheduling can
leave 2000::/64 already populated). Triggering a fresh run to reshuffle
test ordering.

Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

2 participants