# [show] Add 'show monitor-link' command #4497
Open
srodd-nexthop wants to merge 6 commits into
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
This was referenced Apr 27, 2026
133bcb3 to
a339fd8
Compare
Add CLI show command for Monitor Link Group feature.
Reads MONITOR_LINK_GROUP_STATE and MONITOR_LINK_GROUP_MEMBER from STATE_DB and displays group state, interface membership, uplink counts, and force-down reasons for downlinks.
Adds get_interface_operational_status() helper to utilities_common/cli.py; it probes PORT_TABLE, then LAG_TABLE, to determine oper state without relying on the interface name prefix.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
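The probe order described in this commit can be sketched as follows. This is a minimal illustration, not the helper from the PR: `get_oper_status`, the plain-dict `state_db` with `TABLE|name` keys, and the `oper_status` field name are all assumptions made for the example.

```python
def get_oper_status(state_db, intf, field="oper_status"):
    """Return the oper state for intf, or None if neither table has it.

    state_db is a plain dict standing in for STATE_DB (hypothetical layout:
    "TABLE|name" -> {field: value}); the field name is an assumption.
    """
    # Probe PORT_TABLE first, then LAG_TABLE, instead of guessing the
    # interface type from a name prefix such as "Ethernet" or "PortChannel".
    for table in ("PORT_TABLE", "LAG_TABLE"):
        entry = state_db.get(f"{table}|{intf}")
        if entry and field in entry:
            return entry[field]
    return None


mock_db = {
    "PORT_TABLE|Ethernet0": {"oper_status": "up"},
    "LAG_TABLE|PortChannel1": {"oper_status": "down"},
}
```

Because the lookup is driven by table membership, an interface with an unconventional name still resolves as long as one of the two tables holds an entry for it.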
a339fd8 to
90f35a7
Compare
Companion to the swss daemon rename. Consumes the renamed STATE_DB MONITOR_LINK_GROUP_STATE fields (monitored-links, managed-links) and renames the parallel CLI labels and variables.
Display changes:
"Uplinks Up:" -> "Monitored Up:"
"Min-uplinks:" -> "Min-monitored-links:"
"(N uplinks, M downlinks)" -> "(N monitored, M managed)"
link_type internal values: 'uplink' -> 'monitored', 'downlink' -> 'managed'.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
…onitor-link
Two upstream-port pieces wrapped together because they sit in the same
sonic-utilities branch (srodd.monitor-link-show) and share test surface.
PR-C: `show interface status` -- error-down rendering for MLG-held interfaces
Today both user-shut and MLG-policy-shut interfaces render plain "down" in
the Admin column, so operators chase port/transceiver issues for what is
actually policy enforcement.
New helper _is_held_down_by_monitor_link_group(db, intf) reads
STATE_DB.MONITOR_LINK_GROUP_MEMBER|<intf>.state:
  state == "allow_up" -> group is healthy; admin_status reflects user intent.
  anything else       -> MLG is holding the port down; return "error-down (mlg)".
Both appl_db_port_status_get (physical ports) and
appl_db_portchannel_status_get (LAGs) consult the helper when status_type
is admin_status and APPL_DB says "down". Sub-interfaces are not tracked
by MLG and pass through unchanged.
The "(mlg)" tag names the source so a future feature driving admin-down
through the same code path can take a distinct tag rather than collapsing
into a generic "error-down".
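The decision described above can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `state_db` is a plain dict standing in for STATE_DB, the function names are hypothetical, and treating a missing database, key, or field as "not held" is an assumption (the commit message lists those cases as tested but does not state their outcome).

```python
MLG_TAG = "error-down (mlg)"


def is_held_down_by_mlg(state_db, intf):
    """True when a member entry exists and its state is anything but allow_up."""
    if state_db is None:
        return False  # assumption: no DB handle means no MLG information
    entry = state_db.get(f"MONITOR_LINK_GROUP_MEMBER|{intf}")
    if not entry or "state" not in entry:
        return False  # assumption: interface is not tracked by any group
    return entry["state"] != "allow_up"


def render_admin_status(state_db, intf, appl_admin_status):
    """Swap a plain 'down' for the tagged value only when MLG holds the port."""
    if appl_admin_status == "down" and is_held_down_by_mlg(state_db, intf):
        return MLG_TAG
    return appl_admin_status


mock_state_db = {
    "MONITOR_LINK_GROUP_MEMBER|Ethernet0": {"state": "allow_up"},
    "MONITOR_LINK_GROUP_MEMBER|Ethernet4": {"state": "force_down"},
}
```

In this sketch a user-shut interface that is also a healthy group member still renders as plain `down`, since the tag is applied only when the member state is something other than `allow_up`.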
12 unit tests in tests/intfutil_monitor_link_test.py exercise the helper
(db None, missing key, missing field, allow_up, force_down) and both
getters in admin-up / user-shutdown / MLG-held flavors. Tests slice the
helper + two getters out of the intfutil script via regex into a fresh
namespace, so they don't need the SONiC dev container.
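The slicing approach mentioned above might look something like the sketch below. It is a generic illustration of pulling top-level functions out of a script's source text with a regex and executing them in a fresh namespace; the `SCRIPT` text, function names, and the exact pattern are made up for the example and are not taken from intfutil or the PR's tests.

```python
import re

# Stand-in for a script whose module-level code cannot be imported in tests.
SCRIPT = '''\
def _helper(x):
    return x + 1

def getter(x):
    return _helper(x) * 2

def main():
    raise SystemExit("not needed in tests")
'''


def slice_functions(source, names):
    """Exec only the named top-level functions into a fresh namespace."""
    namespace = {}
    for name in names:
        # From "def name(" up to the next line that starts in column 0
        # (the next top-level statement) or the end of the text.
        pattern = rf"^def {re.escape(name)}\(.*?(?=^\S|\Z)"
        match = re.search(pattern, source, re.MULTILINE | re.DOTALL)
        if match is None:
            raise LookupError(f"function {name!r} not found")
        exec(match.group(0), namespace)
    return namespace


ns = slice_functions(SCRIPT, ["_helper", "getter"])
```

The sliced functions share one namespace, so `getter` can still call `_helper`, while `main` and any import-time side effects of the script are never executed.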
PR-B: `show monitor-link` -- consume PR-A's transition history
Three new helpers in show/monitor_link.py:
format_last_change(group_data)
    Renders "YYYY-MM-DD HH:MM:SS UTC (FROM -> TO, reason: ...)" or None.
format_linkup_delay(group_data)
    Baseline "N seconds". When state == PENDING and pending_start_time is
    set, appends "(elapsed: Xs, remaining: Ys)" inside the window or
    "(elapsed: Xs, OVERDUE by Ys)" once raw_elapsed > delay -- the OVERDUE
    branch surfaces a stuck daemon instead of clamping the display to 100%.
format_transitions(group_data)
    "UP->DOWN=N, DOWN->UP=M" counter line.
get_monitor_link_groups extracts the new STATE_DB fields
(last_state_change_*, pending_start_time, *_count); all default to empty/0
so legacy groups without these fields render unchanged.
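A rough sketch of the PENDING progress rendering described for format_linkup_delay follows. It is not the PR's implementation: the `linkup_delay` and `state` key names, the `now` parameter, and the assumption that pending_start_time holds epoch seconds are all illustrative choices; only pending_start_time and the output wording come from the commit message.

```python
import time


def format_linkup_delay(group_data, now=None):
    """Render the delay, plus progress while the group is PENDING."""
    delay = int(group_data.get("linkup_delay", 0))  # assumed key name
    text = f"{delay} seconds"
    start = group_data.get("pending_start_time")
    if group_data.get("state") != "PENDING" or not start:
        return text  # legacy groups and non-PENDING groups render unchanged
    now = time.time() if now is None else now
    elapsed = max(0, int(now - float(start)))  # assumes epoch seconds
    if elapsed > delay:
        # Report the overshoot instead of clamping, so a stuck timer is visible.
        return f"{text} (elapsed: {elapsed}s, OVERDUE by {elapsed - delay}s)"
    return f"{text} (elapsed: {elapsed}s, remaining: {delay - elapsed}s)"


pending = {"linkup_delay": "30", "state": "PENDING", "pending_start_time": "1000"}
```

Passing `now` explicitly keeps the function deterministic in tests, which serves the same purpose as the mocked time.time() mentioned in the commit message.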
Existing 13 tests in monitor_link_test.py and the mock_tables/state_db.json
updated for R-4 (monitored-links / managed-links / Min-monitored-links).
Five new tests under TestMonitorLinkTransitionTracking cover the new lines:
last-change rendering, transition counter line, PENDING progress with
mocked time.time(), OVERDUE display when timer has overshot, and the
backward-compat case with transition fields absent.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
Mirror of the swss-side simplification (single counter, no reason field).
show/monitor_link.py:
* get_monitor_link_groups: read total_transitions instead of
up_to_down_count + down_to_up_count; drop last_state_change_reason.
* format_transitions returns just the number; the show CLI's "Transitions:"
label line becomes `Transitions: N` (vs. the previous
`UP->DOWN=N, DOWN->UP=M`).
* format_last_change drops the `, reason: ...` segment; output is now
`YYYY-MM-DD HH:MM:SS UTC (FROM -> TO)`.
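The simplified formatters could be sketched as below. Only total_transitions and the output shapes come from the commit message; the three `last_state_change_*` key names and the epoch-seconds timestamp format are hypothetical stand-ins, since the actual field names are not spelled out here.

```python
from datetime import datetime, timezone


def format_last_change(group_data):
    """Return 'YYYY-MM-DD HH:MM:SS UTC (FROM -> TO)' or None if fields are missing."""
    ts = group_data.get("last_state_change_time")   # hypothetical field name
    old = group_data.get("last_state_change_from")  # hypothetical field name
    new = group_data.get("last_state_change_to")    # hypothetical field name
    if not (ts and old and new):
        return None
    when = datetime.fromtimestamp(int(ts), tz=timezone.utc)  # assumes epoch seconds
    return f"{when:%Y-%m-%d %H:%M:%S} UTC ({old} -> {new})"


def format_transitions(group_data):
    """Return the single transition counter as text, defaulting to 0."""
    return str(group_data.get("total_transitions") or 0)


group = {
    "last_state_change_time": "86400",
    "last_state_change_from": "UP",
    "last_state_change_to": "DOWN",
    "total_transitions": "5",
}
```

A group without these fields yields `None` for the last-change line and `0` for the counter, matching the backward-compatible behaviour the tests are described as asserting.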
tests/mock_tables/state_db.json: replace directional counter fixtures with
a single total_transitions on all three groups; drop last_state_change_reason.
tests/monitor_link_test.py:
* Update MOCK_STATE_DB_DATA dict keys.
* test_last_change_line_shown drops the reason-text assertion.
* test_transitions_counter_line asserts `Transitions: 5` for the
test_group fixture instead of the directional pair.
* test_no_last_change_when_fields_missing asserts the counter line
defaults to `0` instead of the directional pair.
PR-C tests (intfutil_monitor_link_test.py) are unaffected; 12/12 still pass.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
The R-4 rename made 'Min-monitored-links:' (20 chars) the widest label, so all rows now align to column 23. The 'Link-up-delay:' line gets 9 trailing spaces instead of the 4 the test was hardcoded against. Update the literal to the new alignment so the assertion matches the renderer's output.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
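The arithmetic in this commit can be checked with a small sketch. How the renderer actually derives its value column is not stated, so the sketch simply hard-codes the column (23) from the commit message; `VALUE_COLUMN` and `render_row` are hypothetical names.

```python
VALUE_COLUMN = 23  # taken from the commit message; derivation is not shown there

LABELS = ["Monitored Up:", "Min-monitored-links:", "Link-up-delay:"]


def render_row(label, value):
    """Pad the label with spaces so the value starts at VALUE_COLUMN."""
    return label.ljust(VALUE_COLUMN) + value


widest = max(LABELS, key=len)
row = render_row("Link-up-delay:", "30 seconds")
```

With a 14-character label and a value column of 23, the padding is 23 - 14 = 9 spaces, which is the figure the updated test literal has to match.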
22d7f95 to
8b75990
Compare
Unrelated flake in flow_counter_stats_test.py::test_add_pattern_repeatly (the test relies on a fresh mock CONFIG_DB, but pytest-xdist scheduling can leave 2000::/64 already populated). Triggering a fresh run to reshuffle test ordering.
Signed-off-by: Satishkumar Rodd <srodd@nexthop.ai>
What I did
Add `show monitor-link` and `show monitor-link <group>` CLI commands for the Monitor Link Group feature. Also include `show monitor-link` output in techsupport dumps.

Files changed:
* show/monitor_link.py
* show/main.py: monitor-link subcommand group
* utilities_common/cli.py
* scripts/generate_dump: show monitor-link to techsupport collection
* tests/monitor_link_test.py
* tests/mock_tables/state_db.json

Why I did it
Provides operational visibility into monitor link group state for troubleshooting and post-incident debugging. Without this, group state is only visible by querying STATE_DB directly with `redis-cli`.

Design
`show monitor-link` reads three STATE_DB tables:
* MONITOR_LINK_GROUP_STATE: group config and current state (UP / DOWN / PENDING)
* PORT_TABLE / LAG_TABLE: to derive live operational uplink count
* MONITOR_LINK_GROUP_MEMBER: to show which groups are forcing each downlink down (down_due_to)

Output columns: Group Name, State, Uplinks Up/Total, Downlinks, Min-Uplinks, Link-Up-Delay, Description.
`show monitor-link <group>` shows per-interface detail for that group's downlinks, including the `down_due_to` reason when an interface is forced down by multiple groups.
`generate_dump` includes `show monitor-link` output alongside existing port and LAG state in techsupport.

How I verified it
`tests/monitor_link_test.py` covers: table output across UP, DOWN, and PENDING states; multi-group downlink attribution; mixed Ethernet / PortChannel interfaces; single-group detail view.

Companion PRs
HLD: sonic-net/SONiC#2308
Yang Changes: sonic-net/sonic-buildimage#27004
swss: sonic-net/sonic-swss#4523
swss-common: sonic-net/sonic-swss-common#1181