Skip to content

Fix KeyError in analyzer_logrotate error handler#24651

Open
XuChen-MSFT wants to merge 1 commit into
sonic-net:masterfrom
XuChen-MSFT:xuchen3/fix-analyzer-logrotate-keyerror
Open

Fix KeyError in analyzer_logrotate error handler#24651
XuChen-MSFT wants to merge 1 commit into
sonic-net:masterfrom
XuChen-MSFT:xuchen3/fix-analyzer-logrotate-keyerror

Conversation

@XuChen-MSFT
Copy link
Copy Markdown
Contributor

Description of PR

Summary:
Fix KeyError('stdout') in analyzer_logrotate error handler that crashes the parallel worker process when logrotate fails on DUT.

Fixes https://msazure.visualstudio.com/One/_workitems/edit/37977676

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

When logrotate -f /etc/logrotate.conf fails on a DUT, the analyzer_logrotate function is supposed to catch the RunAnsibleModuleFail exception and log a warning. However, the error handler accesses e.results["stdout"] directly. In some failure scenarios (e.g., Ansible connection issues, module-level failures), the result dictionary does not contain stdout/stderr/rc keys, causing an unhandled KeyError('stdout').

This KeyError crashes the parallel_run worker process (exit code 1), which then blocks all tests using the loganalyzer fixture (autouse=True) with a setup error:

Failed: Processes "['analyzer_logrotate--<MultiAsicSonicHost ...>']" failed with exit code "1"
Exception: 'stdout'

KQL analysis shows this pattern affected 63 tests across multiple testbeds/SKUs in the last 14 days.

How did you do it?

Changed e.results["stdout"] to e.results.get("stdout", "") (and same for stderr and rc) so the error handler gracefully handles missing keys and logs a warning instead of crashing.

How did you verify/test it?

  • Code review: the fix is a one-line defensive change using dict.get() with defaults
  • Verified the RunAnsibleModuleFail.results attribute is dict-like (inherits from Ansible result objects) and supports .get()
  • KQL confirmed the test passes on retry (Attempt Adding ansible readme for proposed folder structure #1 PASSED), so the underlying logrotate failure is transient — the fix ensures the framework handles it gracefully

Any platform specific information?

No — this affects all platforms. The loganalyzer fixture is autouse=True and runs on every DUT.

Supported testbed topology if it's a new test case?

N/A — bug fix only.

Documentation

N/A — no new features.

When logrotate fails on DUT and the Ansible result dict does not contain
the expected 'stdout'/'stderr'/'rc' keys, the error handler crashes with
KeyError('stdout'). This unhandled exception kills the parallel_run worker
process, blocking all tests that use the loganalyzer fixture (autouse=True).

Use dict.get() with defaults instead of direct key access so the handler
logs a warning and continues, which was the original intent.

Fixes: https://msazure.visualstudio.com/One/_workitems/edit/37977676

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xu Chen <xuchen3@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants