reset error patterns at the start of each LogAnalyzer.analyze call by HrachShah · Pull Request #14 · HrachShah/log-analyzer-cli

HrachShah · 2026-06-20T19:27:35Z

LogAnalyzer accumulates error groups in self._error_patterns across calls. Re-using the same analyzer instance for two batches therefore leaked the first batch's groups into the second batch's result — calling analyzer.analyze(batch2) reported error_groups containing entries from batch1 even though batch1 was never passed in.

analyze_log_entries masked the issue by constructing a fresh LogAnalyzer per call, but the class is part of the public API and reset() is the only documented way to clear state — easy to forget.

Clear self._error_patterns = {} at the top of analyze() so each call starts from a clean slate. Added a regression test that runs two distinct error batches through the same analyzer instance and asserts the second result contains only the second batch's groups.

All 12 analyzer tests pass (11 existing + 1 new). The 12 pre-existing failures in test_cli.py / test_parsers.py are unrelated Click 9.x exit-code mismatches in the upstream suite.

Summary by Sourcery

Ensure LogAnalyzer analysis runs are stateless across calls and refine syslog log-level detection.

Bug Fixes:

Reset accumulated error patterns at the start of each LogAnalyzer.analyze call so error groups from previous batches are not leaked into subsequent results.
Adjust syslog parser log-level detection to use the message portion of the line, avoiding misclassification based on non-message content.

Tests:

Add a regression test verifying that running analyze twice on different batches with the same LogAnalyzer instance produces independent error groups.

Summary by CodeRabbit

Bug Fixes
- Fixed accumulation of error groups when the same analyzer instance processes multiple separate batches of logs
- Improved log-level detection accuracy by focusing analysis on message fields rather than full log lines
Tests
- Added test case to verify analyzer instances properly reset between multiple analyses

…g detect_log_level on the full line causes false positives when host or process names contain words like 'error' or 'warn'; now it runs on the message portion only

sourcery-ai · 2026-06-20T19:27:43Z

Reviewer's Guide

LogAnalyzer.analyze is updated to clear internal error pattern state at the start of each call to avoid leaking error groups across batches, a regression test is added to cover repeated analyze calls on the same instance, and syslog parsing is slightly adjusted to detect log levels from the message portion instead of the entire line.

Sequence diagram for repeated LogAnalyzer.analyze calls with state reset

sequenceDiagram
    actor Client
    participant LogAnalyzer
    participant ErrorPatterns as _error_patterns
    participant AnalysisResult

    Client->>LogAnalyzer: analyze(entries_batch1)
    LogAnalyzer->>ErrorPatterns: [accumulate patterns for batch1]
    LogAnalyzer->>AnalysisResult: AnalysisResult()
    LogAnalyzer-->>Client: AnalysisResult for batch1

    Client->>LogAnalyzer: analyze(entries_batch2)
    LogAnalyzer->>ErrorPatterns: [reset to empty]
    LogAnalyzer->>ErrorPatterns: [accumulate patterns for batch2]
    LogAnalyzer->>AnalysisResult: AnalysisResult()
    LogAnalyzer-->>Client: AnalysisResult for batch2 (only batch2 groups)

File-Level Changes

Change	Details	Files
Ensure LogAnalyzer.analyze starts from a clean error pattern state on each call and add a regression test for repeated analysis on the same instance.	Reset internal error pattern storage at the top of the analyze method to avoid leaking state between calls. Document behavior in-line so callers understand they should not rely on internal aggregation across batches. Add a test that runs two different error batches through the same analyzer and asserts that error_groups for the second batch only reflect the second batch.	`src/log_analyzer_cli/analyzer.py` `tests/test_analyzer.py`
Adjust syslog parser log level detection to operate on the parsed message instead of the full raw line.	Remove earlier detection of log level from the entire line. Change log level detection to use the extracted message field from the regex groups, likely improving level classification accuracy for syslog entries.	`src/log_analyzer_cli/parsers/syslog.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

coderabbitai · 2026-06-20T19:27:47Z

📝 Walkthrough

Walkthrough

Two bug fixes: SyslogParser.parse now computes log level from the extracted message field only (not the full raw line), and LogAnalyzer.analyze() clears _error_patterns at the start of each call. A regression test verifies the analyzer produces isolated error groups across repeated invocations on the same instance.

Changes

Syslog Level Detection and Analyzer State Leak Fixes

Layer / File(s)	Summary
Syslog log-level detection scoped to message field `src/log_analyzer_cli/parsers/syslog.py`	Removes the early `detect_log_level(line)` call on the full raw line; after regex groups are extracted, `level` is assigned via `detect_log_level(groups.get("message", ""))` using only the parsed message text.
Analyzer `_error_patterns` reset and regression test `src/log_analyzer_cli/analyzer.py`, `tests/test_analyzer.py`	`LogAnalyzer.analyze()` resets `_error_patterns` to `{}` at entry; the new test `test_repeated_analyze_does_not_leak_error_groups` calls `analyze()` twice with distinct error batches and asserts each result holds exactly one isolated error group.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 Hop hop, the rabbit tidies up his trail,
Old patterns cleared so stale groups never prevail,
The syslog reads the message, not the whole long line,
Each analyze() starts fresh — how terribly fine!
No leaky buckets, no leftover crumbs,
Just clean little fixes from clever little thumbs. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly describes the main change: clearing error patterns at the start of LogAnalyzer.analyze() calls to prevent state leakage across multiple invocations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/analyzer-reset-error-patterns-per-call

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Resetting self._error_patterns inside analyze() changes the class’s stateful semantics; if callers relied on cumulative behavior, consider instead keeping _error_patterns as a local per-call structure (or a parameter) and using reset() explicitly where needed to avoid surprising existing users.
The new test reaches into analyzer._error_patterns and depends on private state; consider asserting behavior only through the public analyze/result API to keep tests resilient to internal refactors.
The change in syslog.parse() from detect_log_level(line) to detect_log_level(groups.get("message", "")) may alter level detection for messages where the level appears outside the captured message; it might be worth capturing or documenting the intended difference in behavior to ensure it aligns with real syslog formats.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Resetting `self._error_patterns` inside `analyze()` changes the class’s stateful semantics; if callers relied on cumulative behavior, consider instead keeping `_error_patterns` as a local per-call structure (or a parameter) and using `reset()` explicitly where needed to avoid surprising existing users.
- The new test reaches into `analyzer._error_patterns` and depends on private state; consider asserting behavior only through the public `analyze`/result API to keep tests resilient to internal refactors.
- The change in `syslog.parse()` from `detect_log_level(line)` to `detect_log_level(groups.get("message", ""))` may alter level detection for messages where the level appears outside the captured `message`; it might be worth capturing or documenting the intended difference in behavior to ensure it aligns with real syslog formats.

## Individual Comments

### Comment 1
<location path="src/log_analyzer_cli/parsers/syslog.py" line_range="100" />
<code_context>
                 if groups.get("pid"):
                     metadata["pid"] = groups["pid"]

+                level = detect_log_level(groups.get("message", ""))
+                
                 return ParsedEntry(
</code_context>
<issue_to_address>
**issue (bug_risk):** Restricting log-level detection to only the message body may miss levels present in the prefix or structured fields.

Previously `detect_log_level` received the full raw line, so it could pick up level markers in prefixes (e.g., `ERROR`, `WARN`, numeric priorities) outside the `message` field. With the new call using only `groups["message"]`, those indicators will be ignored in formats where the level isn't inside `message`, which may regress detection for some syslog variants. Consider calling `detect_log_level` on `message` first and falling back to the full `line` when `message` is empty or the result is unknown, to retain robustness while favoring the new behavior.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-20T19:29:03Z

                if groups.get("pid"):
                    metadata["pid"] = groups["pid"]

+                level = detect_log_level(groups.get("message", ""))


issue (bug_risk): Restricting log-level detection to only the message body may miss levels present in the prefix or structured fields.

Previously detect_log_level received the full raw line, so it could pick up level markers in prefixes (e.g., ERROR, WARN, numeric priorities) outside the message field. With the new call using only groups["message"], those indicators will be ignored in formats where the level isn't inside message, which may regress detection for some syslog variants. Consider calling detect_log_level on message first and falling back to the full line when message is empty or the result is unknown, to retain robustness while favoring the new behavior.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_analyzer.py (1)
152-176: 💤 Low value

Regression test correctly validates state isolation.

The test effectively verifies that error groups don't leak across analyze() calls on the same instance. The assertions properly check pattern difference and count isolation.
Optional: Add symmetric assertion for completeness

For extra thoroughness, you could also assert the first result's count:
 first_result = analyzer.analyze(first_batch)
 second_result = analyzer.analyze(second_batch)
 
 assert len(first_result.error_groups) == 1
+assert first_result.error_groups[0].count == 1
 assert len(second_result.error_groups) == 1
 assert second_result.error_groups[0].pattern != first_result.error_groups[0].pattern
 assert second_result.error_groups[0].count == 1
This makes the test more symmetric and explicit about both results being isolated, though the existing assertions already validate the key invariant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_analyzer.py` around lines 152 - 176, To make the
test_repeated_analyze_does_not_leak_error_groups test more symmetric and
explicit about state isolation, add an assertion to verify that the first_result
also has an error count of 1, similar to the existing assertion for
second_result. After the line asserting second_result.error_groups[0].count ==
1, add a new assertion that checks first_result.error_groups[0].count == 1. This
ensures both analyze() calls are validated for proper isolation and prevents any
regression where the first batch's error group count could be incorrectly
modified.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/log_analyzer_cli/parsers/syslog.py`:
- Line 100: The CLI filters log entries using detect_log_level on the raw line
before parsing, while the syslog parser detects level using detect_log_level
only on the message field, creating a mismatch. The CLI filtering logic (in
cli.py around line 209) should be refactored to filter based on the parsed
entry's level field instead of calling detect_log_level on the raw line. This
ensures consistency between what the CLI-level filter accepts and what level is
actually stored in the parsed entry by the syslog parser's level detection at
line 100.

---

Nitpick comments:
In `@tests/test_analyzer.py`:
- Around line 152-176: To make the
test_repeated_analyze_does_not_leak_error_groups test more symmetric and
explicit about state isolation, add an assertion to verify that the first_result
also has an error count of 1, similar to the existing assertion for
second_result. After the line asserting second_result.error_groups[0].count ==
1, add a new assertion that checks first_result.error_groups[0].count == 1. This
ensures both analyze() calls are validated for proper isolation and prevents any
regression where the first batch's error group count could be incorrectly
modified.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c7ef08a-7aaa-42cc-b926-52880ff8d1ff

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and f817ad4.

📒 Files selected for processing (3)

src/log_analyzer_cli/analyzer.py
src/log_analyzer_cli/parsers/syslog.py
tests/test_analyzer.py

coderabbitai · 2026-06-20T19:32:53Z

                if groups.get("pid"):
                    metadata["pid"] = groups["pid"]

+                level = detect_log_level(groups.get("message", ""))


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Verify CLI level filtering behavior with the parser change # Check how _parse_file uses level filtering before parsing ast-grep --pattern $'def _parse_file($$$): $$$ if include_levels: $$$ $$$ parsed = parser.parse($$$) $$$' # Check if there are any tests that validate level filtering consistency rg -n -A5 -B5 'include_levels' --glob '*test*.py'

Repository: HrachShah/log-analyzer-cli

Length of output: 52

🏁 Script executed:

# Check CLI code structure around level filtering fd 'cli.py' -x cat -n {} | head -250 # Also check syslog parser around line 100 fd 'syslog.py' -x cat -n {} | grep -A 10 -B 10 'level = detect_log_level'

Repository: HrachShah/log-analyzer-cli

Length of output: 10564

🏁 Script executed:

# Find the _parse_file function in cli.py rg -n '_parse_file' --glob '*.py' -A 30 # Also check if detect_log_level is being used elsewhere in cli.py rg -n 'detect_log_level' --glob '*.py'

Repository: HrachShah/log-analyzer-cli

Length of output: 4722

🏁 Script executed:

# Get the complete _parse_file function to understand the flow sed -n '186,250p' src/log_analyzer_cli/cli.py

Repository: HrachShah/log-analyzer-cli

Length of output: 1341

🏁 Script executed:

# Check for tests that validate the level filtering behavior rg -n 'error-db|hostname.*error|include_levels' tests/ --glob '*.py' -A 3 -B 3 # Also check what detect_log_level does sed -n '121,150p' src/log_analyzer_cli/utils.py

Repository: HrachShah/log-analyzer-cli

Length of output: 863

🏁 Script executed:

# Check if there are any tests showing the hostname/error issue being fixed rg -n 'error-db|hostname' tests/ --glob '*.py' -A 5 -B 5 # Check if UNKNOWN levels are converted to INFO or other defaults anywhere rg -n 'UNKNOWN' src/ --glob '*.py' -B 2 -A 2 # List test files to understand test coverage find tests/ -name '*.py' -type f | head -20

Repository: HrachShah/log-analyzer-cli

Length of output: 3642

🏁 Script executed:

# Check for tests that specifically validate the level filtering consistency rg -n 'def test.*level' tests/ --glob '*.py' -A 10 # Check the syslog test to see if it validates the message-only level detection sed -n '34,60p' tests/test_parsers.py

Repository: HrachShah/log-analyzer-cli

Length of output: 3822

The parser and CLI-level filtering have mismatched level detection logic that breaks filtering consistency.

The CLI filters entries using detect_log_level(line) on the full line before parsing (cli.py:209), but the syslog parser detects level using detect_log_level(groups.get("message", "")) on only the message field (syslog.py:100). This creates a critical mismatch:

A line like "error-db-01: connection established" will:

Pass the CLI's --level ERROR filter (because "error" in "error-db-01" matches \berror\b)

But be stored with level="UNKNOWN" after parsing (because the message contains no level keywords)

This contradicts the PR's goal to prevent false positives from hostnames/process names and breaks the semantic contract of the level filter—users expect filtered results to have the filtered level. The CLI should filter using the parsed entry's level, not the raw line.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/log_analyzer_cli/parsers/syslog.py` at line 100, The CLI filters log entries using detect_log_level on the raw line before parsing, while the syslog parser detects level using detect_log_level only on the message field, creating a mismatch. The CLI filtering logic (in cli.py around line 209) should be refactored to filter based on the parsed entry's level field instead of calling detect_log_level on the raw line. This ensures consistency between what the CLI-level filter accepts and what level is actually stored in the parsed entry by the syslog parser's level detection at line 100.

Zo Bot added 2 commits May 20, 2026 13:41

detect log level from the extracted message in syslog parser — runnin…

c63b7bb

…g detect_log_level on the full line causes false positives when host or process names contain words like 'error' or 'warn'; now it runs on the message portion only

reset error groups at the start of each LogAnalyzer.analyze call

f817ad4

sourcery-ai Bot reviewed Jun 20, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reset error patterns at the start of each LogAnalyzer.analyze call#14

reset error patterns at the start of each LogAnalyzer.analyze call#14
HrachShah wants to merge 2 commits into
mainfrom
fix/analyzer-reset-error-patterns-per-call

HrachShah commented Jun 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

sourcery-ai Bot commented Jun 20, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

sourcery-ai Bot Jun 20, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

HrachShah commented Jun 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Summary by CodeRabbit

Uh oh!

sourcery-ai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for repeated LogAnalyzer.analyze calls with state reset

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HrachShah commented Jun 20, 2026 •

edited by coderabbitai Bot

Loading

sourcery-ai Bot commented Jun 20, 2026 •

edited

Loading

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading