reset error patterns at the start of each LogAnalyzer.analyze call#14
reset error patterns at the start of each LogAnalyzer.analyze call#14HrachShah wants to merge 2 commits into
Conversation
…g detect_log_level on the full line causes false positives when host or process names contain words like 'error' or 'warn'; now it runs on the message portion only
Reviewer's GuideLogAnalyzer.analyze is updated to clear internal error pattern state at the start of each call to avoid leaking error groups across batches, a regression test is added to cover repeated analyze calls on the same instance, and syslog parsing is slightly adjusted to detect log levels from the message portion instead of the entire line. Sequence diagram for repeated LogAnalyzer.analyze calls with state resetsequenceDiagram
actor Client
participant LogAnalyzer
participant ErrorPatterns as _error_patterns
participant AnalysisResult
Client->>LogAnalyzer: analyze(entries_batch1)
LogAnalyzer->>ErrorPatterns: [accumulate patterns for batch1]
LogAnalyzer->>AnalysisResult: AnalysisResult()
LogAnalyzer-->>Client: AnalysisResult for batch1
Client->>LogAnalyzer: analyze(entries_batch2)
LogAnalyzer->>ErrorPatterns: [reset to empty]
LogAnalyzer->>ErrorPatterns: [accumulate patterns for batch2]
LogAnalyzer->>AnalysisResult: AnalysisResult()
LogAnalyzer-->>Client: AnalysisResult for batch2 (only batch2 groups)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughTwo bug fixes: ChangesSyslog Level Detection and Analyzer State Leak Fixes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Resetting
self._error_patternsinsideanalyze()changes the class’s stateful semantics; if callers relied on cumulative behavior, consider instead keeping_error_patternsas a local per-call structure (or a parameter) and usingreset()explicitly where needed to avoid surprising existing users. - The new test reaches into
analyzer._error_patternsand depends on private state; consider asserting behavior only through the publicanalyze/result API to keep tests resilient to internal refactors. - The change in
syslog.parse()fromdetect_log_level(line)todetect_log_level(groups.get("message", ""))may alter level detection for messages where the level appears outside the capturedmessage; it might be worth capturing or documenting the intended difference in behavior to ensure it aligns with real syslog formats.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Resetting `self._error_patterns` inside `analyze()` changes the class’s stateful semantics; if callers relied on cumulative behavior, consider instead keeping `_error_patterns` as a local per-call structure (or a parameter) and using `reset()` explicitly where needed to avoid surprising existing users.
- The new test reaches into `analyzer._error_patterns` and depends on private state; consider asserting behavior only through the public `analyze`/result API to keep tests resilient to internal refactors.
- The change in `syslog.parse()` from `detect_log_level(line)` to `detect_log_level(groups.get("message", ""))` may alter level detection for messages where the level appears outside the captured `message`; it might be worth capturing or documenting the intended difference in behavior to ensure it aligns with real syslog formats.
## Individual Comments
### Comment 1
<location path="src/log_analyzer_cli/parsers/syslog.py" line_range="100" />
<code_context>
if groups.get("pid"):
metadata["pid"] = groups["pid"]
+ level = detect_log_level(groups.get("message", ""))
+
return ParsedEntry(
</code_context>
<issue_to_address>
**issue (bug_risk):** Restricting log-level detection to only the message body may miss levels present in the prefix or structured fields.
Previously `detect_log_level` received the full raw line, so it could pick up level markers in prefixes (e.g., `ERROR`, `WARN`, numeric priorities) outside the `message` field. With the new call using only `groups["message"]`, those indicators will be ignored in formats where the level isn't inside `message`, which may regress detection for some syslog variants. Consider calling `detect_log_level` on `message` first and falling back to the full `line` when `message` is empty or the result is unknown, to retain robustness while favoring the new behavior.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if groups.get("pid"): | ||
| metadata["pid"] = groups["pid"] | ||
|
|
||
| level = detect_log_level(groups.get("message", "")) |
There was a problem hiding this comment.
issue (bug_risk): Restricting log-level detection to only the message body may miss levels present in the prefix or structured fields.
Previously detect_log_level received the full raw line, so it could pick up level markers in prefixes (e.g., ERROR, WARN, numeric priorities) outside the message field. With the new call using only groups["message"], those indicators will be ignored in formats where the level isn't inside message, which may regress detection for some syslog variants. Consider calling detect_log_level on message first and falling back to the full line when message is empty or the result is unknown, to retain robustness while favoring the new behavior.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_analyzer.py (1)
152-176: 💤 Low valueRegression test correctly validates state isolation.
The test effectively verifies that error groups don't leak across
analyze()calls on the same instance. The assertions properly check pattern difference and count isolation.Optional: Add symmetric assertion for completeness
For extra thoroughness, you could also assert the first result's count:
first_result = analyzer.analyze(first_batch) second_result = analyzer.analyze(second_batch) assert len(first_result.error_groups) == 1 +assert first_result.error_groups[0].count == 1 assert len(second_result.error_groups) == 1 assert second_result.error_groups[0].pattern != first_result.error_groups[0].pattern assert second_result.error_groups[0].count == 1This makes the test more symmetric and explicit about both results being isolated, though the existing assertions already validate the key invariant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_analyzer.py` around lines 152 - 176, To make the test_repeated_analyze_does_not_leak_error_groups test more symmetric and explicit about state isolation, add an assertion to verify that the first_result also has an error count of 1, similar to the existing assertion for second_result. After the line asserting second_result.error_groups[0].count == 1, add a new assertion that checks first_result.error_groups[0].count == 1. This ensures both analyze() calls are validated for proper isolation and prevents any regression where the first batch's error group count could be incorrectly modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/log_analyzer_cli/parsers/syslog.py`:
- Line 100: The CLI filters log entries using detect_log_level on the raw line
before parsing, while the syslog parser detects level using detect_log_level
only on the message field, creating a mismatch. The CLI filtering logic (in
cli.py around line 209) should be refactored to filter based on the parsed
entry's level field instead of calling detect_log_level on the raw line. This
ensures consistency between what the CLI-level filter accepts and what level is
actually stored in the parsed entry by the syslog parser's level detection at
line 100.
---
Nitpick comments:
In `@tests/test_analyzer.py`:
- Around line 152-176: To make the
test_repeated_analyze_does_not_leak_error_groups test more symmetric and
explicit about state isolation, add an assertion to verify that the first_result
also has an error count of 1, similar to the existing assertion for
second_result. After the line asserting second_result.error_groups[0].count ==
1, add a new assertion that checks first_result.error_groups[0].count == 1. This
ensures both analyze() calls are validated for proper isolation and prevents any
regression where the first batch's error group count could be incorrectly
modified.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c7ef08a-7aaa-42cc-b926-52880ff8d1ff
📒 Files selected for processing (3)
src/log_analyzer_cli/analyzer.pysrc/log_analyzer_cli/parsers/syslog.pytests/test_analyzer.py
| if groups.get("pid"): | ||
| metadata["pid"] = groups["pid"] | ||
|
|
||
| level = detect_log_level(groups.get("message", "")) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify CLI level filtering behavior with the parser change
# Check how _parse_file uses level filtering before parsing
ast-grep --pattern $'def _parse_file($$$):
$$$
if include_levels:
$$$
$$$
parsed = parser.parse($$$)
$$$'
# Check if there are any tests that validate level filtering consistency
rg -n -A5 -B5 'include_levels' --glob '*test*.py'Repository: HrachShah/log-analyzer-cli
Length of output: 52
🏁 Script executed:
# Check CLI code structure around level filtering
fd 'cli.py' -x cat -n {} | head -250
# Also check syslog parser around line 100
fd 'syslog.py' -x cat -n {} | grep -A 10 -B 10 'level = detect_log_level'Repository: HrachShah/log-analyzer-cli
Length of output: 10564
🏁 Script executed:
# Find the _parse_file function in cli.py
rg -n '_parse_file' --glob '*.py' -A 30
# Also check if detect_log_level is being used elsewhere in cli.py
rg -n 'detect_log_level' --glob '*.py'Repository: HrachShah/log-analyzer-cli
Length of output: 4722
🏁 Script executed:
# Get the complete _parse_file function to understand the flow
sed -n '186,250p' src/log_analyzer_cli/cli.pyRepository: HrachShah/log-analyzer-cli
Length of output: 1341
🏁 Script executed:
# Check for tests that validate the level filtering behavior
rg -n 'error-db|hostname.*error|include_levels' tests/ --glob '*.py' -A 3 -B 3
# Also check what detect_log_level does
sed -n '121,150p' src/log_analyzer_cli/utils.pyRepository: HrachShah/log-analyzer-cli
Length of output: 863
🏁 Script executed:
# Check if there are any tests showing the hostname/error issue being fixed
rg -n 'error-db|hostname' tests/ --glob '*.py' -A 5 -B 5
# Check if UNKNOWN levels are converted to INFO or other defaults anywhere
rg -n 'UNKNOWN' src/ --glob '*.py' -B 2 -A 2
# List test files to understand test coverage
find tests/ -name '*.py' -type f | head -20Repository: HrachShah/log-analyzer-cli
Length of output: 3642
🏁 Script executed:
# Check for tests that specifically validate the level filtering consistency
rg -n 'def test.*level' tests/ --glob '*.py' -A 10
# Check the syslog test to see if it validates the message-only level detection
sed -n '34,60p' tests/test_parsers.pyRepository: HrachShah/log-analyzer-cli
Length of output: 3822
The parser and CLI-level filtering have mismatched level detection logic that breaks filtering consistency.
The CLI filters entries using detect_log_level(line) on the full line before parsing (cli.py:209), but the syslog parser detects level using detect_log_level(groups.get("message", "")) on only the message field (syslog.py:100). This creates a critical mismatch:
A line like "error-db-01: connection established" will:
- Pass the CLI's
--level ERRORfilter (because "error" in "error-db-01" matches\berror\b) - But be stored with
level="UNKNOWN"after parsing (because the message contains no level keywords)
This contradicts the PR's goal to prevent false positives from hostnames/process names and breaks the semantic contract of the level filter—users expect filtered results to have the filtered level. The CLI should filter using the parsed entry's level, not the raw line.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/log_analyzer_cli/parsers/syslog.py` at line 100, The CLI filters log
entries using detect_log_level on the raw line before parsing, while the syslog
parser detects level using detect_log_level only on the message field, creating
a mismatch. The CLI filtering logic (in cli.py around line 209) should be
refactored to filter based on the parsed entry's level field instead of calling
detect_log_level on the raw line. This ensures consistency between what the
CLI-level filter accepts and what level is actually stored in the parsed entry
by the syslog parser's level detection at line 100.
LogAnalyzer accumulates error groups in self._error_patterns across calls. Re-using the same analyzer instance for two batches therefore leaked the first batch's groups into the second batch's result — calling
analyzer.analyze(batch2)reportederror_groupscontaining entries frombatch1even thoughbatch1was never passed in.analyze_log_entriesmasked the issue by constructing a freshLogAnalyzerper call, but the class is part of the public API andreset()is the only documented way to clear state — easy to forget.Clear
self._error_patterns = {}at the top ofanalyze()so each call starts from a clean slate. Added a regression test that runs two distinct error batches through the same analyzer instance and asserts the second result contains only the second batch's groups.All 12 analyzer tests pass (11 existing + 1 new). The 12 pre-existing failures in test_cli.py / test_parsers.py are unrelated Click 9.x exit-code mismatches in the upstream suite.
Summary by Sourcery
Ensure LogAnalyzer analysis runs are stateless across calls and refine syslog log-level detection.
Bug Fixes:
Tests:
Summary by CodeRabbit
Bug Fixes
Tests