filter --levels on parsed entry level, not raw line#13
Conversation
The CLI's _parse_file scanned the raw line with detect_log_level to decide whether to keep a row before calling parser.parse(). For generic-format logs that have no structured level field, the generic parser also calls detect_log_level on the full line, so the same false positives affected both the filter check and the output: the parser misclassified WARNING lines that mention 'error' in the hostname or message as ERROR. Worse, the cli filter ran first and rejected lines that the parser would later have accepted — the user got 'No log entries found matching criteria' even when the file contained real WARNING-level rows. Two changes: 1. generic.py: run detect_log_level on the message portion (the text after the timestamp) rather than the whole line. The previous syslog-parser commit fixed the same issue there. 2. cli.py: filter on parsed.level after parsing instead of running detect_log_level separately on the raw line. Now the parser and the filter agree on every line — the filter only sees levels the report would have printed. The pre-parse scan also did extra string work per line that the parser was about to redo. Add a regression test using a JSON line whose message contains 'error' but whose level field is WARNING; the old code rejected it under --levels=WARNING, the new code keeps it.
Reviewer's GuideAdjusts log level detection and CLI filtering to operate on the parsed message/entry level instead of the raw line, fixing false-positive level matches and adding a regression test for the new behavior. Sequence diagram for CLI level filtering using parsed entry levelsequenceDiagram
participant CLI as _parse_file
participant LogFile as read_log_file
participant Parser as parser.parse
participant UtilsTS as parse_timestamp
participant UtilsLevel as detect_log_level
loop for each line
CLI->>LogFile: read_log_file(file_path)
LogFile-->>CLI: line
CLI->>UtilsTS: parse_timestamp(line)
UtilsTS-->>CLI: timestamp
CLI->>Parser: parse(line)
Parser->>Parser: [extract timestamp match]
Parser->>Parser: message = line[message_start:]
Parser->>UtilsLevel: detect_log_level(message)
UtilsLevel-->>Parser: level
Parser-->>CLI: ParsedEntry(level, message)
CLI->>CLI: [parsed exists]
CLI->>CLI: [parsed.level in include_levels]
CLI->>CLI: entries.append(parsed)
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthrough
ChangesLevel Filtering Fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
test_analyze_level_filter_uses_parsed_level_not_raw_linethe inline comment starting with# The pre-is truncated; either complete or remove it to avoid confusing future readers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `test_analyze_level_filter_uses_parsed_level_not_raw_line` the inline comment starting with `# The pre-` is truncated; either complete or remove it to avoid confusing future readers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/log_analyzer_cli/cli.py`:
- Around line 211-215: The timestamp filtering at lines 211-215 uses
parse_timestamp(line) directly for the start_time and end_time checks, which can
disagree with the parser-specific timestamp extraction from parser.parse(line),
allowing entries with supported timestamp formats to bypass the time-window
filters. After calling parser.parse(line) to get the parsed object, use
parsed.timestamp instead of parse_timestamp(line) for the timestamp comparisons
against start_time and end_time throughout the filtering block (lines 211-215
and also lines 217-231). This ensures both the filtering and subsequent
reporting use the same timestamp extraction logic from the parser.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36ac41f6-19be-45cd-8a58-9c9aa690389c
📒 Files selected for processing (3)
src/log_analyzer_cli/cli.pysrc/log_analyzer_cli/parsers/generic.pytests/test_cli.py
| timestamp = parse_timestamp(line) | ||
| if start_time and timestamp and timestamp < start_time: | ||
| continue | ||
| if end_time and timestamp and timestamp > end_time: | ||
| continue |
There was a problem hiding this comment.
Use parsed.timestamp for time-window filtering to avoid parser/filter drift.
At Line 211-Line 215, timestamp gating uses parse_timestamp(line), which can disagree with parser-specific timestamp extraction (e.g., generic parser supports additional formats). That lets entries bypass --start-time/--end-time even though they parse successfully. Filter on parsed.timestamp after parser.parse(line) so filtering and reporting use the same parsed contract.
Proposed fix
- from log_analyzer_cli.utils import parse_timestamp
+ from log_analyzer_cli.utils import parse_timestamp
import re
compiled_pattern = re.compile(search_pattern) if search_pattern else None
for line in read_log_file(file_path):
@@
- timestamp = parse_timestamp(line)
- if start_time and timestamp and timestamp < start_time:
- continue
- if end_time and timestamp and timestamp > end_time:
- continue
-
parsed = parser.parse(line)
if not parsed:
continue
+
+ timestamp = parsed.timestamp
+ if start_time and timestamp and timestamp < start_time:
+ continue
+ if end_time and timestamp and timestamp > end_time:
+ continueAlso applies to: 217-231
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/log_analyzer_cli/cli.py` around lines 211 - 215, The timestamp filtering
at lines 211-215 uses parse_timestamp(line) directly for the start_time and
end_time checks, which can disagree with the parser-specific timestamp
extraction from parser.parse(line), allowing entries with supported timestamp
formats to bypass the time-window filters. After calling parser.parse(line) to
get the parsed object, use parsed.timestamp instead of parse_timestamp(line) for
the timestamp comparisons against start_time and end_time throughout the
filtering block (lines 211-215 and also lines 217-231). This ensures both the
filtering and subsequent reporting use the same timestamp extraction logic from
the parser.
Two changes that fix a class of false-positive level classification when a log line's hostname, process name, or message body contains words like 'error' or 'warn' that the heuristic matches with \b boundaries:
parsers/generic.py — detect_log_level now runs on the extracted message (text after the timestamp) instead of the full line. Mirrors the change that was already made to parsers/syslog.py.
cli.py — _parse_file no longer pre-scans the raw line with detect_log_level for the --levels filter; it parses first and filters on parsed.level. The parser and the filter now agree on every line, and we skip a redundant per-line scan that the parser was about to redo anyway.
Repro: a JSON line like {"level": "WARNING", "message": "error in the system"} with --levels=WARNING used to print 'No log entries found matching criteria'. It now keeps the line.
Test added: test_analyze_level_filter_uses_parsed_level_not_raw_line. It fails on the old code and passes with this change.
Summary by Sourcery
Align log level detection and filtering with the parsed message content to avoid misclassification based on raw line text.
Bug Fixes:
Tests:
Summary by CodeRabbit
Bug Fixes
Tests