filter --levels on parsed entry level, not raw line by HrachShah · Pull Request #13 · HrachShah/log-analyzer-cli

HrachShah · 2026-06-20T13:47:42Z

Two changes that fix a class of false-positive level classification when a log line's hostname, process name, or message body contains words like 'error' or 'warn' that the heuristic matches with \b boundaries:

parsers/generic.py — detect_log_level now runs on the extracted message (text after the timestamp) instead of the full line. Mirrors the change that was already made to parsers/syslog.py.
cli.py — _parse_file no longer pre-scans the raw line with detect_log_level for the --levels filter; it parses first and filters on parsed.level. The parser and the filter now agree on every line, and we skip a redundant per-line scan that the parser was about to redo anyway.

Repro: a JSON line like {"level": "WARNING", "message": "error in the system"} with --levels=WARNING used to print 'No log entries found matching criteria'. It now keeps the line.

Test added: test_analyze_level_filter_uses_parsed_level_not_raw_line. It fails on the old code and passes with this change.

Summary by Sourcery

Align log level detection and filtering with the parsed message content to avoid misclassification based on raw line text.

Bug Fixes:

Fix log level misclassification when words like 'error' appear in host/process/message text by detecting levels from the parsed message instead of the full line.
Ensure --levels filtering uses the parser’s computed level so entries with matching structured levels are not incorrectly filtered out.

Tests:

Add a CLI test verifying that --levels filtering respects the parsed level and does not treat message text mentions of other levels as actual log levels.

Summary by CodeRabbit

Bug Fixes

Fixed log-level filtering to correctly analyze structured log level information instead of scanning raw content, preventing false matches when level keywords appear in timestamps, hostnames, or other metadata.

Tests

Added test coverage for log-level filtering behavior with structured fields.

The CLI's _parse_file scanned the raw line with detect_log_level to decide whether to keep a row before calling parser.parse(). For generic-format logs that have no structured level field, the generic parser also calls detect_log_level on the full line, so the same false positives affected both the filter check and the output: the parser misclassified WARNING lines that mention 'error' in the hostname or message as ERROR. Worse, the cli filter ran first and rejected lines that the parser would later have accepted — the user got 'No log entries found matching criteria' even when the file contained real WARNING-level rows. Two changes: 1. generic.py: run detect_log_level on the message portion (the text after the timestamp) rather than the whole line. The previous syslog-parser commit fixed the same issue there. 2. cli.py: filter on parsed.level after parsing instead of running detect_log_level separately on the raw line. Now the parser and the filter agree on every line — the filter only sees levels the report would have printed. The pre-parse scan also did extra string work per line that the parser was about to redo. Add a regression test using a JSON line whose message contains 'error' but whose level field is WARNING; the old code rejected it under --levels=WARNING, the new code keeps it.

sourcery-ai · 2026-06-20T13:47:51Z

Reviewer's Guide

Adjusts log level detection and CLI filtering to operate on the parsed message/entry level instead of the raw line, fixing false-positive level matches and adding a regression test for the new behavior.

Sequence diagram for CLI level filtering using parsed entry level

sequenceDiagram
    participant CLI as _parse_file
    participant LogFile as read_log_file
    participant Parser as parser.parse
    participant UtilsTS as parse_timestamp
    participant UtilsLevel as detect_log_level

    loop for each line
        CLI->>LogFile: read_log_file(file_path)
        LogFile-->>CLI: line
        CLI->>UtilsTS: parse_timestamp(line)
        UtilsTS-->>CLI: timestamp
        CLI->>Parser: parse(line)
        Parser->>Parser: [extract timestamp match]
        Parser->>Parser: message = line[message_start:]
        Parser->>UtilsLevel: detect_log_level(message)
        UtilsLevel-->>Parser: level
        Parser-->>CLI: ParsedEntry(level, message)
        CLI->>CLI: [parsed exists]
        CLI->>CLI: [parsed.level in include_levels]
        CLI->>CLI: entries.append(parsed)
    end

File-Level Changes

Change	Details	Files
Filter log entries by the parser-resolved level field instead of rescanning the raw input line.	Removed pre-parse detect_log_level call in the CLI file parsing loop and associated import. Ensured each line is parsed first and skipped if parsing fails before any level-based filtering is applied. Applied level-filtering using parsed.level against include_levels, keeping only entries whose parsed level matches the requested levels.	`src/log_analyzer_cli/cli.py`
Detect log level from the extracted message portion in the generic parser to avoid misclassification from timestamp or metadata text.	Changed generic parser to compute message substring immediately after the timestamp match and strip it. Replaced level detection on the full raw line with detect_log_level(message) so only the message body influences level classification. Preserved the existing ParsedEntry structure while sourcing level from the message-only detection.	`src/log_analyzer_cli/parsers/generic.py`
Add a regression test ensuring --levels filtering uses the parsed level rather than raw-line heuristics.	Introduced a test that writes a JSON log line with level WARNING but message text containing the word 'error'. Invoked the CLI with --levels=WARNING -v and asserted that the WARNING entry is kept and no ERROR level row is reported in the output. Documented the scenario in comments to explain the previously incorrect behavior and the expected corrected behavior.	`tests/test_cli.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

coderabbitai · 2026-06-20T13:47:54Z

📝 Walkthrough

Walkthrough

GenericParser.parse() now calls detect_log_level on the extracted message substring instead of the full raw line. _parse_file in the CLI is reordered to parse each line first, skip unparsable lines, and then filter by parsed.level against include_levels. A regression test covering the fix is added.

Changes

Level Filtering Fix

Layer / File(s)	Summary
Parser and CLI level-filtering logic `src/log_analyzer_cli/parsers/generic.py`, `src/log_analyzer_cli/cli.py`	`GenericParser.parse()` passes the extracted message substring (not the full raw line) to `detect_log_level`, preventing misclassification from timestamp/hostname tokens. `_parse_file` is reworked to call `parser.parse` first, skip failures, then compare `parsed.level` against `include_levels`; imports updated accordingly.
Regression test `tests/test_cli.py`	Adds `test_analyze_level_filter_uses_parsed_level_not_raw_line`: a temp JSON log line has `"level": "WARNING"` but the message text contains "error"; the test runs `main analyze -l WARNING -v` and asserts no `ERROR` rows appear in the output.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 A bunny once parsed a suspicious line,
Where "error" hid in a message's shine.
But the level field said "WARNING" — quite clear!
So we read the real level and kept the right tier.
No false alarms now, just parsed truth so fine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: shifting log level filtering from raw line inspection to parsed entry level, which directly addresses the core issue described in the PR objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/cli-filter-on-parsed-level

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've left some high level feedback:

In test_analyze_level_filter_uses_parsed_level_not_raw_line the inline comment starting with # The pre- is truncated; either complete or remove it to avoid confusing future readers.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `test_analyze_level_filter_uses_parsed_level_not_raw_line` the inline comment starting with `# The pre-` is truncated; either complete or remove it to avoid confusing future readers.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/log_analyzer_cli/cli.py`:
- Around line 211-215: The timestamp filtering at lines 211-215 uses
parse_timestamp(line) directly for the start_time and end_time checks, which can
disagree with the parser-specific timestamp extraction from parser.parse(line),
allowing entries with supported timestamp formats to bypass the time-window
filters. After calling parser.parse(line) to get the parsed object, use
parsed.timestamp instead of parse_timestamp(line) for the timestamp comparisons
against start_time and end_time throughout the filtering block (lines 211-215
and also lines 217-231). This ensures both the filtering and subsequent
reporting use the same timestamp extraction logic from the parser.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 36ac41f6-19be-45cd-8a58-9c9aa690389c

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and c79b31b.

📒 Files selected for processing (3)

src/log_analyzer_cli/cli.py
src/log_analyzer_cli/parsers/generic.py
tests/test_cli.py

coderabbitai · 2026-06-20T13:50:21Z

        timestamp = parse_timestamp(line)
        if start_time and timestamp and timestamp < start_time:
            continue
        if end_time and timestamp and timestamp > end_time:
            continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use parsed.timestamp for time-window filtering to avoid parser/filter drift.

At Line 211-Line 215, timestamp gating uses parse_timestamp(line), which can disagree with parser-specific timestamp extraction (e.g., generic parser supports additional formats). That lets entries bypass --start-time/--end-time even though they parse successfully. Filter on parsed.timestamp after parser.parse(line) so filtering and reporting use the same parsed contract.

Proposed fix

- from log_analyzer_cli.utils import parse_timestamp + from log_analyzer_cli.utils import parse_timestamp import re compiled_pattern = re.compile(search_pattern) if search_pattern else None for line in read_log_file(file_path): @@ - timestamp = parse_timestamp(line) - if start_time and timestamp and timestamp < start_time: - continue - if end_time and timestamp and timestamp > end_time: - continue - parsed = parser.parse(line) if not parsed: continue + + timestamp = parsed.timestamp + if start_time and timestamp and timestamp < start_time: + continue + if end_time and timestamp and timestamp > end_time: + continue

Also applies to: 217-231

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/log_analyzer_cli/cli.py` around lines 211 - 215, The timestamp filtering at lines 211-215 uses parse_timestamp(line) directly for the start_time and end_time checks, which can disagree with the parser-specific timestamp extraction from parser.parse(line), allowing entries with supported timestamp formats to bypass the time-window filters. After calling parser.parse(line) to get the parsed object, use parsed.timestamp instead of parse_timestamp(line) for the timestamp comparisons against start_time and end_time throughout the filtering block (lines 211-215 and also lines 217-231). This ensures both the filtering and subsequent reporting use the same timestamp extraction logic from the parser.

sourcery-ai Bot reviewed Jun 20, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

filter --levels on parsed entry level, not raw line#13

filter --levels on parsed entry level, not raw line#13
HrachShah wants to merge 1 commit into
mainfrom
fix/cli-filter-on-parsed-level

HrachShah commented Jun 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

sourcery-ai Bot commented Jun 20, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

HrachShah commented Jun 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Summary by CodeRabbit

Bug Fixes

Tests

Uh oh!

sourcery-ai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for CLI level filtering using parsed entry level

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

coderabbitai Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HrachShah commented Jun 20, 2026 •

edited by coderabbitai Bot

Loading

sourcery-ai Bot commented Jun 20, 2026 •

edited

Loading

coderabbitai Bot commented Jun 20, 2026 •

edited

Loading