Conversation
The _parse_timestamp method only applied the current-year correction to the "%%b %%d %%H:%%M:%%S" (RFC 3164) format, but the same issue affects "%%Y-%%m-%%d %%H:%%M:%%S" which also omits the year. Now all formats that lack an explicit year default to the current year.
The COMBINED_PATTERN used (?P<user>\s+) which only matches whitespace, failing to capture the actual user value like '-' or 'frank'. Changed to (?P<user>\S+) to correctly capture the user identifier. Also removed unnecessary .*$ at the end of the combined pattern.
_parse_file used to return only the entries list, so the analyze command showed 'Total Lines: 3' for a 5-line log that mixed 3 valid JSON lines with 2 garbage lines, and it offered no way to tell that the level or pattern filter had silently dropped half the input. Returns a (entries, stats) tuple from _parse_file. stats carries total_lines (lines that survived the include-level/pattern/time filters), parse_errors (lines the parser could not turn into a ParsedEntry), and skipped_filtered (lines dropped by the filters). analyze() now writes total_lines, parse_errors, and a 'Skipped N line(s)' warning into the AnalysisResult so both the text and JSON formatters surface the data- quality issue that used to be invisible. Three new tests cover a mixed-quality file, a level-filtered file, and an empty file. All 50 tests pass.
Reviewer's GuideAdds per-line accounting for total, parse-error, and filtered log lines to the analyze command, wires those stats through to the analysis result and JSON output, adjusts timestamp handling and parsing in CLI and parsers, refines error-pattern normalization to better treat paths and ports, and updates documentation and tests accordingly. Sequence diagram for analyze command with parse stats propagationsequenceDiagram
actor User
participant CLI as cli_analyze
participant Parser as LogParser
participant Analyzer as analyze_log_entries
participant Formatter as format_output
User->>CLI: analyze(file, levels, pattern, start_time, end_time)
CLI->>CLI: _parse_file(parser, file, level_filter, pattern, start_dt, end_dt)
CLI->>Parser: parse(line) [inside _parse_file loop]
Parser-->>CLI: ParsedEntry or None
CLI-->>CLI: return entries, stats
CLI->>Analyzer: analyze_log_entries(entries, group_errors)
Analyzer-->>CLI: AnalysisResult result
CLI->>CLI: set result.total_lines = stats[total_lines]
CLI->>CLI: set result.parse_errors = stats[parse_errors]
CLI->>CLI: append warning to result.warnings when stats[skipped_filtered] > 0
CLI->>Formatter: format_output(result, output, verbose)
Formatter-->>CLI: output_str
CLI-->>User: print output_str
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Warning Review limit reached
More reviews will be available in 56 minutes and 51 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (7)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The change to make
start_dt/end_dttimezone-aware (timezone.utc) may conflict with whateverparse_timestampreturns (often naive datetimes), which can lead toTypeError: can't compare offset-naive and offset-aware datetimes; consider normalizing all parsed timestamps to a consistent timezone or keeping the CLI filters naive for now. - The new stats dictionary returned from
_parse_file(total_lines,parse_errors,skipped_filtered) is accessed via string keys throughout; introducing a small dataclass or TypedDict for these stats would make the interface clearer and reduce the chance of typos or key mismatches.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The change to make `start_dt`/`end_dt` timezone-aware (`timezone.utc`) may conflict with whatever `parse_timestamp` returns (often naive datetimes), which can lead to `TypeError: can't compare offset-naive and offset-aware datetimes`; consider normalizing all parsed timestamps to a consistent timezone or keeping the CLI filters naive for now.
- The new stats dictionary returned from `_parse_file` (`total_lines`, `parse_errors`, `skipped_filtered`) is accessed via string keys throughout; introducing a small dataclass or TypedDict for these stats would make the interface clearer and reduce the chance of typos or key mismatches.
## Individual Comments
### Comment 1
<location path="src/log_analyzer_cli/parsers/apache.py" line_range="118-119" />
<code_context>
try:
ts_str_naive = ts_str.split()[0]
- return datetime.strptime(ts_str_naive, "%d/%b/%Y:%H:%M:%S")
+ dt = datetime.strptime(ts_str_naive, "%d/%b/%Y:%H:%M:%S")
+ return dt.replace(tzinfo=timezone.utc)
except ValueError:
pass
</code_context>
<issue_to_address>
**issue (bug_risk):** Apache timestamps are now labeled as UTC but the original offset component is still discarded, which may misalign time-based filtering.
The code now parses only the `10/Oct/2000:13:55:36` part and then sets `tzinfo=timezone.utc`, effectively reinterpreting a local time as UTC and shifting it by the original offset. That can break UTC-based time-range filtering. Either parse the full `%d/%b/%Y:%H:%M:%S %z` format and preserve the real offset, or keep these as naive local timestamps and ensure any filters remain naive for this parser.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| dt = datetime.strptime(ts_str_naive, "%d/%b/%Y:%H:%M:%S") | ||
| return dt.replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
issue (bug_risk): Apache timestamps are now labeled as UTC but the original offset component is still discarded, which may misalign time-based filtering.
The code now parses only the 10/Oct/2000:13:55:36 part and then sets tzinfo=timezone.utc, effectively reinterpreting a local time as UTC and shifting it by the original offset. That can break UTC-based time-range filtering. Either parse the full %d/%b/%Y:%H:%M:%S %z format and preserve the real offset, or keep these as naive local timestamps and ensure any filters remain naive for this parser.
Reports parse_errors and skipped_filtered lines in the analyze command's output. _parse_file used to return only the entries list, so a 5-line JSON log with 3 valid lines and 2 garbage lines came back as 'Total Lines: 3, Parsed Entries: 3' — there was no way for a user to see that the parser was silently dropping 40% of their input, or that the level/pattern/time filters had quietly dropped lines before the parser ever saw them. Returns a (entries, stats) tuple. stats carries total_lines, parse_errors, and skipped_filtered. analyze() writes them into AnalysisResult so both the text and JSON formatters surface them, and adds a 'Skipped N line(s)' warning when filters drop input. 50/50 tests pass.
Summary by Sourcery
Track and surface parsing and filtering statistics in the analyze command while tightening log parsing and documentation.
New Features:
Bug Fixes:
Enhancements:
Tests: