recognize fractional seconds with timezone offset in timestamp parser#16
recognize fractional seconds with timezone offset in timestamp parser#16HrachShah wants to merge 3 commits into
Conversation
…g detect_log_level on the full line causes false positives when host or process names contain words like 'error' or 'warn'; now it runs on the message portion only
The format list in _try_parse_datetime had separate entries for fractional seconds
('%Y-%m-%dT%H:%M:%S.%f') and for the timezone offset ('%Y-%m-%dT%H:%M:%S%z') but no
entry that combined them. Timestamps produced by Python's datetime.isoformat() with
microsecond precision and a tzinfo (e.g. '2026-06-21T19:25:00.123456+02:00') and the
common log form with milliseconds plus a 'Z' suffix (e.g. '2026-06-21T19:25:00.123Z')
both fell through every format, parse_timestamp returned None, and the surrounding
log analysis silently lost the timestamp on those lines.
Added '%Y-%m-%dT%H:%M:%S.%f%z' and '%Y-%m-%d %H:%M:%S.%f%z' to the format list so the
microsecond + offset combinations parse. The existing Z->+00:00 substitution upstream
of the loop keeps working for the 'Z' suffix and the new '%z' directive accepts both
'+HH:MM' and '+HHMM' per Python's strptime rules. Tests cover all four new
combinations (ISO+colon, ISO no-colon, space separator, fractional with Z) plus
regression guards for the four previously-working forms, and exercise the public
parse_timestamp entry point with a full log line.
Reviewer's GuideExtends timestamp parsing to handle ISO-like formats with fractional seconds plus timezone offsets, fixes a state-leak bug when reusing LogAnalyzer instances across analyses, and adjusts syslog parsing to infer log level from the message field while adding focused tests for these behaviors. Sequence diagram for updated timestamp parsing with fractional seconds and timezonesequenceDiagram
participant Caller
participant parse_timestamp
participant _try_parse_datetime
Caller->>parse_timestamp: parse_timestamp(line)
parse_timestamp->>parse_timestamp: ts_str = extract_timestamp(line)
parse_timestamp->>parse_timestamp: ts_str = ts_str.replace("Z", "+00:00")
parse_timestamp->>_try_parse_datetime: _try_parse_datetime(ts_str)
alt format with fractional and offset
_try_parse_datetime->>_try_parse_datetime: datetime.strptime(ts_str, "%Y-%m-%dT%H:%M:%S.%f%z")
else format with fractional and offset (space)
_try_parse_datetime->>_try_parse_datetime: datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f%z")
else other existing formats
_try_parse_datetime->>_try_parse_datetime: try remaining format strings
end
_try_parse_datetime-->>parse_timestamp: datetime or None
parse_timestamp-->>Caller: parsed datetime or None
Sequence diagram for LogAnalyzer analyze state resetsequenceDiagram
participant Client
participant LogAnalyzer
participant AnalysisResult
Client->>LogAnalyzer: analyze(entries_batch_1)
LogAnalyzer->>LogAnalyzer: _error_patterns = {}
LogAnalyzer->>AnalysisResult: AnalysisResult()
LogAnalyzer-->>Client: AnalysisResult
Client->>LogAnalyzer: analyze(entries_batch_2)
LogAnalyzer->>LogAnalyzer: _error_patterns = {}
LogAnalyzer->>AnalysisResult: AnalysisResult()
LogAnalyzer-->>Client: AnalysisResult
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughThree bug fixes are applied: ChangesBug Fixes: Analyzer State, Syslog Level Detection, and Datetime Parsing
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Resetting
self._error_patternsat the start ofanalyze()changes the class semantics (stateful analyzer vs. per-call scratch); consider instead using a local dict insideanalyze()or gating the reset behind an explicit option so callers who rely on accumulated history aren't surprised. - In
SyslogParser.parse, switchingdetect_log_levelto operate only ongroups.get("message", "")may miss levels present in the prefix (e.g., host/process portion); it’s worth confirming that this is intentional and consistent with how other parsers determine the level.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Resetting `self._error_patterns` at the start of `analyze()` changes the class semantics (stateful analyzer vs. per-call scratch); consider instead using a local dict inside `analyze()` or gating the reset behind an explicit option so callers who rely on accumulated history aren't surprised.
- In `SyslogParser.parse`, switching `detect_log_level` to operate only on `groups.get("message", "")` may miss levels present in the prefix (e.g., host/process portion); it’s worth confirming that this is intentional and consistent with how other parsers determine the level.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/log_analyzer_cli/utils.py`:
- Around line 44-45: The _try_parse_datetime function is missing a datetime
format string for space-separated timestamps without fractional seconds. Add the
format string "%Y-%m-%d %H:%M:%S%z" to the list of datetime format strings being
tried in _try_parse_datetime to handle timestamps like "2024-01-15
10:30:45+00:00" and prevent them from returning None and being dropped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e2d8aff2-0e77-4d03-8d22-e618fc639d13
📒 Files selected for processing (5)
src/log_analyzer_cli/analyzer.pysrc/log_analyzer_cli/parsers/syslog.pysrc/log_analyzer_cli/utils.pytests/test_analyzer.pytests/test_utils.py
| "%Y-%m-%dT%H:%M:%S.%f%z", | ||
| "%Y-%m-%d %H:%M:%S.%f%z", |
There was a problem hiding this comment.
Missing space-separated timezone format without fractional seconds still drops valid timestamps.
Line 22 matches YYYY-MM-DD HH:MM:SS+HH:MM (and ...Z), but _try_parse_datetime still has no %Y-%m-%d %H:%M:%S%z format, so those lines return None and lose timestamps.
Suggested fix
formats = [
"%Y-%m-%d %H:%M:%S.%f",
"%Y-%m-%dT%H:%M:%S.%f",
"%Y-%m-%d %H:%M:%S",
"%Y-%m-%dT%H:%M:%S",
"%Y-%m-%dT%H:%M:%S.%f%z",
"%Y-%m-%d %H:%M:%S.%f%z",
"%Y-%m-%dT%H:%M:%S%z",
+ "%Y-%m-%d %H:%M:%S%z",
"%d/%b/%Y:%H:%M:%S",
"%b %d %H:%M:%S",
"%Y/%m/%d %H:%M:%S",
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "%Y-%m-%dT%H:%M:%S.%f%z", | |
| "%Y-%m-%d %H:%M:%S.%f%z", | |
| formats = [ | |
| "%Y-%m-%d %H:%M:%S.%f", | |
| "%Y-%m-%dT%H:%M:%S.%f", | |
| "%Y-%m-%d %H:%M:%S", | |
| "%Y-%m-%dT%H:%M:%S", | |
| "%Y-%m-%dT%H:%M:%S.%f%z", | |
| "%Y-%m-%d %H:%M:%S.%f%z", | |
| "%Y-%m-%dT%H:%M:%S%z", | |
| "%Y-%m-%d %H:%M:%S%z", | |
| "%d/%b/%Y:%H:%M:%S", | |
| "%b %d %H:%M:%S", | |
| "%Y/%m/%d %H:%M:%S", | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/log_analyzer_cli/utils.py` around lines 44 - 45, The _try_parse_datetime
function is missing a datetime format string for space-separated timestamps
without fractional seconds. Add the format string "%Y-%m-%d %H:%M:%S%z" to the
list of datetime format strings being tried in _try_parse_datetime to handle
timestamps like "2024-01-15 10:30:45+00:00" and prevent them from returning None
and being dropped.
Problem
_try_parse_datetimeinsrc/log_analyzer_cli/utils.pyhad separate format strings for fractional seconds (%Y-%m-%dT%H:%M:%S.%f) and for the timezone offset (%Y-%m-%dT%H:%M:%S%z) but no entry that combined them.Timestamps in this combined form fell through every format in the list:
2026-06-21T19:25:00.123456+02:00— exactly what Python'sdatetime.isoformat()produces with microsecond precision and a tzinfo2026-06-21T19:25:00.123Z— common log form with milliseconds plus a UTC markerIn both cases
parse_timestampreturnedNone, and the surrounding log analysis silently lost the timestamp on those lines.Fix
Added two format strings that combine the fractional-second directive with the offset directive:
%Y-%m-%dT%H:%M:%S.%f%z%Y-%m-%d %H:%M:%S.%f%zThe existing
Z→+00:00substitution upstream of the format loop continues to handle the UTC suffix, and Python's%zdirective accepts both+HH:MMand+HHMMper the strptime rules.Tests
New
tests/test_utils.pycovers all four previously-broken combinations (ISO + colon offset, ISO + no-colon offset, space separator + offset, fractional + Z) plus regression guards for the four previously-working forms, and exercises the publicparse_timestampentry point with a full log line. The 12 new tests pass; the 12 pre-existing test failures intest_cli.pyandtest_parsers.pyare unrelated and were already failing onmainbefore this change.Summary by Sourcery
Add support for additional timestamp formats with fractional seconds and timezones, prevent error group leakage between analyzer runs, and improve log level detection in the syslog parser.
Bug Fixes:
Tests:
Summary by CodeRabbit
Release Notes
Bug Fixes
Improvements
Tests