Skip to content

recognize fractional seconds with timezone offset in timestamp parser#16

Open
HrachShah wants to merge 3 commits into
mainfrom
fix/utils-parse-combined-microsecond-timezone
Open

recognize fractional seconds with timezone offset in timestamp parser#16
HrachShah wants to merge 3 commits into
mainfrom
fix/utils-parse-combined-microsecond-timezone

Conversation

@HrachShah

@HrachShah HrachShah commented Jun 21, 2026

Copy link
Copy Markdown
Owner

Problem

_try_parse_datetime in src/log_analyzer_cli/utils.py had separate format strings for fractional seconds (%Y-%m-%dT%H:%M:%S.%f) and for the timezone offset (%Y-%m-%dT%H:%M:%S%z) but no entry that combined them.

Timestamps in this combined form fell through every format in the list:

  • 2026-06-21T19:25:00.123456+02:00 — exactly what Python's datetime.isoformat() produces with microsecond precision and a tzinfo
  • 2026-06-21T19:25:00.123Z — common log form with milliseconds plus a UTC marker

In both cases parse_timestamp returned None, and the surrounding log analysis silently lost the timestamp on those lines.

Fix

Added two format strings that combine the fractional-second directive with the offset directive:

  • %Y-%m-%dT%H:%M:%S.%f%z
  • %Y-%m-%d %H:%M:%S.%f%z

The existing Z+00:00 substitution upstream of the format loop continues to handle the UTC suffix, and Python's %z directive accepts both +HH:MM and +HHMM per the strptime rules.

Tests

New tests/test_utils.py covers all four previously-broken combinations (ISO + colon offset, ISO + no-colon offset, space separator + offset, fractional + Z) plus regression guards for the four previously-working forms, and exercises the public parse_timestamp entry point with a full log line. The 12 new tests pass; the 12 pre-existing test failures in test_cli.py and test_parsers.py are unrelated and were already failing on main before this change.

Summary by Sourcery

Add support for additional timestamp formats with fractional seconds and timezones, prevent error group leakage between analyzer runs, and improve log level detection in the syslog parser.

Bug Fixes:

  • Reset accumulated error patterns at the start of each analysis to avoid leaking error groups across multiple LogAnalyzer.analyze calls.
  • Detect log levels in the syslog parser from the message portion instead of the full line to improve level classification.
  • Parse timestamps that include both fractional seconds and timezone offsets, including ISO 8601 forms with Z suffixes.

Tests:

  • Add unit tests for _try_parse_datetime and parse_timestamp covering combined fractional-second and timezone-offset forms and regressions for existing formats.
  • Add tests ensuring repeated LogAnalyzer.analyze calls produce independent error groups without cross-batch leakage.
  • Add tests for detect_log_level to verify uppercasing of levels and UNKNOWN handling for non-log lines.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed error pattern retention when reusing the analyzer with multiple log batches.
  • Improvements

    • Enhanced timestamp parsing to support timezone-aware formats with fractional seconds (ISO 8601 variants).
  • Tests

    • Added comprehensive test coverage for timestamp parsing and analyzer isolation.

Zo Bot added 3 commits May 20, 2026 13:41
…g detect_log_level on the full line causes false positives when host or process names contain words like 'error' or 'warn'; now it runs on the message portion only
The format list in _try_parse_datetime had separate entries for fractional seconds
('%Y-%m-%dT%H:%M:%S.%f') and for the timezone offset ('%Y-%m-%dT%H:%M:%S%z') but no
entry that combined them. Timestamps produced by Python's datetime.isoformat() with
microsecond precision and a tzinfo (e.g. '2026-06-21T19:25:00.123456+02:00') and the
common log form with milliseconds plus a 'Z' suffix (e.g. '2026-06-21T19:25:00.123Z')
both fell through every format, parse_timestamp returned None, and the surrounding
log analysis silently lost the timestamp on those lines.

Added '%Y-%m-%dT%H:%M:%S.%f%z' and '%Y-%m-%d %H:%M:%S.%f%z' to the format list so the
microsecond + offset combinations parse. The existing Z->+00:00 substitution upstream
of the loop keeps working for the 'Z' suffix and the new '%z' directive accepts both
'+HH:MM' and '+HHMM' per Python's strptime rules. Tests cover all four new
combinations (ISO+colon, ISO no-colon, space separator, fractional with Z) plus
regression guards for the four previously-working forms, and exercise the public
parse_timestamp entry point with a full log line.
@sourcery-ai

sourcery-ai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Reviewer's Guide

Extends timestamp parsing to handle ISO-like formats with fractional seconds plus timezone offsets, fixes a state-leak bug when reusing LogAnalyzer instances across analyses, and adjusts syslog parsing to infer log level from the message field while adding focused tests for these behaviors.

Sequence diagram for updated timestamp parsing with fractional seconds and timezone

sequenceDiagram
    participant Caller
    participant parse_timestamp
    participant _try_parse_datetime

    Caller->>parse_timestamp: parse_timestamp(line)
    parse_timestamp->>parse_timestamp: ts_str = extract_timestamp(line)
    parse_timestamp->>parse_timestamp: ts_str = ts_str.replace("Z", "+00:00")
    parse_timestamp->>_try_parse_datetime: _try_parse_datetime(ts_str)

    alt format with fractional and offset
        _try_parse_datetime->>_try_parse_datetime: datetime.strptime(ts_str, "%Y-%m-%dT%H:%M:%S.%f%z")
    else format with fractional and offset (space)
        _try_parse_datetime->>_try_parse_datetime: datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S.%f%z")
    else other existing formats
        _try_parse_datetime->>_try_parse_datetime: try remaining format strings
    end

    _try_parse_datetime-->>parse_timestamp: datetime or None
    parse_timestamp-->>Caller: parsed datetime or None
Loading

Sequence diagram for LogAnalyzer analyze state reset

sequenceDiagram
    participant Client
    participant LogAnalyzer
    participant AnalysisResult

    Client->>LogAnalyzer: analyze(entries_batch_1)
    LogAnalyzer->>LogAnalyzer: _error_patterns = {}
    LogAnalyzer->>AnalysisResult: AnalysisResult()
    LogAnalyzer-->>Client: AnalysisResult

    Client->>LogAnalyzer: analyze(entries_batch_2)
    LogAnalyzer->>LogAnalyzer: _error_patterns = {}
    LogAnalyzer->>AnalysisResult: AnalysisResult()
    LogAnalyzer-->>Client: AnalysisResult
Loading

File-Level Changes

Change Details Files
Ensure LogAnalyzer does not leak error groups across multiple analyze() calls.
  • Reset internal _error_patterns at the start of analyze() to clear any previously accumulated error groups.
  • Document in-code that callers wanting cross-batch aggregation should manage history themselves instead of relying on instance state.
src/log_analyzer_cli/analyzer.py
tests/test_analyzer.py
Improve syslog parsing to detect log level from the structured message segment instead of the entire line.
  • Move detect_log_level to operate on the captured message field rather than the full input line.
  • Simplify source selection by removing an outdated comment and keeping host-based fallback when process is missing.
src/log_analyzer_cli/parsers/syslog.py
Extend timestamp parsing to support fractional seconds combined with timezone offsets and add regression tests.
  • Add datetime.strptime formats that combine microseconds (%f) with timezone offsets (%z) for both 'T' and space-separated date-time forms.
  • Rely on existing Z-to-+00:00 normalization so fractional timestamps with 'Z' suffix parse as UTC.
  • Introduce unit tests for _try_parse_datetime covering microseconds+offset variants, Z-suffixed fractional forms, and regression guards for existing formats.
  • Add integration tests for parse_timestamp to verify it extracts full datetimes from realistic log lines.
  • Add tests for detect_log_level to ensure it returns normalized levels and UNKNOWN for non-log lines.
src/log_analyzer_cli/utils.py
tests/test_utils.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Three bug fixes are applied: LogAnalyzer.analyze now resets _error_patterns at invocation start to prevent state leakage across reuse; SyslogParser.parse scopes log-level detection to the parsed message group instead of the full raw line; and _try_parse_datetime gains two format strings for timezone-aware fractional-second timestamps. All fixes include new unit tests.

Changes

Bug Fixes: Analyzer State, Syslog Level Detection, and Datetime Parsing

Layer / File(s) Summary
Datetime timezone+microsecond format support
src/log_analyzer_cli/utils.py, tests/test_utils.py
_try_parse_datetime adds two %f%z format strings for T-separated and space-separated ISO 8601 forms. New TestTryParseDatetime, TestParseTimestampIntegration, and TestDetectLogLevel classes cover the expanded formats and detect_log_level behavior.
Syslog level detection scoped to message group
src/log_analyzer_cli/parsers/syslog.py
SyslogParser.parse replaces detect_log_level on the full raw line with a call on groups["message"], while retaining conditional pid extraction.
Analyzer error pattern state reset
src/log_analyzer_cli/analyzer.py, tests/test_analyzer.py
LogAnalyzer.analyze resets _error_patterns to {} at the start of each call. A new test runs two sequential analyze() calls and asserts that patterns and counts do not leak between runs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop, hop, the patterns clear!
No leaky state shall linger here.
The syslog message, rightly scoped,
And timestamps parsed with timezone hopes.
Three small fixes, neatly done —
The rabbit's buggy day is won! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding support for fractional seconds with timezone offset in timestamp parsing, which aligns with the primary objective described in the PR summary.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/utils-parse-combined-microsecond-timezone

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Resetting self._error_patterns at the start of analyze() changes the class semantics (stateful analyzer vs. per-call scratch); consider instead using a local dict inside analyze() or gating the reset behind an explicit option so callers who rely on accumulated history aren't surprised.
  • In SyslogParser.parse, switching detect_log_level to operate only on groups.get("message", "") may miss levels present in the prefix (e.g., host/process portion); it’s worth confirming that this is intentional and consistent with how other parsers determine the level.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Resetting `self._error_patterns` at the start of `analyze()` changes the class semantics (stateful analyzer vs. per-call scratch); consider instead using a local dict inside `analyze()` or gating the reset behind an explicit option so callers who rely on accumulated history aren't surprised.
- In `SyslogParser.parse`, switching `detect_log_level` to operate only on `groups.get("message", "")` may miss levels present in the prefix (e.g., host/process portion); it’s worth confirming that this is intentional and consistent with how other parsers determine the level.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/log_analyzer_cli/utils.py`:
- Around line 44-45: The _try_parse_datetime function is missing a datetime
format string for space-separated timestamps without fractional seconds. Add the
format string "%Y-%m-%d %H:%M:%S%z" to the list of datetime format strings being
tried in _try_parse_datetime to handle timestamps like "2024-01-15
10:30:45+00:00" and prevent them from returning None and being dropped.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2d8aff2-0e77-4d03-8d22-e618fc639d13

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and 7c54823.

📒 Files selected for processing (5)
  • src/log_analyzer_cli/analyzer.py
  • src/log_analyzer_cli/parsers/syslog.py
  • src/log_analyzer_cli/utils.py
  • tests/test_analyzer.py
  • tests/test_utils.py

Comment on lines +44 to +45
"%Y-%m-%dT%H:%M:%S.%f%z",
"%Y-%m-%d %H:%M:%S.%f%z",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing space-separated timezone format without fractional seconds still drops valid timestamps.

Line 22 matches YYYY-MM-DD HH:MM:SS+HH:MM (and ...Z), but _try_parse_datetime still has no %Y-%m-%d %H:%M:%S%z format, so those lines return None and lose timestamps.

Suggested fix
 formats = [
     "%Y-%m-%d %H:%M:%S.%f",
     "%Y-%m-%dT%H:%M:%S.%f",
     "%Y-%m-%d %H:%M:%S",
     "%Y-%m-%dT%H:%M:%S",
     "%Y-%m-%dT%H:%M:%S.%f%z",
     "%Y-%m-%d %H:%M:%S.%f%z",
     "%Y-%m-%dT%H:%M:%S%z",
+    "%Y-%m-%d %H:%M:%S%z",
     "%d/%b/%Y:%H:%M:%S",
     "%b %d %H:%M:%S",
     "%Y/%m/%d %H:%M:%S",
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"%Y-%m-%dT%H:%M:%S.%f%z",
"%Y-%m-%d %H:%M:%S.%f%z",
formats = [
"%Y-%m-%d %H:%M:%S.%f",
"%Y-%m-%dT%H:%M:%S.%f",
"%Y-%m-%d %H:%M:%S",
"%Y-%m-%dT%H:%M:%S",
"%Y-%m-%dT%H:%M:%S.%f%z",
"%Y-%m-%d %H:%M:%S.%f%z",
"%Y-%m-%dT%H:%M:%S%z",
"%Y-%m-%d %H:%M:%S%z",
"%d/%b/%Y:%H:%M:%S",
"%b %d %H:%M:%S",
"%Y/%m/%d %H:%M:%S",
]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/log_analyzer_cli/utils.py` around lines 44 - 45, The _try_parse_datetime
function is missing a datetime format string for space-separated timestamps
without fractional seconds. Add the format string "%Y-%m-%d %H:%M:%S%z" to the
list of datetime format strings being tried in _try_parse_datetime to handle
timestamps like "2024-01-15 10:30:45+00:00" and prevent them from returning None
and being dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant