Skip to content

support millisecond + timezone datetime formats and avoid naive/aware comparison crash#6

Open
HrachShah wants to merge 2 commits into
mainfrom
fix/utils-datetime-millis-with-tz
Open

support millisecond + timezone datetime formats and avoid naive/aware comparison crash#6
HrachShah wants to merge 2 commits into
mainfrom
fix/utils-datetime-millis-with-tz

Conversation

@HrachShah

@HrachShah HrachShah commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Two related bugs in the log analyzer's time-filter path:

Bug 1 — _try_parse_datetime silently dropped ISO 8601 with ms + tz

The example JSON fixture in examples/app-json.log emits 2025-03-20T10:15:32.123Z, which the Z -> +00:00 rewrite turns into 2025-03-20T10:15:32.123+00:00. The old format list had:

  • %Y-%m-%dT%H:%M:%S.%f (no %z) — fails because +00:00 is left over
  • %Y-%m-%dT%H:%M:%S%z (no %f) — fails because the .123 doesn't match the bare seconds template

So no format matched and parse_timestamp returned None. The entry was silently dropped from the filter pass. This means the example JSON file in the repo could not be filtered by --start-time at all.

Bug 2 — naive/aware comparison crash on real input

Even with the format fixed, the comparison in _parse_file:

if start_time and timestamp and timestamp < start_time:

crashed with TypeError: can't compare offset-naive and offset-aware datetimes whenever the JSON file's timestamps were tz-aware (UTC) but the user's --start-time string parsed as naive (the format %Y-%m-%d %H:%M:%S that the option advertises does not have a tz). This is the actual test_analyze_time_filter failure this PR fixes.

Fix

  • utils.py: Reordered the format list and added the four "%f%z" combinations (T/space separator, ms/ms-naive) so the isoformat-with-offset case is handled.
  • cli.py: New _coerce_tz(bound, other) helper that adopts the other side's tz-awareness if only one is aware. Applied at the start_time/end_time comparison sites. UTC is the obvious choice because the only tz-aware input we ever see is the ISO 8601 Z/offset form that resolves to UTC.

Tests

  • New TestTryParseDatetime class in tests/test_parsers.py (8 cases) covers the new millis+tz combinations, the milli-naive sibling, and the existing Apache CLF / naive-second / Z-no-millis cases.
  • test_analyze_time_filter now passes (was the only failing test in the suite).

``$
tests/test_analyzer.py ........... [ 20%]
tests/test_cli.py ............... [ 47%]
tests/test_parsers.py ............................. [100%]
============================== 55 passed in 0.46s ==============================


## Summary by Sourcery

Fix datetime parsing and comparison in the log analyzer and improve table formatter output structure.

Bug Fixes:
- Handle ISO 8601 timestamps with fractional seconds and timezone offsets so log entries are no longer dropped during time filtering.
- Prevent crashes when comparing naive and timezone-aware datetimes in start/end time filters by normalizing tz-awareness before comparison.

Enhancements:
- Refactor the table formatter to centralize border, section, and row rendering for more consistent and maintainable output formatting.

Tests:
- Add dedicated tests for _try_parse_datetime covering fractional-second, timezone-aware, naive, Apache CLF, and invalid timestamp inputs.

Zo Bot added 2 commits June 6, 2026 21:37
…duplicated border and column width logic

The format_table function previously inlined border and column-width calculations for each table section (LOG ANALYSIS SUMMARY, LOG LEVELS, TOP ERROR GROUPS, TOP SOURCES). Long patterns and counts were truncated with hard-coded 35, 50, and 52 character cutoffs that did not always align with the table's 58-character inner width. This change extracts the formatting primitives (border, section header, row with width/alignment tuples) and uses an INNER_WIDTH constant of 58 so the column-width values are derived from the same source as the borders, making future width adjustments a single-edit change.
… comparison crash

Two related bugs in the log analyzer time-filter path:

1. `_try_parse_datetime` (utils.py) was missing the
   "%-Y-%m-%d[T ]%H:%M:%S.%f%z" formats. The example JSON fixture in
   examples/app-json.log emits ISO 8601 with milliseconds AND a
   `Z` suffix ("2025-03-20T10:15:32.123Z"), which becomes
   "2025-03-20T10:15:32.123+00:00" after the `Z` -> `+00:00`
   rewrite. The old format list had "%Y-%m-%dT%H:%M:%S.%f" (no %z)
   and "%Y-%m-%dT%H:%M:%S%z" (no %f) but no combination, so
   `parse_timestamp` returned `None` and the entry was silently
   dropped from the filter pass — not crashing, but producing
   incorrect totals when the user supplied a --start-time.

2. Even with the format fixed, the comparison
   `timestamp < start_time` in cli.py:_parse_file then crashed
   with `TypeError: can't compare offset-naive and offset-aware
   datetimes` whenever the JSON file's timestamps were tz-aware
   (UTC) but the user's --start-time string parsed as naive
   ("%Y-%m-%d %H:%M:%S"), which is the documented format the
   option advertises. This is the actual test failure that this
   commit fixes.

The fix in cli.py routes both bounds through a small
`_coerce_tz` helper that adopts the tz-awareness of the other
side when only one is aware. UTC is the obvious choice because
the only tz-aware input we ever see is the ISO 8601 `Z`/offset
form that resolves to UTC.

Tests:
- New TestTryParseDatetime class in tests/test_parsers.py covers
  the millis+tz combinations, the milli-naive sibling, and the
  existing Apache CLF / naive-second / `Z`-no-millis cases.
- `test_analyze_time_filter` now passes (was 1 failing in the
  full suite, now 0 failing — 55 passed).

``$
tests/test_analyzer.py ...........                                       [ 20%]
tests/test_cli.py ...............                                        [ 47%]
tests/test_parsers.py .............................                      [100%]
============================== 55 passed in 0.46s ==============================
```
@sourcery-ai

sourcery-ai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Reviewer's Guide

Refines datetime parsing to support millisecond + timezone ISO-8601 forms without dropping entries, introduces a helper to safely compare naive and tz-aware datetimes in CLI filtering, refactors the table formatter for reuseable layout helpers, and adds targeted parser tests for the new datetime formats.

Sequence diagram for CLI time filtering with timezone coercion

sequenceDiagram
    actor User
    participant CLI as CLI_main
    participant Analyzer as _parse_file
    participant Utils as parse_timestamp
    participant TZ as _coerce_tz

    User->>CLI: run_with_args(--start-time)
    CLI->>Analyzer: _parse_file(start_time, end_time)

    loop for each line
        Analyzer->>Utils: parse_timestamp(line)
        Utils-->>Analyzer: timestamp

        alt start_time and timestamp
            Analyzer->>TZ: _coerce_tz(start_time, timestamp)
            TZ-->>Analyzer: coerced_start
            alt timestamp < coerced_start
                Analyzer-->>Analyzer: continue  [skip line]
            else timestamp >= coerced_start
                alt end_time
                    Analyzer->>TZ: _coerce_tz(end_time, timestamp)
                    TZ-->>Analyzer: coerced_end
                    alt timestamp > coerced_end
                        Analyzer-->>Analyzer: continue  [skip line]
                    else timestamp <= coerced_end
                        Analyzer-->>Analyzer: parser.parse(line)
                    end
                else no end_time
                    Analyzer-->>Analyzer: parser.parse(line)
                end
            end
        else no start_time or no timestamp
            Analyzer-->>Analyzer: parser.parse(line)
        end
    end
Loading

File-Level Changes

Change Details Files
Extend datetime parsing to correctly handle millisecond + timezone combinations and preserve existing common formats.
  • Reordered datetime format precedence so millisecond+timezone formats are tried before second+timezone formats to avoid leftover offset fragments
  • Added explicit formats for millisecond+timezone and space/"T" separators (with and without timezone) to cover ISO 8601 variants emitted by JSON logs
  • Kept and grouped existing naive millisecond, naive second, Apache CLF, syslog-like, and y/m/d formats after the new timezone-aware patterns
  • Preserved the special handling that rewrites a trailing 'Z' suffix to '+00:00' before parsing
src/log_analyzer_cli/utils.py
Prevent crashes when comparing naive and timezone-aware datetimes during CLI time filtering by normalizing tz-awareness.
  • Introduced a _coerce_tz(bound, other) helper that adjusts the tzinfo of the bound datetime to match the other side’s tz-awareness before comparison
  • Used _coerce_tz for both start_time and end_time comparisons in _parse_file so naive and tz-aware timestamps can be ordered without TypeError
  • Documented the intended convention of treating missing timezone information as UTC to match ISO 8601 'Z'/+00:00 log lines
src/log_analyzer_cli/cli.py
Refactor table formatter to use shared helpers for borders, sections, and rows while keeping output layout consistent.
  • Introduced INNER_WIDTH constant and helper functions (_border, _section, _row) to centralize formatting of table borders, section headers, and cell alignment
  • Rewrote summary, log levels, error groups, and top sources sections to use _row with explicit column widths and left/right alignment configuration
  • Adjusted truncation lengths for error patterns, timestamps, and source names so content fits within the fixed inner width while remaining readable
src/log_analyzer_cli/formatters/table.py
Add targeted tests for the datetime parsing helper to cover new millisecond + timezone cases and regressions.
  • Imported _try_parse_datetime into the parser tests module for direct unit testing
  • Added TestTryParseDatetime test class with cases for Zulu with milliseconds, ISO 8601 with milliseconds and offset, naive milliseconds, Zulu without milliseconds, offset without milliseconds, space-separated millisecond+offset, Apache CLF, and an unparseable string returning None
  • Verified that tz-awareness (tzinfo presence) and microsecond fields are set as expected for the various timestamp shapes
tests/test_parsers.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@HrachShah, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf0e6ab6-5817-4bac-8185-bb1f2af45049

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and bf70294.

📒 Files selected for processing (4)
  • src/log_analyzer_cli/cli.py
  • src/log_analyzer_cli/formatters/table.py
  • src/log_analyzer_cli/utils.py
  • tests/test_parsers.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/utils-datetime-millis-with-tz

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _coerce_tz helper's behavior doesn't match its docstring or name (it sometimes strips tzinfo from an aware bound when other is naive), which can lead to surprising comparisons; consider normalizing both sides to a consistent convention (e.g., always convert naive to UTC-aware or vice versa) and making the docstring and implementation align explicitly.
  • In format_table, the various hard-coded widths and truncation lengths (e.g., INNER_WIDTH, summary_widths, pattern_truncated length) are tightly coupled but not derived from each other, which makes layout changes error-prone; factoring these into a small set of calculated constants or a single configuration structure would reduce the risk of subtle alignment and truncation bugs.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_coerce_tz` helper's behavior doesn't match its docstring or name (it sometimes strips tzinfo from an aware `bound` when `other` is naive), which can lead to surprising comparisons; consider normalizing both sides to a consistent convention (e.g., always convert naive to UTC-aware or vice versa) and making the docstring and implementation align explicitly.
- In `format_table`, the various hard-coded widths and truncation lengths (e.g., `INNER_WIDTH`, `summary_widths`, `pattern_truncated` length) are tightly coupled but not derived from each other, which makes layout changes error-prone; factoring these into a small set of calculated constants or a single configuration structure would reduce the risk of subtle alignment and truncation bugs.

## Individual Comments

### Comment 1
<location path="src/log_analyzer_cli/cli.py" line_range="186-195" />
<code_context>
     return GenericParser()


+def _coerce_tz(bound: datetime, other: datetime) -> datetime:
+    """Return ``bound`` with the same tz-awareness as ``other``.
+
+    Python refuses to compare a naive datetime with a tz-aware one
+    (``TypeError: can't compare offset-naive and offset-aware datetimes``).
+    The user almost always means "UTC for the side that does not specify
+    a timezone" — picking UTC as the common base is the convention that
+    matches the ``Z`` suffix in ISO 8601 log lines, the ``+00:00`` offset
+    most JSON emitters default to, and the way the rest of the analyzer
+    treats a missing tz.
+    """
+    if bound.tzinfo is None and other.tzinfo is not None:
+        return bound.replace(tzinfo=other.tzinfo)
+    if bound.tzinfo is not None and other.tzinfo is None:
+        return bound.replace(tzinfo=None)
+    return bound
+
+
</code_context>
<issue_to_address>
**question (bug_risk):** Timezone coercion behavior is asymmetric and doesn’t fully match the docstring’s "treat naive as UTC" description.

The implementation currently does two different things:
- Naive `bound` vs aware `other`: you adopt `other.tzinfo` on `bound`, which aligns with “missing tz = UTC” semantics.
- Aware `bound` vs naive `other`: you drop `bound.tzinfo`, effectively downgrading an aware bound to naive and contradicting the “treat missing tz as UTC” narrative.

This asymmetry means `start_time` can silently lose its timezone in filters like `timestamp < _coerce_tz(start_time, timestamp)`. Consider instead normalizing both sides to UTC (treat naive as UTC, `astimezone(timezone.utc)` for aware) or clarify in the docstring that aware bounds are intentionally stripped when compared to naive timestamps.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +186 to +195
def _coerce_tz(bound: datetime, other: datetime) -> datetime:
"""Return ``bound`` with the same tz-awareness as ``other``.

Python refuses to compare a naive datetime with a tz-aware one
(``TypeError: can't compare offset-naive and offset-aware datetimes``).
The user almost always means "UTC for the side that does not specify
a timezone" — picking UTC as the common base is the convention that
matches the ``Z`` suffix in ISO 8601 log lines, the ``+00:00`` offset
most JSON emitters default to, and the way the rest of the analyzer
treats a missing tz.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (bug_risk): Timezone coercion behavior is asymmetric and doesn’t fully match the docstring’s "treat naive as UTC" description.

The implementation currently does two different things:

  • Naive bound vs aware other: you adopt other.tzinfo on bound, which aligns with “missing tz = UTC” semantics.
  • Aware bound vs naive other: you drop bound.tzinfo, effectively downgrading an aware bound to naive and contradicting the “treat missing tz as UTC” narrative.

This asymmetry means start_time can silently lose its timezone in filters like timestamp < _coerce_tz(start_time, timestamp). Consider instead normalizing both sides to UTC (treat naive as UTC, astimezone(timezone.utc) for aware) or clarify in the docstring that aware bounds are intentionally stripped when compared to naive timestamps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant