support millisecond + timezone datetime formats and avoid naive/aware comparison crash#6
support millisecond + timezone datetime formats and avoid naive/aware comparison crash#6HrachShah wants to merge 2 commits into
Conversation
…duplicated border and column width logic The format_table function previously inlined border and column-width calculations for each table section (LOG ANALYSIS SUMMARY, LOG LEVELS, TOP ERROR GROUPS, TOP SOURCES). Long patterns and counts were truncated with hard-coded 35, 50, and 52 character cutoffs that did not always align with the table's 58-character inner width. This change extracts the formatting primitives (border, section header, row with width/alignment tuples) and uses an INNER_WIDTH constant of 58 so the column-width values are derived from the same source as the borders, making future width adjustments a single-edit change.
… comparison crash
Two related bugs in the log analyzer time-filter path:
1. `_try_parse_datetime` (utils.py) was missing the
"%-Y-%m-%d[T ]%H:%M:%S.%f%z" formats. The example JSON fixture in
examples/app-json.log emits ISO 8601 with milliseconds AND a
`Z` suffix ("2025-03-20T10:15:32.123Z"), which becomes
"2025-03-20T10:15:32.123+00:00" after the `Z` -> `+00:00`
rewrite. The old format list had "%Y-%m-%dT%H:%M:%S.%f" (no %z)
and "%Y-%m-%dT%H:%M:%S%z" (no %f) but no combination, so
`parse_timestamp` returned `None` and the entry was silently
dropped from the filter pass — not crashing, but producing
incorrect totals when the user supplied a --start-time.
2. Even with the format fixed, the comparison
`timestamp < start_time` in cli.py:_parse_file then crashed
with `TypeError: can't compare offset-naive and offset-aware
datetimes` whenever the JSON file's timestamps were tz-aware
(UTC) but the user's --start-time string parsed as naive
("%Y-%m-%d %H:%M:%S"), which is the documented format the
option advertises. This is the actual test failure that this
commit fixes.
The fix in cli.py routes both bounds through a small
`_coerce_tz` helper that adopts the tz-awareness of the other
side when only one is aware. UTC is the obvious choice because
the only tz-aware input we ever see is the ISO 8601 `Z`/offset
form that resolves to UTC.
Tests:
- New TestTryParseDatetime class in tests/test_parsers.py covers
the millis+tz combinations, the milli-naive sibling, and the
existing Apache CLF / naive-second / `Z`-no-millis cases.
- `test_analyze_time_filter` now passes (was 1 failing in the
full suite, now 0 failing — 55 passed).
``$
tests/test_analyzer.py ........... [ 20%]
tests/test_cli.py ............... [ 47%]
tests/test_parsers.py ............................. [100%]
============================== 55 passed in 0.46s ==============================
```
Reviewer's GuideRefines datetime parsing to support millisecond + timezone ISO-8601 forms without dropping entries, introduces a helper to safely compare naive and tz-aware datetimes in CLI filtering, refactors the table formatter for reuseable layout helpers, and adds targeted parser tests for the new datetime formats. Sequence diagram for CLI time filtering with timezone coercionsequenceDiagram
actor User
participant CLI as CLI_main
participant Analyzer as _parse_file
participant Utils as parse_timestamp
participant TZ as _coerce_tz
User->>CLI: run_with_args(--start-time)
CLI->>Analyzer: _parse_file(start_time, end_time)
loop for each line
Analyzer->>Utils: parse_timestamp(line)
Utils-->>Analyzer: timestamp
alt start_time and timestamp
Analyzer->>TZ: _coerce_tz(start_time, timestamp)
TZ-->>Analyzer: coerced_start
alt timestamp < coerced_start
Analyzer-->>Analyzer: continue [skip line]
else timestamp >= coerced_start
alt end_time
Analyzer->>TZ: _coerce_tz(end_time, timestamp)
TZ-->>Analyzer: coerced_end
alt timestamp > coerced_end
Analyzer-->>Analyzer: continue [skip line]
else timestamp <= coerced_end
Analyzer-->>Analyzer: parser.parse(line)
end
else no end_time
Analyzer-->>Analyzer: parser.parse(line)
end
end
else no start_time or no timestamp
Analyzer-->>Analyzer: parser.parse(line)
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Warning Review limit reached
More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_coerce_tzhelper's behavior doesn't match its docstring or name (it sometimes strips tzinfo from an awareboundwhenotheris naive), which can lead to surprising comparisons; consider normalizing both sides to a consistent convention (e.g., always convert naive to UTC-aware or vice versa) and making the docstring and implementation align explicitly. - In
format_table, the various hard-coded widths and truncation lengths (e.g.,INNER_WIDTH,summary_widths,pattern_truncatedlength) are tightly coupled but not derived from each other, which makes layout changes error-prone; factoring these into a small set of calculated constants or a single configuration structure would reduce the risk of subtle alignment and truncation bugs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_coerce_tz` helper's behavior doesn't match its docstring or name (it sometimes strips tzinfo from an aware `bound` when `other` is naive), which can lead to surprising comparisons; consider normalizing both sides to a consistent convention (e.g., always convert naive to UTC-aware or vice versa) and making the docstring and implementation align explicitly.
- In `format_table`, the various hard-coded widths and truncation lengths (e.g., `INNER_WIDTH`, `summary_widths`, `pattern_truncated` length) are tightly coupled but not derived from each other, which makes layout changes error-prone; factoring these into a small set of calculated constants or a single configuration structure would reduce the risk of subtle alignment and truncation bugs.
## Individual Comments
### Comment 1
<location path="src/log_analyzer_cli/cli.py" line_range="186-195" />
<code_context>
return GenericParser()
+def _coerce_tz(bound: datetime, other: datetime) -> datetime:
+ """Return ``bound`` with the same tz-awareness as ``other``.
+
+ Python refuses to compare a naive datetime with a tz-aware one
+ (``TypeError: can't compare offset-naive and offset-aware datetimes``).
+ The user almost always means "UTC for the side that does not specify
+ a timezone" — picking UTC as the common base is the convention that
+ matches the ``Z`` suffix in ISO 8601 log lines, the ``+00:00`` offset
+ most JSON emitters default to, and the way the rest of the analyzer
+ treats a missing tz.
+ """
+ if bound.tzinfo is None and other.tzinfo is not None:
+ return bound.replace(tzinfo=other.tzinfo)
+ if bound.tzinfo is not None and other.tzinfo is None:
+ return bound.replace(tzinfo=None)
+ return bound
+
+
</code_context>
<issue_to_address>
**question (bug_risk):** Timezone coercion behavior is asymmetric and doesn’t fully match the docstring’s "treat naive as UTC" description.
The implementation currently does two different things:
- Naive `bound` vs aware `other`: you adopt `other.tzinfo` on `bound`, which aligns with “missing tz = UTC” semantics.
- Aware `bound` vs naive `other`: you drop `bound.tzinfo`, effectively downgrading an aware bound to naive and contradicting the “treat missing tz as UTC” narrative.
This asymmetry means `start_time` can silently lose its timezone in filters like `timestamp < _coerce_tz(start_time, timestamp)`. Consider instead normalizing both sides to UTC (treat naive as UTC, `astimezone(timezone.utc)` for aware) or clarify in the docstring that aware bounds are intentionally stripped when compared to naive timestamps.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _coerce_tz(bound: datetime, other: datetime) -> datetime: | ||
| """Return ``bound`` with the same tz-awareness as ``other``. | ||
|
|
||
| Python refuses to compare a naive datetime with a tz-aware one | ||
| (``TypeError: can't compare offset-naive and offset-aware datetimes``). | ||
| The user almost always means "UTC for the side that does not specify | ||
| a timezone" — picking UTC as the common base is the convention that | ||
| matches the ``Z`` suffix in ISO 8601 log lines, the ``+00:00`` offset | ||
| most JSON emitters default to, and the way the rest of the analyzer | ||
| treats a missing tz. |
There was a problem hiding this comment.
question (bug_risk): Timezone coercion behavior is asymmetric and doesn’t fully match the docstring’s "treat naive as UTC" description.
The implementation currently does two different things:
- Naive
boundvs awareother: you adoptother.tzinfoonbound, which aligns with “missing tz = UTC” semantics. - Aware
boundvs naiveother: you dropbound.tzinfo, effectively downgrading an aware bound to naive and contradicting the “treat missing tz as UTC” narrative.
This asymmetry means start_time can silently lose its timezone in filters like timestamp < _coerce_tz(start_time, timestamp). Consider instead normalizing both sides to UTC (treat naive as UTC, astimezone(timezone.utc) for aware) or clarify in the docstring that aware bounds are intentionally stripped when compared to naive timestamps.
Summary
Two related bugs in the log analyzer's time-filter path:
Bug 1 —
_try_parse_datetimesilently dropped ISO 8601 with ms + tzThe example JSON fixture in
examples/app-json.logemits2025-03-20T10:15:32.123Z, which theZ->+00:00rewrite turns into2025-03-20T10:15:32.123+00:00. The old format list had:%Y-%m-%dT%H:%M:%S.%f(no %z) — fails because+00:00is left over%Y-%m-%dT%H:%M:%S%z(no %f) — fails because the.123doesn't match the bare seconds templateSo no format matched and
parse_timestampreturnedNone. The entry was silently dropped from the filter pass. This means the example JSON file in the repo could not be filtered by --start-time at all.Bug 2 — naive/aware comparison crash on real input
Even with the format fixed, the comparison in
_parse_file:crashed with
TypeError: can't compare offset-naive and offset-aware datetimeswhenever the JSON file's timestamps were tz-aware (UTC) but the user's--start-timestring parsed as naive (the format%Y-%m-%d %H:%M:%Sthat the option advertises does not have a tz). This is the actualtest_analyze_time_filterfailure this PR fixes.Fix
utils.py: Reordered the format list and added the four "%f%z" combinations (T/space separator, ms/ms-naive) so the isoformat-with-offset case is handled.cli.py: New_coerce_tz(bound, other)helper that adopts the other side's tz-awareness if only one is aware. Applied at the start_time/end_time comparison sites. UTC is the obvious choice because the only tz-aware input we ever see is the ISO 8601Z/offset form that resolves to UTC.Tests
TestTryParseDatetimeclass intests/test_parsers.py(8 cases) covers the new millis+tz combinations, the milli-naive sibling, and the existing Apache CLF / naive-second /Z-no-millis cases.test_analyze_time_filternow passes (was the only failing test in the suite).``$
tests/test_analyzer.py ........... [ 20%]
tests/test_cli.py ............... [ 47%]
tests/test_parsers.py ............................. [100%]
============================== 55 passed in 0.46s ==============================