capture real user from Apache combined log lines instead of matching whitespace#15
capture real user from Apache combined log lines instead of matching whitespace#15HrachShah wants to merge 1 commit into
Conversation
Reviewer's GuideFixes Apache combined log parsing to capture actual user identifiers instead of whitespace and adds tests to lock in behavior for authenticated and anonymous users. Sequence diagram for updated ApacheParser user capture from combined log linessequenceDiagram
actor Operator
participant ApacheParser
participant COMBINED_PATTERN
participant Entry
Operator->>ApacheParser: parse(log_line)
ApacheParser->>COMBINED_PATTERN: COMBINED_PATTERN.match(log_line)
alt [combined format matches]
COMBINED_PATTERN-->>ApacheParser: match(user=frank_or_dash)
ApacheParser->>Entry: metadata['user'] = match.group('user')
else [combined pattern does not match]
ApacheParser->>ApacheParser: COMMON_PATTERN.match(log_line)
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe ChangesApache Combined Log User Extraction Fix
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Since COMBINED_PATTERN and COMMON_PATTERN now intentionally share most of their structure, consider factoring the shared pieces into a helper or base pattern to reduce the risk of them diverging subtly again in the future.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Since COMBINED_PATTERN and COMMON_PATTERN now intentionally share most of their structure, consider factoring the shared pieces into a helper or base pattern to reduce the risk of them diverging subtly again in the future.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
The COMBINED_PATTERN regex in src/log_analyzer_cli/parsers/apache.py used (?P\s+) for the user field — that captures whitespace, not a user identifier. Apache combined log format puts the user identifier (typically '-' for anonymous or a real login) in that slot. The pattern therefore never matched combined-format lines; the parse method only worked because it fell through to COMMON_PATTERN, which had the correct (?P\S+).
The user value then gets stored in entry.metadata['user']. With the old pattern that value was a single space (whatever whitespace the \s+ ate), which was useless for any downstream filter, grouping, or report that wanted to break down traffic by authenticated vs anonymous users.
Changed the COMBINED_PATTERN user group to \S+ and re-added the \s+ separator before the timestamp bracket, so the structure now mirrors COMMON_PATTERN. Added two tests that confirm a combined-format line with a real username (frank) captures it in metadata, and that the anonymous '-' placeholder is still captured correctly. The original test lines all used '-' for the user so the breakage was invisible in the test suite; the new tests use a real identifier to lock the behaviour in.
Summary by Sourcery
Fix Apache combined log parsing to correctly capture the user field and add coverage for authenticated and anonymous users.
Bug Fixes:
Tests:
Summary by CodeRabbit
Bug Fixes