
capture real user from Apache combined log lines instead of matching whitespace #15

Open
HrachShah wants to merge 1 commit into main from fix/apache-combined-pattern-user-field-real


@HrachShah HrachShah commented Jun 21, 2026

Owner

The COMBINED_PATTERN regex in src/log_analyzer_cli/parsers/apache.py used (?P<user>\s+) for the user field, which captures whitespace, not a user identifier. Apache combined log format puts the user identifier (typically '-' for anonymous or a real login) in that slot. The pattern therefore never matched combined-format lines; the parse method only worked because it fell through to COMMON_PATTERN, which had the correct (?P<user>\S+).

The user value then gets stored in entry.metadata['user']. With the old pattern that value was a single space (whatever whitespace the \s+ ate), which was useless for any downstream filter, grouping, or report that wanted to break down traffic by authenticated vs anonymous users.

Changed the COMBINED_PATTERN user group to \S+ and re-added the \s+ separator before the timestamp bracket, so the structure now mirrors COMMON_PATTERN. Added two tests that confirm a combined-format line with a real username (frank) captures it in metadata, and that the anonymous '-' placeholder is still captured correctly. The original test lines all used '-' for the user so the breakage was invisible in the test suite; the new tests use a real identifier to lock the behaviour in.
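
To make the one-character difference concrete, here is a minimal sketch. The two prefix fragments below are simplified assumptions written for illustration; they are not copied from apache.py, and the real patterns may be shaped differently. The sample line follows the usual Apache log layout.

```python
import re

# Hypothetical, simplified prefixes (host, ident, user, opening bracket).
# These are illustrative stand-ins, not the project's actual patterns.
PREFIX_BROKEN = re.compile(r'(?P<host>\S+) \S+ (?P<user>\s+)\[')    # user group matches whitespace
PREFIX_FIXED = re.compile(r'(?P<host>\S+) \S+ (?P<user>\S+)\s+\[')  # user group matches a token

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

broken = PREFIX_BROKEN.match(line)  # \s+ cannot consume "frank", so there is no match
fixed = PREFIX_FIXED.match(line)    # \S+ consumes "frank", \s+ consumes the separator

print(broken)
print(fixed.group('user'))
```

Under this assumed shape the broken fragment fails outright on a line with a real username, which is consistent with the fallthrough to COMMON_PATTERN described above.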

Summary by Sourcery

Fix Apache combined log parsing to correctly capture the user field and add coverage for authenticated and anonymous users.

Bug Fixes:

  • Correct Apache combined log regex to capture non-whitespace user identifiers instead of consuming whitespace.

Tests:

  • Add tests verifying that Apache combined log lines capture authenticated usernames and anonymous '-' users in metadata.

Summary by CodeRabbit

Bug Fixes

  • Fixed Apache combined-log parser to correctly extract the user field, now properly handling both authenticated usernames and anonymous user entries.


sourcery-ai Bot commented Jun 21, 2026


Reviewer's Guide

Fixes Apache combined log parsing to capture actual user identifiers instead of whitespace and adds tests to lock in behavior for authenticated and anonymous users.

Sequence diagram for updated ApacheParser user capture from combined log lines

sequenceDiagram
    actor Operator
    participant ApacheParser
    participant COMBINED_PATTERN
    participant Entry

    Operator->>ApacheParser: parse(log_line)
    ApacheParser->>COMBINED_PATTERN: COMBINED_PATTERN.match(log_line)
    alt [combined format matches]
        COMBINED_PATTERN-->>ApacheParser: match(user=frank_or_dash)
        ApacheParser->>Entry: metadata['user'] = match.group('user')
    else [combined pattern does not match]
        ApacheParser->>ApacheParser: COMMON_PATTERN.match(log_line)
    end
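
The flow in the diagram can be sketched in Python roughly as follows. This is an illustration under stated assumptions: the two patterns are simplified stand-ins, and `Entry`, `parse`, and the `format` metadata key are hypothetical names, not the project's actual API.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-ins for the project's patterns (assumed, not copied from apache.py).
COMMON_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


@dataclass
class Entry:
    """Hypothetical minimal entry type holding parsed metadata."""
    metadata: dict = field(default_factory=dict)


def parse(line: str) -> Optional[Entry]:
    # Try the richer combined format first, then fall back to common.
    match = COMBINED_PATTERN.match(line) or COMMON_PATTERN.match(line)
    if match is None:
        return None
    entry = Entry()
    entry.metadata['user'] = match.group('user')
    # Record which pattern matched so a silent fallthrough is visible (illustrative only).
    entry.metadata['format'] = 'combined' if match.re is COMBINED_PATTERN else 'common'
    return entry
```

Recording which pattern matched is one way to surface the kind of silent fallthrough this PR fixes; whether the real parser tracks that is not stated here.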

File-Level Changes

Change: Correct Apache combined log regex to capture non-whitespace user identifiers and preserve field separators.
Details:
  • Replace the user capture group from whitespace (\s+) to non-whitespace (\S+) in the COMBINED_PATTERN regex.
  • Reintroduce an explicit whitespace separator after the user capture group to keep the pattern aligned with Apache's combined log format and COMMON_PATTERN.
Files: src/log_analyzer_cli/parsers/apache.py

Change: Add regression tests ensuring combined log lines capture authenticated and anonymous users correctly.
Details:
  • Add a test case that parses a combined-format line with a real username and asserts it is stored in metadata['user'] along with key request fields.
  • Add a test case that parses a combined-format line with the anonymous '-' placeholder and asserts it is captured correctly in metadata['user'].
Files: tests/test_parsers.py
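
The two regression tests described above might look something like the sketch below. The pattern, the `combined_fields` helper, and the test names are assumptions for illustration; the actual tests in tests/test_parsers.py go through the project's parser and are not reproduced here.

```python
import re

# Assumed stand-in for the fixed COMBINED_PATTERN; the project's regex may differ.
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Everything after the user field, shared by both sample lines.
TAIL = '[10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "curl/8.0"'


def combined_fields(line):
    """Return the named groups for a combined-format line, or None if it does not match."""
    match = COMBINED_PATTERN.match(line)
    return match.groupdict() if match else None


def test_combined_captures_authenticated_user():
    fields = combined_fields('127.0.0.1 - frank ' + TAIL)
    assert fields is not None
    assert fields['user'] == 'frank'
    assert fields['host'] == '127.0.0.1'
    assert fields['status'] == '200'
    assert fields['request'] == 'GET /index.html HTTP/1.0'


def test_combined_captures_anonymous_user():
    fields = combined_fields('127.0.0.1 - - ' + TAIL)
    assert fields is not None
    assert fields['user'] == '-'
```

Asserting on the combined pattern directly, instead of on a parser that can fall back to COMMON_PATTERN, keeps a broken combined regex from being masked by the fallback.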


coderabbitai Bot commented Jun 21, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 468b7d31-ff45-49a0-9ca7-fb81b84e8204

📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and c153753.

📒 Files selected for processing (2)
  • src/log_analyzer_cli/parsers/apache.py
  • tests/test_parsers.py

📝 Walkthrough

The user capture group in ApacheParser.COMBINED_PATTERN is corrected from \s+ (whitespace) to \S+ (non-whitespace) so it properly extracts the username token from combined Apache/Nginx log lines. Two new tests verify the extracted metadata["user"] for both authenticated ("frank") and anonymous ("-") log entries.

Changes

Apache Combined Log User Extraction Fix

Layer / File(s): COMBINED_PATTERN user group fix and extraction tests
  src/log_analyzer_cli/parsers/apache.py, tests/test_parsers.py
Summary: The user regex group changes from \s+ to \S+, enabling correct username extraction. Two new test methods assert entry.metadata["user"] equals "frank" for an authenticated user and "-" for an anonymous user; the authenticated case also checks host, status, and request.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐇 A tiny regex, one letter astray,
\s grabbed whitespace, led parsers away.
Swap in \S, and non-space takes hold—
Now "frank" and his dash do as they're told.
The tests all pass, the warren is bright! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title directly and accurately captures the main change: fixing the regex pattern to capture actual usernames instead of whitespace in Apache combined logs.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • Since COMBINED_PATTERN and COMMON_PATTERN now intentionally share most of their structure, consider factoring the shared pieces into a helper or base pattern to reduce the risk of them diverging subtly again in the future.
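
One possible shape for that refactor, as a sketch only: build both patterns from a single shared base string so the host/ident/user prefix cannot drift apart again. The names `_BASE` and `_COMBINED_SUFFIX`, the exact field regexes, and the `^`/`$` anchoring are assumptions for illustration and do not come from apache.py.

```python
import re

# Fields shared by both formats: host, ident, user, timestamp, request, status, size.
# (Assumed field regexes; the project's actual ones may differ.)
_BASE = (
    r'(?P<host>\S+) \S+ (?P<user>\S+)\s+\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# The combined format appends the referer and user agent.
_COMBINED_SUFFIX = r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'

# Anchoring both ends keeps a combined line from also matching the common pattern.
COMMON_PATTERN = re.compile('^' + _BASE + '$')
COMBINED_PATTERN = re.compile('^' + _BASE + _COMBINED_SUFFIX + '$')
```

With this layout, a fix to the user group is made once in `_BASE` and applies to both patterns.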
