match Apache combined log user field as a non-empty token#18

Open
HrachShah wants to merge 1 commit into main from fix/apache-combined-user-field-shape

HrachShah (Owner) commented Jun 22, 2026

The COMBINED_PATTERN in ApacheParser used (?P<user>\s+), which matches only whitespace. Real Apache combined log lines have a non-empty user field: anonymous requests use a single '-' and authenticated requests use the username (frank, alice, etc.). With the previous pattern, every real combined log line failed can_parse() and was silently dropped; only lines that happened to have extra whitespace in the user position ever matched. The referer and user-agent columns downstream of user sit in a single optional group, so even when a line did match by accident, the captured user contained only whitespace and the optional trailing group never fired.

Switched the user group to \S+ so anonymous and authenticated requests both parse, and the optional referer/user_agent group now actually matches.

Added two test_parsers.py cases that pin the new behaviour: one for an authenticated request with non-empty user + referer + user_agent, and one for the anonymous '-' user.
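To make the difference between the two user groups concrete, here is a minimal sketch. The prefix regex, the helper names (prefix, user_of) and the sample timestamps are assumptions made for illustration; the real COMBINED_PATTERN is not reproduced in this description and may be shaped differently.

```python
import re


def prefix(user_group: str) -> re.Pattern:
    """Build a hypothetical regex prefix that reaches the user field.

    This is NOT the project's COMBINED_PATTERN, only a simplified stand-in.
    """
    return re.compile(r'^(?P<ip>\S+) \S+ ' + user_group + r' \[')


OLD = prefix(r'(?P<user>\s+)')  # whitespace-only group (the bug)
NEW = prefix(r'(?P<user>\S+)')  # non-empty token (the fix)


def user_of(pattern, line):
    """Return the captured user field, or None if the line does not match."""
    m = pattern.match(line)
    return m.group("user") if m else None


# Sample lines; the timestamp is an illustrative placeholder.
AUTH = '192.168.1.10 - frank [10/Oct/2000:13:55:36 -0700] "GET /api HTTP/1.1" 200 512'
ANON = '192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
# A malformed line with only spaces where the user should be.
BLANK = '192.168.1.10 -    [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'

for label, line in (("auth", AUTH), ("anon", ANON), ("blank", BLANK)):
    print(label, repr(user_of(OLD, line)), repr(user_of(NEW, line)))
```

Under these assumptions the whitespace-only group rejects both well-formed lines and accepts only the malformed one, while the \S+ group captures frank and '-' respectively.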

Summary by Sourcery

Update Apache combined log parsing to correctly capture the user field and ensure downstream referer and user-agent metadata are populated.

Bug Fixes:

  • Fix Apache combined log regex so the user field is parsed as a non-empty token instead of whitespace, allowing both anonymous and authenticated requests to be handled.

Tests:

  • Add parser tests covering Apache combined logs with an authenticated user and with an anonymous '-' user, asserting user, referer, and user-agent metadata are captured.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced Apache/Nginx combined log parser to accurately extract user field information from logs.
  • Tests

    • Added test coverage for Apache/Nginx user metadata extraction scenarios.

The COMBINED_PATTERN in ApacheParser used (?P<user>\s+), which only
matches whitespace. Real Apache combined log lines have a non-empty
user field: anonymous requests use a single '-' and authenticated
requests use the username (frank, alice, etc.). With \s+ as the
group, every real combined log line failed to match and the parser
silently fell through to the COMMON_PATTERN, which doesn't capture
referer or user_agent.

Switched to (?P<user>\S+), matching the shape used by COMMON_PATTERN
and by every reference implementation. Verified that:

  192.168.1.10 - - [ts] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/5.0"
  192.168.1.10 - frank [ts] "GET /api HTTP/1.1" 200 512 "https://x/" "curl/8.0"

both now parse with user / referer / user_agent in metadata instead
of the anonymous-only fallback that was happening before.

Added two test_parsers.py cases that pin the new behaviour: one for
an authenticated request with non-empty user + referer + user_agent,
and one for the anonymous '-' user.
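The fallback described in this commit message can be sketched as follows. Everything here is an assumption for illustration: the two simplified patterns, the combined() and parse() helpers, and the concrete timestamp standing in for [ts]. The actual patterns and dispatch logic in apache.py are not shown in this PR text.

```python
import re

# Hypothetical, simplified common-format pattern (no end anchor, so it also
# matches the leading part of a combined-format line).
COMMON = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Trailing referer / user_agent columns of the combined format.
TAIL = r' "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'


def combined(user_group):
    """Build a hypothetical combined-format pattern around a given user group."""
    return re.compile(
        r'^(?P<ip>\S+) \S+ ' + user_group + r' \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)' + TAIL
    )


def parse(line, combined_pattern):
    """Try the combined pattern first, then fall back to the common one."""
    for name, pattern in (("combined", combined_pattern), ("common", COMMON)):
        m = pattern.match(line)
        if m:
            return name, m.groupdict()
    return None, {}


LINE = '192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/5.0"'

old_format, old_fields = parse(LINE, combined(r'(?P<user>\s+)'))
new_format, new_fields = parse(LINE, combined(r'(?P<user>\S+)'))
print(old_format, "user_agent" in old_fields)
print(new_format, new_fields.get("user_agent"))
```

In this sketch the whitespace-only group makes the combined pattern miss, so the line is handled by the common pattern and loses referer and user_agent; with \S+ the combined pattern matches and both fields are captured.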
sourcery-ai Bot commented Jun 22, 2026

Reviewer's Guide

Adjusts Apache combined log parsing so the user field is parsed as a non-empty token instead of whitespace, fixing parsing of real-world logs, and adds tests covering authenticated and anonymous users with referer and user-agent fields.

File-Level Changes

Change: Fix Apache combined log regex to correctly capture the user field and allow downstream referer/user_agent capture.
Details:
  • Replace the user capture group in the COMBINED_PATTERN from a whitespace-only matcher to a non-whitespace token matcher, keeping spacing semantics intact.
  • Ensure the optional referer and user_agent group can now match for real-world combined log lines that include a proper user token.
Files: src/log_analyzer_cli/parsers/apache.py

Change: Add tests that pin correct parsing of authenticated and anonymous Apache combined log entries, including referer and user_agent metadata.
Details:
  • Add test for an authenticated request with non-empty user, referer, and user_agent, asserting all are captured into metadata.
  • Add test for an anonymous '-' user request, asserting user and user_agent are captured and that combined format is used instead of falling back to common.
Files: tests/test_parsers.py
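As a rough idea of what the two pinned cases might look like, here is a self-contained pytest-style sketch. It does not reproduce the tests in tests/test_parsers.py: COMBINED and parse_metadata are hypothetical stand-ins for the project's parser, whose real API is not shown on this page, and the timestamp is a placeholder.

```python
import re

# Hypothetical stand-in for the fixed combined-format pattern.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_metadata(line):
    """Return captured fields as a dict, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return {key: value for key, value in m.groupdict().items() if value is not None}


def test_combined_authenticated_user():
    meta = parse_metadata(
        '192.168.1.10 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /api HTTP/1.1" 200 512 "https://x/" "curl/8.0"'
    )
    assert meta is not None
    assert meta["user"] == "frank"
    assert meta["referer"] == "https://x/"
    assert meta["user_agent"] == "curl/8.0"


def test_combined_anonymous_user():
    meta = parse_metadata(
        '192.168.1.10 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET / HTTP/1.0" 200 2326 "-" "Mozilla/5.0"'
    )
    assert meta is not None
    assert meta["user"] == "-"
    assert meta["user_agent"] == "Mozilla/5.0"
```

Asserting on user_agent in the anonymous case is what distinguishes a combined-format match from a fallback to the common format, since only the combined pattern captures that field.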

coderabbitai Bot commented Jun 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
📥 Commits

Reviewing files that changed from the base of the PR and between e93757f and 330a6fd.

📒 Files selected for processing (2)
  • src/log_analyzer_cli/parsers/apache.py
  • tests/test_parsers.py

Walkthrough

The Apache parser's COMBINED_PATTERN regex is corrected so the user named capture group matches a non-whitespace token (\S+) instead of just whitespace. Two new test cases verify that both a non-empty user value and the conventional anonymous - value are correctly extracted along with referer and user_agent metadata.

Changes

Apache Combined Log User Capture Fix

Layer: COMBINED_PATTERN user regex fix and tests
File(s): src/log_analyzer_cli/parsers/apache.py, tests/test_parsers.py
Summary: The user named group changes from (?P<user>\s+) to (?P<user>\S+)\s+, capturing the actual token. Two new tests assert user, referer, and user_agent are parsed correctly for both authenticated and anonymous (-) combined-format lines.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A regex once grabbed only space,
But the rabbit said, "That's not the place!"
🐇 Now \S+ grabs the name,
The user field is now on-game,
And tests confirm it's all in place! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: updating the Apache combined log parser to correctly match the user field as a non-empty token instead of whitespace.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@sourcery-ai sourcery-ai Bot left a comment

Hey - I've reviewed your changes and they look great!

