match Apache combined log user field as a non-empty token#18
Conversation
The COMBINED_PATTERN in ApacheParser used (?P<user>\s+), which only matches whitespace. Real Apache combined log lines have a non-empty user field: anonymous requests use a single '-' and authenticated requests use the username (frank, alice, etc.). With \s+ as the group, every real combined log line failed to match and the parser silently fell through to the COMMON_PATTERN, which doesn't capture referer or user_agent. Switched to (?P<user>\S+), matching the shape used by COMMON_PATTERN and by every reference implementation. Verified that: 192.168.1.10 - - [ts] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/5.0" 192.168.1.10 - frank [ts] "GET /api HTTP/1.1" 200 512 "https://x/" "curl/8.0" both now parse with user / referer / user_agent in metadata instead of the anonymous-only fallback that was happening before. Added two test_parsers.py cases that pin the new behaviour: one for an authenticated request with non-empty user + referer + user_agent, and one for the anonymous '-' user.
Reviewer's GuideAdjusts Apache combined log parsing so the user field is parsed as a non-empty token instead of whitespace, fixing parsing of real-world logs, and adds tests covering authenticated and anonymous users with referer and user-agent fields. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe Apache parser's ChangesApache Combined Log User Capture Fix
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The COMBINED_PATTERN in ApacheParser used
(?P<user>\s+), which only matches whitespace. Real Apache combined log lines have a non-empty user field: anonymous requests use a single '-' and authenticated requests use the username (frank, alice, etc.). With the previous pattern, every real combined log line failed can_parse() and was silently dropped; only the lines that happened to put multiple spaces in the user position ever matched. The user-agent and referer columns downstream of user are part of the same optional group, so even when a line did accidentally match, the captured user was empty and the optional trailing group never fired. Switched the user group to\S+so anonymous and authenticated requests both parse, and the optional referer/user_agent group now actually matches. Added two test_parsers.py cases that pin the new behaviour: one for an authenticated request with non-empty user + referer + user_agent, and one for the anonymous '-' user.Summary by Sourcery
Update Apache combined log parsing to correctly capture the user field and ensure downstream referer and user-agent metadata are populated.
Bug Fixes:
Tests:
Summary by CodeRabbit
Bug Fixes
Tests