Add client-side regex filtering for all adapters#242
Open
maximelb wants to merge 8 commits into
Implement per-adapter configurable filtering that allows users to specify
regex patterns to filter out events/logs before they're sent to the cloud.
This reduces cloud ingestion costs and network traffic for unwanted data
like health checks, debug logs, and monitoring probes.
Architecture:
- FilterEngine: Compiles regex patterns once, tracks statistics, logs stats
  every 5 minutes via background goroutine
- FilteredClient: Transparent wrapper implementing Shipper interface that
  intercepts Ship() calls to filter messages before transmission
- Per-adapter configuration via "filters" field (optional string array)
Changes:
- Add utils/filter.go: Core filtering engine with regex matching and stats
- Add utils/filtered_client.go: Shipper interface and filtered client wrapper
- Add utils/filter_test.go: Comprehensive unit tests (all passing)
- Update all 40 adapter configs to add Filters []string field
- Update all 39 adapter constructors to wrap client when filters configured
- Change adapter client field type from *uspclient.Client to utils.Shipper
Features:
- Regex-based pattern matching with full regex support
- Matches against TextPayload and JsonPayload (marshaled to JSON)
- Thread-safe statistics tracking with atomic operations
- Background stats reporter logs every 5 minutes
- Debug logging for each filtered item
- Graceful shutdown with final stats report
- Zero overhead when no filters configured
Example configuration:
```yaml
file_adapter:
  file_path: "/var/log/app.log"
  filters:
    - "health-?check"
    - "(?i)monitoring-bot"
    - "\\b(DEBUG|TRACE)\\b"
```
Testing:
- All unit tests pass
- No breaking changes to existing functionality
- Build successful across all adapters
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
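The wrapper architecture described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in utils/filter.go or utils/filtered_client.go: the `Message` type, the `Ship` signature, the `Wrap` helper, and the `recorder` test double are all assumptions made for the example, and statistics reporting is reduced to two atomic counters.

```go
package main

import (
	"fmt"
	"regexp"
	"sync/atomic"
)

// Message is a hypothetical stand-in for the real message type.
type Message struct {
	TextPayload string
}

// Shipper is satisfied by both the real client and the wrapper.
// The exact method signature is an assumption.
type Shipper interface {
	Ship(msg *Message) error
}

// FilterEngine holds patterns that are compiled once, at construction time.
type FilterEngine struct {
	patterns []*regexp.Regexp
	filtered atomic.Uint64 // messages dropped
	passed   atomic.Uint64 // messages forwarded
}

// NewFilterEngine compiles every expression and fails on the first bad one.
func NewFilterEngine(exprs []string) (*FilterEngine, error) {
	e := &FilterEngine{}
	for _, expr := range exprs {
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, fmt.Errorf("invalid filter %q: %w", expr, err)
		}
		e.patterns = append(e.patterns, re)
	}
	return e, nil
}

// ShouldFilter reports whether any pattern matches the text payload.
func (e *FilterEngine) ShouldFilter(msg *Message) bool {
	for _, re := range e.patterns {
		if re.MatchString(msg.TextPayload) {
			e.filtered.Add(1)
			return true
		}
	}
	e.passed.Add(1)
	return false
}

// FilteredClient drops matching messages and forwards the rest.
type FilteredClient struct {
	inner  Shipper
	engine *FilterEngine
}

func (c *FilteredClient) Ship(msg *Message) error {
	if c.engine.ShouldFilter(msg) {
		return nil // silently dropped before transmission
	}
	return c.inner.Ship(msg)
}

// Wrap returns inner unchanged when no filters are configured, so the
// unfiltered path pays no extra cost.
func Wrap(inner Shipper, exprs []string) (Shipper, error) {
	if len(exprs) == 0 {
		return inner, nil
	}
	engine, err := NewFilterEngine(exprs)
	if err != nil {
		return nil, err
	}
	return &FilteredClient{inner: inner, engine: engine}, nil
}

// recorder is a test double that remembers what was shipped.
type recorder struct{ shipped []string }

func (r *recorder) Ship(msg *Message) error {
	r.shipped = append(r.shipped, msg.TextPayload)
	return nil
}

func main() {
	rec := &recorder{}
	client, err := Wrap(rec, []string{"health-?check", "(?i)monitoring-bot", `\b(DEBUG|TRACE)\b`})
	if err != nil {
		panic(err)
	}
	for _, line := range []string{"GET /healthcheck 200", "DEBUG cache miss", "user login ok"} {
		_ = client.Ship(&Message{TextPayload: line})
	}
	fmt.Println(rec.shipped)
}
```

Because adapters hold a `Shipper` rather than a concrete client, the same adapter code works whether or not the wrapper is present.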
This commit adds enhanced client-side filtering capabilities with type-safe configuration and support for efficient JSON field filtering using GJSON.

## Features

**Type-Safe Filter Configuration**
- Structured FilterPattern type with explicit type discriminator
- Support two pattern types: "regex" and "gjson"
- Built-in validation for all pattern fields
- Clear, self-documenting configuration syntax

**GJSON-Based JSON Field Filtering**
- Integrate gjson library (v1.18.0) for JSON path queries
- Support nested field access: user.profile.role
- Support array indexing: items.0.id
- Support conditional queries: users.#(age>45).email
- See https://github.com/tidwall/gjson/blob/master/SYNTAX.md for syntax

**Per-Pattern Statistics**
- Track filtering matches for each pattern individually
- Automatic statistics reporting every 5 minutes
- Track JSON marshal failures separately

**Thread Safety**
- Atomic operations for all counters
- RWMutex protection for shared state
- Idempotent Close() using sync.Once
- Clean shutdown of background goroutines

## Configuration Examples

Simple field filtering:
```yaml
filters:
  - type: gjson
    path: "level"
    pattern: "^(DEBUG|TRACE)$"
```

Multiple patterns:
```yaml
filters:
  - type: gjson
    path: "level"
    pattern: "^DEBUG$"
  - type: gjson
    path: "user.email"
    pattern: ".*@test\\.example\\.com$"
  - type: regex
    pattern: "health-?check"
```

Advanced GJSON queries:
```yaml
filters:
  - type: gjson
    path: "users.#(age>45).email"
    pattern: ".*@Company\\.com"
  - type: gjson
    path: "events.0.severity"
    pattern: "^(high|critical)$"
```

## Performance

Benchmarks show GJSON filtering at 9.5µs per operation with <1% overhead vs network latency in real-world usage. Usability benefits justify minimal performance cost:
- Clearer intent and safer than JSON regex patterns
- Advanced queries impossible with regex alone
- Natural syntax for common filtering use cases

See utils/FILTERING_BENCHMARKS.md for detailed analysis.

## Testing

- 40+ tests covering all functionality
- Race detector clean (no data races)
- Concurrency test: 50 goroutines × 100 messages
- Comprehensive benchmark suite

## Files Modified

- utils/filter.go: Complete rewrite with gjson support
- utils/filtered_client.go: Updated function signature
- utils/filter_test.go: Comprehensive test suite (678 lines)
- utils/FILTERING_BENCHMARKS.md: Performance analysis
- utils/IMPLEMENTATION_SUMMARY.md: Implementation guide
- go.mod, go.sum: Added gjson v1.18.0 dependency
- All 40 adapters: Updated to use FilterPattern type

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
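To make the "explicit type discriminator" and "built-in validation" points concrete, here is one way such a pattern type could look. The field names, struct tags, and validation rules are assumptions for illustration; the commit message does not spell out what the real `FilterPattern` in utils/filter.go checks. The sketch uses only the standard library, so it validates the shape of a gjson pattern without evaluating the path.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// PatternType is the discriminator that selects how a pattern is applied.
type PatternType string

const (
	PatternRegex PatternType = "regex" // match against the whole payload
	PatternGJSON PatternType = "gjson" // match against one extracted JSON field
)

// FilterPattern mirrors the YAML shape in the examples above.
// Field names and tags are assumed, not taken from the PR.
type FilterPattern struct {
	Type    PatternType `yaml:"type"`
	Path    string      `yaml:"path,omitempty"`
	Pattern string      `yaml:"pattern"`
}

// Validate rejects patterns that could not be applied at runtime.
func (p FilterPattern) Validate() error {
	if p.Pattern == "" {
		return errors.New("pattern must not be empty")
	}
	if _, err := regexp.Compile(p.Pattern); err != nil {
		return fmt.Errorf("pattern %q does not compile: %w", p.Pattern, err)
	}
	switch p.Type {
	case PatternRegex:
		return nil
	case PatternGJSON:
		if p.Path == "" {
			return errors.New(`type "gjson" requires a path`)
		}
		return nil
	default:
		return fmt.Errorf("unknown pattern type %q", p.Type)
	}
}

func main() {
	patterns := []FilterPattern{
		{Type: PatternGJSON, Path: "level", Pattern: "^(DEBUG|TRACE)$"},
		{Type: PatternRegex, Pattern: "health-?check"},
		{Type: PatternGJSON, Pattern: "^DEBUG$"}, // rejected: no path
	}
	for _, p := range patterns {
		fmt.Printf("%+v -> %v\n", p, p.Validate())
	}
}
```

Validating at configuration load time means a bad pattern fails adapter startup instead of surfacing as silently unfiltered traffic later.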
tomaz-lc
reviewed
Nov 11, 2025
tomaz-lc
reviewed
Nov 11, 2025
tomaz-lc
reviewed
Nov 11, 2025
…entation

Eliminates per-message logging that was creating excessive log volume in production. Removed logging on every filtered message (21-53% performance overhead) while preserving initialization and periodic statistics logging.

Also removes markdown documentation files that are not needed in the repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Resolved conflicts in 40 adapter client.go files where the client-side filtering branch's temporary `client` variable collided with master's context parameter addition. Updated all uspclient.NewClient calls to use the ctx parameter while preserving the filtering wrapper pattern. Also updated dependencies to latest versions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

# Conflicts:
#   1password/client.go
#   azure_event_hub/client.go
#   bigquery/client.go
#   bitwarden/client.go
#   box/client.go
#   cato/client.go
#   cylance/client.go
#   defender/client.go
#   duo/client.go
#   entraid/client.go
#   evtx/client.go
#   falconcloud/client.go
#   file/client.go
#   gcs/client.go
#   go.mod
#   go.sum
#   hubspot/client.go
#   imap/client.go
#   itglue/client.go
#   k8s_pods/client.go
#   mac_unified_logging/client.go
#   mimecast/client.go
#   ms_graph/client.go
#   o365/client.go
#   okta/client.go
#   pandadoc/client.go
#   proofpoint_tap/client.go
#   pubsub/client.go
#   s3/client.go
#   sentinelone/s1.go
#   simulator/client.go
#   slack/client.go
#   sophos/client.go
#   sqs-files/client.go
#   sqs/client.go
#   stdin/client.go
#   sublime/client.go
#   syslog/client.go
#   trendmicro/client.go
#   wel/client.go
#   wiz/client.go
#   zendesk/client.go
This commit:
1. Merges the latest changes from master, resolving conflicts in:
   - go.mod and go.sum (updated to newer dependency versions)
   - mac_unified_logging/conf.go and wel/conf.go (combined fmt and utils imports)
2. Adds filter mode support to allow users to choose between:
   - "exclude" (default): Messages matching any pattern are filtered out
   - "include": Only messages matching at least one pattern are allowed through
The filter mode is configured via a new `filter_mode` field in each adapter's
configuration. This provides flexibility for users who want to either:
- Filter OUT specific unwanted data (exclude mode - original behavior)
- Keep ONLY specific data they care about (include mode - new)
Example configuration for include mode:
```yaml
filters:
  - type: gjson
    path: "level"
    pattern: "^(INFO|ERROR|WARN)$"
filter_mode: include  # Only keep INFO, ERROR, and WARN logs
```
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a message had a JsonPayload that failed to marshal, and both gjson and regex patterns were configured, the marshalFailures counter was incremented twice:
1. Once in matchesAnyPattern() when trying gjson patterns
2. Once in extractPayload() when trying regex patterns

This fix introduces extractPayloadWithCache() which reuses the already marshaled JSON string (or skips re-marshaling if it already failed), preventing the double-count.

Also added a regression test to verify marshal failures are only counted once.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
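The idea behind the fix, marshaling at most once per message and remembering a failure as well as a success, can be illustrated with a small per-message cache. The `payloadCache` type and `jsonPayload` method below are hypothetical and do not reproduce extractPayloadWithCache(); they only demonstrate why the counter can no longer advance twice for one message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync/atomic"
)

// Message is a hypothetical stand-in for the real message type.
type Message struct {
	JsonPayload map[string]any
}

// payloadCache remembers the outcome of the single marshal attempt
// made for one message, whether it succeeded or failed.
type payloadCache struct {
	attempted bool
	ok        bool
	json      string
}

type engine struct {
	marshalFailures atomic.Uint64
}

// jsonPayload marshals the payload on first use and replays the cached
// result afterwards, so a failure is counted exactly once per message.
func (e *engine) jsonPayload(msg *Message, c *payloadCache) (string, bool) {
	if !c.attempted {
		c.attempted = true
		b, err := json.Marshal(msg.JsonPayload)
		if err != nil {
			e.marshalFailures.Add(1)
		} else {
			c.json, c.ok = string(b), true
		}
	}
	return c.json, c.ok
}

func main() {
	e := &engine{}
	// A channel value cannot be marshaled, so this payload always fails.
	msg := &Message{JsonPayload: map[string]any{"bad": make(chan int)}}
	cache := &payloadCache{}

	e.jsonPayload(msg, cache) // first consumer, e.g. gjson patterns
	e.jsonPayload(msg, cache) // second consumer, e.g. regex patterns

	fmt.Println("marshal failures:", e.marshalFailures.Load())
}
```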
tomaz-lc
reviewed
Jan 26, 2026
tomaz-lc
approved these changes
Jan 26, 2026
Summary
Adds client-side filtering with type-safe configuration and GJSON support for efficient JSON field filtering. Users can specify patterns to filter out events/logs before they're sent to the cloud, reducing ingestion costs and network traffic for unwanted data like health checks, debug logs, and test users.
New in this version: Added filter mode support (`exclude`/`include`) allowing users to either filter OUT matching messages or keep ONLY matching messages.
Architecture
Core Components
FilterEngine (`utils/filter.go`)
FilteredClient (`utils/filtered_client.go`)
- Implements the `Shipper` interface for transparent wrapping
- Intercepts `Ship()` calls to filter before transmission
Configuration
- Adds `Filters []utils.FilterPattern` field to all 40 adapter configs
- Adds `FilterMode utils.FilterMode` field to all 40 adapter configs
Filter Modes
Exclude Mode (Default)
Messages matching any pattern are filtered OUT. Non-matching messages pass through.
Include Mode
Only messages matching at least one pattern are allowed through. Non-matching messages are filtered OUT.
Changes
- `utils/filter.go` - Added FilterMode support with exclude/include semantics
- `utils/filtered_client.go` - Updated to pass FilterMode to FilterEngine
- `utils/filter_test.go` - Comprehensive test suite (800+ lines, 50+ tests)
- `utils/FILTERING_BENCHMARKS.md` - Performance analysis and recommendations
- `utils/IMPLEMENTATION_SUMMARY.md` - Complete implementation guide
- `Filters` and `FilterMode` fields
Features
Filter Mode Support
Type-Safe Configuration
GJSON-Based JSON Filtering
- Nested field access: `user.profile.role`
- Array indexing: `items.0.id`
- Conditional queries: `users.#(age>45).email`
Regex Filtering
Observability
Thread Safety
Example Configurations
Exclude Mode - Filter OUT Unwanted Data
Include Mode - Keep ONLY Specific Data
Advanced GJSON Queries
Example Output
Performance
Benchmarks (see `utils/FILTERING_BENCHMARKS.md` for details):
Testing
Impact
Documentation
- `utils/filter.go` - Extensive godoc with FilterMode and GJSON syntax examples
- `utils/FILTERING_BENCHMARKS.md` - Performance analysis and recommendations
- `utils/IMPLEMENTATION_SUMMARY.md` - Complete implementation guide with examples

🤖 Generated with Claude Code