Cloudflare logpush pipeline improvements internal#4
Open
brijesh-elastic wants to merge 17 commits into
Update format_version to 3.3.2 and ECS dependency to git@v9.3.0 in manifest.yml and build.yml. Update ecs.version to 9.3.0 in all 21 data stream ingest pipelines.
Update ECS field definitions by replacing agent.yml with beats.yml and modernizing base-fields.yml across all 21 data streams. Sort all fields.yml entries alphabetically for easier maintenance. Fix swapped field descriptions for firewall_event (origin.ray.id/origin.response.status) and http_request (cache.status/cache.response.status).
Add support for new fields across 9 data streams, each with its corresponding ingest pipeline processors:
- device_posture: RegistrationID
- firewall_event: FraudUserID
- gateway_dns: 12 fields including InternalDNS*, QueryApplication*, RequestContext*
- gateway_http: AppControlInfo, ApplicationStatuses, RedirectTargetURI, RegistrationID
- gateway_network: RegistrationID
- http_request: 11 fields including Fraud*, WebAssets*, WorkerScriptName
- network_analytics: DNSQueryName, DNSQueryType, PFPCustomTag
- network_session: InitialOriginIP, RegistrationID, ResolvedFQDN, SNI
- workers_trace: CPUTimeMs, WallTimeMs
Correct the Painless script to reference ctx.json.Timestamp (PascalCase) instead of ctx.json.timestamp, matching the actual field name from the Cloudflare API and the guard condition.
Fix the grok guard condition that used an incorrect path (ctx.json?.cloudflare_logpush) instead of (ctx.cloudflare_logpush) and a tautological || operator instead of &&. Also correct the remove processor to reference action instead of event_action. Update test data to use a valid disconnect timestamp.
Correct the split processor condition to reference ctx.json.TCPSackBlocks consistently instead of mixing TCPSACKBlocks and TCPSackBlocks casing.
Correct the rename condition to use ctx.json?.Interface (PascalCase) matching the actual Cloudflare API field name instead of lowercase.
Replace rename processors with convert processors (type: string) for fields documented as integers or arrays of integers but mapped as keyword type in fields.yml. Affected fields: gateway_dns (CNAMECategoryIDs, EDEErrors, InitialCategoryIDs, MatchedIndicatorFeedIDs, ResolvedIPCategoryIDs), gateway_http (ApplicationIDs), gateway_network (ApplicationIDs, CategoryIDs).
Change singular header to plural headers for RequestHeaders and ResponseHeaders target fields to match the fields.yml definitions (request.headers and response.headers).
Add IANA keyword representation scripts for dns.response_code and dns.question.type in both dns and dns_firewall data streams. Numeric DNS response codes are now mapped to human-readable names (e.g., 0 -> NoError, 3 -> NXDomain) and query types are mapped to their IANA names (e.g., 1 -> A, 28 -> AAAA, 15 -> MX).
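The mapping described above can be sketched in Python for illustration. This is not the Painless script from the PR: the tables are only a subset of the IANA registries, and the helper name `dns_keyword` and the fallback of returning an unknown code as a string are assumptions.

```python
# Subset of the IANA DNS RCODE and RR TYPE registries (illustrative, not exhaustive).
RCODE_NAMES = {0: "NoError", 1: "FormErr", 2: "ServFail", 3: "NXDomain", 4: "NotImp", 5: "Refused"}
QTYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 6: "SOA", 12: "PTR", 15: "MX", 16: "TXT", 28: "AAAA", 33: "SRV"}


def dns_keyword(value, table):
    """Map a numeric DNS code to its IANA name.

    Assumption: codes missing from the table are returned as their decimal
    string, and non-numeric input is passed through unchanged.
    """
    if value is None:
        return None
    try:
        code = int(value)
    except (TypeError, ValueError):
        return str(value)
    return table.get(code, str(code))


print(dns_keyword(3, RCODE_NAMES))     # NXDomain
print(dns_keyword("28", QTYPE_NAMES))  # AAAA
```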
Remove ignore_failure: true from the first JSON processor in all data stream pipelines that had it. Parsing failures should surface as errors rather than silently producing partial documents.
Replace grok processors with dissect for simple delimiter-based pattern matching in firewall_event, http_request, and spectrum_event data streams. Dissect is more performant than grok for fixed patterns like protocol/version splitting.
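As a rough illustration of why a fixed delimiter split is enough for values such as a protocol/version pair, the Python sketch below performs the same kind of split without any regular expression. The function name `split_protocol`, the output keys, and the sample value are assumptions, not taken from the pipelines.

```python
def split_protocol(value):
    """Split 'NAME/VERSION' on the first '/' without using a regex.

    Like a dissect pattern, this fails when the delimiter or either part is
    missing instead of returning a partial result.
    """
    name, sep, version = value.partition("/")
    if not sep or not name or not version:
        raise ValueError(f"value does not match NAME/VERSION: {value!r}")
    return {"name": name, "version": version}


print(split_protocol("HTTP/1.1"))  # {'name': 'HTTP', 'version': '1.1'}
```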
Replace multiple timestamp normalization scripts (which handled both String and Number types with try/catch) with a single, efficient script that only handles Number type. The new script directly converts timestamps to Unix milliseconds by dividing nanosecond values or multiplying second values. Update test input log files to use numeric timestamps to match the simplified script expectations.
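A minimal Python sketch of this kind of numeric normalization follows. It is not the Painless script from the PR: the magnitude thresholds used to tell seconds, milliseconds, and nanoseconds apart are assumptions, and microsecond inputs are not handled.

```python
def to_unix_millis(ts):
    """Normalize a numeric Unix timestamp to milliseconds.

    Assumed thresholds: values of at least 1e17 are nanoseconds, values below
    1e11 are seconds, and anything in between is already milliseconds.
    """
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        raise TypeError("timestamp must be a number; convert strings beforehand")
    magnitude = abs(ts)
    if magnitude >= 1e17:
        return int(ts // 1_000_000)  # nanoseconds -> milliseconds
    if magnitude < 1e11:
        return int(ts * 1000)        # seconds -> milliseconds
    return int(ts)                   # already milliseconds


print(to_unix_millis(1700000000))           # 1700000000000
print(to_unix_millis(1700000000123456789))  # 1700000000123
```

Rejecting strings here mirrors the description above: string-to-number conversion is expected to happen in a separate step before the script runs.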
Replace rename processors with typed convert processors for fields declared as ip, long, boolean, or double in fields.yml to ensure correct type casting. Add in-place convert processors for timestamp fields to handle string-to-number conversion before the normalization script. Affected data streams: access_request, device_posture, dns, dns_firewall, gateway_dns, gateway_http, gateway_network, http_request, magic_ids, network_session, sinkhole_http, spectrum_event. Also add timestamp converts for all 20 data streams with numeric timestamp handling.
Add a null/empty field removal script at the end of all 21 data stream
pipelines. The script recursively removes fields with null values, empty
strings, empty maps, and empty lists.
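The recursive cleanup can be sketched in Python as follows. This illustrates the idea only and is not the Painless script from the PR. The function name `drop_empty` is hypothetical, and removing containers that become empty only after their children are cleaned is an assumption.

```python
def drop_empty(value):
    """Return a copy of value without null, '', {} and [] entries.

    Returns None when the value itself is empty. Falsy scalars such as 0 and
    False are kept because they carry information.
    """
    if isinstance(value, dict):
        cleaned = {key: drop_empty(item) for key, item in value.items()}
        cleaned = {key: item for key, item in cleaned.items() if item is not None}
        return cleaned or None
    if isinstance(value, list):
        cleaned = [drop_empty(item) for item in value]
        cleaned = [item for item in cleaned if item is not None]
        return cleaned or None
    if value is None or value == "":
        return None
    return value


doc = {"a": None, "b": "", "c": {"d": []}, "e": 0, "f": [None, "x", {}], "g": False}
print(drop_empty(doc))  # {'e': 0, 'f': ['x'], 'g': False}
```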
Standardize all on_failure error.message values to use the full format:
"Processor {type} with tag {tag} in pipeline {pipeline} failed with
message: {message}" for consistent debugging output.
Add a unique tag key to every processor in all 21 ingest pipelines for
easier debugging and tracing of pipeline failures. Tags follow the
pattern: {processor_type}_{field_description}_{hash}.
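To show how the tag pattern and the standardized error message fit together, here is a small Python sketch. The hashing scheme (first 8 hex characters of a SHA-1 digest), the helper names, and the sample pipeline name are assumptions; the source does not say how the hash part of a tag is generated.

```python
import hashlib


def make_tag(processor_type, field_description, seed):
    """Build a tag of the form {processor_type}_{field_description}_{hash}.

    Assumption: the hash is the first 8 hex characters of a SHA-1 digest of
    an arbitrary seed string.
    """
    digest = hashlib.sha1(seed.encode("utf-8")).hexdigest()[:8]
    return f"{processor_type}_{field_description}_{digest}"


def format_failure(processor_type, tag, pipeline, message):
    """Render the standardized on_failure error.message text."""
    return (
        f"Processor {processor_type} with tag {tag} "
        f"in pipeline {pipeline} failed with message: {message}"
    )


tag = make_tag("convert", "client_ip", "dns/convert/ClientIP")
# "example-dns-pipeline" is a placeholder pipeline name.
print(format_failure("convert", tag, "example-dns-pipeline", "not an IP string"))
```

With a unique tag in every message, a failure can be traced to a single processor by searching the pipeline definition for that tag.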