Cloudflare logpush pipeline improvements internal #4

Open
brijesh-elastic wants to merge 17 commits into main from
cloudflare_logpush-pipeline-improvements-internal

Proposed commit message

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

Update format_version to 3.3.2 and ECS dependency to git@v9.3.0 in
manifest.yml and build.yml. Update ecs.version to 9.3.0 in all 21
data stream ingest pipelines.

Update ECS field definitions by replacing agent.yml with beats.yml and
modernizing base-fields.yml across all 21 data streams.

Sort all fields.yml entries alphabetically for better maintainability.
Fix swapped field descriptions for firewall_event
(origin.ray.id/origin.response.status) and http_request
(cache.status/cache.response.status).

Add support for new fields across 9 data streams, with their
corresponding ingest pipeline processors:

  • device_posture: RegistrationID
  • firewall_event: FraudUserID
  • gateway_dns: 12 fields, including InternalDNS*, QueryApplication*, RequestContext*
  • gateway_http: AppControlInfo, ApplicationStatuses, RedirectTargetURI, RegistrationID
  • gateway_network: RegistrationID
  • http_request: 11 fields, including Fraud*, WebAssets*, WorkerScriptName
  • network_analytics: DNSQueryName, DNSQueryType, PFPCustomTag
  • network_session: InitialOriginIP, RegistrationID, ResolvedFQDN, SNI
  • workers_trace: CPUTimeMs, WallTimeMs

Correct the Painless script to reference ctx.json.Timestamp (PascalCase)
instead of ctx.json.timestamp, matching both the actual field name from
the Cloudflare API and the guard condition.

Fix the grok guard condition, which used an incorrect path
(ctx.json?.cloudflare_logpush instead of ctx.cloudflare_logpush) and a
tautological || operator where && was intended. Also correct the remove
processor to reference action instead of event_action, and update the
test data to use a valid disconnect timestamp.
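The effect of the || versus && mix-up can be illustrated with a short Python sketch. It assumes, hypothetically, that the guard compared one field against both null and the empty string; the actual condition is not quoted in this PR, and the function names are illustrative only.

```python
def guard_or(value):
    # Buggy shape: no value equals both None and "", so one side is always
    # true and the guard never filters anything out.
    return value is not None or value != ""


def guard_and(value):
    # Intended shape: pass only values that are both non-null and non-empty.
    return value is not None and value != ""


for sample in (None, "", "disconnect"):
    print(repr(sample), guard_or(sample), guard_and(sample))
```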

Correct the split processor condition to reference ctx.json.TCPSackBlocks
consistently instead of mixing the TCPSACKBlocks and TCPSackBlocks
casings.

Correct the rename condition to use ctx.json?.Interface (PascalCase),
matching the actual Cloudflare API field name, instead of the lowercase
form.

Replace rename processors with convert processors (type: string) for
fields that are documented as integers or arrays of integers but mapped
as keyword in fields.yml. Affected fields: gateway_dns (CNAMECategoryIDs,
EDEErrors, InitialCategoryIDs, MatchedIndicatorFeedIDs,
ResolvedIPCategoryIDs), gateway_http (ApplicationIDs), gateway_network
(ApplicationIDs, CategoryIDs).
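As a rough model of what a convert processor with type: string does to these fields, the Python sketch below casts scalars and list members to strings so that they fit a keyword mapping. The helper name and the sample values are hypothetical; the real work is done by the Elasticsearch processor, not by this code.

```python
def convert_to_string(value):
    """Mimic convert (type: string): lists are converted member by member."""
    if value is None:
        return None
    if isinstance(value, list):
        return [str(item) for item in value]
    return str(value)


# Hypothetical sample values for two of the affected fields.
doc = {"CategoryIDs": [7, 21], "EDEErrors": 15}
converted = {key: convert_to_string(val) for key, val in doc.items()}
print(converted)
```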

Change the singular header to the plural headers in the target fields
for RequestHeaders and ResponseHeaders so that they match the fields.yml
definitions (request.headers and response.headers).

Add IANA keyword representation scripts for dns.response_code and
dns.question.type in both the dns and dns_firewall data streams. Numeric
DNS response codes are now mapped to human-readable names (e.g., 0 ->
NoError, 3 -> NXDomain), and query types are mapped to their IANA names
(e.g., 1 -> A, 28 -> AAAA, 15 -> MX).
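The mapping can be sketched in Python as two lookup tables. The tables below are small subsets of the IANA registries, and falling back to the number as a string for unknown codes is an assumption; the PR does not say how the Painless scripts treat unmapped values.

```python
# Subsets of the IANA DNS RCODE and resource record TYPE registries.
DNS_RESPONSE_CODES = {
    0: "NoError", 1: "FormErr", 2: "ServFail",
    3: "NXDomain", 4: "NotImp", 5: "Refused",
}
DNS_QUERY_TYPES = {
    1: "A", 2: "NS", 5: "CNAME", 6: "SOA", 12: "PTR",
    15: "MX", 16: "TXT", 28: "AAAA", 33: "SRV",
}


def to_keyword(code, table):
    """Return the IANA name for a numeric code, or the code as a string."""
    # Assumed fallback: unknown codes are kept as their numeric string.
    return table.get(int(code), str(code))


print(to_keyword(3, DNS_RESPONSE_CODES), to_keyword(28, DNS_QUERY_TYPES))
```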

Remove ignore_failure: true from the first JSON processor in every data
stream pipeline that had it. Parsing failures should surface as errors
rather than silently producing partial documents.

Replace grok processors with dissect for simple delimiter-based pattern
matching in the firewall_event, http_request, and spectrum_event data
streams. Dissect is more performant than grok for fixed patterns such as
protocol/version splitting.
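The difference between the two approaches can be shown with a Python analogy: a regex match stands in for grok, and a single split on a literal delimiter stands in for dissect. The input format ("HTTP/1.1") and the key names are assumptions for illustration, not the patterns used in the pipelines.

```python
import re

# Grok-like approach: a compiled regular expression with named groups.
GROK_LIKE = re.compile(r"^(?P<protocol>[^/]+)/(?P<version>.+)$")


def split_with_regex(value):
    match = GROK_LIKE.match(value)
    return match.groupdict() if match else None


def split_with_delimiter(value):
    # Dissect-like approach: split once on the literal "/" with no regex
    # engine involved, which is enough for a fixed layout.
    protocol, sep, version = value.partition("/")
    if not (sep and protocol and version):
        return None
    return {"protocol": protocol, "version": version}


print(split_with_regex("HTTP/1.1"), split_with_delimiter("HTTP/1.1"))
```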
Replace the multiple timestamp normalization scripts (which handled both
String and Number types with try/catch) with a single script that
handles only the Number type. The new script converts timestamps
directly to Unix milliseconds by dividing nanosecond values or
multiplying second values.

Update test input log files to use numeric timestamps to match the
simplified script expectations.
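A minimal Python sketch of this kind of Number-only normalization follows. The magnitude thresholds used to tell seconds, milliseconds, and nanoseconds apart are assumptions made for the example; the PR does not state how the Painless script detects the unit.

```python
def to_unix_millis(ts):
    """Normalize a numeric epoch timestamp to Unix milliseconds."""
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        raise TypeError("only numeric timestamps are supported")
    if ts >= 10**17:
        # Assumed to be nanoseconds: divide down to milliseconds.
        return int(ts // 10**6)
    if ts < 10**11:
        # Assumed to be seconds: multiply up to milliseconds.
        return int(ts * 1000)
    # Otherwise assumed to be milliseconds already.
    return int(ts)


print(to_unix_millis(1700000000))
print(to_unix_millis(1700000000123456789))
```

String inputs are rejected here on purpose, mirroring the move of string-to-number conversion out of the script.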
Replace rename processors with typed convert processors for fields
declared as ip, long, boolean, or double in fields.yml to ensure
correct type casting. Add in-place convert processors for timestamp
fields to handle string-to-number conversion before the normalization
script.

Affected data streams: access_request, device_posture, dns, dns_firewall,
gateway_dns, gateway_http, gateway_network, http_request, magic_ids,
network_session, sinkhole_http, spectrum_event. Also add timestamp
convert processors to all 20 data streams that handle numeric
timestamps.

Add a null/empty field removal script at the end of all 21 data stream
pipelines. The script recursively removes fields with null values, empty
strings, empty maps, and empty lists.
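The recursive cleanup can be sketched in Python as below. This is an illustration of the described behavior, not the Painless script from the pipelines; it cleans children first, so a map that becomes empty after cleaning is removed as well, which is an assumption about the intended semantics.

```python
def is_empty(value):
    # 0 and False are real values and are deliberately kept.
    return value is None or value == "" or value == {} or value == []


def drop_empty(node):
    """Recursively remove nulls, empty strings, empty maps and empty lists."""
    if isinstance(node, dict):
        cleaned = {key: drop_empty(val) for key, val in node.items()}
        return {key: val for key, val in cleaned.items() if not is_empty(val)}
    if isinstance(node, list):
        cleaned = [drop_empty(item) for item in node]
        return [item for item in cleaned if not is_empty(item)]
    return node


sample = {"a": None, "b": {"c": "", "d": []}, "e": [None, {"f": {}}, 0], "g": "x"}
print(drop_empty(sample))
```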

Standardize all on_failure error.message values to use the full format:
"Processor {type} with tag {tag} in pipeline {pipeline} failed with
message: {message}" for consistent debugging output.
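For reference, the standardized message can be modeled as a plain format string. In a real pipeline the four values would normally come from ingest metadata such as _ingest.on_failure_processor_type and _ingest.on_failure_message; the function name and the sample arguments below are hypothetical.

```python
def failure_message(processor_type, tag, pipeline, message):
    """Render the standardized on_failure error.message text."""
    return (
        f"Processor {processor_type} with tag {tag} "
        f"in pipeline {pipeline} failed with message: {message}"
    )


# Hypothetical values, for illustration only.
text = failure_message("convert", "convert_example_tag", "example-pipeline", "bad value")
print(text)
```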
Add a unique tag key to every processor in all 21 ingest pipelines for
easier debugging and tracing of pipeline failures. Tags follow the
pattern: {processor_type}_{field_description}_{hash}.
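One way such tags could be generated is sketched below. How the hash is actually derived is not stated in this PR; the sketch assumes the first 8 hex characters of a SHA-1 digest over a textual processor definition, and the function name and inputs are hypothetical.

```python
import hashlib


def make_tag(processor_type, field_description, definition):
    """Build a tag shaped like {processor_type}_{field_description}_{hash}."""
    # Assumption: a short, deterministic hash of the processor definition.
    digest = hashlib.sha1(definition.encode("utf-8")).hexdigest()[:8]
    return f"{processor_type}_{field_description}_{digest}"


tag = make_tag("convert", "source_ip", "convert json.SourceIP to ip")
print(tag)
```

A deterministic hash keeps tags stable across rebuilds while still distinguishing two processors of the same type that act on the same field.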