[cloudflare_logpush] ingest pipeline improvements #3

Open
brijesh-elastic wants to merge 14 commits into main from cloudflare_logpush-pipeline-internal

@brijesh-elastic

Owner

Proposed commit message

cloudflare_logpush: ingest pipeline improvements

Enhancements:
- Updated the integration to use ECS version 9.3.0 and format_version 3.3.2.
- Added missing convert processors across all data streams. This ensures fields declared as ip, long,
boolean, or double in fields.yml are correctly typed.
- Added support for new fields across 9 data streams (device_posture, gateway_dns, gateway_http,
gateway_network, http_request, network_analytics, network_session, workers_trace, firewall_event).
- Sorted all fields.yml entries alphabetically for better maintainability.
- Converted grok processors to dissect for improved performance.
- Consolidated multiple timestamp-to-Unix-millis script processors into a single, efficient script.
- Implemented the latest null removal script across all data streams.
- Updated error.message values to use the full, standardized format.
- Added a tag key to every processor for easier debugging.
- Removed ignore_failure: true from the initial JSON processor.
- Implemented dynamic mapping for dns.response_code and dns.question.type to follow IANA keyword
representations in dns and dns_firewall data streams.
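
As a rough illustration of the grok-to-dissect conversion listed above, the fragment below splits a delimiter-based protocol string such as HTTP/1.1. The source field json.ClientRequestProtocol, the target field names, and the tag value are assumptions for illustration and are not copied from this PR.

```yaml
# Hypothetical before/after; field names and tag are assumptions.
# Before: grok with a regex-backed pattern.
# - grok:
#     field: json.ClientRequestProtocol
#     patterns:
#       - '^%{DATA:network.protocol}/%{DATA:http.version}$'
# After: dissect splits on the literal "/" without regex matching.
- dissect:
    field: json.ClientRequestProtocol
    tag: dissect_client_request_protocol
    pattern: '%{network.protocol}/%{http.version}'
    ignore_missing: true
```

Dissect matches on literal delimiters only, so it suits fixed-shape values like this one. Inputs with optional or variable parts would still need grok.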

Bugfixes:
- email_security_alerts - Fixed timestamp normalization by correcting the Painless script to reference
ctx.json.Timestamp (PascalCase) instead of ctx.json.timestamp.
- firewall_event - Resolved swapped field descriptions for origin.ray.id and origin.response.status.
- http_request - Resolved swapped field descriptions for cache.status and cache.response.status.
- http_request - Aligned header naming between the pipeline and fields.yml (changed singular header
to plural headers for RequestHeaders/ResponseHeaders).
- spectrum_event - Fixed a broken grok guard condition that used an incorrect path and a
tautological || operator. Also corrected the remove processor to reference action instead of event_action.
- network_analytics - Fixed case-sensitivity in the split processor condition (TCPSackBlocks).
- audit - Corrected the rename condition to use ctx.json.Interface (PascalCase).
- gateway_dns, gateway_http, gateway_network - Fixed fields documented as integers or arrays of
integers that were passed through unchanged by rename processors. Replaced those rename processors
with convert processors (type: string) so the values match the `type: keyword` definition in fields.yml.
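
The rename-to-convert change in the last item can be sketched as follows. CNAMECategoryIDs is one of the affected gateway_dns fields. The target field name and the tag are assumptions made for this example.

```yaml
# Illustrative only: target field name and tag are assumptions.
# Before: rename moved the value but kept its numeric JSON type.
# - rename:
#     field: json.CNAMECategoryIDs
#     target_field: cloudflare_logpush.gateway_dns.cname_category_ids
#     ignore_missing: true
# After: convert writes a string-typed copy to the target field.
# For an array input, convert is applied to every member.
- convert:
    field: json.CNAMECategoryIDs
    tag: convert_cname_category_ids_to_string
    target_field: cloudflare_logpush.gateway_dns.cname_category_ids
    type: string
    ignore_missing: true
```

Unlike rename, convert with target_field leaves the source field in place, so this sketch assumes the json object is removed later in the pipeline.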

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices.

How to test this PR locally

  • Clone the integrations repo.
  • Install elastic-package locally.
  • Start the Elastic Stack using elastic-package.
  • Move to the integrations/packages/cloudflare_logpush directory.
  • Run the following command to run the tests.

elastic-package test -v

…tions

Sort all fields.yml entries alphabetically across all data streams for better
maintainability. Add support for new fields across device_posture, gateway_dns,
gateway_http, gateway_network, http_request, network_analytics, network_session,
workers_trace, firewall_event, and email_security_alerts data streams. Fix
swapped field descriptions for firewall_event (origin.ray.id/origin.response.status)
and http_request (cache.status/cache.response.status). Align header naming
between pipeline and fields.yml (singular header to plural headers for
RequestHeaders/ResponseHeaders in http_request).

Correct the Painless script to reference ctx.json.Timestamp (PascalCase)
instead of ctx.json.timestamp, matching the actual field name from the
Cloudflare API and the guard condition.

Fix the grok guard condition that used an incorrect path
(ctx.json?.cloudflare_logpush) instead of (ctx.cloudflare_logpush) and
a tautological || operator instead of &&. Also correct the remove
processor to reference action instead of event_action. Update test
data to use a valid disconnect timestamp.

Correct the split processor condition to reference ctx.json.TCPSackBlocks
consistently instead of mixing TCPSACKBlocks and TCPSackBlocks casing.

Correct the rename condition to use ctx.json?.Interface (PascalCase)
matching the actual Cloudflare API field name instead of lowercase.

Replace rename processors with convert processors (type: string) for
fields documented as integers or arrays of integers but mapped as keyword
type in fields.yml. Affected fields: gateway_dns (CNAMECategoryIDs,
EDEErrors, InitialCategoryIDs, MatchedIndicatorFeedIDs,
ResolvedIPCategoryIDs), gateway_http (ApplicationIDs), gateway_network
(ApplicationIDs, CategoryIDs).

Remove ignore_failure: true from the first JSON processor in all data
stream pipelines that had it. Parsing failures should surface as errors
rather than silently producing partial documents.

Replace grok processors with equivalent dissect processors in
firewall_event (protocol parsing), http_request (protocol and TLS
parsing), and spectrum_event (TLS parsing). Dissect is faster than
grok for simple delimiter-based patterns.
removal script, standardize error.message format, and add processor tags

Consolidate multiple timestamp-to-Unix-millis script processors into a
single, efficient script across all data streams. Update test input log
files to include varied timestamp format test cases.

Add missing convert processors to ensure fields declared as ip, long,
boolean, or double in fields.yml are correctly typed at ingest time.
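
A minimal sketch of the kind of convert processors described here, assuming Cloudflare fields named ClientIP and EdgeResponseBytes under json. The target field names and tags are hypothetical.

```yaml
# Illustrative only: field names, targets, and tags are assumptions.
- convert:
    field: json.ClientIP
    tag: convert_client_ip_to_ip
    target_field: cloudflare_logpush.http_request.client.ip
    type: ip
    ignore_missing: true
    # Skip empty strings, which would otherwise fail the ip conversion.
    if: "ctx.json?.ClientIP != ''"
- convert:
    field: json.EdgeResponseBytes
    tag: convert_edge_response_bytes_to_long
    target_field: cloudflare_logpush.http_request.edge.response.bytes
    type: long
    ignore_missing: true
```

Typing the value at ingest time means a malformed input fails in the pipeline, where on_failure can record it, instead of being rejected later when the document is indexed.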

Implement the latest null removal script (handleMap/handleList) across
all data streams for consistent cleanup of null and empty values.
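
For readers unfamiliar with the handleMap/handleList pattern, a sketch along these lines is common in Elastic integration pipelines. The exact script in this PR is not reproduced here, and the tag and description values are assumptions.

```yaml
# Sketch of a recursive null/empty cleanup; not copied from this PR.
- script:
    tag: script_to_drop_null_values
    lang: painless
    description: Recursively remove null and empty values from the document.
    source: |-
      void handleMap(Map map) {
        map.values().removeIf(v -> {
          // Clean nested containers first so emptied ones are dropped too.
          if (v instanceof Map) {
            handleMap(v);
          } else if (v instanceof List) {
            handleList(v);
          }
          return v == null || v == '' || (v instanceof Map && v.size() == 0) || (v instanceof List && v.size() == 0);
        });
      }
      void handleList(List list) {
        list.removeIf(v -> {
          if (v instanceof Map) {
            handleMap(v);
          } else if (v instanceof List) {
            handleList(v);
          }
          return v == null || v == '' || (v instanceof Map && v.size() == 0) || (v instanceof List && v.size() == 0);
        });
      }
      handleMap(ctx);
```

Because each nested container is cleaned before it is tested for emptiness, a map that held only null values is removed in the same pass.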

Update error.message values in on_failure blocks to use the full
standardized format including processor type, tag, pipeline, and message.
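
One way such a standardized on_failure block could look, using the ingest failure metadata fields for processor type, tag, pipeline, and message. The event.kind step is an assumption added for illustration and may not match this PR.

```yaml
# Sketch of a pipeline-level on_failure handler; details are assumptions.
on_failure:
  - append:
      field: error.message
      value: >-
        Processor '{{{ _ingest.on_failure_processor_type }}}'
        {{#_ingest.on_failure_processor_tag}}with tag '{{{ _ingest.on_failure_processor_tag }}}'
        {{/_ingest.on_failure_processor_tag}}in pipeline '{{{ _ingest.pipeline }}}'
        failed with message '{{{ _ingest.on_failure_message }}}'
  - set:
      field: event.kind
      tag: set_pipeline_error_to_event_kind
      value: pipeline_error
```

The section around the tag text renders only when the failing processor has a tag, which is one reason tagging every processor makes these messages more useful.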

Add a tag key to every processor across all data streams for easier
debugging and error tracing.

Align http_request header naming between pipeline and fields.yml
(singular header to plural headers for RequestHeaders/ResponseHeaders).
dns_firewall data streams

Add dynamic mapping for dns.response_code and dns.question.type to
follow IANA keyword representations in dns and dns_firewall data streams.
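
The lookup from numeric codes to IANA names might be sketched as below. Whether this PR uses a params table like this is an assumption, as are the source field json.ResponseCode and the tag. The table is intentionally partial.

```yaml
# Hypothetical sketch: source field, tag, and table shape are assumptions.
- script:
    tag: script_map_dns_response_code
    lang: painless
    description: Map a numeric DNS response code to its IANA name.
    if: ctx.json?.ResponseCode != null
    params:
      rcodes:
        '0': NOERROR
        '1': FORMERR
        '2': SERVFAIL
        '3': NXDOMAIN
        '4': NOTIMP
        '5': REFUSED
    source: |-
      // Look up by string key so integer and string inputs both match.
      def name = params.rcodes.get(ctx.json.ResponseCode.toString());
      if (name != null) {
        if (ctx.dns == null) {
          ctx.dns = new HashMap();
        }
        ctx.dns.response_code = name;
      }
```

Codes missing from the table leave dns.response_code unset in this sketch. A real pipeline would need to decide whether to keep the raw number in that case.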

Also applies the same pipeline improvements as the previous commit
(timestamp consolidation, convert processors, null removal script,
standardized error.message, and processor tags) to these two data streams.
improvements for network_analytics

Add ignore_failure: true to the community_id processor in the
network_analytics pipeline to resolve build failures.
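
In pipeline terms the change amounts to something like the following. The tag value is an assumption, and the processor is shown with its default source and destination fields.

```yaml
# Illustrative only: the tag value is an assumption.
- community_id:
    tag: community_id_network_analytics
    # Do not fail the document when the flow hash cannot be computed.
    ignore_failure: true
```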

Also applies the same pipeline improvements (timestamp consolidation,
convert processors, null removal script, standardized error.message,
and processor tags) to this data stream.
@brijesh-elastic self-assigned this on Apr 9, 2026