Skip to content

[ML] Truncate oversized field values to prevent autodetect process crash#2929

Open
valeriy42 wants to merge 11 commits intoelastic:mainfrom
valeriy42:fix/is-2796
Open

[ML] Truncate oversized field values to prevent autodetect process crash#2929
valeriy42 wants to merge 11 commits intoelastic:mainfrom
valeriy42:fix/is-2796

Conversation

@valeriy42
Copy link
Contributor

@valeriy42 valeriy42 commented Feb 26, 2026

The autodetect process could crash due to excessive memory usage when processing very large field values (by, over, partition, influencer fields). Issue #2796 reported crashes with influencer strings exceeding 77K characters. These large strings get copied into multiple data structures during processing and state persistence, leading to memory amplification that can trigger the OOM killer.

This PR implements comprehensive field value truncation at 256 characters to prevent memory exhaustion. The fix includes:

  1. a new CFieldValueTruncator domain class that centralizes truncation logic,
  2. input-path truncation in CAnomalyJob::addRecord() with warning logs,
  3. state restore-path truncation in four critical locations to handle existing persisted large values, and
  4. debug log truncation to prevent oversized log output.

The 256-character limit aligns with Elasticsearch's ignore_above default for keyword fields.

Fixes #2796

Depends on elastic/kibana#258542

@valeriy42 valeriy42 marked this pull request as draft February 26, 2026 18:57
@prodsecmachine
Copy link

prodsecmachine commented Feb 26, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@valeriy42 valeriy42 added ci:run-qa-tests Run a subset of the QA tests auto-backport Automatically merge backport PRs when CI passes v9.3.3 v9.2.8 v8.19.14 and removed v9.2.7 v9.3.2 v8.19.13 labels Mar 19, 2026
@valeriy42 valeriy42 marked this pull request as ready for review March 19, 2026 09:30

std::string escapedFieldName = fieldName;
core::CStringUtils::escape('\\', "\n\r\t", escapedFieldName);
LOG_WARN(<< "Field '" << escapedFieldName
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using LOG_WARNING here will automatically throttle log messages; LOG_INFO would not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think warning logs are actionable: they should trigger investigation.

This, however, is just expected behavior, so I don't think logging a warning is appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I ended up creating a new LOG_INFO_THROTTLED macro so this is an INFO message and doesn't flood the log file.

@valeriy42 valeriy42 requested a review from jan-elastic March 19, 2026 11:19
@valeriy42 valeriy42 changed the title [ML] Fix autodetect process crash from oversized field values by truncating at 256 characters [ML] Truncate oversized field values to prevent autodetect process crash Mar 19, 2026
Copy link
Contributor

@jan-elastic jan-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a dozen small comments.

//! \return Copy with length constraint enforced
static std::string truncated(const std::string& value) {
if (needsTruncation(value) == false) {
return value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copying the string value, right? (because the return type is string)
Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It creates a copy because truncation is sometimes used at places where the caller does not own the input. Hence, having only in-place truncation is insufficient.


std::string escapedFieldName = fieldName;
core::CStringUtils::escape('\\', "\n\r\t", escapedFieldName);
LOG_WARN(<< "Field '" << escapedFieldName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think warning logs are actionable: they should trigger investigation.

This, however, is just expected behavior, so I don't think logging a warning is appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically merge backport PRs when CI passes >bug ci:run-qa-tests Run a subset of the QA tests :ml v8.19.14 v9.2.8 v9.3.3 v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ML] Influencer string with very large size may cause the autodetect process to crash

3 participants