[ML] Truncate oversized field values to prevent autodetect process crash#2929
[ML] Truncate oversized field values to prevent autodetect process crash#2929valeriy42 wants to merge 11 commits intoelastic:mainfrom
Conversation
…g at 256 characters
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
lib/api/CAnomalyJob.cc
Outdated
|
|
||
| std::string escapedFieldName = fieldName; | ||
| core::CStringUtils::escape('\\', "\n\r\t", escapedFieldName); | ||
| LOG_WARN(<< "Field '" << escapedFieldName |
There was a problem hiding this comment.
Using LOG_WARNING here will automatically throttle log messages; LOG_INFO would not.
There was a problem hiding this comment.
I think warning logs are actionable: they should trigger investigation.
This, however, is just expected behavior, so I don't think logging a warning is appropriate.
There was a problem hiding this comment.
Yeah, I ended up creating a new LOG_INFO_THROTTLED macro so this is an INFO message and doesn't flood the log file.
jan-elastic
left a comment
There was a problem hiding this comment.
LGTM. Just a dozen small comments.
| //! \return Copy with length constraint enforced | ||
| static std::string truncated(const std::string& value) { | ||
| if (needsTruncation(value) == false) { | ||
| return value; |
There was a problem hiding this comment.
This is copying the string value, right? (because the return type is string)
Is that intentional?
There was a problem hiding this comment.
Yes. It creates a copy because truncation is sometimes used at places where the caller does not own the input. Hence, having only in-place truncation is insufficient.
lib/api/CAnomalyJob.cc
Outdated
|
|
||
| std::string escapedFieldName = fieldName; | ||
| core::CStringUtils::escape('\\', "\n\r\t", escapedFieldName); | ||
| LOG_WARN(<< "Field '" << escapedFieldName |
There was a problem hiding this comment.
I think warning logs are actionable: they should trigger investigation.
This, however, is just expected behavior, so I don't think logging a warning is appropriate.
The autodetect process could crash due to excessive memory usage when processing very large field values (by, over, partition, influencer fields). Issue #2796 reported crashes with influencer strings exceeding 77K characters. These large strings get copied into multiple data structures during processing and state persistence, leading to memory amplification that can trigger the OOM killer.
This PR implements comprehensive field value truncation at 256 characters to prevent memory exhaustion. The fix includes:
CFieldValueTruncatordomain class that centralizes truncation logic,CAnomalyJob::addRecord()with warning logs,The 256-character limit aligns with Elasticsearch's
ignore_abovedefault for keyword fields.Fixes #2796
Depends on elastic/kibana#258542