From 0400ffc5ffd12373116420af6416849fe9e04227 Mon Sep 17 00:00:00 2001 From: Chaosheng Wang <163980877+adminwcs@users.noreply.github.com> Date: Fri, 22 May 2026 10:12:29 +0800 Subject: [PATCH 1/3] Create Multi_line_Log_Merging.md --- .../solutions/ait/Multi_line_Log_Merging.md | 465 ++++++++++++++++++ 1 file changed, 465 insertions(+) create mode 100644 docs/en/solutions/ait/Multi_line_Log_Merging.md diff --git a/docs/en/solutions/ait/Multi_line_Log_Merging.md b/docs/en/solutions/ait/Multi_line_Log_Merging.md new file mode 100644 index 000000000..23a06370c --- /dev/null +++ b/docs/en/solutions/ait/Multi_line_Log_Merging.md @@ -0,0 +1,465 @@ +--- +products: + - Alauda Container Platform +kind: + - Solution +--- + +# nevermore Multi-line Log Merging Production Change Plan + +## 1. Background + +After the `nevermore` log collection component is deployed, it uses the `nevermore-config` ConfigMap in the `cpaas-system` namespace as the Filebeat configuration source by default. + +This ConfigMap contains multiple types of log collection configurations, for example: + +```yaml +filebeat-log-containers.yml +filebeat-log-file.yml +filebeat-log.yml +filebeat-audit.yml +filebeat-event.yml +filebeat-log-system.yml +filebeat-log-systemd.yml +``` + +Business container standard output logs are mainly controlled by: + +```yaml +filebeat-log-containers.yml +``` + +If business logs are collected from mounted files, also pay attention to: + +```yaml +filebeat-log-file.yml +``` + +This plan is based on modifying the existing `nevermore-config`. It does not require creating a new ConfigMap or changing the DaemonSet volume mount configuration. + +--- + +## 2. Environment Information + +Applicable Versions: 4.3.x + +--- + +## 3. Scope + +This plan applies to Kubernetes clusters where the `nevermore` log collection component has already been deployed. It is used to handle multi-line log merging for container logs or file-based logs. + +Typical applicable scenarios include: + +1. 
Application exception stacks from Java, Go, Python, and similar languages are split into multiple log entries. +2. A single business log contains multiple lines, such as SQL, JSON, XML, or detailed error information. +3. Multiple lines belonging to the same exception or request are displayed as separate records in the log platform. +4. Subsequent log lines with specific characteristics need to be merged into the previous main log line. + +Scenarios where this plan is not applicable or should be used with caution: + +1. Logs do not have stable first-line or continuation-line characteristics, making it difficult to distinguish them accurately with regular expressions. +2. The log volume is very large and the multi-line content is long. The impact on collection latency, memory usage, and single-log size must be evaluated. +3. Log formats vary significantly across business applications. Do not use an overly broad regular expression to cover all business logs. + +--- + +## 4. Expected Result + +By adding `multiline.*` configuration to the corresponding Filebeat input, continuation lines that match the regular expression can be merged into the previous main log line. This prevents the same exception, request, or business event from being split into multiple records in the log platform. + +The following example uses a Java exception stack. 
+ +Before configuration, the Java exception stack may be collected as multiple independent log records and displayed as multiple logs in the log platform: + +```text +Log record 1: +[2026-05-21 10:00:00] ERROR request failed + +Log record 2: +java.lang.RuntimeException: test error + +Log record 3: + at com.example.DemoService.test(DemoService.java:12) + +Log record 4: + at com.example.DemoController.test(DemoController.java:25) + +Log record 5: +Caused by: java.lang.IllegalArgumentException: invalid argument + +Log record 6: + at com.example.Validator.check(Validator.java:8) +``` + +After configuration, continuation lines that match `multiline.pattern` are merged into the previous main log line and displayed as one complete log record: + +```text +Log record 1: +[2026-05-21 10:00:00] ERROR request failed +java.lang.RuntimeException: test error + at com.example.DemoService.test(DemoService.java:12) + at com.example.DemoController.test(DemoController.java:25) +Caused by: java.lang.IllegalArgumentException: invalid argument + at com.example.Validator.check(Validator.java:8) +``` + +> The above is only an example of the display effect. Which lines are actually merged depends on whether the configured `multiline.pattern` accurately matches the continuation-line characteristics of the logs. + +--- + +## 5. Multi-line Merging Rule Description + +The core of multi-line merging is `multiline.pattern`. This field is a regular expression used to determine which log lines should be merged into the previous log line. + +> Note: The `multiline.pattern` in this document is only an example for Java exception stack scenarios. It is not a universal fixed configuration. In a production environment, customers must write the regular expression based on their own log format, exception format, and log line characteristics, and verify it in a test environment before applying it to production. 
+ +Example configuration: + +```yaml +multiline.type: pattern +multiline.pattern: '' +multiline.negate: false +multiline.match: after +multiline.timeout: 3s +multiline.max_lines: 500 +``` + +Example regular expression for Java exception stacks: + +```yaml +multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' +``` + +Field descriptions: + +| Configuration Item | Description | +|---|---| +| `multiline.type: pattern` | Uses a regular expression pattern for multi-line matching. | +| `multiline.pattern` | Matches log lines that need to be merged. This must be written based on the customer's actual log format. | +| `multiline.negate: false` | Lines matching the regular expression are the lines to be merged. | +| `multiline.match: after` | Appends matching lines to the previous unmatched log line. | +| `multiline.timeout: 3s` | Outputs the current merged result after waiting up to 3 seconds. | +| `multiline.max_lines: 500` | Merges up to 500 lines into a single multi-line log. Excess lines are discarded. | + +The Java exception example regular expression matches: + +```regex +^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\. +``` + +It can match lines such as: + +```text + at com.example.Service.method(Service.java:10) + ... 20 more +Caused by: java.lang.RuntimeException +java.lang.NullPointerException +``` + +> Compared with the original `^java.` pattern, `^java\.` is recommended in production to avoid incorrectly matching non-Java exception lines such as `javaX` or `javascript`. + +### 5.1 Regular Expression Writing Recommendations + +When writing `multiline.pattern`, first identify the characteristics of the “main log line” and the “continuation lines”: + +1. Main log lines usually contain fixed timestamps, log levels, request IDs, or similar fields. +2. Continuation lines usually do not contain a complete timestamp and may start with spaces, tabs, `at`, `Caused by`, `...`, and similar content. +3. 
It is recommended to match the continuation-line characteristics first, and then use `multiline.negate: false` with `multiline.match: after` to append continuation lines to the previous main log line. +4. Avoid overly broad regular expressions, otherwise unrelated log lines may be incorrectly merged into one record. +5. Before making changes, use real business log samples to validate the regular expression and confirm that it does not cause incorrect merges or missed merges. + +--- + +## 6. Pre-change Preparation + +### 6.1 Check the nevermore Running Status + +```bash +kubectl get ds -n cpaas-system nevermore +kubectl get pods -n cpaas-system | grep -i nevermore +``` + +### 6.2 Back Up the Current ConfigMap + +Before making changes in production, back up `nevermore-config`: + +```bash +kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-backup-$(date +%Y%m%d%H%M%S).yaml +``` + +It is also recommended to record the current DaemonSet and Pod status: + +```bash +kubectl get ds -n cpaas-system nevermore -o wide +kubectl get pods -n cpaas-system | grep -i nevermore +``` + +--- + +## 7. Configuration Modification Locations + +### 7.1 Container Standard Output Logs + +If multi-line merging is required for container standard output logs, modify: + +```yaml +filebeat-log-containers.yml: | +``` + +In this configuration block, find: + +```yaml +paths: + - /var/log/containers/*.log +processors: +``` + +Add the multi-line merging configuration between `paths` and `processors`. The `multiline.pattern` must be written based on the customer's actual log format. The following example is for Java exception stacks: + +```yaml +multiline.type: pattern +multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' 
+multiline.negate: false +multiline.match: after +multiline.timeout: 3s +multiline.max_lines: 500 +``` + +Example after modification: + +```yaml +filebeat-log-containers.yml: | + - type: container + id: containers + {{if .FirstRun}} + # for first run, the tail_files should be true. + tail_files: true + {{end}} + symlinks: true + ignore_older: 30m + close_inactive: 15m + scan_frequency: 30s + paths: + - /var/log/containers/*.log + multiline.type: pattern + multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' + multiline.negate: false + multiline.match: after + multiline.timeout: 3s + multiline.max_lines: 500 + processors: + - add_size: ~ + - add_fields: + target: "" + fields: + source: container +``` + +### 7.2 Mounted File Logs, Optional + +If business logs are collected from mounted files, also modify: + +```yaml +filebeat-log-file.yml +``` + +Find: + +```yaml +scan_frequency: 30s +processors: +``` + +Add the multi-line merging configuration between them. The `multiline.pattern` must be written based on the customer's actual log format. The following example is for Java exception stacks: + +```yaml +multiline.type: pattern +multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' +multiline.negate: false +multiline.match: after +multiline.timeout: 3s +multiline.max_lines: 500 +``` + +Example after modification: + +```yaml +filebeat-log-file.yml: | + {{range $cid, $fileConfigs := .Files}} + {{range $fileConfigs}} + - type: log + paths: + - {{.Path}} + {{if .ExcludePaths }} + exclude_files: + {{range .ExcludePaths }} + - {{.}} + {{end}} + {{end}} + ignore_older: 30m + close_inactive: 15m + scan_frequency: 30s + multiline.type: pattern + multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' 
+ multiline.negate: false + multiline.match: after + multiline.timeout: 3s + multiline.max_lines: 500 + processors: + - add_size: ~ + - add_fields: + target: "" + fields: + source: container + container_id: {{.ContainerID}} +``` + +--- + +## 8. Recommended Production Change Procedure + +### Step 1: Back Up the Current Configuration + +```bash +kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-backup-$(date +%Y%m%d%H%M%S).yaml +``` + +### Step 2: Export the Configuration to Be Modified + +Compared with directly using `kubectl edit`, exporting the configuration first is recommended in production because it is easier to review, compare, and roll back. + +```bash +kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-edit.yaml +``` + +### Step 3: Modify the Configuration File + +```bash +vi nevermore-config-edit.yaml +``` + +Add the multi-line merging rules to the corresponding configuration block based on the log source. + +Container standard output logs: + +```yaml +data: + filebeat-log-containers.yml: | +``` + +Mounted file logs: + +```yaml +data: + filebeat-log-file.yml: | +``` + +If both types of logs require multi-line merging, modify both configuration blocks. + +### Step 4: Apply the Modification + +```bash +kubectl apply -f nevermore-config-edit.yaml +``` + +### Step 5: Wait for the nevermore Pod to Update Automatically + +After `nevermore-config` is updated, the `nevermore` Pod automatically restarts and loads the new configuration. Manual Pod deletion is not required. + +### Step 6: Confirm That the DaemonSet Has Recovered + +```bash +kubectl rollout status ds/nevermore -n cpaas-system +``` + +Check Pods: + +```bash +kubectl get pods -n cpaas-system | grep -i nevermore +``` + +--- + +## 9. 
Verification Method + +### 9.1 Prepare Test Logs + +Output Java exception logs from a test business container, for example: + +```text +[2026-05-21 10:00:00] ERROR test exception +java.lang.RuntimeException: test error + at com.example.DemoService.test(DemoService.java:12) + at com.example.DemoController.test(DemoController.java:25) +Caused by: java.lang.IllegalArgumentException: invalid argument + at com.example.Validator.check(Validator.java:8) +``` + +### 9.2 Verify the Collection Result + +Query the corresponding Pod logs in the log platform and confirm whether the exception stack is merged into one log record. + +Key checks: + +1. Whether `java.lang.RuntimeException` is merged with the previous ERROR log line. +2. Whether multiple `at ...` lines are no longer split into independent log records. +3. Whether `Caused by:` is merged into the same log record. +4. Whether normal logs are still collected correctly. +5. Whether log time parsing remains normal. + +--- + +## 10. Rollback Plan + +If log collection issues, log display issues, or nevermore startup failures occur after the change, roll back using the backup file. + +### 10.1 Roll Back the ConfigMap + +```bash +kubectl apply -f nevermore-config-backup-YYYYMMDDHHMMSS.yaml +``` + +Replace the file name with the actual backup file name. + +### 10.2 Wait for the nevermore Pod to Update Automatically + +After `nevermore-config` is rolled back, the `nevermore` Pod automatically restarts and reloads the rolled-back configuration. Manual Pod deletion is not required. + +### 10.3 Confirm Recovery + +```bash +kubectl rollout status ds/nevermore -n cpaas-system +kubectl get pods -n cpaas-system | grep -i nevermore +``` + +--- + +## 11. Production Notes + +1. **Modify the existing `nevermore-config`** + This plan directly updates the Filebeat input configuration in `nevermore-config`. It does not create a new ConfigMap or change the DaemonSet volume mount configuration. + +2. 
**Choose the modification location based on the log source** + For container standard output logs, modify `filebeat-log-containers.yml`. For mounted file logs, modify `filebeat-log-file.yml`. If both types of logs need multi-line merging, modify both configuration blocks. + +3. **Validate in a test environment first** + Confirm that log merging, log platform display, alert rules, and searchable fields are all normal before applying the change in production. + +4. **Customize `multiline.pattern` based on the log format** + The Java exception regular expression in this document is only an example. Customers should write and validate the regular expression based on actual log content, first-line characteristics, and exception stack formats to avoid incorrect merges or missed merges. + +5. **Avoid overly broad regular expressions** + If using the Java exception example, `^java\.` is recommended instead of `^java.` to reduce incorrect matches. + +6. **Pay attention to log latency** + `multiline.timeout: 3s` may introduce up to approximately 3 seconds of waiting time for multi-line logs. + +7. **Pay attention to very long stack traces** + `multiline.max_lines: 500` means that up to 500 lines are merged into one log record. Excess lines are discarded. + +8. **Observe the collection pipeline after the change** + After `nevermore-config` is updated, the Pod automatically restarts and loads the new configuration. After the change, observe the nevermore Pod status, log ingestion volume, and business log integrity. + +9. **The configuration only affects newly collected logs** + After the multi-line merging configuration takes effect, it usually only affects newly collected logs. Logs that have already been collected and stored will not be automatically re-merged. 
From d6dbde33e2d603ac5e062b7559bb68b51ff19d84 Mon Sep 17 00:00:00 2001 From: Chaosheng Wang <163980877+adminwcs@users.noreply.github.com> Date: Fri, 22 May 2026 10:15:18 +0800 Subject: [PATCH 2/3] Delete docs/en/solutions/ait/Multi_line_Log_Merging.md --- .../solutions/ait/Multi_line_Log_Merging.md | 465 ------------------ 1 file changed, 465 deletions(-) delete mode 100644 docs/en/solutions/ait/Multi_line_Log_Merging.md diff --git a/docs/en/solutions/ait/Multi_line_Log_Merging.md b/docs/en/solutions/ait/Multi_line_Log_Merging.md deleted file mode 100644 index 23a06370c..000000000 --- a/docs/en/solutions/ait/Multi_line_Log_Merging.md +++ /dev/null @@ -1,465 +0,0 @@ ---- -products: - - Alauda Container Platform -kind: - - Solution ---- - -# nevermore Multi-line Log Merging Production Change Plan - -## 1. Background - -After the `nevermore` log collection component is deployed, it uses the `nevermore-config` ConfigMap in the `cpaas-system` namespace as the Filebeat configuration source by default. - -This ConfigMap contains multiple types of log collection configurations, for example: - -```yaml -filebeat-log-containers.yml -filebeat-log-file.yml -filebeat-log.yml -filebeat-audit.yml -filebeat-event.yml -filebeat-log-system.yml -filebeat-log-systemd.yml -``` - -Business container standard output logs are mainly controlled by: - -```yaml -filebeat-log-containers.yml -``` - -If business logs are collected from mounted files, also pay attention to: - -```yaml -filebeat-log-file.yml -``` - -This plan is based on modifying the existing `nevermore-config`. It does not require creating a new ConfigMap or changing the DaemonSet volume mount configuration. - ---- - -## 2. Environment Information - -Applicable Versions: 4.3.x - ---- - -## 3. Scope - -This plan applies to Kubernetes clusters where the `nevermore` log collection component has already been deployed. It is used to handle multi-line log merging for container logs or file-based logs. 
- -Typical applicable scenarios include: - -1. Application exception stacks from Java, Go, Python, and similar languages are split into multiple log entries. -2. A single business log contains multiple lines, such as SQL, JSON, XML, or detailed error information. -3. Multiple lines belonging to the same exception or request are displayed as separate records in the log platform. -4. Subsequent log lines with specific characteristics need to be merged into the previous main log line. - -Scenarios where this plan is not applicable or should be used with caution: - -1. Logs do not have stable first-line or continuation-line characteristics, making it difficult to distinguish them accurately with regular expressions. -2. The log volume is very large and the multi-line content is long. The impact on collection latency, memory usage, and single-log size must be evaluated. -3. Log formats vary significantly across business applications. Do not use an overly broad regular expression to cover all business logs. - ---- - -## 4. Expected Result - -By adding `multiline.*` configuration to the corresponding Filebeat input, continuation lines that match the regular expression can be merged into the previous main log line. This prevents the same exception, request, or business event from being split into multiple records in the log platform. - -The following example uses a Java exception stack. 
- -Before configuration, the Java exception stack may be collected as multiple independent log records and displayed as multiple logs in the log platform: - -```text -Log record 1: -[2026-05-21 10:00:00] ERROR request failed - -Log record 2: -java.lang.RuntimeException: test error - -Log record 3: - at com.example.DemoService.test(DemoService.java:12) - -Log record 4: - at com.example.DemoController.test(DemoController.java:25) - -Log record 5: -Caused by: java.lang.IllegalArgumentException: invalid argument - -Log record 6: - at com.example.Validator.check(Validator.java:8) -``` - -After configuration, continuation lines that match `multiline.pattern` are merged into the previous main log line and displayed as one complete log record: - -```text -Log record 1: -[2026-05-21 10:00:00] ERROR request failed -java.lang.RuntimeException: test error - at com.example.DemoService.test(DemoService.java:12) - at com.example.DemoController.test(DemoController.java:25) -Caused by: java.lang.IllegalArgumentException: invalid argument - at com.example.Validator.check(Validator.java:8) -``` - -> The above is only an example of the display effect. Which lines are actually merged depends on whether the configured `multiline.pattern` accurately matches the continuation-line characteristics of the logs. - ---- - -## 5. Multi-line Merging Rule Description - -The core of multi-line merging is `multiline.pattern`. This field is a regular expression used to determine which log lines should be merged into the previous log line. - -> Note: The `multiline.pattern` in this document is only an example for Java exception stack scenarios. It is not a universal fixed configuration. In a production environment, customers must write the regular expression based on their own log format, exception format, and log line characteristics, and verify it in a test environment before applying it to production. 
- -Example configuration: - -```yaml -multiline.type: pattern -multiline.pattern: '' -multiline.negate: false -multiline.match: after -multiline.timeout: 3s -multiline.max_lines: 500 -``` - -Example regular expression for Java exception stacks: - -```yaml -multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' -``` - -Field descriptions: - -| Configuration Item | Description | -|---|---| -| `multiline.type: pattern` | Uses a regular expression pattern for multi-line matching. | -| `multiline.pattern` | Matches log lines that need to be merged. This must be written based on the customer's actual log format. | -| `multiline.negate: false` | Lines matching the regular expression are the lines to be merged. | -| `multiline.match: after` | Appends matching lines to the previous unmatched log line. | -| `multiline.timeout: 3s` | Outputs the current merged result after waiting up to 3 seconds. | -| `multiline.max_lines: 500` | Merges up to 500 lines into a single multi-line log. Excess lines are discarded. | - -The Java exception example regular expression matches: - -```regex -^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\. -``` - -It can match lines such as: - -```text - at com.example.Service.method(Service.java:10) - ... 20 more -Caused by: java.lang.RuntimeException -java.lang.NullPointerException -``` - -> Compared with the original `^java.` pattern, `^java\.` is recommended in production to avoid incorrectly matching non-Java exception lines such as `javaX` or `javascript`. - -### 5.1 Regular Expression Writing Recommendations - -When writing `multiline.pattern`, first identify the characteristics of the “main log line” and the “continuation lines”: - -1. Main log lines usually contain fixed timestamps, log levels, request IDs, or similar fields. -2. Continuation lines usually do not contain a complete timestamp and may start with spaces, tabs, `at`, `Caused by`, `...`, and similar content. -3. 
It is recommended to match the continuation-line characteristics first, and then use `multiline.negate: false` with `multiline.match: after` to append continuation lines to the previous main log line. -4. Avoid overly broad regular expressions, otherwise unrelated log lines may be incorrectly merged into one record. -5. Before making changes, use real business log samples to validate the regular expression and confirm that it does not cause incorrect merges or missed merges. - ---- - -## 6. Pre-change Preparation - -### 6.1 Check the nevermore Running Status - -```bash -kubectl get ds -n cpaas-system nevermore -kubectl get pods -n cpaas-system | grep -i nevermore -``` - -### 6.2 Back Up the Current ConfigMap - -Before making changes in production, back up `nevermore-config`: - -```bash -kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-backup-$(date +%Y%m%d%H%M%S).yaml -``` - -It is also recommended to record the current DaemonSet and Pod status: - -```bash -kubectl get ds -n cpaas-system nevermore -o wide -kubectl get pods -n cpaas-system | grep -i nevermore -``` - ---- - -## 7. Configuration Modification Locations - -### 7.1 Container Standard Output Logs - -If multi-line merging is required for container standard output logs, modify: - -```yaml -filebeat-log-containers.yml: | -``` - -In this configuration block, find: - -```yaml -paths: - - /var/log/containers/*.log -processors: -``` - -Add the multi-line merging configuration between `paths` and `processors`. The `multiline.pattern` must be written based on the customer's actual log format. The following example is for Java exception stacks: - -```yaml -multiline.type: pattern -multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' 
-multiline.negate: false -multiline.match: after -multiline.timeout: 3s -multiline.max_lines: 500 -``` - -Example after modification: - -```yaml -filebeat-log-containers.yml: | - - type: container - id: containers - {{if .FirstRun}} - # for first run, the tail_files should be true. - tail_files: true - {{end}} - symlinks: true - ignore_older: 30m - close_inactive: 15m - scan_frequency: 30s - paths: - - /var/log/containers/*.log - multiline.type: pattern - multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' - multiline.negate: false - multiline.match: after - multiline.timeout: 3s - multiline.max_lines: 500 - processors: - - add_size: ~ - - add_fields: - target: "" - fields: - source: container -``` - -### 7.2 Mounted File Logs, Optional - -If business logs are collected from mounted files, also modify: - -```yaml -filebeat-log-file.yml -``` - -Find: - -```yaml -scan_frequency: 30s -processors: -``` - -Add the multi-line merging configuration between them. The `multiline.pattern` must be written based on the customer's actual log format. The following example is for Java exception stacks: - -```yaml -multiline.type: pattern -multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' -multiline.negate: false -multiline.match: after -multiline.timeout: 3s -multiline.max_lines: 500 -``` - -Example after modification: - -```yaml -filebeat-log-file.yml: | - {{range $cid, $fileConfigs := .Files}} - {{range $fileConfigs}} - - type: log - paths: - - {{.Path}} - {{if .ExcludePaths }} - exclude_files: - {{range .ExcludePaths }} - - {{.}} - {{end}} - {{end}} - ignore_older: 30m - close_inactive: 15m - scan_frequency: 30s - multiline.type: pattern - multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:|^java\.' 
- multiline.negate: false - multiline.match: after - multiline.timeout: 3s - multiline.max_lines: 500 - processors: - - add_size: ~ - - add_fields: - target: "" - fields: - source: container - container_id: {{.ContainerID}} -``` - ---- - -## 8. Recommended Production Change Procedure - -### Step 1: Back Up the Current Configuration - -```bash -kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-backup-$(date +%Y%m%d%H%M%S).yaml -``` - -### Step 2: Export the Configuration to Be Modified - -Compared with directly using `kubectl edit`, exporting the configuration first is recommended in production because it is easier to review, compare, and roll back. - -```bash -kubectl get cm -n cpaas-system nevermore-config -o yaml > nevermore-config-edit.yaml -``` - -### Step 3: Modify the Configuration File - -```bash -vi nevermore-config-edit.yaml -``` - -Add the multi-line merging rules to the corresponding configuration block based on the log source. - -Container standard output logs: - -```yaml -data: - filebeat-log-containers.yml: | -``` - -Mounted file logs: - -```yaml -data: - filebeat-log-file.yml: | -``` - -If both types of logs require multi-line merging, modify both configuration blocks. - -### Step 4: Apply the Modification - -```bash -kubectl apply -f nevermore-config-edit.yaml -``` - -### Step 5: Wait for the nevermore Pod to Update Automatically - -After `nevermore-config` is updated, the `nevermore` Pod automatically restarts and loads the new configuration. Manual Pod deletion is not required. - -### Step 6: Confirm That the DaemonSet Has Recovered - -```bash -kubectl rollout status ds/nevermore -n cpaas-system -``` - -Check Pods: - -```bash -kubectl get pods -n cpaas-system | grep -i nevermore -``` - ---- - -## 9. 
Verification Method - -### 9.1 Prepare Test Logs - -Output Java exception logs from a test business container, for example: - -```text -[2026-05-21 10:00:00] ERROR test exception -java.lang.RuntimeException: test error - at com.example.DemoService.test(DemoService.java:12) - at com.example.DemoController.test(DemoController.java:25) -Caused by: java.lang.IllegalArgumentException: invalid argument - at com.example.Validator.check(Validator.java:8) -``` - -### 9.2 Verify the Collection Result - -Query the corresponding Pod logs in the log platform and confirm whether the exception stack is merged into one log record. - -Key checks: - -1. Whether `java.lang.RuntimeException` is merged with the previous ERROR log line. -2. Whether multiple `at ...` lines are no longer split into independent log records. -3. Whether `Caused by:` is merged into the same log record. -4. Whether normal logs are still collected correctly. -5. Whether log time parsing remains normal. - ---- - -## 10. Rollback Plan - -If log collection issues, log display issues, or nevermore startup failures occur after the change, roll back using the backup file. - -### 10.1 Roll Back the ConfigMap - -```bash -kubectl apply -f nevermore-config-backup-YYYYMMDDHHMMSS.yaml -``` - -Replace the file name with the actual backup file name. - -### 10.2 Wait for the nevermore Pod to Update Automatically - -After `nevermore-config` is rolled back, the `nevermore` Pod automatically restarts and reloads the rolled-back configuration. Manual Pod deletion is not required. - -### 10.3 Confirm Recovery - -```bash -kubectl rollout status ds/nevermore -n cpaas-system -kubectl get pods -n cpaas-system | grep -i nevermore -``` - ---- - -## 11. Production Notes - -1. **Modify the existing `nevermore-config`** - This plan directly updates the Filebeat input configuration in `nevermore-config`. It does not create a new ConfigMap or change the DaemonSet volume mount configuration. - -2. 
**Choose the modification location based on the log source** - For container standard output logs, modify `filebeat-log-containers.yml`. For mounted file logs, modify `filebeat-log-file.yml`. If both types of logs need multi-line merging, modify both configuration blocks. - -3. **Validate in a test environment first** - Confirm that log merging, log platform display, alert rules, and searchable fields are all normal before applying the change in production. - -4. **Customize `multiline.pattern` based on the log format** - The Java exception regular expression in this document is only an example. Customers should write and validate the regular expression based on actual log content, first-line characteristics, and exception stack formats to avoid incorrect merges or missed merges. - -5. **Avoid overly broad regular expressions** - If using the Java exception example, `^java\.` is recommended instead of `^java.` to reduce incorrect matches. - -6. **Pay attention to log latency** - `multiline.timeout: 3s` may introduce up to approximately 3 seconds of waiting time for multi-line logs. - -7. **Pay attention to very long stack traces** - `multiline.max_lines: 500` means that up to 500 lines are merged into one log record. Excess lines are discarded. - -8. **Observe the collection pipeline after the change** - After `nevermore-config` is updated, the Pod automatically restarts and loads the new configuration. After the change, observe the nevermore Pod status, log ingestion volume, and business log integrity. - -9. **The configuration only affects newly collected logs** - After the multi-line merging configuration takes effect, it usually only affects newly collected logs. Logs that have already been collected and stored will not be automatically re-merged. 
From 0386eb75f6bbd7ed477906a184771f10384d60ba Mon Sep 17 00:00:00 2001
From: Chaosheng Wang <163980877+adminwcs@users.noreply.github.com>
Date: Fri, 22 May 2026 10:36:02 +0800
Subject: [PATCH 3/3] Create Prometheus_Metrics_Discovery_via_Pod_Annotations_and_PodMonitor.md

---
 ...very_via_Pod_Annotations_and_PodMonitor.md | 547 ++++++++++++++++++
 1 file changed, 547 insertions(+)
 create mode 100644 docs/en/solutions/ait/Prometheus_Metrics_Discovery_via_Pod_Annotations_and_PodMonitor.md

diff --git a/docs/en/solutions/ait/Prometheus_Metrics_Discovery_via_Pod_Annotations_and_PodMonitor.md b/docs/en/solutions/ait/Prometheus_Metrics_Discovery_via_Pod_Annotations_and_PodMonitor.md
new file mode 100644
index 000000000..a57fe5ac3
--- /dev/null
+++ b/docs/en/solutions/ait/Prometheus_Metrics_Discovery_via_Pod_Annotations_and_PodMonitor.md
@@ -0,0 +1,547 @@
+---
+products:
+  - Alauda Container Platform
+kind:
+  - Solution
+id: KB260500006
+---
+
+# Prometheus PodMonitor Metrics Discovery by Pod Annotations
+
+## Overview
+
+In Kubernetes environments, many applications and middleware components expose Prometheus scrape metadata through Pod annotations, for example:
+
+```yaml
+annotations:
+  prometheus.io/scrape: "true"
+  prometheus.io/path: "/metrics"
+  prometheus.io/port: "8080"
+```
+
+With native Prometheus `kubernetes_sd_configs` and `relabel_configs`, these annotations can be used directly to discover and scrape metrics targets.
+
+In a Prometheus Operator based deployment, Prometheus does not automatically inherit the native annotation-based scrape behavior. Instead, scrape targets are managed through CRDs such as `ServiceMonitor` and `PodMonitor`.
+
+If you want to continue using Pod annotations to declare metrics endpoints while managing scraping through `PodMonitor`, you must explicitly handle these annotations in `PodMonitor.spec.podMetricsEndpoints[].relabelings`.
+
+This guide provides a production-oriented pattern for using `PodMonitor` to:
+
+- Select candidate Pods by Pod labels.
+- Decide whether to scrape a Pod by Pod annotations.
+- Dynamically set the metrics path from Pod annotations.
+- Dynamically set the target port from Pod annotations.
+- Optionally configure Basic Authentication.
+- Align annotation-based scrape metadata with the Prometheus Operator model.
+
+## Environment Information
+
+Applicable Versions: 4.3.x
+
+## Prerequisites
+
+Before you create the `PodMonitor`, make sure the following requirements are met.
+
+### Prometheus Can Select the PodMonitor
+
+The Prometheus custom resource must be configured to select this `PodMonitor`.
+
+Prometheus Operator uses the following fields from the Prometheus custom resource:
+
+- `spec.podMonitorSelector`
+- `spec.podMonitorNamespaceSelector`
+
+The `PodMonitor` must match both selectors. Otherwise, Prometheus Operator will not generate scrape configuration for it.
+
+Example:
+
+```yaml
+metadata:
+  labels:
+    prometheus: kube-prometheus
+```
+
+The label above must match the `podMonitorSelector` configured in the target Prometheus custom resource.
+
+### Pods Have Matchable Labels
+
+`PodMonitor.spec.selector` selects Pods by labels. Each target Pod must have labels that match the selector.
+
+Example target Pod labels:
+
+```yaml
+labels:
+  service_name: elasticsearch
+```
+
+### Pods Have Prometheus Annotations
+
+Each target Pod should include Prometheus scrape annotations.
+
+Example:
+
+```yaml
+annotations:
+  prometheus.io/scrape: "true"
+  prometheus.io/path: "/metrics"
+  prometheus.io/port: "8080"
+```
+
+Recommendations:
+
+- Always use string values for annotations.
+- Use `prometheus.io/scrape: "true"` only for Pods that should be scraped.
+- Use an explicit metrics path, even if the application uses `/metrics`.
+- Keep the annotated port consistent with the actual metrics listener port.
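
The recommendations above can be checked mechanically before a Pod is deployed. The following Python sketch is illustrative only: `check_scrape_annotations` is a hypothetical helper, not part of Prometheus or Kubernetes, and it assumes the annotations are already available as a Python dictionary (for example, parsed from a manifest).

```python
def check_scrape_annotations(annotations):
    """Return a list of problems found in prometheus.io/* Pod annotations.

    Hypothetical helper for illustration; the rules mirror the
    recommendations in this guide, not an official validation API.
    """
    problems = []

    # Kubernetes annotation values must be strings, so an unquoted YAML
    # value such as `prometheus.io/port: 8080` is rejected by the API server.
    for key, value in annotations.items():
        if not isinstance(value, str):
            problems.append(f"{key}: value {value!r} is not a string")

    if annotations.get("prometheus.io/scrape") != "true":
        problems.append('prometheus.io/scrape is not "true"; the Pod will not be scraped')

    path = annotations.get("prometheus.io/path")
    if not isinstance(path, str) or not path.startswith("/"):
        problems.append("prometheus.io/path should be an explicit path starting with '/'")

    port = annotations.get("prometheus.io/port")
    is_numeric = isinstance(port, str) and port.isascii() and port.isdigit()
    if not (is_numeric and 1 <= int(port) <= 65535):
        problems.append("prometheus.io/port should be a numeric string between 1 and 65535")

    return problems


good = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/metrics",
    "prometheus.io/port": "8080",
}
# Unquoted port (an int after YAML parsing) and no explicit path.
bad = {"prometheus.io/scrape": "true", "prometheus.io/port": 8080}

print(check_scrape_annotations(good))
for problem in check_scrape_annotations(bad):
    print(problem)
```

The first call prints an empty list; the second reports the non-string port value, the missing path, and the invalid port.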
+
+### Pods Declare the Metrics Container Port
+
+The target Pod should declare the metrics port in the container spec.
+
+Example:
+
+```yaml
+ports:
+  - name: metrics
+    containerPort: 8080
+    protocol: TCP
+```
+
+Notes:
+
+- `PodMonitor.spec.podMetricsEndpoints[].port` should use the port name defined in the Pod spec.
+- Kubernetes port names must be at most 15 characters long.
+- Even if the application listens on a port and can be accessed by `curl`, Prometheus Operator target generation is more reliable when the container port is explicitly declared in the Pod spec.
+
+### Basic Auth Secret Is Available, If Required
+
+If the metrics endpoint requires Basic Authentication, create a Secret that can be referenced by the `PodMonitor`.
+
+Example:
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: metrics-basic-auth
+  namespace: cpaas-system
+type: Opaque
+stringData:
+  username: admin
+  # Quote numeric-looking values: stringData only accepts strings.
+  password: "123456"
+```
+
+Notes:
+
+- The Secret referenced by a `PodMonitor` is normally expected to be in the same namespace as the `PodMonitor`.
+- Do not assume that a `PodMonitor` can directly reference a Secret from another namespace.
+- In production, do not store plaintext credentials in Git. Use a secure secret management process.
+
+## Production PodMonitor Example
+
+The following example selects Pods that have the `service_name` label and scrapes only Pods with `prometheus.io/scrape: "true"`.
+
+It also supports overriding the metrics path and port through Pod annotations.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: cpaas-elasticsearch-podmonitor
+  namespace: cpaas-system
+  labels:
+    prometheus: kube-prometheus
+spec:
+  namespaceSelector:
+    any: true
+  selector:
+    matchExpressions:
+      - key: service_name
+        operator: Exists
+  podMetricsEndpoints:
+    - port: es-http
+      path: /_prometheus/metrics
+      interval: 30s
+      basicAuth:
+        username:
+          name: acp-config-secret
+          key: ES_USERNAME
+        password:
+          name: acp-config-secret
+          key: ES_PASSWORD
+      relabelings:
+        - action: keep
+          sourceLabels:
+            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
+          regex: "true"
+
+        - action: replace
+          sourceLabels:
+            - __meta_kubernetes_pod_annotation_prometheus_io_path
+          targetLabel: __metrics_path__
+          regex: (.+)
+
+        - action: replace
+          sourceLabels:
+            - __address__
+            - __meta_kubernetes_pod_annotation_prometheus_io_port
+          regex: ([^:]+)(?::\d+)?;(\d+)
+          replacement: $1:$2
+          targetLabel: __address__
+
+        - action: replace
+          sourceLabels:
+            - __meta_kubernetes_namespace
+          targetLabel: kubernetes_namespace
+
+        - action: replace
+          sourceLabels:
+            - __meta_kubernetes_pod_name
+          targetLabel: kubernetes_pod_name
+```
+
+## Field Explanation
+
+### PodMonitor Metadata
+
+```yaml
+metadata:
+  name: cpaas-elasticsearch-podmonitor
+  namespace: cpaas-system
+  labels:
+    prometheus: kube-prometheus
+```
+
+- `metadata.namespace` is the namespace where the `PodMonitor` object is created.
+- `metadata.labels` must match the Prometheus CR `spec.podMonitorSelector`.
+- If the label does not match, Prometheus Operator will ignore this `PodMonitor`.
+
+### Namespace Selector
+
+```yaml
+namespaceSelector:
+  any: true
+```
+
+This allows the `PodMonitor` to select Pods from any namespace.
+
+Production recommendation:
+
+- Use `any: true` only when cross-namespace scraping is required.
+- For stricter isolation, prefer `matchNames`.
+
+Example:
+
+```yaml
+namespaceSelector:
+  matchNames:
+    - cpaas-system
+```
+
+### Pod Selector
+
+```yaml
+selector:
+  matchExpressions:
+    - key: service_name
+      operator: Exists
+```
+
+This selects candidate Pods by label.
+
+Important:
+
+- This only selects candidate Pods.
+- The final decision to scrape is made by the relabeling rule that checks `prometheus.io/scrape`.
+
+### Pod Metrics Endpoint
+
+```yaml
+podMetricsEndpoints:
+  - port: es-http
+    path: /_prometheus/metrics
+    interval: 30s
+```
+
+- `port` should match a named container port in the target Pod.
+- `path` is the default scrape path.
+- `interval` defines the scrape interval.
+
+If the Pod has `prometheus.io/path`, the relabeling rule overrides the default path by setting `__metrics_path__`.
+
+If the Pod has `prometheus.io/port`, the relabeling rule overrides the port part of `__address__`.
+
+## Relabeling Rules
+
+### Keep Only Pods with `prometheus.io/scrape=true`
+
+```yaml
+- action: keep
+  sourceLabels:
+    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
+  regex: "true"
+```
+
+Only Pods with the following annotation are kept:
+
+```yaml
+prometheus.io/scrape: "true"
+```
+
+All other Pods are dropped.
+
+### Override Metrics Path from Annotation
+
+```yaml
+- action: replace
+  sourceLabels:
+    - __meta_kubernetes_pod_annotation_prometheus_io_path
+  targetLabel: __metrics_path__
+  regex: (.+)
+```
+
+If the Pod has this annotation:
+
+```yaml
+prometheus.io/path: "/metrics"
+```
+
+Prometheus uses the annotation value as the scrape path.
+
+If the annotation is missing or empty, the default `path` configured in `podMetricsEndpoints` is used.
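
To make the matching behavior of the `keep` and path `replace` rules concrete, the following Python sketch imitates how Prometheus evaluates them: the source label values are joined with `;`, and the regex must match the whole joined value. This is a simplified model for illustration, not the Prometheus implementation; `relabel_keep` and `relabel_replace` are hypothetical helper names, and a missing annotation is modeled as an empty label value.

```python
import re


def relabel_keep(labels, source_labels, regex, separator=";"):
    """Return True if the target survives a `keep` rule."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Return a copy of labels after a `replace` rule (unchanged on no match)."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    # Expand $1, $2, ... with the captured groups.
    expanded = re.sub(r"\$(\d+)",
                      lambda m: match.group(int(m.group(1))) or "",
                      replacement)
    return {**labels, target_label: expanded}


SCRAPE = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
PATH = "__meta_kubernetes_pod_annotation_prometheus_io_path"
DEFAULT_PATH = "/_prometheus/metrics"  # endpoint default from the example

annotated = {"__metrics_path__": DEFAULT_PATH, SCRAPE: "true", PATH: "/metrics"}
no_path = {"__metrics_path__": DEFAULT_PATH, SCRAPE: "true"}
not_opted_in = {"__metrics_path__": DEFAULT_PATH, SCRAPE: "false"}

for labels in (annotated, no_path, not_opted_in):
    if not relabel_keep(labels, [SCRAPE], "true"):
        print("dropped")
        continue
    result = relabel_replace(labels, [PATH], "(.+)", "__metrics_path__")
    print(result["__metrics_path__"])
```

The three sample targets print `/metrics`, `/_prometheus/metrics`, and `dropped`, in that order. Because the match is anchored and case-sensitive, an annotation value such as `"True"` would also be dropped by the `keep` rule.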
+
+### Override Target Port from Annotation
+
+```yaml
+- action: replace
+  sourceLabels:
+    - __address__
+    - __meta_kubernetes_pod_annotation_prometheus_io_port
+  regex: ([^:]+)(?::\d+)?;(\d+)
+  replacement: $1:$2
+  targetLabel: __address__
+```
+
+If the Pod has this annotation:
+
+```yaml
+prometheus.io/port: "8080"
+```
+
+Prometheus replaces the port part of `__address__` with `8080`.
+
+Important:
+
+- The annotation port must be numeric.
+- The Pod should still declare a named container port so that the `PodMonitor` can generate a target reliably.
+- `podMetricsEndpoints[].port` should still be set and should reference a valid named container port, because Prometheus Operator uses it to select which container port becomes a target.
+
+### Add Kubernetes Namespace Label
+
+```yaml
+- action: replace
+  sourceLabels:
+    - __meta_kubernetes_namespace
+  targetLabel: kubernetes_namespace
+```
+
+This adds the Pod namespace as a metric label.
+
+### Add Kubernetes Pod Name Label
+
+```yaml
+- action: replace
+  sourceLabels:
+    - __meta_kubernetes_pod_name
+  targetLabel: kubernetes_pod_name
+```
+
+This adds the Pod name as a metric label.
+
+## Example Target Pod
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: example-app
+  namespace: cpaas-system
+  labels:
+    service_name: example-app
+  annotations:
+    prometheus.io/scrape: "true"
+    prometheus.io/path: "/metrics"
+    prometheus.io/port: "8080"
+spec:
+  containers:
+    - name: app
+      image: example/app:latest
+      ports:
+        - name: metrics
+          containerPort: 8080
+          protocol: TCP
+```
+
+If the `PodMonitor` uses:
+
+```yaml
+podMetricsEndpoints:
+  - port: metrics
+```
+
+Prometheus Operator can generate a target for the named port, and the relabeling rules can then apply the annotation-based scrape behavior.
+
+## Basic Auth Configuration
+
+If the metrics endpoint requires Basic Authentication, configure `basicAuth` in the endpoint.
+
+```yaml
+basicAuth:
+  username:
+    name: metrics-basic-auth
+    key: username
+  password:
+    name: metrics-basic-auth
+    key: password
+```
+
+Notes:
+
+- The Secret must be accessible to the Prometheus Operator and Prometheus configuration generation process.
+- The Secret is typically created in the same namespace as the `PodMonitor`.
+- Use Kubernetes RBAC and secret management policies to protect credentials.
+
+If Basic Authentication is not required, remove the `basicAuth` section.
+
+## Validation
+
+### Check That the PodMonitor Is Selected by Prometheus
+
+Check the Prometheus custom resource and confirm that:
+
+- `spec.podMonitorSelector` matches the labels of the `PodMonitor`.
+- `spec.podMonitorNamespaceSelector` includes the namespace of the `PodMonitor`.
+
+Example commands:
+
+```bash
+kubectl get prometheus -A
+kubectl get podmonitor -n cpaas-system cpaas-elasticsearch-podmonitor --show-labels
+```
+
+### Check That Target Pods Match the PodMonitor Selector
+
+```bash
+kubectl get pod -A -l service_name --show-labels
+```
+
+Confirm that the target Pods have the expected labels.
+
+### Check Pod Annotations
+
+```bash
+kubectl get pod <pod-name> -n <namespace> -o yaml
+```
+
+Confirm that the Pod has:
+
+```yaml
+prometheus.io/scrape: "true"
+prometheus.io/path: "/metrics"
+prometheus.io/port: "8080"
+```
+
+### Check Container Port Declaration
+
+Confirm that the Pod declares a named container port that matches `podMetricsEndpoints[].port`.
+
+Example:
+
+```yaml
+ports:
+  - name: metrics
+    containerPort: 8080
+    protocol: TCP
+```
+
+### Check Prometheus Targets
+
+Open the Prometheus UI and check:
+
+```text
+Status -> Targets
+```
+
+Expected result:
+
+- The target appears under the generated PodMonitor scrape job.
+- The target URL uses the expected path.
+- The target address uses the expected port.
+- The target state is `UP`.
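
As an offline complement to the checks above, the port-override regex from the example `PodMonitor` can be exercised without a cluster. The Python sketch below is a simplified model, not the Prometheus implementation: `override_port` is a hypothetical helper, the sample addresses are made up, and the model joins the two source labels with `;` and anchors the regex to the whole joined value.

```python
import re

# Regex copied from the example PodMonitor; the replacement there is $1:$2.
PORT_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def override_port(address, port_annotation):
    """Return __address__ after the port-override rule (hypothetical helper)."""
    joined = f"{address};{port_annotation}"
    match = PORT_RULE.fullmatch(joined)
    if match is None:
        # No match: the rule is a no-op and the original address is kept.
        return address
    return f"{match.group(1)}:{match.group(2)}"


print(override_port("10.3.0.15:9200", "8080"))  # annotation replaces the port
print(override_port("10.3.0.15:9200", ""))      # missing annotation
print(override_port("10.3.0.15:9200", "http"))  # non-numeric annotation
```

Only the first call rewrites the address, to `10.3.0.15:8080`. In the other two cases the rule does not match, so the target keeps the port that was derived from `podMetricsEndpoints[].port`.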
+
+## Troubleshooting
+
+### PodMonitor Is Not Effective
+
+Possible causes:
+
+- The `PodMonitor` namespace is not selected by `spec.podMonitorNamespaceSelector`.
+- The `PodMonitor` labels do not match `spec.podMonitorSelector`.
+- The Prometheus Operator is not watching the namespace.
+
+### Pod Is Not Discovered
+
+Possible causes:
+
+- The Pod labels do not match `PodMonitor.spec.selector`.
+- The Pod is in a namespace not selected by `PodMonitor.spec.namespaceSelector`.
+- The Pod does not declare the named container port referenced by `podMetricsEndpoints[].port`.
+
+### Pod Is Discovered but Dropped
+
+Possible cause:
+
+- The Pod does not have `prometheus.io/scrape: "true"`.
+
+The `keep` relabeling rule drops all Pods that do not match this annotation.
+
+### Target Uses the Wrong Path
+
+Possible causes:
+
+- The Pod does not have `prometheus.io/path`.
+- The annotation value is empty.
+- The application exposes metrics on a different path.
+
+If the annotation is missing, Prometheus uses the endpoint default `path`.
+
+### Target Uses the Wrong Port
+
+Possible causes:
+
+- The Pod does not have `prometheus.io/port`.
+- The annotation value is not numeric.
+- The container does not declare the named port used by `podMetricsEndpoints[].port`.
+- The application listens on a different port than the annotation value.
+
+### Target Is Down with Authentication Error
+
+Possible causes:
+
+- The `basicAuth` Secret does not exist.
+- The Secret is in the wrong namespace.
+- The Secret keys are incorrect.
+- The username or password is invalid.
+
+## Production Recommendations
+
+- Prefer `namespaceSelector.matchNames` over `namespaceSelector.any: true` unless cross-namespace discovery is required.
+- Use strict Pod label selectors to avoid selecting too many candidate Pods.
+- Require `prometheus.io/scrape: "true"` to explicitly opt in to scraping.
+- Keep annotation values as strings.
+- Declare named container ports in the Pod spec.
+- Keep port names no longer than 15 characters.
+- Store Basic Auth credentials in Kubernetes Secrets and manage them securely.
+- Avoid committing plaintext credentials to Git.
+- Validate targets in the Prometheus UI after applying the `PodMonitor`.
+- Keep the default endpoint `path` and annotation `prometheus.io/path` consistent where possible.