From c5c5a22e22e3a6703c86b051b34615d773d5830e Mon Sep 17 00:00:00 2001 From: Stef Nestor <26751266+stefnestor@users.noreply.github.com> Date: Thu, 19 Feb 2026 07:38:03 -0700 Subject: [PATCH 1/4] bootlooping doc applies to ECE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 👋 mini update that the bootlooping nodes troubleshooting doc applies to ECE not just ECH --- troubleshoot/monitoring/node-bootlooping.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/troubleshoot/monitoring/node-bootlooping.md b/troubleshoot/monitoring/node-bootlooping.md index bb2dbf947f..f44f4bcf48 100644 --- a/troubleshoot/monitoring/node-bootlooping.md +++ b/troubleshoot/monitoring/node-bootlooping.md @@ -6,6 +6,7 @@ mapped_pages: applies_to: deployment: ess: all + ece: all products: - id: cloud-hosted --- @@ -163,4 +164,4 @@ To resolve this: ## Insufficient Storage [ec-config-change-errors-insufficient-storage] -Configuration change errors can occur when there is insufficient disk space for a data tier. To resolve this, you need to increase the size of that tier to ensure it provides enough storage to accommodate the data in your cluster tier considering the [high watermark](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#disk-based-shard-allocation). For troubleshooting walkthrough, see [Fix watermark errors](/troubleshoot/elasticsearch/fix-watermark-errors.md). \ No newline at end of file +Configuration change errors can occur when there is insufficient disk space for a data tier. To resolve this, you need to increase the size of that tier to ensure it provides enough storage to accommodate the data in your cluster tier considering the [high watermark](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#disk-based-shard-allocation). 
For troubleshooting walkthrough, see [Fix watermark errors](/troubleshoot/elasticsearch/fix-watermark-errors.md). From 025931cf42a560365471a1f1f9f3d05409979dc6 Mon Sep 17 00:00:00 2001 From: Stef Nestor <26751266+stefnestor@users.noreply.github.com> Date: Thu, 26 Feb 2026 11:17:38 -0700 Subject: [PATCH 2/4] feedback about ECE-vs-ECH, added ECK --- .../cloud-enterprise/node-bootlooping.md | 130 ------------------ troubleshoot/monitoring/node-bootlooping.md | 90 ++++++++---- troubleshoot/toc.yml | 5 +- 3 files changed, 64 insertions(+), 161 deletions(-) delete mode 100644 troubleshoot/deployments/cloud-enterprise/node-bootlooping.md diff --git a/troubleshoot/deployments/cloud-enterprise/node-bootlooping.md b/troubleshoot/deployments/cloud-enterprise/node-bootlooping.md deleted file mode 100644 index 60be7276d5..0000000000 --- a/troubleshoot/deployments/cloud-enterprise/node-bootlooping.md +++ /dev/null @@ -1,130 +0,0 @@ ---- -navigation_title: Node bootlooping -mapped_pages: - - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-config-change-errors.html -applies_to: - deployment: - ece: all -products: - - id: cloud-enterprise ---- - -# Troubleshoot node bootlooping in {{ece}} [ece-config-change-errors] - -When you attempt to apply a configuration change to a deployment, the attempt may fail with an error indicating that the change could not be applied, and deployment resources may be unable to restart. In some cases, bootlooping may result, where the deployment resources cycle through a continual reboot process. - -:::{image} /troubleshoot/images/cloud-ec-ce-configuration-change-failure.png -:alt: A screen capture of the deployment page showing an error: Latest change to {{es}} configuration failed. 
-::: - -To confirm if your Elasticsearch cluster is bootlooping, you can check the most recent plan under your [Deployment Activity page](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) for the error: - -```sh -Plan change failed: Some instances were unable to start properly. -``` - -Here are some frequent causes of a failed configuration change: - -* [Secure settings](#ece-config-change-errors-secure-settings) -* [Expired custom plugins or bundles](#ece-config-change-errors-expired-bundle-extension) -* [OOM errors](#ece-config-change-errors-oom-errors) -* [Existing index](#ece-config-change-errors-existing-index) -* [Insufficient storage](#ece-config-change-errors-insufficient-storage) - -If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful {{es}} configuration by performing a [no-op plan](/troubleshoot/monitoring/deployment-health-warnings.md). - -:::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md -::: - -## Secure settings [ece-config-change-errors-secure-settings] - -The most frequent cause of a failed deployment configuration change is due to invalid or mislocated [secure settings](/deploy-manage/security/secure-settings.md). -The keystore allows you to safely store sensitive settings, such as passwords, as a key/value pair. You can then access a secret value from a settings file by referencing its key. Importantly, not all settings can be stored in the keystore, and the keystore does not validate the settings that you add. Adding unsupported settings can cause {{es}} or other components to fail to restart. To check whether a setting is supported in the keystore, look for a "Secure" qualifier in the [lists of reloadable settings](/deploy-manage/security/secure-settings.md). - -The following sections detail some secure settings problems that can result in a configuration change error that can prevent a deployment from restarting. 
You might diagnose these plan failures via the logs or via their [related exit codes](/deploy-manage/maintenance/start-stop-services/start-stop-elasticsearch.md#fatal-errors) `1`, `3`, and `78`. - - -### Invalid or outdated values [ece-config-change-errors-old-values] - -The keystore does not validate any settings that you add, so invalid or outdated values are a common source of errors when you apply a configuration change to a deployment. - -To check the current set of stored settings: - -1. Open the deployment **Security** page. -2. In the **{{es}} keystore** section, check the **Security keys** list. The list is shown only if you currently have settings configured in the keystore. - -One frequent cause of errors is when settings in the keystore are no longer valid, such as when SAML settings are added for a test environment, but the settings are either not carried over or no longer valid in a production environment. - - -### Snapshot repositories [ece-config-change-errors-snapshot-repos] - -Sometimes, settings added to the keystore to connect to a snapshot repository may not be valid. When this happens, you may get an error such as `SettingsException[Neither a secret key nor a shared access token was set.]` - -For example, when adding an [Azure repository storage setting](/deploy-manage/tools/snapshot-and-restore/azure-repository.md#repository-azure-usage) such as `azure.client.default.account` to the keystore, the associated setting `azure.client.default.key` must also be added for the configuration to be valid. - - -### Third-party authentication [ece-config-change-errors-third-party-auth] - -When you configure third-party authentication, it’s important that all required configuration elements that are stored in the keystore are included in the {{es}} user settings file. 
For example, when you [create a SAML realm](/deploy-manage/users-roles/cluster-or-deployment-auth/saml.md#saml-create-realm), omitting a field such as `idp.entity_id` when that setting is present in the keystore results in a failed configuration change. - - -### Wrong location [ece-config-change-errors-wrong-location] - -In some cases, settings may accidentally be added to the keystore that should have been added to the [{{es}} user settings file](/deploy-manage/deploy/elastic-cloud/edit-stack-settings.md). It’s always a good idea to check the [lists of reloadable settings](/deploy-manage/security/secure-settings.md) to determine if a setting can be stored in the keystore. Settings that can safely be added to the keystore are flagged as `Secure`. - - -## Expired custom plugins or bundles [ece-config-change-errors-expired-bundle-extension] - -During the process of applying a configuration change, {{ecloud}} checks to determine if any [uploaded custom plugins or bundles](/deploy-manage/deploy/elastic-cloud/upload-custom-plugins-bundles.md) are expired. - -Problematic plugins produce oscillating {{es}} start-up logs like the following: - -```sh -Booting at Sun Sep 4 03:06:43 UTC 2022 -Installing user plugins. -Installing elasticsearch-analysis-izumo-master-7.10.2-20210618-28f8a97... -/app/elasticsearch.sh: line 169: [: too many arguments -Booting at Sun Sep 4 03:06:58 UTC 2022 -Installing user plugins. -Installing elasticsearch-analysis-izumo-master-7.10.2-20210618-28f8a97... 
-/app/elasticsearch.sh: line 169: [: too many arguments -``` - -Problematic bundles produce similar oscillations but their install log would appear like - -```sh -2024-11-17 15:18:02 https://found-user-plugins.s3.amazonaws.com/XXXXX/XXXXX.zip?response-content-disposition=attachment%3Bfilename%XXXXX%2F4007535947.zip&x-elastic-extension-version=1574194077471&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20241016T133214Z&X-Amz-SignedHeaders=host&X-Amz-Expires=86400&XAmz-Credential=XXXXX%2F20201016%2Fus-east-1%2Fs3%2Faws4_request&X-AmzSignature=XXXXX -``` - -Noting in example that the bundle’s expiration `X-Amz-Date=20241016T133214Z` is before than the log timestamp `2024-11-17 15:18:02` so this bundle is considered expired. - -To view any added plugins or bundles: - -1. Go to the **Features** page and open the **Extensions** tab. -2. Select any extension and then choose **Update extension** to renew it. No other changes are needed, and any associated configuration change failures should now be able to succeed. - - -## OOM errors [ece-config-change-errors-oom-errors] - -Configuration change errors can occur when there is insufficient RAM configured for a data tier. In this case, the cluster typically also shows OOM (out of memory) errors. To resolve these, you need to increase the amount of heap memory. For instances up to 64 GB of RAM, heap memory is half of the total memory allocated. For instances larger than 64 GB, the heap size is capped at 32 GB. You might also detect OOM in plan changes via their [related exit codes](/deploy-manage/maintenance/start-stop-services/start-stop-elasticsearch.md#fatal-errors) `127`, `137`, and `158`. - -You can also read our detailed blog [Managing and troubleshooting {{es}} memory](https://www.elastic.co/blog/managing-and-troubleshooting-elasticsearch-memory). 
- - -## Existing index [ece-config-change-errors-existing-index] - -In rare cases, when you attempt to upgrade the version of a deployment and the upgrade fails on the first attempt, subsequent attempts to upgrade may fail due to already existing resources. The problem may be due to the system preventing itself from overwriting existing indices, resulting in an error such as this: `Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana`. - -To resolve this: - -1. Check that you don’t need the content. -2. Run an {{es}} [Delete index request](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-delete) to remove the existing index. - - In this example, the `.kibana_2` index is the rollover of saved objects (such as Kibana visualizations or dashboards) from the original `.kibana_1` index. Since `.kibana_2` was created as part of the failed upgrade process, this index does not yet contain any pertinent data and it can safely be deleted. - -3. Retry the deployment configuration change. - - -## Insufficient storage [ece-config-change-errors-insufficient-storage] - -Configuration change errors can occur when there is insufficient disk space for a data tier. To resolve this, you need to increase the size of that tier to ensure it provides enough storage to accommodate the data in your cluster tier considering the [high watermark](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#disk-based-shard-allocation). For troubleshooting walkthrough, see [Fix watermark errors](/troubleshoot/elasticsearch/fix-watermark-errors.md). 
\ No newline at end of file diff --git a/troubleshoot/monitoring/node-bootlooping.md b/troubleshoot/monitoring/node-bootlooping.md index f44f4bcf48..05c866852f 100644 --- a/troubleshoot/monitoring/node-bootlooping.md +++ b/troubleshoot/monitoring/node-bootlooping.md @@ -1,38 +1,51 @@ --- navigation_title: Node bootlooping mapped_pages: - - https://www.elastic.co/guide/en/cloud/current/ec-config-change-errors.html + - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-config-change-errors.html - https://www.elastic.co/guide/en/cloud-heroku/current/ech-config-change-errors.html + - https://www.elastic.co/guide/en/cloud/current/ec-config-change-errors.html applies_to: deployment: - ess: all ece: all + ess: all + eck: all products: + - id: cloud-enterprise - id: cloud-hosted + - id: cloud-kubernetes --- -# Troubleshoot node bootlooping in {{ech}} [ec-config-change-errors] +# Troubleshoot node bootlooping [ec-config-change-errors] -When you attempt to apply a configuration change to a deployment, the attempt may fail with an error indicating that the change could not be applied, and deployment resources may be unable to restart. In some cases, bootlooping may result, where the deployment resources cycle through a continual reboot process. +When you attempt to apply a configuration change to a deployment, the attempt may fail with an error indicating that the change could not be applied, and deployment resources may be unable to restart. For {{ecloud}} platforms, bootlooping may result, where the deployment resources cycle through a continual reboot process. -:::{image} /troubleshoot/images/cloud-ec-ce-configuration-change-failure.png -:alt: A screen capture of the deployment page showing an error: Latest change to {{es}} configuration failed. 
-::: +* In {{ech}} and {{ece}}, this will induce a deployment warning banner like: -To help diagnose these and any other types of issues in your deployments, we recommend [setting up monitoring](/deploy-manage/monitor/stack-monitoring/ece-ech-stack-monitoring.md). Then, you can easily view your deployment health and access log files to troubleshoot this configuration failure. + :::{image} /troubleshoot/images/cloud-ec-ce-configuration-change-failure.png + :alt: A screen capture of the deployment page showing an error: Latest change to {{es}} configuration failed. + ::: -To confirm if your Elasticsearch cluster is bootlooping, you can check the most recent plan under your [Deployment Activity page](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) for the error: + To confirm if your {{es}} cluster is bootlooping, you can check the most recent plan under your [Deployment Activity page](/deploy-manage/deploy/elastic-cloud/keep-track-of-deployment-activity.md) for the error: -```sh -Plan change failed: Some instances were unable to start properly. -``` + ```sh + Plan change failed: Some instances were unable to start properly. + ``` + +* In {{eck}}, this will induce a `CrashLoopBackOff` pod state -If this occurs, correlating {{es}} logs should report: +To help diagnose these and any other types of issues in your deployments, we recommend [setting up monitoring](/deploy-manage/monitor.md). Then, you can easily view your deployment health and access log files to troubleshoot this configuration failure. + +If this occurs, correlating product logs should report `fatal exception while booting`. 
For example, {{es}} will report: ```sh fatal exception while booting Elasticsearch ``` +If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful configuration by + +* For {{ech}} and {{ece}}, [navigating Deployment > Edit > and selecting **Save**](/troubleshoot/monitoring/deployment-health-warnings.md). +* For {{eck}}, kubenetes [`apply`](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_apply/) the previously working configuration. + Following are some frequent causes of a failed configuration change: 1. [Secure settings](/troubleshoot/monitoring/node-bootlooping.md#ec-config-change-errors-secure-settings) @@ -41,7 +54,7 @@ Following are some frequent causes of a failed configuration change: 4. [Existing index](/troubleshoot/monitoring/node-bootlooping.md#ec-config-change-errors-existing-index) 5. [Insufficient Storage](/troubleshoot/monitoring/node-bootlooping.md#ec-config-change-errors-insufficient-storage) -If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful {{es}} configuration by performing a [no-op plan](/troubleshoot/monitoring/deployment-health-warnings.md). For an example, watch this [video walkthrough](https://www.youtube.com/watch?v=8MnXZ9egBbQ). +For an example, watch this [video walkthrough](https://www.youtube.com/watch?v=8MnXZ9egBbQ). :::{include} /deploy-manage/_snippets/autoops-callout-with-ech.md ::: @@ -68,25 +81,41 @@ These are settings typically added to the keystore for the purpose of: 1. Setting up third-party authentication, for example [SAML](/deploy-manage/users-roles/cluster-or-deployment-auth/saml.md), [OpenID Connect](/deploy-manage/users-roles/cluster-or-deployment-auth/openid-connect.md), or [Kerberos](/deploy-manage/users-roles/cluster-or-deployment-auth/kerberos.md). 2. Setting up a [custom repository](/deploy-manage/tools/snapshot-and-restore/elastic-cloud-hosted.md). 
-The keystore allows you to safely store sensitive settings, such as passwords, as a key/value pair. You can then access a secret value from a settings file by referencing its key. Importantly, not all settings can be stored in the keystore, and the keystore does not validate the settings that you add. Adding unsupported settings can cause {{es}} or other components to fail to restart. To check whether a setting is supported in the keystore, look for a "Secure" qualifier in the [lists of reloadable settings](/deploy-manage/security/secure-settings.md). +The keystore allows you to safely store sensitive settings, such as passwords, as a key/value pair. You can then access a secret value from a settings file by referencing its key. Importantly, not all settings can be stored in the keystore, and the keystore does not validate the settings that you add. Adding unsupported settings can cause {{es}} or other components to fail to restart. To check whether a setting is supported in the keystore, look for a "Secure" qualifier in the [lists of reloadable settings](/deploy-manage/security/secure-settings.md). Additionally, some settings take effect only when their related settings are configured at the same time; a missing setting causes fatal errors like: + +```sh +The configuration setting [...] is required +``` The following sections detail some secure settings problems that can result in a configuration change error that can prevent a deployment from restarting. You might diagnose these plan failures via the logs or via their [related exit codes](/deploy-manage/maintenance/start-stop-services/start-stop-elasticsearch.md#fatal-errors) `1`, `3`, and `78`. +:::{tip} +If you configure these settings via a client tool, such as the [Terraform Provider for Elastic Cloud](https://github.com/elastic/terraform-provider-ec), or through an API and encounter the error, try configuring the settings directly in the Cloud UI to isolate the cause.
If configuring in the Cloud UI does not result in the same error, it suggests that the keystore setting is valid, and the method of configuration should be examined. Conversely, if the same error is reported, it suggests that the keystore setting may be invalid and should be reviewed. +::: + ### Invalid or outdated values [ec-config-change-errors-old-values] The keystore does not validate any settings that you add, so invalid or outdated values are a common source of errors when you apply a configuration change to a deployment. To check the current set of stored settings: -1. Open the deployment **Security** page. -2. In the **{{es}} keystore** section, check the **Security keys** list. The list is shown only if you currently have settings configured in the keystore. +* For {{ech}} or {{ece}}: + + 1. Open the deployment **Security** page. + 2. In the **{{es}} keystore** section, check the **Security keys** list. The list is shown only if you currently have settings configured in the keystore. + +* For {{eck}}, check your [secure settings](/deploy-manage/security/k8s-secure-settings.md). One frequent cause of errors is when settings in the keystore are no longer valid, such as when SAML settings are added for a test environment, but the settings are either not carried over or no longer valid in a production environment. ### Snapshot repositories [ec-config-change-errors-snapshot-repos] -Sometimes, settings added to the keystore to connect to a snapshot repository may not be valid. When this happens, you may get an error such as `SettingsException[Neither a secret key nor a shared access token was set.]` +Sometimes, settings added to the keystore to connect to a snapshot repository may not be valid. When this happens, you may get an error such as + +```sh +SettingsException[Neither a secret key nor a shared access token was set.] 
+``` For example, when adding an [Azure repository storage setting](/deploy-manage/tools/snapshot-and-restore/azure-repository.md#repository-azure-usage) such as `azure.client.default.account` to the keystore, the associated setting `azure.client.default.key` must also be added for the configuration to be valid. @@ -100,16 +129,15 @@ When you configure third-party authentication, it’s important that all require In some cases, settings may accidentally be added to the keystore that should have been added to the [{{es}} user settings file](/deploy-manage/deploy/elastic-cloud/edit-stack-settings.md). It’s always a good idea to check the [lists of reloadable settings](/deploy-manage/security/secure-settings.md) to determine if a setting can be stored in the keystore. Settings that can safely be added to the keystore are flagged as `Secure`. -### Missing or improperly configured - -The error message `The configuration setting [...] is required` indicates that the corresponding setting is configured and present in the Elasticsearch instance via [Elasticsearch user settings](/deploy-manage/deploy/elastic-cloud/edit-stack-settings.md#ec-add-user-settings), but is either missing or improperly configured in [secure settings](/deploy-manage/security/secure-settings.md). Please review your [secure settings](/deploy-manage/security/secure-settings.md) to ensure they are configured correctly. - -Additionally, if you configure these settings via a client tool, such as the [Terraform Provider for Elastic Cloud](https://github.com/elastic/terraform-provider-ec), or through an API and encounter the error, try configuring the settings directly in the Cloud UI to isolate the cause. If configuring in the Cloud UI does not result in the same error, it suggests that the keystore setting is valid, and the method of configuration should be examined. Conversely, if the same error is reported, it suggests that the keystore setting may be invalid and should be reviewed. 
- - ## Expired custom plugins or bundles [ec-config-change-errors-expired-bundle-extension] +```{applies_to} +deployment: + ess: ga + ece: ga +``` + During the process of applying a configuration change, {{ecloud}} checks to determine if any [uploaded custom plugins or bundles](/deploy-manage/deploy/elastic-cloud/upload-custom-plugins-bundles.md) are expired. Problematic plugins produce oscillating {{es}} start-up logs like the following: @@ -135,7 +163,7 @@ Noting in example that the bundle’s expiration `X-Amz-Date=20241016T133214Z` i To view any added plugins or bundles: -1. From your deployment's lower navigation menu, select **Extensions**. +1. For {{ech}}, from your deployment's lower navigation menu, select **Extensions**. For {{ece}}, go to the **Features** page and open the **Extensions** tab. 2. Select any extension and then choose **Update extension** to renew it. No other changes are needed, and any associated configuration change failures should now be able to succeed. @@ -143,14 +171,18 @@ To view any added plugins or bundles: Configuration change errors can occur when there is insufficient RAM configured for a data tier. In this case, the cluster typically also shows OOM (out of memory) errors. To resolve these, you need to increase the amount of heap memory. For instances up to 64 GB of RAM, heap memory is half of the total memory allocated. For instances larger than 64 GB, the heap size is capped at 32 GB. You might also detect OOM in plan changes via their [related exit codes](/deploy-manage/maintenance/start-stop-services/start-stop-elasticsearch.md#fatal-errors) `127`, `137`, and `158`. -Check the [{{es}} cluster size](/deploy-manage/deploy/elastic-cloud/ec-customize-deployment-components.md#ec-cluster-size) and the [JVM memory pressure indicator](/deploy-manage/monitor/ec-memory-pressure.md) documentation to learn more. 
+Refer to the [High JVM memory pressure](/troubleshoot/elasticsearch/high-jvm-memory-pressure.md) documentation for more troubleshooting guidance. You can also read our detailed blog [Managing and troubleshooting {{es}} memory](https://www.elastic.co/blog/managing-and-troubleshooting-elasticsearch-memory). ## Existing index [ec-config-change-errors-existing-index] -In rare cases, when you attempt to upgrade the version of a deployment and the upgrade fails on the first attempt, subsequent attempts to upgrade may fail due to already existing resources. The problem may be due to the system preventing itself from overwriting existing indices, resulting in an error such as this: `Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana`. +In rare cases, when you attempt to upgrade the version of a deployment and the upgrade fails on the first attempt, subsequent attempts to upgrade may fail due to already existing resources. The problem may be due to the system preventing itself from overwriting existing indices, resulting in an error such as this: + +```sh +Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana +``` To resolve this: @@ -162,6 +194,6 @@ To resolve this: 3. Retry the deployment configuration change. -## Insufficient Storage [ec-config-change-errors-insufficient-storage] +## Insufficient storage [ec-config-change-errors-insufficient-storage] Configuration change errors can occur when there is insufficient disk space for a data tier. 
To resolve this, you need to increase the size of that tier to ensure it provides enough storage to accommodate the data in your cluster tier considering the [high watermark](elasticsearch://reference/elasticsearch/configuration-reference/cluster-level-shard-allocation-routing-settings.md#disk-based-shard-allocation). For troubleshooting walkthrough, see [Fix watermark errors](/troubleshoot/elasticsearch/fix-watermark-errors.md). diff --git a/troubleshoot/toc.yml b/troubleshoot/toc.yml index 0634da8e23..df8c8078ba 100644 --- a/troubleshoot/toc.yml +++ b/troubleshoot/toc.yml @@ -213,7 +213,7 @@ toc: - file: deployments/cloud-enterprise/verify-zookeeper-sync-status.md - file: deployments/cloud-enterprise/rebuilding-broken-zookeeper-quorum.md - file: deployments/cloud-enterprise/deployment-health-warnings.md - - file: deployments/cloud-enterprise/node-bootlooping.md + - file: monitoring/node-bootlooping.md - file: deployments/cloud-enterprise/run-ece-diagnostics-tool.md - file: deployments/cloud-enterprise/heap-dumps.md - file: deployments/cloud-enterprise/thread-dumps.md @@ -222,4 +222,5 @@ toc: - file: deployments/cloud-on-k8s/common-problems.md - file: deployments/cloud-on-k8s/troubleshooting-methods.md - file: deployments/cloud-on-k8s/run-eck-diagnostics.md - - file: deployments/cloud-on-k8s/jvm-heap-dumps.md \ No newline at end of file + - file: deployments/cloud-on-k8s/jvm-heap-dumps.md + - file: monitoring/node-bootlooping.md \ No newline at end of file From 79903aec0c76201e11f8edabf7e270aee78b1fa2 Mon Sep 17 00:00:00 2001 From: Marci W <333176+marciw@users.noreply.github.com> Date: Fri, 27 Feb 2026 19:02:58 -0500 Subject: [PATCH 3/4] Add redirect for deleted page --- redirects.yml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/redirects.yml b/redirects.yml index 5c4add9fc1..fb55eabeb9 100644 --- a/redirects.yml +++ b/redirects.yml @@ -701,3 +701,6 @@ redirects: # Related to https://github.com/elastic/docs-content/pull/5033 
'solutions/observability/observability-ai-assistant.md': 'solutions/observability/ai/observability-ai-assistant.md' 'solutions/observability/llm-performance-matrix.md': 'solutions/observability/ai/llm-performance-matrix.md' + +# Related to https://github.com/elastic/docs-content/pull/5222 + 'troubleshoot/deployments/cloud-enterprise/node-bootlooping.md': 'troubleshoot/monitoring/node-bootlooping.md' \ No newline at end of file From a6b31f1732d7c9898149c1751089807378ba7e29 Mon Sep 17 00:00:00 2001 From: Stef Nestor <26751266+stefnestor@users.noreply.github.com> Date: Wed, 4 Mar 2026 12:12:42 -0700 Subject: [PATCH 4/4] =?UTF-8?q?feedback=20=F0=9F=99=8F?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Marci W <333176+marciw@users.noreply.github.com> --- troubleshoot/monitoring/node-bootlooping.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/troubleshoot/monitoring/node-bootlooping.md b/troubleshoot/monitoring/node-bootlooping.md index 05c866852f..916dceb5bf 100644 --- a/troubleshoot/monitoring/node-bootlooping.md +++ b/troubleshoot/monitoring/node-bootlooping.md @@ -17,9 +17,9 @@ products: # Troubleshoot node bootlooping [ec-config-change-errors] -When you attempt to apply a configuration change to a deployment, the attempt may fail with an error indicating that the change could not be applied, and deployment resources may be unable to restart. For {{ecloud}} platforms, bootlooping may result, where the deployment resources cycle through a continual reboot process. +When you try to apply a configuration change to a deployment, an error might appear and resources might not restart. This can result in _bootlooping_, where the deployment resources cycle through a continual reboot process. 
-* In {{ech}} and {{ece}}, this will induce a deployment warning banner like: +* In {{ech}} and {{ece}}, a deployment health warning appears: :::{image} /troubleshoot/images/cloud-ec-ce-configuration-change-failure.png :alt: A screen capture of the deployment page showing an error: Latest change to {{es}} configuration failed. ::: @@ -31,7 +31,7 @@ When you attempt to apply a configuration ma Plan change failed: Some instances were unable to start properly. ``` -* In {{eck}}, this will induce a `CrashLoopBackOff` pod state +* In {{eck}}, a `CrashLoopBackOff` pod state occurs. To help diagnose these and any other types of issues in your deployments, we recommend [setting up monitoring](/deploy-manage/monitor.md). Then, you can easily view your deployment health and access log files to troubleshoot this configuration failure. @@ -41,10 +41,10 @@ If this occurs, correlating product logs should report `fatal exception while bo fatal exception while booting Elasticsearch ``` -If you’re unable to remediate the failing plan’s root cause, you can attempt to reset the deployment to the latest successful configuration by +If you can't determine the root cause, you can try to reset the deployment to the latest successful configuration: -* For {{ech}} and {{ece}}, [navigating Deployment > Edit > and selecting **Save**](/troubleshoot/monitoring/deployment-health-warnings.md). -* For {{eck}}, kubenetes [`apply`](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_apply/) the previously working configuration. +* For {{ech}} and {{ece}}, select **Edit** on the deployment page, then **Save** without making any changes. +* For {{eck}}, use [`kubectl apply`](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_apply/) to reapply the previously working configuration. Following are some frequent causes of a failed configuration change:
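The expired-bundle check that the doc describes — comparing the presigned URL's `X-Amz-Date` (plus its `X-Amz-Expires` validity window) against the boot-log timestamp — can be sketched as follows. This is a rough illustration only, using a simplified hypothetical URL and the log timestamp from the example above, and it assumes GNU coreutils `date`:

```shell
# Sketch: was an uploaded bundle's presigned URL already expired at the
# time of the boot log line? (Hypothetical, simplified URL for illustration.)
url='https://found-user-plugins.s3.amazonaws.com/example/bundle.zip?X-Amz-Date=20241016T133214Z&X-Amz-Expires=86400'
log_ts='2024-11-17 15:18:02'

# Pull the signing time and validity window out of the query string.
amz_date=$(printf '%s\n' "$url" | grep -o 'X-Amz-Date=[0-9TZ]*' | cut -d= -f2)
expires=$(printf '%s\n' "$url" | grep -o 'X-Amz-Expires=[0-9]*' | cut -d= -f2)

# Reformat 20241016T133214Z -> "2024-10-16 13:32:14" so `date -d` can parse it.
signed=$(printf '%s\n' "$amz_date" |
  sed -E 's/([0-9]{4})([0-9]{2})([0-9]{2})T([0-9]{2})([0-9]{2})([0-9]{2})Z/\1-\2-\3 \4:\5:\6/')
signed_epoch=$(date -u -d "$signed" +%s)
log_epoch=$(date -u -d "$log_ts" +%s)

# The URL expires X-Amz-Expires seconds after X-Amz-Date.
if [ "$log_epoch" -gt $((signed_epoch + expires)) ]; then
  verdict="expired"
else
  verdict="still valid"
fi
echo "bundle is $verdict"
```

On macOS/BSD, `date -d` is unavailable; substitute `date -j -f` with an explicit format string.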