fix(container): update rook-ceph group ( v1.19.5 → v1.19.6 ) #1346
Open
renovate[bot] wants to merge 1 commit into
--- kubernetes/apps/rook-ceph/rook-ceph/app Kustomization: rook-ceph/rook-ceph OCIRepository: rook-ceph/rook-ceph
+++ kubernetes/apps/rook-ceph/rook-ceph/app Kustomization: rook-ceph/rook-ceph OCIRepository: rook-ceph/rook-ceph
@@ -10,9 +10,9 @@
spec:
interval: 5m
layerSelector:
mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
operation: copy
ref:
- tag: v1.19.5
+ tag: v1.19.6
url: oci://ghcr.io/rook/rook-ceph
--- kubernetes/apps/rook-ceph/rook-ceph/cluster Kustomization: rook-ceph/rook-ceph-cluster OCIRepository: rook-ceph/rook-ceph-cluster
+++ kubernetes/apps/rook-ceph/rook-ceph/cluster Kustomization: rook-ceph/rook-ceph-cluster OCIRepository: rook-ceph/rook-ceph-cluster
@@ -10,9 +10,9 @@
spec:
interval: 5m
layerSelector:
mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
operation: copy
ref:
- tag: v1.19.5
+ tag: v1.19.6
url: oci://ghcr.io/rook/rook-ceph-cluster
--- HelmRelease: rook-ceph/rook-ceph-cluster CephCluster: rook-ceph/rook-ceph
+++ HelmRelease: rook-ceph/rook-ceph-cluster CephCluster: rook-ceph/rook-ceph
@@ -89,12 +89,18 @@
cleanup:
limits:
memory: 1Gi
requests:
cpu: 500m
memory: 100Mi
+ cmd-reporter:
+ limits:
+ memory: 1Gi
+ requests:
+ cpu: 500m
+ memory: 100Mi
crashcollector:
limits:
memory: 60Mi
requests:
cpu: 100m
memory: 60Mi
--- HelmRelease: rook-ceph/rook-ceph-cluster PrometheusRule: rook-ceph/prometheus-ceph-rules
+++ HelmRelease: rook-ceph/rook-ceph-cluster PrometheusRule: rook-ceph/prometheus-ceph-rules
@@ -10,63 +10,72 @@
spec:
groups:
- name: cluster health
rules:
- alert: CephHealthError
annotations:
- description: The cluster state has been HEALTH_ERROR for more than 5 minutes.
- Please check 'ceph health detail' for more information.
- summary: Ceph is in the ERROR state
+ description: The cluster state has been HEALTH_ERROR for more than 5 minutes
+ on cluster {{ $labels.cluster }}. Please check 'ceph health detail' for
+ more information.
+ summary: Ceph is in the ERROR state on cluster {{ $labels.cluster }}
expr: ceph_health_status == 2
for: 5m
labels:
oid: 1.3.6.1.4.1.50495.1.2.1.2.1
severity: critical
type: ceph_default
- alert: CephHealthWarning
annotations:
- description: The cluster state has been HEALTH_WARN for more than 15 minutes.
- Please check 'ceph health detail' for more information.
- summary: Ceph is in the WARNING state
+ description: The cluster state has been HEALTH_WARN for more than 15 minutes
+ on cluster {{ $labels.cluster }}. Please check 'ceph health detail' for
+ more information.
+ summary: Ceph is in the WARNING state on cluster {{ $labels.cluster }}
expr: ceph_health_status == 1
for: 15m
labels:
severity: warning
type: ceph_default
- name: mon
rules:
- alert: CephMonDownQuorumAtRisk
annotations:
- description: '{{ $min := query "floor(count(ceph_mon_metadata) / 2) + 1" |
- first | value }}Quorum requires a majority of monitors (x {{ $min }}) to
- be active. Without quorum the cluster will become inoperable, affecting
- all services and connected clients. The following monitors are down: {{-
- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname)
- (ceph_mon_metadata * 0)" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname
+ description: '{{ $min := printf "floor(count(ceph_mon_metadata{cluster=''%s''})
+ / 2) + 1" .Labels.cluster | query | first | value }}Quorum requires a majority
+ of monitors (x {{ $min }}) to be active. Without quorum the cluster will
+ become inoperable, affecting all services and connected clients. The following
+ monitors are down: {{- range printf "(ceph_mon_quorum_status{cluster=''%s''}
+ == 0) + on(cluster,ceph_daemon) group_left(hostname) (ceph_mon_metadata
+ * 0)" .Labels.cluster | query }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname
}} {{- end }}'
documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down
- summary: Monitor quorum is at risk
+ summary: Monitor quorum is at risk on cluster {{ $labels.cluster }}
expr: |
(
- (ceph_health_detail{name="MON_DOWN"} == 1) * on() (
- count(ceph_mon_quorum_status == 1) == bool (floor(count(ceph_mon_metadata) / 2) + 1)
+ (ceph_health_detail{name="MON_DOWN"} == 1) * on() group_right(cluster) (
+ count(ceph_mon_quorum_status == 1) by(cluster)== bool (floor(count(ceph_mon_metadata) by(cluster) / 2) + 1)
)
) == 1
for: 30s
labels:
oid: 1.3.6.1.4.1.50495.1.2.1.3.1
severity: critical
type: ceph_default
- alert: CephMonDown
annotations:
- description: |
- {{ $down := query "count(ceph_mon_quorum_status == 0)" | first | value }}{{ $s := "" }}{{ if gt $down 1.0 }}{{ $s = "s" }}{{ end }}You have {{ $down }} monitor{{ $s }} down. Quorum is still intact, but the loss of an additional monitor will make your cluster inoperable. The following monitors are down: {{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}
+ description: '{{ $down := printf "count(ceph_mon_quorum_status{cluster=''%s''}
+ == 0)" .Labels.cluster | query | first | value }}{{ $s := "" }}{{ if gt
+ $down 1.0 }}{{ $s = "s" }}{{ end }}You have {{ $down }} monitor{{ $s }}
+ down. Quorum is still intact, but the loss of an additional monitor will
+ make your cluster inoperable. The following monitors are down: {{- range
+ printf "(ceph_mon_quorum_status{cluster=''%s''} == 0) + on(cluster,ceph_daemon)
+ group_left(hostname) (ceph_mon_metadata * 0)" .Labels.cluster | query }}
+ - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end }}'
documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-down
- summary: One or more monitors down
+ summary: One or more monitors down on cluster {{ $labels.cluster }}
expr: |
- count(ceph_mon_quorum_status == 0) <= (count(ceph_mon_metadata) - floor(count(ceph_mon_metadata) / 2) + 1)
+ (count by (cluster) (ceph_mon_quorum_status == 0)) <= (count by (cluster) (ceph_mon_metadata) - floor((count by (cluster) (ceph_mon_metadata) / 2 + 1)))
for: 30s
labels:
severity: warning
type: ceph_default
- alert: CephMonDiskspaceCritical
annotations:
@@ -76,13 +85,14 @@
on the mon pod's worker node for Rook. Look for old, rotated versions of
*.log and MANIFEST*. Do NOT touch any *.sst files. Also check any other
directories under /var/lib/rook and other directories on the same filesystem,
often /var/log and /var/tmp are culprits. Your monitor hosts are; {{- range
query "ceph_mon_metadata"}} - {{ .Labels.hostname }} {{- end }}
documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-crit
- summary: Filesystem space on at least one monitor is critically low
+ summary: Filesystem space on at least one monitor is critically low on cluster
+ {{ $labels.cluster }}
expr: ceph_health_detail{name="MON_DISK_CRIT"} == 1
for: 1m
labels:
oid: 1.3.6.1.4.1.50495.1.2.1.3.2
severity: critical
type: ceph_default
@@ -95,13 +105,14 @@
node for Rook. Look for old, rotated versions of *.log and MANIFEST*. Do
NOT touch any *.sst files. Also check any other directories under /var/lib/rook
and other directories on the same filesystem, often /var/log and /var/tmp
are culprits. Your monitor hosts are; {{- range query "ceph_mon_metadata"}}
- {{ .Labels.hostname }} {{- end }}
documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-low
- summary: Drive space on at least one monitor is approaching full
+ summary: Drive space on at least one monitor is approaching full on cluster
+ {{ $labels.cluster }}
expr: ceph_health_detail{name="MON_DISK_LOW"} == 1
for: 5m
labels:
severity: warning
type: ceph_default
- alert: CephMonClockSkew
@@ -110,53 +121,62 @@
and cluster consistency. This event indicates that the time on at least
one mon has drifted too far from the lead mon. Review cluster status with
ceph -s. This will show which monitors are affected. Check the time sync
status on each monitor host with 'ceph time-sync-status' and the state and
peers of your ntpd or chrony daemon.
documentation: https://docs.ceph.com/en/latest/rados/operations/health-checks#mon-clock-skew
- summary: Clock skew detected among monitors
+ summary: Clock skew detected among monitors on cluster {{ $labels.cluster
+ }}
expr: ceph_health_detail{name="MON_CLOCK_SKEW"} == 1
for: 1m
labels:
severity: warning
type: ceph_default
- name: osd
rules:
- alert: CephOSDDownHigh
annotations:
- description: '{{ $value | humanize }}% or {{ with query "count(ceph_osd_up
- == 0)" }}{{ . | first | value }}{{ end }} of {{ with query "count(ceph_osd_up)"
+ description: '{{ $value | humanize }}% or {{ with printf "count (ceph_osd_up{cluster=''%s''}
+ == 0)" .Labels.cluster | query }}{{ . | first | value }}{{ end }} of {{
+ with printf "count (ceph_osd_up{cluster=''%s''})" .Labels.cluster | query
}}{{ . | first | value }}{{ end }} OSDs are down (>= 10%). The following
- OSDs are down: {{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname)
- ceph_osd_metadata) == 0" }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname
+ OSDs are down: {{- range printf "(ceph_osd_up{cluster=''%s''} * on(cluster,
+ ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0" .Labels.cluster
+ | query }} - {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }} {{- end
+ }}'
+ summary: More than 10% of OSDs are down on cluster {{ $labels.cluster }}
+ expr: count by (cluster) (ceph_osd_up == 0) / count by (cluster) (ceph_osd_up)
+ * 100 >= 10
+ labels:
+ oid: 1.3.6.1.4.1.50495.1.2.1.4.1
+ severity: critical
+ type: ceph_default
+ - alert: CephOSDHostDown
+ annotations:
+ description: 'The following OSDs are down: {{- range printf "(ceph_osd_up{cluster=''%s''}
+ * on(cluster,ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0"
+ .Labels.cluster | query }} - {{ .Labels.hostname }} : {{ .Labels.ceph_daemon
}} {{- end }}'
- summary: More than 10% of OSDs are down
- expr: count(ceph_osd_up == 0) / count(ceph_osd_up) * 100 >= 10
- for: 5m
- labels:
- oid: 1.3.6.1.4.1.50495.1.2.1.4.1
- severity: critical
- type: ceph_default
- - alert: CephOSDHostDown
- annotations:
[Diff truncated by flux-local]
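The CephMonDown change above moves a parenthesis as well as adding `by (cluster)`. The following is a minimal Python sketch of the arithmetic on both sides of that diff, not part of Rook or Prometheus. The function names and the cluster names `a`, `b`, `c` are hypothetical, and the per-cluster counts stand in for the results of `count by (cluster) (...)`.

```python
import math


def old_threshold(total_mons: int) -> int:
    # Removed expression: count - floor(count / 2) + 1
    return total_mons - math.floor(total_mons / 2) + 1


def new_threshold(total_mons: int) -> int:
    # Added expression: count - floor(count / 2 + 1)
    return total_mons - math.floor(total_mons / 2 + 1)


def mon_down_fires(down_by_cluster, total_by_cluster):
    """Return {cluster: down_count} where the new comparison holds.

    down_by_cluster models count by (cluster) (ceph_mon_quorum_status == 0),
    total_by_cluster models count by (cluster) (ceph_mon_metadata).
    """
    firing = {}
    for cluster, n_down in down_by_cluster.items():
        # A count over an empty vector yields no sample, so a cluster
        # with nothing down never reaches the comparison.
        if n_down == 0 or cluster not in total_by_cluster:
            continue
        if n_down <= new_threshold(total_by_cluster[cluster]):
            firing[cluster] = n_down
    return firing
```

Under these assumptions, a three-monitor cluster has an old threshold of 3 and a new threshold of 1, so the old comparison held for any number of down monitors while the new one holds only while a single monitor is down.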
--- HelmRelease: rook-ceph/rook-ceph-operator Deployment: rook-ceph/rook-ceph-operator
+++ HelmRelease: rook-ceph/rook-ceph-operator Deployment: rook-ceph/rook-ceph-operator
@@ -28,13 +28,13 @@
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 5
containers:
- name: rook-ceph-operator
- image: ghcr.io/rook/ceph:v1.19.5
+ image: ghcr.io/rook/ceph:v1.19.6
imagePullPolicy: IfNotPresent
args:
- ceph
- operator
securityContext:
capabilities:
This PR contains the following updates:
ghcr.io/rook/rook-ceph: v1.19.5 → v1.19.6
ghcr.io/rook/rook-ceph-cluster: v1.19.5 → v1.19.6
Release Notes
rook/rook (ghcr.io/rook/rook-ceph)
v1.19.6 (Compare Source)
Improvements
Rook v1.19.6 is a patch release limited in scope and focusing on feature additions and bug fixes to the Ceph operator.
cluster label to all scraped metrics (#17544, @jhoblitt)
Configuration
📅 Schedule: (in timezone America/Los_Angeles)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about these updates again.
This PR was generated by Mend Renovate. View the repository job log.
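To illustrate why the `cluster` label from the release notes matters for the CephOSDDownHigh expression in the diff, here is a small Python sketch that compares an aggregation without labels against one grouped by cluster. It only models the arithmetic of `count by (cluster) (ceph_osd_up == 0) / count by (cluster) (ceph_osd_up) * 100 >= 10`; the function name, the cluster names `east` and `west`, and the sample data are hypothetical.

```python
from collections import defaultdict


def osd_down_high(osd_up, by_cluster, threshold=10.0):
    """Return {group: percent_down} for groups at or above the threshold.

    osd_up maps (cluster, ceph_daemon) -> 1 (up) or 0 (down).
    """
    down = defaultdict(int)
    total = defaultdict(int)
    for (cluster, _daemon), up in osd_up.items():
        # "" models an aggregation that drops every label (the old rule).
        key = cluster if by_cluster else ""
        total[key] += 1
        if up == 0:
            down[key] += 1
    result = {}
    # Groups with no down OSDs have no entry in `down`, mirroring a count
    # over an empty vector producing no sample.
    for key, n_down in down.items():
        percent = 100 * n_down / total[key]
        if percent >= threshold:
            result[key] = percent
    return result


# Hypothetical data: "east" has 2 of 10 OSDs down, "west" has 30 healthy OSDs.
samples = {("east", f"osd.{i}"): (0 if i < 2 else 1) for i in range(10)}
samples.update({("west", f"osd.{i}"): 1 for i in range(30)})
```

With this data, `osd_down_high(samples, by_cluster=False)` sees 2 of 40 OSDs down (5%) and returns nothing, while `osd_down_high(samples, by_cluster=True)` reports `east` at 20%. This only applies when one Prometheus scrapes more than one Ceph cluster.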