Adding runbook for 'CephXattrSetLatency' alert #364

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from aruniiird:add-cephXattrSetLatency-runbook-file
Feb 10, 2026

@aruniiird
Contributor

No description provided.

@aruniiird
Contributor Author

/retest-required

@aruniiird aruniiird force-pushed the add-cephXattrSetLatency-runbook-file branch from 768548e to 24d67a2 on January 19, 2026 09:31
@agarwal-mudit

@aruniiird

  1. Who should review this PR? Please reach out to them.
  2. You need to fix the lint error.

@aruniiird aruniiird force-pushed the add-cephXattrSetLatency-runbook-file branch from 24d67a2 to ca3a6d4 on February 2, 2026 13:27
@aruniiird
Contributor Author

@weirdwiz, can you please take a look?

@aruniiird
Contributor Author

@Rakshith-R, can you please take a look at this runbook doc?


## Additional Resources

- [Ceph MDS Troubleshooting](https://docs.ceph.com/en/latest/cephfs/troubleshooting/)
Member

We should not use upstream links in our downstream product runbooks, should we?

Contributor Author

Done

Member

@BlaineEXE BlaineEXE left a comment

This is a good, concise summary of triage steps for customers. Customers with high-quality admins should be able to self-support with this doc, and customers still having difficulties will have good additional info to bring to support requests. No concerns adding this.

@aruniiird aruniiird force-pushed the add-cephXattrSetLatency-runbook-file branch from ca3a6d4 to 859c13f on February 6, 2026 05:11
@openshift-ci openshift-ci Bot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Feb 6, 2026
@aruniiird
Contributor Author

@agarwal-mudit, the PR is good to go now. Please check.
Thanks

Member

@Rakshith-R Rakshith-R left a comment

We should be using the odf CLI where possible.

@subham.kumar.rai ^

I don't think users will be able to understand anything from the outputs of step 2 and step 6.

@aruniiird, you have used links to other runbooks here. Will users be directed to the GitHub repo instead of viewing the runbook on the console?

@aruniiird aruniiird force-pushed the add-cephXattrSetLatency-runbook-file branch from 859c13f to e6b8141 on February 6, 2026 12:59
@aruniiird
Contributor Author

aruniiird commented Feb 10, 2026

@Rakshith-R

> We should be using the odf CLI where possible.
>
> @subham.kumar.rai ^

This should be done in all the other applicable runbooks as well. I will track it as a separate issue.
PS: We also have to check in which ODF version the odf CLI was introduced and add notes accordingly.

> I don't think users will be able to understand anything from the outputs of step 2 and step 6.

Thanks for the feedback on the audience level.
If you are referring to Step 2: Check MDS Performance Metrics and Step 6: Check for Lock Contention, I included those steps primarily for admins who are familiar with memory/process dump commands, and as a way to gather diagnostic data for our support team.

Rather than removing them entirely, perhaps we could reframe them as optional (or even Advanced) diagnostic steps? This keeps the guide useful for experts without confusing beginners. However, if you believe these specific metrics won't lead to a resolution, let me know and I'll gladly take them out.

> @aruniiird, you have used links to other runbooks here. Will users be directed to the GitHub repo instead of viewing the runbook on the console?

All our alerts have a runbook_url section, which points to this GitHub repo.
Example: https://github.com/red-hat-storage/ocs-operator/blob/main/metrics/deploy/prometheus-ocs-rules.yaml
It is standard practice to link to an existing runbook if the mitigation steps are already documented elsewhere.
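
For readers unfamiliar with how an alert carries its runbook link, here is a minimal sketch. It assumes a rule set already parsed into a Python dictionary shaped like a Prometheus rule group. The group name, expression, recording rule, and `example.com` URL are hypothetical placeholders, not the contents of the file linked above, and `runbook_urls` is an illustrative helper rather than part of any tool.

```python
# Hypothetical rule group, shaped like a parsed Prometheus rules file.
# The expression and URL are placeholders, not the real alert definition.
RULES = {
    "groups": [
        {
            "name": "example-mds-alerts",
            "rules": [
                {
                    "alert": "CephXattrSetLatency",
                    "expr": "placeholder_latency_metric > 0",
                    "annotations": {
                        "runbook_url": "https://example.com/runbooks/CephXattrSetLatency.md",
                    },
                },
                # Recording rules have no "alert" key and no runbook.
                {"record": "example:recording_rule", "expr": "vector(1)"},
            ],
        }
    ],
}


def runbook_urls(rules: dict) -> dict:
    """Map each alert name to its runbook_url annotation (None if missing)."""
    urls = {}
    for group in rules.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # skip recording rules
            urls[rule["alert"]] = rule.get("annotations", {}).get("runbook_url")
    return urls


if __name__ == "__main__":
    for alert, url in runbook_urls(RULES).items():
        print(f"{alert}: {url or 'no runbook_url set'}")
```

A check like this could flag alerts that lack a runbook link, or links that point somewhere other than the expected repository.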

Signed-off-by: Arun Kumar Mohan <amohan@redhat.com>
@aruniiird aruniiird force-pushed the add-cephXattrSetLatency-runbook-file branch from e6b8141 to 2c8a393 on February 10, 2026 05:26
@openshift-ci
Contributor

openshift-ci Bot commented Feb 10, 2026

@aruniiird: all tests passed!

@agarwal-mudit agarwal-mudit left a comment

/lgtm

@openshift-ci openshift-ci Bot added the "lgtm" label (Indicates that a PR is ready to be merged.) on Feb 10, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Feb 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: agarwal-mudit, aruniiird, BlaineEXE

@openshift-merge-bot openshift-merge-bot Bot merged commit 27b7839 into openshift:master Feb 10, 2026
2 checks passed

4 participants