Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Cleanup protectedPVCs that are stale
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com> Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
Fixes: RamenDR#1200 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
Also bump github.com/golang/protobuf to 1.5.4 To address related security vulnerability, see: - https://www.cve.org/CVERecord?id=CVE-2024-24786 - https://groups.google.com/g/golang-announce/c/ArQ6CDgtEjY/m/oLMrdq_GBQAJ Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
By default, clusteradm installs the latest release. Extract a BUNDLE_VERSION constant to allow specifying a specific ocm version. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To test ocm changes, we need to build and push the images to a private image repository. When deploying we can use the new IMAGE_REGISTRY= constant to specify the image registry. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This version[1] pulls ocm 0.13.1[2], fixing auto approval failures after joining the hub. [1] https://github.com/open-cluster-management-io/clusteradm/releases/tag/v0.8.1 [2] https://github.com/open-cluster-management-io/ocm/releases/tag/v0.13.1 Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Ignore some flake8 rules conflicting with black code style so we can use automatic formatting. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Use the same configuration used in the full environment. This should make testing the minimal environment closer to the full one. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The Rook quick start guide recommends checking `ceph status` using the toolbox after deploying the cluster[1]. Move the rook ceph toolbox before the rook pool, so we can validate the cluster status. [1] https://rook.io/docs/rook/latest/Getting-Started/quickstart/#create-a-ceph-cluster Signed-off-by: Nir Soffer <nsoffer@redhat.com>
To make sure that we wait correctly for the cluster. On the next failure the cluster status will be logged. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This may help debug issues with ceph, and also validates that the toolbox works. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Hopefully this will make it easier to debug random failures in rbd-mirror. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We add a cephrbdmirror resource, but we don't wait until it is reconciled and becomes ready. Wait and log the resource status to make debugging easier on the next random timeout. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Hopefully this will help to debug the issue when we have the next random timeout in the rbd-mirror test. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We have a random failure timing out waiting for rbd-mirror. One possible reason may be a bad ceph blocklist blocking rbd-daemon. Log the ceph osd blocklist before we wait for rbd daemon. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Previously we tried to wait for all deployments when starting a running minikube profile. This works most of the time, but fails if a deployment is in a failed state (Progressing=False). Fix by restarting all the failed deployments. We don't wait until they are rolled out again, since the addons already wait for the deployments. I could reproduce the issue once with rook-ceph-operator, and restarting the deployment fixed it. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
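The selection step described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names `failed_deployments` and `restart_commands` are hypothetical, and the input is assumed to be the output of `kubectl get deploy --all-namespaces -o json`.

```python
import json


def failed_deployments(deployments_json):
    """Return (namespace, name) for every deployment with Progressing=False."""
    failed = []
    for item in json.loads(deployments_json)["items"]:
        for cond in item.get("status", {}).get("conditions", []):
            if cond["type"] == "Progressing" and cond["status"] == "False":
                meta = item["metadata"]
                failed.append((meta["namespace"], meta["name"]))
    return failed


def restart_commands(failed, context):
    """Build one `kubectl rollout restart` command per failed deployment."""
    return [
        ["kubectl", "rollout", "restart", f"deploy/{name}",
         "--namespace", namespace, "--context", context]
        for namespace, name in failed
    ]
```

The commands are only built here, not executed, and nothing waits for the rollout, matching the note that the addons already wait for their deployments.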
Signed-off-by: Abhijeet Shakya <abhijeetshakya21@gmail.com>
Previously limited to 1 worker per cluster due to various issues. Since the issues are fixed now, we can remove this limit. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Create the test structure upfront instead of building it in write_output. This will make it easier to add more info. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When comparing runs we want to make sure we tested the same code.
Normally you don't update the code during a run, so we can get the git
commit and branch once at the start.
Example:
    $ head -5 out/test.json
    {
      "git": {
        "commit": "9bb63eb6a7e0dfec1bc20144f81f84f4ed1540fb",
        "branch": "stress-git-info"
      },
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
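One way to collect this info once at the start of a run is sketched below. The function names and the injectable `run` parameter are assumptions made for illustration and are not taken from the commit; only the "commit" and "branch" keys come from the example output above.

```python
import subprocess


def _git(*args):
    """Run a git command in the current directory and return stripped output."""
    out = subprocess.check_output(["git", *args])
    return out.decode().strip()


def git_info(run=_git):
    """Return the current commit and branch, as stored under the "git" key."""
    return {
        "commit": run("rev-parse", "HEAD"),
        "branch": run("rev-parse", "--abbrev-ref", "HEAD"),
    }
```

Calling `git_info()` once before the first test and storing the result in the test structure keeps the output consistent even if the working tree changes later.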
kubectl.label() has confusing argument names. The first argument is the resource, and the second is the label (key=value or key-). Rename to make this clearer. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The test helper "worked" since after starting the test cluster the default context is updated by minikube. But if you start another environment or change the default context manually, the test would fail trying to access the wrong cluster, or worse, succeed silently while modifying the wrong cluster. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When gathering logs from pods, we read a large amount of data from kubectl, and write it to a file. Decoding every line on read and encoding on write is wasteful. Add keepends= and decode= arguments to commands.watch() so it can be used for gathering logs. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
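The effect of the two arguments can be illustrated with a small line splitter. This is a sketch under assumptions, not the commands.watch() implementation: `iter_lines` is a hypothetical name, and only the keepends= and decode= argument names come from the commit message.

```python
def iter_lines(chunks, keepends=False, decode=True):
    """Split an iterable of bytes chunks into lines.

    With keepends=True and decode=False the yielded bytes can be written
    to a binary file as is, with no decode/encode round trip.
    """
    partial = b""
    for chunk in chunks:
        partial += chunk
        # The last item is an incomplete line (or b"" after a newline).
        *lines, partial = partial.split(b"\n")
        for line in lines:
            if keepends:
                line += b"\n"
            yield line.decode() if decode else line
    if partial:
        # Trailing data without a final newline.
        yield partial.decode() if decode else partial
```

With the defaults the caller gets decoded lines without newlines, which suits parsing. For log gathering, the raw mode avoids touching the payload at all.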
Sometimes we want to stop watching a command early and terminate the
command, ignoring the exit code.
An example is watching a resource status:
    $ kubectl get foo/bar -o jsonpath='{.status}{"\n"}' --watch
    {"phase": "Create"}
    {"phase": "Create"}
    {"phase": "Ready"}
We want to stop watching when "phase" is "Ready". This change adds this
capability by handling the GeneratorExit exception raised inside
commands.watch() when you close the return value. When closed, we kill
the watched process and return, ignoring the exit code.
Example usage:
    # Keep the generator object.
    watcher = commands.watch("kubectl", "get", "foo/bar", "-o", 'jsonpath={.status}{"\\n"}', "--watch")
    # Iterate over it...
    for line in watcher:
        status = json.loads(line)
        if status["phase"] == "Ready":
            # We are done!
            watcher.close()
With this we can watch resources efficiently without polling. We could
do this with kubectl.wait(), but now we can detect a timeout, and we can
implement complex waiting logic not possible using jsonpath.
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
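The mechanism described above, handling GeneratorExit inside the generator, can be sketched like this. It is a simplified stand-in for commands.watch(), assuming text output and using a small Python child process instead of kubectl so the example runs anywhere.

```python
import subprocess
import sys


def watch(*args):
    """Yield output lines; kill the child if the generator is closed early."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    except GeneratorExit:
        # The caller called close(): stop the child, ignore its exit code.
        proc.kill()
        proc.wait()
        return
    finally:
        proc.stdout.close()
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, args)


# Stand-in for `kubectl get --watch`: prints two events and never exits.
child = [sys.executable, "-u", "-c",
         "import time; print('Create'); print('Ready'); time.sleep(60)"]
seen = []
watcher = watch(*child)
for line in watcher:
    seen.append(line)
    if line == "Ready":
        watcher.close()  # Kills the child; the loop then ends normally.
```

After close() the generator is finished, so the for loop ends on its next step without an explicit break.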
When running a command that does not support timeout, we can implement the timeout on our side. This change adds a timeout argument to commands.watch(). If the watched command does not terminate within the specified timeout, we kill it and raise commands.Timeout exception. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
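A caller-side timeout can be sketched with a timer thread that kills the child. This is one possible technique, not necessarily how commands.watch() implements it; the `Timeout` class below only mirrors the name commands.Timeout from the commit message, and the child process is a stand-in.

```python
import subprocess
import sys
import threading


class Timeout(Exception):
    """Raised when the watched command does not finish in time."""


def watch(*args, timeout=None):
    """Yield output lines, killing the child after timeout seconds."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    timed_out = threading.Event()

    def on_timeout():
        timed_out.set()
        proc.kill()

    timer = None
    if timeout is not None:
        timer = threading.Timer(timeout, on_timeout)
        timer.start()
    try:
        with proc.stdout:
            for line in proc.stdout:
                yield line.rstrip("\n")
        proc.wait()
    finally:
        if timer is not None:
            timer.cancel()
        if proc.poll() is None:
            # Closed early or failed while iterating: do not leak the child.
            proc.kill()
            proc.wait()
    if timed_out.is_set():
        raise Timeout(f"Command {args} timed out after {timeout} seconds")
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)


# Stand-in for a command that has no timeout option of its own.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Killing the child closes its stdout, which ends the read loop, so the generator needs no non-blocking I/O of its own.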
This is a higher level helper for using `kubectl get --watch`. We
support only jsonpath output since we must have one line per event.
Because kubectl returns raw value for leaf nodes ({.status.phase} ->
Ready) instead of a json value ({.status.phase} -> "Ready"), we
cannot parse the json value in the helper.
Example usage - watching status changes:
    for line in kubectl.watch(
        "deploy/example",
        jsonpath="{.status}",
        context=context,
    ):
        status = json.loads(line)
        print(status)
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
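How such a helper might build the kubectl command line is sketched below. `watch_command` and its `namespace` parameter are hypothetical and only illustrate the one-line-per-event requirement; the real helper may differ.

```python
import json


def watch_command(resource, jsonpath="{}", namespace=None, context=None):
    """Build a `kubectl get --watch` command emitting one line per event."""
    # Append {"\n"} to the expression so every event ends with a newline.
    cmd = ["kubectl", "get", resource, "--watch",
           "--output", "jsonpath=" + jsonpath + r'{"\n"}']
    if namespace:
        cmd.extend(["--namespace", namespace])
    if context:
        cmd.extend(["--context", context])
    return cmd


def is_json(line):
    """Return True if an event line can be parsed as JSON."""
    try:
        json.loads(line)
    except json.JSONDecodeError:
        return False
    return True
```

The second function shows why parsing is left to the caller: an object such as `{"phase": "Ready"}` parses, but the raw leaf value `Ready` does not.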
Based on ocs-operator commit[1], recommended by Ilya Dryomov[2]. We use static configuration for simplicity, using rook-ceph-override configmap[3]. We don't enable logging to file so we can get the logs via kubectl and not use minikube specific code. [1] red-hat-storage/ocs-operator@e39bb41 [2] https://tracker.ceph.com/issues/65487#note-4 [3] https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-configuration/#example Signed-off-by: Nir Soffer <nsoffer@redhat.com>
With this configuration the rbd-mirror daemon logs also to
/data/rook/rook-ceph/log/ceph-client.rbd-mirror.a.log
This log file is pretty big, growing to 90 MiB in 12 hours on an idle
system, so I hope we can revert this change soon. Keeping this as
separate commit to make it easy to revert.
To copy the entire logs you need to use minikube specific code:
minikube cp -p dr1 dr1:/data/rook/rook-ceph/log/ceph-client.rbd-mirror.a.log $PWD
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We have a random issue when rbd-mirror cannot connect to the remote
peer, and we time out waiting for daemon health after 600 seconds.
When this happens, we see ERROR status in rbd mirror pool status:
$ kubectl rook-ceph --context dr2 rbd mirror pool status -p replicapool --verbose
health: ERROR
daemon health: ERROR
image health: OK
images: 0 total
DAEMONS
service 4361:
  instance_id: 4408
  client_id: a
  hostname: dr2
  version: 18.2.2
  leader: true
  health: ERROR
  callouts: unable to connect to remote cluster
In rbd-mirror log we can see:
8287-356f-4f81-87dc-51bb05942553.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
debug 2024-04-07T05:18:11.585+0000 7fc86d4808c0 0 rbd::mirror::PoolReplayer: 0x5589c90dc000
init_rados: reverting global config option override: mon_host:
[v2:192.168.122.98:3300,v1:192.168.122.98:6789] ->
unable to get monitor info from DNS SRV with service name: ceph-mon
debug 2024-04-07T05:18:11.602+0000 7fc86d4808c0 -1 failed for service _ceph-mon._tcp
debug 2024-04-07T05:18:11.602+0000 7fc86d4808c0 -1 monclient: get_monmap_and_config cannot
identify monitors to contact
After restarting the daemon it works normally.
Add a workaround restarting the rbd-mirror daemon if mirroring health is
not OK after 180 seconds. We try this 3 times, and fail if mirroring
health is still not OK after the last attempt.
Example log showing the workaround in action:
1. Attempt 1 times out
2024-04-09 15:31:37,070 DEBUG [rdr/0] Waiting for mirroring health in cluster 'dr1' (1/3)
2024-04-09 15:31:37,259 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'UNKNOWN', 'health': 'UNKNOWN', 'image_health': 'OK', 'states': {}}
2024-04-09 15:31:40,845 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'UNKNOWN', 'health': 'UNKNOWN', 'image_health': 'OK', 'states': {}}
2024-04-09 15:32:18,270 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:32:37,404 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:33:18,557 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:33:37,561 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:34:19,089 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:34:37,226 DEBUG [rdr/0] Timeout waiting for mirroring health in cluster 'dr1'
2. Restarting the rbd-mirror daemon
2024-04-09 15:34:37,226 DEBUG [rdr/0] Restarting deploy/rook-ceph-rbd-mirror-a in cluster 'dr1'
2024-04-09 15:34:37,391 DEBUG [rdr/0] deployment.apps/rook-ceph-rbd-mirror-a restarted
2024-04-09 15:34:37,395 DEBUG [rdr/0] Waiting until deploy/rook-ceph-rbd-mirror-a is rolled out in cluster 'dr1'
2024-04-09 15:34:37,597 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 0 out of 1 new replicas have been updated...
2024-04-09 15:34:37,622 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 1 old replicas are pending termination...
2024-04-09 15:34:41,475 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 1 old replicas are pending termination...
2024-04-09 15:34:41,562 DEBUG [rdr/0] deployment "rook-ceph-rbd-mirror-a" successfully rolled out
3. Attempt 2 succeeds
2024-04-09 15:34:41,568 DEBUG [rdr/0] Waiting for mirroring health in cluster 'dr1' (2/3)
2024-04-09 15:34:41,742 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
2024-04-09 15:35:19,509 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
2024-04-09 15:35:19,510 DEBUG [rdr/0] Cluster 'dr1' mirroring healthy in 37.94 seconds
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
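The retry logic described in this commit can be sketched as a small loop. The function name, the callbacks, and the injectable clock are assumptions made so the example is self-contained and testable; only the 3 attempts and the 180 second timeout come from the commit message.

```python
import time


def wait_for_health(check, restart, attempts=3, timeout=180, delay=1,
                    clock=time.monotonic, sleep=time.sleep):
    """Wait until check() returns "OK", restarting between attempts.

    Returns the attempt number that succeeded, or raises RuntimeError
    if health is still not OK after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        deadline = clock() + timeout
        while clock() < deadline:
            if check() == "OK":
                return attempt
            sleep(delay)
        if attempt < attempts:
            # Workaround: restart the daemon and try again.
            restart()
    raise RuntimeError(f"Health is not OK after {attempts} attempts")
```

Here `check` would read the health from the mirroring status and `restart` would restart deploy/rook-ceph-rbd-mirror-a and wait for the rollout, as in the log above.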
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
The field holds a list of elements, so per the Google JSON Style Guide its name must be plural. Refer to https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Singular_vs_Plural_Property_Names#Singular_vs_Plural_Property_Names Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Signed-off-by: youhangwang <youhangwang@foxmail.com>
Signed-off-by: youhangwang <youhangwang@foxmail.com>
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
* add e2e framework Signed-off-by: jacklu <jilu@redhat.com>
Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>
No description provided.