Ut by asn1809 · Pull Request #3 · asn1809/ramen

asn1809 · 2024-05-02T14:53:07Z

No description provided.

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Cleanup protectedPVCs that are stale

Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com> Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Fixes: RamenDR#1200 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>

Also bump github.com/golang/protobuf to 1.5.4 to address the related security vulnerability. See:
- https://www.cve.org/CVERecord?id=CVE-2024-24786
- https://groups.google.com/g/golang-announce/c/ArQ6CDgtEjY/m/oLMrdq_GBQAJ
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

By default, clusteradm installs the latest release. Extract a BUNDLE_VERSION constant to allow specifying a specific ocm version. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

To test ocm changes, we need to build and push the images to a private image repository. When deploying we can use the new IMAGE_REGISTRY= constant to specify the image registry. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This version[1] pulls ocm 0.13.1[2], fixing auto approval failures after joining the hub. [1] https://github.com/open-cluster-management-io/clusteradm/releases/tag/v0.8.1 [2] https://github.com/open-cluster-management-io/ocm/releases/tag/v0.13.1 Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Ignore some flake8 rules conflicting with black code style so we can use automatic formatting. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Use the same configuration used in the full environment. This should make testing the minimal environment closer to the full one. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

The Rook quick start guide recommends checking `ceph status` using the toolbox after deploying the cluster[1]. Move the rook ceph toolbox before the rook pool, so we can validate the cluster status. [1] https://rook.io/docs/rook/latest/Getting-Started/quickstart/#create-a-ceph-cluster Signed-off-by: Nir Soffer <nsoffer@redhat.com>

To make sure that we wait correctly for the cluster. On the next failure the cluster status will be logged. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This may help debug issues with ceph, and also validates that the toolbox works. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Hopefully this will make it easier to debug random failures in rbd-mirror. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We add a cephrbdmirror resource, but we don't wait until it is reconciled and becomes ready. Wait and log the resource status to make debugging easier on the next random timeout. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Hopefully this will help debug the issue the next time we have a random timeout in the rbd-mirror test. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We have a random failure timing out waiting for rbd-mirror. One possible reason may be a bad ceph blocklist blocking rbd-daemon. Log the ceph osd blocklist before we wait for the rbd daemon. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Previously we tried to wait for all deployments when starting a running minikube profile. This works most of the time, but fails if a deployment is in a failed state (Progressing=False). Fix by restarting all the failed deployments. We don't wait until they are rolled out again, since the addons already wait for the deployments. I could reproduce the issue once with rook-ceph-operator, and restarting the deployment fixed it. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
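The selection of failed deployments described above could look roughly like the following sketch. It is not the drenv implementation: the function names `failed_deployments` and `restart_commands` are hypothetical, and it assumes the input is the JSON printed by `kubectl get deploy --all-namespaces -o json`.

```python
import json


def failed_deployments(deployments_json):
    """
    Return (namespace, name) pairs for deployments reporting the
    condition Progressing=False.

    Hypothetical helper; expects the output of
    `kubectl get deploy --all-namespaces -o json`.
    """
    failed = []
    for item in json.loads(deployments_json)["items"]:
        conditions = item.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond["type"] == "Progressing" and cond["status"] == "False":
                meta = item["metadata"]
                failed.append((meta["namespace"], meta["name"]))
                break
    return failed


def restart_commands(failed, context):
    """
    Build one `kubectl rollout restart` command per failed deployment.
    The commands are only built here, not executed and not waited for.
    """
    return [
        [
            "kubectl",
            "rollout",
            "restart",
            f"deploy/{name}",
            f"--namespace={namespace}",
            f"--context={context}",
        ]
        for namespace, name in failed
    ]
```

Building the commands separately from running them keeps the selection logic easy to test without a cluster.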

Signed-off-by: Abhijeet Shakya <abhijeetshakya21@gmail.com>

Previously limited to 1 worker per cluster due to various issues. Since the issues are fixed now, we can remove this limit. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Create the test structure upfront instead of building it in write_output. This will make it easier to add more info. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When comparing runs we want to make sure we tested the same code. Normally you don't update the code during a run, so we can get the git commit and branch once at the start. Example:
    $ head -5 out/test.json
    {
      "git": {
        "commit": "9bb63eb6a7e0dfec1bc20144f81f84f4ed1540fb",
        "branch": "stress-git-info"
      },
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
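A minimal sketch of collecting the git info once, when the test structure is created. The function names and the `name` and `results` keys are hypothetical; only the `git` section with `commit` and `branch` comes from the example above. The `run` argument is an assumption added so the sketch can be exercised without a git repository.

```python
import subprocess


def git_info(run=subprocess.check_output):
    """
    Return the current commit and branch, queried once.

    `run` must accept a command list and return its output as bytes.
    """
    commit = run(["git", "rev-parse", "HEAD"]).decode().strip()
    branch = run(["git", "rev-parse", "--abbrev-ref", "HEAD"]).decode().strip()
    return {"commit": commit, "branch": branch}


def new_test(name, run=subprocess.check_output):
    """
    Create the test structure upfront, including the git info.
    The "name" and "results" keys are illustrative only.
    """
    return {
        "git": git_info(run),
        "name": name,
        "results": [],
    }
```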

kubectl.label() has confusing argument names. The first argument is the resource, and the second is the label (key=value or key-). Rename them to make this clearer. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

The test helper "worked" since after starting the test cluster the default context is updated by minikube. But if you start another environment or change the default context manually, the test would fail trying to access the wrong cluster, or worse, succeed silently while modifying the wrong cluster. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

When gathering logs from pods, we read a large amount of data from kubectl and write it to a file. Decoding every line on read and encoding on write is wasteful. Add keepends= and decode= arguments to commands.watch() so it can be used for gathering logs. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Sometimes we want to stop watching a command early and terminate the command, ignoring the exit code. An example is watching a resource status:
    $ kubectl get foo/bar -o jsonpath='{.status}{"\n"}' --watch
    {"phase": "Create"}
    {"phase": "Create"}
    {"phase": "Ready"}
We want to stop watching when "phase" is "Ready". This change adds this capability by handling the GeneratorExit exception raised inside commands.watch() when you close the return value. When closed, we kill the watched process and return, ignoring the exit code. Example usage:
    # Keep the generator object.
    watcher = commands.watch("kubectl", "get", "foo/bar", "-o", 'jsonpath={.status}{"\\n"}', "--watch")
    # Iterate over it...
    for line in watcher:
        status = json.loads(line)
        if status["phase"] == "Ready":
            # We are done!
            watcher.close()
With this we can watch resources efficiently without polling. We could do this with kubectl.wait(), but now we can detect a timeout, and we can implement complex waiting logic not possible using jsonpath.
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
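The mechanism can be sketched as a generator that catches GeneratorExit. This is a simplified stand-in, not the actual commands.watch() code: the standalone `watch` function below is hypothetical and omits the other arguments the real helper has.

```python
import subprocess


def watch(*args):
    """
    Run a command and yield its output line by line.

    Closing the generator kills the process and ignores its exit code.
    """
    p = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    try:
        for line in p.stdout:
            yield line.rstrip("\n")
    except GeneratorExit:
        # The caller closed the generator: stop the command and do not
        # check the exit code.
        p.kill()
        p.wait()
        return
    finally:
        p.stdout.close()

    # The command terminated on its own: a non-zero exit code is an error.
    rc = p.wait()
    if rc != 0:
        raise subprocess.CalledProcessError(rc, args)
```

Calling close() raises GeneratorExit at the suspended yield, so the cleanup runs exactly where the generator was waiting. The next iteration of the caller's for loop then sees a finished generator and the loop ends without an explicit break.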

When running a command that does not support a timeout, we can implement the timeout on our side. This change adds a timeout argument to commands.watch(). If the watched command does not terminate within the specified timeout, we kill it and raise a commands.Timeout exception. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
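One way to implement a timeout on our side is a timer thread that kills the process when the deadline passes. This is an assumption chosen for brevity; the commit does not say how commands.watch() enforces the timeout, and the real code may work differently. The `watch` function and `Timeout` class below are hypothetical stand-ins for the drenv names.

```python
import subprocess
import threading


class Timeout(Exception):
    """Raised when the watched command did not terminate in time."""


def watch(*args, timeout=None):
    """
    Yield output lines of a command, killing it after `timeout` seconds.
    """
    p = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    timed_out = threading.Event()

    def expire():
        # Runs in the timer thread when the deadline passes.
        timed_out.set()
        p.kill()

    timer = None
    if timeout is not None:
        timer = threading.Timer(timeout, expire)
        timer.start()

    try:
        # Killing the process closes the pipe, which ends this loop.
        for line in p.stdout:
            yield line.rstrip("\n")
        rc = p.wait()
    except GeneratorExit:
        # Closed by the caller: stop the command, ignore the exit code.
        p.kill()
        p.wait()
        return
    finally:
        if timer is not None:
            timer.cancel()
        p.stdout.close()

    if timed_out.is_set() and rc != 0:
        raise Timeout(f"Timeout waiting for {args!r}")
    if rc != 0:
        raise subprocess.CalledProcessError(rc, args)
```

The exit code check next to `timed_out` avoids reporting a timeout when the timer fires just after the command already finished successfully.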

This is a higher level helper for using `kubectl get --watch`. We support only jsonpath output since we must have one line per event. Because kubectl returns a raw value for leaf nodes ({.status.phase} -> Ready) instead of a json value ({.status.phase} -> "Ready"), we cannot parse the json value in the helper. Example usage - watching status changes:
    for line in kubectl.watch(
        "deploy/example",
        jsonpath="{.status}",
        context=context,
    ):
        status = json.loads(line)
        print(status)
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
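The two constraints above (one line per event, and raw leaf values) can be illustrated with a small sketch. `watch_args` is a hypothetical function showing how such a helper might build the kubectl command line; it is not the drenv code.

```python
import json


def watch_args(resource, jsonpath, namespace=None, context=None):
    """
    Build a `kubectl get --watch` command that prints one line per event.

    The jsonpath expression is followed by {"\\n"} so every event ends
    with a newline.
    """
    args = [
        "kubectl",
        "get",
        resource,
        "--output=jsonpath=" + jsonpath + '{"\\n"}',
        "--watch",
    ]
    if namespace:
        args.append(f"--namespace={namespace}")
    if context:
        args.append(f"--context={context}")
    return args


def is_json(line):
    """Return True if the line can be parsed as a json value."""
    try:
        json.loads(line)
    except ValueError:
        return False
    return True
```

With these definitions, `is_json('{"phase": "Ready"}')` is true for a non-leaf value such as `{.status}`, while `is_json("Ready")` is false for the raw leaf value of `{.status.phase}`. That difference is why parsing has to be left to the caller.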

Based on ocs-operator commit[1], recommended by Ilya Dryomov[2]. We use static configuration for simplicity, using rook-ceph-override configmap[3]. We don't enable logging to file so we can get the logs via kubectl and not use minikube specific code. [1] red-hat-storage/ocs-operator@e39bb41 [2] https://tracker.ceph.com/issues/65487#note-4 [3] https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-configuration/#example Signed-off-by: Nir Soffer <nsoffer@redhat.com>

With this configuration the rbd-mirror daemon also logs to /data/rook/rook-ceph/log/ceph-client.rbd-mirror.a.log. This log file is pretty big, growing to 90 MiB in 12 hours on an idle system, so I hope we can revert this change soon. Keeping this as a separate commit to make it easy to revert. To copy the entire log you need to use minikube specific code:
    minikube cp -p dr1 dr1:/data/rook/rook-ceph/log/ceph-client.rbd-mirror.a.log $PWD
Signed-off-by: Nir Soffer <nsoffer@redhat.com>

We have a random issue where rbd-mirror cannot connect to the remote peer, and we time out waiting for daemon health after 600 seconds. When this happens, we see ERROR status in rbd mirror pool status:
    $ kubectl rook-ceph --context dr2 rbd mirror pool status -p replicapool --verbose
    health: ERROR
    daemon health: ERROR
    image health: OK
    images: 0 total
    DAEMONS
    service 4361:
      instance_id: 4408
      client_id: a
      hostname: dr2
      version: 18.2.2
      leader: true
      health: ERROR
      callouts: unable to connect to remote cluster
In the rbd-mirror log we can see:
    8287-356f-4f81-87dc-51bb05942553.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
    debug 2024-04-07T05:18:11.585+0000 7fc86d4808c0 0 rbd::mirror::PoolReplayer: 0x5589c90dc000 init_rados: reverting global config option override: mon_host: [v2:192.168.122.98:3300,v1:192.168.122.98:6789] -> unable to get monitor info from DNS SRV with service name: ceph-mon
    debug 2024-04-07T05:18:11.602+0000 7fc86d4808c0 -1 failed for service _ceph-mon._tcp
    debug 2024-04-07T05:18:11.602+0000 7fc86d4808c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
After restarting the daemon it works normally. Add a workaround restarting the rbd-mirror daemon if mirroring health is not OK after 180 seconds. We try this 3 times, and fail if mirroring health is still not OK after the last attempt. Example log showing the workaround in action:
1. Attempt 1 times out
    2024-04-09 15:31:37,070 DEBUG [rdr/0] Waiting for mirroring health in cluster 'dr1' (1/3)
    2024-04-09 15:31:37,259 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'UNKNOWN', 'health': 'UNKNOWN', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:31:40,845 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'UNKNOWN', 'health': 'UNKNOWN', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:32:18,270 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:32:37,404 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:33:18,557 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:33:37,561 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:34:19,089 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:34:37,226 DEBUG [rdr/0] Timeout waiting for mirroring health in cluster 'dr1'
2. Restarting the rbd-mirror daemon
    2024-04-09 15:34:37,226 DEBUG [rdr/0] Restarting deploy/rook-ceph-rbd-mirror-a in cluster 'dr1'
    2024-04-09 15:34:37,391 DEBUG [rdr/0] deployment.apps/rook-ceph-rbd-mirror-a restarted
    2024-04-09 15:34:37,395 DEBUG [rdr/0] Waiting until deploy/rook-ceph-rbd-mirror-a is rolled out in cluster 'dr1'
    2024-04-09 15:34:37,597 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 0 out of 1 new replicas have been updated...
    2024-04-09 15:34:37,622 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 1 old replicas are pending termination...
    2024-04-09 15:34:41,475 DEBUG [rdr/0] Waiting for deployment "rook-ceph-rbd-mirror-a" rollout to finish: 1 old replicas are pending termination...
    2024-04-09 15:34:41,562 DEBUG [rdr/0] deployment "rook-ceph-rbd-mirror-a" successfully rolled out
3. Attempt 2 succeeds
    2024-04-09 15:34:41,568 DEBUG [rdr/0] Waiting for mirroring health in cluster 'dr1' (2/3)
    2024-04-09 15:34:41,742 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'ERROR', 'health': 'ERROR', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:35:19,509 DEBUG [rdr/0] Cluster 'dr1' mirroring status': {'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {}}
    2024-04-09 15:35:19,510 DEBUG [rdr/0] Cluster 'dr1' mirroring healthy in 37.94 seconds
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
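The retry logic of this workaround can be sketched as below. This is an illustration under assumptions, not the test code from the commit: `wait_for_mirroring_health`, `get_status`, `restart`, `delay`, `clock`, and `sleep` are hypothetical names, and the health check assumes the status dictionary has the three keys shown in the log.

```python
import time


def is_healthy(status):
    """Mirroring is healthy when all three health fields are OK."""
    return all(
        status.get(key) == "OK"
        for key in ("daemon_health", "health", "image_health")
    )


def wait_for_mirroring_health(
    get_status,
    restart,
    attempts=3,
    timeout=180,
    delay=1.0,
    clock=time.monotonic,
    sleep=time.sleep,
):
    """
    Poll get_status() until mirroring is healthy, calling restart()
    after each attempt that times out, except the last one.

    Returns the number of the attempt that succeeded, and raises
    RuntimeError if the last attempt also times out.
    """
    for attempt in range(1, attempts + 1):
        deadline = clock() + timeout
        while True:
            if is_healthy(get_status()):
                return attempt
            if clock() >= deadline:
                break
            sleep(delay)
        if attempt < attempts:
            restart()
    raise RuntimeError(f"Mirroring not healthy after {attempts} attempts")
```

Passing `clock` and `sleep` as arguments lets the loop be exercised with a fake clock instead of waiting for real timeouts.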

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

It is a list/array of elements, and as per the Google style guide it must be plural. Refer to https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Singular_vs_Plural_Property_Names#Singular_vs_Plural_Property_Names Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

Signed-off-by: youhangwang <youhangwang@foxmail.com>

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

* add e2e framework Signed-off-by: jacklu <jilu@redhat.com>

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Annaraya-Narasagond and others added 30 commits April 10, 2024 12:28

Cleanup protectedPVCs that are stale

1d640f5

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

UT fix

0cfc89b

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Reducing complexity of the function

1a26b06

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

solving lint issues

85f6cff

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Lint corrections

0ff685c

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Merge pull request #1 from asn1809/chnages

4a8bc84

Cleanup protectedPVCs that are stale

Incorporating review comments

9ca5e1b

Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com> Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Correcting lint errors

9c8c517

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Adding testcase for unprotecting PVC not bound errors

21f7b6c

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

Minor changes

447f248

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

drenv: Add option flag to pass flags in test/scripts/drenv-selftest

f759b9d

Fixes: RamenDR#1200 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>

Support specific ocm version

dffa74f

By default, clusteradm installs the latest release. Extract a BUNDLE_VERSION constant to allow specifying a specific ocm version. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Support private image registry

9323f8c

To test ocm changes, we need to build and push the images to a private image repository. When deploying we can use the new IMAGE_REGISTRY= constant to specify the image registry. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Adjust flake8 rules to accept black code style

9d31943

Ignore some flake8 rules conflicting with black code style so we can use automatic formatting. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Update rook env cpus

576a603

Use the same configuration used in the full environment. This should make testing the minimal environment closer to the full one. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Log ceph cluster status when ready

a4e7400

To make sure that we wait correctly for the cluster. On the next failure the cluster status will be logged. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Log output of ceph status when the toolbox is ready

c4215bc

This may help debug issues with ceph, and also validates that the toolbox works. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Log ceph block pool status

783917b

Hopefully this will make it easier to debug random failures in rbd-mirror. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Wait until rbd mirror is ready

9d99a54

We add a cephrbdmirror resource, but we don't wait until it is reconciled and becomes ready. Wait and log the resource status to make debugging easier on the next random timeout. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Log ceph block pool status when mirroring is healthy

7c285d3

Hopefully this will help debug the issue the next time we have a random timeout in the rbd-mirror test. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Log ceph blocklist

e16638b

We have a random failure timing out waiting for rbd-mirror. One possible reason may be a bad ceph blocklist blocking rbd-daemon. Log the ceph osd blocklist before we wait for the rbd daemon. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Introducing cache for kubevirt and cdi addons

76ba2e7

Signed-off-by: Abhijeet Shakya <abhijeetshakya21@gmail.com>

Increase the number of drenv workers

a79b582

Previously limited to 1 worker per cluster due to various issues. Since the issues are fixed now, we can remove this limit. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Clean up test construction

d8f2edd

Create the test structure upfront instead of building it in write_output. This will make it easier to add more info. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Improve arguments names

e7500e2

kubectl.label() has confusing argument names. The first argument is the resource, and the second is the label (key=value or key-). Rename them to make this clearer. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs and others added 27 commits May 2, 2024 07:45

controllers: rename variable

416e8fe

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

config: add generated files

eb4ccac

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Makefile: remove KUBE_OBJECT_PROTECTION_DISABLED configuration

ec67b14

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

config: set default value for ramenOpsNamespace

7231e36

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Makefile: provide override for ramenOpsNamespace

a069009

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

config: enable multinamespace feature by default

c1fc9c6

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

config: set velero namespace by default

c8e18c3

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Makefile: provide override for veleroNamespace

b8a6c6c

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

ramenctl: update configmap to match default configmap

68f1b6b

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>

Add kustomize to required tools

fb88676

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

add diagrams about cephfs RDR

84dad82

Signed-off-by: youhangwang <youhangwang@foxmail.com>

add svg diagrams about cephfs RDR

38778d8

Signed-off-by: youhangwang <youhangwang@foxmail.com>

Add metric to report protected condition status

d7bf7e8

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

Consume workload protection metric while reconciling DRPC

ec75f4f

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

Add alert for workloads not protected beyond 10m

79663df

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

Add promlinter to set of CI linters

e033a6c

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>

add e2e framework (RamenDR#1352)

4ef0abc

* add e2e framework Signed-off-by: jacklu <jilu@redhat.com>

Additional changes

de59faf

Signed-off-by: Annaraya Narasagond <annaraya.narasagond@ibm.com>

asn1809 force-pushed the main branch from 96cce87 to d3b1b8c Compare June 13, 2024 05:32

asn1809 force-pushed the main branch from 91f60d7 to eff856f Compare August 22, 2024 07:40

asn1809 wants to merge 57 commits into main from ut

10 participants
