Add creation of OVN DB backups #557

Open

otherwiseguy wants to merge 1 commit into openstack-k8s-operators:main from otherwiseguy:OSPRH-28871

Conversation

@otherwiseguy

otherwiseguy commented Apr 17, 2026

In case rollback is required, generate backup files for the OVN DB files.

Resolves: OSPRH-28871

Signed-off-by: Terry Wilson <twilson@redhat.com>
openshift-ci bot requested review from slawqo and stuggi on April 17, 2026 01:00
@openshift-ci
Contributor

openshift-ci Bot commented Apr 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: otherwiseguy
Once this PR has been reviewed and has the lgtm label, please assign slawqo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Apr 17, 2026

Hi @otherwiseguy. Thanks for your PR.

I'm waiting for an openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

done
}

function wait_for_ovsdb_client {
Author

This is removed because ovsdb-client is not run in the background; there is nothing to wait on.

Contributor

Actually, this was not added for any background task, but for cases when the pod is terminated in the middle of an ovsdb-client run. It's similar to what we do for the ovsdb-tool handling, where an empty db caused issues in the past when pods were terminated.
Or did you observe some issue with these checks, or is this just a cleanup task?

Author

Reading the code, the only reason I could see for the ovsdb-tool wait was that setup.sh ran ovn-ctl with '&' to background it, and ovsdb-tool/ovsdb_tool() is called in ovs-lib in things like create_cluster, upgrade_cluster, etc.

ovsdb-client, from what I can tell, is only called via ovn-ctl through ovs-lib's upgrade_cluster(), which we would never call with this patch applied because we pass --no-db-cluster-schema-upgrade to ovn-ctl. So since the only place it will ever be called is ultimately from setup.sh, in the foreground, there would never be any reason to wait for the execution of ovsdb-client.

Let me know if I'm missing something.
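
A minimal sketch of the distinction being discussed, assuming plain shell semantics (variable names are taken from the quoted script; both calls are illustrative, not the PR's exact code):

# Backgrounded: the script can outlive the child, so an EXIT trap
# that waits for it is meaningful.
ovsdb-client convert unix:$db_socket "$db_schema" &
trap 'wait $!' EXIT

# Foreground: the shell blocks until ovsdb-client returns, so by the
# time any EXIT trap fires there is no conversion left to wait on.
ovsdb-client convert unix:$db_socket "$db_schema"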

Contributor

Since ovsdb-client is being called at:

if test X"`ovsdb-client needs-conversion unix:$db_socket "$db_schema"`" = Xyes; then
    echo "Database conversion needed for ${DB_TYPE} database, performing conversion..."
    # Set trap to wait for ovsdb-client to finish conversion
    trap wait_for_ovsdb_client EXIT
    if ovsdb-client convert unix:$db_socket "$db_schema"; then
        echo "Database conversion completed successfully for ${DB_TYPE} database"
    else
        echo "Error: Database conversion failed for ${DB_TYPE} database"
        trap - EXIT
        return 1
    fi
    # Wait for conversion to complete and remove trap
    wait_for_ovsdb_client
    trap - EXIT
fi

The wait was added for the foreground task itself, to cover cases when pods are terminated/recreated/deleted for any reason through operator or human involvement. It may not have the same side effect as with ovsdb-tool (in that case it resulted in an empty db), but it still doesn't look harmful. Or are you seeing some issue with keeping it?

# call to ovn-ctl directly instead of start-${DB_TYPE}-db-server to pass
# extra_args after --
set /usr/share/ovn/scripts/ovn-ctl --no-monitor
set "$@" --no-db-cluster-schema-upgrade
Author

This prevents ovn-ctl from upgrading the schema, so that the only schema upgrade happens in our check_and_convert_db step.
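
For context, this set-based pattern typically ends with an exec of the accumulated arguments. A sketch of the hand-off (the run_${DB_TYPE}_ovsdb subcommand and extra args here are assumptions, not the PR's exact code):

set /usr/share/ovn/scripts/ovn-ctl --no-monitor
set "$@" --no-db-cluster-schema-upgrade
# "$@" now holds the full ovn-ctl command line; extra args for
# ovsdb-server go after the -- separator.
exec "$@" run_${DB_TYPE}_ovsdb -- $extra_args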

@otherwiseguy
Author

Tested manually by running RHOSO 18.0.1 ovndb containers and then switching to the current ci-framework-installed image (nb schema 7.3.0 -> 7.12.0); the backup file was created.

db_version=$(ovsdb-client get-schema-version "unix:$db_socket" "$db_name")
local backup_file="${DB_FILE%.db}-backup-v${db_version}.db"
echo "Backing up ${db_name} v${db_version} to ${backup_file}..."
ovsdb-client backup "unix:$db_socket" "$db_name" > "$backup_file"
Contributor

The ovndbcluster[-nb/-sb] pod has this startup_probe:
startupProbe := &corev1.Probe{
    // TODO might need tuning
    TimeoutSeconds:      5,
    PeriodSeconds:       3,
    FailureThreshold:    20,
    InitialDelaySeconds: 3,
}
It means that it has 63 seconds (FailureThreshold * PeriodSeconds + InitialDelaySeconds) to become ready. On large deployments, can this operation take more than a minute?

Author

@averdagu It's conceivable it could be a problem. With a test SBDB that I just loaded with 2M FDB entries (around 250MB), it took around 35 seconds on my laptop to do the backup. I've seen 500MB DBs occasionally. It'd be pretty unlikely to be a problem with the NBDB, though I've seen some bigger NBDBs with tons of ACLs, etc.
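
A quick way to sanity-check backup time against that ~63-second budget (the socket path and output file are assumptions for illustration):

# Time a backup of a large southbound DB and compare against the
# startup-probe window.
time ovsdb-client backup unix:/run/ovn/ovnsb_db.sock OVN_Southbound > /tmp/sb-backup.db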

Author

@averdagu is it ok to bump the timeout? Or should we just insist that backup is a manual process that is done before upgrade?
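
If it stays manual, the pre-upgrade flow could look roughly like this (a sketch; socket paths and file names are assumptions):

# Before the upgrade: snapshot the northbound DB.
ovsdb-client backup unix:/run/ovn/ovnnb_db.sock OVN_Northbound > ovnnb-pre-upgrade.db

# If rollback is needed: replay the snapshot into the running server.
ovsdb-client restore unix:/run/ovn/ovnnb_db.sock OVN_Northbound < ovnnb-pre-upgrade.db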

Contributor

At least it would not be a problem for the startup and liveness probes, as those check for the ovsdb-server process, and we run the conversion after starting that. The only thing that should be confirmed is whether the readiness probes, where we check cluster membership, are impacted by this.

@karelyatin
Contributor

Also observed backup/restore activity in #559. Are those linked, or was that unknown and just a parallel effort? Just trying to get some context on all these parallel efforts and see if we can refine them together.

@otherwiseguy
Author

Also observed backup/restore activity in #559. Are those linked, or was that unknown and just a parallel effort? Just trying to get some context on all these parallel efforts and see if we can refine them together.

Someone asked me in a jira issue if I thought that we needed something like that, and I told them that I didn't think there would be any benefit to doing it. I wasn't aware someone had gone ahead and implemented it. Given that the neutron db is the source of truth, and outside of some initial static configuration that the operator would already handle, I believe we aim for all of the OVSDB DBs to be regenerable from the contents of the neutron DB. Add to that that the point of an active-active cluster is to have live backups as well, and that any manually taken backup would have to be synced to the neutron db just as an empty db would, and I don't really see the benefit of doing #559. Maybe someone else can come up with a reason I can't think of.

@stuggi
Contributor

stuggi commented Apr 24, 2026

Also observed backup/restore activity in #559. Are those linked, or was that unknown and just a parallel effort? Just trying to get some context on all these parallel efforts and see if we can refine them together.

Someone asked me in a jira issue if I thought that we needed something like that, and I told them that I didn't think there would be any benefit to doing it. I wasn't aware someone had gone ahead and implemented it. Given that the neutron db is the source of truth, and outside of some initial static configuration that the operator would already handle, I believe we aim for all of the OVSDB DBs to be regenerable from the contents of the neutron DB. Add to that that the point of an active-active cluster is to have live backups as well, and that any manually taken backup would have to be synced to the neutron db just as an empty db would, and I don't really see the benefit of doing #559. Maybe someone else can come up with a reason I can't think of.

That was me. FYI, for now I have submitted a PR to our upstream full-backup doc to optionally add OVN backups: openstack-k8s-operators/dev-docs#172. Let me know if you think it should be the default and we can change it.

The reason why @lmiccini started #559 is to look into an interface for OVN like the one there is for galera. It was done/started after you shared the correct steps.


db_name="OVN_Southbound"
fi
local db_version
db_version=$(ovsdb-client get-schema-version "unix:$db_socket" "$db_name")
Contributor

The only concern here is that these can pile up, one per db upgrade, so we should decide how many of them to persist.
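
One possible way to cap the pile-up, using the naming scheme from the patch (a sketch; the retention count of three is an assumption):

# Keep only the three most recent backups for this DB file.
ls -1t "${DB_FILE%.db}"-backup-v*.db 2>/dev/null | tail -n +4 | xargs -r rm -f --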

Author

I think I'm just going to abandon the automatic backup idea. They can document making manual backups, and honestly I think the idea of backups is ultimately kind of silly when you have to do an ovn-db-sync-util run in any case, and it can recreate everything from nothing. (Bugs in that util are the only real reason for the backup, and we should fix those, but a manual "just in case" process seems sufficient.)
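
For reference, the repair path alluded to here is the Neutron sync utility, roughly (config-file paths are assumptions):

# Rebuild the OVN DB contents from the Neutron database (the source
# of truth); repair mode syncs differences into a running OVN DB.
neutron-ovn-db-sync-util \
  --config-file /etc/neutron/neutron.conf \
  --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
  --ovn-neutron_sync_mode repair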
