-
Notifications
You must be signed in to change notification settings - Fork 39
Add creation of OVN DB backups #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,13 +26,6 @@ function wait_for_ovsdb_tool { | |
| done | ||
| } | ||
|
|
||
| function wait_for_ovsdb_client { | ||
| while pgrep -a ovsdb-client; do | ||
| echo "ovsdb-client still working. Waiting..." | ||
| sleep 1 | ||
| done | ||
| } | ||
|
|
||
| function check_and_convert_db { | ||
| local db_socket="${OVN_RUNDIR}/ovn${DB_TYPE}_db.sock" | ||
| local db_schema="/usr/share/ovn/ovn-${DB_TYPE}.ovsschema" | ||
|
|
@@ -52,8 +45,15 @@ function check_and_convert_db { | |
| if test X"`ovsdb-client needs-conversion unix:$db_socket "$db_schema"`" = Xyes; then | ||
| echo "Database conversion needed for ${DB_TYPE} database, performing conversion..." | ||
|
|
||
| # Set trap to wait for ovsdb-client to finish conversion | ||
| trap wait_for_ovsdb_client EXIT | ||
| local db_name="OVN_Northbound" | ||
| if [[ "${DB_TYPE}" == "sb" ]]; then | ||
| db_name="OVN_Southbound" | ||
| fi | ||
| local db_version | ||
| db_version=$(ovsdb-client get-schema-version "unix:$db_socket" "$db_name") | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the only concern here is this can pile up per db upgrade so we should be checking how many of these we should persist.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I'm just going to abandon the automatic backup idea. They can document making manual backups, and honestly I think the idea of backups is ultimately kind of silly when no matter what you have to do an ovn-db-sync-util run in any case, and it can create from nothing (bugs in that util are the only real reason for the backup, and we should fix those, but a manual "just in case" process seems sufficient). |
||
| local backup_file="${DB_FILE%.db}-backup-v${db_version}.db" | ||
| echo "Backing up ${db_name} v${db_version} to ${backup_file}..." | ||
| ovsdb-client backup "unix:$db_socket" "$db_name" > "$backup_file" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The ovndbcluster[-nb/-sb] pod has this startup_probe:
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @averdagu It's conceivable it could be a problem. With a test SBDB that I just loaded with 2M FDB entries, around 250MB, it took around 35 seconds on my laptop to do the backup. I've seen 500MB DBs occasionally. it'd be pretty unlikely to be a problem with NBDB, though I've seen some bigger NBDBs with tons of ACLs, etc.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @averdagu is it ok to bump the timeout? Or should we just insist that backup is a manual process that is done before upgrade?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Atleast it would not be a problem with startup and liveness probe as those checking for ovsdb-server process, the conversion we running post starting that. Only thing that should be confirmed if readiness probes are impacted due to this where we check cluster membership |
||
|
|
||
| if ovsdb-client convert unix:$db_socket "$db_schema"; then | ||
| echo "Database conversion completed successfully for ${DB_TYPE} database" | ||
|
|
@@ -62,10 +62,6 @@ function check_and_convert_db { | |
| trap - EXIT | ||
| return 1 | ||
| fi | ||
|
|
||
| # Wait for conversion to complete and remove trap | ||
| wait_for_ovsdb_client | ||
| trap - EXIT | ||
| fi | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,6 +46,7 @@ fi | |
| # call to ovn-ctl directly instead of start-${DB_TYPE}-db-server to pass | ||
| # extra_args after -- | ||
| set /usr/share/ovn/scripts/ovn-ctl --no-monitor | ||
| set "$@" --no-db-cluster-schema-upgrade | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will prevent ovn-ctl from upgrading the schema so that only we are upgrading the schema during the check_and_convert_db step. |
||
|
|
||
| # Election timer (hardcoded initial value; runtime changes handled by operator) | ||
| set "$@" --db-${DB_TYPE}-election-timer=10000 | ||
|
|
@@ -155,7 +156,10 @@ if [[ "$(hostname)" == "{{ .SERVICE_NAME }}-0" ]]; then | |
| fi | ||
|
|
||
| # Check and perform database conversion if needed | ||
| # This can be cleaned up once https://redhat.atlassian.net/browse/FDP-3108 is fixed | ||
| # "Online" db conversion will happen when ovn-ctl starts a clustered DB unless | ||
| # --no-db-cluster-schema-upgrade is passed. Online conversion does no backup, | ||
| # making rollback impossible if something goes wrong during an upgrade and it | ||
| # needs to be reverted | ||
| check_and_convert_db | ||
|
|
||
| wait_for_ovsdb_tool | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed because ovsdb-client is not run in the background, there is nothing to wait on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this was not added for any background task but for the cases when pods are terminated in middle while ovsdb-client was running, it's similar to what we do for ovsdb-tool handling where empty db caused the issues in past while pods were terminated.
Or you observed some issues with these checks or just observed as cleanup task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code, the only reason I could see for the ovsdb-tool wait was that setup.sh ran ovn-ctl with '&' to background it, and ovsdb-tool/ovsdb_tool() is called in ovs-lib in things like create_cluster, upgrade_cluster, etc.
ovsdb-client, from what I can tell, is only called via ovn-ctl through ovs-lib's upgrade_cluster(), which we would never call with this patch applied because we pass --no-db-cluster-schema-upgrade to ovn-ctl. So since the only place it will ever be called is ultimately from setup.sh, in the foreground, there would never be any reason to wait for the execution of ovsdb-client.
Let me know if I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since osvdb-client is being called at
ovn-operator/templates/ovndbcluster/bin/functions
Lines 52 to 69 in 55b3554
The wait were added for foreground tasks itself to avoid the cases when pods are terminated/recreated/deleted for any reason with operator or human involvement. It may not have similar side effect as ovsdb-tool(in that case it was resulted into empty db) but still doesn't looks harmful, or you seeing some issue with keeping it?