Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 9 additions & 13 deletions templates/ovndbcluster/bin/functions
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,6 @@ function wait_for_ovsdb_tool {
done
}

function wait_for_ovsdb_client {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed because ovsdb-client is not run in the background, there is nothing to wait on.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this was not added for any background task but for the cases when pods are terminated in middle while ovsdb-client was running, it's similar to what we do for ovsdb-tool handling where empty db caused the issues in past while pods were terminated.
Or you observed some issues with these checks or just observed as cleanup task?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code, the only reason I could see for the ovsdb-tool wait was that setup.sh ran ovn-ctl with '&' to background it, and ovsdb-tool/ovsdb_tool() is called in ovs-lib in things like create_cluster, upgrade_cluster, etc.

ovsdb-client, from what I can tell, is only called via ovn-ctl through ovs-lib's upgrade_cluster(), which we would never call with this patch applied because we pass --no-db-cluster-schema-upgrade to ovn-ctl. So since the only place it will ever be called is ultimately from setup.sh, in the foreground, there would never be any reason to wait for the execution of ovsdb-client.

Let me know if I'm missing something.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since osvdb-client is being called at

if test X"`ovsdb-client needs-conversion unix:$db_socket "$db_schema"`" = Xyes; then
echo "Database conversion needed for ${DB_TYPE} database, performing conversion..."
# Set trap to wait for ovsdb-client to finish conversion
trap wait_for_ovsdb_client EXIT
if ovsdb-client convert unix:$db_socket "$db_schema"; then
echo "Database conversion completed successfully for ${DB_TYPE} database"
else
echo "Error: Database conversion failed for ${DB_TYPE} database"
trap - EXIT
return 1
fi
# Wait for conversion to complete and remove trap
wait_for_ovsdb_client
trap - EXIT
fi

The wait were added for foreground tasks itself to avoid the cases when pods are terminated/recreated/deleted for any reason with operator or human involvement. It may not have similar side effect as ovsdb-tool(in that case it was resulted into empty db) but still doesn't looks harmful, or you seeing some issue with keeping it?

while pgrep -a ovsdb-client; do
echo "ovsdb-client still working. Waiting..."
sleep 1
done
}

function check_and_convert_db {
local db_socket="${OVN_RUNDIR}/ovn${DB_TYPE}_db.sock"
local db_schema="/usr/share/ovn/ovn-${DB_TYPE}.ovsschema"
Expand All @@ -52,8 +45,15 @@ function check_and_convert_db {
if test X"`ovsdb-client needs-conversion unix:$db_socket "$db_schema"`" = Xyes; then
echo "Database conversion needed for ${DB_TYPE} database, performing conversion..."

# Set trap to wait for ovsdb-client to finish conversion
trap wait_for_ovsdb_client EXIT
local db_name="OVN_Northbound"
if [[ "${DB_TYPE}" == "sb" ]]; then
db_name="OVN_Southbound"
fi
local db_version
db_version=$(ovsdb-client get-schema-version "unix:$db_socket" "$db_name")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only concern here is this can pile up per db upgrade so we should be checking how many of these we should persist.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm just going to abandon the automatic backup idea. They can document making manual backups, and honestly I think the idea of backups is ultimately kind of silly when no matter what you have to do an ovn-db-sync-util run in any case, and it can create from nothing (bugs in that util are the only real reason for the backup, and we should fix those, but a manual "just in case" process seems sufficient).

local backup_file="${DB_FILE%.db}-backup-v${db_version}.db"
echo "Backing up ${db_name} v${db_version} to ${backup_file}..."
ovsdb-client backup "unix:$db_socket" "$db_name" > "$backup_file"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ovndbcluster[-nb/-sb] pod has this startup_probe:
startupProbe := &corev1.Probe{ // TODO might need tuning TimeoutSeconds: 5, PeriodSeconds: 3, FailureThreshold: 20, InitialDelaySeconds: 3, }
It means that it has 63 seconds (FailureThreshold * PeriodSeconds + InitialDelay) to be ready. On a large deployments can this operation take more than 1 minute?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@averdagu It's conceivable it could be a problem. With a test SBDB that I just loaded with 2M FDB entries, around 250MB, it took around 35 seconds on my laptop to do the backup. I've seen 500MB DBs occasionally. it'd be pretty unlikely to be a problem with NBDB, though I've seen some bigger NBDBs with tons of ACLs, etc.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@averdagu is it ok to bump the timeout? Or should we just insist that backup is a manual process that is done before upgrade?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atleast it would not be a problem with startup and liveness probe as those checking for ovsdb-server process, the conversion we running post starting that. Only thing that should be confirmed if readiness probes are impacted due to this where we check cluster membership


if ovsdb-client convert unix:$db_socket "$db_schema"; then
echo "Database conversion completed successfully for ${DB_TYPE} database"
Expand All @@ -62,10 +62,6 @@ function check_and_convert_db {
trap - EXIT
return 1
fi

# Wait for conversion to complete and remove trap
wait_for_ovsdb_client
trap - EXIT
fi
}

Expand Down
6 changes: 5 additions & 1 deletion templates/ovndbcluster/bin/setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ fi
# call to ovn-ctl directly instead of start-${DB_TYPE}-db-server to pass
# extra_args after --
set /usr/share/ovn/scripts/ovn-ctl --no-monitor
set "$@" --no-db-cluster-schema-upgrade
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prevent ovn-ctl from upgrading the schema so that only we are upgrading the schema during the check_and_convert_db step.


# Election timer (hardcoded initial value; runtime changes handled by operator)
set "$@" --db-${DB_TYPE}-election-timer=10000
Expand Down Expand Up @@ -155,7 +156,10 @@ if [[ "$(hostname)" == "{{ .SERVICE_NAME }}-0" ]]; then
fi

# Check and perform database conversion if needed
# This can be cleaned up once https://redhat.atlassian.net/browse/FDP-3108 is fixed
# "Online" db conversion will happen when ovn-ctl starts a clustered DB unless
# --no-db-cluster-schema-upgrade is passed. Online conversion does no backup,
# making rollback impossible if something goes wrong during an upgrade and it
# needs to be reverted
check_and_convert_db

wait_for_ovsdb_tool
Expand Down