Implement cluster shrink (2nd phase) #2247
whitehawk wants to merge 15 commits into feature/ADBDEV-6608 from
Conversation
After f2f discussion: we need to evaluate the CTAS approach for mat views, as the current approach can have a potential race condition if one mat view depends on another mat view. Created GG-225.

Why? If the database is still available to users during shrink, what will be the state of their temporary tables after that?

Won't the current approach be cut off for now?
```python
@when('the user waits till {process_name} prints "{log_msg}" in the logs')
@then('the user waits till {process_name} prints "{log_msg}" in the logs')
def impl(context, process_name, log_msg):
    command = "while sleep 0.1; " \
```
Is a timeout needed in case log_msg never appears? Or does that only matter for run_async_command?
```python
if self.rel_kind == 'm' and table_exists:
    self.shrink.rebalance_schema.setStatusForTableToRebalance(self.db_name, self.schema_name, self.rel_name, self.STATUS_MAT_VIEW_REFRESH_REQUIRED)
```
Why don't we need to run ANALYZE for mat views? Mat views have their own statistics in pg_statistic.
```python
@given('a long-run session starts')
@when('a long-run session starts')
@then('a long-run session starts')
def impl(context):
    dbname = 'gptest'
    context.long_run_conn = dbconn.connect(dbconn.DbURL(dbname=dbname), unsetSearchPath=False)


@given('a long-run session ends')
@when('a long-run session ends')
@then('a long-run session ends')
def impl(context):
    if context.long_run_conn is not None:
        context.long_run_conn.close()
        context.long_run_conn = None


@given('sql "{sql}" is executed in a long-run session')
@when('sql "{sql}" is executed in a long-run session')
@then('sql "{sql}" is executed in a long-run session')
def impl(context, sql):
    dbconn.execSQL(context.long_run_conn, sql)
```
Can we use the existing named-connection steps instead:

- the user connects to "{dbname}" with named connection "{cname}"
- the user executes "{sql}" with named connection "{cname}"
- the user drops the named connection "{cname}"
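For illustration, the bookkeeping behind such named-connection steps can be sketched roughly as below (a minimal sketch only; the class name and the dict keyed by cname are assumptions, not the actual gpMgmt step implementation, and the `execute`/`close` calls stand in for `dbconn.execSQL` and connection teardown):

```python
class NamedConnections:
    """Tracks open connections by name, one entry per named connection."""

    def __init__(self):
        self._conns = {}

    def connect(self, cname, conn):
        # 'conn' would come from dbconn.connect(...) in the real steps
        self._conns[cname] = conn

    def execute(self, cname, sql):
        conn = self._conns[cname]
        return conn.execute(sql)  # placeholder for dbconn.execSQL(conn, sql)

    def drop(self, cname):
        conn = self._conns.pop(cname)
        conn.close()
```

Storing the registry on the behave `context` would let all three step definitions share it without each scenario wiring up its own `long_run_conn` attribute.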
Implement cluster shrink (2nd phase)
List of changes:
- …stopped strictly after primaries in order to avoid hanging replication processes.
- …segments. Now we only emit a warning. It is done to comply with the requirements.
- …we will not stop in case of an exception inside the 'SegmentStopAfterShrink' worker. So now, when a fault is injected, send SIGINT to the ggrebalance process to halt its work.
- …tables, partitioned tables, unlogged tables. Skip processing of temp tables. It is done to comply with the requirements.
- …2-phase handling: the 1st phase performs 'ALTER TABLE ... REBALANCE' and the 2nd performs 'REFRESH MATERIALIZED VIEW'.
- …type of workers, 'MatViewRefreshTask' (derived from 'TableRebalanceTask'), to work during the 2nd phase. The 2nd phase doesn't intersect with the 1st phase, because REFRESH may not properly update the materialized view if the table it depends on is currently being rebalanced. For ex.:
  In this case the materialized view will contain values from only 1 segment. It is a consequence of how Postgres/Greengage work with snapshots of the system catalog and of the table's data. When the REFRESH command starts, it works with a snapshot of the table data (from 't') in which the data is still distributed across several segments (as the transaction of session №2 is not yet committed). The REFRESH command then suspends its execution, as it depends on 't', which is locked by the other transaction. Once the second transaction commits, REFRESH is unblocked, and it immediately sees the changes to the system catalog (this is 'works as designed' in Postgres/Greengage), so it sees that the distribution of 't' is the same as that of 'mv_t' and creates a plan to update the data in 'mv_t' without motions. As a result, the materialized view holds data from only one segment.
  To resolve this situation, we separate in time the stages of table rebalancing and materialized view refreshing.
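The separation in time described above can be sketched as below (a simplified illustration, not the actual ggrebalance worker pool; the `rebalance_table`/`refresh_matview` callbacks are hypothetical stand-ins for 'ALTER TABLE ... REBALANCE' and 'REFRESH MATERIALIZED VIEW'):

```python
from concurrent.futures import ThreadPoolExecutor

def run_shrink(tables, matviews, rebalance_table, refresh_matview, workers=4):
    """Run every rebalance task to completion before any REFRESH starts.

    Phase 1 must fully finish first: a REFRESH that overlaps an in-flight
    rebalance could plan without motions against a stale data snapshot and
    end up with rows from only one segment.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: rebalance all tables in parallel.
        list(pool.map(rebalance_table, tables))
    # The first executor is fully drained here, so phase 2 cannot overlap it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 2: refresh the dependent materialized views.
        list(pool.map(refresh_matview, matviews))
```

The hard barrier between the two `with` blocks is what guarantees that no REFRESH ever observes a table mid-rebalance.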
- …to rebalance the table. It is needed, as one could drop the table in parallel after we have created the rebalance table list.
- …other session opens a transaction after we have created the rebalance table list, drops the table before we start to rebalance it, and commits the transaction while we are trying to rebalance the table (and are hanging on the table's locks).
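The drop-in-parallel race can be guarded against roughly as below (a minimal sketch; `check_exists` and `rebalance` are hypothetical callbacks standing in for a catalog lookup performed after the table's locks are acquired and for 'ALTER TABLE ... REBALANCE'):

```python
def rebalance_if_exists(table, check_exists, rebalance):
    """Re-check that 'table' still exists right before rebalancing it.

    The rebalance list is built up front, so a concurrent session may drop
    a listed table before we finally acquire its locks; skipping such a
    table is preferable to failing the whole shrink run.
    """
    if not check_exists(table):
        return False  # dropped concurrently: skip this table
    rebalance(table)
    return True
```

The key point is that the existence check happens per table at rebalance time, not once at list-creation time.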
- …but didn't use it. Instead, they tried to use the connection from the context, which was not properly configured.
- …materialized views and unlogged tables.
- …crashing it.