
Implement cluster shrink (2nd phase) #2247

Open
whitehawk wants to merge 15 commits into feature/ADBDEV-6608 from GG-110

@whitehawk whitehawk commented Feb 16, 2026

Implement cluster shrink (2nd phase)

List of changes:

  • Change the order in which the processes of shrunk segments are stopped.
    Mirrors are now stopped strictly after primaries to avoid hanging
    replication processes.
  • Do not abort the tool if some of the shrunk segments could not be stopped;
    only emit a warning. This is done to comply with the requirements.
  • Rework fault injection for segment stopping because of the item above: an
    exception inside the 'SegmentStopAfterShrink' worker no longer stops the
    tool. Now, when a fault is injected, SIGINT is sent to the ggrebalance
    process to halt its work.
  • Improve logging inside 'SegmentStopAfterShrink'.
  • Add support for redistribution of materialized views, writable external
    tables, partitioned tables, and unlogged tables. Skip processing of temp
    tables. This is done to comply with the requirements.
  • Add 'relkind' info into the rebalance schema, as materialized views require
    2-phase handling: 1st to perform 'ALTER TABLE ... REBALANCE' and 2nd to
    perform 'REFRESH MATERIALIZED VIEW'.
  • Implement the 2-phase handling for materialized views mentioned above. Add a
    separate worker type, 'MatViewRefreshTask' (derived from
    'TableRebalanceTask'), for the 2nd phase. The 2nd phase does not overlap with
    the 1st, because REFRESH may not properly update a materialized view while a
    table it depends on is being rebalanced. For example:
1: create table t(a int) distributed by (a);
1: create materialized view mv_t as select a from t distributed by (a);
1: alter materialized view mv_t rebalance 1;

2: begin;
2: alter table t rebalance 1;

1: refresh materialized view mv_t;
2: commit;

In this case the materialized view will contain values from only one segment.
This is a consequence of how Postgres/Greengage handle snapshots of the system
catalog and of the table data. When the REFRESH command starts, it works with a
snapshot of the data in 't' in which the rows are still distributed across
several segments (the transaction of session 2 is not yet committed). REFRESH
then blocks, because it depends on 't', which is locked by the other
transaction. Once session 2 commits, REFRESH is unblocked and immediately sees
the system catalog changes (this works as designed in Postgres/Greengage). It
therefore sees that the distribution of 't' is the same as that of 'mv_t' and
builds a plan that updates 'mv_t' without motions. As a result, the
materialized view contains data from only one segment.
To avoid this, we separate the table rebalance stage and the materialized view
refresh stage in time.
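The separation in time can be pictured as a barrier between two worker pools. The following is only a minimal sketch, not the ggrebalance code: `run_two_phase` and the `rebalance` / `refresh` callables are hypothetical names introduced for illustration, while the real tool drives 'TableRebalanceTask' and 'MatViewRefreshTask' workers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_two_phase(relations, rebalance, refresh, workers=4):
    """Rebalance every relation, then refresh the materialized views.

    relations: iterable of (name, relkind) pairs; relkind 'm' marks a
               materialized view, as in pg_class.relkind.
    rebalance, refresh: hypothetical callables that take a relation name.
    Returns the list of materialized views that were refreshed.
    """
    relations = list(relations)

    # Phase 1: rebalance all relations, materialized views included.
    # Leaving the 'with' block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(rebalance, [name for name, _ in relations]))

    # Phase 2: no rebalance is in flight any more, so each REFRESH is
    # planned against the final distribution of the underlying tables.
    matviews = [name for name, kind in relations if kind == 'm']
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(refresh, matviews))
    return matviews
```

The point of the sketch is only the ordering guarantee: the first refresh cannot start until the last rebalance has finished.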

  • Add checks that the database and the table exist before we actually start
    to rebalance the table. This is needed because either could be dropped
    concurrently after the rebalance table list has been created.
  • Add retry logic to the table rebalance worker. It is needed when, for
    example, another session opens a transaction after we have created the
    rebalance table list, drops the table before we start to rebalance it, and
    commits the transaction after we have started the rebalance (while we are
    waiting on the table's locks).
  • Remove the unused flag 'needs_repopulate'.
  • Add new behave test cases and update old ones to cover the new functionality.
  • Add new behave step definitions to support the updates in the tests.
  • Fix behave test steps for view/matview creation: they opened a connection
    but did not use it. Instead, they tried to use the connection from the
    context, which was not properly configured.
  • Update code in the behave utils to support new test step definitions for
    materialized views and unlogged tables.
  • Add to the fault injector the ability to suspend execution instead of
    crashing.
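The existence check and the retry logic from the list above could be combined roughly as follows. This is a sketch under assumptions: `rebalance_with_retry`, `table_exists`, and `do_rebalance` are hypothetical stand-ins rather than names from this PR, and a real worker would catch the database driver's error type instead of `Exception`.

```python
def rebalance_with_retry(table_exists, do_rebalance, max_attempts=3):
    """Return 'done', or 'skipped' if the table was dropped concurrently.

    table_exists, do_rebalance: hypothetical zero-argument callables.
    Re-raises the last error if every attempt fails while the table exists.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for _ in range(max_attempts):
        # Re-check on every attempt: the table may have been dropped by a
        # transaction that committed while we were waiting on its lock.
        if not table_exists():
            return 'skipped'
        try:
            do_rebalance()
            return 'done'
        except Exception as exc:  # assumption: stands in for the driver error
            last_error = exc
    raise last_error
```

With this shape, the race described above ends in 'skipped' on the second pass: the first attempt fails once the dropping transaction commits, and the repeated existence check then sees that the table is gone.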

@whitehawk whitehawk changed the title from "Gg 110" to "Implement cluster shrink (2nd phase)" Feb 18, 2026
@whitehawk whitehawk marked this pull request as ready for review February 18, 2026 05:48
@KnightMurloc

This comment was marked as resolved.

@whitehawk
Author

2nd to perform 'REFRESH MATERIALIZED VIEW'.

Why is just rebalancing not enough? Does gpexpand refresh mat views after expanding?

After a face-to-face discussion: we need to evaluate a CTAS approach for mat views, because the current approach is prone to a race condition when one mat view depends on another mat view. Created GG-225.

@KnightMurloc

Skip processing of temp tables

Why? If the database is still available to users during shrink, what will be the state of their temporary tables after that?

@bimboterminator1
Member

Created GG-225.

Current approach won't be cut off for now?

@when('the user waits till {process_name} prints "{log_msg}" in the logs')
@then('the user waits till {process_name} prints "{log_msg}" in the logs')
def impl(context, process_name, log_msg):
    command = "while sleep 0.1; " \
Member

@bimboterminator1 bimboterminator1 Feb 19, 2026

Is a timeout needed in case log_msg never appears? Or does that only matter for run_async_command?
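For illustration, a bounded wait could look like the sketch below. It is not the step from this PR: `wait_for_log_message` and the `read_log` callable are hypothetical, and the polling interval of 0.1 s only mirrors the `sleep 0.1` in the quoted command.

```python
import time


def wait_for_log_message(read_log, log_msg, timeout=60.0, interval=0.1):
    """Poll until log_msg shows up in the log or the timeout expires.

    read_log: hypothetical zero-argument callable returning the current
              log contents as a string.
    Returns True if the message was found, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if log_msg in read_log():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A step built on this could fail with a clear message on timeout instead of hanging the test run.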

Comment on lines +649 to +650
if self.rel_kind == 'm' and table_exists:
    self.shrink.rebalance_schema.setStatusForTableToRebalance(self.db_name, self.schema_name, self.rel_name, self.STATUS_MAT_VIEW_REFRESH_REQUIRED)

Why don't we need to run ANALYZE for the mat view? Mat views have their own statistics in pg_statistic.

Comment on lines +1504 to +1524
@given('a long-run session starts')
@when('a long-run session starts')
@then('a long-run session starts')
def impl(context):
    dbname = 'gptest'
    context.long_run_conn = dbconn.connect(dbconn.DbURL(dbname=dbname), unsetSearchPath=False)

@given('a long-run session ends')
@when('a long-run session ends')
@then('a long-run session ends')
def impl(context):
    if context.long_run_conn != None:
        context.long_run_conn.close()
        context.long_run_conn = None

@given('sql "{sql}" is executed in a long-run session')
@when('sql "{sql}" is executed in a long-run session')
@then('sql "{sql}" is executed in a long-run session')
def impl(context, sql):
    dbconn.execSQL(context.long_run_conn, sql)

Can we use:
the user connects to "{dbname}" with named connection "{cname}"
the user drops the named connection "{cname}"
and
the user executes "{sql}" with named connection "{cname}"
