Skip to content

ADBDEV-8057: Fix pg_rewind leaving replica on old timeline#1970

Merged
silent-observer merged 3 commits intoadb-7.2.0from
ADBDEV-8057
Sep 19, 2025
Merged

ADBDEV-8057: Fix pg_rewind leaving replica on old timeline#1970
silent-observer merged 3 commits intoadb-7.2.0from
ADBDEV-8057

Conversation

@silent-observer
Copy link

@silent-observer silent-observer commented Sep 2, 2025

Remove bogus assertion in pg_rewind

It's entirely possible that the calculated "point of divergence" is
before the end of WAL on the target, because when we calculate the
point of divergence, we assume that the target's WAL extends to
infinity on its current timeline. It might be more clear to calculate
the end of WAL first, and if the target is a direct ancestor of the
source, consider the end of target WAL as the point of
divergence. However, it doesn't change the result: the target is a
direct ancestor of the source, and doesn't need to be rewound.

Add tests to cover the "no rewind required" cases, both when the
target's end of WAL is exactly at the point of divergence, which is
what the removed assertion assumed (case B in the test), and the case
where the assertion failed (case A).

Changes from original patch: test was made more general to prepare for more
detailed tests in the next commits. In particular, we pg_rewind from a
running source node, not from a stopped one, and the source node is passed as
an argument. This doesn't affect the test validity for this patch.

Discussion: https://www.postgresql.org/message-id/18575-1b9b3b60e2a480de@postgresql.org


Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:

  • Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
    since rewind_parseTimeLineHistory actually modifies it.
  • Don't copy timeline history file when dry_run is specified. Dry run
    shouldn't make changes that can potentially break the target server when
    applied incorrectly.
  • Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Fix pg_rewind edge case when checkpoint is at switchpoint

In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.


Note: Do not squash to preserve authorship

@silent-observer silent-observer force-pushed the ADBDEV-8057 branch 2 times, most recently from cc0a1c8 to df9dd60 Compare September 3, 2025 14:00
@silent-observer silent-observer marked this pull request as ready for review September 4, 2025 06:13
@RekGRpth

This comment was marked as resolved.

@RekGRpth

This comment was marked as resolved.

@KnightMurloc
Copy link

Use checkpoint+1 to account for the edge case when the checkpoint exactly
matches switchpoint LSN.

Is this not the same case as described in the first commit?

Add the appropriate description in the second commit.

@KnightMurloc
Copy link

If switches to the first commit, the tests pass with unchanged to pg_rewind.

silent-observer pushed a commit that referenced this pull request Sep 15, 2025
It's entirely possible that the calculated "point of divergence" is
before the end of WAL on the target, because when we calculate the
point of divergence, we assume that the target's WAL extends to
infinity on its current timeline. It might be more clear to calculate
the end of WAL first, and if the target is a direct ancestor of the
source, consider the end of target WAL as the point of
divergence. However, it doesn't change the result: the target is a
direct ancestor of the source, and doesn't need to be rewound.

Add tests to cover the "no rewind required" cases, both when the
target's end of WAL is exactly at the point of divergence, which is
what the removed assertion assumed (case B in the test), and the case
where the assertion failed (case A).

Changes from original patch: test was made more general to prepare for more
detailed tests in the next commits. In particular, we pg_rewind from a
running source node, not from a stopped one, and the source node is passed as
an argument. This doesn't affect the test validity for this patch.

Discussion: https://www.postgresql.org/message-id/18575-1b9b3b60e2a480de@postgresql.org

Ticket: ADBDEV-8057
silent-observer pushed a commit that referenced this pull request Sep 15, 2025
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
silent-observer added a commit that referenced this pull request Sep 15, 2025
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
@silent-observer
Copy link
Author

I split the patch into three commits now.

  • The first contains the assert removal and an updated test that can be adapted to second and third commits with minimal changes.
  • The second contains the bulk of the patch by Heikki, with minor fixes, and updates to the test that only pass after the second commit.
  • The third commit contains a fix for minRecoveryPoint edge case, and extends the test more to cover it.
    Now the test changes are fully sequential: the tests pass on each of the three commits, but only with corresponding changes to pg_rewind present.

silent-observer pushed a commit that referenced this pull request Sep 16, 2025
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Refactor sourceHistfile to be an out parameter instead of global state.
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
silent-observer added a commit that referenced this pull request Sep 16, 2025
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
silent-observer pushed a commit that referenced this pull request Sep 16, 2025
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Refactor sourceHistfile to be an out parameter instead of global state.
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
silent-observer added a commit that referenced this pull request Sep 16, 2025
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
silent-observer added a commit that referenced this pull request Sep 18, 2025
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
KnightMurloc
KnightMurloc previously approved these changes Sep 18, 2025
@RekGRpth

This comment was marked as resolved.

@RekGRpth
Copy link
Member

Refactor sourceHistfile to be an out parameter instead of global state.

pgindent these your changes

@silent-observer
Copy link
Author

Is it expected that changes in the src/bin/pg_rewind/t/107_empty_conf.pl test work without a patch?

Yes, changes to 107_empty_conf.pl only fix config line being accidentally deleted, they don't change the nature of the test. 010_target_is_ancestor.pl is the main test for this patch.

It's entirely possible that the calculated "point of divergence" is
before the end of WAL on the target, because when we calculate the
point of divergence, we assume that the target's WAL extends to
infinity on its current timeline. It might be more clear to calculate
the end of WAL first, and if the target is a direct ancestor of the
source, consider the end of target WAL as the point of
divergence. However, it doesn't change the result: the target is a
direct ancestor of the source, and doesn't need to be rewound.

Add tests to cover the "no rewind required" cases, both when the
target's end of WAL is exactly at the point of divergence, which is
what the removed assertion assumed (case B in the test), and the case
where the assertion failed (case A).

Changes from original patch: test was made more general to prepare for more
detailed tests in the next commits. In particular, we pg_rewind from a
running source node, not from a stopped one, and the source node is passed as
an argument. This doesn't affect the test validity for this patch.

Discussion: https://www.postgresql.org/message-id/18575-1b9b3b60e2a480de@postgresql.org

Ticket: ADBDEV-8057
silent-observer pushed a commit that referenced this pull request Sep 19, 2025
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Refactor sourceHistfile to be an out parameter instead of global state.
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
silent-observer added a commit that referenced this pull request Sep 19, 2025
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
hlinnaka and others added 2 commits September 19, 2025 07:41
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Refactor sourceHistfile to be an out parameter instead of global state.
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
In one specific edge case, when the source server makes a checkpoint exactly
after switching to a new timeline, switchpoint LSN exactly matches latest
checkpoint LSN. This can happen, for example, if the source has just performed
archive recovery. Then setting minRecoveryPoint to the checkpoint LSN in
pg_rewind is illegal and causes an error at startup, since minRecoveryPoint
actually means the minimum allowed end of WAL, and we want that to be after
the switch occurs, not at the end of the previous timeline.

As a fix, add +1 to the checkpoint LSN when setting minRecoveryPoint, which
now correctly handles this case. Also add a test scenario for the source node
that has performed recovery.

Ticket: ADBDEV-8057
@silent-observer silent-observer merged commit 742e818 into adb-7.2.0 Sep 19, 2025
5 checks passed
silent-observer pushed a commit that referenced this pull request Sep 19, 2025
It's entirely possible that the calculated "point of divergence" is
before the end of WAL on the target, because when we calculate the
point of divergence, we assume that the target's WAL extends to
infinity on its current timeline. It might be more clear to calculate
the end of WAL first, and if the target is a direct ancestor of the
source, consider the end of target WAL as the point of
divergence. However, it doesn't change the result: the target is a
direct ancestor of the source, and doesn't need to be rewound.

Add tests to cover the "no rewind required" cases, both when the
target's end of WAL is exactly at the point of divergence, which is
what the removed assertion assumed (case B in the test), and the case
where the assertion failed (case A).

Changes from original patch: test was made more general to prepare for more
detailed tests in the next commits. In particular, we pg_rewind from a
running source node, not from a stopped one, and the source node is passed as
an argument. This doesn't affect the test validity for this patch.

Discussion: https://www.postgresql.org/message-id/18575-1b9b3b60e2a480de@postgresql.org

Ticket: ADBDEV-8057
silent-observer pushed a commit that referenced this pull request Sep 19, 2025
Alternatively, perhaps just copying the history file would be enough?

Changes from original patch:
- Refactor sourceHistfile to be an out parameter instead of global state.
- Move sourceHistfile initialization to before rewind_parseTimeLineHistory,
  since rewind_parseTimeLineHistory actually modifies it.
- Don't copy timeline history file when dry_run is specified. Dry run
  shouldn't make changes that can potentially break the target server when
  applied incorrectly.
- Fix 107_empty_conf.pl test to preserve primary_conninfo setting.

Ticket: ADBDEV-8057
@silent-observer silent-observer deleted the ADBDEV-8057 branch September 19, 2025 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments