The charm allows a 2-node cluster but it's not functional after a failover #570

@nobuto-m

Steps to reproduce

  1. Prepare a MAAS provider.
  2. Deploy the charm with 2 units by following https://charmhub.io/postgresql/docs/h-scale:
    juju deploy postgresql --base ubuntu@22.04 --channel 14/stable -n 2
  3. Take down the primary unit.

Expected behavior

One of the following:

  • the cluster stays functional after one of the two units is taken down
  • or the charm prevents a two-node cluster from being deployed, by setting the juju status to blocked with a message suggesting 3 units instead
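The second option could be approximated on the operator side with a small guard run before deploying. This is only a sketch: `check_planned_units` and its messages are hypothetical and are not part of the charm or of the juju CLI.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy guard (not charm code): refuse a planned
# unit count of exactly 2, mirroring the "blocked" suggestion above.
check_planned_units() {
  local units=$1
  if [ "$units" -eq 2 ]; then
    echo "blocked: a 2-unit cluster is not functional after a failover; use 3 units instead"
    return 1
  fi
  echo "ok: $units unit(s)"
}
```

For example, `check_planned_units 2 || exit 1` placed before the `juju deploy ... -n 2` command would stop the deployment early.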

Actual behavior

This is similar to #566.

Juju status looks okay at a glance. However, the surviving unit doesn't report which unit is currently the primary.

$ juju status
Model     Controller            Cloud/Region       Version  SLA          Timestamp
postgres  mysunbeam-controller  mysunbeam/default  3.5.3    unsupported  12:17:40Z

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql  14.11    active    1/2  postgresql  14/stable  429  no       

Unit           Workload  Agent  Machine  Public address   Ports     Message
postgresql/0   unknown   lost   0        192.168.151.115  5432/tcp  agent lost, see 'juju show-status-log postgresql/0'
postgresql/1*  active    idle   1        192.168.151.116  5432/tcp  

Machine  State    Address          Inst id    Base          AZ       Message
0        down     192.168.151.115  machine-7  ubuntu@22.04  default  Deployed
1        started  192.168.151.116  machine-8  ubuntu@22.04  default  Deployed

Also, the get-primary action reports the dead unit as the primary, which should not be the case.

$ juju run postgresql/leader get-primary
Running operation 3 with 1 task
  - task 4 on unit-postgresql-1

Waiting for task 4...
primary: postgresql/0

Patroni's member list cannot be fetched because the Raft quorum was lost.

$ juju ssh postgresql/1 -- sudo -u snap_daemon env PATRONI_LOG_LEVEL=DEBUG patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
2024-08-05 12:20:16,176 - DEBUG - Loading configuration from file /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml
2024-08-05 12:20:21,243 - INFO - waiting on raft
2024-08-05 12:20:26,243 - INFO - waiting on raft
2024-08-05 12:20:31,244 - INFO - waiting on raft
2024-08-05 12:20:36,244 - INFO - waiting on raft
2024-08-05 12:20:41,245 - INFO - waiting on raft
2024-08-05 12:20:46,245 - INFO - waiting on raft
2024-08-05 12:20:51,246 - INFO - waiting on raft
2024-08-05 12:20:56,247 - INFO - waiting on raft
^C
Aborted!
Connection to 192.168.151.116 closed.
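The endless "waiting on raft" output is consistent with majority-quorum arithmetic. The sketch below assumes that the only Raft members are the two postgresql units (no extra witness), and the function names are made up for illustration. A majority of n members is n/2 + 1 (integer division), so a 2-member cluster needs both members and tolerates no failure.

```shell
#!/usr/bin/env bash
# Majority quorum for a Raft-style cluster with the given number of voting members.
quorum_size() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Number of members that can be lost while a majority still remains.
tolerated_failures() {
  local members=$1
  echo $(( members - (members / 2 + 1) ))
}

# Compare a few cluster sizes; 2 members tolerate as few failures as 1 member.
for n in 1 2 3 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

With 3 members the quorum is still 2, so one unit can be lost, which matches the suggestion of 3 units in the expected behavior.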

As a side note, Raft support has been deprecated in upstream Patroni since 3.0.0.
https://patroni.readthedocs.io/en/latest/releases.html#version-3-0-0

Versions

Operating system: jammy

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: 14/stable 429

LXD: N/A

Log output

Juju debug log:
model_debug.log

Additional context
