$ ansible-playbook -i clus.yaml wardroom/swizzle/upgrade.yml
...
TASK [kubernetes-master : add all of the kubernetes add-ons] ********************************************************************************
fatal: [arcadeqa-clus104-master1-c82e66.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.039686", "end": "2019-08-16 12:10:17.833855", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.794169", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
fatal: [arcadeqa-clus104-master2-f69131.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.066816", "end": "2019-08-16 12:10:17.972071", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.905255", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
fatal: [arcadeqa-clus104-master3-765238.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.123016", "end": "2019-08-16 12:10:18.019232", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.896216", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
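All three failures above are the same DNS problem: the lookup goes through the local systemd-resolved stub at 127.0.0.53, which cannot resolve the api.c104.qis.site.gs.com endpoint. A quick way to reproduce the lookup kubeadm attempted (hostname taken from the logs above):

```shell
# Reproduce the resolver lookup kubeadm performed; on the affected masters
# this prints "no such host", matching the stderr in the task output above.
if getent hosts api.c104.qis.site.gs.com >/dev/null 2>&1; then
    echo "resolves"
else
    echo "no such host"
fi
```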
$ ansible-playbook -i clus.yaml wardroom/swizzle/upgrade.yml
...
TASK [kubernetes : install kubernetes packages] *********************************************************************************************
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
[WARNING]: Could not find aptitude. Using apt-get instead
fatal: [arcadeqa-clus104-master2-f69131.vm.qis.site.gs.com]: FAILED! => {"attempts": 5, "cache_update_time": 1565971943, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'kubernetes-cni=0.6.0-00'' failed: E: Packages were downgraded and -y was used without --allow-downgrades.\n", "rc": 100, "stderr": "E: Packages were downgraded and -y was used without --allow-downgrades.\n", "stderr_lines": ["E: Packages were downgraded and -y was used without --allow-downgrades."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following packages were automatically installed and are no longer required:\n conntrack cri-tools grub-pc-bin linux-headers-4.15.0-20\n linux-headers-4.15.0-20-generic linux-image-4.15.0-20-generic\n linux-modules-4.15.0-20-generic\nUse 'sudo apt autoremove' to remove them.\nThe following packages will be REMOVED:\n kubeadm kubelet\nThe following packages will be DOWNGRADED:\n kubernetes-cni\n0 upgraded, 0 newly installed, 1 downgraded, 2 to remove and 171 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following packages were automatically installed and are no longer required:", " conntrack cri-tools grub-pc-bin linux-headers-4.15.0-20", " linux-headers-4.15.0-20-generic linux-image-4.15.0-20-generic", " linux-modules-4.15.0-20-generic", "Use 'sudo apt autoremove' to remove them.", "The following packages will be REMOVED:", " kubeadm kubelet", "The following packages will be DOWNGRADED:", " kubernetes-cni", "0 upgraded, 0 newly installed, 1 downgraded, 2 to remove and 171 not upgraded."]}
/kind bug
What steps did you take and what happened:
Ran the upgrade playbook to take a 1.11.6 cluster to 1.12.7. The master play failed because the API server was temporarily unavailable, and Ansible aborted. kubectl get nodes showed the masters had upgraded successfully, so I re-ran the playbook to make sure all plays were performed. The first re-run failed on the add-ons task (DNS lookup of the API endpoint failed on the masters), and a further re-run now fails on the package install task; logs above.
What did you expect to happen:
Detect that no change is necessary on the masters for the stages that already succeeded, and apply only the changes that are still needed, so the playbook can be re-run safely after a transient failure.
Anything else you would like to add:
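For reference, the second failure is apt refusing the kubernetes-cni downgrade because the install task runs apt-get with -y but not --allow-downgrades. A manual invocation along these lines (a sketch only, not something the playbook does today; note apt also plans to REMOVE kubeadm and kubelet in this transaction, so the -s simulated run is worth checking before committing) gets past that error:

```shell
# Same command the apt task ran, plus --allow-downgrades;
# -s only simulates, so the removal of kubeadm/kubelet can be reviewed first.
apt-get -s --allow-downgrades \
  -o 'Dpkg::Options::=--force-confdef' -o 'Dpkg::Options::=--force-confold' \
  install 'kubernetes-cni=0.6.0-00'
```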
Environment:
branch: 1.12
OS (from /etc/os-release): Ubuntu 18.04

@craigtracey