
k8s/PVC storage volume dynamic expansion #42

@kaybinwong

Grafana was timing out when reading data from Prometheus. Investigation showed that the Prometheus disk had run out of space, so the pod crashed and kept restarting.

[root@a-docker-cluster01 ~]# kubectl -n monitoring get po -o wide
NAME                                                         READY     STATUS             RESTARTS   AGE       IP              NODE
......
prometheus-wrinkled-gibbon-prometheus-prometheus-0           2/3       CrashLoopBackOff   108        132d       10.233.71.69    a-docker-cluster02
[root@a-docker-cluster01 ~]# kubectl -n monitoring logs prometheus-wrinkled-gibbon-prometheus-prometheus-0 prometheus
......
level=info ts=2018-12-23T13:28:30.866076641Z caller=main.go:608 msg="Notifier manager stopped"
level=error ts=2018-12-23T13:28:30.866158699Z caller=main.go:617 err="opening storage failed: zero-pad torn page: write /prometheus/wal/00000138: no space left on device"
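Before digging into storage, the crash cause can be confirmed from the previous container's log. Below is a minimal sketch, assuming kubectl access to the cluster; the helper name `logs_show_enospc` is hypothetical. It only looks for the kernel's "no space left on device" message shown above, so other storage errors will not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the log text on stdin contains
# the ENOSPC message that Prometheus reported above.
logs_show_enospc() {
  grep -q 'no space left on device'
}

# Example usage against a live cluster (not run here):
#   kubectl -n monitoring logs --previous \
#     prometheus-wrinkled-gibbon-prometheus-prometheus-0 prometheus \
#     | logs_show_enospc && echo "volume is full"
```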

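Only 5Gi of storage had been requested here. It ran without problems until this week, when several new metrics were added and the volume of scraped data filled the disk.
The real cause is that no capacity planning was done up front, and that kind of debt always has to be repaid. Can the volume be expanded dynamically? Let's look at how the Ceph RBD StorageClass was defined.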

[root@a-docker-cluster01 ~]# kubectl get storageclass rbd -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
parameters:
  adminId: admin
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: kube-system
  imageFeatures: layering
  imageFormat: "2"
  monitors: 170.0.0.1:6789
  pool: kube
  userId: admin
  userSecretName: ceph-admin-secret
provisioner: ceph.com/rbd
reclaimPolicy: Delete
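The StorageClass above has no `allowVolumeExpansion` field, which is what Kubernetes checks before it will resize a PVC of that class. A minimal sketch for spotting this (the helper `sc_allows_expansion` is hypothetical) reads the YAML printed by `kubectl get storageclass <name> -o yaml`. The `kubectl patch` in the comment is an assumption to verify on your own cluster: the field only takes effect on Kubernetes versions with volume expansion enabled, and whether an image created by the external `ceph.com/rbd` provisioner can then be resized automatically depends on the volume plugin in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the StorageClass YAML on stdin
# (from `kubectl get storageclass <name> -o yaml`) enables expansion.
sc_allows_expansion() {
  grep -Eq '^allowVolumeExpansion:[[:space:]]*true[[:space:]]*$'
}

# Example usage (not run here): enable expansion if it is missing.
#   kubectl get storageclass rbd -o yaml | sc_allows_expansion \
#     || kubectl patch storageclass rbd -p '{"allowVolumeExpansion": true}'
```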

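Unfortunately, for various reasons dynamic expansion was not configured at the time (the StorageClass above has no allowVolumeExpansion field). So what can be done? There is still a way: resize the RBD image manually. The steps are: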

1. Find the RBD image
2. Resize the RBD image
3. Update the PV definition

First, find the RBD image.

  1. List the PVCs:
[root@a-docker-cluster01 ~]# kubectl -n monitoring get pvc
NAME                                                                                                             STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-wrinkled-gibbon-prometheus-alertmanager-db-alertmanager-wrinkled-gibbon-prometheus-alertmanager-0   Bound     pvc-18aaa76f-04f3-11e9-8438-00163e086b4d   5Gi        RWO            rbd            132d
prometheus-wrinkled-gibbon-prometheus-prometheus-db-prometheus-wrinkled-gibbon-prometheus-prometheus-0           Bound     pvc-1cabd371-04f3-11e9-8438-00163e086b4d   5Gi        RWO            rbd            132d
  2. Get the PVC details:
[root@a-docker-cluster01 ~]# kubectl -n monitoring get pvc prometheus-wrinkled-gibbon-prometheus-prometheus-db-prometheus-wrinkled-gibbon-prometheus-prometheus-0 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: ceph.com/rbd
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: prometheus
    prometheus: wrinkled-gibbon-prometheus-prometheus
  name: prometheus-wrinkled-gibbon-prometheus-prometheus-db-prometheus-wrinkled-gibbon-prometheus-prometheus-0
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: rbd
  volumeName: pvc-1cabd371-04f3-11e9-8438-00163e086b4d
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  phase: Bound
  3. Get the PV details:
[root@a-docker-cluster01 ~]# kubectl get pv pvc-1cabd371-04f3-11e9-8438-00163e086b4d -o yaml
apiVersion: v1
kind: PersistentVolume
.....
    image: kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac

The image: kubernetes-dynamic-pvc-xxxx field is the name of the RBD image.
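The three lookups above can be chained into one helper. This is a sketch, assuming the PV uses the in-tree RBD volume source, so that the pool and image live under `.spec.rbd` (the truncated PV output above shows the `image` field but not its full context); the function name `rbd_spec_for_pvc` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pool>/<image>" for the RBD image behind a PVC.
# Usage: rbd_spec_for_pvc <namespace> <pvc-name>
rbd_spec_for_pvc() {
  local ns="$1" pvc="$2" pv
  # The PVC records the name of the PV it is bound to.
  pv="$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}')" || return 1
  [ -n "$pv" ] || return 1
  # Assumption: an RBD-backed PV keeps the Ceph pool and image under .spec.rbd.
  kubectl get pv "$pv" -o jsonpath='{.spec.rbd.pool}/{.spec.rbd.image}'
}
```

For the Prometheus PVC in this issue it should print something like kube/kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac, since the StorageClass uses pool kube; that pool/image form is what rbd resize expects.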

Next, resize the RBD image.

  1. From the Ansible control host, run a remote shell command to check the device mounts on the node hosting the pod:
[root@a-docker-cluster01 ~]# ssh a-docker-cluster02 'df -h | grep 59e33f2a-04f3-11e9-8122-86bd83a719ac'
/dev/rbd0       4.8G  4.6G     0 100% /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac

This confirms that the volume is indeed full.

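  2. Resize the RBD image, which is what actually enlarges the PVC (--size is in megabytes by default, so 20480 means 20 GiB):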
[root@a-docker-cluster01 ~]# rbd resize --size 20480 kube/kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac
Resizing image: 100% complete...done.
  3. Grow the mounted file system online to fill the enlarged device:
[root@a-docker-cluster01 ~]# ssh a-docker-cluster02 'resize2fs /dev/rbd0'
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/rbd0 is mounted on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
The filesystem on /dev/rbd0 is now 5242880 blocks long.
  4. Check the capacity again:
[root@a-docker-cluster01 ~]# ssh a-docker-cluster02 'df -h | grep 59e33f2a-04f3-11e9-8122-86bd83a719ac'
/dev/rbd0       20G  5.3G  14.7G  27% /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac

The resize succeeded: the capacity is now 20G.
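The manual steps above can be made a little safer with a few helpers that print the commands instead of running them. This is a minimal sketch; the function names are hypothetical, and it assumes an ext3/ext4 or XFS file system (resize2fs grows ext3/ext4 online by device, while xfs_growfs takes the mount point).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual RBD resize. They only print
# commands, so each one can be reviewed before it is run.

# rbd_grow_cmd <pool/image> <current_gib> <new_gib>
# `rbd resize --size` takes megabytes by default, hence new_gib * 1024.
# Refuses to shrink, which would cut off data under a live file system.
rbd_grow_cmd() {
  local spec="$1" cur="$2" new="$3"
  if [ "$new" -le "$cur" ]; then
    echo "refusing: ${new}GiB is not larger than ${cur}GiB" >&2
    return 1
  fi
  echo "rbd resize --size $(( new * 1024 )) $spec"
}

# fs_grow_cmd <fstype> <device> <mountpoint>
# ext3/ext4 are grown online by device; XFS is grown by mount point.
fs_grow_cmd() {
  local fstype="$1" dev="$2" mnt="$3"
  case "$fstype" in
    ext3|ext4) echo "resize2fs $dev" ;;
    xfs)       echo "xfs_growfs $mnt" ;;
    *)         echo "unsupported file system: $fstype" >&2; return 1 ;;
  esac
}

# blocks_to_mib <block_count> <block_size_bytes>
# Converts the block count reported by resize2fs to MiB, for comparison
# with the value passed to `rbd resize --size`.
blocks_to_mib() {
  echo $(( $1 * $2 / 1048576 ))
}
```

On the node, the file system type can be read with lsblk -no FSTYPE /dev/rbd0 and the block size with tune2fs -l /dev/rbd0. For this issue, rbd_grow_cmd kube/kubernetes-dynamic-pvc-59e33f2a-04f3-11e9-8122-86bd83a719ac 5 20 prints the same rbd resize --size 20480 command used above, and blocks_to_mib 5242880 4096 prints 20480, which matches the resize2fs result if the file system uses 4 KiB blocks (an assumption; tune2fs shows the actual value).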

Finally, update the definition.

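Kubernetes does not detect the change to the RBD image, so the size it reports has to be corrected by hand: set the PV's capacity to 20Gi. PVs are cluster-scoped, so the -n monitoring flag in the commands below has no effect.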

[root@a-docker-cluster01 prometheus]# kubectl get pv -n monitoring 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                                                                                                       STORAGECLASS   REASON    AGE
pvc-18aaa76f-04f3-11e9-8438-00163e086b4d   5Gi        RWO            Delete           Bound     monitoring/alertmanager-wrinkled-gibbon-prometheus-alertmanager-db-alertmanager-wrinkled-gibbon-prometheus-alertmanager-0   rbd                      132d
pvc-1cabd371-04f3-11e9-8438-00163e086b4d   5Gi        RWO            Delete           Bound     monitoring/prometheus-wrinkled-gibbon-prometheus-prometheus-db-prometheus-wrinkled-gibbon-prometheus-prometheus-0           rbd                      132d
[root@a-docker-cluster01 prometheus]# kubectl -n monitoring edit pv pvc-1cabd371-04f3-11e9-8438-00163e086b4d 
persistentvolume "pvc-1cabd371-04f3-11e9-8438-00163e086b4d" edited
[root@a-docker-cluster01 prometheus]# kubectl get pv -n monitoring 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                                                                                                       STORAGECLASS   REASON    AGE
pvc-18aaa76f-04f3-11e9-8438-00163e086b4d   5Gi        RWO            Delete           Bound     monitoring/alertmanager-wrinkled-gibbon-prometheus-alertmanager-db-alertmanager-wrinkled-gibbon-prometheus-alertmanager-0   rbd                      132d
pvc-1cabd371-04f3-11e9-8438-00163e086b4d   20Gi       RWO            Delete           Bound     monitoring/prometheus-wrinkled-gibbon-prometheus-prometheus-db-prometheus-wrinkled-gibbon-prometheus-prometheus-0           rbd                      132d
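kubectl edit opens an interactive editor, which is awkward to script. The same change can be made non-interactively with `kubectl patch`; below is a minimal sketch. The helper `pv_set_capacity` is hypothetical and deliberately accepts only whole-number quantities with `Mi`, `Gi` or `Ti` suffixes. Like the edit above, it only corrects what Kubernetes reports; it does not resize anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set a PV's reported capacity without an editor.
# Usage: pv_set_capacity <pv-name> <quantity, e.g. 20Gi>
pv_set_capacity() {
  local pv="$1" size="$2"
  # Restrict to simple binary quantities such as 512Mi, 20Gi, 1Ti.
  printf '%s' "$size" | grep -Eq '^[0-9]+(Mi|Gi|Ti)$' || return 1
  # PVs are cluster-scoped, so no namespace flag is needed.
  kubectl patch pv "$pv" -p "{\"spec\":{\"capacity\":{\"storage\":\"${size}\"}}}"
}
```

For this issue the call would be pv_set_capacity pvc-1cabd371-04f3-11e9-8438-00163e086b4d 20Gi.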

In addition, Prometheus's data retention was lowered from the default 15d to 3d, and the change was applied with helm upgrade.
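Retention is the main lever on Prometheus disk usage. The Prometheus storage documentation gives a rough sizing formula, needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with about 1 to 2 bytes per sample. A minimal sketch of that arithmetic follows; the helper name `prom_disk_gib` is hypothetical, and the ingestion rate is an input you would measure yourself (for example with the PromQL query rate(prometheus_tsdb_head_samples_appended_total[5m])).

```shell
#!/usr/bin/env bash
# Rough Prometheus disk estimate from the documented formula:
#   retention_seconds * ingested_samples_per_second * bytes_per_sample
# Prints the result rounded up to whole GiB. Leave extra headroom for
# the WAL and compaction, which this estimate does not cover.
# Usage: prom_disk_gib <retention_days> <samples_per_second> <bytes_per_sample>
prom_disk_gib() {
  local days="$1" rate="$2" bps="$3"
  local bytes=$(( days * 86400 * rate * bps ))
  local gib=1073741824
  # Integer ceiling division.
  echo $(( (bytes + gib - 1) / gib ))
}
```

For example, at an assumed 10000 samples/s and 2 bytes/sample, prom_disk_gib 15 10000 2 prints 25 while prom_disk_gib 3 10000 2 prints 5, which shows why cutting retention from 15d to 3d relieves a small volume.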

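Notes:
1. In v1.9.2 the PVC's requested size cannot be changed in place; only the PV definition can be edited.
2. In Kubernetes v1.11 and later, PVC expansion is implemented and enabled by default, provided the StorageClass sets allowVolumeExpansion: true.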
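On a cluster where PVC expansion is available, step 3 becomes a change to the PVC itself rather than the PV. A minimal sketch with `kubectl patch` follows; the helper `pvc_request_size` is hypothetical, and it assumes the StorageClass already sets `allowVolumeExpansion: true` and that the volume plugin supports expansion. Depending on the Kubernetes version, the file system may only be grown after the pod is restarted, in which case the PVC shows a FileSystemResizePending condition.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise a PVC's storage request so that Kubernetes
# performs the resize itself.
# Usage: pvc_request_size <namespace> <pvc-name> <quantity, e.g. 20Gi>
pvc_request_size() {
  local ns="$1" pvc="$2" size="$3"
  kubectl -n "$ns" patch pvc "$pvc" \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${size}\"}}}}"
}

# Afterwards, check progress (not run here):
#   kubectl -n <namespace> get pvc <pvc-name> -o jsonpath='{.status.conditions}'
```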
