PostgreSQL: Deploying Highly Available PostgreSQL on Kubernetes with Patroni
1. Preparation
Deploying highly available PostgreSQL has always been somewhat painful, because neither upstream nor the community offers one official solution. For a project I spent some time researching and trying out the mainstream PostgreSQL HA options, including Crunchy, Stolon and Patroni, as well as pgpool/repmgr style setups. There is exactly one blog post about Stolon floating around, and it turns out to be a translation of an English article that got reposted everywhere (apparently we still prefer someone else's translations). I could not find any post, at home or abroad, about deploying Patroni on Kubernetes, so here is a simple write-up, in the hope that we can help each other out when problems come up.
Choosing a Solution
A quick comparison of the candidates:
- repmgr's algorithm has obvious flaws and is not a mainstream distributed algorithm, so it was ruled out first;
- Stolon and Patroni are more cloud native than Crunchy, which is built on pgPool;
- Crunchy and Patroni have more users than Stolon, and both provide an Operator, which helps with later management and scaling.
Based on this rough comparison I finally chose Patroni. Honestly I was leaning towards Stolon, because it is written in Go and I could have studied its code along the way; but reading the Kubernetes source is already more than enough Go practice, and Patroni can also be deployed via an Operator, which makes it easier to extend later.
Architecture
The overall architecture, shown in the figure above, is fairly simple: HAProxy handles service discovery and proxying, the PostgreSQL nodes stay in sync via streaming replication, and Patroni runs as a daemon that manages the PostgreSQL cluster and periodically refreshes the leader TTL in Kubernetes (etcd can also be used as the DCS).
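If you want to see this leader bookkeeping with your own eyes once the cluster from the next section is running, you can inspect the objects Patroni maintains in Kubernetes. A hedged sketch: the names below assume the scope patronidemo used later in this post, and that the default ConfigMap-based DCS is in use (not Endpoints):
# Patroni stores cluster state in <scope>-leader and <scope>-config ConfigMaps
kubectl get configmaps -l application=patroni
# the leader's name is kept as an annotation that Patroni refreshes on every HA loop
kubectl get configmap patronidemo-leader -o jsonpath='{.metadata.annotations.leader}'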
On my own machine I set up a small Kubernetes cluster (1 master and 1 minion); it got painfully slow at times, so I trimmed resource usage wherever I could. Working from home without the company VPN, and with the Great Firewall in the way, I just followed a domestic guide for installing Kubernetes. There are plenty of tutorials online, so I will not repeat the steps here; feel free to reach out if you hit problems.
2. Deploying Patroni
1. Dockerfile
I created three files in the directory where the image is built:
Dockerfile callback.sh entrypoint.sh
The Dockerfile is copied straight from the official example; all it really does is bake callback.sh and entrypoint.sh into the image. Contents:
FROM postgres:11
MAINTAINER Alexander Kukushkin <[email protected]>
RUN export DEBIAN_FRONTEND=noninteractive \
    && echo 'APT::Install-Recommends "0";\nAPT::Install-Suggests "0";' > /etc/apt/apt.conf.d/01norecommend \
    && apt-get update -y \
    && apt-get upgrade -y \
    && apt-cache depends patroni | sed -n -e 's/.* Depends: \(python3-.\+\)$/\1/p' \
            | grep -Ev '^python3-(sphinx|etcd|consul|kazoo|kubernetes)' \
            | xargs apt-get install -y vim-tiny curl jq locales git python3-pip python3-wheel \
    ## Make sure we have a en_US.UTF-8 locale available
    && localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 \
    && pip3 install setuptools \
    && pip3 install 'git+https://github.com/zalando/patroni.git#egg=patroni[kubernetes]' \
    && PGHOME=/home/postgres \
    && mkdir -p $PGHOME \
    && chown postgres $PGHOME \
    && sed -i "s|/var/lib/postgresql.*|$PGHOME:/bin/bash|" /etc/passwd \
    # Set permissions for OpenShift
    && chmod 775 $PGHOME \
    && chmod 664 /etc/passwd \
    # Clean up
    && apt-get remove -y git python3-pip python3-wheel \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/* /root/.cache

ADD entrypoint.sh /
ADD callback.sh /
RUN chmod +x /callback.sh
RUN apt-get update -y && apt-get install iputils-ping -y
EXPOSE 5432 8008
ENV LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 EDITOR=/usr/bin/editor
USER postgres
WORKDIR /home/postgres
CMD ["/bin/bash", "/entrypoint.sh"]
callback.sh is the script Patroni invokes on events such as start, stop and role change; if you do not want callbacks, just remove them from entrypoint.sh. Here, on start and on role change (switchover/failover) we record the new master's hostname in the current_master table:
#!/bin/bash
export PGPASSWORD=postgres
readonly cb_name=$1
readonly role=$2
readonly scope=$3

# sleep 3 seconds in case that postgres is not ready
sleep 3

function usage() {
    echo "Usage: $0 <on_start|on_role_change> ";
    exit 1;
}

echo "this is patroni callback $cb_name $role $scope"

create_table="
CREATE TABLE IF NOT EXISTS current_master(
    id serial PRIMARY KEY,
    hostname text
);
"
insert_record="
INSERT INTO current_master (hostname) VALUES ('$HOSTNAME');
"

case $cb_name in
    on_start)
        if [[ $role == 'master' ]]; then
            psql -h localhost -U postgres -d postgres -c "${create_table}";
            psql -h localhost -U postgres -d postgres -c "${insert_record}";
        fi
        ;;
    on_role_change)
        if [[ $role == 'master' ]]; then
            psql -h localhost -U postgres -d postgres -c "${insert_record}";
        fi
        ;;
    *)
        usage
        ;;
esac
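Patroni invokes the callback as /callback.sh <action> <role> <scope>, so you can exercise it by hand before relying on it during a real failover. A quick sanity check, assuming the pod names used later in this post and run on the current master pod:
# pretend Patroni just started this node as master; the script should create
# the current_master table (if needed) and insert this pod's hostname
kubectl exec -it patronidemo-0 -- /callback.sh on_start master patronidemo
kubectl exec -it patronidemo-0 -- psql -U postgres -c 'SELECT * FROM current_master;'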
entrypoint.sh mainly generates Patroni's configuration; you can tweak it following the official manual:
#!/bin/bash

if [[ $UID -ge 10000 ]]; then
    GID=$(id -g)
    sed -e "s/^postgres:x:[^:]*:[^:]*:/postgres:x:$UID:$GID:/" /etc/passwd > /tmp/passwd
    cat /tmp/passwd > /etc/passwd
    rm /tmp/passwd
fi

cat > /home/postgres/patroni.yml <<__EOF__
bootstrap:
  dcs:
    postgresql:
      use_pg_rewind: true
  initdb:
  - auth-host: md5
  - auth-local: trust
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
  pg_hba:
  - host all all 0.0.0.0/0 md5
  - host replication ${PATRONI_REPLICATION_USERNAME} ${PATRONI_KUBERNETES_POD_IP}/16 md5
restapi:
  connect_address: '${PATRONI_KUBERNETES_POD_IP}:8008'
postgresql:
  connect_address: '${PATRONI_KUBERNETES_POD_IP}:5432'
  authentication:
    superuser:
      password: '${PATRONI_SUPERUSER_PASSWORD}'
    replication:
      password: '${PATRONI_REPLICATION_PASSWORD}'
  callbacks:
    on_start: /callback.sh
    on_role_change: /callback.sh
__EOF__

unset PATRONI_SUPERUSER_PASSWORD PATRONI_REPLICATION_PASSWORD
export KUBERNETES_NAMESPACE=$PATRONI_KUBERNETES_NAMESPACE
export POD_NAME=$PATRONI_NAME

exec /usr/bin/python3 /usr/local/bin/patroni /home/postgres/patroni.yml
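Note that Patroni also reads any PATRONI_* environment variable directly, which is how the StatefulSet below injects pod-specific settings on top of this file. To check what configuration a pod actually ended up with, you can dump the generated file (assuming the pod names used below):
kubectl exec patronidemo-0 -- cat /home/postgres/patroni.yml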
2. Build and Deploy
With these three files in place, we can build the Docker image:
docker build -t patroni .
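A quick smoke test, and a note: if your cluster has more than one node, a locally built image is not visible to the other kubelets, so you would normally push it to a registry first. In this sketch registry.example.com is a placeholder for your own registry; if you push, update the image name in the manifest below accordingly:
# smoke test: the patroni CLI inside the image should report its version
docker run --rm patroni patroni --version
# push to a registry reachable by all nodes (placeholder address)
docker tag patroni registry.example.com/patroni:latest
docker push registry.example.com/patroni:latest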
Next, prepare the deployment manifest patroni_k8s.yaml, which is the example downloaded from the official repo with some modifications:
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: &cluster_name patronidemo
  labels:
    application: patroni
    cluster-name: *cluster_name
spec:
  replicas: 2
  serviceName: *cluster_name
  selector:
    matchLabels:
      application: patroni
      cluster-name: *cluster_name
  template:
    metadata:
      namespace: default
      labels:
        application: patroni
        cluster-name: *cluster_name
    spec:
      serviceAccountName: patronidemo
      initContainers:
      - name: init-permission
        image: busybox
        command:
        - chown
        - "-R"
        - "999:999"
        - "/home/postgres"
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: pgdata
          mountPath: "/home/postgres"
      containers:
      - name: *cluster_name
        image: patroni   # docker build -t patroni .
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8008
          protocol: TCP
        - containerPort: 5432
          protocol: TCP
        volumeMounts:
        - mountPath: /home/postgres/pgdata
          name: pgdata
        env:
        - name: PATRONI_KUBERNETES_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: PATRONI_KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        #- name: PATRONI_KUBERNETES_USE_ENDPOINTS
        #  value: 'true'
        - name: PATRONI_KUBERNETES_LABELS
          value: '{application: patroni, cluster-name: patronidemo}'
        - name: PATRONI_SUPERUSER_USERNAME
          value: postgres
        - name: PATRONI_SUPERUSER_PASSWORD
          valueFrom:
            secretKeyRef:
              name: *cluster_name
              key: superuser-password
        - name: PATRONI_REPLICATION_USERNAME
          value: standby
        - name: PATRONI_REPLICATION_PASSWORD
          valueFrom:
            secretKeyRef:
              name: *cluster_name
              key: replication-password
        - name: PATRONI_SCOPE
          value: *cluster_name
        - name: PATRONI_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: PATRONI_POSTGRESQL_DATA_DIR
          value: /home/postgres/pgdata/pgroot/data
        - name: PATRONI_POSTGRESQL_PGPASS
          value: /tmp/pgpass
        - name: PATRONI_POSTGRESQL_LISTEN
          value: '0.0.0.0:5432'
        - name: PATRONI_RESTAPI_LISTEN
          value: '0.0.0.0:8008'
      terminationGracePeriodSeconds: 0
  volumeClaimTemplates:
  - metadata:
      labels:
        application: patroni
        cluster-name: *cluster_name
      name: pgdata
    spec:
      storageClassName: manual
      accessModes:
      - ReadWriteOnce
      selector:
        matchLabels:
          app: patroni
      resources:
        requests:
          storage: 1Gi
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: patroni-pv-1
  labels:
    type: local
    app: patroni
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/patroni-1"
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: patroni-pv-2
  labels:
    type: local
    app: patroni
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/patroni-2"
#---
#apiVersion: v1
#kind: Endpoints
#metadata:
#  name: &cluster_name patronidemo
#  labels:
#    application: patroni
#    cluster-name: *cluster_name
#subsets: []
---
apiVersion: v1
kind: Service
metadata:
  name: &cluster_name patronidemo
  labels:
    application: patroni
    cluster-name: *cluster_name
spec:
  selector:
    application: patroni
    cluster-name: *cluster_name
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
  clusterIP: None
---
apiVersion: v1
kind: Secret
metadata:
  name: &cluster_name patronidemo
  labels:
    application: patroni
    cluster-name: *cluster_name
type: Opaque
data:
  superuser-password: cG9zdGdyZXM=
  replication-password: cG9zdGdyZXM=
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: patronidemo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: patronidemo
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
  # delete is required only for 'patronictl remove'
  - delete
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
  - patch
  - update
  # the following three privileges are necessary only when using endpoints
  - create
  - list
  - watch
  # delete is required only for 'patronictl remove'
  - delete
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - patch
  - update
  - watch
# The following privilege is only necessary for creation of a headless service
# for the patronidemo-config endpoint, in order to prevent the k8s master from
# cleaning it up. You can avoid granting it by explicitly creating the service,
# as is done in this manifest.
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: patronidemo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: patronidemo
subjects:
- kind: ServiceAccount
  name: patronidemo
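One caveat before deploying: the two PVs use hostPath, and depending on your setup those directories are not created for you, so it is safest to create them on the node beforehand (a hedged note; with a single minion as in my setup, doing this once on that node is enough):
# on the Kubernetes node that will host the pods
mkdir -p /mnt/patroni-1 /mnt/patroni-2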
You can now deploy Patroni directly with this manifest:
kubectl create -f patroni_k8s.yaml
I deployed only two replicas here; deploy more if you need to:
➜ ha-pg git:(master) ✗ kubectl get pod |grep patroni
patronidemo-0 1/1 Running 1 3h6m
patronidemo-1 1/1 Running 1 3h6m
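To scale out later you can simply grow the StatefulSet, as long as you add a matching PersistentVolume for each new replica (a sketch following the same hostPath pattern as above):
# e.g. after creating a patroni-pv-3 PersistentVolume backed by /mnt/patroni-3
kubectl scale statefulset patronidemo --replicas=3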
A look at the logs shows the two nodes have already formed a master/replica pair:
➜ ha-pg git:(master) ✗ kubectl log patronidemo-0
2019-03-11 13:27:42,705 INFO: Lock owner: patronidemo-0; I am patronidemo-0
2019-03-11 13:27:42,712 INFO: no action. i am the leader with the lock
...
2019-03-11 13:28:02,705 INFO: Lock owner: patronidemo-0; I am patronidemo-0
2019-03-11 13:28:02,712 INFO: no action. i am the leader with the lock
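Patroni also ships with patronictl, which gives a nicer view of cluster membership than grepping logs (the exact output shape may differ slightly between Patroni versions; the config file is the one generated by entrypoint.sh):
kubectl exec -it patronidemo-0 -- patronictl -c /home/postgres/patroni.yml list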
Now let's force a failover by deleting the master pod. The replica gets promoted to master, and the callback.sh we added earlier fires as expected:
➜ ha-pg git:(master) ✗ kubectl delete pod patronidemo-0
➜ ha-pg git:(master) ✗ kubectl log patronidemo-1
2019-03-11 13:30:58,576 INFO: Lock owner: patronidemo-1; I am patronidemo-1
2019-03-11 13:30:58,608 INFO: no action. i am the leader with the lock
this is patroni callback on_role_change master patronidemo
INSERT 0 1
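You can double-check the callback's effect by reading the current_master table on the new master (patronidemo-1, per the log above); every promotion appends a row, so the history of masters is preserved:
kubectl exec -it patronidemo-1 -- psql -U postgres -c 'SELECT * FROM current_master ORDER BY id;'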
The manifest looks complicated, and explaining it line by line would take a while (I am lazy). Most people just want a file they can copy and use, so I only give a working manifest here without a full walkthrough; many readers probably know Kubernetes better than I do anyway (I am just a developer...). One note: the official example does not solve the PV permission problem, so I handle it with an initContainer. If you have questions about the details, let's discuss.
Summary
In this part we built the Patroni image from the Dockerfile and, with a modified version of the official manifest, brought up a highly available PostgreSQL cluster. We have not yet dealt with service discovery and load balancing; the next part shows a simple way to handle service discovery.
3. Deploying HAProxy
Why HAProxy rather than a plain Kubernetes Service:
- HAProxy offers more load-balancing algorithms;
- HAProxy comes with a decent monitoring UI and metrics;
- a plain Service reacts very slowly when a Kubernetes node dies abruptly (e.g. power loss): around 40s by default.
I did implement a Service-based version first; it is simple, and anyone familiar with Kubernetes can put it together quickly from the official docs (see the sketch below). So here I only show the HAProxy approach; you could also use an L7 proxy such as PgBouncer or Pgpool.
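For completeness, here is roughly what that Service-based variant looks like. Patroni labels each pod with its current role, so a Service can select only the master. This is a hedged sketch: I recall the label key being role with the value master in the Patroni version used here; verify with kubectl get pods --show-labels before relying on it:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: patronidemo-master
spec:
  selector:
    application: patroni
    cluster-name: patronidemo
    role: master        # maintained by Patroni on the current leader pod
  ports:
  - port: 5432
    targetPort: 5432
EOF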
Dockerfile
We again need just a Dockerfile and haproxy.cfg; compared with the Patroni image this one is trivial, and the image is small. Dockerfile:
FROM haproxy:1.9-alpine
COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg
haproxy.cfg opens a stats port on 1000, a write port on 5432 and a read port on 5433:
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:1000
    stats enable
    stats uri /
    stats realm Haproxy\ Statistics
    stats auth admin:admin123
    stats admin if TRUE

listen write
    bind *:5432
    option httpchk GET /master
    http-check expect status 200
    default-server inter 3s fall 2 rise 2 on-marked-down shutdown-sessions
    server write_server_0 patronidemo-0.patronidemo:5432 maxconn 100 check port 8008
    server write_server_1 patronidemo-1.patronidemo:5432 maxconn 100 check port 8008

listen read
    bind *:5433
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 2 rise 2 on-marked-down shutdown-sessions
    server read_server_0 patronidemo-0.patronidemo:5432 maxconn 100 check port 8008
    server read_server_1 patronidemo-1.patronidemo:5432 maxconn 100 check port 8008
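The health checks lean on Patroni's REST API: /master returns 200 only on the current leader, and /replica only on replicas. You can verify this by hand from inside a Patroni pod (curl is already baked into the image by our Dockerfile):
# on the leader the first command prints 200 and the second 503; on a replica it is reversed
kubectl exec patronidemo-0 -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8008/master
kubectl exec patronidemo-0 -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8008/replica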
Build the Docker image:
docker build -t haproxy .
Deploy HAProxy
Here I deploy just a single pod, since my machine is short on resources; besides, HAProxy runs as a stateless pod, so one replica is enough: if it dies it simply gets restarted. deploy.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      containers:
      - name: container-haproxy
        image: haproxy   # the locally built image
        imagePullPolicy: IfNotPresent
        ports:
        - name: write
          containerPort: 5432
        - name: read
          containerPort: 5433
        - name: admin
          containerPort: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy-svc
  labels:
    app: haproxy
spec:
  ports:
  - port: 5432
    targetPort: 5432
    name: write-port
  - port: 5433
    targetPort: 5433
    name: read-port
  clusterIP: None
  selector:
    app: haproxy
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy-admin-svc
  labels:
    app: haproxy
spec:
  type: NodePort
  ports:
  - port: 1000
    targetPort: 1000
    nodePort: 31000
    name: admin-port
  selector:
    app: haproxy
Ports 5432 and 5433 expose writes and reads respectively. Since Patroni's REST API provides separate health endpoints for master and replica, we can use them directly as HAProxy health checks. Port 31000 exposes the admin interface, so browsing to host:31000 brings up the HAProxy dashboard:
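To verify end to end, connect through the HAProxy service from inside the cluster: writes via 5432 should always land on the current master, reads via 5433 on a replica (assuming the password postgres from the Secret above):
# throwaway psql client pod against the haproxy service
kubectl run psql-client --rm -it --image=postgres:11 --restart=Never -- \
  psql "host=haproxy-svc port=5432 user=postgres password=postgres" \
  -c 'SELECT pg_is_in_recovery();'   # f => we reached the master
kubectl run psql-client --rm -it --image=postgres:11 --restart=Never -- \
  psql "host=haproxy-svc port=5433 user=postgres password=postgres" \
  -c 'SELECT pg_is_in_recovery();'   # t => we reached a replica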
As you can see there, HAProxy surfaces fairly complete information about every backend, which also makes it handy for monitoring.
Conclusion
With these three posts we can run a highly available PostgreSQL cluster on Kubernetes. There are still some pitfalls and limitations that I will not list here; it is better to stumble over them yourself. There are also Helm charts that deploy all of this directly, hiding a lot of the details; if you just want to use it, Helm works too.
All in all there is hardly any real code here, mostly configuration. It may look like copy-paste from around the web, but honestly it still took a lot of time, especially at the beginning when I knew none of these tools; I am not a DevOps person, and on top of that everything had to be reviewed with various global teams, documented in English, and pushed through the process.
On the other hand, all that debugging taught me a lot about Kubernetes, so practice really is the best teacher. Writing the docs and following the process also imposed some discipline and helped me organize my thoughts.
I wrote these posts partly to record my own experience, since I may well run into this again, and partly so that people with similar needs can find them and we can compare notes when problems come up. All the code is on GitHub.