Setting Up a Kubernetes Cluster

These are brief notes from manually building a k8s cluster on virtual machines. Three CentOS VMs serve as the physical Nodes. For well-known reasons, the public network caused no end of problems, so a few extra workaround steps were required; I also hit many baffling issues along the way, reinstalled the VMs several times, and did a fair amount of troubleshooting in between. To keep the walkthrough concise, some of those details are omitted here.

Network Topology

Three VMs run Linux: one as the Master Node and two as Slave Nodes. The network is set up as follows: the Bridge network serves as the OAM network and, as a last resort, for downloading images and components over a VPN (in the end no VPN was needed); the Internal network carries traffic between the Nodes; the Pod network is built by Calico.
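
For concreteness, the addressing is reconstructed here from the IPs that appear later in these notes (the exact Bridge addresses depend on your host LAN):

Bridge (OAM):      addresses from the host LAN, used for downloads and administration
Internal (Nodes):  192.168.56.117 (master), 192.168.56.118 (nodea), 192.168.56.119 (nodeb)
Pod (Calico):      10.244.0.0/16, passed to kubeadm as --pod-network-cidr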

Configuring the VM OS and Installing Software

Create one VM as the Master node and install the OS. Give it at least 2 CPU cores (with a single core kubeadm reports an error), 2 GB of RAM, and a 10 GB disk; these hardware settings can still be changed after the OS is installed. Install CentOS as usual, then change the hostname:

hostnamectl --static set-hostname master

Update the Docker package repository:

cd /etc/yum.repos.d/
wget -c https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Create the Kubernetes repository file kubernetes.repo with the following contents:

[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
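
Optionally confirm that yum now sees both repositories before installing anything:

yum repolist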

Install the packages, then enable and start the services:

yum install docker-ce
yum install kubeadm kubelet kubectl
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
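
A quick sanity check that the installed versions match what these notes were written against (v1.14.2, judging by the image list further down):

docker --version
kubeadm version -o short
kubectl version --client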

Adjust a few system settings. I tried deploying without these changes and it failed with errors whose causes all traced back to these parameters.

# SELinux: set SELINUX=permissive in /etc/selinux/config (applies after reboot)
# and switch to permissive immediately
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
setenforce 0
# Firewall
systemctl disable firewalld && systemctl stop firewalld
# Network parameters: apply now and persist via /etc/sysctl.d/k8s.conf
sysctl -w net.bridge.bridge-nf-call-iptables=1
echo 'net.bridge.bridge-nf-call-iptables=1' >> /etc/sysctl.d/k8s.conf
# Disable swap now and comment out the swap line in /etc/fstab
swapoff -a
sed -i '/\sswap\s/ s/^/#/' /etc/fstab

With all of the above configured, clone the VM twice to create the worker nodes NodeA and NodeB.

Configuring the k8s Master Node

Fetching the required container images

Because the domains involved cannot be reached directly, this extra step was needed to find out which images are required. Alternatively, configure a proxy or use a VPN, in which case pre-downloading the images is unnecessary.

[root@master ~]# kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.14.2
k8s.gcr.io/kube-controller-manager:v1.14.2
k8s.gcr.io/kube-scheduler:v1.14.2
k8s.gcr.io/kube-proxy:v1.14.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

Prepare the following shell script and run it; afterwards docker image ls shows the downloaded images.

#!/bin/bash
# Pull each control-plane image from the Aliyun mirror,
# then re-tag it with the k8s.gcr.io name that kubeadm expects.
images=(
kube-apiserver:v1.14.2
kube-controller-manager:v1.14.2
kube-scheduler:v1.14.2
kube-proxy:v1.14.2
pause:3.1
etcd:3.3.10
coredns:1.3.1
)

for imageName in "${images[@]}" ; do
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
done

Confirm the images are present:

[root@master ~]# docker image ls

Initialize the Master node (kubeadm reset rolls this back):

[root@master ~]# kubeadm init --apiserver-advertise-address 192.168.56.117 --pod-network-cidr=10.244.0.0/16
...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.56.117:6443 --token qivprv.la764w1vr8onk3ik \
--discovery-token-ca-cert-hash sha256:23df71bdfacb9de49311a2ba16fc7a3efb3d707dd8ece2502237955e5ea299e8
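
If the join command scrolls away or its token expires (kubeadm tokens are valid for 24 hours by default), a fresh one can be printed on the master at any time:

kubeadm token create --print-join-command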

Then, as the output instructs, set up kubectl for the regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
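
At this point kubectl should be able to reach the API server:

kubectl cluster-info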

Configuring the k8s Slave Nodes

Change the hostnames of the two worker nodes:

# on nodea
hostnamectl --static set-hostname nodea
# on nodeb
hostnamectl --static set-hostname nodeb

On master, nodea, and nodeb, add the following entries to /etc/hosts:

192.168.56.117 master
192.168.56.118 nodea
192.168.56.119 nodeb
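
A quick sanity check (not in the original notes) that the names resolve and the internal network is reachable from each node:

for h in master nodea nodeb; do ping -c 1 $h; done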

Run the join command on nodea and nodeb (kubeadm reset rolls this back; the token here differs from the init output above because these notes span more than one install):

kubeadm join 192.168.56.117:6443 --token cn6unj.8xj0rfk7w9a49x5y \
--discovery-token-ca-cert-hash sha256:0489ee84cbbdd26f759fddac685da43cb5d327fe1b8a5f5f30d90de8d5cd233c

Then on master:

[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   NotReady   master   12m   v1.14.2
nodea    NotReady   <none>   3m    v1.14.2
nodeb    NotReady   <none>   6s    v1.14.2

Check why the nodes are NotReady:

[root@master ~]# kubectl describe node nodea
...
KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
...

Installing the Network Plugin

Calico is used here:

kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml
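
Calico needs a little while to come up on every node; the rollout can be watched with:

kubectl get pods --namespace=kube-system -w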

Check the Node status:

[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   10h   v1.14.2
nodea    Ready    <none>   10h   v1.14.2
nodeb    Ready    <none>   10h   v1.14.2

Check the Pod status:

[root@master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-8646dd497f-n8wl6   1/1     Running   0
kube-system   calico-node-2lcjn                          1/1     Running   0
kube-system   calico-node-kvmz7                          1/1     Running   0
kube-system   calico-node-p2nft                          1/1     Running   0
kube-system   coredns-fb8b8dccf-qzr5m                    1/1     Running   0
kube-system   coredns-fb8b8dccf-txgzh                    1/1     Running   0
kube-system   etcd-master                                1/1     Running   0
kube-system   kube-apiserver-master                      1/1     Running   0
kube-system   kube-controller-manager-master             1/1     Running   0
kube-system   kube-proxy-49t6x                           1/1     Running   0
kube-system   kube-proxy-86j9l                           1/1     Running   0
kube-system   kube-proxy-kttxv                           1/1     Running   0
kube-system   kube-scheduler-master                      1/1     Running   0

That completes the k8s cluster setup.

Running an Application

Create a file myweb.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: mywebv1
spec:
  restartPolicy: OnFailure
  containers:
  - name: myweb
    image: mywebv1
    imagePullPolicy: Never
    ports:
    - containerPort: 80
      protocol: TCP
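
Note that imagePullPolicy: Never means the mywebv1 image must already exist in the local Docker daemon of whichever node the Pod is scheduled to. It is a locally built image whose Dockerfile is not part of these notes; assuming such a Dockerfile, it would be built on each worker node with:

docker build -t mywebv1 .
docker image ls | grep mywebv1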

Run it on the cluster:

[root@master ~]# kubectl create -f myweb.yaml
[root@master ~]# kubectl get pod
NAME      READY   STATUS    RESTARTS   AGE
mywebv1   1/1     Running   0          10s
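
One way to confirm the web server actually answers (assuming the container really listens on port 80) is to forward a local port to the Pod:

kubectl port-forward pod/mywebv1 8080:80 &
curl http://127.0.0.1:8080/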

A Small Problem Encountered Along the Way

After installing the network plugin, the Node and Pod status looked like this:

[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   Ready      master   9h    v1.14.2
nodea    NotReady   <none>   9h    v1.14.2
nodeb    NotReady   <none>   9h    v1.14.2

[root@master ~]# kubectl get pods --all-namespaces
...
kube-system   calico-node-2lcjn       0/1   Init:0/2            0   9h
kube-system   calico-node-kvmz7       1/1   Running             0   9h
kube-system   calico-node-p2nft       0/1   Init:0/2            0   9h
...
kube-system   kube-proxy-86j9l        0/1   ContainerCreating   0   9h
kube-system   kube-proxy-kttxv        0/1   ContainerCreating   0   9h
kube-system   kube-scheduler-master   1/1   Running             0   9h

Check the cause:

[root@master ~]# kubectl describe pod kube-proxy-86j9l --namespace=kube-system
...
Warning FailedCreatePodSandBox 3m6s (x1211 over 9h) kubelet, nodeb Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Pull the missing image on the affected node:

[root@nodeb ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
[root@nodeb ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1

Check the remaining failure:

[root@master ~]#  kubectl describe pod calico-node-p2nft --namespace=kube-system
...
Warning Failed 49s (x3 over 11m) kubelet, nodea Error: ErrImagePull
Warning Failed 49s (x2 over 9m40s) kubelet, nodea Failed to pull image "calico/cni:v3.7.2": rpc error: code = Unknown desc = context canceled

Pull the missing image; after trying several domestic mirrors I finally found one that worked:

[root@nodea ~]# docker pull quay-mirror.qiniu.com/calico/cni:v3.7.2
[root@nodeb ~]# docker pull quay-mirror.qiniu.com/calico/cni:v3.7.2
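
The notes end here, but by analogy with the pause image above, the pulled image presumably still needs re-tagging on both nodes so kubelet finds it under the name the Calico manifest references:

docker tag quay-mirror.qiniu.com/calico/cni:v3.7.2 calico/cni:v3.7.2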