
Installing Kubernetes 1.14.0 with kubeadm

I previously deployed Kubernetes from binaries, which was fairly tedious; this time I install Kubernetes with kubeadm. See my earlier posts for an introduction to the individual Kubernetes components.

Environment

Server environment

Hostname    OS version    IP address    Role    Installed software
spark32    CentOS 7.0    172.16.206.32    Kubernetes Master, Harbor    docker-ce 18.09.4, kubelet v1.14.0, kubeadm v1.14.0, etcd 3.3.10, kube-apiserver v1.14.0, kube-scheduler v1.14.0, kube-controller-manager v1.14.0, kube-proxy v1.14.0, flannel v0.11.0, kubectl v1.14.0, pause 3.1
spark17    CentOS 7.0    172.16.206.17    Kubernetes Node    docker-ce 18.09.4, kubelet v1.14.0, kubeadm v1.14.0, kube-proxy v1.14.0, flannel v0.11.0, pause 3.1
ubuntu31    Ubuntu 16.04    172.16.206.31    Kubernetes Node    docker-ce 18.09.4, kubelet v1.14.0, kubeadm v1.14.0, kube-proxy v1.14.0, flannel v0.11.0, pause 3.1

[Note]: If you go with CentOS, it is recommended to use CentOS 7.3 or later. Starting with 7.3, the default xfs filesystem is created with ftype=1, which lets Docker use the officially recommended storage driver: overlay2.
If your system is older than 7.3 and ftype=0, there are a few ways to deal with it.

[Example]: how to check the ftype value of an xfs filesystem
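A minimal sketch of the check (assuming the root filesystem is XFS; adjust the mount point to the one Docker uses):

xfs_info / | grep ftype    # ftype=1 is required for the overlay2 storage driver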

Preparation

1. Disable iptables and firewalld
Do this on every node.

systemctl stop firewalld.service
systemctl disable firewalld.service

2. Synchronize time across the cluster
This is done with NTP (Network Time Protocol): pick one machine as the cluster's time server, then configure that server and the other machines as its clients.
See my earlier post on deploying Apache Hadoop (fully distributed mode with NameNode HA and ResourceManager HA) for details.
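As a rough sketch of the client side (assuming chrony is used and spark32 acts as the cluster time server; the package name and server address are illustrative assumptions):

# On each of the other nodes: install chrony and point it at the time server
yum install -y chrony
echo "server 172.16.206.32 iburst" >> /etc/chrony.conf
systemctl enable chronyd && systemctl start chronyd
chronyc sources    # verify that the time server is being used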

3. Disable SELinux
Do this on every node.

setenforce 0
vi /etc/selinux/config
SELINUX=disabled

4. Configure /etc/hosts
Do this on every node.

# vim /etc/hosts
172.16.206.32 spark32
172.16.206.17 spark17
172.16.206.31 ubuntu31

5. Docker generates a large number of iptables rules, and bridged traffic may not pass through the iptables nf-call hooks, so enable the kernel's bridge netfilter settings.
Do this on every node.

[root@spark32 ~]# echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
[root@spark32 ~]# echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
[root@spark32 ~]# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
[root@spark32 ~]# sysctl -p /etc/sysctl.conf
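If sysctl -p complains that the net.bridge.* keys do not exist, the br_netfilter kernel module is probably not loaded yet; a hedged sketch (the modules-load.d path is the standard systemd location):

modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf    # reload the module on boot
sysctl -p /etc/sysctl.conf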

Install and configure the master node

Install docker-ce, kubelet, kubeadm, and kubectl

We use the Aliyun mirror repositories. On the Aliyun mirror site you can find both docker-ce and kubernetes, and the "Help" link next to each repository explains how to configure it.

[root@spark32 ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo


[root@spark32 ~]# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
> [kubernetes]
> name=Kubernetes
> baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
> enabled=1
> gpgcheck=1
> repo_gpgcheck=1
> gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
> EOF

[root@spark32 ~]# yum install docker-ce kubelet kubeadm kubectl -y

You can also pin a specific version when installing; here I simply take the latest stable release, 1.14.0. Using kubeadm as an example, list the versions available in the repository:

[root@spark32 ~]# yum list kubeadm.x86_64 --showduplicates | sort -r
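To pin specific versions instead of the latest, something like the following should work (the version strings are examples):

yum install -y docker-ce-18.09.4 kubelet-1.14.0 kubeadm-1.14.0 kubectl-1.14.0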

Configure and start Docker

Next we need the apiserver, controller-manager, scheduler and kube-proxy. When kubeadm installs a Kubernetes cluster, these components run as static Pods (see the official Static Pod documentation).
Docker has to pull every image these components depend on from Google's gcr registry, which may not be reachable. As a temporary workaround I add a proxy to docker's unit file and remove it once the cluster is installed.

[root@spark32 ~]# vim /usr/lib/systemd/system/docker.service
Environment="HTTPS_PROXY=http://www.ik8s.io:10080"
Environment="NO_PROXY=127.0.0.0/8,172.16.0.0/16"
[root@spark32 ~]# systemctl daemon-reload
[root@spark32 ~]# systemctl start docker
[root@spark32 ~]# docker info


As the docker info output shows, there is a warning that Storage Driver: overlay will be removed in a future release. I switch it to overlay2 here; see the storage chapter of the official Docker documentation for the details.

[root@spark32 ~]# systemctl stop docker
[root@spark32 ~]# cp -au /var/lib/docker /var/lib/docker.bk
[root@spark32 ~]# vim /etc/docker/daemon.json
{
"storage-driver": "overlay2"
}
[root@spark32 ~]# systemctl start docker
[root@spark32 ~]# systemctl enable docker.service

[Note]: If you have a working proxy of your own, then even if the server is configured to route all traffic through it, you still have to configure the proxy in /usr/lib/systemd/system/docker.service for docker pull k8s.gcr.io/kube-apiserver:v1.14.0 to work, for example:
Environment="HTTPS_PROXY=http://127.0.0.1:8118"
Environment="HTTP_PROXY=http://127.0.0.1:8118"
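A cleaner alternative to editing the unit file directly is a systemd drop-in, so package upgrades don't overwrite the change (the drop-in file name is my choice; the proxy address is an example):

mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://127.0.0.1:8118"
Environment="NO_PROXY=127.0.0.0/8,172.16.0.0/16"
EOF
systemctl daemon-reload && systemctl restart docker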

Configure kubelet

[root@spark32 ~]# rpm -ql kubelet
/etc/kubernetes/manifests
/etc/sysconfig/kubelet
/usr/bin/kubelet
/usr/lib/systemd/system/kubelet.service
[root@spark32 ~]# cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=

Earlier Kubernetes releases required swap to be disabled on every node; with swap on, kubelet would refuse to start. We can deliberately ignore that check, and the parameter above is where we do it.

[root@spark32 ~]# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--fail-swap-on=false"

Without this setting, kubeadm init on the master later fails with:

[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

kubelet cannot be started yet because its configuration file has not been generated; it is created during master initialization, which also starts kubelet. For now just enable it at boot:

[root@spark32 ~]# systemctl enable kubelet.service

Initialize the master node

Use kubeadm to install the master components.
Check the initialization options:

[root@spark32 ~]# kubeadm init --help



Before initializing, we can print kubeadm's default configuration:

[root@spark32 ~]# kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: spark32
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: ""
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  podSubnet: ""
  serviceSubnet: 10.96.0.0/12
scheduler: {}
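As an alternative to the command-line flags used below, these defaults could be saved to a file, edited, and fed back to kubeadm (a sketch; the file name is arbitrary):

kubeadm config print init-defaults > kubeadm-config.yaml
# edit advertiseAddress, kubernetesVersion, networking.podSubnet, etc. to match the environment
kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=Swap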

Now run the initialization:

[root@spark32 ~]# kubeadm init --kubernetes-version=v1.14.0 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.14.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Swap]: running with swap on is not supported. Please disable swap
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.14.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
...

Problem: the proxy found online had already stopped working, so the images in the gcr registry could not be pulled.
First check which images kubeadm needs to initialize the master:

[root@spark32 ~]# kubeadm config images list
I0408 15:43:21.372628 4131 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0408 15:43:21.372730 4131 version.go:97] falling back to the local client version: v1.14.0
k8s.gcr.io/kube-apiserver:v1.14.0
k8s.gcr.io/kube-controller-manager:v1.14.0
k8s.gcr.io/kube-scheduler:v1.14.0
k8s.gcr.io/kube-proxy:v1.14.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

Remove the proxy settings from docker's unit file:

[root@spark32 ~]# vim /usr/lib/systemd/system/docker.service
# Environment="HTTPS_PROXY=http://www.ik8s.io:10080"
# Environment="NO_PROXY=127.0.0.0/8,172.16.0.0/16"
[root@spark32 ~]# systemctl daemon-reload
[root@spark32 ~]# systemctl restart docker.service

Configure the Aliyun registry mirror (see my earlier post on installing Docker CE on CentOS).
Pull the images from the Aliyun registry:

[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.14.0
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.14.0
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.14.0
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.10
[root@spark32 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.3.1
[root@spark32 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy v1.14.0 5cd54e388aba 13 days ago 82.1MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager v1.14.0 b95b1efa0436 13 days ago 158MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler v1.14.0 00638a24688b 13 days ago 81.6MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver v1.14.0 ecf910f40d6e 13 days ago 210MB
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns 1.3.1 eb516548c180 2 months ago 40.3MB
registry.cn-hangzhou.aliyuncs.com/google_containers/etcd 3.3.10 2c4adeb21b4f 4 months ago 258MB
registry.cn-hangzhou.aliyuncs.com/google_containers/pause 3.1 da86e6ba6ca1 15 months ago 742kB

Because kubeadm expects the images to come from k8s.gcr.io by default, re-tag them:

[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.14.0 k8s.gcr.io/kube-apiserver:v1.14.0
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.14.0 k8s.gcr.io/kube-scheduler:v1.14.0
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0 k8s.gcr.io/kube-proxy:v1.14.0
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.14.0 k8s.gcr.io/kube-controller-manager:v1.14.0
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.10 k8s.gcr.io/etcd:3.3.10
[root@spark32 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.3.1 k8s.gcr.io/coredns:1.3.1
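The pull-and-retag steps above can also be scripted; a minimal sketch using the same Aliyun mirror path and the image list printed by kubeadm config images list:

ALIYUN=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.14.0 kube-controller-manager:v1.14.0 kube-scheduler:v1.14.0 \
           kube-proxy:v1.14.0 pause:3.1 etcd:3.3.10 coredns:1.3.1; do
    docker pull ${ALIYUN}/${img}                       # pull from the Aliyun mirror
    docker tag  ${ALIYUN}/${img} k8s.gcr.io/${img}     # retag so kubeadm finds it under k8s.gcr.io
done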

Now run the initialization again:

[root@spark32 ~]# kubeadm init --kubernetes-version=v1.14.0 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.14.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Swap]: running with swap on is not supported. Please disable swap
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [spark32 localhost] and IPs [172.16.206.32 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [spark32 localhost] and IPs [172.16.206.32 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [spark32 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.206.32]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 25.563846 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node spark32 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node spark32 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: q8vnmo.3eav1kq9c3zp2xgq
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.16.206.32:6443 --token q8vnmo.3eav1kq9c3zp2xgq \
--discovery-token-ca-cert-hash sha256:7fb950b750bc546d9343040b646bae7d8749a63a33124f51c1adb2b4c19aa235


[Notes]:

  • The DNS add-on in Kubernetes is now in its third generation: the first was SkyDNS, later replaced by kube-dns, and since 1.11 CoreDNS is the official default. CoreDNS supports advanced features the previous two did not, such as dynamic configuration of many resources.
  • kube-proxy also runs as an add-on hosted on Kubernetes itself; it generates the iptables or ipvs rules for Service resources. Since 1.11, ipvs mode is mature and preferred when enabled, and kube-proxy falls back to iptables automatically when the system does not support it; in 1.10 and earlier only iptables was practical, as ipvs support was not yet mature.
  • The token is an authentication token, essentially a shared secret for the cluster: a node cannot join without presenting it. It is generated dynamically, so copy and save it. Note that it is valid for 24 hours; joining with an expired token fails with an authentication error (see the section below on the Ubuntu 16.04 node).
  • --discovery-token-ca-cert-hash is the hash of the CA certificate (or key) used during TLS bootstrap discovery; a node presenting the wrong hash is not allowed to join. Copy and save it as well.
[root@spark32 ~]# mkdir -p $HOME/.kube
You have mail in /var/spool/mail/root
[root@spark32 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

[Note]: admin.conf is a kubeconfig file generated by kubeadm that kubectl can use to connect to the Kubernetes apiserver and authenticate; it contains the credentials and certificates.
The kubeadm init workflow is documented at:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#init-workflow

Check the kubelet status; it is running now.

[root@spark32 ~]# systemctl status kubelet

Check the cluster component status with kubectl get cs (componentstatus):

[root@spark32 ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
[root@spark32 ~]# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}

The apiserver's own status is not listed here; if the apiserver were unhealthy, we would not get any of this component status information at all.

Check the cluster nodes:

[root@spark32 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
spark32 NotReady master 29m v1.14.0

Only the master node exists at this point, and it is NotReady because the network add-on is still missing, so Pods cannot communicate yet.

Check the cluster Pods (by default only the default namespace is queried; the system Pods live in the kube-system namespace):

[root@spark32 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-8zzkb 0/1 Pending 0 43m
coredns-fb8b8dccf-t69hm 0/1 Pending 0 43m
etcd-spark32 1/1 Running 1 43m
kube-apiserver-spark32 1/1 Running 1 43m
kube-controller-manager-spark32 1/1 Running 1 43m
kube-proxy-9clf7 1/1 Running 0 43m
kube-scheduler-spark32 1/1 Running 1 43m

coredns is in the Pending state because the network plugin has not been installed yet; the /etc/cni/ directory does not exist at this point either.

[Note]: It turns out kubeadm 1.14.0 already supports pulling the required images from an alternative repository when initializing the master, via the --image-repository option.
That means the pull-from-Aliyun-and-retag steps above are not strictly necessary.
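In other words, the whole detour could probably be replaced by something like this (a sketch, not the command I actually ran):

kubeadm init --kubernetes-version=v1.14.0 \
    --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 \
    --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
    --ignore-preflight-errors=Swap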

Deploy the flannel network add-on

Official repository: https://github.com/coreos/flannel

[root@spark32 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created

Seeing these messages does not mean everything is done; the flannel image is still being pulled, which can be slow, so be patient. Check with docker images whether the download has finished; if the image never comes down, pull it directly:

[root@spark32 ~]# docker pull quay.io/coreos/flannel:v0.11.0-amd64

Once the image is pulled, the corresponding flannel pod starts up after a short while.

Check the cluster Pods again:

[root@spark32 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-8zzkb 1/1 Running 0 79m
coredns-fb8b8dccf-t69hm 1/1 Running 0 79m
etcd-spark32 1/1 Running 1 78m
kube-apiserver-spark32 1/1 Running 1 78m
kube-controller-manager-spark32 1/1 Running 1 78m
kube-flannel-ds-amd64-2hkjh 1/1 Running 0 27m
kube-proxy-9clf7 1/1 Running 0 79m
kube-scheduler-spark32 1/1 Running 1 78m

Checking the master node again, it is now in the Ready state:

[root@spark32 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
spark32 Ready master 86m v1.14.0

Install and configure the CentOS 7 node

I'll use the spark17 node as the example.

Install docker-ce, kubelet, and kubeadm

Copy docker-ce.repo and kubernetes.repo from the master node to spark17:

[root@spark32 ~]# scp -p /etc/yum.repos.d/docker-ce.repo root@spark17:/etc/yum.repos.d/
[root@spark32 ~]# scp -p /etc/yum.repos.d/kubernetes.repo root@spark17:/etc/yum.repos.d/

Install docker-ce, kubelet, and kubeadm:

[root@spark17 ~]# yum install docker-ce kubelet kubeadm -y

Start docker:

[root@spark17 ~]# systemctl start docker

Configure Docker

Change docker's Storage Driver (copy the daemon.json from the master):

[root@spark32 ~]# scp -p /etc/docker/daemon.json root@spark17:/etc/docker/

Restart docker and enable it at boot:

[root@spark17 ~]# systemctl restart docker
[root@spark17 ~]# systemctl enable docker

Configure kubelet

[root@spark17 ~]# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
[root@spark17 ~]# systemctl enable kubelet

Join the cluster

Since I did not use --image-repository to point kubeadm init at another registry above, I first pull the required images on the node and then join it to the cluster.

[root@spark17 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0
[root@spark17 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0 k8s.gcr.io/kube-proxy:v1.14.0
[root@spark17 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
[root@spark17 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
[root@spark17 ~]# docker pull quay.io/coreos/flannel:v0.11.0-amd64

Alternatively, on the master node you can docker save the image to a tar file and then docker load it on the node. Taking flannel as an example:

[root@spark32 ~]# docker save -o flannel.tar quay.io/coreos/flannel:v0.11.0-amd64
[root@spark17 ~]# docker load -i flannel.tar

Join the cluster:

kubeadm join 172.16.206.32:6443 --token q8vnmo.3eav1kq9c3zp2xgq --discovery-token-ca-cert-hash sha256:7fb950b750bc546d9343040b646bae7d8749a63a33124f51c1adb2b4c19aa235 --ignore-preflight-errors=Swap
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Swap]: running with swap on is not supported. Please disable swap
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Wait a few minutes: the node still has to download kube-proxy and flannel and start them before the join is really complete. Use kubectl get nodes to check the node status.

[root@spark32 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
spark17 Ready <none> 14h v1.14.0
spark32 Ready master 23h v1.14.0
[root@spark32 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-fb8b8dccf-8zzkb 1/1 Running 0 23h 10.244.0.2 spark32 <none> <none>
coredns-fb8b8dccf-t69hm 1/1 Running 0 23h 10.244.0.3 spark32 <none> <none>
etcd-spark32 1/1 Running 1 23h 172.16.206.32 spark32 <none> <none>
kube-apiserver-spark32 1/1 Running 1 23h 172.16.206.32 spark32 <none> <none>
kube-controller-manager-spark32 1/1 Running 1 23h 172.16.206.32 spark32 <none> <none>
kube-flannel-ds-amd64-2hkjh 1/1 Running 0 22h 172.16.206.32 spark32 <none> <none>
kube-flannel-ds-amd64-f5fhc 1/1 Running 0 14h 172.16.206.17 spark17 <none> <none>
kube-proxy-9clf7 1/1 Running 0 23h 172.16.206.32 spark32 <none> <none>
kube-proxy-p5nrg 1/1 Running 0 14h 172.16.206.17 spark17 <none> <none>
kube-scheduler-spark32 1/1 Running 1 23h 172.16.206.32 spark32 <none> <none>

Install and configure the Ubuntu 16.04 node

One server in the cluster runs Ubuntu 16.04 and joins as a worker node.
Disable the firewall:

sudo systemctl status ufw.service
sudo systemctl stop ufw.service
sudo systemctl disable ufw.service

Check the status of AppArmor (the Ubuntu counterpart of SELinux on CentOS):

sudo systemctl status apparmor.service
● apparmor.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)

Check the bridge nf-call settings:

$ cat /proc/sys/net/bridge/bridge-nf-call-iptables
1
$ cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
1

Install docker-ce, kubelet, and kubeadm

We use the Aliyun mirror repositories; on the Aliyun mirror site you can find both docker-ce and kubernetes, and the "Help" link next to each repository explains how to configure it.

Install Docker

https://yq.aliyun.com/articles/110806
Step 1: install some required system tools

sudo apt-get update
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common

Step 2: install the GPG key

curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

Step 3: add the apt source

sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

Step 4: update the index and install Docker CE

sudo apt-get -y update
sudo apt-get -y install docker-ce
sudo docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.09.4
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
...
WARNING: No swap limit support

There is a WARNING: according to the message, swap accounting is simply not enabled in cgroups. That feature is used by commands like docker run -m=1524288 -it ubuntu /bin/bash to cap a container's memory usage, so this is only a warning and does not affect normal use.
To fix it anyway:
Edit /etc/default/grub, find GRUB_CMDLINE_LINUX="", add cgroup_enable=memory swapaccount=1 inside the quotes, then update grub and reboot:

sudo vim /etc/default/grub
sudo update-grub
sudo reboot

Install kubelet and kubeadm

sudo apt-get update
sudo apt-get install -y apt-transport-https

The following commands have to be run as root; running them with sudo as a normal user kept insisting that root is required:

jkzhao@ubuntu31:~$ su - root
Password:
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

Then switch back to the normal user and run:

sudo apt-get update
sudo apt-get install kubelet=1.14.0-00 kubeadm=1.14.0-00 kubectl=1.14.0-00   # the latest version in the repo was already 1.14.1, so I pin 1.14.0 here

Configure kubelet

Since Kubernetes 1.8, swap must be disabled or kubelet will not start with the default configuration. The kubelet flag --fail-swap-on=false relaxes this restriction; configure kubelet to ignore swap:

$ sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"

Join the cluster

Pull the three images the node needs: kube-proxy, pause, and flannel.

$ sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0
$ sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.0 k8s.gcr.io/kube-proxy:v1.14.0
$ sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
$ sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
$ sudo docker pull quay.io/coreos/flannel:v0.11.0-amd64

Join the cluster:
Switch to the root user and run the following:

# kubeadm join 172.16.206.32:6443 --token q8vnmo.3eav1kq9c3zp2xgq --discovery-token-ca-cert-hash sha256:7fb950b750bc546d9343040b646bae7d8749a63a33124f51c1adb2b4c19aa235 --ignore-preflight-errors=Swap
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING Swap]: running with swap on is not supported. Please disable swap
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized

It fails with: error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
Solution: https://github.com/kubernetes/kubeadm/issues/1310
The TTL of a token is 24h, and I was adding this new node several days later, so the original token had expired.
Go back to the master node and list the cluster tokens:

[root@spark32 ~]# kubeadm token list


Generate a new token:

[root@spark32 ~]# kubeadm token create
85e3nn.5copkyq2j5leqrpb
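
As a side note, kubeadm can also print the complete join command, token and CA cert hash included, in one step; a small sketch:

[root@spark32 ~]# kubeadm token create --print-join-command
# prints something like: kubeadm join 172.16.206.32:6443 --token <new-token> --discovery-token-ca-cert-hash sha256:<hash>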


Go back to the node and run the join command again:

# kubeadm join 172.16.206.32:6443 --token 85e3nn.5copkyq2j5leqrpb --discovery-token-ca-cert-hash sha256:7fb950b750bc546d9343040b646bae7d8749a63a33124f51c1adb2b4c19aa235 --ignore-preflight-errors=Swap

Using kubectl on the Ubuntu node:

$ mkdir -p .kube

Copy the admin.conf from the master node (it lives there at /etc/kubernetes/admin.conf, the same content as $HOME/.kube/config) into $HOME/.kube/ on this Ubuntu machine and rename it to config.
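A sketch of that copy, run from the Ubuntu node (source path as described above):

jkzhao@ubuntu31:~$ scp root@172.16.206.32:/etc/kubernetes/admin.conf ~/.kube/config

With the config in place, kubectl on the Ubuntu node can talk to the cluster: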

jkzhao@ubuntu31:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
spark17 Ready <none> 21h v1.14.0
spark32 Ready master 30h v1.14.0
ubuntu31 Ready <none> 13m v1.14.0
jkzhao@ubuntu31:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-8zzkb 1/1 Running 0 30h
coredns-fb8b8dccf-t69hm 1/1 Running 0 30h
etcd-spark32 1/1 Running 1 30h
kube-apiserver-spark32 1/1 Running 1 30h
kube-controller-manager-spark32 1/1 Running 1 30h
kube-flannel-ds-amd64-2hkjh 1/1 Running 0 29h
kube-flannel-ds-amd64-f5fhc 1/1 Running 0 21h
kube-flannel-ds-amd64-j596v 1/1 Running 0 13m
kube-proxy-9clf7 1/1 Running 0 30h
kube-proxy-p5nrg 1/1 Running 0 21h
kube-proxy-vbcs4 1/1 Running 0 13m
kube-scheduler-spark32 1/1 Running 1 30h

How to remove a node from the cluster

To remove the spark17 node from the cluster, run the following.
On the master node:

kubectl drain spark17 --delete-local-data --force --ignore-daemonsets
kubectl delete node spark17

On the spark17 node:

kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/

On the ubuntu31 node:

kubectl delete node spark17

Reset the cluster

Run the following on the master node:

[root@spark32 ~]# kubeadm reset --ignore-preflight-errors=Swap
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "spark32" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
W0412 13:10:50.222211 9235 reset.go:158] [reset] failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints
.Please manually remove this etcd member using etcdctl
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
[root@spark32 ~]# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
[root@spark32 ~]# rm -rf /var/lib/kubelet/
[root@spark32 ~]# rm -rf /var/lib/cni/
[root@spark32 ~]# rm -rf /var/lib/etcd/
[root@spark32 ~]# rm -rf /etc/kubernetes/
[root@spark32 ~]# rm -rf .kube/config
[root@spark32 ~]# ifconfig cni0 down
[root@spark32 ~]# ip link delete cni0
[root@spark32 ~]# ifconfig flannel.1 down
[root@spark32 ~]# ip link delete flannel.1


Log in to the other two machines.
CentOS:

[root@spark17 ~]# kubeadm reset --ignore-preflight-errors=Swap
[root@spark17 ~]# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
[root@spark17 ~]# rm -rf /var/lib/kubelet/
[root@spark17 ~]# rm -rf /var/lib/cni/
[root@spark17 ~]# rm -rf /var/lib/etcd/
[root@spark17 ~]# rm -rf /etc/kubernetes/
[root@spark17 ~]# ifconfig flannel.1 down
[root@spark17 ~]# ip link delete flannel.1

Ubuntu:
Switch to the root user:

root@ubuntu31:~# kubeadm reset --ignore-preflight-errors=Swap
root@ubuntu31:~# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
root@ubuntu31:~# rm -rf /var/lib/kubelet/
root@ubuntu31:~# rm -rf /var/lib/cni/
root@ubuntu31:~# rm -rf /var/lib/etcd/
root@ubuntu31:~# rm -rf /etc/kubernetes/
root@ubuntu31:~# ifconfig flannel.1 down
root@ubuntu31:~# ip link delete flannel.1

kubelet errors

Looking at the kubelet logs, kubelet keeps reporting this error:

[root@spark17 ~]# journalctl -xeu kubelet
qos_container_manager_linux.go:139] [ContainerManager] Failed to reserve QoS requests: failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids

[root@spark17 ~]# tail -f /var/log/messages
Apr 10 16:48:48 spark17 kubelet: E0410 16:48:48.871706 11041 qos_container_manager_linux.go:329] [ContainerManager]: Failed to update QoS cgroup configuration
Apr 10 16:48:48 spark17 kubelet: W0410 16:48:48.871726 11041 qos_container_manager_linux.go:139] [ContainerManager] Failed to reserve QoS requests: failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for required subsystem: pids
[root@spark17 ~]# kubectl describe node spark17
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedNodeAllocatableEnforcement 5m33s (x1741 over 31h) kubelet, spark32 Failed to update Node Allocatable Limits ["kubepods"]: failed to set supported cgroup subsystems for cgroup [kubepods]: Failed to find subsystem mount for required subsystem: pids

Check which cgroup subsystems the current system supports:

[root@spark17 ~]# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 5 11 1
cpu 4 104 1
cpuacct 4 104 1
memory 6 104 1
devices 3 104 1
freezer 2 11 1
net_cls 8 11 1
blkio 9 104 1
perf_event 10 11 1
hugetlb 7 11 1

The pids subsystem is not enabled.

Check the current kernel:

[root@spark17 ~]# uname -r
3.10.0-327.el7.x86_64
[root@spark17 ~]# grep CGROUP_ /boot/config-3.10.0-327.el7.x86_64
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y

Upgrade the kernel. In my testing, the default CentOS 7.4 kernel, 3.10.0-514.26.2.el7.x86_64, already enables the pids subsystem; here I use yum to upgrade the kernel to 3.10.0-957.el7.x86_64.
List the kernel versions available in the repository and install one:

[root@spark17 ~]# yum list kernel.x86_64 --showduplicates | sort -r
* updates: ap.stykers.moe
Loading mirror speeds from cached hostfile
Loaded plugins: fastestmirror, langpacks
kernel.x86_64 3.10.0-957.el7 base
kernel.x86_64 3.10.0-957.5.1.el7 updates
kernel.x86_64 3.10.0-957.1.3.el7 updates
kernel.x86_64 3.10.0-957.10.1.el7 updates
kernel.x86_64 3.10.0-327.el7 @anaconda
Installed Packages
* extras: mirrors.huaweicloud.com
* epel: mirrors.aliyun.com
* elrepo: mirrors.tuna.tsinghua.edu.cn
* base: ap.stykers.moe
Available Packages
[root@spark17 ~]# yum install kernel-3.10.0-957.el7.x86_64 -y

Since CentOS 7 uses grub2 as its boot loader, things work differently from CentOS 6: you don't edit /etc/grub.conf to change the boot entry. Instead:

[root@spark17 ~]# cat /boot/grub2/grub.cfg |grep menuentry    ## list the available kernel entries
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
menuentry_id_option=""
export menuentry_id_option
menuentry 'CentOS Linux (3.10.0-957.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-327.el7.x86_64-advanced-7787952f-c2d4-4216-ae09-5188e7fd88b8' {
menuentry 'CentOS Linux (3.10.0-327.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-327.el7.x86_64-advanced-7787952f-c2d4-4216-ae09-5188e7fd88b8' {
menuentry 'CentOS Linux (0-rescue-d918a8d2df0e481a820b4e5554fed3b5) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-d918a8d2df0e481a820b4e5554fed3b5-advanced-7787952f-c2d4-4216-ae09-5188e7fd88b8' {
[root@spark17 ~]# grub2-editenv list    # show the default boot kernel
[root@spark17 ~]# shutdown -r now

After the reboot, check the kernel version and whether the pids subsystem is supported now.

[root@spark17 ~]# uname -r
3.10.0-957.el7.x86_64
[root@spark17 ~]# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 5 11 1
cpu 4 110 1
cpuacct 4 110 1
memory 3 110 1
devices 6 110 1
freezer 7 11 1
net_cls 2 11 1
blkio 10 110 1
perf_event 8 11 1
hugetlb 9 11 1
pids 11 110 1
net_prio 2 11 1

The pids subsystem is now available, and kubelet no longer reports the error.

apiserver flooding the log: OpenAPI AggregationController: Processing item k8s_internal_local_delegation_chain_0000000001

# kubectl logs -f kube-apiserver-spark32 -n kube-system
I0411 12:01:07.724117 1 controller.go:102] OpenAPI AggregationController: Processing item k8s_internal_local_delegation_chain_0000000001
I0411 12:01:07.724291 1 controller.go:102] OpenAPI AggregationController: Processing item k8s_internal_local_delegation_chain_0000000002
...

These two lines are logged over and over again. Searching online shows that others have indeed hit this on 1.14.0; see the Kubernetes issue:

https://github.com/kubernetes/kubernetes/issues/75777

https://github.com/kubernetes/kubernetes/pull/75781 says it will be fixed in 1.14.1, and the 1.14.1 CHANGELOG confirms that this bug has been fixed there.

But since this setup runs 1.14.0, the only option is to lower the log verbosity of the apiserver component. Almost every Kubernetes component accepts the common --v flag, which controls that component's log level. The official apiserver command-line reference (https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/) documents the verbosity option:

-v, --v Level
number for the log level verbosity

For an explanation of the log levels used by Kubernetes components, see:

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md

When initializing the master node, kubeadm init generated the manifest for the apiserver component under /etc/kubernetes/manifests/. Edit kube-apiserver.yaml:

[root@spark32 manifests]# vim kube-apiserver.yaml
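
The screenshot of the exact edit is not reproduced here; the change amounts to appending a verbosity flag to the kube-apiserver command list in the manifest, roughly like this fragment (the level 2 value is my choice):

spec:
  containers:
  - command:
    - kube-apiserver
    # ... the flags generated by kubeadm stay unchanged ...
    - --v=2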


Then restart kubelet. Before restarting, check the current pods and note the apiserver pod's age; after restarting kubelet and querying again, you'll see the apiserver pod's age has reset. That is because kubelet detected the change to the kube-apiserver manifest and recreated the pod.

[root@spark32 manifests]# systemctl restart kubelet


Check the apiserver pod's logs again and there is far less output now.
[Notes]:
1. A static Pod is managed directly by the kubelet daemon on a specific node and cannot be managed by the API server (for running a pod on every node, a DaemonSet is usually the better tool). It is not associated with any replication controller; the kubelet daemon watches it and restarts it if it crashes. The kubelet also creates a mirror Pod for each static Pod through the Kubernetes API server, so these pods are visible to the API server but cannot be controlled by it.
2. The kubeadm init workflow includes this line:

Static Pod manifests are written to /etc/kubernetes/manifests; the kubelet watches this directory for Pods to create on startup.