How to build scalable, reliable and stable Kubernetes cluster atop OpenStack Bo Wang [email protected] HouMing Wang [email protected]

How to build scalable, reliable and stable Kubernetes ... · –kube-reserved [1] OSsystemdaemons etcd,flanneld,apiserver configure--system-reserved [1] evictionthresholds kubelet

Jun 26, 2020


dariahiddleston

How to build scalable, reliable and stable Kubernetes cluster atop OpenStack

Bo Wang [email protected], HouMing Wang [email protected]


Contents

Cluster data persistence

Cluster resources management

Integrate kuryr-kubernetes as CNI plugin

Integrate manila as storage provisioner


Architecture of Kubernetes Cluster

master nodes: apiserver, etcd, flanneld, scheduler, controller manager, kubelet, docker, kube-proxy

slave nodes: flanneld, kubelet, docker, kube-proxy, end-user pods (containers), system daemons


Cluster Resource Management – why

By default, pods can consume all the available capacity on a node.

Pods and system daemons then compete for resources.

The result is resource starvation: system daemons crash and pods are evicted.

What happened in our environment:
• kube-proxy and prometheus were evicted
• dockerd did not respond in time
• the etcd cluster crashed


Cluster Resource Management – how

categories | components | solution | ref
kubernetes system daemons | kubelet, docker | configure --kube-reserved | [1]
OS system daemons | etcd, flanneld, apiserver | configure --system-reserved | [1]
eviction thresholds | kubelet | configure --eviction-hard | [1]
kube-system pods | kube-scheduler, kube-controller, kube-proxy, prometheus, fluentd | configure Guaranteed QoS class | [2]
end-user pods | | configure needed QoS class | [2]

[1] Reserve Compute Resources for System Daemons: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
[2] Configure Quality of Service for Pods: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
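The QoS classes named on this slide follow from each container's cpu and memory requests and limits. The sketch below is a simplified illustration of that classification, not the kubelet's implementation: the function name `qos_class` and the dict layout are assumptions made for this example, and quantities are compared as plain strings (so "500m" and "0.5" would not be treated as equal).

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    `containers` is a list of dicts, each with optional "requests" and
    "limits" dicts keyed by "cpu" and "memory" (hypothetical layout).
    """
    any_set = False      # does any container set any request or limit?
    guaranteed = True    # do all containers satisfy the Guaranteed rules?
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit for every resource; a missing request
            # defaults to the limit, an explicit request must equal it.
            if lim is None or (req is not None and req != lim):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod = [{"requests": {"cpu": "500m", "memory": "1Gi"},
        "limits": {"cpu": "500m", "memory": "1Gi"}}]
print(qos_class(pod))
```

Giving kube-system pods equal requests and limits for both cpu and memory is what places them in the Guaranteed class, which is evicted last.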


Cluster Resource Management – example

Node capacity: 32Gi of memory, 16 CPUs and 100Gi of storage

kube-reserved: --kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi

system-reserved: --system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi

eviction-threshold: --eviction-hard=memory.available<500Mi,nodefs.available<10%

Available for pods: 14.5 CPUs, 28.5Gi memory, 88Gi local storage (100Gi minus 1Gi, 1Gi, and the 10% nodefs eviction threshold)

Pod eviction occurs in the following order:
• BestEffort
• Burstable
• Guaranteed
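The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch that assumes allocatable equals capacity minus kube-reserved, system-reserved and the hard eviction threshold; the function and variable names are made up for illustration, with CPU expressed in millicores and memory and storage in Mi.

```python
MI_PER_GI = 1024


def allocatable(capacity, kube_reserved, system_reserved, eviction_hard):
    """Subtract both reservations and the hard eviction threshold from capacity."""
    return {
        key: capacity[key]
        - kube_reserved.get(key, 0)
        - system_reserved.get(key, 0)
        - eviction_hard.get(key, 0)
        for key in capacity
    }


# Values from the slide: cpu in millicores, memory and storage in Mi.
capacity = {"cpu": 16000, "memory": 32 * MI_PER_GI, "storage": 100 * MI_PER_GI}
kube_reserved = {"cpu": 1000, "memory": 2 * MI_PER_GI, "storage": 1 * MI_PER_GI}
system_reserved = {"cpu": 500, "memory": 1 * MI_PER_GI, "storage": 1 * MI_PER_GI}
# memory.available<500Mi and nodefs.available<10% of the 100Gi disk
eviction_hard = {"memory": 500, "storage": capacity["storage"] * 10 // 100}

result = allocatable(capacity, kube_reserved, system_reserved, eviction_hard)
print(result["cpu"] / 1000, "CPUs")                        # 14.5 CPUs
print(round(result["memory"] / MI_PER_GI, 1), "Gi memory")  # 28.5 Gi memory
print(result["storage"] / MI_PER_GI, "Gi storage")          # 88.0 Gi storage
```

With the 10% nodefs threshold counted, the storage left for pods works out to 88Gi.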


Cluster Data Persistence

Problem: all cluster data is stored in the local storage of the VM instance. When the VM is destroyed, the data is lost.

Solution: move essential data into separate persistent volumes as needed.

data | content | solution
etcd data | kubernetes object resources, container network configurations | done in upstream [1][2]
monitor data | nodes info, pods info | configure volumes for prometheus pods
logging data | kubernetes daemons logs, system daemons logs, container logs | configure volumes for elasticsearch pods

[1] https://bugs.launchpad.net/magnum/+bug/1697655
[2] https://review.openstack.org/#/c/473789/


Etcd Cluster Independent Deployment

“Fast disks are the most critical factor for etcd deployment performance and stability. etcd is very sensitive to disk write latency.” [1]

“Few etcd deployments require a lot of CPU capacity.” [1]

[1] https://github.com/coreos/etcd/blob/master/Documentation/op-guide/hardware.md

[Diagram: dedicated etcd nodes running etcd on high performance volumes, placed behind an LB; master nodes (apiserver, flanneld) and slave nodes (flanneld) reach etcd through the LB]


Integrate kuryr-kubernetes as CNI plugin

[Diagram: Neutron Server, kuryr-controller and the k8s api server, with a master node and a slave node. On each node, kubelet calls kuryr-cni; Pod1 and Pod2 (each with eth0) attach through tap-xxx and tap-yyy to a kuryr bridge; kube-proxy programs iptables. Node interfaces are labeled eth0 and eth1, addresses shown range from 10.0.0.5 to 10.0.0.10, and two interfaces are marked "No IP".]


Integrate kuryr-kubernetes as CNI plugin

Differences from upstream, with reasons and refs:

1. Difference: kuryr is used only for IP allocation; kube-proxy handles service --> pod traffic.
   Reasons: (1) iptables has better performance than neutron lbaasv2. (2) kuryr does not support the following kinds of k8s services: LoadBalancer; NodePort; endpoint-less; specified cluster IP.
   Ref: [1] [2]

2. Difference: add an implementation of port mapping to kuryr-cni.
   Reasons: a CNI plugin should support hostPort.
   Ref: [3]

3. Difference: network topology of pods and VMs.
   Reasons: with kube-proxy in use, macvlan traffic does not go through the host system iptables; trunk ports are not enabled in our product.
   Ref: [4]

4. Difference: stop watching k8s events; use the call chain kubelet --> kuryr-cni --> kuryr-controller.
   Reasons: in theory, watching events should perform better, but in our tests kuryr-cni ran into timeout errors when pods were created concurrently. We simplified the process to a sequential call.

[1] https://bugs.launchpad.net/kuryr-kubernetes/+bug/1684118
[2] https://bugs.launchpad.net/kuryr-kubernetes/+bug/1697942
[3] https://github.com/kubernetes-incubator/bootkube/issues/662
[4] https://github.com/kubernetes/kubernetes/issues/53089


Integrate manila as storage provisioner

service | storage type | volume | access mode | workload
Cinder | Block Storage | Cinder persistent volume | ReadWriteOnce | one Pod; Deployments/RC with one replica
Manila | Shared File System | NFS persistent volume | ReadWriteMany | Pod1, Pod2, Pod3; Deployments/RC with multi-replicas


Integrate manila as storage provisioner

Manually leveraging manila to provide an NFS PV for k8s pods

Manila side:
1. Create a share network
2. Create a share
3. Get the share export location

k8s side:
4. Create a PV with the share location (nfs-pv.yaml)
5. Create a PVC that matches the PV (nfs-pvc.yaml)
6. Create pods that mount the PVC
7. Multiple pods read/write the share
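The contents of nfs-pv.yaml and nfs-pvc.yaml are not preserved in this transcript. As a rough sketch of what steps 4 and 5 of the workflow involve, the Python below builds an NFS PersistentVolume and a matching PersistentVolumeClaim as plain dicts and prints them as JSON, which kubectl also accepts. The object names, the size, and the export location `10.254.0.3:/shares/share-example` are hypothetical placeholders, and the `server:path` split assumes an IPv4 export location.

```python
import json


def nfs_pv(name, export_location, size, access_mode="ReadWriteMany"):
    """Build a PersistentVolume backed by an NFS export ("server:/path")."""
    server, path = export_location.split(":", 1)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": [access_mode],
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }


def nfs_pvc(name, pv_name, size, access_mode="ReadWriteMany"):
    """Build a claim bound to a specific pre-created PV."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": "",   # empty string: skip dynamic provisioning
            "volumeName": pv_name,    # bind to the PV created above
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names and export location, for illustration only.
pv = nfs_pv("manila-nfs-pv", "10.254.0.3:/shares/share-example", "1Gi")
pvc = nfs_pvc("manila-nfs-pvc", "manila-nfs-pv", "1Gi")
print(json.dumps(pv, indent=2))
print(json.dumps(pvc, indent=2))
```

Because the claim uses ReadWriteMany, several pods can mount it at once, which is the point of backing the PV with a Manila share.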


Integrate manila as storage provisioner

Add manila as an external storage provisioner [1][2] to provide PVs dynamically for pods.

[Diagram: inside the K8s cluster, the easystack manila provisioner pods [3] watch PVC events on the k8s apiserver (via kubeconfig) and call openstack manila (via cloudconfig)]

[1] https://kubernetes.io/docs/concepts/storage/persistent-volumes/
[2] https://github.com/kubernetes-incubator/external-storage/
[3] https://github.com/kubernetes-incubator/external-storage/pull/429

manila storage class:

manila pvc:
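The storage class and PVC shown on this slide did not survive in the transcript. The sketch below only illustrates the general shape of the two objects that dynamic provisioning relies on, built as Python dicts. The provisioner name `example.com/manila` and the object names are hypothetical placeholders, not the values used by the provisioner in [3], and provisioner-specific parameters are left empty because they are unknown here.

```python
import json


def storage_class(name, provisioner, parameters=None):
    """Build a StorageClass that delegates volume creation to a provisioner."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters or {},
    }


def dynamic_pvc(name, storage_class_name, size, access_mode="ReadWriteMany"):
    """Build a claim that asks the named StorageClass for a new volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names; the real provisioner name comes from its deployment.
sc = storage_class("manila", "example.com/manila")
claim = dynamic_pvc("manila-pvc", sc["metadata"]["name"], "1Gi")
print(json.dumps(sc, indent=2))
print(json.dumps(claim, indent=2))
```

In contrast to the manual workflow, no PV is written by hand: the provisioner pods see the new claim and create the share and the PV on demand.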


Magnum

Q: Could these happen in magnum?

A: Yes, we did all of this work on top of our internal magnum.

Related BPs in magnum launchpad:

• etcd cluster independent deployment: https://blueprints.launchpad.net/magnum/+spec/deploy-etcd-cluster-independently
• integrate kuryr-kubernetes with magnum: https://blueprints.launchpad.net/magnum/+spec/integrate-kuryr-kubernetes
• integrate manila with magnum: https://blueprints.launchpad.net/magnum/+spec/magnum-manila-integration
