DEPLOYMENT GUIDE: KUBERNETES CSI DRIVER
1. Introduction to Datera and the Kubernetes CSI
Datera is a fully disaggregated scale-out storage platform that runs over multiple standard protocols (iSCSI, Object/S3), combining heterogeneous compute platform/framework flexibility (HPE, Dell, Fujitsu, Cisco and others) with rapid deployment velocity and access to data from anywhere.
Datera gives Kubernetes (K8s) enterprise customers the peace of mind of a future-proof data services platform that is ready for diverse and demanding workloads. As K8s continues to dominate the container orchestration arena, increasingly demanding, higher-end workloads are likely to be containerized on it as well.
The Container Storage Interface (CSI) is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes. Using CSI, third-party storage providers such as Datera can write and deploy plugins that expose new storage systems in Kubernetes without ever having to touch the core Kubernetes code.
Datera’s CSI driver integrates deeply with the K8s runtime. It allows deploying entire stateful multi-site K8s clusters with a single K8s command, and pushing application-specific telemetry to the Datera policy engine so that it can intelligently adapt the data fabric. Datera’s powerful storage classes and policy-driven workloads are a natural fit with Kubernetes, and our deep CSI integration is covered in this paper.
K8s Concept: Manifests
Datera Concept: Templates + CSI driver
• Declarative policy (intents) and telemetry (operationalization)
• Label-based provisioning with seamless integration in K8s manifests

K8s Concept: Namespaces
Datera Concept: Tenancy
• Governance (operationalization of policy)
• Single authentication/access/quota mechanism

K8s Concept: Quotas
Datera Concept: Tenancy + Quotas
• Fine-grained controls at tenant and volume level for sandboxing storage
• Containment for noisy neighbors and rogue resource scaling
• Makes K8s more safely consumable

K8s Concept: Resource Pools / "Tainting"
Datera Concept: Tenancy + Resource Pools
• Ability to restrict media placement to a subset of nodes/resources

K8s Concept: Storage Classes
Datera Concept: Application Classes and Instances + Live Data Mobility
• Just-in-time non-disruptive resource provisioning, driven by policy: no application downtime, no need to respin pods, no need to recreate PVs/PVCs
• Live policy (label) changes in AppClasses and/or AppInstances
Datera provides IT with a private/hybrid cloud data platform to consolidate traditional enterprise, bare-metal, virtualized and modern cloud-native workloads.
IT operators gain the flexibility to plan, deploy and scale their compute resources independently from their Datera storage resources, while application owners can self-service and consume infrastructure as they go.
K8s Concept: Consistency Groups
Datera Concept: Application Classes and Instances
• Support for consistency groups at application level (incl. across pods)

K8s Concept: Persistent Volumes / Persistent Volume Claims
Datera Concept: Live Resource Thin Provisioning
• No resource pre-/over-provisioning and caching to placate ops discontinuities between K8s and the storage provider
2. Datera CSI driver implementation
The CSI spec (https://github.com/container-storage-interface/spec/blob/master/spec.md) defines the boundary between K8S and a CSI plugin. The Datera CSI plugin is divided into two parts:
• Controller plugin
• Node plugin
The Datera CSI driver implements these plugins, along with an Identity service, as a single gRPC server. All communication between Kubernetes and the CSI driver happens through well-defined Unix domain sockets on the nodes. The driver implements all three services (Controller, Node and Identity) in a single binary named ‘dat-csi-plugin’. The corresponding Docker image is available at https://hub.docker.com/repository/docker/dateraiodev/dat-csi-plugin. The Node plugin is deployed as a DaemonSet so that a copy of the Node plugin runs on every worker node. The Controller plugin is deployed as a StatefulSet with replicas = 1 so that a single copy of the provisioner runs on one node in the cluster. The same “dat-csi-plugin” image is used for both the DaemonSet and the StatefulSet. Check the implementation diagram below.
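As an illustrative skeleton of how the two parts are declared (this is not the full manifest Datera ships; object names and labels here are assumptions), the layout looks roughly like this:

apiVersion: apps/v1
kind: StatefulSet                      # Controller plugin: exactly one provisioner per cluster
metadata:
  name: csi-provisioner
  namespace: kube-system
spec:
  serviceName: csi-provisioner
  replicas: 1                          # per the CSI spec; do not increase
  selector:
    matchLabels: {app: csi-provisioner}
  template:
    metadata:
      labels: {app: csi-provisioner}
    spec:
      containers:
      - name: dat-csi-plugin
        image: dateraiodev/dat-csi-plugin
      # plus the community sidecars (external-provisioner, attacher, snapshotter, resizer)
---
apiVersion: apps/v1
kind: DaemonSet                        # Node plugin: one copy on every worker node
metadata:
  name: csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: csi-node}
  template:
    metadata:
      labels: {app: csi-node}
    spec:
      containers:
      - name: dat-csi-plugin
        image: dateraiodev/dat-csi-plugin
      # plus the node-driver-registrar sidecar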
Here is a more detailed view of how Kubernetes communicates with Datera CSI driver:
The Datera CSI driver code (the green boxes in the diagram above) is written in Go and is available under the /pkg directory: https://github.com/Datera/datera-csi/tree/master/pkg/driver
The sidecar container images (the red boxes in the diagram above) are provided by the Kubernetes CSI community.
The code necessary for interacting with the Datera backend (login, logout, create volume, delete volume, create snapshot, etc.) is implemented in the Go SDK, located here: https://github.com/Datera/go-sdk/tree/master/pkg/dsdk
The driver is installed on a functional Kubernetes cluster by running “kubectl create -f <datera_csi_driver_yaml>” on the Master node. There are certain hardware and software requirements with respect to the Kubernetes master and worker nodes, the iSCSI package availability, etc., to get the driver up and running. These are detailed in subsequent sections.
3. Kubernetes Volume basics: StorageClasses, PVs, PVCs
Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) relieve application users from knowing anything about the underlying storage technologies. Note that PVs are cluster-scoped and PVCs are namespace-scoped. PVs are created dynamically when a PVC is submitted to the Kubernetes API, which in turn calls the Datera CSI driver. The Datera CSI driver performs dynamic provisioning of volumes on Datera cluster nodes. Here is a brief background on manual vs. dynamic provisioning.
In manual provisioning, a Kubernetes storage administrator pre-creates persistent volumes and makes them available to all tenants in the cluster. The persistent volumes could be backed by any public or private cloud provider. In such cases, an application user submits a PVC referencing a particular PV created by the storage administrator, and the volume is made available inside the Pods. However, this method requires application users to know about the underlying storage volumes. To solve this problem, Kubernetes provides StorageClasses to dynamically provision persistent volumes.
In dynamic provisioning, storage administrators create StorageClasses that let application users select the type of persistent storage they want. Every time a PVC is submitted, a corresponding PV is dynamically created using a volume provisioner, such as the Datera CSI provisioner. With this method, users do not need to know how many or what type of persistent volumes are available in the cluster. Kubernetes takes care of mapping a PVC to the PV that best matches the requested storage parameters. When application pods are managed by a replication controller, the storage follows the application pods throughout the pod lifecycle.
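The general pattern looks like the following (a minimal sketch; the provisioner name is an assumption and should be confirmed against the installed driver, the class name matches the one used later in this guide):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dat-block-storage            # class name referenced later by PVCs (see section 9.4.1)
provisioner: dsp.csi.daterainc.io    # assumed Datera CSI driver name; check the CSIDriver object on your cluster
reclaimPolicy: Delete
allowVolumeExpansion: true

A PVC that sets storageClassName: dat-block-storage then triggers the provisioner to create a matching PV on demand.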
[1] www.linux-iscsi.org - Linux-IO Target (LIO™) is the standard open-source iSCSI target in Linux. It supports all prevalent storage fabrics, including Fibre Channel, FCoE, IEEE 1394, iSCSI, NVMe-oF, iSER, SRP, USB, vHost, etc.
4. Datera CSI driver and K8S Requirements
From a network standpoint, the Kubernetes Master node must have IP reachability to the Datera Management VIP, which was made available to users when the Datera backend system was brought up.
[root@ch3cp ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
PING 172.129.85.4 (172.129.85.4) 56(84) bytes of data.
64 bytes from 172.129.85.4: icmp_seq=1 ttl=62 time=0.143 ms
64 bytes from 172.129.85.4: icmp_seq=2 ttl=62 time=0.112 ms
64 bytes from 172.129.85.4: icmp_seq=3 ttl=62 time=0.107 ms
--- 172.129.85.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.107/0.120/0.143/0.020 ms
[root@ch1cp ~]#
From a Kubernetes standpoint, the following must be addressed and verified prior to driver installation. Most of these are not requirements for the Datera CSI driver itself, but they ensure smooth installation and operation. They should have been taken care of at the time of Kubernetes installation and are included here for completeness.
• Verify necessary packages exist, such as kubeadm, kubelet, kubectl, iscsi-initiator-utils / open-iscsi, docker, etc.
• Ensure a Pod network such as Calico, Flannel, etc., is installed on the K8S cluster.
• Verify the multipath package (device-mapper-multipath) is installed and enabled on nodes.
• Disable the firewall daemon and/or SELinux, and enable IPtables for IPv4 on all nodes.
• Ensure the coredns deployment in the kube-system namespace is patched with “allowPrivilegeEscalation”: true, and remove the line in the coredns configmap which has ‘loop’ in it. This keeps the coredns pods from crashing.
• Ensure the kubelet config has enableControllerAttachDetach: true. This is needed for CSI.
• Enable the following feature gates on kube-apiserver (see the sketch below):
  • VolumeSnapshotDataSource: true
  • ExpandCSIVolumes: true
  • ExpandInUsePersistentVolumes: true
• Worker nodes have iscsiadm installed and can perform logins to the Datera target VIPs.
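On a kubeadm-based cluster, one way to enable these gates (a sketch, assuming kube-apiserver runs as a static pod; adjust to your deployment method) is to add the flag to the static pod manifest, which the kubelet reloads automatically:

# excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=VolumeSnapshotDataSource=true,ExpandCSIVolumes=true,ExpandInUsePersistentVolumes=true
    # ...remaining kube-apiserver flags unchanged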
From an iSCSI standpoint, container-based iSCSI is no longer supported. The iSCSI daemon must be running on the worker nodes prior to installing the Datera CSI plugin/driver. If it is not running, check whether the iscsi-initiator-utils (RHEL/CentOS) or open-iscsi (Debian/Ubuntu) package is installed, depending on your distribution.
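For example, on an RHEL/CentOS worker node you could verify and start the daemon roughly like this (illustrative commands; package and service names differ on Debian/Ubuntu, where the daemon ships with open-iscsi):

$ rpm -q iscsi-initiator-utils          # confirm the initiator package is installed
$ sudo systemctl enable --now iscsid    # start the iSCSI daemon and enable it at boot
$ systemctl is-active iscsid
active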
Datera CSI employs a host-based iSCSI solution, in which an iscsi-send binary is placed inside the csi-node driver pod and an iscsi-recv binary runs as a service on the host (worker node). The iscsi-recv binary can be set up as a service as shown below.
$ git clone https://github.com/Datera/datera-csi
$ cd datera-csi
$ ./assets/setup_iscsi.sh
[INFO] Dependency checking
[INFO] Downloading iscsi-recv
[INFO] Verifying checksum
[INFO] Changing file permissions
[INFO] Registering iscsi-recv service
Created symlink from /etc/systemd/system/multi-user.target.wants/iscsi-recv.service to /lib/systemd/system/iscsi-recv.service.
iscsi-recv.service    loaded active running    iscsi-recv container to host iscsiadm adapter service
The iSCSI commands that are executed inside the driver pod are intercepted by the iscsi-send program and sent to the iscsi-recv program running on the host through a Unix domain socket. iscsi-recv in turn relies on the iSCSI daemon on the host for logins and session maintenance.
5. Datera CSI driver - Installation
The driver installation is controlled by a yaml file that Datera provides. Download the latest yaml file from https://github.com/Datera/datera-csi/tree/master/deploy/kubernetes/release/1.0. At the time of this writing, v1.0.9 is the latest version, so pick up the csi-datera-secrets-1.0.9.yaml file. Check the README available at https://github.com/Datera/datera-csi for supported versions.
There are two yaml files for each Datera CSI driver version: one takes the Datera backend login credentials in clear text, and the other takes the login credentials as Kubernetes secrets. If you decide to use the yaml which references secrets, you must create the secrets prior to applying the driver installation file. The relevant portion of the driver yaml looks like this:
- name: DAT_USER
  valueFrom:
    secretKeyRef:
      name: datera-secret
      key: username
- name: DAT_PASS
  valueFrom:
    secretKeyRef:
      name: datera-secret
      key: password
5.1. Create Secret
[root@ch3cp ~]# cat /tmp/csi-storage-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: datera-secret
  namespace: kube-system
type: Opaque
data:
  # base64 encoded username
  # generate this via "$ echo -n 'your-username' | base64"
  username: YWRtaW4=
  # base64 encoded password
  # generate this via "$ echo -n 'your-password' | base64"
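With the Secret defined, a typical install sequence looks like the following (illustrative commands; substitute the yaml file name matching your driver version):

[root@ch3cp ~]# kubectl create -f /tmp/csi-storage-secret.yaml
secret/datera-secret created
[root@ch3cp ~]# kubectl create -f csi-datera-secrets-1.0.9.yaml
[root@ch3cp ~]# kubectl get pods -n kube-system -o wide | grep csi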
Ensure that there is one csi-provisioner pod and N csi-node pods in the kube-system namespace, where N is the number of worker nodes. There are multiple sidecar containers in each of these pods, each responsible for a different function such as volume snapshotting, volume resizing, volume attach/detach, etc.
The csi-provisioner and csi-node pods receive and respond to gRPC calls from Kubernetes. These gRPC calls are interpreted and translated into REST API calls to the Datera storage backend by the Datera go-sdk. It is expected that the K8S worker nodes are capable of performing iscsiadm logins to their Datera targets.
The CSI provisioner and node pods are installed in the kube-system namespace, so any network policies that restrict traffic in and out of namespaces also apply to the CSI driver pods. Secondly, there is no node affinity specified for the CSI provisioner pod (this is per the CSI spec), which means the provisioner pod can land on either master or worker nodes. If colocating CSI driver pods with other application pods is undesired, use Kubernetes scheduling features such as node taints, cordoning, pod anti-affinities, etc., and/or modify the StatefulSet Pod spec in the CSI driver installation file (for example, csi-datera-1.0.8.yaml) as sketched below.
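For instance, a nodeSelector and toleration could be added to the StatefulSet's pod template to steer the provisioner onto dedicated nodes (a sketch only; the label and taint keys here are assumptions, not part of the Datera yaml):

# excerpt of the StatefulSet pod spec in the driver yaml
spec:
  template:
    spec:
      nodeSelector:
        node-role.example.com/storage-infra: "true"    # assumed label applied by the operator
      tolerations:
      - key: "dedicated"                               # assumed taint placed on the target nodes
        operator: "Equal"
        value: "csi"
        effect: "NoSchedule"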
DO NOT change the number of replicas (set to 1) on the StatefulSet controller. Only one CSI provisioner pod is expected to run per Kubernetes cluster. This again is per the CSI spec: running multiple CSI provisioner pods behind a load-balancing service could end up provisioning or mounting the same volume at the same time, which can cause failures.
6. Datera CSI driver - Troubleshooting and Log collection
Whenever a problem is encountered, such as volume provisioning or attach failures, snapshot failures, etc., collect the Datera CSI driver logs as shown below. The csi_log_collect.sh script is available at https://github.com/Datera/datera-csi/tree/master/assets. When the script is executed, it produces a tar.gz file with all the logs from the Datera CSI driver pods.
[root@ch3cp tmp]# ./csi_log_collect.sh
[INFO] Dependency checking
-P, --perl-regexp PATTERN is a Perl regular expression
8. Disaster recovery, Node failures, Node tainting
Datera CSI driver pods (csi-provisioner and csi-node) are protected by Kubernetes node failure detection and recovery mechanisms. If the node carrying a driver pod fails, Kubernetes re-spawns the pod on another node, because the driver pods are managed by a controller: the csi-provisioner pod (although a single pod) is controlled by a StatefulSet, and the csi-node pods are controlled by a DaemonSet.
If the csi-provisioner pod dies, the Kubernetes StatefulSet controller brings up a new pod within a certain time (typically seconds). During that window, a volume provisioning request (create, delete, etc.) may not be serviced by the CSI driver. In such cases, Kubernetes retries until it re-establishes communication with the csi-provisioner pod.
Similarly, if a worker node dies and stays down, the Kubernetes DaemonSet controller kicks in and tries to re-establish communication with the csi-node pod. After a timeout, the node is cordoned off for scheduling purposes and no volumes (old or new) will be available on that node. Once the worker node is back online, the DaemonSet ensures a csi-node pod runs on that node again.
Note that there is no affinity set for the csi-provisioner pod in the StatefulSet, so it can be spawned on any of the master or worker nodes. This is per the CSI design spec. The csi-node pods are spawned on worker nodes only, using a DaemonSet.
Node affinity, tainting, tolerations and cordoning must be carefully handled on the cluster. They affect the scheduling and placement of driver pods, just like any other application pods.
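For reference, cordoning and tainting are done with standard kubectl commands; the node name and taint key below are placeholders:

$ kubectl cordon worker-node-1                                    # mark the node unschedulable (existing pods keep running)
$ kubectl taint nodes worker-node-1 storage=datera:NoSchedule     # only pods tolerating this taint are scheduled here
$ kubectl uncordon worker-node-1                                  # return the node to the schedulable pool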
9.3 Datera volume Template override
Datera provides storage templates that can be referenced and overridden from a Kubernetes StorageClass. In this example, the following parameters of the “basic_small” Datera template are overridden.
Template: basic_small

Parameter name     Default value     Overridden value
replica_count      2                 1
placement_mode     hybrid            default
ip_pool            default           test
9.3.1. Create a StorageClass and override the template parameters
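The original yaml for this step is not reproduced in this transcript; a sketch of what such a StorageClass could look like follows (the provisioner name and exact parameter keys are assumptions; confirm them against the Datera CSI README):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dat-block-storage-small
provisioner: dsp.csi.daterainc.io      # assumed driver name
parameters:
  template: basic_small                # Datera template to start from
  replica_count: "1"                   # override: template default is 2
  placement_mode: "default"            # override: template default is hybrid
  ip_pool: "test"                      # override: template default is default
reclaimPolicy: Delete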
9.4 Volume attachment to Application Pods
With the Datera CSI driver, PVs are created dynamically whenever an application is created with a PVC. The volume attachment is handled automatically during the creation of the application Pod, and the volume is available at the mount point specified in the Pod yaml manifest, as in the example below.
The Datera CSI driver automatically formats the filesystem according to the specification given in the StorageClass right after a persistent volume is created. Formatting is done by the CSI driver at the time of volume provisioning, not during volume attachment.
In CSI terms, volume provisioning mounts the volume to a staging path on the worker node (this is called node staging), and volume attachment mounts the volume from the staging path to the given target path inside the application pod (this is called node publishing). Filesystem formatting happens during the node staging phase.
9.4.1. Create a PVC
[root@ch3cp tmp]# cat csi-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: dat-block-storage
[root@ch3cp tmp]#
[root@ch3cp tmp]# kubectl create -f csi-pvc.yaml
persistentvolumeclaim/csi-pvc created
[root@ch3cp tmp]#
[root@ch3cp tmp]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
Filesystem Type Size Used Available Use% Mounted on
/dev/mapper/36001405186c9c958a344beaaa252e1e3
ext4 975.9M 2.5M 906.2M 0% /data
[root@ch3cp tmp]#
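A minimal Pod manifest that consumes the claim above and produces a mount like the one shown is sketched below (the pod name and container image are placeholders, not from the original example):

apiVersion: v1
kind: Pod
metadata:
  name: csi-app
spec:
  containers:
  - name: app
    image: busybox                 # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data             # matches the mount point seen in the df output above
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: csi-pvc           # the PVC created in 9.4.1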
9.5 Volume attachment to Application Deployments
Deployments are used for stateless applications. Deployments can share an existing PV using the ReadOnlyMany and ReadWriteMany access modes. ReadWriteOnce mode is NOT recommended for Deployments, because the replica pods need to attach and reattach to the PV dynamically during updates: before the first pod can be detached, the second pod needs to attach, but the second pod cannot attach to the PV while the first pod is still attached. This creates a deadlock and neither pod can make progress. StatefulSets can be used to resolve this deadlock.
Note that although Kubernetes allows a single PV in ReadWriteMany mode to be attached to multiple replicas of a Deployment, the application owner must take extreme care when allowing such shared reads/writes to Datera volumes. Some kind of database or file locking mechanism must be used, and its implications must be fully understood.
A quick note on AccessModes in Kubernetes:
• RWO (ReadWriteOnce): Only a single node can mount the volume for reading and writing.
• ROX (ReadOnlyMany): Multiple nodes can mount the volume for reading.
• RWX (ReadWriteMany): Multiple nodes can mount the volume for both reading and writing.
Note that these access modes RWO, ROX, and RWX pertain to the number of worker nodes that can use the volume at the same time, not to the number of pods!
9.6 Volumes for StatefulSet pods
When your database application needs to maintain its state in persistent volumes, managing it with a StatefulSet rather than a Deployment is the way to go. Unlike Deployments, StatefulSets maintain a persistent identity for each Pod and create a unique PVC for each Pod. StatefulSets bring up PVCs and Pods in order. For example, the StatefulSet controller creates PVC-0 first; then Pod-0 is created and PVC-0 is attached to it. Once Pod-0 comes up, PVC-1 is created; then Pod-1 is created and PVC-1 is attached to it, and so on. Each PVC dynamically creates a volume on the Datera cluster.
This combination of unique pod names and ordered Pod/PVC bring-up is routinely used for cloud databases such as MongoDB, which need to establish a replication quorum and conduct a primary election. The volumes mounted to the MongoDB pods are backed by Datera for persistence.
When a StatefulSet is deleted, the order of Pod/PVC deletions is reversed. Your StatefulSet pod spec should reference the persistent volume claim templates and not a persistent volume claim (PVC); think of volumeClaimTemplates as the mechanism that creates the claims (PVCs). You would also normally create a “headless” Service that fronts the Pods (not shown in the example below).
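The original example is not included in this transcript; a skeleton of what it describes might look like this (illustrative names and image; the storage class matches the one used earlier in this guide):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo               # the headless Service fronting the Pods (not shown here)
  replicas: 3
  selector:
    matchLabels: {app: mongo}
  template:
    metadata:
      labels: {app: mongo}
    spec:
      containers:
      - name: mongo
        image: mongo:4.2           # placeholder image/tag
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:            # one PVC (data-mongo-0, data-mongo-1, ...) is created per Pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: dat-block-storage
      resources:
        requests:
          storage: 10Gi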
9.7 Volume resizing or expansion
This is a relatively new feature from the Kubernetes community, and support for it is available from Datera CSI v1.0.9 onwards. To perform volume resizing, you would need to do the following:
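The individual steps are not reproduced in this transcript; at a high level (a sketch that reuses the class and claim names from the earlier examples, with the provisioner name assumed), expansion requires an expandable StorageClass and a larger request on the PVC:

# 1. The StorageClass must allow expansion
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dat-block-storage
provisioner: dsp.csi.daterainc.io      # assumed driver name
allowVolumeExpansion: true

# 2. Request the larger size on the existing claim
[root@master]# kubectl patch pvc csi-pvc -p '{"spec":{"resources":{"requests":{"storage":"250Gi"}}}}'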
To see the requested size change reflected on the PVC as well, restart the Pod by reducing the replicas to 0 and then back to 1, as shown below. This also resizes the filesystem.
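For a Deployment-managed application this is simply a scale down and back up (the deployment name is a placeholder):

[root@master]# kubectl scale deployment my-app --replicas=0
deployment.apps/my-app scaled
[root@master]# kubectl scale deployment my-app --replicas=1
deployment.apps/my-app scaled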
After the Pod restart, both the PVC and PV show the 250Gi size, concluding the volume expansion. Check the volume size as seen from inside the deployed Pod.
9.7.9. Check the volume is resized
[root@master]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
9.8 Volume retention
In Kubernetes, the volume lifecycle is independent of the Pod lifecycle. The lifespan of a persistent volume is dictated by its reclaim policy, and the default for dynamically provisioned volumes is to bind that lifespan to the lifespan of the Persistent Volume Claim that created the volume. This means that if the PVC is deleted, the volume is deleted as well. If this is not what an application user needs, consider changing the reclaim policy so that the persistent volume is retained.
9.8.1. Create a PVC and set reclamation policy to ‘Retain’
[root@master]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
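The manifests for this step are not included in the transcript; the usual way to achieve retention (a sketch, not Datera-specific) is either a StorageClass with reclaimPolicy: Retain used by the PVC, or patching an existing PV:

# Option A: provision via a class whose volumes are retained
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dat-block-storage-retain
provisioner: dsp.csi.daterainc.io      # assumed driver name
reclaimPolicy: Retain

# Option B: change the policy on an already-provisioned PV
[root@master]# kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'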
9.9 Multi-tenancy
The Datera CSI driver is installed at the cluster level, which means tenants in all namespaces use the same driver for volume operations. In other words, there is one CSI driver instance per Kubernetes cluster.
9.10 Driver upgrade and downgrade
Upgrades and downgrades are very simple. Perform the following 2 steps: