Cluster Maintenance in Kubernetes
OS Upgrades
Consider a cluster with a few nodes, with pods serving applications on those nodes.
What happens when one of the nodes goes down?
- The pods on it will not be accessible.
If the node comes back immediately, then the kubelet process starts and the pods come back online.
However, if the node stays down for more than five minutes, the pods are terminated from that node; Kubernetes considers them dead.
If the pods were part of a ReplicaSet, they are recreated on other nodes.
The time Kubernetes waits for a node to come back online before evicting its pods is known as the pod-eviction-timeout. It is set on the kube-controller-manager and defaults to five minutes.
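For reference, this timeout is a flag on the kube-controller-manager (a sketch; on a kubeadm cluster it would be set in the controller manager's static pod manifest):
kube-controller-manager --pod-eviction-timeout=5m0s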
So, whenever a node goes offline, the master node waits for up to five minutes before considering the node dead.
When the node comes back online after the pod-eviction-timeout, it comes up blank, without any pods scheduled on it.
If we are not sure whether a node will be back online within five minutes, there is a safer way to do this.
We can purposefully drain the node of all its workloads so that they are moved (terminated and recreated) onto other nodes in the cluster.
When we drain a node, its pods are gracefully terminated and recreated on other nodes.
The node is also cordoned or marked as unschedulable, meaning no pods can be scheduled on this node until we specifically remove the restriction.
Now that the pods are safe on other nodes, we can reboot the first node.
When it comes back online, it is still unschedulable. We then need to uncordon it so that pods can be scheduled on it again.
Remember, the pods that were moved to other nodes do not automatically fall back.
Only if any of those pods are deleted, or if new pods are created in the cluster, will pods be scheduled on this node again.
Apart from drain and uncordon, there is also another concept called cordon.
Cordon simply marks the node unschedulable.
Unlike drain, it does not terminate or move the pods already running on the node.
It simply makes sure that no new pods are scheduled on that node.
Commands
kubectl drain <node_name>
kubectl cordon <node_name>
kubectl uncordon <node_name>
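Note that drain refuses to proceed if the node runs DaemonSet-managed pods (unless --ignore-daemonsets is passed) or standalone pods not managed by any controller (unless --force is passed). A short sketch, where node01 is a placeholder node name:
kubectl drain node01 --ignore-daemonsets
kubectl drain node01 --ignore-daemonsets --force
The --force flag deletes standalone pods permanently, since there is no controller to recreate them elsewhere.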
Cluster Upgrade Process
Usually, the core Kubernetes components (kube-apiserver, controller-manager, kube-scheduler, kubelet, kube-proxy) all run the same version, but this is not mandatory.
Since the kube-apiserver is the primary component of the control plane, and the component that all other components talk to, none of the other components should ever be at a version higher than the kube-apiserver.
The controller-manager and kube-scheduler can be up to one version lower, i.e. if the kube-apiserver is at version x, the controller-manager and kube-scheduler can be at x-1, and the kubelet and kube-proxy components can be up to two versions lower, at x-2.
None of them can be at a version higher than the kube-apiserver.
kubectl is the exception: it can be one version higher or one version lower than the kube-apiserver, i.e. anywhere from x+1 to x-1.
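For example, if the kube-apiserver is at version 1.12, the controller-manager and kube-scheduler can be at 1.11 or 1.12, the kubelet and kube-proxy can be anywhere from 1.10 to 1.12, and kubectl can be at 1.11, 1.12 or 1.13.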
At any time, Kubernetes supports only the three most recent minor versions.
The recommended approach is to upgrade one minor version at a time, e.g. from 1.10 to 1.11, then 1.11 to 1.12, then 1.12 to 1.13.
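As a rough sketch, upgrading from 1.11 to 1.12 on a kubeadm-managed cluster with apt packages might look like the following (the use of kubeadm, the package versions, and the node name node01 are assumptions about the setup):
kubeadm upgrade plan
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade apply v1.12.0
kubectl drain node01 --ignore-daemonsets
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet
kubectl uncordon node01
The drain, kubelet upgrade, and uncordon steps are then repeated for each remaining node, one node at a time.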
Backup and Restore
Backup Candidates
Resource Configuration
The declarative approach, using Kubernetes object definition files, is the preferred way to create applications on Kubernetes.
We should store the resource configuration files in a source code repository such as GitHub, which maintains versioning and serves as a backup of the files.
But there may be cases where a team member creates objects in an imperative way, without definition files.
So a better approach to backing up the resource configuration is to query the kube-apiserver.
Query the kube-apiserver using the kubectl command and save a copy of the resource configuration of all objects created on the cluster.
For example, one of the commands that can be used in a backup script is:
kubectl get all --all-namespaces -o yaml > all-deploy-service.yaml
It gets the deployments, pods and services in all namespaces using the kubectl get all command, extracts the output in YAML format, and saves it to a file. Note that get all covers only a subset of resource types, so other objects (for example ConfigMaps and Secrets) must be queried separately for a complete backup.
Velero (formerly known as ARK, by Heptio) is a third-party solution that can be used to take backups of the Kubernetes cluster.
ETCD Cluster
The ETCD cluster stores all the information about the state of the cluster, i.e. information about the nodes, pods, deployments, services, etc.
So instead of backing up the resource configuration, we can take a backup of the ETCD cluster itself.
As we know, ETCD is hosted on the master node. While configuring etcd, we specify a location where all the data will be stored, i.e. the data directory.
This is the directory that can be configured to be backed up by the backup tool.
ETCD also comes with a built-in snapshot solution.
We can take a snapshot of the etcd database using the etcdctl utility's snapshot save command; here, snapshot.db is the snapshot name.
etcdctl snapshot save snapshot.db
After this, a snapshot file is created by the name snapshot.db in the current directory.
If we want it to be created in another location, specify the full path along with the name.
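For example (the directory is arbitrary):
etcdctl snapshot save /opt/backups/snapshot.db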
We can view the status of the backup using the snapshot status command.
etcdctl snapshot status snapshot.db
To restore this cluster from backup at a later point in time:
First, stop the kube-apiserver service, as the restore process will require restarting the ETCD cluster, and the kube-apiserver depends on it.
Then run the etcdctl snapshot restore command with the path set to the path of the backup file.
When ETCD restores from a backup, it initializes a new cluster configuration and configures the ETCD members as new members of a new cluster.
This is to prevent a new member from accidentally joining an existing cluster.
On running this command, a new data directory is created.
We then configure the ETCD configuration file to use the new data directory.
After this, reload the service daemon, restart the etcd service, and start the kube-apiserver service again.
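Putting it together, a sketch of the restore sequence, assuming etcd runs as a systemd service and /var/lib/etcd-from-backup is chosen as the new data directory (both the path and the service setup are assumptions; adjust to your cluster):
service kube-apiserver stop
etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-from-backup
systemctl daemon-reload
service etcd restart
service kube-apiserver start
Between the restore and the daemon reload, the etcd configuration must be edited so that its data directory points at /var/lib/etcd-from-backup.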
With all the etcdctl commands, we have to specify the endpoint of the ETCD cluster and the certificate files for authentication: the CA certificate, the etcd server certificate, and the key.
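For example, a fully authenticated snapshot save might look like this (the certificate paths shown are typical defaults and are assumptions; they may differ on your cluster):
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt \
  --cert=/etc/etcd/etcd-server.crt \
  --key=/etc/etcd/etcd-server.key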
Persistent Volumes