Scheduling in Kubernetes

Manual Scheduling

  • Every pod has a field named nodeName, which is not set by default. You do not normally specify it yourself; the scheduler sets it automatically.

  • The scheduler goes through all the pods and looks for those that do not have this property set. Those are the candidates for scheduling.

  • It then identifies the right node for the pod by running a scheduling algorithm.

  • Once identified, it schedules the pod on the node by creating a binding object, which sets the pod's nodeName property to the name of that node.

  • If there is no scheduler to schedule the pods, they remain in the Pending state.

  • In such cases, you can manually assign pods to nodes yourself.

  • Without a scheduler, the easiest way to schedule a pod is to simply set the nodeName field to the name of the node in your pod specification file while creating the pod.

  • Kubernetes does not allow you to modify the nodeName property of an existing pod. So another way to assign a node to an existing pod is to create a binding object and send a POST request to the pod's binding API.
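
The two manual approaches above can be sketched as follows. This is a minimal illustration, not a definitive setup: the pod name nginx, the image nginx and the node name node02 are assumptions made for the example. The first document sets nodeName at creation time; the second is a Binding object for an existing pod, which would be sent in a POST request to /api/v1/namespaces/<namespace>/pods/<pod_name>/binding (shown in YAML for readability; a raw HTTP request would typically carry the JSON equivalent).

```yaml
# Option 1: assign the node while creating the pod.
apiVersion: v1
kind: Pod
metadata:
  name: nginx            # hypothetical pod name
spec:
  nodeName: node02       # hypothetical node name; the scheduler is bypassed
  containers:
    - name: nginx
      image: nginx
---
# Option 2: bind an already-created (Pending) pod to a node.
apiVersion: v1
kind: Binding
metadata:
  name: nginx            # must match the name of the existing pod
target:
  apiVersion: v1
  kind: Node
  name: node02           # hypothetical node name
```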

Labels and Selectors

Labels

  • Labels are key-value properties attached to Kubernetes objects, such as pods.

  • In a pod definition file, under metadata, create a section called labels.

  • Under that, add the labels in a key-value format.

  • You can add as many labels as you like.
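
As a minimal sketch of the steps above, a pod definition file with a labels section might look like this. The label keys and values (app, tier) and the pod name are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web              # hypothetical pod name
  labels:                # labels go under metadata as key-value pairs
    app: web
    tier: frontend
spec:
  containers:
    - name: nginx
      image: nginx
```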

Selectors

  • Once the pod is created, to select pods by their labels, use the kubectl get pods command with the --selector option and specify the label, e.g. kubectl get pods --selector app=web.

Use case of Labels and Selectors

  • Kubernetes objects use labels and selectors internally to connect different objects.

  • Ex. to connect a ReplicaSet to its pods, we configure the selector field under the ReplicaSet specification to match the labels defined on the pods.
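
A minimal ReplicaSet sketch showing this connection is given below. The names and the app: web label are assumptions for illustration; the important part is that spec.selector.matchLabels matches the labels in the pod template.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-rs           # hypothetical name
  labels:
    app: web             # labels of the ReplicaSet object itself
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web           # must match the pod labels below
  template:
    metadata:
      labels:
        app: web         # labels defined on the pods
    spec:
      containers:
        - name: nginx
          image: nginx
```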

Annotations

  • While labels and selectors are used to group and select objects, annotations are used to record other details for informational purposes.

  • Ex. tool details like name, version, build information, contact details etc.
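
For illustration, annotations sit next to labels under metadata. The keys and values below (buildVersion, contact) are made-up examples, not standard annotation names.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                          # hypothetical pod name
  labels:
    app: web
  annotations:                       # informational only; not used for selection
    buildVersion: "1.34"             # hypothetical build information
    contact: "team@example.com"      # hypothetical contact details
spec:
  containers:
    - name: nginx
      image: nginx
```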

Taints and Tolerations

  • Taints and Tolerations have nothing to do with security or intrusion on the cluster.

  • Taints and Tolerations are used to set restrictions on which pods can be scheduled on a node.

  • Example:

    • Suppose we have 1 worker node and 3 pods (A, B, C) that need to be scheduled on the given node.

    • First, we prevent all pods from being placed on the node by placing a taint on the node.

    • By default, pods have no tolerations, which means unless specified otherwise, none of the pods can tolerate any taint. So in this case, none of the pods can be placed on the node, as none of them tolerates the taint.

    • Next, we want to schedule/place pod C on the given node, so we add a toleration to pod C.

    • So now when the scheduler tries to set pod C on the node, it goes through.

    • Taints are set on nodes and Tolerations are set on pods.

    • Taints and Tolerations do not tell the pod to go to a particular node. Instead, they tell the node to only accept pods with certain tolerations. A pod with a toleration can still be placed on any other untainted node.

    • When the Kubernetes cluster is first set up, a taint is set on the master (control plane) node automatically, which prevents regular workload pods from being scheduled on that node.

Taint Effects

  • The taint effect defines what would happen to the pods if they do not tolerate the taint.

  • There are 3 taint effects.

    1. NoSchedule: The pods will not be scheduled on the node.

    2. PreferNoSchedule: The system will try to avoid placing a pod on the node, but that is not guaranteed.

    3. NoExecute: New pods will not be scheduled on the node, and existing pods on the node, if any, will be evicted if they do not tolerate the taint. Such pods may have been scheduled on the node before the taint was applied.

Taint Commands

  1. kubectl taint nodes <node_name> <key>=<value>:<taint-effect>

    • To taint a node.

    • kubectl taint nodes node01 app=blue:NoSchedule

  2. kubectl taint nodes <node_name> <key>=<value>:<taint-effect>-

    • To remove taint from a node.

    • kubectl taint nodes node01 app=blue:NoSchedule-

Add Tolerations

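  • In the spec section of the pod definition file, add a section called tolerations and use the same values (key, value and effect) that were used while creating the taint.

  • These values are strings. Enclosing them in double quotes is a safe convention, since an unquoted value such as true or 1 would otherwise be parsed as a non-string.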
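
A minimal sketch of a pod that tolerates the taint app=blue:NoSchedule used in the taint commands above is shown below. The pod name and image are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-c                # hypothetical pod name
spec:
  containers:
    - name: nginx
      image: nginx
  tolerations:
    - key: "app"             # same key as in the taint
      operator: "Equal"      # the taint's value must equal the value below
      value: "blue"          # same value as in the taint
      effect: "NoSchedule"   # same effect as in the taint
```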

Node Selectors

  • This is a simple Pod scheduling feature that allows scheduling a Pod onto a node whose labels match the nodeSelector labels specified in the Pod definition file.

  • To use labels in a nodeSelector, you must have first labelled your nodes before creating the pod.

  • Node Selectors have limitations: you cannot express advanced conditions such as OR or NOT with them.
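
As a minimal sketch, a pod using nodeSelector could look like the following. It assumes a node has already been labelled size=large; the pod name and image are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor       # hypothetical pod name
spec:
  containers:
    - name: data-processor
      image: nginx           # placeholder image for the example
  nodeSelector:
    size: large              # must match a label set on the target node
```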

Label Nodes

  • kubectl label nodes <node_name> <label_key>=<label_value>

  • Ex. kubectl label nodes node01 size=large

Node Affinity

  • This is the enhanced version of the nodeSelector which offers a more expressive syntax for fine-grained control of how Pods are scheduled to specific nodes.

Node Affinity Types

  • Available

    1. requiredDuringSchedulingIgnoredDuringExecution

    2. preferredDuringSchedulingIgnoredDuringExecution

  • Planned (may come in future)

    1. requiredDuringSchedulingRequiredDuringExecution
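
The two available types can be sketched in one pod definition, as below. The label key size and its values, the disk label and the pod name are assumptions for illustration. The required rule must be satisfied for the pod to be scheduled, while the preferred rule only influences which of the eligible nodes is favoured.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor             # hypothetical pod name
spec:
  containers:
    - name: data-processor
      image: nginx                 # placeholder image for the example
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In       # node label size must be large OR medium
                values:
                  - large
                  - medium
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1                # relative weight, between 1 and 100
          preference:
            matchExpressions:
              - key: disk          # hypothetical node label
                operator: In
                values:
                  - ssd
```

Other operators such as NotIn and Exists cover the OR/NOT style conditions that nodeSelector cannot express.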

Resource Requirements and Limits

Resource Requests

  • Kubernetes defines requests as the guaranteed minimum amount of a resource (CPU or memory) reserved for a container.

  • The scheduler uses the requests to pick a node that has enough unallocated capacity for the pod.

Resource Limits

  • Kubernetes defines limits as the maximum amount of a resource that a container may use.

  • For CPU, the limit is enforced by throttling, so the container can never consume more than the CPU amount indicated.

  • Memory cannot be throttled, so a container can attempt to use more memory than its limit. When it does, it is terminated with an OOM (Out Of Memory) error. This is what happens when a container exceeds its limits.
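
A minimal sketch of requests and limits on a container follows. The numbers, pod name and image are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical pod name
spec:
  containers:
    - name: nginx
      image: nginx
      resources:
        requests:            # guaranteed minimum, used for scheduling
          cpu: "500m"        # half a CPU core
          memory: "256Mi"
        limits:              # maximum the container may use
          cpu: "1"           # usage above this is throttled
          memory: "512Mi"    # usage above this gets the container OOM-killed
```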

Default Behavior

  • By default, Kubernetes does not set any CPU or memory request or limit on a container.

  • This means any pod can consume as many resources as it needs on any node and starve the other pods or processes running on that node of resources.

  • A commonly preferred setup is to set requests but no limits for all the pods/containers in a cluster: each container is guaranteed its requested amount, and a container with extra resource requirements can use the resources reserved for other containers while they are not using them.

Limit Range

  • A LimitRange is a namespace-level object that defines default values to be set for containers in pods that are created without a request or limit specified in the pod-definition file. It only affects pods created after the LimitRange exists.
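
A minimal LimitRange sketch for CPU is shown below. The object name, the dev namespace and all the values are assumptions for illustration.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults         # hypothetical name
  namespace: dev             # hypothetical namespace
spec:
  limits:
    - type: Container
      default:               # default limit when none is specified
        cpu: 500m
      defaultRequest:        # default request when none is specified
        cpu: 250m
      max:                   # highest limit a container may set
        cpu: "1"
      min:                   # lowest request a container may set
        cpu: 100m
```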

Resource Quotas

  • A ResourceQuota is a namespace-level object that sets hard limits on the total requests and limits of all the pods in the namespace.
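
A minimal ResourceQuota sketch is given below. The object name, the dev namespace and the amounts are illustrative assumptions.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota            # hypothetical name
  namespace: dev             # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"        # sum of CPU requests across the namespace
    requests.memory: 4Gi     # sum of memory requests
    limits.cpu: "10"         # sum of CPU limits
    limits.memory: 10Gi      # sum of memory limits
```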

Daemon Sets

  • Daemon Sets are like ReplicaSets in that they help you deploy multiple instances of a pod, but a Daemon Set runs exactly one copy of the pod on each node in the cluster.

  • Whenever a new node is added to the cluster, a replica of the pod is automatically added to that node. And when a node is removed, the pod is automatically removed.

  • A Daemon Set ensures that one copy of the pod is always present on every node in the cluster.

  • Ex. say you need to deploy a monitoring agent or log collector on each node in the cluster; a DaemonSet is perfect for that.

  • A DaemonSet definition file has almost the same structure as a ReplicaSet definition file, except that the kind is DaemonSet and there is no replicas field.
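
A minimal DaemonSet definition sketch follows. The name monitoring-agent and the image are assumptions for illustration; compare it with a ReplicaSet definition to see how little differs.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-agent           # hypothetical name
spec:
  selector:
    matchLabels:
      app: monitoring-agent        # must match the pod template labels
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
        - name: monitoring-agent
          image: monitoring-agent  # hypothetical image name
```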

Daemon Set Commands

  1. kubectl create -f daemonset-definition.yml

    • To create a daemon set.

  2. kubectl get daemonsets

    • To get the list of created daemon sets. The resource name can also be written as daemonset or ds.

  3. kubectl delete daemonset <daemonset_name>

    • To delete the given daemon set along with all the underlying pods.

  4. kubectl describe daemonset <daemonset_name>

    • To describe the given daemon set.

Static Pods

  • Pods that are created by the kubelet on its own, without intervention from the API server or the rest of the Kubernetes cluster components, are known as Static Pods.

  • For this, we have to place the pod-definition files in the designated directory. The kubelet periodically checks the directory, creates the pods and manages them as well.

  • The kubelet is responsible for watching each static pod and restarting it if it crashes.

  • You can only create pods this way. You cannot create ReplicaSets, Deployments or Services.

  • The kubelet works at a pod level and can only understand pods, which is why it can create static pods this way.

  • Designated Folder: It can be any directory on the host. The location of that directory is passed to the kubelet as an option (--pod-manifest-path) while running the service, OR the path of a config file is passed in the --config option and the directory is set in that file with the key staticPodPath.

  • On the node itself, we can view the containers of the created static pods using the docker ps command (or crictl ps on nodes that use a CRI runtime such as containerd instead of Docker).

  • /var/lib/kubelet/config.yaml - inside this config file, we get to see the static pod folder path staticPodPath: /etc/kubernetes/manifests.

  • Also, the names of static pods are suffixed with the name of the node they run on, e.g. the controlplane node name.
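
For reference, the relevant part of a kubelet config file might look like the sketch below. Only the staticPodPath key and its value come from the notes above; a real file contains many more settings.

```yaml
# Fragment of a kubelet configuration file such as /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests   # directory the kubelet watches for pod-definition files
```

Any pod-definition file placed in that directory is picked up by the kubelet, and removing the file removes the pod.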

Static Pods VS Daemon Sets

  1. Static Pods

    • Created by the kubelet.

    • Deploy Control Plane components as Static Pods.

    • Ignored by the Kube-Scheduler.

  2. DaemonSets

    • Created through the Kube-API server by the DaemonSet controller.

    • Deploying monitoring agents, and logging agents on nodes.

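    • Historically ignored by the Kube-Scheduler. Since Kubernetes v1.12, DaemonSet pods are placed by the default scheduler, using node affinity to target each node.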