Kubernetes networking
Understanding how Kubernetes implements networking
EKS and GKE make k8s networking somewhat transparent. You bring up a cluster and nodes, then create a deployment, and as if by magic your pods have IP addresses and even a load balancer. Obviously this assumes your VPCs and subnets are already in place, but the main point is that you typically never have to worry about the inner workings of how networking is implemented within your k8s clusters. In this post, I will be studying how this networking magic happens behind the scenes.
Traditional host-based networking
A little bit of history is warranted. VM- or physical-host-based networking was relatively easy to understand. You typically have one or more network interfaces attached to the host; an IP address (IPv4/IPv6) is configured via DHCP or manually, a gateway is configured, and this allows the host to get connectivity to its local network and other networks. In this setup, any TCP or UDP listener is bound to the IP address(es) configured on the host. So, for example, a web server can be bound to ports 80 and 443, and other processes will need to bind to unused ports. As long as there is reachability to that IP address, you are good to go in terms of accessing the service hosted by the server. For IPv4, if the server is using RFC 1918 (private) addresses, then NAT is typically employed to ensure external connectivity works.
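As a quick illustration, on a plain Linux host you can see this single, shared view of the network with standard tooling (assuming iproute2 and ss are installed):

```sh
# interfaces and the IP addresses configured on the host
ip addr show

# TCP listeners sharing the host's single port space,
# e.g. a web server bound to :80 and :443
ss -tlnp
```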
Kubernetes networking model
In Kubernetes, the role performed by a VM or physical server when it comes to hosting an application is taken over by a pod. A pod is a collection of one or more containers that share resources such as storage and network. So, for example, if you are deploying a web application that listens on ports 80 and 443, it will be deployed as a pod and will be given its own IP address. Each pod in k8s gets its own IP address. In a way, you can say that pods replace the function of a VM when it comes to deploying your applications.
K8s networking documentation points to four main areas as distinct networking problems to address:
- Highly-coupled container-to-container communications
- Pod-to-Pod communications
- Pod-to-Service communications
- External-to-Service communications
Because the containers within a pod share the network namespace, including the IP address, containers within a pod can communicate with each other's ports on localhost. This also means that containers within a pod must coordinate port usage. This is not much different from a VM with a single namespace. This model is referred to as "IP-per-pod" in the k8s documentation.
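As a rough sketch of the IP-per-pod model, the pod below runs two containers in one shared network namespace; the sidecar can reach the web server on localhost, and the two containers could not both bind port 80 (pod name and images are illustrative):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
    - name: web
      image: nginx             # listens on :80 inside the pod
    - name: sidecar
      image: curlimages/curl   # shares web's network namespace
      command: ["sleep", "3600"]
EOF

# from the sidecar, the web container is reachable on localhost
kubectl exec shared-netns-demo -c sidecar -- curl -s localhost:80
```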
K8s also imposes some requirements on any network implementation (illustrated just after this list):
- Pods on a node can communicate with all pods on all nodes without NAT
- Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
- Pods in the host network of a node can communicate with all pods on all nodes without NAT
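These properties are easy to observe: every pod IP shown by kubectl is directly reachable from any other pod, with no NAT in between (pod name and IP below are illustrative):

```sh
# each pod's IP address and the node it landed on
kubectl get pods -o wide

# reach another pod directly by its IP; no NAT in between
kubectl exec web-6d4cf56db6-abc12 -- curl -s 10.244.1.7:8080
```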
Linux namespaces
I have referred to network namespaces a few times above, but I think this deserves a little explanation. Linux namespaces are key to how multiple pods scheduled on a host are allowed to communicate with each other and with pods on other hosts.
From the man pages, namespaces are described as follows:
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.
The default network namespace is referred to as the root namespace. This namespace owns the network interfaces that are bound to the machine. When pods are created, a dedicated network namespace is created for each pod. Each pod's Ethernet interface is logically wired up to the root namespace via a veth pair. veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to a physical network device in another namespace, but can also be used as standalone network devices. To facilitate communication between pods on the same host, and also to the outside world, a bridge called cbr0 is used. Below is a simple diagram that illustrates pod networking on a given k8s host.
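   [ pod1 ]  [ pod2 ]           [ pod3 ]  [ pod4 ]
     eth0      eth0               eth0      eth0
      |          |                  |         |
    veth1      veth2              veth1     veth2
      |          |                  |         |
   [  cbr0 bridge  ]            [  cbr0 bridge  ]
           |                            |
      eth0 (host1)                 eth0 (host2)
           |                            |
    [ leaf switch ]              [ leaf switch ]
Figure 1: pod networking on k8s hosts

To make the wiring concrete, here is a rough sketch, using plain iproute2 commands, of the kind of plumbing a network plugin performs when a pod is created (run as root; interface names and addresses are illustrative, not what any particular plugin uses):

```sh
# create a network namespace to stand in for a pod
ip netns add pod1

# create a veth pair and move one end into the pod's namespace
ip link add veth1 type veth peer name veth1-pod
ip link set veth1-pod netns pod1

# inside the pod: rename the interface to eth0, assign an IP, bring it up
ip netns exec pod1 ip link set veth1-pod name eth0
ip netns exec pod1 ip addr add 10.244.0.2/24 dev eth0
ip netns exec pod1 ip link set eth0 up
ip netns exec pod1 ip link set lo up

# attach the host end of the pair to the cbr0 bridge
ip link add name cbr0 type bridge
ip addr add 10.244.0.1/24 dev cbr0
ip link set cbr0 up
ip link set veth1 up
ip link set veth1 master cbr0

# default route inside the pod points at the bridge
ip netns exec pod1 ip route add default via 10.244.0.1
```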
Network plugins
K8s implements networking through network plugins. The predominant plugins are CNI-based (https://github.com/containernetworking/cni), and there are plenty of third-party plugins that can provide the networking needed by k8s. For example, EKS uses vpc-cni (https://github.com/aws/amazon-vpc-cni-k8s) to implement networking for pods running in EKS clusters. Another common CNI plugin is flannel (https://github.com/coreos/flannel); it runs a small binary called flanneld on each host, which is responsible for allocating a subnet lease to each host from a larger preconfigured address space. A list of CNI plugins can be found at https://kubernetes.io/docs/concepts/cluster-administration/networking/.
So when a pod is scheduled, it is the plugin's responsibility to provide it with an appropriate IP address. Plugins also perform IP address management (IPAM) to keep track of the IP allocations to pods.
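For a feel of what a plugin's configuration looks like, below is a minimal sketch using the reference bridge and host-local IPAM plugins from the CNI project (file name, network name and subnet are illustrative):

```sh
# drop a CNI network config where the kubelet/container runtime looks
# for it (requires root; values here are for illustration only)
cat <<'EOF' > /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.3.1",
  "name": "podnet",
  "type": "bridge",
  "bridge": "cbr0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}
EOF
```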
Pod networking outside of the host
The pod subnets are typically not leaked to the upstream leaf and spine switches. Figure 1 showed the hosts connecting to their respective leaf switches. In large-scale deployments, VXLAN is used as a data-plane overlay technology to allow pods deployed on different hosts to communicate with each other. This makes the underlay infrastructure transparent when it comes to inter-pod communication; the underlay just facilitates high-speed networking between the real host IP addresses attached to the root namespaces. Flanneld, for example, implements a VXLAN backend to allow this communication to work.
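On a flannel host you can inspect the VXLAN tunnel endpoint it creates, or build one by hand with iproute2 to see what the overlay consists of (the flannel.1 device name assumes flannel's defaults; the manual example uses illustrative values):

```sh
# flannel's VXLAN backend creates a vxlan device on each host
ip -d link show flannel.1

# roughly equivalent by hand: a VXLAN tunnel endpoint that encapsulates
# pod traffic over the host network (flanneld itself also programs the
# forwarding entries that tell each host where remote pods live)
ip link add vxlan100 type vxlan id 100 dev eth0 dstport 4789
```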
Kubernetes service
Whilst pods host the containers that serve the applications, they are ephemeral in nature. They can come and go, and in order to ensure the service hosted by a pod is always accessible, k8s implements the Service construct. This is an abstract way to expose an application running on a set of Pods as a network service.
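A minimal Service sketch: this gives the pods matching app=web a single stable virtual IP and DNS name, regardless of how often the pods themselves are replaced (names and ports are illustrative):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80         # the Service's stable port
      targetPort: 8080 # the port the pods actually listen on
EOF
```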
Kubernetes ingress
Whilst a k8s Service provides a consistent entry point into the pods hosting an application, an Ingress provides external access to Services:
internet
|
[ Ingress ]
--|-----|--
[ Services ]
Ingress exposes HTTP and HTTPS routes from outside the cluster to Services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource. An Ingress controller such as ingress-nginx is needed for an Ingress to work. With Ingress, all entry into HTTP/HTTPS applications can follow the same path, and the Ingress will take care of routing traffic to the appropriate Service. This allows us to have a single load balancer bound to the Ingress and have all the Services accessible behind that Ingress, as opposed to exposing each Service through its own load balancer.
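As a sketch, the Ingress below routes two URL paths on one host name to two different Services behind a single entry point (the host, paths and Service names are illustrative, and it assumes an nginx Ingress controller is installed in the cluster):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /web
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
EOF
```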
Summary
There are a bunch of things I still have not covered, such as IPv6, EndpointSlices, etc. The following k8s docs page covers some of those in detail: https://kubernetes.io/docs/concepts/services-networking/. I will attempt to add more content to this post as time permits.