Kubernetes ClusterIP Resilience During Availability Zone Failures
Introduction
I recently encountered a scenario where we needed to evaluate whether Kubernetes ClusterIP services provide sufficient resilience during availability zone failures. This analysis examines ClusterIP behavior during both node and AZ outages to determine if it meets our availability requirements.
Understanding ClusterIP: Creation and Traffic Flow
To understand ClusterIP resilience during failures, we first need to grasp how ClusterIP services are created and how traffic flows through them in normal operations.
Service Creation Workflow
When you create a ClusterIP Service[^1], Kubernetes orchestrates a multi-step process involving several core components (a minimal example follows this list):
- `kube-apiserver`[^2] receives your `kubectl apply` request, validates it, allocates a unique internal IP for the Service from the cluster's service CIDR range, and stores the Service object in etcd[^3] with that ClusterIP[^5] assigned.
- The EndpointSlice controller in `kube-controller-manager`[^4] watches for the new Service and records the IPs of the healthy pods it selects in EndpointSlice objects.
- `kube-proxy`[^6] running on every node detects the new Service and its EndpointSlices and creates iptables rules that map the ClusterIP to the actual pod IPs behind the service.
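To make the workflow concrete, here is a minimal, hypothetical example: applying a ClusterIP Service named `my-service` (the name, selector, and ports are all made up) kicks off exactly the steps above.

```sh
# Hypothetical example: create a ClusterIP Service for pods labeled app=my-app.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP          # the default Service type
  selector:
    app: my-app            # pods whose IPs will back this Service
  ports:
    - port: 80             # port exposed on the ClusterIP
      targetPort: 8080     # port the pods actually listen on
EOF

# The apiserver has already assigned an IP from the service CIDR:
kubectl get service my-service
```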
Traffic Routing Mechanics
Once established, traffic routing follows a straightforward path:
- When a pod sends a request to a ClusterIP, it goes through the local node’s network stack
- The kube-proxy on that node has pre-configured iptables rules that intercept traffic destined for ClusterIPs
- These iptables rules perform load balancing and direct the traffic to one of the healthy backend pod IPs
- The request reaches the target pod directly, bypassing any central load balancer
This distributed approach means each node independently handles routing decisions, which becomes crucial during failure scenarios.
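As a rough sketch of what those rules look like, you can inspect the NAT table on any worker node. The chain names below (`KUBE-SERVICES`, `KUBE-SVC-*`, `KUBE-SEP-*`) are the ones kube-proxy uses in its default iptables mode; the ClusterIP `10.100.0.10` is just a placeholder.

```sh
# Assumes kube-proxy runs in iptables mode; 10.100.0.10 is a made-up ClusterIP.

# KUBE-SERVICES matches packets destined for each ClusterIP and jumps to a
# per-Service KUBE-SVC-* chain.
sudo iptables-save -t nat | grep '10.100.0.10'

# Each KUBE-SVC-* chain picks one KUBE-SEP-* (service endpoint) chain using
# the "statistic" match, and that chain DNATs the packet to a pod IP:port.
sudo iptables-save -t nat | grep 'KUBE-SEP'
```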
Node Failure Recovery Process
- kube-apiserver stops receiving heartbeat signals from the kubelet running on that node, and the node is set to the `NotReady` or `Unknown` state.
- All pods running on this node are considered dead.
- The EndpointSlice controller[^7] removes the IP addresses of the failed pods from the `EndpointSlice` objects[^8] associated with the corresponding `Service`, so they now contain only the IPs of healthy pods.
- kube-proxy on all healthy nodes notices the changes to the `Service` and `EndpointSlice` objects, re-evaluates its local iptables rules for the affected Service, and removes the failed pod IPs (the commands after this list show one way to observe this).
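One way to watch this sequence, assuming a Service named `my-service` (a hypothetical name), is to follow the node status and the Service's EndpointSlices while a node is down:

```sh
# Hypothetical names; run in the namespace that owns the Service.

# The failed node flips to NotReady once heartbeats stop arriving.
kubectl get nodes --watch

# EndpointSlices carry the kubernetes.io/service-name label of their Service;
# the failed pod IPs drop out of the endpoints list (or are marked not ready).
kubectl get endpointslices -l kubernetes.io/service-name=my-service -o yaml --watch
```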
Availability Zone Failure Recovery Process
- The EKS control plane is distributed and replicated across multiple AZs, so the etcd cluster should have the latest state information.
- The healthy k8s nodes in other AZs follow the process described in the previous section.
- The load balancers managed by the AWS Load Balancer Controller detect the failure of all nodes in the AZ through their target health checks and automatically stop routing external traffic to them.
- The Cluster Autoscaler notices the lost capacity (pods that can no longer be scheduled) and provisions new nodes in the healthy AZs to replace it; this can take time, since launching new EC2 instances is not instantaneous.
- As new nodes join the cluster, Kubernetes schedules replacement pods onto them to restore the original state (the commands after this list can confirm the replacements are spread across the remaining zones).
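This recovery only works if the Service still has healthy pods outside the failed zone, so it is worth confirming how the backends are spread. A quick, hypothetical check (the `app=my-app` selector is made up; the zone label is the standard Kubernetes topology label):

```sh
# Show which availability zone each node belongs to.
kubectl get nodes -L topology.kubernetes.io/zone

# Show which node (and therefore which zone) each backend pod runs on.
kubectl get pods -l app=my-app -o wide
```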
Conclusion: Is ClusterIP resilient enough during AZ failures?
Yes, ClusterIP services provide sufficient resilience during availability zone failures through Kubernetes’ built-in failover mechanisms and the AWS EKS infrastructure, provided the backing pods are spread across more than one AZ.
[^1]: Service: a method to expose an app running as one or more Pods.
[^2]: kube-apiserver: validates and configures data for k8s objects like pods, services, etc. Provides REST endpoints and acts as the frontend to the cluster's shared state.
[^3]: etcd cluster: key-value store used to store k8s cluster state. Not directly accessible in AWS EKS.
[^4]: kube-controller-manager: daemon that embeds the core control loops that monitor the shared state of the cluster and make changes to move it towards the desired state. Controllers include the EndpointSlice controller, namespace controller, replication controller, etc.
[^5]: ClusterIP: the Service type that provides private access to resources inside the k8s cluster.
[^6]: kube-proxy: runs on each node as a DaemonSet. Reads the `Service` and `EndpointSlice` definitions and maps the ClusterIP to the associated pod IPs using `iptables` rules on every node.
[^7]: EndpointSlice controller: control loop that tracks changes to Pods and Services and keeps the pod IPs recorded in `EndpointSlice` objects up to date.
[^8]: EndpointSlice: object that lists the IP addresses of the pods backing a Service. Each `Service` has one or more `EndpointSlice` objects associated with it.