Pod Disruption Budgets
Ensure application availability during voluntary disruptions with Pod Disruption Budgets.
In the previous tutorial, we learned about init containers and sidecars — building smarter, more capable pods. Now let's talk about keeping those pods alive during cluster maintenance.
Here's the scenario: you need to drain a node for an OS upgrade. Kubernetes starts evicting pods. But wait — what if it evicts too many at once and your app goes down? That's not maintenance, that's an outage.
Pod Disruption Budgets (PDBs) ensure a minimum number of pods stay available during voluntary disruptions. Think of them as saying "you can evict pods, but you must leave at least this many running."
Types of Disruptions
Involuntary Disruptions
Things outside Kubernetes' control — the "acts of God" category:
- Hardware failure (server catches fire, metaphorically)
- Kernel panic
- Node out of resources
- Cloud provider issues
PDBs can't prevent these. When hardware dies, pods just die with it. No amount of YAML can fix a dead server.
Voluntary Disruptions
Planned operations that Kubernetes controls:
- Node drain (kubectl drain)
- Cluster autoscaler scale-down
- Rolling deployments
- Manual pod deletion
PDBs protect against these by limiting how many pods can be evicted at once. This is where the magic happens. (One caveat: the budget is enforced by the Eviction API, so a plain kubectl delete pod slips past it.)
How PDBs Work
Admin: kubectl drain node-1
       │
       ▼
 ┌───────────┐
 │ Eviction  │   Check PDB
 │    API    │──────────────┐
 └───────────┘              │
       │                    ▼
       │             ┌────────────┐
       │             │    PDB     │
       │             │ minAvail=2 │
       │             └────────────┘
       │                    │
       ▼                    ▼
 ┌─────────┐        Pod count: 3
 │ Evict?  │◄───────Available: 3
 └─────────┘        Can evict: 1
       │
       ▼
 Evict 1 pod, wait, then evict next
PDBs don't block eviction entirely — they ensure enough pods remain available. It's like a fire marshal saying "you can let people leave the building, but the building must always have at least 2 security guards."
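The eviction loop above can be sketched as a toy simulation. This is plain Python with no Kubernetes API; drain_node and its parameters are invented for illustration:

```python
def drain_node(replicas: int, pods_on_node: int, min_available: int) -> int:
    """Simulate draining a node: evict one pod at a time, waiting for each
    replacement to be rescheduled and ready before evicting the next.
    Returns the lowest availability seen during the drain."""
    available = replicas
    lowest = available
    for _ in range(pods_on_node):
        # The Eviction API check: evicting one pod must not violate the PDB.
        assert available - 1 >= min_available, "eviction would violate the PDB"
        available -= 1                 # pod evicted from the draining node
        lowest = min(lowest, available)
        available += 1                 # replacement ready on another node

    return lowest

# 3 replicas, 2 of them on the draining node, minAvailable=2:
# pods leave one at a time, and availability never dips below 2.
print(drain_node(replicas=3, pods_on_node=2, min_available=2))  # 2
```

The key property: availability can momentarily drop to exactly minAvailable, but never below it.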
Create a Pod Disruption Budget
Two approaches — same result, different perspective.
Using minAvailable
"At least this many pods must stay running":
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
With 3 replicas and minAvailable: 2, only 1 pod can be evicted at a time. Simple math: 3 - 2 = 1 allowed disruption.
Using maxUnavailable
"At most this many pods can be down":
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api
With 3 replicas and maxUnavailable: 1, 2 must stay available.
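The arithmetic behind both forms is the same. A quick sketch in plain Python (the function name is invented; this assumes all replicas are currently healthy):

```python
def allowed_disruptions(replicas: int, *, min_available: int = None,
                        max_unavailable: int = None) -> int:
    """How many pods the Eviction API may remove right now,
    assuming all replicas are currently healthy."""
    if min_available is not None:
        # minAvailable: whatever headroom exists above the floor.
        return max(0, replicas - min_available)
    # maxUnavailable: the ceiling itself, capped by the replica count.
    return max(0, min(max_unavailable, replicas))

print(allowed_disruptions(3, min_available=2))    # 1
print(allowed_disruptions(3, max_unavailable=1))  # 1
```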
Percentage Values
You can also use percentages — great when replica count varies:
spec:
  minAvailable: "50%"

or

spec:
  maxUnavailable: "25%"
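A percentage is resolved against the number of pods the PDB expects. A sketch of the arithmetic; the round-up here is an assumption based on the disruption controller's conservative rounding, so verify against your cluster version:

```python
import math

def resolve_percentage(percent: str, expected_pods: int) -> int:
    """Resolve a percentage-based PDB value against the expected pod count.
    Assumes the scaled value is rounded up (conservative rounding)."""
    fraction = int(percent.rstrip("%")) / 100
    return math.ceil(fraction * expected_pods)

print(resolve_percentage("50%", 5))  # 3: at least 3 of 5 pods must stay up
print(resolve_percentage("50%", 8))  # 4: scales automatically with replicas
```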
Example Setup
Let's see this in action. Create a deployment with a PDB:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
Apply:
kubectl apply -f web-with-pdb.yaml
Check PDB status:
kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
web-pdb   2               N/A               1                     30s
ALLOWED DISRUPTIONS: 1 means one pod can be safely evicted right now. If that drops to 0, evictions are blocked until a pod comes back. Kubernetes is being responsible for once.
Testing PDBs
Let's see PDBs in action.
Drain a Node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
Kubernetes evicts pods one at a time, respecting the PDB. It waits for the evicted pod to be rescheduled and ready before evicting the next one. Patient and responsible.
Watch Evictions
kubectl get pods -w
You'll see pods evicted and rescheduled, but always maintaining at least 2 running. It's like watching a carefully choreographed dance.
Check PDB Events
kubectl describe pdb web-pdb
Events section shows eviction decisions.
PDB with Different Workloads
StatefulSet
For databases, you usually only allow 1 pod unavailable to maintain quorum:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: postgres
DaemonSet
DaemonSets typically don't use PDBs since they have exactly one pod per node. When you drain that node, the pod has to go. There's no other choice.
Single-Pod Deployments
"What if I only have 1 replica?"
You can do this:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: singleton-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: singleton
maxUnavailable: 0 blocks all voluntary evictions. The drain will hang forever until:
- You scale up the deployment
- You delete the PDB
- You bypass eviction with kubectl drain --disable-eviction (deletes pods directly, skipping the PDB check)
⚠️ Warning: This can block cluster maintenance indefinitely. Your ops team will not be happy. Use with extreme caution.
Real-World Configurations
Here's what PDBs look like in the real world.
High Availability API
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: api:v1
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: "60%"  # at least 3 of 5 pods
  selector:
    matchLabels:
      app: api
Combined with topology spread constraints, this ensures pods are spread across availability zones AND maintain 60% availability during disruptions. Belt and suspenders.
Database Cluster
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  serviceName: etcd
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.0
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: etcd-pdb
spec:
  minAvailable: 2  # maintain quorum: floor(3/2) + 1 = 2
  selector:
    matchLabels:
      app: etcd
etcd requires quorum (majority) for writes. With 3 nodes, at least 2 must be available. If 2 go down, the cluster stops accepting writes. This PDB ensures that never happens during maintenance.
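The quorum rule generalizes to any cluster size: a majority is floor(n/2) + 1, and minAvailable should be at least that. A sketch (the helper name is invented):

```python
def quorum(cluster_size: int) -> int:
    """Smallest majority of the cluster: floor(n/2) + 1 members."""
    return cluster_size // 2 + 1

# minAvailable for a quorum-based system should be at least quorum(n).
for n in (3, 5, 7):
    print(f"{n}-node cluster: minAvailable >= {quorum(n)}")
```

Note the corollary: an even-sized cluster buys you nothing. quorum(4) is 3, so a 4-node cluster tolerates the same single failure as a 3-node one.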
Cache Tier
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
spec:
  maxUnavailable: "50%"
  selector:
    matchLabels:
      app: redis
Caches can tolerate more disruption since data can be rebuilt. If a Redis pod goes down, the cache warms up again. Not ideal, but not catastrophic.
PDB and Rolling Updates
"Do PDBs affect my regular deployments too?"
Indirectly. Deployments have their own disruption handling via maxUnavailable in the update strategy:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
The Deployment controller replaces pods directly rather than through the Eviction API, so a rolling update follows the strategy above and is not blocked by the PDB. The budget still matters, though:
- Pods taken down by the rollout count as unavailable against the PDB
- While they're down, ALLOWED DISRUPTIONS can drop to 0
- Result: other voluntary evictions (a node drain, autoscaler scale-down) are blocked until the new pods are ready
Unhealthy Pod Eviction Policy
"What about unhealthy pods? Can I evict those even if the budget is exhausted?"
Kubernetes 1.26+ added control over unhealthy pod eviction:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
  unhealthyPodEvictionPolicy: AlwaysAllow
Options:
- IfHealthyBudget (default): unhealthy pods can only be evicted if the budget allows
- AlwaysAllow: unhealthy pods can always be evicted, even when the budget is exhausted
AlwaysAllow is super useful — it lets you quickly remove stuck/unhealthy pods that are blocking node drains. Because protecting broken pods doesn't help anyone.
Monitoring PDBs
Current Status
kubectl get pdb -A
NAMESPACE   NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default     api-pdb   2               N/A               1                     1h
default     web-pdb   N/A             1                 1                     1h
ALLOWED DISRUPTIONS shows how many pods can currently be evicted. If this is 0, voluntary evictions are blocked. Keep an eye on this during maintenance windows.
Detailed Status
kubectl describe pdb api-pdb
Status:
  Current Healthy:       3
  Desired Healthy:       2
  Disruptions Allowed:   1
  Expected Pods:         3
  Observed Generation:   1
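Disruptions Allowed is derived from the other status fields. A simplified model of the relationship (the function name is invented; the real controller also accounts for stale data and overlapping PDBs):

```python
def disruptions_allowed(current_healthy: int, desired_healthy: int) -> int:
    """disruptionsAllowed in PDB status: how far current health
    exceeds the required floor, never negative."""
    return max(0, current_healthy - desired_healthy)

print(disruptions_allowed(current_healthy=3, desired_healthy=2))  # 1
print(disruptions_allowed(current_healthy=2, desired_healthy=2))  # 0 -> evictions blocked
```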
Watch for Issues
kubectl get pdb -w
If ALLOWED DISRUPTIONS drops to 0, something is preventing evictions. Time to investigate.
Troubleshooting
Node Drain Stuck
The classic PDB gotcha:
kubectl drain node-1 --ignore-daemonsets
# Hangs...
Check blocking PDBs:
kubectl get pdb -A
# Look for ALLOWED DISRUPTIONS: 0
Solutions:
- Scale up the deployment (more pods = room to evict)
- Delete the PDB temporarily (nuclear option but sometimes necessary)
- Bypass eviction: kubectl drain node-1 --disable-eviction (uses delete instead of the Eviction API, so PDB checks are skipped)
PDB Blocks All Evictions
"My PDB has ALLOWED DISRUPTIONS: 0. What gives?"
If minAvailable equals or exceeds the replica count, you've locked yourself out:
spec:
  replicas: 2
---
spec:
  minAvailable: 2
This allows 0 disruptions. Math doesn't lie: 2 pods - 2 minimum = 0 allowed. Fix by lowering minAvailable or raising replicas.
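For example, keeping the 2 replicas but leaving room for one eviction (values here are just one way to restore headroom):

```yaml
spec:
  replicas: 2      # unchanged
---
spec:
  minAvailable: 1  # 2 - 1 = 1 allowed disruption
```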
Check What Pods Match
kubectl get pods -l app=api
Make sure the PDB selector actually matches the right pods. A typo in the label selector means the PDB protects... nothing.
Best Practices
The wisdom section. Learn from other people's 3 AM pager alerts.
Do
- Use PDBs for production workloads
- Set minAvailable less than replica count
- Use percentages for variable replica counts
- Combine with topology spread for HA
Don't
- Set maxUnavailable: 0 (blocks maintenance; you will regret this)
- Set minAvailable equal to replicas (same problem, different syntax)
- Forget PDBs exist when troubleshooting why your drain is stuck (it's always a PDB)
Recommended Configurations
Here's a cheat sheet:
| Workload Type | Replicas | PDB Setting |
|---|---|---|
| Stateless API | 3+ | maxUnavailable: 1 |
| Database (3-node) | 3 | minAvailable: 2 (quorum) |
| Database (5-node) | 5 | minAvailable: 3 (quorum) |
| Batch workers | any | maxUnavailable: 50% |
| Singleton | 1 | No PDB (or prepare for blocked drains) |
Clean Up
kubectl delete pdb web-pdb api-pdb
kubectl delete deployment web api
What's Next?
Congratulations! You've completed the entire Kubernetes tutorial series! 🎉
Let's recap everything you've learned:
- Getting started: What Kubernetes is, setting up a local cluster, understanding pods
- Workloads: Deployments, StatefulSets, DaemonSets, Jobs, CronJobs
- Networking: Services, Ingress, Network Policies, DNS, Service Mesh
- Configuration: ConfigMaps, Secrets, Labels, Namespaces
- Storage: Persistent Volumes and Claims
- Reliability: Health checks, Resource limits, Pod Disruption Budgets
- Patterns: Init containers, Sidecars
You went from "what even is Kubernetes?" to building production-grade configurations with health checks, network policies, and disruption budgets. That's a serious journey.
From here, explore:
- Helm for package management
- RBAC for security
- Horizontal Pod Autoscaling for automatic scaling
- Custom Resource Definitions for extending Kubernetes
You've got the foundation. Now go build something awesome!
Workload Patterns section complete! You now understand:
- StatefulSets for databases and stateful apps
- DaemonSets for node-level services
- Init Containers and Sidecars for pod composition
- Pod Disruption Budgets for availability during maintenance