Pod Disruption Budgets

Ensure application availability during voluntary disruptions with Pod Disruption Budgets.

In the previous tutorial, we learned about init containers and sidecars — building smarter, more capable pods. Now let's talk about keeping those pods alive during cluster maintenance.

Here's the scenario: you need to drain a node for an OS upgrade. Kubernetes starts evicting pods. But wait — what if it evicts too many at once and your app goes down? That's not maintenance, that's an outage.

Pod Disruption Budgets (PDBs) ensure a minimum number of pods stay available during voluntary disruptions. Think of them as saying "you can evict pods, but you must leave at least this many running."

Types of Disruptions

Involuntary Disruptions

Things outside Kubernetes' control — the "acts of God" category:

  • Hardware failure (server catches fire, metaphorically)
  • Kernel panic
  • Node out of resources
  • Cloud provider issues

PDBs can't prevent these. When hardware dies, pods just die with it. No amount of YAML can fix a dead server.

Voluntary Disruptions

Planned operations that Kubernetes controls:

  • Node drain (kubectl drain)
  • Cluster autoscaler scale-down
  • Rolling deployments
  • Manual pod deletion

PDBs protect against the ones that go through the Eviction API — node drains and autoscaler scale-downs — by limiting how many pods can be evicted at once. Direct pod deletion and rolling updates skip the Eviction API, so the budget can't block them (more on rolling updates later). This is where the magic happens.
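The distinction is easy to see from the command line. Assuming a hypothetical pod web-abc123 whose PDB currently allows 0 disruptions, a direct delete succeeds while a drain of its node stalls:

```shell
# Direct deletion never consults the PDB -- it succeeds immediately
kubectl delete pod web-abc123

# Draining uses the Eviction API, so it blocks while allowed disruptions is 0
kubectl drain node-1 --ignore-daemonsets
```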

How PDBs Work

Admin: kubectl drain node-1
          │
          ▼
    ┌───────────┐
    │ Eviction  │  Check PDB
    │   API     │──────────────┐
    └───────────┘              │
          │                    ▼
          │             ┌────────────┐
          │             │    PDB     │
          │             │ minAvail=2 │
          │             └────────────┘
          │                    │
          ▼                    ▼
    ┌─────────┐          Pod count: 3
    │ Evict?  │◄────────Available: 3
    └─────────┘          Can evict: 1
          │
          ▼
    Evict 1 pod, wait, then evict next

PDBs don't block eviction entirely — they ensure enough pods remain available. It's like a fire marshal saying "you can let people leave the building, but the building must always have at least 2 security guards."
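Under the hood, kubectl drain doesn't delete pods — for each one it POSTs an Eviction object to the pod's eviction subresource, and the API server answers 429 Too Many Requests whenever the budget is exhausted. A sketch of that request body, using a hypothetical pod name:

```yaml
# POSTed to /api/v1/namespaces/default/pods/web-abc123/eviction
# (web-abc123 is a hypothetical pod name)
apiVersion: policy/v1
kind: Eviction
metadata:
  name: web-abc123
  namespace: default
```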

Create a Pod Disruption Budget

Two approaches — same result, different perspective.

Using minAvailable

"At least this many pods must stay running":

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

With 3 replicas and minAvailable: 2, only 1 pod can be evicted at a time. Simple math: 3 - 2 = 1 allowed disruption.
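You can read the computed allowance straight out of the PDB's status once api-pdb is applied:

```shell
# disruptionsAllowed is kept up to date by the controller as pods come and go
kubectl get pdb api-pdb -o jsonpath='{.status.disruptionsAllowed}'
# prints 1 while all 3 replicas are healthy and minAvailable is 2
```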

Using maxUnavailable

"At most this many pods can be down":

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api

With 3 replicas and maxUnavailable: 1, 2 must stay available.

Percentage Values

You can also use percentages — great when replica count varies:

spec:
  minAvailable: "50%"

or

spec:
  maxUnavailable: "25%"

When a percentage doesn't map to a whole number of pods, Kubernetes rounds up to the nearest integer.
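A worked example, assuming a hypothetical workers deployment with 7 replicas: 50% of 7 is 3.5, which rounds up to 4, so 4 pods must stay available and at most 3 can be disrupted at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: workers-pdb        # hypothetical name
spec:
  minAvailable: "50%"      # with 7 replicas: ceil(3.5) = 4 must stay available
  selector:
    matchLabels:
      app: workers
```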

Example Setup

Let's see this in action. Create a deployment with a PDB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web

Apply:

kubectl apply -f web-with-pdb.yaml

Check PDB status:

kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
web-pdb   2               N/A               1                     30s

ALLOWED DISRUPTIONS: 1 means one pod can be safely evicted right now. If that drops to 0, evictions are blocked until a pod comes back. Kubernetes is being responsible for once.

Testing PDBs

Let's see PDBs in action.

Drain a Node

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

Kubernetes evicts pods one at a time, respecting the PDB. It waits for the evicted pod to be rescheduled and ready before evicting the next one. Patient and responsible.
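One thing drain does that's easy to forget: it cordons the node first (marks it unschedulable), and the node stays cordoned after the drain finishes. When maintenance is done, let pods back on:

```shell
# Mark the node schedulable again after maintenance
kubectl uncordon node-1
```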

Watch Evictions

kubectl get pods -w

You'll see pods evicted and rescheduled, but always maintaining at least 2 running. It's like watching a carefully choreographed dance.

Check PDB Events

kubectl describe pdb web-pdb

Events section shows eviction decisions.

PDB with Different Workloads

StatefulSet

For databases, you usually only allow 1 pod unavailable to maintain quorum:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: postgres

DaemonSet

DaemonSets typically don't use PDBs since they run exactly one pod per node. When you drain that node, the pod has to go — which is why kubectl drain refuses to run without --ignore-daemonsets: the flag leaves DaemonSet pods in place, since the DaemonSet controller would immediately recreate anything drain deleted.

Single-Pod Deployments

"What if I only have 1 replica?"

You can do this:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: singleton-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: singleton

maxUnavailable: 0 blocks all voluntary evictions. The drain will hang forever until:

  • You change the PDB — e.g. minAvailable: 1 plus a second replica (scaling up alone doesn't help here: maxUnavailable: 0 forbids any unavailability at any replica count)
  • You delete the PDB
  • You drain with --disable-eviction, which deletes pods directly and skips PDB checks

⚠️ Warning: This can block cluster maintenance indefinitely. Your ops team will not be happy. Use with extreme caution.
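A gentler pattern, if you control the workload: use minAvailable: 1 instead, and temporarily add a second replica around maintenance windows so the budget has one eviction of headroom (assuming a Deployment named singleton):

```shell
# With minAvailable: 1, a second replica gives the drain one pod to evict
kubectl scale deployment singleton --replicas=2
kubectl drain node-1 --ignore-daemonsets
# ...do the maintenance, then uncordon the node and scale back down...
kubectl scale deployment singleton --replicas=1
```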

Real-World Configurations

Here's what PDBs look like in the real world.

High Availability API

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: api:v1
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: "60%"  # At least 3 of 5 pods
  selector:
    matchLabels:
      app: api

Combined with topology spread constraints, this ensures pods are spread across availability zones AND maintain 60% availability during disruptions. Belt and suspenders.

Database Cluster

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  serviceName: etcd
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v3.5.0
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: etcd-pdb
spec:
  minAvailable: 2  # Maintain quorum: floor(3/2) + 1 = 2
  selector:
    matchLabels:
      app: etcd

etcd requires quorum (majority) for writes. With 3 nodes, at least 2 must be available. If 2 go down, the cluster stops accepting writes. This PDB ensures that never happens during maintenance.

Cache Tier

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
spec:
  maxUnavailable: "50%"
  selector:
    matchLabels:
      app: redis

Caches can tolerate more disruption since data can be rebuilt. If a Redis pod goes down, the cache warms up again. Not ideal, but not catastrophic.

PDB and Rolling Updates

"Do PDBs affect my regular deployments too?"

Not directly. Rolling updates replace pods through the Deployment controller — not the Eviction API — so the rollout itself is governed by the update strategy, not the PDB:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1

The two still interact, though: pods that are down during a rolling update count against the disruption budget. If a rollout has pods out of service, ALLOWED DISRUPTIONS shrinks, and a node drain running at the same time may be blocked until the rollout finishes:

  • Deployment strategy controls the rollout itself
  • PDB controls evictions (drains, autoscaler scale-downs)
  • During a rollout, unavailable pods eat into the eviction budget

Unhealthy Pod Eviction Policy

"What about unhealthy pods? Can I evict those even if the budget is exhausted?"

Kubernetes 1.26+ added control over unhealthy pod eviction:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
  unhealthyPodEvictionPolicy: AlwaysAllow

Options:

  • IfHealthyBudget (default): Only evict unhealthy pods if budget allows
  • AlwaysAllow: Always allow unhealthy pod eviction

AlwaysAllow is super useful — it lets you quickly remove stuck/unhealthy pods that are blocking node drains. Because protecting broken pods doesn't help anyone.

Monitoring PDBs

Current Status

kubectl get pdb -A
NAMESPACE   NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default     api-pdb     2               N/A               1                     1h
default     web-pdb     N/A             1                 1                     1h

ALLOWED DISRUPTIONS shows how many pods can currently be evicted. If this is 0, voluntary evictions are blocked. Keep an eye on this during maintenance windows.

Detailed Status

kubectl describe pdb api-pdb
Status:
  Current Healthy:   3
  Desired Healthy:   2
  Disruptions Allowed: 1
  Expected Pods:     3
  Observed Generation: 1

Watch for Issues

kubectl get pdb -w

If ALLOWED DISRUPTIONS drops to 0, something is preventing evictions. Time to investigate.

Troubleshooting

Node Drain Stuck

The classic PDB gotcha:

kubectl drain node-1 --ignore-daemonsets
# Hangs...

Check blocking PDBs:

kubectl get pdb -A
# Look for ALLOWED DISRUPTIONS: 0

Solutions:

  1. Scale up the deployment (more pods = room to evict)
  2. Delete the PDB temporarily (nuclear option but sometimes necessary)
  3. Force past the PDB: kubectl drain node-1 --disable-eviction (deletes pods instead of evicting them, so PDB checks are skipped — note that --force alone does not bypass PDBs; that flag is for pods without a controller)

PDB Blocks All Evictions

"My PDB has ALLOWED DISRUPTIONS: 0. What gives?"

If minAvailable equals or exceeds the replica count, you've locked yourself out:

spec:
  replicas: 2
---
spec:
  minAvailable: 2

This allows 0 disruptions. Math doesn't lie: 2 pods - 2 minimum = 0 allowed. Fix by lowering minAvailable or raising replicas.

Check What Pods Match

kubectl get pods -l app=api

Make sure the PDB selector actually matches the right pods. A typo in the label selector means the PDB protects... nothing.

Best Practices

The wisdom section. Learn from other people's 3 AM pager alerts.

Do

  • Use PDBs for production workloads
  • Set minAvailable less than replica count
  • Use percentages for variable replica counts
  • Combine with topology spread for HA

Don't

  • Set maxUnavailable: 0 (blocks maintenance — you will regret this)
  • Set minAvailable equal to replicas (same problem, different syntax)
  • Forget PDBs exist when troubleshooting why your drain is stuck (it's always a PDB)

Recommended Configurations

Here's a cheat sheet:

Workload Type       Replicas   PDB Setting
Stateless API       3+         maxUnavailable: 1
Database (3-node)   3          minAvailable: 2 (quorum)
Database (5-node)   5          minAvailable: 3 (quorum)
Batch workers       any        maxUnavailable: "50%"
Singleton           1          No PDB (or prepare for blocked drains)

Clean Up

kubectl delete pdb web-pdb api-pdb
kubectl delete deployment web api

What's Next?

Congratulations! You've completed the entire Kubernetes tutorial series! 🎉

Let's recap everything you've learned:

  • Getting started: What Kubernetes is, setting up a local cluster, understanding pods
  • Workloads: Deployments, StatefulSets, DaemonSets, Jobs, CronJobs
  • Networking: Services, Ingress, Network Policies, DNS, Service Mesh
  • Configuration: ConfigMaps, Secrets, Labels, Namespaces
  • Storage: Persistent Volumes and Claims
  • Reliability: Health checks, Resource limits, Pod Disruption Budgets
  • Patterns: Init containers, Sidecars

You went from "what even is Kubernetes?" to building production-grade configurations with health checks, network policies, and disruption budgets. That's a serious journey.

From here, explore:

  • Helm for package management
  • RBAC for security
  • Horizontal Pod Autoscaling for automatic scaling
  • Custom Resource Definitions for extending Kubernetes

You've got the foundation. Now go build something awesome!

Workload Patterns section complete! You now understand:

  • StatefulSets for databases and stateful apps
  • DaemonSets for node-level services
  • Init Containers and Sidecars for pod composition
  • Pod Disruption Budgets for availability during maintenance