DNS and Service Discovery

Understand how Kubernetes DNS works and how services discover each other.

In the previous tutorial, we explored service meshes — a whole infrastructure layer for managing service communication. But let's zoom back to something fundamental: when a pod needs to talk to another service, how does it actually find it?

You've been using service names like backend throughout this series. But have you wondered how curl http://backend actually works inside a cluster? There's no magic — it's DNS. Kubernetes has its own built-in DNS system, and understanding it is the key to debugging those "connection refused" mysteries that will inevitably haunt you.

How Kubernetes DNS Works

Every Kubernetes cluster runs a DNS server called CoreDNS that watches for new Services and creates DNS records for them. It's like an automatic phone book that updates itself.

Pod A wants to reach "backend"
        │
        ▼
┌───────────────┐
│    CoreDNS    │  ← Resolves "backend" to 10.96.45.123
└───────────────┘
        │
        ▼
Pod A connects to 10.96.45.123 (Service IP)
        │
        ▼
Service routes to backend pods

DNS Record Types

Kubernetes creates several types of DNS records. Here's the lineup:

A/AAAA Records (Services)

The bread and butter — maps service names to ClusterIPs:

<service-name>.<namespace>.svc.cluster.local → Service ClusterIP

Example:

backend.default.svc.cluster.local → 10.96.45.123

SRV Records (Ports)

_<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local

Contains port number and target hostname.

Pod DNS Records

<pod-ip-dashed>.<namespace>.pod.cluster.local

Example (pod IP 10.244.1.5):

10-244-1-5.default.pod.cluster.local → 10.244.1.5
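The dash-for-dot transformation is purely mechanical, so you can derive a pod's DNS name yourself. A minimal shell sketch (the IP and namespace are just examples):

```shell
# Build a pod's cluster DNS name from its IP: dots become dashes.
pod_ip="10.244.1.5"
namespace="default"
pod_dns="${pod_ip//./-}.${namespace}.pod.cluster.local"
echo "$pod_dns"   # 10-244-1-5.default.pod.cluster.local
```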

Test DNS Resolution

Let's actually see this in action. Spin up a debug pod:

kubectl run debug --rm -it --image=busybox -- /bin/sh

Inside the pod:

# Full DNS name
nslookup kubernetes.default.svc.cluster.local

# Short name (same namespace)
nslookup kubernetes

# Check DNS server
cat /etc/resolv.conf

Output of /etc/resolv.conf — this is where the magic lives:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

That search line is why you can just type backend instead of the full backend.default.svc.cluster.local. Kubernetes tries all those suffixes for you. How cool is that?

DNS Name Resolution Rules

Here's the cheat sheet for when to use what.

Same Namespace

Just use the service name. Keep it simple:

curl http://backend
curl http://backend:8080

Different Namespace

Include the namespace:

curl http://backend.production
curl http://backend.production.svc.cluster.local

Fully Qualified Domain Name (FQDN)

curl http://backend.production.svc.cluster.local

The full address. Always works, regardless of which namespace you're in. Like using someone's full mailing address instead of just their first name.

The Search Path

"But wait, how does Kubernetes know what I mean when I just say 'backend'?"

Great question! Remember that search line in /etc/resolv.conf?

search default.svc.cluster.local svc.cluster.local cluster.local

When you resolve backend, the resolver tries each of these in order:

  1. backend.default.svc.cluster.local ← usually finds it here
  2. backend.svc.cluster.local
  3. backend.cluster.local
  4. backend (falls through to external DNS)

It's like calling someone — first it tries the most likely option and works its way out.
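You can simulate the expansion with a short loop — a sketch only, since the real work happens inside the resolver using /etc/resolv.conf:

```shell
# Simulate the resolver's search-path expansion for a short name.
expand_name() {
  name="$1"
  for suffix in default.svc.cluster.local svc.cluster.local cluster.local; do
    echo "${name}.${suffix}"
  done
  echo "${name}"  # the literal name is tried last
}

expand_name backend
```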

ndots Setting (The Performance Trap)

options ndots:5

If a name has fewer than 5 dots, the resolver walks the entire search path before trying the name as-is. A name like backend (0 dots) uses the search path — that's what makes short names work. But a name like api.external.com (2 dots) also goes through the search path first, generating several useless cluster-internal lookups before the external one succeeds.
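Counting the dots tells you whether ndots:5 will drag a name through the search path. A quick sketch:

```shell
# Count dots to predict ndots:5 behavior: fewer than 5 dots means
# the search path is tried before the name itself.
dots_in() {
  d="${1//[^.]/}"   # strip everything that is not a dot
  echo "${#d}"
}

dots_in api.external.com   # 2 -> search path first
dots_in backend            # 0 -> search path first
```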

To skip the search path and go directly to external DNS, append a trailing dot:

curl http://api.external.com.

That trailing dot is a pro tip that will make your external DNS lookups way faster.

Service Discovery Patterns

There are two ways services can discover each other. One is old-school, one is the right way.

Environment Variables (The Old Way)

Kubernetes injects service information as environment variables:

kubectl exec <pod-name> -- env | grep -i backend
BACKEND_SERVICE_HOST=10.96.45.123
BACKEND_SERVICE_PORT=80
BACKEND_PORT=tcp://10.96.45.123:80
BACKEND_PORT_80_TCP=tcp://10.96.45.123:80
BACKEND_PORT_80_TCP_PROTO=tcp
BACKEND_PORT_80_TCP_PORT=80
BACKEND_PORT_80_TCP_ADDR=10.96.45.123

Big limitation: Environment variables are set when the pod starts. If the service is created after the pod, the variables won't exist. Oops.

DNS (The Right Way)

DNS is dynamic — it always returns the current service IP:

import requests
response = requests.get("http://backend:8080/api")

DNS is the preferred method for service discovery. Always use DNS. Always.

Headless Services

"What if I need to know the IP addresses of individual pods, not just the service?"

That's what headless services are for. Normal services return the ClusterIP. Headless services return the pod IPs directly.

Create a headless service (clusterIP: None is the magic):

apiVersion: v1
kind: Service
metadata:
  name: backend-headless
spec:
  clusterIP: None
  selector:
    app: backend
  ports:
  - port: 80

DNS lookup returns all pod IPs:

nslookup backend-headless
Name:    backend-headless.default.svc.cluster.local
Address: 10.244.1.5
Address: 10.244.2.8
Address: 10.244.3.12

Use cases:

  • StatefulSets (each pod needs a unique identity — which we'll cover next!)
  • Client-side load balancing (you pick which pod to talk to)
  • Database clusters (need to know exactly which replica to connect to)

StatefulSet DNS

StatefulSets with headless services get predictable DNS names. This is huge for databases:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless  # Links to headless service
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres
        ports:
        - containerPort: 5432

Each pod gets a stable DNS name:

db-0.db-headless.default.svc.cluster.local
db-1.db-headless.default.svc.cluster.local
db-2.db-headless.default.svc.cluster.local

Connect to a specific replica:

psql -h db-0.db-headless

You can talk to any specific pod by name. No guessing, no randomness. Beautiful.

ExternalName Services

"Can I use Kubernetes DNS for external services too?"

Yep! Map a service name to an external DNS name:

apiVersion: v1
kind: Service
metadata:
  name: external-api
spec:
  type: ExternalName
  externalName: api.external-provider.com

Pods can use external-api as the hostname:

curl http://external-api/endpoint

Kubernetes DNS returns a CNAME record pointing to api.external-provider.com. Your app doesn't need to know it's talking to something outside the cluster. Sneaky and clean.

CoreDNS Configuration

Okay, for the curious (or the debugging-at-3am crowd), here's what's under the hood.

CoreDNS runs in the kube-system namespace:

kubectl get pods -n kube-system -l k8s-app=kube-dns

View the configuration:

kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

Custom DNS Entries

Add custom DNS records by editing the ConfigMap:

data:
  Corefile: |
    .:53 {
        # ... existing config ...
    }
    custom.local:53 {
        errors
        cache 30
        forward . 10.0.0.53
    }

Stub Domains

Forward specific domains to custom DNS servers:

data:
  Corefile: |
    .:53 {
        # ... existing config ...
        forward . /etc/resolv.conf {
           except mycompany.local
        }
    }
    mycompany.local:53 {
        forward . 10.0.0.53
    }

DNS Troubleshooting

DNS issues are responsible for approximately 147% of Kubernetes debugging sessions (okay, maybe not, but it sure feels like it).

Check if DNS is Working

kubectl run debug --rm -it --image=busybox -- nslookup kubernetes

Expected output:

Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      kubernetes.default.svc.cluster.local
Address:   10.96.0.1

DNS Not Resolving

  1. Check CoreDNS is running:

    kubectl get pods -n kube-system -l k8s-app=kube-dns
    
  2. Check CoreDNS logs:

    kubectl logs -n kube-system -l k8s-app=kube-dns
    
  3. Verify service exists:

    kubectl get svc <service-name>
    
  4. Check endpoints:

    kubectl get endpoints <service-name>
    

DNS is Slow

"Everything works, but DNS lookups are taking forever."

That ndots:5 setting we mentioned earlier is probably the culprit. For external domains, use FQDN with trailing dot:

# Slow (tries search path first)
curl http://api.external.com

# Fast (skips search path)
curl http://api.external.com.

Or set custom DNS config on the pod:

spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"

Pod DNS Policy

Control how pods resolve DNS:

spec:
  dnsPolicy: ClusterFirst  # Default: use cluster DNS
The available policies:

  • ClusterFirst: use cluster DNS, fall back to node DNS
  • Default: use the node's DNS settings
  • ClusterFirstWithHostNet: for pods with hostNetwork: true
  • None: use custom dnsConfig only
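With dnsPolicy: None, you supply the entire DNS configuration yourself via dnsConfig. A sketch, with an assumed internal DNS server and search domain:

```yaml
spec:
  dnsPolicy: None        # ignore cluster DNS defaults entirely
  dnsConfig:
    nameservers:
    - 10.0.0.53          # assumed internal DNS server
    searches:
    - mycompany.local    # assumed search domain
    options:
    - name: ndots
      value: "2"
```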

Debugging DNS with dig

For more detailed DNS debugging, use a pod with dig:

kubectl run debug --rm -it --image=tutum/dnsutils -- /bin/bash
# Query A record
dig backend.default.svc.cluster.local

# Query SRV record
dig SRV _http._tcp.backend.default.svc.cluster.local

# Trace resolution
dig +trace backend.default.svc.cluster.local

DNS Caching

CoreDNS caches responses (default 30 seconds). This means:

  • DNS changes take up to 30 seconds to propagate
  • Failed lookups are also cached (which can be annoying)

View cache TTL in Corefile:

cache 30

⚠️ Applications may also cache DNS on their own. Java, for example, can cache successful lookups forever depending on JVM settings. Yes, forever. If your Java app can't find a service that clearly exists, check its DNS cache settings. We've all been there.
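If the JVM is the culprit, its DNS cache TTL can be capped with the networkaddress.cache.ttl security property. A sketch (the property name is real; the 30-second value is just an example):

```
# In the JVM's java.security file:
networkaddress.cache.ttl=30
```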

Practical Example: Multi-Tier App

Let's see how DNS ties a real application together:

# Frontend deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: myapp-frontend:v1
        env:
        - name: API_URL
          value: "http://backend:8080"  # DNS-based discovery
---
# Backend deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: myapp-backend:v1
        env:
        - name: DATABASE_HOST
          value: "postgres.database"  # Service in different namespace
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: myapp
spec:
  selector:
    app: backend
  ports:
  - port: 8080

Frontend finds backend via http://backend:8080 (same namespace, just the name). Backend finds database via postgres.database (service in database namespace, need to include the namespace). All DNS. No hardcoded IPs anywhere.

What's Next?

Awesome work! You now understand the phone book of Kubernetes — how services find each other, how DNS resolution works, and how to debug it when things go sideways.

But what about applications that need stable identities and persistent storage — like databases? Just using a Deployment won't cut it. In the next tutorial, we'll dive into StatefulSets — the right way to run stateful applications in Kubernetes. Let's go!