Debugging

Troubleshoot Terraform errors and unexpected behavior

In the previous tutorial, we built a whole testing toolkit to catch problems early. But let's be honest — things will still go wrong. Terraform will throw cryptic errors. Resources won't create. State will get weird. It happens to everyone.

Let's learn how to diagnose and fix common problems so you're not panicking at 2 AM.

Enable Logging

"How do I see what Terraform is actually doing under the hood?"

Basic Logging

# Enable detailed logs
export TF_LOG=DEBUG
terraform apply

# Or inline
TF_LOG=DEBUG terraform apply

Log Levels

From "tell me everything" to "only scream if something's on fire":

Level   Description
TRACE   Most verbose, includes all calls
DEBUG   Detailed debugging info
INFO    General operational entries
WARN    Warnings and errors
ERROR   Errors only

Log to File

export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform.log
terraform apply
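
Once the logs land in a file, a small filter makes it easier to find the lines that matter. Here's a minimal sketch, assuming each log line carries a bracketed level marker such as [WARN] or [ERROR]. The function name tf_log_problems is made up for this example.

```shell
# Hypothetical helper: print only WARN and ERROR lines from a Terraform log file.
# Assumes each log line carries a bracketed level marker such as [WARN] or [ERROR].
tf_log_problems() {
  log_file="${1:-./terraform.log}"
  grep -E '\[(WARN|ERROR)\]' "$log_file"
}
```

After a run, tf_log_problems ./terraform.log should show just the warnings and errors instead of thousands of DEBUG lines.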

Provider-Specific Logging

"What if I only want logs from the AWS provider, not all the noise?"

# Only provider plugin logs (applies to every provider in the run, AWS included)
export TF_LOG_PROVIDER=DEBUG
terraform apply
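
Terraform also reads TF_LOG_CORE for its own (non-provider) logs, so the two sides can be tuned separately. A sketch of one possible combination, with the log path chosen only for illustration:

```shell
# Keep Terraform core quiet, but get detailed logs from provider plugins.
# TF_LOG_PROVIDER applies to every provider in the run, not just AWS.
export TF_LOG_CORE=ERROR
export TF_LOG_PROVIDER=DEBUG
export TF_LOG_PATH=./terraform.log
```

With these set, run terraform apply as usual and read the provider chatter from the log file.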

Common Errors

Here's the hall of fame. You will hit these. Let's learn to fix them fast.

"Resource Already Exists"

Error: error creating S3 bucket: BucketAlreadyExists

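Problem: The resource exists in AWS but not in Terraform state. Classic "someone created it manually" situation. With S3 specifically, BucketAlreadyExists can also mean another AWS account already owns that globally unique name, in which case only a different name will work.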

Solutions:

  1. Import the existing resource:
terraform import aws_s3_bucket.data my-bucket-name
  2. Use a different name:
resource "aws_s3_bucket" "data" {
  bucket = "my-bucket-name-v2"  # Different name
}
  3. Delete the existing resource (if safe):
aws s3 rb s3://my-bucket-name

"Resource Not Found"

Error: error reading S3 bucket: NoSuchBucket

Problem: Terraform state says "this exists" but AWS says "never heard of it." The opposite of the previous error.

Solution: Remove from state:

terraform state rm aws_s3_bucket.data

"Cycle Detected"

Error: Cycle: aws_security_group.web, aws_security_group.db

Problem: Circular dependency. Resource A depends on B, and B depends on A. Terraform's brain short-circuits.

Solution: Break the cycle with separate rule resources:

# Instead of inline rules with circular references
resource "aws_security_group" "web" {
  name = "web"
  # No inline rules
}

resource "aws_security_group" "db" {
  name = "db"
  # No inline rules
}

# Add rules separately
resource "aws_security_group_rule" "web_to_db" {
  type                     = "egress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.web.id
  source_security_group_id = aws_security_group.db.id
}

"Provider Configuration Not Present"

Error: Provider configuration not present

Problem: You're referencing a provider that isn't configured. Terraform's like, "Who is this?"

Solution: Add the provider:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

"Invalid Reference"

Error: Reference to undeclared resource

Problem: Typo in a resource reference. We've all been there.

Solution: Check your spelling (dude, seriously):

# Wrong
subnet_id = aws_subnt.public.id  # Typo!

# Right
subnet_id = aws_subnet.public.id

"Inconsistent Dependency Lock File"

Error: Inconsistent dependency lock file

Problem: Your lock file doesn't match the required providers. Usually happens when someone updated providers without committing the lock file.

Solution:

terraform init -upgrade

Or delete and regenerate:

rm .terraform.lock.hcl
terraform init

State Problems

"My state is messed up. What do I do?!"

Deep breaths. Let's fix it.

State Lock Stuck

Error: Error acquiring the state lock
Lock ID: xxx

Problem: A previous Terraform run crashed or timed out, leaving the lock behind. Like someone leaving their stuff on a gym machine.

Solution:

# Force unlock (use carefully!)
terraform force-unlock LOCK_ID

State Corruption

Symptoms: Unexpected drift, missing resources, strange errors. Basically, Terraform is confused.

Solutions:

  1. Refresh state (the gentle approach):
terraform apply -refresh-only  # replaces the deprecated "terraform refresh"
  2. Pull fresh state:
terraform state pull > state.json
# Inspect state.json
  3. Last resort, recreate state (the nuclear option):
# Back up the current state first
terraform state pull > state.backup.json

# Remove all from state
terraform state list | xargs -n1 terraform state rm

# Re-import everything
terraform import aws_vpc.main vpc-123
# ... repeat for all resources
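
Re-importing by hand gets tedious fast. One way to make it less error-prone is to keep the address and ID pairs in a text file and generate the commands from it. This is only a sketch: the function name print_import_commands and the mapping file format (one "address id" pair per line) are assumptions made for this example, not something Terraform provides.

```shell
# Hypothetical helper: turn "address id" pairs into terraform import commands.
# It only prints the commands so they can be reviewed before anything runs.
print_import_commands() {
  mapping_file="$1"
  while read -r address id || [ -n "$address" ]; do
    # Skip blank lines and comment lines.
    case "$address" in ''|'#'*) continue ;; esac
    printf "terraform import '%s' '%s'\n" "$address" "$id"
  done < "$mapping_file"
}
```

Review the printed commands first. If they look right, pipe them to sh to run them one by one.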

State Drift

"Terraform plan is showing changes I didn't make!"

Someone (or something) changed resources outside of Terraform. It happens.

Solutions:

  1. Accept the change (update config):
resource "aws_instance" "web" {
  # ... existing arguments ...

  tags = {
    SomeTag = "Added outside Terraform"
  }
}
  2. Revert the change (apply Terraform config):
terraform apply
  3. Ignore the attribute:
# Goes inside the resource block
lifecycle {
  ignore_changes = [tags["SomeTag"]]
}

Debugging Techniques

"Okay, but how do I actually figure out what's going on?"

Here's your debugging toolkit.

Targeted Apply

Test one resource at a time instead of the whole enchilada:

terraform apply -target=aws_vpc.main
terraform apply -target=aws_subnet.public

Plan Output

Save and inspect plan:

terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Pretty print
cat plan.json | jq '.resource_changes[] | {address, change: .change.actions}'
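
The same JSON is handy for spotting the scary part of a plan: anything that gets deleted. Here's a sketch that lists only those addresses, assuming jq is installed and plan.json was produced by terraform show -json as above. The function name plan_deletions is made up for this example.

```shell
# Hypothetical helper: list addresses of resources the saved plan would delete.
# Plain deletes and replacements both include a "delete" action.
plan_deletions() {
  plan_json="${1:-plan.json}"
  jq -r '.resource_changes[]
         | select(any(.change.actions[]; . == "delete"))
         | .address' "$plan_json"
}
```

If plan_deletions prints nothing, the plan only creates or updates resources.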

State Inspection

# List all resources
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Export full state
terraform state pull > state.json

Console

Interactive expression testing — great for "what does this value look like?" moments:

terraform console

> var.environment
"dev"

> aws_instance.web.public_ip
"54.123.45.67"

> [for s in aws_subnet.public : s.id]
["subnet-123", "subnet-456"]

> length(var.subnets)
3

Graph

Visualize dependencies — super helpful when you're trying to understand the cycle error from earlier:

terraform graph | dot -Tsvg > graph.svg

Plan Exit Code

# Exit code 0 = no changes, 1 = error, 2 = changes present
terraform plan -detailed-exitcode
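
With -detailed-exitcode, terraform plan exits with 0 when there are no changes, 1 on error, and 2 when changes are present, which makes it easy to script around. A small sketch of how a wrapper might interpret that status. The function name describe_plan_exit is hypothetical.

```shell
# Hypothetical helper: translate the exit status of
# `terraform plan -detailed-exitcode` into a short message.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "error" ;;
    2) echo "changes present" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage (not run here):
#   terraform plan -detailed-exitcode
#   describe_plan_exit "$?"
```

Because 2 means "changes present" rather than failure, scripts that stop on any non-zero status need to handle it explicitly.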

Provider-Specific Debugging

AWS

"How do I debug AWS-specific issues?"

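# Make SDK-based tools read the shared config file (~/.aws/config)
export AWS_SDK_LOAD_CONFIG=1
# Provider request logs come from Terraform's own logging
export TF_LOG=DEBUG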

# Check credentials
aws sts get-caller-identity

Common AWS Errors

"Access Denied"

# Check IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/terraform \
  --action-names ec2:CreateVpc \
  --resource-arns "*"

"InvalidParameterValue"

# Usually wrong region or invalid values
TF_LOG=DEBUG terraform apply 2>&1 | grep -i "request"

Testing Changes Safely

"I'm scared to apply this. How do I not break production?"

Plan Before Apply (Always)

This isn't optional. This is survival.

terraform plan
# Review carefully!
terraform apply

Use Workspaces for Testing

terraform workspace new test
# Make changes
terraform apply
# If good, switch back
terraform workspace select prod
terraform apply

Dry Run with -refresh-only

# See what changed outside Terraform without touching state or infrastructure
terraform plan -refresh-only

Recovering from Mistakes

"I just destroyed something I shouldn't have. Help!"

Don't panic. Let's fix it.

Wrong Resource Destroyed

If you accidentally destroyed a resource:

  1. Check if it can be recovered (S3 versioning, RDS snapshots)
  2. If it still appears in state, remove it: terraform state rm aws_thing.name
  3. Re-create with apply

Applied Wrong Configuration

# Revert to previous state
# (if you have backups or S3 versioning on state bucket)

# Or fix config and reapply
terraform apply

Force Replace

# Force recreation of specific resource
terraform apply -replace=aws_instance.web

Debugging Checklist

When something goes wrong, follow this recipe:

  1. Read the error message: it usually tells you exactly what's wrong (seriously, read it)
  2. Enable DEBUG logging: TF_LOG=DEBUG
  3. Check state: terraform state list, terraform state show
  4. Validate config: terraform validate
  5. Try targeted apply: terraform apply -target=resource
  6. Check provider docs: arguments, requirements, limitations
  7. Check the cloud console: the resource might exist or have issues
  8. Google the error: someone's probably seen it before (you're never the first)

Preventing Problems

The best debugging session is the one you never have.

Version Pinning

Pin everything so updates don't surprise you:

terraform {
  required_version = "~> 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

Lock File

Commit .terraform.lock.hcl to version control. Seriously, do it.

State Backups

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    
    # Enable versioning on the bucket!
  }
}

Review Plans

"Can I just auto-approve everything?"

In CI, sure. For manual runs? Read the plan.

# Always review before apply
terraform plan

# Skip the interactive approval prompt
terraform apply -auto-approve  # Only in automated pipelines

Use Modules

Tested, reusable modules reduce bugs. Don't reinvent the wheel:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0"  # Pinned!
}

Example Debug Session

Let's walk through a real debugging scenario, step by step:

# Something went wrong
$ terraform apply
Error: error creating EC2 Instance: InvalidAMIID.NotFound

# Enable logging
$ export TF_LOG=DEBUG
$ terraform apply 2>&1 | tee terraform.log

# Search for the request
$ grep "ImageId" terraform.log
... "ImageId": "ami-12345678" ...

# Check if AMI exists
$ aws ec2 describe-images --image-ids ami-12345678
No images found

# AMI doesn't exist in this region
# Fix: use data source to find correct AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-*-22.04-amd64-server-*"]
  }
}
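
The "search the log for the ID" step in the session above can be wrapped in a tiny helper. This is a sketch under the assumption that AMI IDs show up in the log as ami- followed by hex digits. The function name ami_ids_in_log is invented for this example.

```shell
# Hypothetical helper: list the distinct AMI IDs mentioned in a Terraform debug log.
# Matches anything shaped like ami-<hex digits>, wherever it appears on a line.
ami_ids_in_log() {
  log_file="${1:-terraform.log}"
  grep -oE 'ami-[0-9a-f]+' "$log_file" | sort -u
}
```

Each ID it prints can then be checked with aws ec2 describe-images in the region you're deploying to.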

What's Next?

You're now equipped to tackle the weirdest Terraform errors. You learned:

  • How to enable and use Terraform logs
  • Common errors and their fixes
  • State troubleshooting
  • Debugging techniques (console, graph, targeted apply)
  • Provider-specific debugging
  • Recovering from mistakes
  • Prevention best practices

Remember: every Terraform expert has Googled "terraform error" at 2 AM. You're in good company.

Ready for the final boss? Let's learn how to automate everything with CI/CD for production deployments. Let's go!