The Bucket That Must Exist Before Everything Else

How I created a dedicated S3 bucket to hold my Terraform remote backend, and every small lesson I learned along the way.

Introduction

When you start working with Terraform, one of the first real puzzles you hit is a classic chicken and egg situation.

Terraform needs a safe place to store its state file. The state file is a record of everything Terraform has created for you, like a memory of your infrastructure. The most common safe place to keep this file is an S3 bucket on AWS. S3 stands for Simple Storage Service, and a bucket is just a container where you store files in the cloud.

Here is the puzzle. Terraform wants to store its state in an S3 bucket, but that bucket does not exist yet. So how do you create the very bucket that Terraform itself needs in order to remember what it created?

The answer is a small, separate Terraform setup called a bootstrap. You run it once, it creates the bucket, and then you never touch it again. In this post I will walk you through exactly how I built mine for my project, and I will explain every piece of jargon the first time it shows up. This is written for developers who are new to AWS, so nothing here assumes prior cloud knowledge.

What a Bootstrap Actually Is

A bootstrap is a tiny Terraform project whose only job is to create the foundation that the main project depends on. In our case the foundation is a single S3 bucket that will hold the remote state.

A backend is simply where Terraform keeps its state. A local backend keeps the state on your own computer. A remote backend keeps the state in the cloud, which is much safer and lets a whole team share it. We want a remote backend, and to have a remote backend we first need the bucket to hold it.

The golden rule of a bootstrap is this. Run it once, before anything else, then leave it alone forever.

The Folder Structure

I kept the bootstrap deliberately small. Here is the entire layout.

bootstrap/
├── main.tf        # everything in one file
├── variables.tf
└── README.md

That is it. Three files. A lot of beginners expect more files because larger Terraform projects have many. But a bootstrap is special, and the missing files are missing on purpose. Let me explain why, because the reasoning here taught me a lot.

Why there is no backend.tf

A file called backend.tf is where you would normally tell Terraform to use a remote backend. We do not have one here, and that is intentional. The bucket does not exist yet, so we cannot point Terraform at it. Instead the bootstrap uses local state, meaning the state file sits on our machine for now. This is the one time local state is the correct choice.
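For illustration only, here is a sketch of what that local state setup would look like if it were spelled out. Terraform falls back to the local backend whenever no backend block is present, so the bootstrap does not need this. If you did add it, it would go inside the existing terraform block in main.tf rather than in a second one.

```hcl
terraform {
  # Optional: this is what Terraform already does when no backend is configured.
  backend "local" {
    path = "terraform.tfstate" # default location, relative to the bootstrap folder
  }
}
```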

Why there is no outputs.tf

An output in Terraform is a value you ask Terraform to print or pass along after it runs, like the name of something it just created. You might think we need to output the bucket name so the main project can read it. We do not. The bucket name is already known from our variables, and the backend.tf file in the main project is static. Static means it is fixed text that cannot read or react to outputs. So an output here would serve no purpose.
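To make the idea of a static backend.tf concrete, here is a minimal sketch of what the main project's file might look like. The account ID 123456789012 and the key path are placeholders I made up for the example, not values from my real setup.

```hcl
# environments/dev/backend.tf (illustrative sketch)
terraform {
  backend "s3" {
    # Fixed text only: var.* and data.* references are not allowed in a backend block.
    bucket  = "todo-app-statefile-123456789012" # placeholder account ID
    key     = "dev/terraform.tfstate"           # assumed path for the dev environment
    region  = "ap-south-1"
    encrypt = true
  }
}
```

Because every value is typed in by hand, there is nothing for a bootstrap output to feed into.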

Why there is no providers.tf

A provider is the plugin that lets Terraform talk to a specific cloud, in this case AWS. In big projects people split the provider setup into its own providers.tf file for tidiness. Our bootstrap is so small that everything fits comfortably in main.tf, so there is no need to split it.

The Order of Blocks Inside main.tf

Inside main.tf the order of things matters for readability. I followed this order from top to bottom.

terraform block → provider block → data source → resources

The terraform block holds settings about Terraform itself, such as which version to use. The provider block configures AWS. A data source reads information that already exists rather than creating anything new. Resources are the things Terraform actually creates for you.

The Four Resources, and Nothing More

The bootstrap creates exactly four resources. Each one has a clear reason to exist.

The first is aws_s3_bucket. This is the bucket itself. I gave it the local name "this" in the code, which is a common Terraform convention when there is only one of something.

The second is aws_s3_bucket_versioning. Versioning means S3 keeps old copies of a file whenever it changes. This matters a lot for a state file. If the state ever gets corrupted, versioning lets you roll back to a healthy earlier copy. It is your safety net.

The third is aws_s3_bucket_public_access_block. This shuts the bucket off from the public internet. It has four separate arguments, and I set all four to true. The reason I did this explicitly, rather than trusting AWS to do it for me, leads to one of the biggest lessons in this whole project, which I will come back to shortly.

The fourth is aws_s3_bucket_server_side_encryption_configuration. Encryption scrambles the stored data so that it can only be read through authorized access. I chose AES256, which tells S3 to encrypt every object with AES-256 using keys that S3 manages for you (often called SSE-S3). Again I set this explicitly rather than relying on any default.

Here is the complete main.tf so you can see how all four resources fit together with the terraform block, the provider, and the data source.

# Terraform settings: CLI version and required providers
terraform {
  required_version = "~> 1.15.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}
 
# Configure the AWS Provider
provider "aws" {
  region = var.region
}
 
# AWS account id
data "aws_caller_identity" "current" {}
 
# Create s3 bucket
resource "aws_s3_bucket" "this" {
  bucket = "${var.project}-statefile-${data.aws_caller_identity.current.account_id}"
}
 
# Versioning bucket
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}
 
# Block public access
resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id
 
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
 
# Encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id
 
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

Notice the small detail inside each resource. The line bucket = aws_s3_bucket.this.id is how Terraform links resources together. It tells the versioning, public access, and encryption resources to attach themselves to the exact bucket we just created, rather than to some other bucket. This linking is how Terraform understands the order to build things in.

Reference image placeholder: a screenshot of the four resources listed in main.tf.

The Variables File

A variable is a named value you can reuse and change in one place. Here is the full variables.tf.

variable "region" {
  type        = string
  description = "AWS region where the statefile must exist"
  default     = "ap-south-1"
}
 
variable "project" {
  type        = string
  description = "Name of the project"
  default     = "todo-app"
}

The first variable is region. A region is a geographic area where AWS operates a group of data centers. Mine is a string with a default of ap-south-1, which is the Mumbai region. A string simply means text.

The second variable is project. This is the name of the project, with a default of todo-app. A nice improvement you can add later is validation, which is a rule Terraform checks before running. A validation rule could force the name to use lowercase letters and hyphens only, which keeps bucket names clean and valid. I left it simple here, but it is worth knowing the option exists.
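Here is a rough sketch of what that validation could look like. The regular expression and the error message are my own assumptions about what "lowercase letters and hyphens only" should mean. This block would replace the existing project variable rather than sit next to it, since Terraform does not allow two variables with the same name.

```hcl
variable "project" {
  type        = string
  description = "Name of the project"
  default     = "todo-app"

  validation {
    # One or more lowercase words joined by single hyphens, e.g. "todo-app".
    condition     = can(regex("^[a-z]+(-[a-z]+)*$", var.project))
    error_message = "The project name may contain only lowercase letters and hyphens."
  }
}
```

With this in place, a value such as "Todo_App" would be rejected at plan time, before Terraform tries to create a bucket with an invalid name.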

Notice what is missing. There is no account_id variable. Your account ID is the unique number that identifies your AWS account. Instead of typing it in by hand as a variable, where you could easily make a mistake, I read it automatically using a data source called data "aws_caller_identity". This returns the account that my current AWS credentials belong to, so the ID in the bucket name always matches the account Terraform is actually deploying into.

How the Bucket Gets Its Name

S3 bucket names must be unique across all of AWS, not just within your account. So I built the name out of pieces that are guaranteed to be unique together.

${var.project}-statefile-${data.aws_caller_identity.current.account_id}

This combines the project name, the word statefile, and the account ID. Because the account ID is unique to my account, the resulting name is very unlikely to clash with anyone else's.
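As a small sketch of how the pieces resolve, the same expression can be written as a local value. The name state_bucket_name and the account ID in the comment are hypothetical, and the block relies on the project variable and the aws_caller_identity data source from the bootstrap.

```hcl
locals {
  # With project = "todo-app" and a made-up account ID of 123456789012,
  # this evaluates to "todo-app-statefile-123456789012".
  state_bucket_name = "${var.project}-statefile-${data.aws_caller_identity.current.account_id}"
}
```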

One choice worth pointing out. I did not put the environment, such as dev or prod, into the name. One bucket serves all environments. This keeps things simple, and the different environments are separated inside the bucket by their file paths instead.

Version Constraints, and Why the Tiny Symbols Matter

In the terraform block I pinned the Terraform version like this.

required_version = "~> 1.15.0"

The little ~> symbol is called the pessimistic constraint operator. It limits which versions of the Terraform CLI are allowed to run this configuration: only the rightmost number you write is free to increase. The exact form you write changes the meaning in a way that is easy to miss.

Writing ~> 1.15.0 allows only patch releases, meaning small bug fix updates like 1.15.1 or 1.15.2, but not 1.16.

Writing ~> 1.15 would also allow minor versions, meaning larger feature updates like 1.16 or 1.17, but not 2.0.

These are very different. Patch only is cautious and safe. Minor allowed is more relaxed. The point is to pick one consciously rather than copying a symbol without understanding it.
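The two forms can be compared side by side. This is only a sketch that restates each constraint as an explicit range in the comments; the second line is commented out because a terraform block should carry one required_version.

```hcl
terraform {
  # Patch releases only: same as ">= 1.15.0, < 1.16.0"
  required_version = "~> 1.15.0"

  # Minor releases too: same as ">= 1.15.0, < 2.0.0"
  # required_version = "~> 1.15"
}
```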

A Small Rule About String Interpolation

Interpolation is the way Terraform inserts a variable value into a piece of text, using the ${...} syntax. There is a simple rule I learned to keep code clean.

Use the wrapped form "${var.region}" only when you are combining the variable with other text, like inside the bucket name above.

Use the plain form var.region when you are referring to a single variable on its own, with no surrounding text.

Wrapping a lone variable in ${...} for no reason just adds clutter, so I stopped doing it.
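A quick sketch of the rule in practice. The local names here are made up for the example, and the block assumes the region and project variables from variables.tf.

```hcl
locals {
  # Lone reference: use the plain form.
  region_plain = var.region

  # Works, but the wrapper adds nothing.
  region_cluttered = "${var.region}"

  # Variable combined with other text: interpolation is needed.
  bucket_prefix = "${var.project}-statefile"
}
```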

Problems I Hit, and What They Taught Me

This is the part I find most valuable, because every one of these came from getting something wrong first.

I tried to use a data source as the default value for a variable. This does not work. A default must be a static literal, meaning a fixed value typed directly in, not something that has to be looked up while Terraform runs.
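A sketch of the mistake and one way around it. The variable is commented out because Terraform rejects it, and the local name account_id is my own choice for the example. The local relies on the aws_caller_identity data source already declared in main.tf.

```hcl
# Rejected by Terraform: a variable default cannot reference other objects.
# variable "account_id" {
#   type    = string
#   default = data.aws_caller_identity.current.account_id
# }

# Allowed: a local value is evaluated at run time and can reference a data source.
locals {
  account_id = data.aws_caller_identity.current.account_id
}
```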

I learned the account_id must be treated as a string, not a number. Account IDs can begin with a zero, and numbers drop leading zeros, which would silently break the value. Storing it as text keeps it intact.

I confirmed that backend.tf is static and is read during the init phase, which is the very first setup step when you run Terraform. Because it is read so early, it cannot use variables or outputs at all. That is why the main project's backend.tf spells out the bucket name as fixed text, and why the bootstrap has no need for an outputs.tf.

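I learned that the required_providers block is also static and read during that same early init phase, so a provider's source and version cannot come from variables. The provider block itself is evaluated later and can use variables, which is why region = var.region works in main.tf.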

I learned never to rely on AWS defaults for security. AWS settings can change over time, and assuming a default protects you is risky. I now set encryption and public access blocking explicitly every single time, so the protection is written down and guaranteed.

And one more lesson that applies to everything, including advice from me. Always verify information. At one point I had a wrong belief about which Terraform version existed, and checking it directly set me straight. Trust, but confirm.

Git Rules

Git is the tool that tracks changes to your code. When using it, some files should be saved and shared, while others must never leave your machine because they contain secrets or local state. Here is how I split them.

# commit
*.tf
.terraform.lock.hcl
*.tfvars.example
 
# ignore
.terraform/
*.tfstate
*.tfstate.backup
*.tfvars

The files under commit are safe to share. The lock file pins the exact provider versions so everyone uses the same ones. The example file shows the shape of variable values without revealing real ones.

The files under ignore must stay private. The state files describe your live infrastructure and can hold sensitive details. The .tfvars files often contain real secret values. The .terraform folder is just local cache that does not belong in version control.

The Run Order

Finally, here is the exact order to run everything. The bootstrap goes first, the main project follows.

cd bootstrap → terraform init → terraform apply
cd environments/dev → terraform init → terraform apply

The command terraform init prepares the working folder and downloads what Terraform needs. The command terraform apply actually builds the resources. You run them in the bootstrap folder once to create the bucket, then move into your real environment and run them again to build the rest of your infrastructure, now safely backed by the remote state bucket you just made.

Reference image placeholder: a terminal screenshot showing terraform apply finishing successfully in the bootstrap folder.

Conclusion

The bootstrap is small, but it solves a real problem that confuses almost everyone at the start. You cannot store Terraform state in a bucket that does not exist, so you build that bucket once with a tiny standalone setup that uses local state, and then you never look back.

Along the way I learned that the missing files in a bootstrap are missing on purpose, that security should always be explicit rather than assumed, that tiny version symbols carry real meaning, and that even confident advice should be checked. None of these are hard once you see the reasoning, and together they make your foundation solid.

If you are new to AWS and Terraform, I hope walking through my journey makes your own first bootstrap far less mysterious than mine was.

Contact

If you have questions or want to share your own setup, feel free to reach out at khantanseer43@gmail.com.
