Terraform project structure

Recently I've been using Terragrunt, and I have thoughts on what it offers and whether it is useful. My usage has been in an existing project that follows the Gruntwork guidelines closely and has a paid subscription to the Gruntwork library. These opinions are my own, and they're based on my recent experience with Terragrunt as well as on managing small and medium infrastructure with Terraform for the last few years, both as a single developer and as part of a small team.

The main point of Terragrunt, as I understand it, is to keep you from repeating yourself in code. I am not a fan of copying and pasting big blocks of code, nor of having to change the same value in a few different places. So for me, keeping code DRY is a worthwhile endeavor.

Keeping modules DRY

Terragrunt works by using modules. I like Terraform modules. Even the Terraform documentation suggests that you don't have a single top level module for your entire infrastructure. A single module makes development more difficult, with more merge conflicts. It also makes deploying for testing purposes more difficult, because Terraform will keep trying to delete resources that aren't in your code (because someone else working in a different branch has made changes for some other reason). You can work around that by specifying the target you're interested in (the -target option), but that is error-prone and gets tiresome after a while.

In a previous project I worked on, we had a module for roughly each service. We had quite a lot of code that was copied from one module to another (for example, when creating a new RDS instance we also created the subnet group, the security group for the client, etc.). Over time we saw clearly what code was shared between the different modules. We created a library directory and started adding sub-modules there, and after a while we had a nice library of reusable sub-modules and things were good.
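As a rough sketch of that layout, a service module can call a shared sub-module from the library directory by relative path. The paths, the module name, the input names and the "endpoint" output below are all hypothetical, not the ones from that project.

```hcl
# services/orders/main.tf (hypothetical path)
# The sub-module lives in the same repository, so a change to the sub-module
# and to every module that calls it can land in a single PR.

variable "private_subnet_ids" {
  description = "Subnets the database is placed in (hypothetical input)"
  type        = list(string)
}

module "database" {
  # Relative path to the shared library directory (hypothetical location).
  source = "../../library/rds"

  # Hypothetical inputs; the real sub-module defines its own variables.
  name           = "orders"
  instance_class = "db.t3.micro"
  subnet_ids     = var.private_subnet_ids
}

# Assumes the sub-module exposes an output named "endpoint".
output "database_endpoint" {
  value = module.database.endpoint
}
```

Because the source is a local path, there is no version to pin: whatever is on the branch is what gets planned.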

Because we waited a bit before creating a new sub-module, they were pretty stable. When we did have a change to a sub-module that we wanted to deploy across the entire infrastructure, we would open a branch, work on all the needed changes there, test them in one of the testing environments and then open a PR that had all of the changes (the sub-module changes, the calling modules' changes, any fallout from those changes).

This process suited us nicely. The PR had the entire picture and we could really see if the change improved anything (for example, an output added to one module is easy to justify when you can see it being used in another). We did on occasion have conflicting changes and we did have to use targeted plan and apply, but as far as I can remember that happened less than once a quarter.

Terragrunt recommends splitting the repository in 2, one for sub-modules and one for actually deployed modules. Then you create terragrunt.hcl files that list the sub-modules needed with the Git ref used. This allows you to use the RDS database sub-module from today but the auto-scaling group from last year. I see little point in this.
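To make that concrete, here is a minimal sketch of what such a terragrunt.hcl can look like. The repository URL, the tag, the file path and the inputs are hypothetical, and the exact include syntax depends on the Terragrunt version in use.

```hcl
# live/dev/rds/terragrunt.hcl (hypothetical path in the "live" repository)

# Pull in the shared settings from a parent folder's Terragrunt file.
include "root" {
  path = find_in_parent_folders()
}

# The sub-module comes from the modules repository, pinned to a Git ref.
# The organization, repository name and tag are made up for illustration.
terraform {
  source = "git::git@github.com:example-org/infrastructure-modules.git//rds?ref=v0.3.0"
}

# Values passed to the sub-module's variables (hypothetical names).
inputs = {
  name           = "orders"
  instance_class = "db.t3.micro"
}
```

Every deployed module carries its own ref, which is what lets the database sub-module and the auto-scaling group sit on different versions.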

The change process goes as follows: 1 PR for the sub-modules repository and 1 for the live repository (or more, we haven't gotten around to discussing environments yet). Now I hear that the recommendation has changed. The new recommendation is that each sub-module lives in a separate repository. So that means more PRs for each change (that one change of adding an output and using it becomes less obvious and requires more work; I wouldn't call it a win). I wonder if there's any place that has 2 repositories, 1 for the code and 1 for the tests, where you change the code and, once it's merged, you go to the tests repo and update the tests there to use the new code to see if it passes?

Another outcome of this way of working that I keep seeing is that, because changes are not applied (or even planned) before the sub-module changes are merged, errors and issues are only found later, which triggers more PRs.

Environments, remote states and workspaces, oh my

Another way that Terragrunt keeps your code DRY is by generating the Terraform backend configuration, because you can't use variables there with Terraform. So you save less than 10 lines. Cool. Also, you won't accidentally (because you copied that code from another module) use the same state location for 2 modules and have them delete each other's resources. It happened to me more than once, but you see it clearly when you first run terraform plan, so it's very easy to catch.
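For reference, the Terragrunt side of this usually looks something like the sketch below, placed in the parent Terragrunt file that the per-module files include. The bucket, lock table and region are hypothetical placeholders.

```hcl
# Parent Terragrunt file in the live repository (hypothetical location).
# Terragrunt writes a backend.tf into each module's working directory.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "example-terraform-state" # hypothetical bucket name
    region         = "us-east-1"               # hypothetical region
    encrypt        = true
    dynamodb_table = "example-terraform-locks" # hypothetical lock table

    # The state key is derived from the including module's relative path,
    # so two modules cannot end up sharing one state file by copy-paste.
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}
```

The plain Terraform equivalent is a backend "s3" block of about the same size in each module, with the key written out by hand.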

Now, the folks at Gruntwork suggest you create a directory for each environment. From what I can see, that means you copy your terragrunt.hcl file to each directory and modify it slightly (I think you can see where I'm going with this). If your project currently has a different module for each environment, this is a win, no doubt about it. I've seen projects like that and they're really a pain to manage.

Before I ever heard about Terragrunt, I had this exact problem. I solved it using Terraform workspaces and a simple convention. Each environment would have its own workspace (let's say that the default workspace is the sandbox, but that's up to you). Each module would have a set of tfvars files, one for each environment. The workflow for deploying to the dev environment would look like this:

terraform workspace select dev
terraform plan -var-file=dev.tfvars -out=tfplan
# Review the changes.
terraform apply tfplan
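The tfvars side of the convention might look like the sketch below: the module declares its variables once, and each environment gets a small file with its own values. The variable names and values are hypothetical.

```hcl
# variables.tf: declared once per module (hypothetical variables)
variable "instance_class" {
  description = "Database instance size for this environment"
  type        = string
}

variable "instance_count" {
  description = "Number of application instances for this environment"
  type        = number
  default     = 1
}

# dev.tfvars: one small file per environment. It is shown here as comments
# because it is a separate file that holds only assignments:
#
#   instance_class = "db.t3.micro"
#   instance_count = 1
#
# prod.tfvars would assign the same names with production-sized values.
```

Only the values differ between environments; the module code is shared, which is the point of the convention.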

For making life a little easier I also added the following snippet to each module:

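locals {
  # Name of the directory this module lives in. In the root module
  # path.module is ".", so resolve it to an absolute path first.
  module = basename(abspath(path.module))
  # The default workspace is treated as the sandbox environment.
  env    = terraform.workspace == "default" ? "sandbox" : terraform.workspace
}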

Yes, this is copied code and, along with the backend configuration, it adds up to a bit over 10 lines of mostly duplicated code. However, when I compare it to the terragrunt.hcl files, this is peanuts. I checked a few modules in the codebase I'm working on, and we have terragrunt.hcl files that are hundreds of lines long and share all but a few lines.

I found that this convention is easy to document and easy for new developers to pick up. It uses existing tools, so you can use your existing knowledge, and you get all of the benefits of not adding another tool to your workflow.
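Here is a sketch of how the pieces of the convention can fit together in one module. The bucket name, state key, region, the vpc_id variable and the security group are hypothetical; the locals are repeated so the example stands on its own.

```hcl
# Hypothetical module directory: rds/
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "rds/terraform.tfstate"   # unique per module, written by hand
    region = "us-east-1"               # hypothetical region
    # With the S3 backend, non-default workspaces are stored under a
    # workspace prefix ("env:" by default), for example
    # env:/dev/rds/terraform.tfstate, so one block serves every environment.
  }
}

variable "vpc_id" {
  description = "VPC to create the security group in (hypothetical input)"
  type        = string
}

locals {
  module = basename(abspath(path.module))
  env    = terraform.workspace == "default" ? "sandbox" : terraform.workspace
}

# The locals keep names and tags distinct across environments.
resource "aws_security_group" "client" {
  name   = "${local.env}-${local.module}-client"
  vpc_id = var.vpc_id

  tags = {
    Environment = local.env
    Module      = local.module
  }
}
```

Selecting a different workspace changes both the state location and the resource names, without touching the code.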

Workflow

Terragrunt builds a directory for each module (and each environment, obviously), clones the Git repos you mentioned with the refs you specified and then mucks about with the Terraform commands and plan files to stitch everything together. Even on paper this doesn't look like a good idea, and it isn't one in practice: it makes debugging issues difficult.

It also suggests that you can have different versions of the sub-modules in use across different environments. That puts the emphasis on having the main branch match exactly what is in each environment, instead of on avoiding drift between the different environments.

Conclusions

This post is a critique of the Gruntwork recommended setup and workflow, and if you've read it all you can see that I think there are better and easier ways. You can compare Terragrunt to a badly managed Terraform project and find that it helps you. But when you compare it to one that uses the suggested convention, it makes things more difficult, doesn't deliver on the promise of keeping your code DRY and promotes bad habits.

I didn't plan on reviewing Terragrunt until I used it. Terragrunt makes life less enjoyable. It has a convoluted workflow locally (with those bloody git clones), it makes debugging issues difficult and the upside is just not there. I would recommend that anyone thinking about adopting Terragrunt first read the workspaces documentation and think hard about the code review, testing and development workflows.