Structuring an Ansible Repo for Personal Infrastructure

This post is a follow-up to my previous post about taking ownership of some of the key internet services.

When I was learning Ansible, one of the things I struggled with was finding a reasonable way to organize the repo I use to manage my infrastructure. I couldn't find a layout that was readable, made it easy to manage dependencies across roles, and was easy to deploy.

Initial Approach

My first attempt was to create a playbook for each service. The playbook would set up the service on a set of hosts, which were also specified in the playbook.

This put a lot of responsibility on each playbook: after running it, the service was expected to be up and running with all the setup done.

What ended up happening was that the playbook did a lot more than just set up a particular application. It had to modify several things on the host to get the service fully functional end-to-end.

An example makes it clear what I mean. To deploy a Ghost blog without making any assumptions about the host, we need to:

  1. Set up Docker on the host.
  2. Prepare the container's configuration: create the folder structure and the env file.
  3. Create the Docker container.
  4. Install certbot on the host.
  5. Create the certificate using certbot.
  6. Install nginx on the host.
  7. Configure an nginx site that reverse proxies to the Ghost container.

Out of the 7 steps I listed, only 2 actually deal with the Ghost blog service; the rest handle the plumbing that makes the site accessible or install dependencies.
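
To make the coupling concrete, here is a rough sketch of what such a single-service playbook could look like. It is illustrative only, not my original playbook: it assumes a Debian/Ubuntu host and the community.docker collection, and the blog_domain variable, the /srv/ghost paths, the ghost-site.conf.j2 template and the admin@example.com address are hypothetical placeholders. Only the two tasks in the middle are about Ghost; everything else is plumbing.

  # ghost.yml: hypothetical sketch of the one-playbook-per-service approach
  - hosts: host-1.example.com
    become: yes
    vars:
      blog_domain: blog.example.com   # placeholder domain
    tasks:
      # Plumbing: Docker, plus the Python SDK the docker modules need
      - name: Install Docker
        ansible.builtin.apt:
          name: [docker.io, python3-docker]
          state: present
          update_cache: yes

      # Ghost-specific: configuration structure and container
      - name: Create the Ghost content directory
        ansible.builtin.file:
          path: /srv/ghost/content
          state: directory
      - name: Run the Ghost container
        community.docker.docker_container:
          name: ghost
          image: ghost:5
          restart_policy: unless-stopped
          env:
            url: "https://{{ blog_domain }}"
          volumes:
            - /srv/ghost/content:/var/lib/ghost/content
          published_ports:
            - "127.0.0.1:2368:2368"

      # Plumbing: TLS (standalone mode, since nginx is not on port 80 yet)
      - name: Install certbot
        ansible.builtin.apt:
          name: certbot
          state: present
      - name: Obtain the certificate
        ansible.builtin.command: >
          certbot certonly --standalone --non-interactive --agree-tos
          -m admin@example.com -d {{ blog_domain }}
        args:
          creates: "/etc/letsencrypt/live/{{ blog_domain }}/fullchain.pem"

      # Plumbing: reverse proxy
      - name: Install nginx
        ansible.builtin.apt:
          name: nginx
          state: present
      - name: Configure the nginx site for Ghost
        ansible.builtin.template:
          src: ghost-site.conf.j2
          dest: /etc/nginx/conf.d/ghost.conf
        notify: Reload nginx

    handlers:
      - name: Reload nginx
        ansible.builtin.service:
          name: nginx
          state: reloaded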

This creates a lot of coupling. The playbook needs extensive knowledge about the environment it runs in to do things properly. Imagine what happens when you have 2, 3 or N services that use Docker, TLS and a reverse proxy. The playbooks now need to handle different initial states and react accordingly. They also need to do everything in a compatible way, which couples otherwise unrelated services whose only things in common are that they run on the same host and share some infrastructure.

I ran into all these flaws as soon as I tried to add a second service to the repo, so I decided to ditch this approach and think about a better way to structure things.

Decoupling: Abstractions & Single Responsibility Principle in Ansible

I discussed what I was facing with a friend who has a lot more experience with Ansible than I do. What he said pushed my thinking in the right direction.

He told me not to treat the service as the cohesive end-to-end unit, but rather the host itself. With that change of perspective, things became a lot clearer.

With an extra layer of abstraction, the service playbook only needs to be cohesive within itself. That is, it can make assumptions about its environment, do whatever it needs to set up the service, and then make sure it fulfills a set of post-conditions. It is up to the consumer to guarantee that the pre-conditions are met before the service playbook runs, and then to make something useful out of the new state.

This is very similar to how I approach software engineering: make each component do something specific and clearly defined, ask for anything else it requires as a dependency, and leave it to higher-level components to compose several of them into something useful.
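
As a sketch of what pre- and post-conditions can look like in practice, a component's task file can check its inputs and dependencies up front and verify its result at the end. This is a hypothetical excerpt, not the code of my actual roles: the port key of ghost_blog and the use of docker info as a readiness check are assumptions made for the example.

  # Hypothetical excerpt from a service component's tasks/main.yml

  # Pre-condition: the consumer must pass in the service configuration
  - name: Check required input
    ansible.builtin.assert:
      that:
        - ghost_blog is defined
        - ghost_blog.port is defined
      fail_msg: "ghost_blog must be defined and include a port"

  # Pre-condition: Docker must already be set up by another component
  - name: Check that the Docker daemon is reachable
    ansible.builtin.command: docker info
    changed_when: false

  # ... the tasks that actually set up the service go here ...

  # Post-condition: the service accepts connections on its local port
  - name: Wait for the service to listen
    ansible.builtin.wait_for:
      host: 127.0.0.1
      port: "{{ ghost_blog.port }}"
      timeout: 60

If a pre-condition fails, the run stops with a clear message instead of half-configuring the host, which is exactly the contract the consumer is expected to honor.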

Many problems become simpler when you add a layer of abstraction and reduce the complexity of each of the smaller parts.

The example given above can actually be divided into 5 components:

  • Docker setup: this component guarantees that Docker is properly set up on the host.
  • Certificate management component: this one makes sure that certbot is installed and that the certificates are created.
  • Ghost Blog: this component is reduced to creating the required configuration structure and then launching the Docker container.
  • Nginx component: makes sure nginx is installed and adds all the required configuration.
  • Host: a higher-level component that uses each of the previous components to build something useful out of the smaller parts.
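
To illustrate how small the Ghost Blog component becomes, here is a hypothetical sketch of its task file. It assumes the community.docker collection is installed and the Docker SDK for Python is already present on the host, and that ghost_blog is a dictionary with name, data_dir and port keys alongside a ghost.env.j2 template in the role; those keys and the template are invented for this example.

  # roles/ansible-ghost/tasks/main.yml (hypothetical sketch)
  - name: Create the folder structure for this blog
    ansible.builtin.file:
      path: "{{ ghost_blog.data_dir }}/content"
      state: directory
      mode: "0755"

  - name: Write the container env file
    ansible.builtin.template:
      src: ghost.env.j2
      dest: "{{ ghost_blog.data_dir }}/ghost.env"
      mode: "0600"

  - name: Run the Ghost container
    community.docker.docker_container:
      name: "{{ ghost_blog.name }}"
      image: ghost:5
      restart_policy: unless-stopped
      # env_file is read on the target host, where the previous task wrote it
      env_file: "{{ ghost_blog.data_dir }}/ghost.env"
      volumes:
        - "{{ ghost_blog.data_dir }}/content:/var/lib/ghost/content"
      # Only bind locally; the nginx component handles public traffic
      published_ports:
        - "127.0.0.1:{{ ghost_blog.port }}:2368"

Nothing here installs Docker, certbot or nginx; those are the responsibility of the other components.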

With the setup split into these components, when I add a second host for a different service that runs in Docker, requires TLS and exposes an HTTP interface, I can reuse every low-level component with a different configuration; only the Ghost Blog component needs to be swapped for one for the new service.

Another advantage of this pattern is that each component is much easier to maintain: it has a smaller footprint and a clearer purpose.

I have been talking about service playbooks, but it would be wrong to implement these components as playbooks, because Ansible already has an abstraction for this: roles. At first I thought roles were meant for more complex things, but that is not true. The same thing happened to me as a software engineer: at some point I thought a class had to do more to merit its own definition, which is also not true.

This is nothing more than applying the Single Responsibility Principle from software engineering to infrastructure management with Ansible.
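
A role does not need to be big to be worth having. For instance, the docker-networks role used in the host play could plausibly be as small as the sketch below. Its real contents are not covered in this post, so the task and the docker_networks variable are hypothetical, and the community.docker collection is assumed to be installed.

  # roles/docker-networks/tasks/main.yml (hypothetical sketch)
  # Creates the Docker networks that containers on this host share.
  - name: Ensure the shared Docker networks exist
    community.docker.docker_network:
      name: "{{ item }}"
      state: present
    loop: "{{ docker_networks | default([]) }}"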

To apply composition at the host level, we use a play that targets the host and simply lists the roles it needs, in the appropriate order. A play for a host looks something like this:

  - hosts: host-1.example.com
    become: yes
    roles:
      # Roles run in order; each one handles a single concern
      - base
      - jnv.unattended-upgrades
      - geerlingguy.pip
      - geerlingguy.docker
      - geerlingguy.certbot
      - docker-networks
      # The same role applied twice with different variables deploys two blogs
      - role: ansible-ghost
        vars:
          ghost_blog: "{{ my_personal_blog_ghost }}"
      - role: ansible-ghost
        vars:
          ghost_blog: "{{ secondary_blog_ghost }}"
      - role: nginxinc.nginx
        vars:
          nginx_http_template_enable: true
          nginx_http_template: "{{ nginx_template_config }}"
      - manala.cron

Putting it all together

I decided to structure my Ansible repo around a single site.yml playbook. This file contains a play for each host under my control and describes it declaratively, as shown above.

With a single playbook, deploying my whole infrastructure is as simple as running one command: ansible-playbook -i inventory.yml site.yml.
Ansible also lets you restrict a run to specific hosts from the command line with the --limit option, so applying changes to a single host is just as easy.

Overall, my Ansible repository structure looks something like this:

personal-infrastructure
├── files
│   ├── host-name.example.com
│   │   └── ...
│   └── ...
├── host_vars
│   ├── host-name.example.com
│   │   ├── vars
│   │   └── vault
│   └── ...
├── inventory.yml
├── roles
│   ├── role-1
│   │   └── ...
│   ├── ...
│   └── role-n
└── site.yml

inventory.yml contains the set of hosts I manage through this repo, along with some extra options.
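
A minimal YAML inventory along these lines would do. The host names and the per-host options below are placeholders, not my actual inventory.

  # inventory.yml (hypothetical example)
  all:
    hosts:
      host-1.example.com:
        ansible_user: admin
        ansible_python_interpreter: /usr/bin/python3
      host-2.example.com:
        ansible_user: admin
        ansible_port: 2222   # non-default SSH port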

The host_vars/<hostname> directories store the variables for the roles applied to each host. This keeps site.yml free of configuration while keeping all the configuration for a host in one place, which makes it easier to manage: the ports in use, the nginx configuration for each site, etc.

Each of those directories holds two files: vars, which contains all the variables used by the roles, and vault, which is encrypted with ansible-vault and contains all the sensitive information the roles need for that host.

One neat trick is to assign the sensitive values in the vault file to variables suffixed with _vault, then define the real variable name in the vars file with <name>_vault as its value. This way, the vars file lists every variable the roles require, while the sensitive values stay encrypted.

Lastly, I use the files/<hostname> folders to hold the files the roles need to copy on a per-host basis, usually configuration files. If a file contains sensitive information, it can be encrypted with ansible-vault too, and Ansible decrypts it automatically when uploading it to the host.
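
Here is a hypothetical illustration of the convention, with invented variable names and values. Ansible loads every file in host_vars/<hostname>/ (files without an extension included), so the reference in vars resolves against the decrypted vault file at runtime.

  # host_vars/host-1.example.com/vault (shown decrypted; hypothetical values)
  # Encrypt in place with: ansible-vault encrypt host_vars/host-1.example.com/vault
  ghost_db_password_vault: "change-me"

  # host_vars/host-1.example.com/vars
  # Every variable the roles need is listed here; secrets only point at the vault.
  ghost_db_password: "{{ ghost_db_password_vault }}"
  my_personal_blog_ghost:
    name: personal-blog
    port: 2368
    db_password: "{{ ghost_db_password }}"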
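
For example, a role task along these lines would upload a per-host file; the app.conf name and the destination path are placeholders. Because the copy module's decrypt option defaults to yes, a vault-encrypted source file is decrypted transparently, provided the vault password is supplied to ansible-playbook (for example with --ask-vault-pass).

  # Hypothetical role task: copy a per-host file from files/<hostname>/
  - name: Copy the per-host configuration file
    ansible.builtin.copy:
      src: "{{ playbook_dir }}/files/{{ inventory_hostname }}/app.conf"
      dest: /etc/app/app.conf
      owner: root
      group: root
      mode: "0600"   # keep decrypted secrets readable by root only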

Conclusion

With this post I wanted to share how I've structured my personal infrastructure Ansible repository to keep it simple to manage and maintain.
Most importantly, I wanted to explain the reasons and the path that led me to this layout.

I hope readers find it useful when structuring their own repos. In upcoming posts I will detail how I've structured the roles I wrote to deploy the dockerized services.

Own your services, own your infrastructure and most importantly own your data! Let's decentralize the internet one step at a time.