Galaxy GitOps [Part 1]

In this 7-part mini-series about Galaxy, my general-purpose cloud server, I need to keep track of all the changes I make on the server, make it easy to change anything from anywhere, and be able to migrate easily. To do this, I have chosen to use GitOps.

The first question is: what is GitOps?

To put it simply, it's using version control (in this case Git) to manage an Infrastructure as Code (IaC) codebase and automate the deployment of that code. There are many different tools in this space, such as Kubernetes, Puppet and Chef, but here I have chosen Ansible. I picked Ansible as I have used it before, though not in a GitOps setting. I have minimal hands-on experience with it, but I understand most of the underlying concepts.

Ansible is a tool that automates the installation and configuration of applications on many different machines at once. Ansible itself is just a collection of modules which you run from the command line, but you can also join a bunch of tasks together into what is called a playbook. Ansible also requires an inventory: basically a file which contains the list of servers to connect to, and any parameters specific to a host or group. Groups are just collections of hosts that should all do the same thing.
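
For reference, a hand-written static inventory in YAML looks something like this (the hostname and values are made up for illustration):

# inventory.yaml - a hypothetical static inventory, for illustration
all:
  children:
    master:                        # a group
      hosts:
        cloud.example.com:         # a host in that group
          ansible_user: root       # a host-specific variable
      vars:
        webmaster_email: admin@example.com   # a group-wide variable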

Now, I would have been fine with writing an inventory file by hand, but once I wrote it out, the file looked too long for my liking, with the majority of it filled with DNS information. This is why I chose to use the dynamic inventory feature of Ansible: instead of giving Ansible the inventory in a YAML file, we have it run a script to get the inventory. I chose to write my script in Python, as it's a very flexible language. Now, my script is in no way the best script you will ever see; in fact it's probably one of the worst. There are many ways I could make it better, but what matters is that it works.

#!/usr/bin/env python3
# inventory/inventory.py

import os
import argparse
import config.dns
import config.secret

try:
    import json
except ImportError:
    import simplejson as json

class MyInventory(object):

    def __init__(self):
        self.inventory = {}
        self.read_cli_args()
        self.load_secrets()

        if self.args.list:
            # Ansible calls us with --list to fetch the full inventory
            self.inventory = self.get_inventory()
        elif self.args.host:
            # --host would return per-host vars, but all of mine live in _meta
            self.inventory = {'_meta': {'hostvars': {}}}
        else:
            self.inventory = {'_meta': {'hostvars': {}}}

        # default=... lets json serialize the DNSRecord objects below
        print(json.dumps(self.inventory, default=lambda d: d.__dict__))

    def load_secrets(self):

        KEYFILE = os.path.join(os.path.dirname(
            os.path.realpath(__file__)), "secret.key")
        SECRETSFILE = os.path.join(os.path.dirname(
            os.path.realpath(__file__)), "secret.json")

        # Load the encrypted secrets blob
        with open(SECRETSFILE) as s:
            secrets = json.load(s)

        key = None

        if os.environ.get('SECRETS_KEY') is not None:
            # the git hook passes the key in through the environment
            key = os.environ.get('SECRETS_KEY')
        elif os.path.exists(KEYFILE):
            # on my own machine the key lives next to the script
            with open(KEYFILE) as k:
                key = k.read()
        else:
            raise Exception(
                "Could not find a suitable key to decrypt secrets")

        self.secrets = config.secret.Secrets(secrets, key)

    def get_inventory(self):
        HOME_SERVER = '---'
        CLOUD_SERVER = '---'

        return {
            'master': {
                'hosts': [CLOUD_SERVER],
                'vars': {
                    'dns_nameservers': config.dns.DNSNAMESERVERS,
                    'dns_transfers': config.dns.DNSTRANSFERS,
                    'webmaster_email': '...'
                }
            },
            '_meta': {
                'hostvars': {
                    CLOUD_SERVER: {
                        'ansible_user': 'root',
                        'ansible_ssh_pass': self.secrets['cloud_ssh_pass'],
                        'domain': 'hexf.me',
                        'dns_records': config.dns.DNSRecord.make([
                            ('MX', 10, '...'),
                            ('A', 'other_record', HOME_SERVER),
                            ...

                            ('A', 'milkyway', CLOUD_SERVER),
                            ('A', '@', CLOUD_SERVER),
                            ...
                        ]),
                        'ldap_password': self.secrets['ldap_password']
                    }
                }
            }
        }

    def read_cli_args(self):
        parser = argparse.ArgumentParser()
        parser.add_argument('--list', action='store_true')
        parser.add_argument('--host', action='store')

        self.args = parser.parse_args()

if __name__ == '__main__':
    MyInventory()
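
Ansible runs a dynamic inventory script with --list to fetch the whole inventory, or --host for a single host's variables (empty here, since all my host vars live under _meta). You can test the script by hand (the hostname is a placeholder):

./inventory/inventory.py --list               # dump the full inventory as JSON
./inventory/inventory.py --host example-host  # per-host vars; empty, everything is in _meta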

The keen-eyed among you might have noticed that I have my own secrets engine. I am fully aware of Ansible Vault, but couldn't find a way to integrate these secrets with it, and so my secrets engine was born. It takes a JSON object where all the values are base64 strings, and decrypts them using AES-256. The key is stored in secret.key on my computer, but the git hook passes the secret in with an environment variable.
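
I won't publish the real config/secret.py, but a rough sketch of the idea, assuming AES-256 in CBC mode with the IV prepended to each ciphertext, PKCS7 padding, and the cryptography library, would look something like this (the details differ from my real implementation):

# config/secret.py - a minimal sketch, not the exact implementation.
# Assumes each value is base64(IV + AES-256-CBC ciphertext) and that the
# key file / environment variable holds the raw 32-byte AES key.
import base64

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class Secrets:
    def __init__(self, secrets, key):
        self._secrets = secrets
        self._key = key if isinstance(key, bytes) else key.encode()

    def __getitem__(self, name):
        blob = base64.b64decode(self._secrets[name])
        iv, ciphertext = blob[:16], blob[16:]          # IV is prepended
        decryptor = Cipher(algorithms.AES(self._key), modes.CBC(iv)).decryptor()
        padded = decryptor.update(ciphertext) + decryptor.finalize()
        unpadder = padding.PKCS7(128).unpadder()       # strip PKCS7 padding
        return (unpadder.update(padded) + unpadder.finalize()).decode()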

First comes the inventory, then the playbook. My playbook is broken up into many roles, one for each application (nginx, grav, cgit, bind, ldap, etc.), then united with magical tags. Playbooks support tagging tasks, which allows you to run only a portion of the playbook. This is useful when you want to update just the web or DNS portions of your servers without waiting for all the other tasks to complete. My main playbook, deploy.yaml, contains the following, instructing each role to run whenever its tag is present:

---
- hosts: master
  become: true
  tasks:
    - import_role:
        name: ldap
      tags: sso
    - import_role:
        name: nginx
      tags: web
    - import_role:
        name: bind
      tags: dns
    - import_role:
        name: git
      tags: git
    - import_role:
        name: grav
      tags: blog

Each of the roles is configured to install and configure its associated application (i.e. nginx installs and configures NGINX). These roles are a collection of YAML files and templates which together can install the application and generate its configuration. To demonstrate this, let's look at the certbot role I created to manage my SSL certificates (I get wildcard certificates from Let's Encrypt).

The directory tree of the certbot role (reconstructed from the files covered below) looks like this:
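
certbot
├── meta
│   └── main.yaml
├── tasks
│   ├── debian.yaml
│   └── main.yaml
└── templates
    └── credentials.ini.j2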

Here we see I have 3 different folders nested under the certbot role: meta, tasks and templates. The meta folder contains a single main.yaml file, which lists out the dependencies of certbot, in this case bind, the DNS server.

---
dependencies:
  - role: bind

Next up are the tasks, which are the actions we take to install and configure certbot. I have 2 files here: debian.yaml basically just installs certbot and all the plugins we need (with apt, snap sucks!), and main.yaml is where all the juicy configuration goodness comes out.
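
My exact debian.yaml isn't reproduced here, but it amounts to roughly this (python3-certbot-dns-rfc2136 is my assumption for the Debian package name of the RFC 2136 plugin):

---
# tasks/debian.yaml - a rough sketch of what mine does
- name: Install certbot and the RFC 2136 DNS plugin
  apt:
    name:
      - certbot
      - python3-certbot-dns-rfc2136
    state: present
    update_cache: true

And here is main.yaml itself: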


---
- name: Debian Based Systems
  import_tasks: debian.yaml
  when: ansible_facts['os_family']|lower == "debian"

- name: Slurp key file
  slurp:
    src: /etc/bind/letsencrypt_wildcard.key
  register: letsencrypt_key

- name: Write Credentials
  template:
    src: credentials.ini.j2
    dest: /etc/letsencrypt/credentials.ini
    owner: root
    group: root
    mode: '0640'

- name: Generate Certificates
  command:
    cmd: certbot certonly --dns-rfc2136 --dns-rfc2136-propagation-seconds 300 --dns-rfc2136-credentials /etc/letsencrypt/credentials.ini -n --agree-tos --email {{ webmaster_email }} -d {{ domain }} -d *.{{ domain }}
    creates: /etc/letsencrypt/live/{{ domain }}/fullchain.pem

- name: Add automatic renewal of certificates
  cron:
    name: Certbot Renew
    minute: '0'
    hour: '0'
    job: '/usr/bin/certbot renew --quiet'

These task files, as the name implies, are lists of tasks that should be run by the role. Here I have the certbot ones install and configure certbot, but if you can do it manually on a command line, Ansible can probably automate it.

Basically, we first install certbot by calling our debian.yaml file, then we slurp up the Let's Encrypt key that BIND generated for us and parse it out into a configuration file, then finally we run certbot and create a cron job to renew our certificates. Here we also have the wonderful usage of templates. Templates are, as the name implies, templates for files, and we can use the template task to render them. In this case I use the credentials.ini.j2 template to render out a credentials file for certbot.

# templates/credentials.ini.j2

dns_rfc2136_server = 127.0.0.1
dns_rfc2136_port = 53

dns_rfc2136_name = {{ letsencrypt_key['content'] | b64decode | regex_search('key \"(.+)\"') | regex_replace('key \"(.+)\"', '\\1') }}
dns_rfc2136_secret = {{ letsencrypt_key['content'] | b64decode | regex_search('secret \"(.+)\"') | regex_replace('secret \"(.+)\"', '\\1') }}
dns_rfc2136_algorithm = {{ letsencrypt_key['content'] | b64decode | regex_search('algorithm (.+);') | regex_replace('algorithm (.+);', '\\1') | upper }}

You can see here that the expressions inside the {{ }} are Jinja2 filters, which we chain together to extract the name, secret and algorithm from the key file.
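
For context, the key file being parsed, as generated by BIND's tsig-keygen, looks roughly like this (the key name and secret below are placeholders):

key "letsencrypt" {
        algorithm hmac-sha256;
        secret "c2VjcmV0cyBnbyBoZXJlLCBub3QgdGhpcw==";
};

All together, these components form a very basic role for the playbook.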

Now, to deploy this playbook I wrote a simple deploy script that wraps the ansible-playbook command and takes 1 parameter: either all or a list of tags. It's like this so if I want to update everything I just run ./deploy.sh all, or if I just want to update the DNS, ./deploy.sh dns.


#!/bin/bash

# deploy.sh

all_tags(){
    ansible-playbook deploy.yaml --list-tags 2>&1 | grep "TASK TAGS:" | cut -d '[' -f2 | cut -d ']' -f1 | sed 's/ //g'
}

ALL_TAGS=$(all_tags)

run_play(){
    echo "Running play with tags: $1"
    ansible-playbook deploy.yaml -i inventory/inventory.py --tags "$1"
}

[[ $1 = "all" ]] && run_play "$ALL_TAGS" || run_play "$1"

Now, it's fine that we can run this script locally and deploy, but that is way too much work for me; I want it to be automatic, and happen when I run git push. To do this, we could use something like Jenkins, CircleCI or GitLab, but as my server is very limited on resources, we won't be using any of these (more on this in a later part). However, we are in luck, as git has us covered with git hooks. In short, a git hook is a script that runs when a particular event is triggered. The hook (or event) we are interested in is post-receive. This hook allows us to run a script server-side once a git push has completed its upload, which is exactly what we need.

To create a git hook, we need to connect to the git server and create a file in the hooks directory of the repo; in my case, that is /srv/git/galaxy.git/hooks/post-receive. This file will be executed on the post-receive hook; make sure to mark it executable (chmod +x) and put the appropriate shebang in the file. In my case I used a simple bash script, but you could use anything from Python to C++ (make sure to compile it). The post-receive hook passes its parameters through stdin, so we just need to read those out and act on them. Note that some other hooks provide command line arguments instead. My script is below for those who want to take it.

#!/bin/bash

# /srv/git/galaxy.git/hooks/post-receive

while read oldrev newrev ref
do
        case "$ref" in
                "refs/heads/master")
                tmp_dir=$(mktemp -d -t cd-XXXXXXXXXX) # 'local' only works inside functions
                git --work-tree="$tmp_dir" --git-dir="$(pwd)" checkout -f master
                pushd "$tmp_dir"
                export ANSIBLE_HOST_KEY_CHECKING=False
                SECRETS_KEY="no thank you :)" ./deploy.sh all
                popd
                rm -rf "$tmp_dir"
                ;;
                *)
                echo "No rule to deploy $ref"
                echo "old: $oldrev new: $newrev ref: $ref"
        esac
done

This script checks that we have made a push to the master branch, then checks out the master branch to a temp directory, runs deploy.sh all and then cleans up. Simple as that.

My favourite part about using git hooks over Jenkins/GitLab is that I can see the playbook run in front of my very own eyes, from the command line it should be running in, whereas with Jenkins I have to log into a web portal. Now, colour doesn't come through, but that is fine; at least when I run git push I see what happened right away.

(Screenshot: playbook output showing in the terminal.)

You may find yourself saying: all this is cool, but why would I want it? To help answer that, picture this scenario. Your company maintains thousands of NGINX servers, each one with its config controlled by Ansible. Now imagine you need to add a new domain to your nodes, so you update the config and redeploy. But what happens if you broke everything? You now have to spend time debugging the issue and fixing your config. GitOps lets you see who broke it, and lets you revert it by reverting the commit, bringing everything back to working order.
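
With everything in git, recovery is just normal git usage (the commit hash below is a placeholder):

git log --oneline      # find the bad commit
git revert abc1234     # undo it with a new commit
git push               # the post-receive hook redeploys the old config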

You could also imagine wanting to move this server somewhere else: you can spin up an identical replacement simply by adding the new server to the inventory. All you have to do then is copy over user data, or if you have a stateless application like NGINX you don't have to copy anything; change your DNS records and you're good to go.

All together, GitOps combined with Ansible is a wonderful way to manage servers. While using Ansible may take longer than setting up 1 server manually, I think it's good practice to get into the habit of GitOps-ing everything for when I get a job (flick me an email if you wanna hire me 🙂). GitOps gives you what is essentially version control on servers, making it easy to track who did what, and when. Ansible is a wonderful tool for managing everything from single-server deployments to multi-million-server deployments, is very extensible, and is actively maintained.

Hopefully this gave you insight into how I use GitOps with Galaxy.
