HA Traefik Cluster as a Kubernetes Ingress Controller w/ Consul Backend
15 September, 2018 - 5 min read
One of the things I've spent the most time on in the last year is configuring ingress controllers for Kubernetes, and each offering presents its own unique challenges. In this article I'd like to give you a brief introduction to using Traefik as an Ingress Controller for K8S, introduce my tech stack for this specific implementation, tell you how we triggered a cascading failure on the way to HA and how I solved it from the other side of the world at 3 am, off my face, address a key adoption issue the developers fail to recognize (and how to get around it), and finally present a lesson in early adoption. Hopefully, by spending a few minutes reading this article, I can either help you succeed in rolling out a similar load balancing strategy for your K8S services, or steer you in another direction altogether. Either way, a few minutes of reading here may save you hours of headaches later. What have you got to lose?
Let me start off by saying that I considered using both NGINX and HAProxy, the default industry standards. In both cases I found that you still need to run Certbot to request certificates, you need a source of truth to create and add new vhosts to the config, you need to store and reconcile that list, and on and on. While most of our certificates are wildcards, every now and again we get a new domain, and I didn't want to write a separate pipeline for requesting an SSL certificate, or start worrying about cron jobs for individual renewals. I've used Certbot to renew certificates flawlessly for various websites running on dedicated virtual machines for years, for free. However, I am currently a one-man R&D shop in a company of 9 individuals, top to bottom. Simplicity is key. And we are not hosting websites on dedicated virtual machines anymore. We're in K8S territory now, and a new paradigm requires a fresh look at the possibilities. Note that I've done Kubernetes the Hard Way and plenty of other ways (kops, kubeadm, k3s, minikube; now I just Terraform GKE clusters). But when it comes down to it, I don't have the time or patience to toil over managing master nodes and their services while also being responsible for all of our infrastructure, end to end. So I subscribe to managed offerings. It's important to know that the details I lay out are likely specific to my use case and may not work for yours. However, I am going to try to keep it generic and provide an overview, rather than the granular specifics of my implementation.
Fumbling around on Reddit, Medium, and a few chat groups, more than one suggestion pointed me towards Traefik. Out of habit I instinctively navigated directly to their GitHub repository and was stunned by their tag line: "Traefik is a modern HTTP reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with your existing infrastructure components (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, ...) and configures itself automatically and dynamically. Pointing Traefik at your orchestrator should be the only configuration step you need." While I am a skeptic by nature, and this sounded way too good to be true, it certainly was worth a shot. First things first: run the container locally and poke around. Navigating to Docker Hub, I found they had an Alpine release. I have to be able to explore the file system to get a good feel for how everything is wired up. The documentation was nicely presented, with clear and concise directions for getting a container up and running. I can read the Dockerfile and understand every step; there is no bloat, and no bullshit. Navigating back to the repository, I started looking at a few issues. Like any open source project there were myriad issues available to dissect, but nothing that broke their mission statement, and the company was actively responding to them. No excuses left: pull the container down, give it a minimal config and access to the Docker socket, and start exploring. After the test drive, it may be an understatement to say that I was sold. According to the documentation, getting Traefik to dynamically wire up the frontend and backend services in Kubernetes took a single line in a config file, or better yet a '--kubernetes' flag on the startup command. Holy hell, I thought I'd died and gone to heaven. Time for a real test.
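For reference, the local test drive amounts to a single command. This is a sketch using Traefik 1.x flags as they existed at the time (`--api` serves the dashboard, `--docker` enables the Docker provider); the image tag is an assumption, pick whichever Alpine release is current for you:

```shell
# Run Traefik 1.x locally with a minimal config:
# mount the Docker socket so it can discover containers to route to,
# expose :80 for traffic and :8080 for the dashboard/API.
docker run -d --name traefik \
  -p 80:80 -p 8080:8080 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  traefik:1.7-alpine --api --docker
```

Swapping `--docker` for `--kubernetes` is the same idea pointed at the other orchestrator.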
Already having a few web-accessible deployments running in my K8S cluster, I had most everything I needed wired up. A bit more reading revealed that I simply had to mark any Ingress objects I wanted Traefik to monitor with the ingress.class annotation, set to "traefik". Kubectl apply and I'm up and running. I already had an external load balancer pointing to the Service object in question. Traefik handling the routing seemed to have no effect whatsoever, but the access logs revealed the truth: the request was relayed through Traefik. Okay, but no SSL. Back to the documentation. Update the configmap to add the [acme] properties, use the DNS challenge so we can enforce HTTPS right away, and get that HTTP-to-HTTPS redirect in place. Bounce the pod, check the logs, and sure enough the DNS challenge was already happening! Scored a valid certificate and served it up in just shy of 3 minutes. Nice.
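The annotation wiring looks roughly like this, using the `extensions/v1beta1` Ingress API that was current at the time; the name, hostname, and backend Service here are placeholders, not my actual setup:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app                                 # placeholder
  annotations:
    kubernetes.io/ingress.class: traefik       # tells Traefik to pick this Ingress up
spec:
  rules:
  - host: app.example.com                      # placeholder hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: my-app                  # existing Service to route to
          servicePort: 80
```

Any controller not matching the class annotation ignores the object, so this coexists cleanly with other ingress controllers in the same cluster.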
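And the [acme] side of it, sketched as Traefik 1.x TOML. The email address, DNS provider, and Consul KV path below are illustrative assumptions, not my actual values; the entrypoint redirect block is what enforces HTTPS across the board:

```toml
defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"               # force HTTP -> HTTPS
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[acme]
email = "ops@example.com"              # assumption: your registration email
storage = "traefik/acme/account"       # Consul KV path when using the Consul backend
entryPoint = "https"
  [acme.dnsChallenge]
  provider = "route53"                 # assumption: your DNS provider
```

With the DNS-01 challenge, Traefik proves domain ownership via a TXT record rather than an HTTP endpoint, which is why certificates can be issued before any traffic is flowing, and why wildcard certificates are possible at all.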