Ingress Controller


This blog focuses on the Ingress controller: what it is and how it works, with an emphasis on understanding its functionality by implementing one in Golang as a demonstration.

Ingress controllers route traffic inside Kubernetes's isolated Software Defined Network (SDN). They exist out of necessity: without them, engineers would have to manually configure various instances and services to manage that traffic. They are usually used to expose HTTP-based applications with limited availability, such as internal services or development versions. While the concept allows for managing production-type workloads, such a scenario is rare in the wild. I will talk about this in more detail down the line.

Ingress controllers are, in a sense, widely misunderstood. A lot of engineers associate them by default with Nginx running inside Kubernetes. As always, they simply configure a service and treat it as a magical black box doing its own thing.

In actuality, an Ingress controller is a program that watches Ingress objects, analyzes them, and configures another program to accommodate the requested operations. It acts, in a sense, as a translator between the Ingress object and a service designated for application routing.

In essence, an Ingress controller is running k get ingress -A -o yaml -w and configuring the ingress service (e.g., Nginx) accordingly. Here we can split the explanation into two parts: the first discussing the Ingress controller as a program that reads Ingress objects, and the second discussing the ingress service responsible for actually routing application traffic.

We can start with the ingress: a service responsible for routing traffic inside the SDN. There are so many resources out there discussing this topic that I really don't want to go into detail, but I will set the scene for further research. So, how do ingresses do their job?

Well, ingresses like Nginx and HAProxy basically act as man-in-the-middle devices. They ingest traffic, analyze it, and route it based on their configuration. All of this is possible because HTTP is the standard behind how modern applications communicate. HTTP defines a standardized way of establishing communication, and since we have a standardized method, we can build services capable of understanding that standard.

For example, most of the time web applications are routed using their URLs, like bookie.kubelius.com and enforce.kubelius.com. They each have their own distinctive identifier. When a client accesses either of them, it sets the Host header to the respective address. The ingress reads the value of this Host header and routes the request based on preconfigured pattern matching: it opens a connection to the matched server and forwards all received data to it, acting as a client. When it receives a response, it writes it back into the client's open TCP session, which is waiting for that response.
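For illustration, the start of a plain-text HTTP request to the first application looks roughly like this (trimmed to the relevant lines):

  GET / HTTP/1.1
  Host: bookie.kubelius.com

Everything the ingress needs for this routing decision sits in that one Host line.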

What does this look like from a configuration perspective?

Here is an example in HAProxy:

frontend http
  bind *:80

  acl bookie hdr(host) -i bookie.kubelius.com
  acl enforcer hdr(host) -i enforce.kubelius.com

  use_backend bookie_servers if bookie
  use_backend enforcer_servers if enforcer

backend bookie_servers
  server bookie-01 10.10.10.101:80
  server bookie-02 10.10.10.102:80

backend enforcer_servers
  server enforcer-01 10.10.10.151:80
  server enforcer-02 10.10.10.152:80

If we want to further split up behavior, we can match other information inside the HTTP request — like methods, paths, source, destination, port, etc. All of this is provided by HAProxy configuration.

  acl match_list path_beg /list
  acl match_method method GET

The standard is the key here; it enables all of this. Otherwise, we would have to write something at a very low level for basically every application that will ever be written.

HAProxy or similar services simply take a request, understand it, and do whatever they want with it: send it, drop it, reject it, modify it then send it, etc.
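To show how little magic is involved, here is a minimal sketch of the same Host-based routing in Go, built on the standard library's httputil.ReverseProxy. The hostnames and backend addresses are borrowed from the HAProxy example above; treat the rest as an illustration, not a production setup:

package main

import (
  "net/http"
  "net/http/httputil"
  "net/url"
)

func newProxy(target string) *httputil.ReverseProxy {
  u, _ := url.Parse(target)
  return httputil.NewSingleHostReverseProxy(u)
}

func main() {
  // One backend per hostname; a real setup would balance over several.
  backends := map[string]*httputil.ReverseProxy{
    "bookie.kubelius.com":  newProxy("http://10.10.10.101:80"),
    "enforce.kubelius.com": newProxy("http://10.10.10.151:80"),
  }

  http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    // r.Host carries the value of the Host header described above.
    if proxy, ok := backends[r.Host]; ok {
      proxy.ServeHTTP(w, r)
      return
    }
    http.Error(w, "no backend for "+r.Host, http.StatusBadGateway)
  }))
}

The handler sees the full decoded request, which is exactly the point made below.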

This means these services sitting in the middle have full understanding of the request, including any sensitive data like passwords — and yes, this is true even in the case of encryption.

So how does encryption happen here? Well, there are multiple ways. We can encrypt traffic to HAProxy; it will still receive and decrypt it, but traffic in transit to HAProxy will be secure. Will it re-encrypt it to the downstream service? That depends on how it is configured. The client itself has no control over this; the only encryption the client participates in is the one with HAProxy. This also means the actual certificate with a trusted signer is held by HAProxy.
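In Go terms, the termination step is tiny. A hedged sketch, assuming made-up certificate paths and a single plain-HTTP backend:

package main

import (
  "net/http"
  "net/http/httputil"
  "net/url"
)

func main() {
  // The proxy holds the certificate and terminates TLS; the hop to the
  // backend is plain HTTP, exactly as described above.
  backend, _ := url.Parse("http://10.10.10.101:80") // made-up backend address
  proxy := httputil.NewSingleHostReverseProxy(backend)

  // The certificate and key paths are assumptions for this sketch.
  http.ListenAndServeTLS(":443", "/etc/certs/tls.crt", "/etc/certs/tls.key", proxy)
}

Whether the hop behind it gets re-encrypted is entirely this process's decision; the client only ever negotiates with the certificate the proxy presents.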

There is also a second way of encrypting traffic. By giving up on opening the traffic, we can still route it based on some limited information, like source IP, destination IP, destination port, and so on. This is how we can route any TCP-based application. In this scenario, we don't decrypt and analyze the packet; we simply act on the non-encrypted information inside it to deliver it. In most cases, this information will also include the SNI (Server Name Indication), which acts similarly to the Host header.
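To make the SNI part concrete, here is a sketch of routing encrypted traffic in Go without ever decrypting it. It peeks at the ClientHello by aborting a fake handshake from GetConfigForClient, records the bytes it consumed, and replays them to the chosen backend. The backend map is made up, and this is an illustration of the trick, not production code:

package main

import (
  "bytes"
  "crypto/tls"
  "errors"
  "io"
  "net"
)

// recorder copies everything read from the client into buf so the bytes can
// be replayed to the backend, and swallows anything the TLS stack tries to
// write back (the aborted handshake would otherwise send an alert).
type recorder struct {
  net.Conn
  buf *bytes.Buffer
}

func (r recorder) Read(p []byte) (int, error) {
  n, err := r.Conn.Read(p)
  r.buf.Write(p[:n])
  return n, err
}

func (r recorder) Write(p []byte) (int, error) { return len(p), nil }

// peekSNI parses just enough of the TLS ClientHello to extract the SNI and
// returns the consumed bytes for replay.
func peekSNI(conn net.Conn) (string, []byte) {
  var sni string
  buf := &bytes.Buffer{}
  _ = tls.Server(recorder{conn, buf}, &tls.Config{
    GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
      sni = hello.ServerName
      return nil, errors.New("peeked, abort handshake")
    },
  }).Handshake()
  return sni, buf.Bytes()
}

func main() {
  // Made-up SNI-to-backend map for illustration.
  backends := map[string]string{
    "bookie.kubelius.com":  "10.10.10.101:443",
    "enforce.kubelius.com": "10.10.10.151:443",
  }
  ln, err := net.Listen("tcp", ":443")
  if err != nil {
    panic(err)
  }
  for {
    client, err := ln.Accept()
    if err != nil {
      continue
    }
    go func(client net.Conn) {
      defer client.Close()
      sni, consumed := peekSNI(client)
      addr, ok := backends[sni]
      if !ok {
        return
      }
      backend, err := net.Dial("tcp", addr)
      if err != nil {
        return
      }
      defer backend.Close()
      backend.Write(consumed) // replay the ClientHello untouched
      go io.Copy(backend, client)
      io.Copy(client, backend) // encrypted bytes flow through unmodified
    }(client)
  }
}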

Now let's talk about the Ingress controller. Again, this is just an application that watches Ingress objects and translates them into configuration for the respective ingress service. How does it watch them? Well, simple: through the KubeAPI, the same way you would get Ingress objects by running k get ingress. How does it translate them? Nothing magical about it; just a bundle of conditions written by some guy.
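As a minimal sketch of the watching half, assuming the official client-go library and a kubeconfig in the default location (everything beyond printing the events is left out):

package main

import (
  "context"
  "fmt"

  networkingv1 "k8s.io/api/networking/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/client-go/kubernetes"
  "k8s.io/client-go/tools/clientcmd"
)

func main() {
  // Load the kubeconfig the same way kubectl does.
  config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
  if err != nil {
    panic(err)
  }
  clientset, err := kubernetes.NewForConfig(config)
  if err != nil {
    panic(err)
  }

  // The programmatic equivalent of `k get ingress -A -w`: a watch on
  // Ingress objects across all namespaces.
  watcher, err := clientset.NetworkingV1().Ingresses("").Watch(context.Background(), metav1.ListOptions{})
  if err != nil {
    panic(err)
  }
  for event := range watcher.ResultChan() {
    // A real controller would regenerate and reload the ingress service's
    // configuration here instead of printing.
    if ing, ok := event.Object.(*networkingv1.Ingress); ok {
      fmt.Printf("%s %s/%s\n", event.Type, ing.Namespace, ing.Name)
    }
  }
}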

This raises a question: if everything runs the way it is described here, then how do we have a standard Ingress object that behaves the same regardless of which Ingress controller we install? Well, the answer is simple: all of that is an illusion. It is not standard, and most of the time behavior will differ from one Ingress controller to another. Basic things will more or less work as expected, like path and method matching, but even then you'll run into plenty of connection timeouts and disconnect issues, because each controller injects its own defaults into your magical Ingress declaration to fill the gaps the Ingress spec leaves open about the minimum information its targeted service needs to route traffic.

If you’re still thinking, “But there are still similarities, like Ingress controllers running as pods inside Kubernetes” — no. Again, that is just an illusion. Based on what I wrote, there are no restrictions on where they run or how.

Ingress controllers just need access to the KubeAPI, and only to specific objects like Ingress, Endpoints, and other similar ones, because at the end of the day it's just reading user-introduced declarative config and making decisions based on that. So does it need to run inside Kubernetes? No. Will it? Yes, in most cases, but out of convenience, not necessity.

How will traffic be routed to containers if the controller isn’t sitting inside Kubernetes? The Ingress controller is not routing traffic — the Ingress is. And no, the Ingress also does not need to run inside Kubernetes.

The ingress instance has a few ways of routing traffic to the pod. One way is to run inside a pod, which gives it the ability to send traffic inside the SDN. To receive traffic, it then has to either access the host's network and listen on a port directly, or be exposed via a Service NodePort, PAT, or something similar…

Another way is to participate in the SDN the same way other Kubernetes nodes do: by encapsulating traffic and sending it to the nodes, where it gets decapsulated and routed to the expected pod.

Both of these scenarios are very simple, and I do understand the heavy abstractions I dropped along the way, which will most definitely terrify some engineers — like, “Network? Is he talking about network? Burn the witch!” Well, I really cannot avoid it, nor do I want to. Let’s just hope some demonstrations along the way will shed some light on how all of this works.