DNS and Ingress Implementation#

To understand the design decisions made regarding DNS and Ingress implementation, we'll first explain a few Kubernetes objects relevant to this discussion.

Kubernetes Service#

A Kubernetes service is a way to expose an application running on a group of Pods as a network service. Since Pods get their own IP address in the cluster and the addresses can change as Pods come and go, Services provide a layer of abstraction.

A Service gets its own IP address at the time of creation, and that address does not change for the life of the Service. The traffic forwarding rules for a Service are updated dynamically as Pods are created and deleted, with requests always being forwarded to a running Pod.

A Kubernetes component, kube-proxy, enables this communication by managing forwarding rules using IPVS or iptables. Two Kubernetes Service types are explained below, namely,

  • ClusterIP (default)
  • NodePort

ClusterIP Service#

The default ClusterIP type exposes the Service on an internal IP of the cluster, making it reachable only from within the cluster.

An example Service definition named sample-service that listens on TCP port 80 & forwards traffic to Pods with the label app: sample-app, using the default Service type of ClusterIP, is defined below -

apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  selector:
    app: sample-app
  ports:
    - protocol: TCP
      port: 80
  type: ClusterIP

Diagram of a ClusterIP service forwarding requests for an App with two Pods

A Service with internal IP `10.96.0.1` listening on port `443` forwarding requests to two Pods for App A with Pod IP addresses `192.168.64.3` & `192.168.64.4` listening on port `6443`

NodePort Service#

Using a NodePort Service causes every node in the cluster to listen on a specific port defined in the Service definition. Traffic received on that port is forwarded to Pods matching the label defined in the selector, regardless of which node those Pods are running on. The node port effectively acts as a proxy to the application. If a specific port is not defined in the Service definition, Kubernetes chooses one from the range 30000-32767.

An example Service definition named sample-service that listens on TCP port 80 & forwards traffic to Pods with the label app: sample-app, with a Service type of NodePort (meaning, every node in the cluster listens on a specific port), is defined below -

apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  selector:
    app: sample-app
  ports:
    - port: 80
      targetPort: 80
  type: NodePort

Diagram of a NodePort service forwarding requests for an App running on only node 2, with nodes 1 and 3 forwarding the requests to the app running on node 2

A NodePort Service listening on port `32168` forwarding requests to App running on node 2. Nodes 1 & 3 forward requests to the app running on node 2
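If a specific node port is preferred instead of an auto-assigned one, the nodePort field can be set explicitly. The definition below is a minimal variation of the example above and the port value is illustrative only -

apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  selector:
    app: sample-app
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080   # explicitly requested; must fall within the node port range (30000-32767)
  type: NodePort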

With a load balancer (LB) in front of the cluster, any hostname-based or path-based routing and/or TLS termination can be handled at the LB. The LB examines requests and forwards them to the appropriate port on a node in the cluster.

Diagram of a load balancer forwarding hostname-based requests for multiple apps to their respective NodePort service in the cluster. A request for Host A goes to the NodePort A on the nodes, a request for Host B goes to the NodePort B on the nodes and so on.

A load balancer examines the request and forwards traffic to the appropriate NodePort defined by the service used by the app.

Kubernetes Ingress#

Ingress is a Kubernetes object that allows for the definition of HTTP-based routing rules to forward traffic to Services. The Ingress definition only includes the configuration of these rules, while a separate component called an Ingress Controller watches the Ingress objects and updates its own configuration as they are created, updated, or deleted.

An example Ingress definition named example-ingress with a rule that HTTP requests for the host foo.vt.edu be forwarded to a Service named app1 on port 80 is defined below -

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
    - host: foo.vt.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80

Proxy Ingress Controllers#

A Proxy Ingress Controller, such as the Traefik or Nginx Ingress Controller, is itself a reverse proxy and is exposed as a NodePort service. The controller watches for Ingress objects and updates its routing configuration based on the rules defined in those objects. Traffic received by the controller is then forwarded to the appropriate ClusterIP services in the backend.
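As an illustration, the Service below is a minimal sketch assuming an Nginx Ingress Controller whose Pods carry the label app.kubernetes.io/name: ingress-nginx; it exposes the proxy on node ports 30080 & 30443, matching the diagram further below -

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: ingress-nginx   # assumed label on the controller Pods
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30080
    - name: https
      port: 443
      targetPort: 443
      nodePort: 30443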

With a load balancer in front of the cluster, HTTP routing moves into the cluster: the load balancer sends traffic to nodes using TCP passthrough, and TLS termination occurs within the cluster.

Diagram of a network load balancer forwarding requests to the Ingress Proxy exposed as NodePort services in the cluster on ports 30080 and 30443. The Ingress Proxy then forwards the request to the ClusterIP service defined for App1.

A network load balancer examines the request and forwards traffic to the Ingress Proxy exposed as a NodePort service, which in turn forwards the request to the ClusterIP service used by the app.

Application Load Balancer Ingress Controllers#

Compared to Proxy Ingress Controllers, an Application Load Balancer (ALB) Ingress Controller does not run as a reverse proxy in the cluster. The controller still watches for Ingress objects but interacts with the AWS API to create and configure ALBs external to the cluster, which forward traffic into the cluster.

By default, each Ingress object creates its own ALB. The Ingress object defines forwarding rules, and each rule in turn creates a separate target group in the ALB that forwards traffic to either the correct instance or IP address, as explained below -

  • Instance mode: Traffic is forwarded to the correct EC2 instance, where services are exposed with the type NodePort. The ALB controller manages the port mapping configuration.
  • IP mode: Traffic is sent directly to the pod without using a service. This mode requires the AWS VPC CNI (Container Network Interface) driver so that pods can be given an Elastic Network Interface (ENI) and delegated an IP address in the VPC. A downside to this approach is running into ENI usage limits for a particular instance type. A sketch of how the mode is selected follows this list.
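The target type is typically selected per Ingress through controller annotations. The following is only a hedged sketch assuming the AWS Load Balancer Controller's alb.ingress.kubernetes.io annotations; the host and Service names are placeholders -

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sample-alb-ingress
  annotations:
    kubernetes.io/ingress.class: alb                 # controller selection; assumed configuration
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance  # or "ip" to send traffic directly to Pods
spec:
  rules:
    - host: app1.vt.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80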

The following example shows an Ingress object with three forwarding rules, namely, /*, /products, /accounts. The three rules create three target groups, ServiceA, ServiceB and ServiceC, respectively. Target groups ServiceA and ServiceB use Instance mode and forward traffic to services of type NodePort. Target group ServiceC uses IP mode and forwards traffic directly to the pods for the application. The ALB Ingress Controller also watches for Ingress objects and provisions ALBs.

Diagram of an application load balancer forwarding requests to either the exposed NodePort services in the cluster or directly to the application pods depending on the type of the Target Groups. Target Groups ServiceA and ServiceB forward traffic to the NodePort services and Target group ServiceC forwards traffic directly to the Pods. The alb-ingress-controller watches for Ingress objects by communicating with the Kubernetes API Server.

An application load balancer examines the request and forwards traffic to either the NodePort service or directly to the application Pods. The alb-ingress controller watches for Ingress objects and provisions ALBs as needed.

TLS/SSL Certificate Management#

A few mechanisms exist to automate the provisioning, renewal, and usage of certificates.

InCommon#

InCommon provides identity and access services for many higher ed institutions, including Virginia Tech. All TLS certificates provided by Secure Identity Services (SIS) are issued by InCommon. InCommon does have ACME support, but requires authentication using External Account Binding (EAB). While the official ACME clients support EAB authentication, other tools that provide ACME/LetsEncrypt certificate support may have limited EAB support.
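For example, a tool such as cert-manager (not prescribed here; shown only as a hedged sketch) can register against an EAB-protected ACME endpoint. The directory URL, contact email, EAB key ID, and Secret names below are placeholders -

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: incommon-acme
spec:
  acme:
    server: https://acme.example.org/directory   # placeholder; use the directory URL provided by InCommon
    email: acme-contact@example.org              # placeholder contact address
    privateKeySecretRef:
      name: incommon-acme-account-key
    externalAccountBinding:
      keyID: example-eab-key-id                  # placeholder EAB key ID
      keySecretRef:
        name: incommon-eab-hmac                  # Secret holding the EAB HMAC key
        key: secret
    solvers:
      - http01:
          ingress:
            class: nginx                         # assumed ingress controller class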

LetsEncrypt#

LetsEncrypt is a public service that provides free, signed SSL/TLS certificates. It leverages the ACME protocol to request certificates, prove ownership of names, and issue signed certificates.

AWS Certificate Manager#

AWS Certificate Manager (ACM) is an AWS service that provides free certificate issuance and storage. The private keys for all TLS certificates are stored securely and are inaccessible to customers, so they cannot be used in custom applications. However, ACM certificates can be leveraged in many other AWS services, such as all load balancers. ACM does not leverage the ACME protocol, but it has APIs and is supported by higher-level tools such as Terraform.

Domain Validation Methods#

All automated certificate providers require validation that the requester has ownership of the requested names. There are three main methods used across the various providers.

  • HTTP-based challenge: The requester is given a token and is required to place it at a known location on the web server for the requested name. Once placed, the provider attempts to retrieve the file.
  • DNS-based challenge: The requester is required to create a DNS record with a name and value supplied by the provider (for ACME, a TXT record at _acme-challenge.<name>). Once created, the provider fetches the DNS record.
  • Email-based challenge: The provider sends emails to one or more email addresses for the requested name (or parent addresses) with a link to a website where the recipient can validate the request.

The following table outlines which challenge methods each provider supports:

Provider      Challenge methods
InCommon      HTTP, DNS
LetsEncrypt   HTTP, DNS
ACM           DNS, Email

DNS Capabilities#

At Virginia Tech#

At the time of writing, Virginia Tech’s DNS service is managed in two ways - centralized or delegated. For the vast majority of DNS records, DNS is managed by a core DNS team. Requests are made by designated network liaisons, validated by the DNS team, and then applied at the next restart of the nameservers. At a minimum, these restarts occur every Tuesday and Thursday; history has shown they occur much more frequently.

In a limited number of instances, zone delegation has been authorized to allow DNS to be managed by external DNS providers, including AWS Route53. With very few exceptions, the delegated zones are under the *.cloud.vt.edu namespace. Once delegated, the owning AWS account can create, update, or delete records through the Route53 service and see them on the global internet almost instantly.

Of AWS#

DNS management in AWS is provided by the AWS Route53 service. To fully control the records, domains or subdomains are delegated from a parent zone. As mentioned earlier, most of the names have come from the *.cloud.vt.edu namespace.

A useful Route53 feature is the ability to create “alias” records. This allows a public name, such as app1.vt.edu, to resolve directly to the IP addresses used by a load balancer without using CNAME records. This saves a DNS round trip and helps “hide” the fact that an AWS load balancer is being used.
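For illustration, an alias record corresponds to the request body of Route53's ChangeResourceRecordSets API, expressed here in YAML purely to show its shape; all IDs and names below are placeholders -

HostedZoneId: ZEXAMPLE12345                      # hosted zone for the record being created
ChangeBatch:
  Changes:
    - Action: UPSERT
      ResourceRecordSet:
        Name: app1.vt.edu
        Type: A
        AliasTarget:
          HostedZoneId: ZALBEXAMPLE67890         # canonical hosted zone ID of the load balancer
          DNSName: example-lb-1234567890.us-east-1.elb.amazonaws.com
          EvaluateTargetHealth: false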

Route53 also supports geolocation-based responses, allowing customers in one location to get different DNS answers from those in another location. This would be important for applications that need quick response times and have customers across the country or worldwide.

Implementation Options#

There are two major problems to be solved when operating a Kubernetes cluster in the public cloud -

  • Getting traffic to the load balancer
  • Where TLS termination occurs

Getting traffic to the load balancer#

Getting traffic from the internet to the load balancer in front of the cluster is solved by managing DNS. To manage DNS, the Platform team has two options -

  • Controlling the requested names
  • Use CNAMEs

Controlling the requested names#

If the Platform team controls the DNS records for applications, we can ensure the names resolve to the AWS load balancers. If the name is delegated to AWS Route53, the application records can be alias records to the AWS load balancers in front of the cluster.

Diagram of three application alias records resolving to the AWS load balancers in front of the cluster.

DNS records app1.vt.edu, app2.vt.edu and app3.vt.edu are all alias records for the name of the AWS load balancers in front of the cluster.

The advantage of this approach is that changes to the application DNS records can be made instantly, without dependencies on external teams, if load balancers need to be recreated, rebalanced, etc.

While complete control of the records is great, this approach would increase costs, as each application moving to the platform requires another hosted zone in Route53. It can also lead to complex delegation structures, where an application team might want one name used on the platform and another managed by VT. For example, app1.vt.edu is hosted on the platform, while app2.app1.vt.edu is hosted on premises.

Use CNAMEs#

The second option is for the app to use a CNAME to a cluster-level hostname, which points to the AWS load balancers in front of the cluster. The records could use the following pattern -

  • App record (app1.vt.edu) - has no A/AAAA records, but a CNAME record pointing to a cluster record
  • Cluster record (k8s-1.aws.clusters.platform.it.vt.edu) - this name would be managed in AWS and resolves to the load balancers in front of the cluster.

Diagram of an application CNAME record resolving to the cluster record which has an alias record for the AWS load balancers in front of the cluster.

DNS record app1.vt.edu has a CNAME for the cluster record, k8s-1.aws.clusters.platform.it.vt.edu, which has an alias record for the name of the AWS load balancers in front of the cluster.

This approach has several advantages compared to controlling records for applications. VT Hostmaster maintains control over the application records while the Platform team only maintains control over the cluster records. As a result, the number of delegated zones needed is greatly reduced making this design cheaper. Complex delegation structures are also supported because the Platform team does not own the application records.

The cluster record structure identifies the cluster (test, dev, or prod) and a cloud provider, and it supports the possibility of multiple clusters in multiple clouds. The cloud-provider level also allows that zone to be delegated to that cloud’s DNS service, helping streamline DNS updates. The structure is <cluster-identifier>.<cloud-provider>.clusters.platform.it.vt.edu (e.g., k8s-1.aws.clusters.platform.it.vt.edu).

Where TLS Termination occurs#

There are two main options for TLS termination -

  • External to the cluster
  • Internal to the cluster

External to the cluster#

TLS termination external to the cluster would have to be done on the AWS load balancers in front of the cluster. To support TLS termination on the load balancers, AWS requires that the certificates be loaded into AWS Certificate Manager (ACM).

There are two options to get the certificates into ACM -

  • Use certificates provisioned by ACM
  • Import externally provisioned certificates into ACM

With ACM only supporting DNS- and Email-based challenges for domain validation, using ACM-provisioned certificates would be complicated: there is no existing tooling to automate domain validation when application names are not delegated to the Platform team.

While automating the import of externally provisioned certificates into ACM is an option, service quota limits would be an issue, as only 25 certificates can be associated with a load balancer. With every 26th certificate requiring a new load balancer, coordinating which app names should point to which load balancers, and sharing new load balancers with the ingress controller, would be difficult.

Internal to the cluster#

TLS termination internal to the cluster would be accomplished at the reverse proxies acting as ingress controllers. This approach can also use certificates provisioned externally by either InCommon or LetsEncrypt.

Tooling already exists to provision certificates and make them available to ingress controllers, and many proxies can provision the certificates themselves. The DNS and load balancer structure would also be simpler. The downside is that private keys would theoretically be accessible on the file system, but access to them can be locked down using policies.
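As a hedged illustration of in-cluster termination, the Ingress below references a TLS Secret that such tooling would keep up to date. The Secret name and the cert-manager annotation are assumptions, not requirements -

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app1-ingress
  annotations:
    cert-manager.io/cluster-issuer: incommon-acme   # assumed; only applies if cert-manager is used
spec:
  tls:
    - hosts:
        - app1.vt.edu
      secretName: app1-vt-edu-tls                   # Secret holding the certificate & private key
  rules:
    - host: app1.vt.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80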

Preferred methods for DNS and TLS termination#

Leveraging CNAMEs to get traffic to the cluster load balancers should be preferred. There are many valid reasons for the VT DNS team to keep tight control over DNS. Giving full ownership to the Platform Team for all app names is simply too risky.

While the ALB integrations theoretically have support to send traffic directly to a pod, it bypasses Kubernetes services, which removes many other integration points such as advanced routing mechanisms with Ingress.

Load balancers charge both a per-hour and per-usage cost. The amount of in-cluster infrastructure would remain the same to support in-cluster routing. Moving TLS termination external to the cluster would only increase costs, both in monthly infrastructure costs and in development time. As such, using “dumb” load balancers and having both TLS termination and routing within the cluster itself should be preferred.

Implementation Decision#

Combining the outcomes from the previous sections, we arrive at a Kubernetes platform that is commonly seen in many deployment environments. This model uses the CNAME approach to get traffic to the cluster and then supports TLS termination and routing at the ingress level within the cluster itself. This helps keep costs down, encourages TLS everywhere, and provides better support for Kubernetes-native tooling and application deployment models.

As such, the final structure can be represented using the diagram below. Note that while application-specific names (e.g., app1.vt.edu) will be fully supported, cluster-specific wildcarded “vanity URLs” that teams can use for quick usage and prototyping, without needing to further configure DNS, are also supported. That is also represented below.

Diagram of an application or vanity URL CNAME records resolving to the cluster record which has an alias record for the AWS load balancers in front of the cluster. TLS termination and traffic routing occurs within the cluster using Ingress Controller. Requests are being forwarded to App 1 running in the cluster.

DNS record app1.vt.edu has a CNAME for the cluster record, k8s-1.aws.clusters.platform.it.vt.edu, which has an alias record for the name of the AWS load balancers in front of the cluster. The Ingress Controller handles TLS termination and traffic routing internal to the cluster.