Rob Harrop

Generating Envoy Config with Cue - Schema Definition

More in this series:

  1. Schema Definition
  2. Transforming Inputs
  3. Testing
  4. Language Refinements (coming soon)
  5. Docker Packaging (coming soon)

A few weeks ago I added an Envoy-based reverse proxy to a project at work. Envoy's configuration format is well-documented, well-structured and mostly sane, but its complexity is commensurate with the power of Envoy itself. To provide a single format for all Envoy features, Envoy forgoes expressivity in favour of flexibility. I found myself wanting a simpler configuration format focused only on my reverse proxy use case. Enter Cue.

In this series we'll see how to use Cue to create a simple config format for reverse proxying, how to transform that format into valid Envoy config, how to test that transform, how to add refinements to our language, and then finally how to package the whole thing up inside a Docker container.

Very little about this series is specific to Envoy and my hope is that you can apply the techniques to your own configuration problems.

A Simple Reverse Proxy

Consider a small reverse proxy setup with three virtual hosts, four backend services and six routes connecting them:

A basic Envoy reverse proxy setup

I've omitted the paths for the routes because the image was really messy, but I hope you get the idea. Configuring this system in Envoy requires (roughly) the following configuration:

static_resources:
  listeners:
  - name: http
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
          route_config:
            name: local_route
            virtual_hosts:
            - name: api.test
              domains:
              - api.test
              routes:
              - match:
                  prefix: /api/v2/users
                route:
                  cluster: user-service
              - match:
                  prefix: /api/v2
                route:
                  cluster: api-service
              - match:
                  prefix: /
                route:
                  cluster: monolith
            - name: web.test
              domains:
              - web.test
              routes:
              - match:
                  prefix: /users
                route:
                  cluster: frontend-users
              - match:
                  prefix: /
                route:
                  cluster: monolith
            - name: admin.test
              domains:
              - admin.test
              routes:
              - match:
                  prefix: /
                route:
                  cluster: monolith
  clusters:
  - name: user-service
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: user-service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: user-service
                port_value: 8080
  - name: api-service
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: api-service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: api-service
                port_value: 8080
  - name: frontend-users
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: frontend-users
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: frontend-users
                port_value: 8080
  - name: monolith
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: monolith
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: monolith
                port_value: 8080

A Cue Format for Reverse Proxying

Let's take a look at the same configuration in the custom format we'll build in this series:

package envoy

input: {
	hosts: {
		"api.test": {
			routes: [
				{prefix: "/api/v2/users", target: "user-service"},
				{prefix: "/api/v2", target: "api-service"},
				{prefix: "/", target: "monolith"},
			]
		}
		"web.test": {
			routes: [
				{prefix: "/users", target: "frontend-users"},
				{prefix: "/", target: "monolith"},
			]
		}
		"admin.test": {
			routes: [
				{prefix: "/", target: "monolith"},
			]
		}
	}
	targets: [
		{name: "user-service", port: 8080},
		{name: "api-service", port: 8080},
		{name: "frontend-users", port: 8080},
		{name: "monolith", port: 8080},
	]
}

I think this format is easier to read and more clearly expresses the intent behind reverse proxy configuration than the raw envoy.yaml.

The Input Schema

We'll start by defining a schema for our input format. The input schema serves two purposes. Firstly, it constrains and validates the input data. Secondly, it provides the structural information that our transform logic will use to generate the envoy.yaml.

If you plan to use your data format a lot, and especially if you plan to share it with your team, it's worth putting some effort into the schema. The more constrained your schema, the more valuable the cue vet tool becomes while working on your configuration.

The Cue language is order-independent, which makes it easy to craft schemas top-down rather than bottom-up. Let's start our schema by defining the input field and its two children, hosts and targets:

package envoy

input: #InputSchema

#InputSchema: {
	hosts: [#VHostName]: #VHost
	targets: #Targets
}

Through #InputSchema, this schema defines input to be a struct (an object in YAML or JSON) with two required fields, hosts and targets.
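Writing the schema as definitions has a useful side effect. Cue definitions are closed, so unifying input with #InputSchema rejects any field the schema does not declare. The sketch below assumes a hypothetical file, bad.cue, vetted with schema.cue in place of input.cue. The misspelled field makes cue vet fail instead of being silently ignored:

```cue
package envoy

// Hypothetical bad.cue, vetted with schema.cue in place of input.cue.
// #InputSchema is a definition and therefore closed, so the misspelled
// field below is rejected by `cue vet` rather than silently ignored.
input: {
	tragets: [{name: "monolith", port: 8080}] // typo for "targets"
}
```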

Configuring Targets

The targets field is a list with at least one #Target element:

#Targets: [#Target, ...#Target]

We could define a field like targets as [...#Target], making an empty list valid, but I like the extra validation: without at least one target, the config is incomplete.
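Here is a sketch of the non-empty requirement in action. It assumes a hypothetical empty.cue vetted with schema.cue in place of input.cue. Because [#Target, ...#Target] demands a first element, an empty list cannot unify with it and cue vet reports an error:

```cue
package envoy

// Hypothetical empty.cue, vetted with schema.cue in place of input.cue.
// #Targets is [#Target, ...#Target], which requires at least one element,
// so an empty targets list fails validation.
input: targets: []
```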

Each #Target is a struct with exactly two required fields, name and port:

#Target: {name: #TargetName, port: >0 & <= 65_535}
#TargetName: string

The port field is defined as >0 & <= 65_535, which is the type of all numbers greater than zero and less than or equal to 65,535. You could choose to disallow privileged ports (0 to 1023) by setting port to >=1024 & <= 65_535.
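The following refinement is not used elsewhere in this post. On its own, >0 & <= 65_535 admits any number in range, including non-integers such as 80.5. Unifying the bound with Cue's int type restricts port to whole numbers. A sketch of the adjusted #Target, intended to replace the one above:

```cue
package envoy

// Sketch of a stricter #Target (a replacement for the definition above).
// Unifying with int rejects fractional values such as 80.5, which
// >0 & <= 65_535 on its own would accept.
#Target: {name: #TargetName, port: int & >0 & <=65_535}
```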

Configuring Hosts

The hosts field is a struct whose keys are of type #VHostName and whose values are of type #VHost. We've set #VHostName to be string. This indirection is entirely optional. We could just use string directly for the keys inside hosts, but I like to assign a descriptive name to types where possible.

#VHostName: string
#VHost: {
	routes: [#Route, ...#Route]
}

A #VHost has a routes field which contains at least one #Route. Again, we could choose to allow routes to be an empty list but, since a virtual host isn't much good without at least one route, I prefer to require at least one route.
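The #VHostName indirection also leaves room for tightening host names later. Every key of hosts is checked against #VHostName, and keys that fail the pattern are not allowed in the closed #InputSchema. The sketch below shows a hypothetical replacement for #VHostName: string. The specific pattern is an illustrative assumption, not a complete hostname validator:

```cue
package envoy

// Hypothetical refinement: replace `#VHostName: string` with a pattern.
// Host keys that do not match are rejected by the closed #InputSchema.
// The pattern (lowercase letters, digits, dots, hyphens) is only an
// illustrative assumption, not a full hostname check.
#VHostName: =~"^[a-z0-9.-]+$"
```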

Defining Routes

The #Route type is the most interesting part of the input schema:

#Route: #PathRoute | #PrefixRoute | #RegexRoute

#PathRoute: {path: #Path, target: #ValidTargetName}
#PrefixRoute: {prefix: #Prefix, target: #ValidTargetName}
#RegexRoute: {regex: #Regex, target: #ValidTargetName}

We have three kinds of route: path, prefix and regex. Each kind of route looks at the path of the incoming request. If the route 'matches', the request is proxied to the configured target.

The meaning of 'matches' differs by route type. For path routes, it's an exact match against the incoming request path. For prefix routes, we have a match if the configured prefix is a prefix of the incoming request path. For regex routes, we have a match if the configured regex matches the incoming request path.
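Each route definition is closed, so the #Route disjunction also stops a single route from mixing match kinds. The sketch below assumes a hypothetical scratch file, routes.cue, vetted together with input.cue and schema.cue. The field names okRoute and badRoute are illustrative:

```cue
package envoy

// Hypothetical routes.cue, vetted together with input.cue and schema.cue.
// A single prefix key satisfies only the #PrefixRoute disjunct.
okRoute: #Route & {prefix: "/api", target: "monolith"}

// Mixing match kinds fails every disjunct, because each closed route
// definition rejects the extra key. Uncomment to see `cue vet` fail.
// badRoute: #Route & {prefix: "/", path: "/exact", target: "monolith"}
```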

The #Path, #Prefix and #Regex types constrain what each of these inputs can look like:

#Prefix: =~ "\\^?/[/A-Za-z\\-]*"
#Path: =~ "/[/A-Za-z\\-]*"
#Regex: string

For #Prefix and #Path we could have used plain string. However, a valid path always starts with / and, by using a regex, we can encapsulate that constraint in our schema.
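One caveat is worth flagging. Cue's =~ operator uses Go's RE2 regular expressions and succeeds if the pattern matches anywhere in the string. An unanchored pattern therefore does not by itself guarantee a leading /. The sketch below shows anchored replacements. Adding 0-9 to the character class is an assumption made so that inputs such as "/api/v2" stay valid:

```cue
package envoy

// Sketch of anchored replacements for #Prefix and #Path.
// Unanchored, "/[/A-Za-z\\-]*" is satisfied by "api/v2" (no leading
// slash), since a match anywhere in the string is enough.
// ^ and $ make the leading slash and the character class apply to the
// whole value; 0-9 is added so that "/api/v2" remains valid.
#Prefix: =~"^/[/A-Za-z0-9\\-]*$"
#Path:   =~"^/[/A-Za-z0-9\\-]*$"
```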

I wasn't brave enough to tackle writing a regex to match other regexes, so let's just leave #Regex as plain string!

It's worth highlighting the commonality between an expression like >0 and a regex: the type >0 further constrains the number type, and a regex further constrains the string type. Cue uses a lattice-based type system, which is worthy of a series of posts on its own, but you can read more here.
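A small illustration of that idea may help. Because bounds, regexes and concrete values all live in the same lattice, declaring a field several times unifies the declarations into their common value. This is a standalone scratch example, and the field name example is hypothetical:

```cue
package envoy

// Scratch example, not part of the schema: declaring a field several
// times unifies the declarations, narrowing the set of allowed values.
example: {
	port: >0
	port: <=65_535
	port: 8080 // lies within both bounds, so port evaluates to 8080

	path: string
	path: =~"^/"
	path: "/users" // a string that also matches the regex
}
```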

Pointing Routes at Targets

The route types have a target field with type #ValidTargetName. We could have defined target with type #TargetName, but that would leave open the possibility of routes pointing at 'targets' that are not actually defined in the configuration.

With the #ValidTargetName type we constrain routes to point only at targets that are defined in the configuration:

#ValidTargetName: or([ for t in input.targets {t.name} ])

The or builtin takes a list and turns it into a disjunction type, that is, a type whose values are taken from a fixed set. For example, the disjunction type 1 | 2 has values 1 and 2.

In our case, or turns the list of target names into the type "user-service" | "api-service" | "frontend-users" | "monolith".
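You can see the constraint at work by unifying #ValidTargetName with individual names. The sketch below assumes a hypothetical scratch file, check.cue, vetted together with input.cue and schema.cue. The field names okTarget and badTarget are illustrative:

```cue
package envoy

// Hypothetical check.cue, vetted together with input.cue and schema.cue.
// "monolith" is one of the names collected from input.targets.
okTarget: #ValidTargetName & "monolith"

// "billing" is not a defined target, so no disjunct matches.
// Uncomment to see `cue vet` fail.
// badTarget: #ValidTargetName & "billing"
```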

Validating Input with the Schema

With our schema in place, we can validate our proposed input format. With the input data in input.cue and the schema in schema.cue:

❯ cue vet input.cue schema.cue && echo "VALID"
VALID

Let's try breaking the config by giving one of our targets an invalid port number:

targets: [
	{name: "user-service", port: 8080},
	{name: "api-service", port: 8080},
	{name: "frontend-users", port: 8080},
	{name: "monolith", port: 88888080},
]

Vetting the input gives us the expected error:

❯ cue vet input.cue schema.cue && echo "VALID"
#VHost.routes.0: 1 errors in empty disjunction:
input.targets.3.port: invalid value 88888080 (out of bound <=65535):
./schema.cue:18:41
./input.cue:28:34

What's Next?

With our input schema in place, we can now move on to transformation logic that will convert our input data into a valid envoy.yaml file. More on that in part two.

Get the Code

Code for this post is available at: https://github.com/robharrop/cue-envoy/