Generating Envoy Config with Cue - Schema Definition
More in this series:
- Schema Definition
- Transforming Inputs
- Testing
- Language Refinements (coming soon)
- Docker Packaging (coming soon)
A few weeks ago I added an Envoy-based reverse proxy to a project at work. Envoy's configuration format is well-documented, well-structured and mostly sane, but its complexity is commensurate with the power of Envoy itself. To provide a single format for all Envoy features, Envoy forgoes expressivity in favour of flexibility. I found myself wanting a simpler configuration format focused only on my reverse proxy use case. Enter Cue.
In this series we'll see how to use Cue to create a simple config format for reverse proxying, how to transform that format into valid Envoy config, how to test that transform, how to add refinements to our language, and then finally how to package the whole thing up inside a Docker container.
Very little about this series is specific to Envoy and my hope is that you can apply the techniques to your own configuration problems.
A Simple Reverse Proxy
Consider a small reverse proxy setup with three virtual hosts, four backend services and six routes connecting them:
I've omitted the paths for the routes because the image was really messy, but I hope you get the idea. Configuring this system in Envoy requires (roughly) the following configuration:
static_resources:
  listeners:
  - name: http
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
          route_config:
            name: local_route
            virtual_hosts:
            - name: api.test
              domains:
              - api.test
              routes:
              - match:
                  prefix: /api/v2/users
                route:
                  cluster: user-service
              - match:
                  prefix: /api/v2
                route:
                  cluster: api-service
              - match:
                  prefix: /
                route:
                  cluster: monolith
            - name: web.test
              domains:
              - web.test
              routes:
              - match:
                  prefix: /users
                route:
                  cluster: frontend-users
              - match:
                  prefix: /
                route:
                  cluster: monolith
            - name: admin.test
              domains:
              - admin.test
              routes:
              - match:
                  prefix: /
                route:
                  cluster: monolith
  clusters:
  - name: user-service
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: user-service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: user-service
                port_value: 8080
  - name: api-service
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: api-service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: api-service
                port_value: 8080
  - name: frontend-users
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: frontend-users
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: frontend-users
                port_value: 8080
  - name: monolith
    connect_timeout: 15s
    type: strict_dns
    load_assignment:
      cluster_name: monolith
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: monolith
                port_value: 8080
A Cue Format for Reverse Proxying
Let's take a look at the same configuration in the custom format we'll build in this series:
package envoy

input: {
    hosts: {
        "api.test": {
            routes: [
                {prefix: "/api/v2/users", target: "user-service"},
                {prefix: "/api/v2", target: "api-service"},
                {prefix: "/", target: "monolith"},
            ]
        }
        "web.test": {
            routes: [
                {prefix: "/users", target: "frontend-users"},
                {prefix: "/", target: "monolith"},
            ]
        }
        "admin.test": {
            routes: [
                {prefix: "/", target: "monolith"},
            ]
        }
    }
    targets: [
        {name: "user-service", port: 8080},
        {name: "api-service", port: 8080},
        {name: "frontend-users", port: 8080},
        {name: "monolith", port: 8080},
    ]
}
I think this format is easier to read and more clearly expresses the intent behind reverse proxy configuration than the raw envoy.yaml.
The Input Schema
We'll start by defining a schema for our input format. The input schema serves two purposes. First, it constrains and validates the input data. Second, it provides the structural information that our transform logic will use to generate the envoy.yaml.
If you plan to use your data format a lot, and especially if you plan to share it with your team, it's worth putting some effort into the schema. The more constrained your schema, the more valuable the cue vet tool becomes while working on your configuration.
The Cue language is order-independent, which makes it easy to craft schemas top-down rather than bottom-up. Let's start our schema by defining the input element and its two children, hosts and targets:
package envoy

input: #InputSchema

#InputSchema: {
    hosts: [#VHostName]: #VHost
    targets: #Targets
}
This schema defines input, via #InputSchema, to be a struct (an object in YAML or JSON) with two required fields, hosts and targets.
Configuring Targets
The targets field is a list with at least one #Target element:
#Targets: [#Target, ...#Target]
We could define a field like targets as [...#Target], making an empty list valid, but I like the extra validation: we need at least one target, otherwise the config is incomplete.
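The difference between the two list forms can be seen in a small standalone sketch. The package name, the simplified #Target and the field names ok, empty and bad are made up for illustration and are not part of the schema in this series:

```cue
package sketch

// Simplified stand-in for the real #Target definition.
#Target: {name: string, port: int}

// At least one element: the first #Target is mandatory,
// "...#Target" then allows any number of further elements.
#NonEmpty: [#Target, ...#Target]

// Any number of elements, including zero.
#MaybeEmpty: [...#Target]

ok:    #NonEmpty & [{name: "monolith", port: 8080}]
empty: #MaybeEmpty & []

// Uncommenting the next line should make cue vet fail, because an
// empty list cannot supply the mandatory first element.
// bad: #NonEmpty & []
```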
Each #Target is a struct with exactly two required fields, name and port:
#Target: {name: #TargetName, port: >0 & <= 65_535}
#TargetName: string
The port field is defined as >0 & <= 65_535, which is the type of all numbers greater than zero and less than or equal to 65,535. You could choose to disallow privileged ports (those below 1024) by setting port to >=1024 & <= 65_535.
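One thing to be aware of is that bounds such as >0 constrain the number type, so on their own they also admit fractional values like 80.5. A possible tightening, shown here as a sketch with hypothetical names (#Port and #UnprivilegedPort do not appear in the schema above), is to unify the bounds with int:

```cue
package sketch

// Hypothetical stricter port type: int rules out values such as 80.5,
// which the bounds alone would accept.
#Port: int & >0 & <=65_535

// Hypothetical variant that also excludes the privileged range 0-1023.
#UnprivilegedPort: int & >=1024 & <=65_535

http: #Port & 80
app:  #UnprivilegedPort & 8080

// Uncommenting either line should make cue vet fail:
// fractional: #Port & 80.5
// privileged: #UnprivilegedPort & 80
```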
Configuring Hosts
The hosts field is a struct whose keys are of type #VHostName and whose values are of type #VHost. We've set #VHostName to be string. This indirection is entirely optional. We could just use string directly for the keys inside hosts, but I like to assign a descriptive name to types where possible.
#VHostName: string
#VHost: {
    routes: [#Route, ...#Route]
}
A #VHost has a routes field which contains at least one #Route. Again, we could choose to allow routes to be an empty list but, since a virtual host isn't much good without at least one route, I prefer to require at least one.
Defining Routes
The #Route type is the most interesting part of the input schema:
#Route: #PathRoute | #PrefixRoute | #RegexRoute
#PathRoute: {path: #Path, target: #ValidTargetName}
#PrefixRoute: {prefix: #Prefix, target: #ValidTargetName}
#RegexRoute: {regex: #Regex, target: #ValidTargetName}
We have three kinds of route: path, prefix and regex. Each kind of route looks at the path of the incoming request. If the route 'matches' then the request is proxied to the configured target.
The meaning of 'matches' differs by route type. For path routes it's an exact match against the incoming request path. For prefix routes, if the configured prefix is a prefix of the incoming request path then we have a match. For regex routes, if the configured regex matches the incoming request path then we have a match.
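To make the three kinds concrete, here is a small standalone sketch with one route of each kind. The route types are simplified stand-ins (target is a plain string so the example needs no targets list), and the /healthz and /users/[0-9]+ values are invented for illustration:

```cue
package sketch

// Simplified stand-ins for #PathRoute, #PrefixRoute and #RegexRoute.
#Route: {path: string, target: string} |
	{prefix: string, target: string} |
	{regex: string, target: string}

routes: [...#Route]
routes: [
	// Path route: matches only the exact request path /healthz.
	{path: "/healthz", target: "monolith"},
	// Prefix route: matches /api/v2, /api/v2/orders and so on.
	{prefix: "/api/v2", target: "api-service"},
	// Regex route: matches request paths such as /users/42.
	{regex: "/users/[0-9]+", target: "user-service"},
]
```

Because definitions are closed, a struct that mixes kinds, for example one with both path and prefix, should be rejected by every branch of the disjunction.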
The #Path, #Prefix and #Regex types describe what each of these inputs can look like:
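#Prefix: =~"^/[/A-Za-z0-9\\-]*$" // anchored: =~ otherwise matches anywhere in the string
#Path: =~"^/[/A-Za-z0-9\\-]*$" // digits allowed so that paths like /api/v2 validate
#Regex: string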
For #Prefix and #Path we could have used plain string. However, a valid path always starts with / and, by using a regex, we can encapsulate that constraint in our schema.
I wasn't brave enough to tackle writing a regex to match other regexes, so let's just leave regex as the string type!
It's worth highlighting here the commonality between an expression like >0 and a regex. The type >0 further constrains the number type and a regex further constrains the string type. Cue uses a lattice-based typing system which is worthy of a series of posts on its own, but you can read more here.
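A rough way to picture this narrowing is to declare the same field several times. Cue unifies the declarations, so each one can only make the field more specific. The field names below are illustrative and separate from the schema in this post:

```cue
package sketch

// Each declaration narrows the previous ones; Cue unifies them all.
port: number   // any number
port: >0       // ... that is positive
port: <=65_535 // ... and no larger than 65,535
port: 8080     // a single concrete value that satisfies every constraint

// A regex narrows string in the same way that a bound narrows number.
path: string
path: =~"^/"
path: "/users"
```

Replacing the last port line with a value such as 0 should produce a conflict, because no value can satisfy both 0 and >0.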
Pointing Routes at Targets
The route types have a target field with type #ValidTargetName. We could have defined target with type #TargetName, but that leaves open the possibility that routes point to 'targets' that are not actually defined in the configuration.
With the #ValidTargetName type we constrain routes to point only at targets that are defined in the configuration:
#ValidTargetName: or([ for t in input.targets {t.name} ])
The or builtin takes a list and turns it into a disjunction type, that is, a type whose values are taken from a fixed set. For example, the disjunction type 1 | 2 has values 1 and 2.
In our case, or turns the list of target names into the type "user-service" | "api-service" | "frontend-users" | "monolith".
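The same idea in a standalone sketch, using a shortened targets list and the hypothetical names #Name, good and bad:

```cue
package sketch

targets: [
	{name: "api-service"},
	{name: "monolith"},
]

// The comprehension collects the names into a list; or() then turns that
// list into the disjunction "api-service" | "monolith".
#Name: or([for t in targets {t.name}])

good: #Name & "monolith"

// Uncommenting the next line should make cue vet fail, because "billing"
// is not one of the declared target names.
// bad: #Name & "billing"
```

Since the disjunction is computed from the data, adding a target to the list automatically makes its name a valid route target.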
Validating Input with the Schema
With our schema in place, we can validate our proposed input format. With the input data in input.cue and the schema in schema.cue:
❯ cue vet input.cue schema.cue && echo "VALID"
VALID
Let's try breaking the config by giving one of our targets an invalid port number:
    targets: [
        {name: "user-service", port: 8080},
        {name: "api-service", port: 8080},
        {name: "frontend-users", port: 8080},
        {name: "monolith", port: 88888080},
    ]
Vetting the input gives us the expected error:
❯ cue vet input.cue schema.cue && echo "VALID"
#VHost.routes.0: 1 errors in empty disjunction:
input.targets.3.port: invalid value 88888080 (out of bound <=65535):
./schema.cue:18:41
./input.cue:28:34
What's Next?
With our input schema in place, we can now move on to the transformation logic that will convert our input data into a valid envoy.yaml file. More on that in part two.