Exploring Config Validation with Cue
I recently discovered Cue which, according to its website, is a language 'for defining, generating, and validating all kinds of data'. My interest in Cue comes from wanting a single mechanism for validation, generation and boilerplate reduction that will work with the various flavours of YAML and JSON present in my projects. Some of these flavours may be familiar - Docker Compose, Kubernetes, GitHub Actions - and others are custom but probably not wildly different from custom formats you may have built yourself.
Something about Cue has really piqued my interest and, while I certainly don't profess to understand everything it can do, I have a reasonable grasp of how it fits my use cases of validation, generation and boilerplate reduction.
For the purposes of this post I'll use Docker Compose as the target format since it's reasonably easy to understand even if you've never used Compose before.
A Compose file specifies a set of services, each backed by one or more containers, that the Compose orchestrator manages as a logical whole inside Docker. Consider this example docker-compose.yaml:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
This is a Compose configuration defining two services, database and cache. The database service runs the pre-built image postgres:14 and the cache service runs the pre-built image redis:6.2.
We can start by defining a basic schema for this in schema.cue:
services: [string]: #Service
#Service: {
	image: string
}
This schema says that the key services is an object with string-typed keys and values that are typed like #Service. The schema further defines #Service to be an object with a single key image that has a string-typed value.
With the cue vet command we can check that our docker-compose.yaml matches the schema:
❯ cue vet docker-compose.yaml schema.cue
Successful schema validation gives empty output, so there's not much to show here! The zero exit status confirms that the check passed:
❯ echo $?
0
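Because Cue treats schemas and data as the same kind of value, the same check can be sketched in a single Cue file, with the Compose data written directly in Cue syntax next to the schema. This is a minimal illustration that roughly mirrors what cue vet does when it unifies the YAML with the schema; the file name compose.cue is hypothetical, and running cue vet compose.cue on it should likewise succeed with no output:

```cue
// Schema, as in schema.cue.
services: [string]: #Service
#Service: {
	image: string
}

// The same data as docker-compose.yaml, written in Cue syntax.
// Both declarations of services are unified into a single value.
services: {
	database: image: "postgres:14"
	cache: image: "redis:6.2"
}
```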
Open vs. Closed Types
Not all services in a Compose project use pre-built images. Instead, some services will build their image when the project starts. Such services are defined using a build key rather than an image key:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
  app:
    build: ./app
We've added the app service, which uses build. Let's see what cue vet says:
❯ cue vet docker-compose.yaml schema.cue
services.app: field not allowed: build:
    ./docker-compose.yaml:8:6
    ./schema.cue:1:21
    ./schema.cue:2:11
Cue is telling us that the build field is not allowed. When we define a type using the #Name style, we're defining a closed type. Closed types allow exactly the fields they specify, whereas open types also allow fields beyond those that are specified.
We could turn our #Service type into an open type, but then we'd be reducing the effectiveness of the schema by allowing any extra field of any shape to be added inside a service definition. The image field would still be verified as per the schema, but extra fields would go unchecked.
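For comparison, here is a minimal sketch of that open alternative: in Cue, a trailing ... inside a definition allows arbitrary extra fields. With this version, cue vet no longer rejects build as a field that is not allowed (the app service would still be flagged for lacking image), but it would equally accept an unrelated or misspelt extra key, such as a hypothetical imgae, alongside image:

```cue
services: [string]: #Service

#Service: {
	image: string
	// The ellipsis opens the definition: any additional field of any
	// type is accepted and left unchecked.
	...
}
```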
Let's add the build field to our #Service type and see what happens:
#Service: {
	image: string
	build: string
}
When we vet this now, we get an error from Cue:
❯ cue vet docker-compose.yaml schema.cue
services.app.image: incomplete value string
services.cache.build: incomplete value string
services.database.build: incomplete value string
The schema as we've specified it requires both image and build for each service, whereas a normal Compose file will only use one per service.
Optional Fields
Cue allows fields to be defined as optional using a ? suffix on the field name. We might be tempted to make both image and build optional like so:
#Service: {
	image?: string
	build?: string
}
Although these optional fields allow our docker-compose.yaml to vet correctly, we've also made it valid for a service to have neither an image nor a build field! Thankfully, Cue has a much cleaner solution to this problem: disjunctions.
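To make the gap in the optional-field version concrete, here is a minimal sketch; the emptyService field is hypothetical and exists only for illustration. An empty service unifies with #Service without complaint, so cue vet should accept it:

```cue
#Service: {
	image?: string
	build?: string
}

// A service with neither image nor build. Because both fields are
// optional, this unifies with #Service without any error.
emptyService: #Service & {}
```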
Disjunctions
With disjunctions we're able to say that a type has either one set of fields or another. This allows us to say that #Service has either an image or a build:
#Service: {
	image: string
} | {
	build: string
}
This schema says that #Service is either (|) a type containing an image: string field or a type containing a build: string field. Because #Service is a closed definition, exactly one of these fields must be provided.
If we attempt to use both image and build like this:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
  app:
    build: ./app
    image: app:latest
Then Cue complains:
❯ cue vet docker-compose.yaml schema.cue
services.app: 2 errors in empty disjunction:
services.app: field not allowed: build:
    ./docker-compose.yaml:8:6
    ./schema.cue:1:21
    ./schema.cue:4:21
    ./schema.cue:5:32
services.app: field not allowed: image:
    ./docker-compose.yaml:9:6
    ./schema.cue:1:21
    ./schema.cue:3:21
    ./schema.cue:5:11
I like to refactor this a little to give names to the different sides of the disjunction:
_ServiceWithBuild: { build: string }
_ServiceWithImage: { image: string }
#Service: (_ServiceWithBuild | _ServiceWithImage)
The _ServiceWithBuild and _ServiceWithImage declarations are private to the schema (thanks to the _) and open (thanks to the lack of #). The resulting disjunction #Service is both a public part of the schema and a closed type. This is covered in more depth here.
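A quick way to check that the named disjuncts still produce a closed #Service is to unify it with a few values directly in Cue. This is a minimal sketch; the withImage, withBuild and withPorts fields are hypothetical test values, not part of the schema:

```cue
_ServiceWithBuild: {build: string}
_ServiceWithImage: {image: string}
#Service: (_ServiceWithBuild | _ServiceWithImage)

// Each of these matches exactly one disjunct.
withImage: #Service & {image: "postgres:14"}
withBuild: #Service & {build: "./app"}

// Uncommenting this line should make cue vet fail: #Service is closed,
// so an extra field such as ports matches neither disjunct.
// withPorts: #Service & {image: "postgres:14", ports: ["5432:5432"]}
```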
Beyond Strings
So far we've only seen the basic string type for our fields, but Cue has other types like number:
services: [string]: #Service
_ServiceWithBuild: { build: string }
_ServiceWithImage: { image: string }
#Service: (_ServiceWithBuild | _ServiceWithImage) & {
	healthcheck?: {
		retries: number
	}
}
We've added an optional healthcheck field to the #Service type using a conjunction (&). This healthcheck field is an object type with a retries field that is of type number.
If we try using a value for retries that is non-numeric, we'll get an error:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
  app:
    build: ./app
    healthcheck:
      retries: ten
Thanks to the ten here, Cue complains:
services.app.healthcheck.retries: conflicting values "ten" and number (mismatched types string and number):
    ./docker-compose.yaml:8:17
    ./schema.cue:1:21
    ./schema.cue:7:18
Constraint Types
Cue uses a fancy lattice-based type system, in which types and concrete values sit in the same hierarchy, and this leads to a natural way of introducing constraints on the allowed values in a schema. Rather than saying that retries is any number, we can say that it is >0:
services: [string]: #Service
_ServiceWithBuild: { build: string }
_ServiceWithImage: { image: string }
#Service: (_ServiceWithBuild | _ServiceWithImage) & {
	healthcheck?: {
		retries: >0
	}
}
Trying to set the value of retries to 0 leads Cue to complain, as expected:
services.app.healthcheck.retries: invalid value 0 (out of bound >0):
    ./schema.cue:7:18
    ./docker-compose.yaml:10:17
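Bounds can be combined with each other and with basic types by conjunction. Note that >0 on its own still admits fractional values such as 1.5. As a sketch, assuming we also want to reject fractional retry counts and cap the value at a hypothetical maximum of 10 (an illustrative limit, not a Compose rule):

```cue
services: [string]: #Service
_ServiceWithBuild: {build: string}
_ServiceWithImage: {image: string}
#Service: (_ServiceWithBuild | _ServiceWithImage) & {
	healthcheck?: {
		// int rules out fractional values, and the two bounds are
		// combined with it by conjunction. The upper limit of 10 is a
		// hypothetical choice for illustration only.
		retries: int & >0 & <=10
	}
}
```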
List Types
Services in Compose can depend on other services using the depends_on attribute:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
  app:
    build: ./app
    healthcheck:
      retries: 1
    depends_on:
      - database
      - cache
Here we're saying that the app service depends on both the database service and the cache service. Let's add depends_on to our schema:
#Service: (_ServiceWithBuild | _ServiceWithImage) & {
	healthcheck?: {
		retries: >0
	}
	depends_on?: [string, ...string]
}
We've made depends_on optional (with ?) and given it the type [string, ...string], which is a list of strings with at least one item. The type [string] is a list with exactly one item and the type [...string] is a list with zero or more items.
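The three list forms can be compared side by side. This is a minimal sketch with hypothetical field names; each field is declared twice so that Cue unifies the list type with a concrete value, and the commented-out lines show values that should be rejected:

```cue
exactlyOne: [string]
exactlyOne: ["database"]

zeroOrMore: [...string]
zeroOrMore: []

oneOrMore: [string, ...string]
oneOrMore: ["database", "cache"]

// Each of these should fail cue vet if uncommented:
// tooMany: [string] & ["database", "cache"] // more than one item
// tooFew: [string, ...string] & []          // needs at least one item
```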
Self Reference
We've added depends_on to the schema, but we still have a gap in our validation logic: we allow any string to be present in the dependency list. Ideally, we want to allow only the names of services in the dependency list.
We can do this by defining a custom #Dependable type for the elements of our depends_on list type:
services: [string]: #Service
#Dependable: or([for k,v in services { k }])
_ServiceWithBuild: { build: string }
_ServiceWithImage: { image: string }
#Service: (_ServiceWithBuild | _ServiceWithImage) & {
	healthcheck?: {
		retries: >0
	}
	depends_on?: [#Dependable, ...#Dependable]
}
The definition of depends_on now points to the #Dependable type, which has a definition we've not seen so far. Let's break it down step-by-step.
The stanza [for k,v in services { k }] is a list comprehension that iterates over the keys k and values v in our services object. For each key/value pair, the result of the brace-delimited block (here, just the key k) is added to the output list. In our case, we get the list ["database", "cache", "app"].
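The comprehension can be tried in isolation. In this minimal sketch the Compose data is written directly in Cue and serviceNames is a hypothetical field; evaluating the file with cue eval should show serviceNames as ["database", "cache", "app"]:

```cue
services: {
	database: image: "postgres:14"
	cache: image: "redis:6.2"
	app: build: "./app"
}

// Iterates over the fields of services in declaration order and emits
// each key. v is bound to each field's value but is unused here.
serviceNames: [for k, v in services {k}]
```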
The or(...) builtin turns the list into a disjunction of its elements. In our case this gives us the disjunction "database" | "cache" | "app": a string that is exactly one of those three values.
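The same builtin can be exercised on a literal list. In this minimal sketch the names, #Name, ok and typo fields are hypothetical and exist only to show how the resulting disjunction behaves:

```cue
names: ["database", "cache", "app"]

// or turns the list into the disjunction "database" | "cache" | "app".
#Name: or(names)

// Unifies with the "cache" disjunct, so ok is "cache".
ok: #Name & "cache"

// Uncommenting this line should fail: "cach" matches no disjunct.
// typo: #Name & "cach"
```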
With this schema, if we make a mistake in our depends_on list like this:
services:
  database:
    image: postgres:14
  cache:
    image: redis:6.2
  app:
    build: ./app
    healthcheck:
      retries: 1
    depends_on:
      - database
      - cach
Then Cue will complain that cach doesn't match the disjunction:
services.app.depends_on.1: 3 errors in empty disjunction:
services.app.depends_on.1: conflicting values "app" and "cach":
    ./docker-compose.yaml:12:10
    ./schema.cue:1:21
    ./schema.cue:3:18
    ./schema.cue:3:40
    ./schema.cue:11:35
services.app.depends_on.1: conflicting values "cache" and "cach":
    ./docker-compose.yaml:12:10
    ./schema.cue:1:21
    ./schema.cue:3:18
    ./schema.cue:3:40
    ./schema.cue:11:35
services.app.depends_on.1: conflicting values "database" and "cach":
    ./docker-compose.yaml:12:10
    ./schema.cue:1:21
    ./schema.cue:3:18
    ./schema.cue:3:40
    ./schema.cue:11:35
Closing Thoughts
I'm still in the discovery phase with Cue, but I'm already seeing the power of the system. In particular, the combination of disjunction types and constraint types seems to be a natural fit for many of the config formats I find myself dealing with on a daily basis.
Much of the true power in Cue comes from the synthesis of schema validation and config generation, a topic I plan to cover in a future post.