Smart constructors for case classes

One of the great benefits of using Scala is its type safety. If we are clear and careful about the types we use, the compiler can help guide us to a more correct solution and point out where we may be going wrong.

There are ways we can rely on the type system, and the language in general, to give us even more confidence about the code we’re trying to reason about.

Leaning on the type system

Using our knowledge of Scala and types, we can constrain a database ID to be a particular type:

case class DatabaseId(value: String)

A function that requires a database ID can now say so in its signature, which clearly distinguishes that parameter from an arbitrary string. It’s hard to mistakenly pass a string intended for something else into this function.

def retrieveRecord(id: DatabaseId): IO[User] = {
    // ...
}
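
To illustrate the point, here is a minimal sketch in which a second wrapper type sits alongside DatabaseId. EmailAddress is a hypothetical type that does not appear in the original example, and the lookup is simplified to an in-memory Map returning Option[String] instead of IO[User], purely to keep the sketch self-contained.

```scala
// Two wrapper types around String. EmailAddress is a hypothetical
// addition for illustration only.
case class DatabaseId(value: String)
case class EmailAddress(value: String)

// Simplified stand-in for a real data store (assumption: the real
// function would return IO[User]).
val records: Map[DatabaseId, String] =
    Map(DatabaseId("0123456789ab") -> "Alice")

def retrieveRecord(id: DatabaseId): Option[String] = records.get(id)

val found = retrieveRecord(DatabaseId("0123456789ab"))

// Neither of these lines would compile, because the types do not match:
// retrieveRecord("0123456789ab")
// retrieveRecord(EmailAddress("alice@example.com"))
```

The two wrappers hold the same underlying type, yet the compiler treats them as unrelated, which is exactly the protection we are after.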

We can also have Scala help with other things for this DatabaseId, such as automatically generating JSON serializers and deserializers, and even generating test data specifically for database IDs.

Is this good enough? Can we take this any further?

Bytes all the way down

Whatever our application does, there is a boundary between our code and the outside world, whether it is an application that depends on command line arguments, or an HTTP service receiving POSTed data. This is not a Scala thing; every application has this in common: the interface to the outside world is going to receive bytes in, and send bytes out.

Inside our application, we want to convert these bytes into something valid. High-level languages do part of this already, providing primitive types such as integers, floating-point numbers, and strings. Naturally, we want to have more knowledge about what these values represent. A great first step in Scala is wrapping these values in case classes. But how do we know that the data inside these case classes is correct?

Validating data

When we construct a case class, we want to ensure that the constructor parameters are valid. Imagine we are modelling the number of products in a warehouse; this could understandably be represented as an Int or Long, but we would probably never want the value to be negative, or to run into the millions or billions.

A naïve approach here might be to throw some kind of exception for invalid data when we construct an object:

case class ProductCount(count: Int) {
    require(count >= 0 && count < 1_000_000, s"$count must be non-negative and less than a million")
}
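
The sketch below shows what this looks like from the caller’s side. require throws an IllegalArgumentException when its condition is false, so the failure only surfaces at runtime; here it is captured with scala.util.Try simply to make it observable.

```scala
import scala.util.Try

case class ProductCount(count: Int) {
    require(count >= 0 && count < 1_000_000, s"$count must be non-negative and less than a million")
}

// Both lines compile; only the second one fails, and only when it runs.
val ok = Try(ProductCount(42))
val bad = Try(ProductCount(-1))
```

Nothing in the type of ProductCount(-1) warns the caller that construction can fail.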

But we don’t want to throw exceptions at runtime! Instead, we could have a flag indicating if the count is correct:

case class ProductCount(count: Int) {
    val isValid: Boolean = count >= 0 && count < 1_000_000
}

But now we have put the responsibility on the user of this class to make sure they refer to isValid anytime they need to use it. If they forget, or make a mistake understanding the logic, then the compiler isn’t going to help.
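
A short sketch of how this goes wrong. The totalStock helper is hypothetical; it never consults isValid, and nothing in the types forces it to.

```scala
case class ProductCount(count: Int) {
    val isValid: Boolean = count >= 0 && count < 1_000_000
}

// Hypothetical helper that forgets to check isValid.
def totalStock(counts: List[ProductCount]): Int = counts.map(_.count).sum

val counts = List(ProductCount(10), ProductCount(-25))

// Compiles and runs, silently including the invalid entry.
val total = totalStock(counts)
```

The negative count flows straight into the total, and the only defence is discipline on the part of every caller.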

Validating at construction time

Amusingly, we were closer to a usable solution with the runtime require validation than with our boolean flag, because the check happened when the instance was created. We can keep validating at construction time but, instead of throwing, give the user a different object: one that has the ability to represent an invalid construction.

With case classes, the compiler writes a lot of extra boilerplate for us for free, leaving us to reason about our code without the noise getting in the way. The compiler adds an intuitive equals method, a toString that makes sense, as well as apply for easy construction and unapply for use with pattern matching.

There are no constraints on the return type of the apply method. If we provide a function named apply, then the compiler will not create one for us. We can simply provide our own in the companion object, returning a different type, and do the object creation there. Revisiting our DatabaseId class, let’s assume that a database ID must contain exactly 12 characters:

case class DatabaseId(value: String)

object DatabaseId {
    def apply(value: String): Option[DatabaseId] = {
        if(value.length == 12) Some(new DatabaseId(value))
        else None
    }
}

Note here we want to call new DatabaseId - omitting new would instead defer to the apply function, and that is the exact function we’re writing!

As useful as this is, it’s still possible to circumvent this check; there is nothing to stop us from constructing a case class with new ourselves.

// both these lines compile
val validated: Option[DatabaseId] = DatabaseId("0123456789ab")
val invalid: DatabaseId = new DatabaseId("not twelve chars")

As much as we trust our colleagues to Do The Right Thing and always construct these properly, it would be better if we could protect against accidentally calling new. Are you 100% sure you would spot that in a code review 100% of the time? We can make the constructor private. While we’re here, let’s make this class final so that nobody can extend the class and circumvent our validation that way:

final case class DatabaseId private (value: String)

Now we’re guided to use the given apply function, and the value constructed using the new approach above will not compile anymore.
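
Putting the pieces so far together, here is a minimal sketch of the class and its companion, plus one way a caller might handle the result. The describe helper is made up for this illustration.

```scala
final case class DatabaseId private (value: String)

object DatabaseId {
    // Replaces the compiler-generated apply; the only public way in.
    def apply(value: String): Option[DatabaseId] =
        if (value.length == 12) Some(new DatabaseId(value))
        else None
}

// Hypothetical helper: the caller is forced to deal with both outcomes.
def describe(raw: String): String =
    DatabaseId(raw) match {
        case Some(id) => s"valid ID: ${id.value}"
        case None     => s"rejected: $raw"
    }

// new DatabaseId("not twelve chars") // no longer compiles: the constructor is private
```

Because apply returns an Option, a caller cannot get hold of a DatabaseId without acknowledging that construction may have failed.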

Removing more backdoors

We can still create invalid objects using the compiler-provided copy method:

val validated: Option[DatabaseId] = DatabaseId("0123456789ab")
val invalid = validated.map(_.copy(value = "not valid"))

So, as we did with apply, we should provide our own copy method. This time it goes on the class rather than the companion object, as copy is called on instances. We will make it private to indicate that there is simply no need to ever call this method:

final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

Now we have no way to construct a DatabaseId without it passing our validation. Another approach we may want to look at here is to make the apply method private too, and then provide a more descriptive API to indicate what is happening. The code in full now looks like this:

final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

object DatabaseId {
    private def apply(value: String): Option[DatabaseId] = {
        if(value.length == 12) Some(new DatabaseId(value))
        else None
    }

    def fromString(value: String): Option[DatabaseId] = apply(value)
}

Values are then constructed like this:

val validated: Option[DatabaseId] = DatabaseId.fromString("0123456789ab")

We still have all of the benefits of working with a case class, but added confidence that whatever our case class holds is useful and correct.
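
As a quick check of that claim, the sketch below exercises the compiler-provided equals, toString, and pattern-matching support on values built through fromString. The value names a, b, same, shown, and extracted are only for illustration.

```scala
final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

object DatabaseId {
    private def apply(value: String): Option[DatabaseId] =
        if (value.length == 12) Some(new DatabaseId(value))
        else None

    def fromString(value: String): Option[DatabaseId] = apply(value)
}

val a = DatabaseId.fromString("0123456789ab")
val b = DatabaseId.fromString("0123456789ab")

// Structural equality and a readable toString still come for free.
val same = a == b
val shown = a.map(_.toString)

// Pattern matching through the generated unapply still works.
val extracted = a.collect { case DatabaseId(v) => v }
```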

Constructing literal values

Now that we cannot construct a bare DatabaseId, this might appear to be a headache to test with literal values. But in reality, this is easy to work with without losing type safety. We simply fail a test if we have constructed an illegal literal value. Considering we are constructing this by hand, it should be clear very quickly when writing the test:

val testValue: DatabaseId = DatabaseId.fromString("0123456789ab").getOrElse(fail("Unable to construct database ID"))
// ... the rest of the test here, using testValue as a DatabaseId

Even with ScalaCheck, we can have confidence in the values we create:

val databaseIdGen: Gen[DatabaseId] = for {
    cs <- Gen.listOfN(12, Gen.hexChar)
    id <- DatabaseId.fromString(cs.mkString).fold(Gen.fail[DatabaseId])(Gen.const)
} yield id

Simply fail the generator on an invalid state, and it will try again. Now when we use this in our tests, we’re getting DatabaseIds without the distracting Option around them.

More steps

Instead of using Option, use Either to indicate how the validation has gone wrong. The Either companion has a useful function, Either.cond, which combines a boolean check with the values to return in the true and false cases. This provides a nice API to the users of the code about how to construct objects and how it can go wrong, all while remaining type-safe:

sealed trait IdError
case object BadLength extends IdError
case object InvalidCharacter extends IdError
// ... more validation error cases here...

final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

object DatabaseId {
    private def apply(value: String): Either[IdError, DatabaseId] =
        Either.cond(
            value.length == 12,
            new DatabaseId(value),
            BadLength
        )

    def fromString(value: String): Either[IdError, DatabaseId] = apply(value)
}
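
The InvalidCharacter case is declared above but not yet used. One way to wire it in is to chain the checks in a for comprehension, which stops at the first failure. The sketch below assumes that a valid ID consists only of hexadecimal characters; the original only specifies the length, so the hexChars rule is an assumption for illustration.

```scala
sealed trait IdError
case object BadLength extends IdError
case object InvalidCharacter extends IdError

final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

object DatabaseId {
    // Assumption for this sketch: only hexadecimal characters are allowed.
    private val hexChars: Set[Char] = "0123456789abcdefABCDEF".toSet

    def fromString(value: String): Either[IdError, DatabaseId] =
        for {
            _ <- Either.cond[IdError, Unit](value.length == 12, (), BadLength)
            _ <- Either.cond[IdError, Unit](value.forall(hexChars.contains), (), InvalidCharacter)
        } yield new DatabaseId(value)
}
```

Callers can then pattern match on the Left side to decide how to report each kind of failure.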

This could be used with the Validated data type from Cats to combine multiple validation checks in a functional style.
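
The practical difference is that chained Either checks stop at the first failure, whereas an accumulating style reports every failure at once. The sketch below imitates that accumulating behavior using only the standard library, so it is not the Cats API itself; the List[IdError] error channel and the hexadecimal rule are assumptions made for illustration.

```scala
sealed trait IdError
case object BadLength extends IdError
case object InvalidCharacter extends IdError

final case class DatabaseId private (value: String) {
    private def copy: Unit = ()
}

object DatabaseId {
    // Assumption for this sketch: only hexadecimal characters are allowed.
    private val hexChars: Set[Char] = "0123456789abcdefABCDEF".toSet

    // Runs every check and collects all failures instead of stopping early.
    def fromString(value: String): Either[List[IdError], DatabaseId] = {
        val lengthErrors: List[IdError] =
            if (value.length == 12) Nil else List(BadLength)
        val charErrors: List[IdError] =
            if (value.forall(hexChars.contains)) Nil else List(InvalidCharacter)
        val errors = lengthErrors ++ charErrors
        if (errors.isEmpty) Right(new DatabaseId(value)) else Left(errors)
    }
}
```

An input such as "xyz" now reports both problems together, which is often friendlier when validating user-supplied data.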

Final words

When Scala was introduced, case classes were touted as a nice departure from the boilerplate-heavy Java “POJO” approaches, allowing the developer to define a domain object in a single line and still get all of the same benefits as its Java counterpart. Here, we’re having to reintroduce some boilerplate in order to gain a little more confidence that the values we’re constructing for this type are correct. I think this is a worthwhile price to pay: you have all the validation logic in one place instead of scattering it around your codebase and trusting that users of your type are using it correctly. It’s simply not possible to create invalid values - you are in the enviable position of not being able to represent illegal states.
