Dependent and refinement types: why?

Dependent and refinement types: why?

Here we are again. When you thought that sum and product, in essence algebraic data types, were the final step on the ladder, new names pop up. But why are we so fond of (strong) types in our programs? And what do dependent and refinement types bring to the table?

Making implicits explicit

One common theme in statically typed functional programming is being explicit about which errors a function may raise. That is, instead of having a function:

lookupPerson :: Name -> Person

that may fail raising a PersonNotFoundException, we prefer something along the lines of:

lookupPerson :: Name -> Maybe Person
lookupPerson :: Name -> Either PersonLookupError Person

where the choice between the two depends on whether we also want to report back which kind of error occurred.

Another important concern is explicitness about side effects: we want to know whether a certain function is completely pure, or, on the contrary, may perform other actions in the system. Haskell actually forces this separation; the type of an operation like reading a line is tagged with IO:

readLine :: IO String

In other functional languages, like Scala or Kotlin, this separation is not enforced by the compiler. However, library support for it is growing, as exemplified by Cats Effect or Arrow Fx. There is even a trend to break this binary distinction “pure against side-effectful” into smaller pieces: you can tag a function as performing database queries, but nothing else. Check Polysemy if you are interested.

The commonality in these examples is that we had made some information about the function explicit by using types: Maybe and IO are warning signs of “possible errors” or “may perform side effects”. Although there is already some value as documentation, the main gain is that the compiler can perform more checks to ensure that we have not obviated the warnings. For example, we cannot continue working on the Person obtained from lookupPerson until we unravel whether it was successful or not:

case lookupPerson "Alex" of
  Nothing -> ... -- something went wrong
  Just p  -> ... -- here we have our Person

That is the whole game. The more properties about our data or behavior we can express with our types, the more these can be automatically enforced by the compiler. They are also cross-boundary: whereas unit or integration tests usually rely on the environment in which the function is tested, properties expressed by types still hold when you compose your functions into bigger components.

Let me share with you a third example of what we can achieve with a simple “type tag.” A very common need when working with incoming data is to sanitize it. Think about escaping strings that will end up as part of an SQL query, or special symbols from HTML. This tag is changed once we call a special function:

sanitize :: Input NotSanitized -> Input Sanitized

Furthermore, if we only allow inputs with type Input Sanitized when creating our HTML views, we can be completely sure that sanization has taken placed.

Dependencies between values

This view on types can get us, and has gotten us, quite far in enforcing properties in a program. But it falls short when we want to talk about relations or dependencies between values. Consider an indexing function that is given a list or array, and a position to get: whether such a call is correct depends on whether the position is within the bounds of the array, a relation between both inputs.

LiquidHaskell (an extension of Haskell) and Agda (a language with whom it shares quite a bit of syntax) let us write that relation in different ways (let me talk about the two Agda ones later). The important bit is that we give names to the arguments (xs for the list, n for the index) so we can talk about their relation later.

-- LiquidHaskell
get :: (xs : [a]) -> { n : Int | 0 <= n && n < length xs } -> { v : a | v `elem` xs }
-- Agda intrinsic
get :: (xs : List a l) -> Fin l -> a
-- Agda extrinsic
get :: (xs : List a l) -> (n : Nat) -> (inBounds n l) -> a

Functions are a natural place to introduce dependencies, but they are not the only one. We could envision a type Sorted a for sorted lists: in that case, the “one more” constructor needs to ensure that the new value is smaller than the rest of the list.

Two-and-a-half flavors

This is the point in which refinement and dependent type systems differ. Let’s start with one-sentence summaries:

  • Refinement type systems: “each type has an attached logic formula”,
  • Dependent type systems: “values may also appear in types”.

“Logic formula” is just a fancy way to say “Boolean expression.” In the previous example, 0 <= n && n < length xs is the formula that refines the simple type Int by specifying constraints on the value. A compiler such as LiquidHaskell is able to keep track of the constraints of each value, to check that each invariant is respected. For example, if we write:

if n >= 1 then get lst n else ...

on the then branch, we are sure that n has a refined type { n : Int | n >= 1 }. If we pass it to get, we satisfy the first property (since being >= 1 implies being >= 0), but we would obtain an error because n <= length lst cannot be checked.

In a dependent type system, you can make the length of a list an intrinsic property of the list. In that case, [1, 2] would have the type List Int 2, where 2 is the actual number 2! Having values as types is necessary because we want to talk about the value itself later on. This ability also opens the possibility to have function calls in types, so you can express the type of a function such as listAppend, in which the length of the resulting list is the sum of the input lengths.

listAppend :: List a n -> List a m -> List a (n + m)

The question of how to enforce invariants has two answers in dependent types. As we have explained, we can have intrinsic properties, which implies that a “wrong” value cannot be constructed at all. This is the approach taken when we write Fin n: here, Fin is a type defined in such a way that it can only take values from 0 up to n - 1; exactly those accepted by get.

There is another possibility, which is to allow any input for n, but then require an additional proof of the fact that n is within bounds; this is called the extrinsic approach. This is not the place to dive into details, but you write a data type inBounds that codifies the desired relation between values. That way, we transform a problem more similar to a refinement to our framework.

Proving stuff to the compiler

At some point in the discussion, you can read:

since being >= 1 implies being >= 0

But how does the computer know that this is the case? Some compilers try to automate this kind of logical thinking as much as possible by using SMT solvers. SMT solvers are magical tools that can decide whether a formula is true or give you a counterexample, for a limited range of data types, including numbers and sets. But, since most properties fall into those buckets, they are invaluable in assisting us. The other option is manually stating why something is true, using some sort of formal language.

Automation tends to be the preferred option for refinement types, whereas manual proof writing is the bread and butter in dependent type systems. However, note that automation is limited, so being aware of how to write proofs is also useful when using refinement types.

The archetypal example of the compiler “not being able to see stuff” is the fact that applying reverse twice gives you the original list. We can write this property as a function in both systems:

-- LiquidHaskell
revRev :: (lst : [a]) -> { Proof | reverse (reverse lst) == lst }
-- Agda
revRev :: (lst : List a) -> reverse (reverse lst)  lst

This example also shows another interesting feature of these systems: you may not only check invariants at the moment of writing a function, but also as a separate step. This could be done to not bloat the original code, but also as a way to specify additional properties, or even to replace some (property) tests!

Learn more with us!

At 47 Degrees Academy, we are firm believers in the benefits of types and explicitness. If you want to know more about refinements and dependent types, and how to use them, check out some of these events.

And stay tuned for more!

Ensure the success of your project

47 Degrees can work with you to help manage the risks of technology evolution, develop a team of top-tier engaged developers, improve productivity, lower maintenance cost, increase hardware utilization, and improve product quality; all while using the best technologies.