Unexpected Generated Data: A Common Situation with ScalaCheck

Once you’ve been using ScalaCheck for a while, you might have seen a problem where your generators appear to be producing unanticipated data. Imagine you have a generator that produces even, positive numbers:

import org.scalacheck.Gen

val evenInts: Gen[Int] = Gen.posNum[Int].filter(_ % 2 == 0)

A test using this generator works as you would expect:

property("round-trip") = forAll(evenInts) { i =>
    (i / 2) * 2 == i
}
[info] + IntProps.round-trip: OK, passed 100 tests.

This generator only produces even, positive numbers. If we introduce a bug, however, things do not work exactly as you might expect:

property("round-trip-fail") = forAll(evenInts) { i =>
    ((i-1) / 2) * 2 == i
}
[info] ! IntProps.round-trip-fail: Falsified after 0 passed tests.
[info] > ARG_0: 1
[info] > ARG_0_ORIGINAL: 2

So ScalaCheck found the bug, great! But look at the reported data: there are two lines. ARG_0_ORIGINAL is the original value that caused the test to fail, and ARG_0 is the shrunk value, the simplest failing input ScalaCheck could derive from it. But wait a second! ARG_0 was 1! ScalaCheck ran our test with an odd number, a number we had explicitly told our generator not to provide. What gives?

Shrinking

ScalaCheck has some useful behavior: when it finds a failing test case, it tries to find the “smallest” (or perhaps “simplest”) value that still causes the test to fail. This can make debugging easier. Imagine your algorithm fails when the length of a list is larger than, say, five elements. If ScalaCheck’s first failing input is a list with 2,000 elements in it, it may take you a while to work out what’s going on and what the underlying issue is. So ScalaCheck shrinks the input, effectively binary searching between passing and failing inputs, to see where the boundary lies.
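
To make the shrinking loop concrete, here is a minimal plain-Scala sketch. It is not ScalaCheck’s implementation: halvingCandidates is a hypothetical candidate function loosely modelled on a halving integer shrinker (try 0 first, then values moving back towards the failing value), and shrinkLoop is a hypothetical greedy loop that moves to the first candidate that still fails until none do. Real shrinkers return a lazy Stream; a List keeps the sketch simple.

```scala
object ShrinkSketch {
  // Hypothetical candidate function (not ScalaCheck's code): try 0 first,
  // then n - n/2, n - n/4, ... each followed by its negation, moving ever
  // closer to the failing value n.
  def halvingCandidates(n: Int): List[Int] =
    if (n == 0) Nil
    else {
      val halves = Iterator.iterate(n / 2)(_ / 2).takeWhile(_ != 0).toList
      0 :: halves.map(n - _).flatMap(x => List(x, -x))
    }

  // Greedy shrink loop: replace the failing value with the first candidate
  // that still fails, and stop when no candidate fails any more.
  @annotation.tailrec
  def shrinkLoop(failing: Int, fails: Int => Boolean): Int =
    halvingCandidates(failing).find(fails) match {
      case Some(smaller) => shrinkLoop(smaller, fails)
      case None          => failing
    }
}
```

Starting from the failing value 2 with the buggy round-trip property, the candidates are 0, 1 and -1. The property holds for 0 (integer division truncates (0 - 1) / 2 to 0), but fails for 1, and 1 has no failing candidates of its own, so ShrinkSketch.shrinkLoop(2, i => ((i - 1) / 2) * 2 != i) settles on 1: the same pair of values as ARG_0_ORIGINAL and ARG_0 in the output above.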

With our even integer test above, once ScalaCheck found a failing value, it tried to shrink the input to see if there was a smaller value that also caused the test to fail. To do this, it uses an implicit Shrink[Int], which determines which smaller or simpler values to try, based on the original failing case.

The issue here is that there is no link between our Gen[Int] and the Shrink[Int]: the only thing they have in common is the type Int, so there is no way for the Shrink to know that it should respect the values allowed by the generator. Some see this as a fundamental flaw in ScalaCheck’s approach to shrinking.
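
The disconnect is easy to see in plain Scala. In this sketch (hypothetical, not ScalaCheck’s code) the shrinker is just a function from a failing Int to candidate Ints. Nothing passes it the generator’s _ % 2 == 0 predicate, so for the even value 4 it happily proposes odd candidates unless someone filters them.

```scala
object DisconnectSketch {
  // The generator's constraint: it belongs to the Gen, not to any shrinker.
  val isEven: Int => Boolean = _ % 2 == 0

  // Hypothetical halving-style shrinker for Int (not ScalaCheck's code).
  // It receives only the failing Int, never the generator's predicate.
  def shrinkInt(n: Int): List[Int] =
    if (n == 0) Nil
    else {
      val halves = Iterator.iterate(n / 2)(_ / 2).takeWhile(_ != 0).toList
      0 :: halves.map(n - _).flatMap(x => List(x, -x))
    }

  // What the shrinker proposes for the even failing value 4.
  val proposed: List[Int] = shrinkInt(4)

  // What we would get only if the shrinker could consult the generator.
  val respectingGenerator: List[Int] = proposed.filter(isEven)
}
```

Here proposed is List(0, 2, -2, 3, -3): the odd candidates 3 and -3 appear because nothing connects shrinkInt to isEven. Filtering by the predicate removes them, although 0 and -2 survive, since evenness is only part of what the generator guarantees.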

Fixing our issue

There are at least three ways you can stop this from happening:

Turn off shrinking

You can replace your forAll call with forAllNoShrink: it works the same way, but it does not try to find a smaller failing value, so it reports the first failure immediately.

import org.scalacheck.Prop.forAllNoShrink

property("round-trip-fail") = forAllNoShrink(evenInts) { i =>
    ((i-1) / 2) * 2 == i
}
[info] ! IntProps.round-trip-fail: Falsified after 0 passed tests.
[info] > ARG_0: 2

There is no ARG_0_ORIGINAL line in our test log anymore.

If you wish to turn off shrinking, this is how you should do it. However, if you are using ScalaTest, there is no forAllNoShrink provided. In fact, ScalaTest’s author has explicitly said that he does not wish to add forAllNoShrink methods to the library. So what are the other options?

Provide your own shrinker that does nothing

We’ve established that ScalaCheck looks for an implicit Shrink, so let’s provide our own. Given a failing value, a Shrink produces a Stream of shrunken candidate values to try, so we can provide one that always returns an empty stream:

import org.scalacheck.Shrink

implicit val noShrinkInt: Shrink[Int] = Shrink(_ => Stream.empty)

property("round-trip-fail") = forAll(evenInts) { i =>
    ((i-1) / 2) * 2 == i
}
[info] ! IntProps.round-trip-fail: Falsified after 0 passed tests.
[info] > ARG_0: 2

This will never shrink Ints while that implicit is in scope, so we need to be careful we’re not affecting any other tests.

We can go one step further and turn off shrinking for all types:

implicit def noShrink[A]: Shrink[A] = Shrink(_ => Stream.empty)
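
Whether we use forAllNoShrink or a no-op Shrink, the effect is the same: the shrinking loop has no candidates to try, so the first failing value is reported unchanged. The following is a minimal plain-Scala sketch of that, assuming a hypothetical SketchShrink wrapper around A => List[A]; ScalaCheck’s real Shrink[A] returns a lazy Stream[A] instead.

```scala
object NoShrinkSketch {
  // Hypothetical stand-in for a shrinker: failing value => candidates to try.
  final case class SketchShrink[A](candidates: A => List[A])

  // Same idea as Shrink(_ => Stream.empty): never proposes a candidate.
  def noShrink[A]: SketchShrink[A] = SketchShrink[A](_ => Nil)

  // Greedy loop: move to the first candidate that still fails; otherwise stop.
  @annotation.tailrec
  def minimise[A](failing: A, fails: A => Boolean, shrink: SketchShrink[A]): A =
    shrink.candidates(failing).find(fails) match {
      case Some(next) => minimise(next, fails, shrink)
      case None       => failing
    }
}
```

With noShrink[Int], minimising the failing value 2 against the buggy property simply returns 2, matching the ARG_0: 2 output seen with shrinking disabled; because noShrink is generic in A, the same holds for any type.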

There is a slightly more granular approach we can take to have control over what happens with our generated values:

Use a new type

Instead of generating Ints, we generate EvenInts:

case class EvenInt(private val i: Int)

object EvenInt {
    def value(i: Int): Option[EvenInt] = if(i % 2 == 0) Some(EvenInt(i)) else None
}

We’re not here to start a discussion on smart constructors, but this allows us to have some confidence that an EvenInt will only contain an even integer:

scala> EvenInt.value(4)
val res0: Option[EvenInt] = Some(EvenInt(4))

scala> EvenInt.value(5)
val res1: Option[EvenInt] = None

Now we can update our generator and our test:

val newEvenInts: Gen[EvenInt] = Gen.posNum[Int].flatMap { i =>
    EvenInt.value(i).fold[Gen[EvenInt]](Gen.fail)(Gen.const)
}

property("even-int") = forAll(newEvenInts) { case EvenInt(i) =>
    ((i-1) / 2) * 2 == i
}

And our test fails again as we’d expect, this time with no shrinking: there is no Shrink written for our brand new type, so ScalaCheck falls back to its low-priority default shrinker, which never proposes any smaller values:

[info] ! IntProps.even-int: Falsified after 0 passed tests.
[info] > ARG_0: EvenInt(2)

We can now go one step further and provide our own shrinker for EvenInt, which shrinks the underlying Int and keeps only the candidates that are still even:

implicit def shrinkEvenInt(implicit si: Shrink[Int]): Shrink[EvenInt] = Shrink {
    case EvenInt(ei) =>
        si
         .shrink(ei)
         .map(EvenInt.value)
         .collect { case Some(i) => i }
}

With this shrinker in scope, the failing value is shrunk again:

[info] ! IntProps.even-int: Falsified after 0 passed tests.
[info] > ARG_0: EvenInt(2)
[info] > ARG_0_ORIGINAL: EvenInt(4)

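We will never see any odd shrunken values. Note, however, that EvenInt.value also accepts zero and negative even numbers, so this shrinker may still try values such as EvenInt(0) or EvenInt(-2) that our positive-number generator would never produce.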
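
As a check on the reasoning, the same greedy loop can be replayed in plain Scala for EvenInt. This sketch copies the article’s EvenInt and EvenInt.value; intCandidates (a halving-style stand-in for ScalaCheck’s Shrink[Int]), evenCandidates and shrinkLoop are hypothetical helpers for illustration, not library code.

```scala
object EvenIntShrinkSketch {
  // The article's newtype and smart constructor.
  case class EvenInt(private val i: Int)

  object EvenInt {
    def value(i: Int): Option[EvenInt] = if (i % 2 == 0) Some(EvenInt(i)) else None
  }

  // Hypothetical stand-in for Shrink[Int]: 0, then n - n/2, n - n/4, ...
  // with negations interleaved. Not ScalaCheck's actual implementation.
  def intCandidates(n: Int): List[Int] =
    if (n == 0) Nil
    else {
      val halves = Iterator.iterate(n / 2)(_ / 2).takeWhile(_ != 0).toList
      0 :: halves.map(n - _).flatMap(x => List(x, -x))
    }

  // Mirrors shrinkEvenInt: shrink the inner Int and keep only the values
  // accepted by the smart constructor, so odd candidates are dropped.
  def evenCandidates(e: EvenInt): List[EvenInt] = e match {
    case EvenInt(i) => intCandidates(i).map(EvenInt.value).collect { case Some(ev) => ev }
  }

  // The buggy property from the article, negated: true when the test fails.
  val fails: EvenInt => Boolean = { case EvenInt(i) => ((i - 1) / 2) * 2 != i }

  // Greedy shrink loop over EvenInt values.
  @annotation.tailrec
  def shrinkLoop(failing: EvenInt): EvenInt =
    evenCandidates(failing).find(fails) match {
      case Some(smaller) => shrinkLoop(smaller)
      case None          => failing
    }
}
```

Starting from EvenInt(4), the candidates are EvenInt(0), EvenInt(2) and EvenInt(-2). The property holds for EvenInt(0), fails for EvenInt(2), and the only even candidate below 2 is EvenInt(0), so the loop stops at EvenInt(2), consistent with the ARG_0 and ARG_0_ORIGINAL pair in the log above.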

The key to controlling ScalaCheck generators is to control shrinking

This should shed some light on a rather common, yet confusing, situation that arises with ScalaCheck.

Each of the three approaches here has its place and its right time to be used. Shrinking is extremely valuable, and can save you time searching for that needle in the haystack, so you generally want to have it switched on. When you are debugging a problem and you are unsure of where to look, it can be a good idea to temporarily turn shrinking off, so that you can immediately eliminate one whole problem space while working out what the actual issue is.

If you want to temporarily turn off shrinking, use one of the first two approaches: forAllNoShrink if you can, or a no-op Shrink[A] that returns no further values if you can’t. For richer test data generation, it may be worth providing your own data type. This gives you more control over how values from your generators and shrinkers are derived.
