Friday, March 30, 2018

Kata: A small kata to explore and play with property-based testing

1. Introduction.

I've been reading Fred Hebert's wonderful PropEr Testing online book about property-based testing. So to play with it a bit, I did a small exercise. This is its description:

1. 1. The kata.

We'll implement a function that can tell if two sequences are equal regardless of the order of their elements. The elements can be of any type.

We'll use property-based testing (PBT). Use the PBT library of your language (bring it already installed).

Follow these constraints:

  1. You can't use or compute frequencies of elements.
  2. Work test first: write a test, then write the code to make that test pass.
  3. If you get stuck, you can use example-based tests to drive the implementation on. However, at the end of the exercise, only property-based tests can remain.

Use mutation testing to check if you tests are good enough (we'll do it manually injecting failures in the implementation code (by commenting or changing parts of it) and checking if the test are able to detect the failure to avoid using more libraries).

2. Driving a solution using both example-based and property-based tests.

I used Clojure and its test.check library (an implementation of QuickCheck) to do the exercise. I also used my favorite Clojure's test framework: Brian Marick's Midje which has a macro, forall, which makes it very easy to integrate property-based tests with Midje.

So I started to drive a solution using an example-based test (thanks to Clojure's dynamic nature, I could use vectors of integers to write the tests. ):

which I made pass using the following implementation:

Then I wrote a property-based test that failed:

This is how the failure looked in Midje (test.check returns more output when a property fails, but Midje extracts and shows only the information it considers more useful):

the most useful piece of information for us in this failure message is the quick-check shrunken failing values. When a property-based testing library finds a counter-example for a property, it applies a shrinking algorithm which tries to reduce it to find a minimal counter-example that produces the same test failure. In this case, the [1 0] vector is the minimal counter-example found by the shrinking algorithm that makes this test fails.

Next I made the property-based test pass by refining the implementation a bit:

I didn't know which property to write next, so I wrote a failing example-based test involving duplicate elements instead:

and refined the implementation to make it pass:

With this, the implementation was done (I chose a function that was easy to implement, so I could focus on thinking about properties).

3. Getting rid of example-based tests.

Then the next step was finding properties that could make the example-based tests redundant. I started by trying to remove the first example-based test. Since I didn't know test.check's generators and combinators library, I started exploring it on the REPL with the help of its API documentation and cheat sheet.

My sessions on the REPL to build generators bit by bit were a process of shallowly reading bits of documentation followed by trial and error. This tinkering sometimes lead to quick successes and most of the times to failures which lead to more deep and careful reading of the documentation, and more trial and error. In the end I managed to build the generators I wanted. The sample function was very useful during all the process to check what each part of the generator would generate.

For the sake of brevity I will show only summarized versions of my REPL sessions where everything seems easy and linear...

3. 1. First attempt: a partial success.

First, I wanted to create a generator that generated two different vectors of integers so that I could replace the example-based tests that were checking two different vectors. I used the list-distinct combinator to create it and the sample function to be able to see what the generator would generate:

I used this generator to write a new property which made it possible to remove the first example-based test but not the second one:

In principle, we might think that the new property should have been enough to also allow removing the last example-based test involving duplicate elements. A quick manual mutation test, after removing that example-based test, showed that it wasn't enough: I commented the line (= (count s1) (count s2)) in the implementation and the property-based tests weren't able to detect the regression.

This was due to the low probability of generating a pair of random vectors that were different because of having duplicate elements, which was what the commented line, (= (count s1) (count s2)), was in the implementation for. If we'd run the tests more times, we'd have finally won the lottery of generating a counter-example that would detect the regression. So we had to improve the generator in order to increase the probabilities, or, even better, make sure it'd be able to detect the regression.

In practice, we'd combine example-based and property-based tests. However, my goal was learning more about property-based testing, so I went on and tried to improve the generators (that's why this exercise has the constraint of using only property-based tests).

3. 2. Second attempt: success!

So, I worked a bit more on the REPL to create a generator that would always generate vectors with duplicate elements. For that I used test.check's let macro, the tuple, such-that and not-empty combinators, and Clojure's core library repeat function to build it.

The following snippet shows a summary of the work I did on the REPL to create the generator using again the sample function at each tiny step to see what inputs the growing generator would generate:

Next I used this new generator to write properties that this time did detect the regression mentioned above. Notice how there are separate properties for sequences with and without duplicates:

After tinkering a bit more with some other generators like return and vector-distinct, I managed to remove a redundant property-based test getting to this final version:

4. Conclusion.

All in all, this exercise was very useful to think about properties and to explore test.check's generators and combinators. Using the REPL made this exploration very interactive and a lot of fun. You can find the code of this exercise on this GitHub repository.

A couple of days later I proposed to solve this exercise at the last Clojure Developers Barcelona meetup. I received very positive feedback, so I'll probably propose it for a Barcelona Software Craftsmanship meetup event soon.

Wednesday, March 28, 2018

Examples lists in TDD

1. Introduction.

During coding dojos and some mentoring sessions I've noticed that most people just start test-driving code without having thought a bit about the problem first. Unfortunately, writing a list of examples before starting to do TDD is a practice that is most of the times neglected.

Writing a list of examples is very useful because having to find a list of concrete examples forces you to think about the problem at hand. In order to write each concrete example in the list, you need to understand what you are trying to do and how you will know when it is working. This exploration of the problem space improves your knowledge of the domain, which will later be very useful while doing TDD to design a solution. However, just generating a list of examples is not enough.

2. Orthogonal examples.

A frequent problem I've seen in beginners' lists is that many of the examples are redundant because they would drive the same piece of behavior. When two examples drive the same behavior, we say that they overlap with each other, they are overlapping examples.

To explore the problem space effectively, we need to find examples that drive different pieces of behavior, i.e. that do not overlap. From now on, I will refer to those non-overlapping examples as orthogonal examples[1].

Keeping this idea of orthogonal examples in mind while exploring a problem space, will help us prune examples that don't add value, and keep just the ones that will force us to drive new behavior.

How can we get those orthogonal examples?
  1. Start by writing all the examples that come to your mind.
  2. As you gather more examples ask yourself which behavior they would drive. Will they drive one clear behavior or will they drive several behaviors?
  3. Try to group them by the piece of behavior they'd drive and see which ones overlap so you can prune them.
  4. Identify also which behaviors of the problem are not addressed by any example yet. This will help you find a list of orthogonal examples.
With time and experience you'll start seeing these behavior partitions in your mind and spend less time to find orthogonal examples.

3. A concrete application.

Next, we'll explore a concrete application using a subset of the Mars Rover kata:
  • The rover is located on a grid at some point with coordinates (x,y) and facing a direction encoded with a character.
  • The meaning of each direction character is:
    • "N" -> North
    • "S" -> South
    • "E" -> East
    • "W" -> West
  • The rover receives a sequence of commands (a string of characters) which are codified in the following way:
    • When it receives an "f", it moves forward one position in the direction it is facing.
    • When it receives a "b", it moves backward one position in the direction it is facing.
    • When it receives a "l", it turns 90º left changing its direction.
    • When it receives a "r", it turns 90º right changing its direction.

Heuristics.

Let's start writing a list of examples that explores this problem. But how?

Since the rover is receiving a sequence of commands, we can apply a useful heuristic for sequences to get us started: J. B. Rainsberger's "0, 1, many, oops" heuristic [2].

In this case, it means generating examples for: no commands, one command, several commands and unknown commands.

I will use the following notation for examples:
(x, y, d), commands_sequence -> (x’, y’, d’)
Meaning that, given the rover is in an initial location with x and y coordinates, and facing a direction d, after receiving a given sequence of commands (which is represented by a string), the rover will be located at x’ and y’ coordinates and facing the d’ direction.

Then our first example corresponding to no commands might be any of:
(0, 0, "N"), "" -> (0, 0, "N")
(1, 4, "S"), "" -> (1, 4, "S")
(2, 5, "E"), "" -> (2, 5, "E")
(3, 2, "E"), "" -> (3, 2, "E")
...
Notice that in these examples, we don't care about the specific positions or directions of the rover. The only important thing here is that the position and direction of the rover does not change. They will all drive the same behavior so we might express this fact using a more generic example:
(any_x, any_y, any_direction), "" -> (any_x, any_y, any_direction)
where we have used any_x, any_y and any_direction to make explicit that the specific values that any_x, any_y and any_direction take are not important for these tests. What is important for the tests, is that the values of x, y and direction remain the same after applying the sequence of commands [3].

Next, we focus on receiving one command.

In this case there are a lot of possible examples, but we are only interested on those that are orthogonal. Following our recommendations to get orthogonal examples, you can get to the following set of 16 examples that can be used to drive all the one command behavior (we're using any_x, any_y where we can):
(4, any_y, "E"), "b" -> (3, any_y, "E")
(any_x, any_y, "S"), "l" -> (any_x, any_y, "E")
(any_x, 6, "N"), "b" -> (any_x, 5, "N")
(any_x, 3, "N"), "f" -> (any_x, 4, "N")
(5, any_y, "W"), "f" -> (4, any_y, "W")
(2, any_y, "W"), "b" -> (3, any_y, "W")
(any_x, any_y, "E"), "l" -> (any_x, any_y, "N")
(any_x, any_y, "W"), "r" -> (any_x, any_y, "N")
(any_x, any_y, "N"), "l" -> (any_x, any_y, "W")
(1, any_y, "E"), "f" -> (2, any_y, "E")
(any_x, 8, "S"), "f" -> (any_x, 7, "S")
(any_x, any_y, "E"), "r" -> (any_x, any_y, "S")
(any_x, 3, "S"), "b" -> (any_x, 4, "S")
(any_x, any_y, "W"), "l" -> (any_x, any_y, "S")
(any_x, any_y, "N"), "r" -> (any_x, any_y, "E")
(any_x, any_y, "S"), "r" -> (any_x, any_y, "W")
There're already important properties about the problem that we can learn from these examples:
  1. The position of the rover is irrelevant for rotations.
  2. The direction the rover is facing is relevant for every command. It determines how each command will be applied.
Sometimes it can also be useful to think in different ways of grouping the examples to see how they may relate to each other.

For instance, we might group the examples above by the direction the rover faces initially:
Facing East
(1, any_y, "E"), "f" -> (2, any_y, "E")
(4, any_y, "E"), "b" -> (3, any_y, "E")
(any_x, any_y, "E"), "l" -> (any_x, any_y, "N")
(any_x, any_y, "E"), "r" -> (any_x, any_y, "S")
Facing West
(5, any_y, "W"), "f" -> (4, any_y, "W") ...
or, by the command the rover receives:
Move forward
(1, any_y, "E"), "f" -> (2, any_y, "E")
(5, any_y, "W"), "f" -> (4, any_y, "W")
(any_x, 3, "N"), "f" -> (any_x, 4, "N")
(any_x, 8, "S"), "f" -> (any_x, 7, "S")
Move backward
(4, any_y, "E"), "b" -> (3, any_y, "E")
(2, any_y, "W"), "b" -> (3, any_y, "W")
...
Trying to classify the examples helps us explore different ways in which we can use them to make the code grow by discovering what Mateu Adsuara calls dimensions of complexity of the problem[4]. Each dimension of complexity can be driven using a different set of orthogonal examples, so this knowledge can be useful to choose the next example when doing TDD.

Which of the two groupings shown above might be more useful to drive the problem?

In this case, I think that the by the command the rover receives grouping is more useful, because each group will help us drive a whole behavior (the command). If we were to use the by the direction the rover faces initially grouping, we'd end up with partially implemented behaviors (commands) after using each group of examples.

Once we have the one command examples, let's continue using the "0, 1, many, oops" heuristic and find examples for the several commands category.

We can think of many different examples:
(7, 4, "E"), "rb" -> (7, 5, "S") (7, 4, "E"), "fr" -> (8, 4, "S") (7, 4, "E"), "ffl" -> (9, 4, "N")
The thing is that any of them might be thought as a composition of several commands:
(7, 4, "E"), "r" -> (7, 4, "S"), "b" -> (7, 5, "S")
Then the only new behavior these examples would drive is composing commands.

So It turns out that there's only one orthogonal example in this category. We might choose any of them, like the following one for instance:
(7, 4, "E"), "frrbbl" -> (10, 4, "S")
This doesn't mean that when we're later doing TDD, we have to use only this example to drive the behavior. We can use more overlapping examples if we're unsure on how to implement it and we need to use triangulation[5].

Finally, we can consider the "oops" category which for us is unknown commands. In this case, we need to find out how we'll handle them and this might involve some conversations.

Let's say that we find out that we should ignore unknown commands, then this might be an example:
(any_x, any_y, any_direction), "*" -> (any_x, any_y, any_direction)
Before finishing, I’d like to remark that it’s important to keep this technique as lightweight and informal as possible, writing the examples on a piece of paper or on a whiteboard, and never, ever, write them directly as tests (which I’ve also seen many times).

There are two important reasons for this:
  1. Avoiding implementation details to leak into a process meant for thinking about the problem space.
  2. Avoiding getting attached to the implementation of tests, which can create some inertia and push you to take implementation decisions without having explored the problem well.

4. Conclusion.

Writing a list of examples before starting doing TDD is an often overlooked technique that can be very useful to reflect about a given problem. We also talked about how thinking in finding orthogonal examples can make your list of examples much more effective and saw some useful heuristics that might help you find them.

Then we worked on a small example in tiny steps, compared different alternatives just to try to illustrate and make the technique more explicit and applied one of the heuristics.

With practice, this technique becomes more and more a mental process. You'll start doing it in your mind and find orthogonal examples faster. At the same time, you’ll also start losing awareness of the process[6].

Nonetheless, writing a list of examples or other similar lightweight exploration techniques can still be very helpful for more complicated cases. This technique can also be very useful to think in a problem when you’re working on it with someone else (pairing, mob programming, etc.), because it enhances communication.

5. Acknowledgements.

Many thanks to Alfredo Casado, Álvaro Garcia, Abel Cuenca, Jaime Perera, Ángel Rojo, Antonio de la Torre, Fran Reyes and Manuel Tordesillas for giving me great feedback to improve this post and for all the interesting conversations.

6. References.


Footnotes.
[1] This concept of orthogonal examples is directly related to Mateu Adsuara's dimensions of complexity idea because each dimension of complexity can be driven using a different set of orthogonal examples. For a definition of dimensions of complexity, see footnote [4] .
[2] Another very useful heuristic is described in James Grenning's TDD Guided by ZOMBIES post.
[3] This is somehow related to Brian Marick’s metaconstants which can be very useful to write tests in dynamic languages. They’re also hints about properties that might be helpful in property-based testing.
[4] Dimension of Complexity is a term used by Mateu Adsuara in a talk at Socrates Canarias 2016 to name an orthogonal functionality. In that talk he used dimensions of complexity to classify the examples in his tests list in different groups and help him choose the next test when doing TDD.
He talked about it in these three posts:
Other names for the same concept that I've heard are axes of change, directions of change or vectors of change.
[5] Even though triangulation is probably the most popular, there are two other strategies for implementing new functionality in TDD: obvious implementation and fake it. Kent Beck in his Test-driven Development: By Example book describes the three techniques and says that he prefers to use obvious implementation or fake it most of the time, and only use triangulation as a last resort when design gets complicated.
[6] This loss of awareness of the process is the price of expertise according to the Dreyfus model of skill acquisition.

Sunday, March 25, 2018

Books I read (January - March 2018)

January
- The Plateau Effect: Getting from Stuck to Success, Bob Sullivan & Hugh Thompson
- The Thirty-Nine Steps, John Buchan
- Memento Mori, Muriel Spark
- Cosmonauta, Pep Brocal
- The Man Who Was Thursday: A Nightmare, G. K. Chesterton
- Ébano, Alberto Vázquez-Figueroa
- The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life, Mark Manson
- The Importance of Being Earnest, Oscar Wilde

February
- The Maltese Falcon, Dashiell Hammett
- This Is Water, David Foster Wallace
- Judas, Amos OZ
- Never Let Me Go, Kazuo Ishiguro.
- Microservice Architecture: Aligning Principles, Practices, and Culture, Mike Amundsen, Matt McLarty, Ronnie Mitra, Irakli Nadareishvili

March
- On Anarchism, Noam Chomsky
- The Fire Next Time, James Baldwin
- Esperanza en la oscuridad, La historia jamás contada del poder de la gente (Hope in the Dark: Untold Histories, Wild Possibilities), Rebecca Solnit
- The Dispossessed: An Ambiguous Utopia, Ursula K. Leguin
- Release It!: Design and Deploy Production-Ready Software, Michael T. Nygard

Saturday, March 10, 2018

Kata: Generating bingo cards with clojure.spec, clojure/test.check, RDD and TDD

Clojure Developers Barcelona has been running for several years now. Since we're not many yet, we usually do mob programming sessions as part of what we call "sagas". For each saga, we choose an exercise or kata and solve it during the first one or two sessions. After that, we start imagining variations on the exercise using different Clojure/ClojureScript libraries or technologies we feel like exploring and develop those variations in following sessions. Once we feel we can't imagine more interesting variations or we get tired of a given problem, we choose a different problem to start a new saga. You should try doing sagas, they are a lot of fun!

Recently we've been working on the Bingo Kata.

The initial implementation

These were the tests we wrote to check the randomly generated bingo cards:

and the code we initially wrote to generate them was something like (we didn't save the original one):

As you can see the tests are not concerned with which specific numeric values are included on each column of the bingo card. They are just checking that they follow the specification of a bingo card. This makes them very suitable for property-based testing.

Introducing clojure.spec

In the following session of the Bingo saga, I suggested creating the bingo cards using clojure.spec.
spec is a Clojure library to describe the structure of data and functions. Specs can be used to validate data, conform (destructure) data, explain invalid data, generate examples that conform to the specs, and automatically use generative testing to test functions.
For a brief introduction to this wonderful library see Arne Brasseur's Introduction to clojure.spec talk.

I'd used clojure.spec at work before. At my current client Green Power Monitor, we've been using it for a while to validate the shape (and in some cases types) of data flowing through some important public functions of some key name spaces. We started using pre and post-conditions for that validation (see Fogus' Clojure’s :pre and :post to know more), and from there, it felt as a natural step to start using clojure.spec to write some of them.

Another common use of clojure.spec specs is to generate random data conforming to the spec to be used for property-based testing.

In the Bingo kata case, I thought that we might use this ability of randomly generating data conforming to the spec in production code. This meant that instead of writing code to randomly generating bingo cards and then testing that the results were as expected, we might describe the bingo cards using clojure.spec and then took advantage of that specification to randomly generate bingo cards using clojure.test.check's generate function.

So with this idea in our heads, we started creating a spec for bingo columns on the REPL bit by bit (for the sake of brevity what you can see here is the final form of the spec):

then we discovered clojure.spec's coll-of function which allowed us to simplify the spec a bit:

Generating bingo cards

Once we thought we had it, we tried to use the column spec to generate columns with clojure.test.check's generate function, but we got the following error:
ExceptionInfo Couldn't satisfy such-that predicate after 100 tries.
Of course we were trying to find a needle in a haystack...

After some trial and error on the REPL and reading the clojure.spec guide, we found the clojure.spec's int-in function and we finally managed to generate the bingo columns:

Then we used the spec code from the REPL to write the bingo cards spec:

in which we wrote the create-column-spec factory function that creates column specs to remove duplication between the specs of different columns.

With this in place the bingo cards could be created in a line of code:

Introducing property-based testing

Property-based tests make statements about the output of your code based on the input, and these statements are verified for many different possible inputs.
Jessica Kerr (Property-based testing: what is it?)
Having the specs it was very easy to change our bingo card test to use property-based testing instead of example-based testing just by using the generator created by clojure.spec:

See in the code that we're reusing the check-column function we wrote for the example-based tests.

This change was so easy because of:
  1. clojure.spec can produce a generator for clojure/test.check from a given spec
  2. .
  3. The initial example tests, as I mentioned before, were already checking the properties of a valid bingo card. This means that they weren't concerned with which specific numeric values were included on each column of the bingo card, but instead, they were just checking that the cards followed the rules for a bingo card to be valid.

Going fast with REPL driven development (RDD)

The next user story of the kata required us to check a bingo card to see if its player has won. We thought this might be easy to implement because we only needed to check that the numbers in the card where contained by the set of called numbers, so instead of doing TDD, we played a bit on the REPL did REPL-driven development (RDD):

Once we had the implementation working, we copied it from the REPL into its corresponding name space

and wrote the quicker but ephemeral REPL tests as "permanent" unit tests:

In this case RDD allowed us to go faster than TDD, because RDD's feedback cycle is much faster. Once the implementation is working on the REPL, you can choose which REPL tests you want to keep as unit tests.

Some times I use only RDD like in this case, other times I use a mix of TDD and RDD following this cycle:
  1. Write a failing test (using examples that a bit more complicated than the typical ones you use when doing only TDD).
  2. Explore and triangulate on the REPL until I made the test pass with some ugly but complete solution.
  3. Refactor the code.
Other times I just use TDD.

I think what I use depends a lot on how easy I feel the implementation might be.

Last details

The last user story required us to create a bingo caller that randomly calls out Bingo numbers. To develop this story, we used TDD and an atom to keep the not-yet-called numbers. These were our tests:

and this was the resulting code:

And it was done! See all the commits here if you want to follow the process (many intermediate steps happened on the REPL). You can find all the code on GitHub.

Summary

This experiment was a lot of fun because we got to play with both clojure.spec and clojure/test.check, and we learned a lot. While explaining what we did, I talked a bit about property-based testing and how I use REPL-driven development.

Thanks again to all my colleagues in Clojure Developers Barcelona!