Monday, April 9, 2018

test-doubles: A small spying and stubbing library for Clojure and ClojureScript

As you may know from a previous post I’m working for GreenPowerMonitor as part of a team that is developing a challenging SPA to monitor and manage renewable energy portfolios using ClojureScript.

To test some parts of our code that were using effectful code we had to use test doubles. We tried some existing ClojureScript libraries but we found that having different macros for different types of test doubles made tests that needed both spies and stubs become very nested. Also, being used to Gerard Meszaros’ vocabulary for tests doubles, we found a bit confusing the naming used in some of them for different types of tests doubles. So, in the end, we decided to implement our own stubs and spies library.

First, we decided to manually create our own spies and stubs during some time so that we could identify the different ways in which we were going to use them. After a while, my colleague André Stylianos Ramos and I wrote our own small DSL to create stubs and spies in order to remove all that duplication and boiler plate. We’ve been using it to write unit tests in our ClojureScript project for nearly a year with great success. Lately, we have been making it work for Clojure as well.

I’m really glad to announce that last week we open-sourced our library. It’s called test-doubles and it’s a small (around 80 lines) spying and stubbing library for Clojure and ClojureScript.

This is an example in ClojureScript in which we are using two stubs (one with the :maps option and another with the :returns option) and a spy:

I could show you more examples here of how test-doubles is used, but we’ve already included a lot of explained examples in its documentation. You can find tests-doubles code on this GitHub repository and also get its last version from Clojars.

Please do have a look and try the library. We hope it might be as useful to you as it’s been for us.

Friday, March 30, 2018

Kata: A small kata to explore and play with property-based testing

1. Introduction.

I've been reading Fred Hebert's wonderful PropEr Testing online book about property-based testing. So to play with it a bit, I did a small exercise. This is its description:

1. 1. The kata.

We'll implement a function that can tell if two sequences are equal regardless of the order of their elements. The elements can be of any type.

We'll use property-based testing (PBT). Use the PBT library of your language (bring it already installed).

Follow these constraints:

  1. You can't use or compute frequencies of elements.
  2. Work test first: write a test, then write the code to make that test pass.
  3. If you get stuck, you can use example-based tests to drive the implementation on. However, at the end of the exercise, only property-based tests can remain.

Use mutation testing to check if you tests are good enough (we'll do it manually injecting failures in the implementation code (by commenting or changing parts of it) and checking if the test are able to detect the failure to avoid using more libraries).

2. Driving a solution using both example-based and property-based tests.

I used Clojure and its test.check library (an implementation of QuickCheck) to do the exercise. I also used my favorite Clojure's test framework: Brian Marick's Midje which has a macro, forall, which makes it very easy to integrate property-based tests with Midje.

So I started to drive a solution using an example-based test (thanks to Clojure's dynamic nature, I could use vectors of integers to write the tests. ):

which I made pass using the following implementation:

Then I wrote a property-based test that failed:

This is how the failure looked in Midje (test.check returns more output when a property fails, but Midje extracts and shows only the information it considers more useful):

the most useful piece of information for us in this failure message is the quick-check shrunken failing values. When a property-based testing library finds a counter-example for a property, it applies a shrinking algorithm which tries to reduce it to find a minimal counter-example that produces the same test failure. In this case, the [1 0] vector is the minimal counter-example found by the shrinking algorithm that makes this test fails.

Next I made the property-based test pass by refining the implementation a bit:

I didn't know which property to write next, so I wrote a failing example-based test involving duplicate elements instead:

and refined the implementation to make it pass:

With this, the implementation was done (I chose a function that was easy to implement, so I could focus on thinking about properties).

3. Getting rid of example-based tests.

Then the next step was finding properties that could make the example-based tests redundant. I started by trying to remove the first example-based test. Since I didn't know test.check's generators and combinators library, I started exploring it on the REPL with the help of its API documentation and cheat sheet.

My sessions on the REPL to build generators bit by bit were a process of shallowly reading bits of documentation followed by trial and error. This tinkering sometimes lead to quick successes and most of the times to failures which lead to more deep and careful reading of the documentation, and more trial and error. In the end I managed to build the generators I wanted. The sample function was very useful during all the process to check what each part of the generator would generate.

For the sake of brevity I will show only summarized versions of my REPL sessions where everything seems easy and linear...

3. 1. First attempt: a partial success.

First, I wanted to create a generator that generated two different vectors of integers so that I could replace the example-based tests that were checking two different vectors. I used the list-distinct combinator to create it and the sample function to be able to see what the generator would generate:

I used this generator to write a new property which made it possible to remove the first example-based test but not the second one:

In principle, we might think that the new property should have been enough to also allow removing the last example-based test involving duplicate elements. A quick manual mutation test, after removing that example-based test, showed that it wasn't enough: I commented the line (= (count s1) (count s2)) in the implementation and the property-based tests weren't able to detect the regression.

This was due to the low probability of generating a pair of random vectors that were different because of having duplicate elements, which was what the commented line, (= (count s1) (count s2)), was in the implementation for. If we'd run the tests more times, we'd have finally won the lottery of generating a counter-example that would detect the regression. So we had to improve the generator in order to increase the probabilities, or, even better, make sure it'd be able to detect the regression.

In practice, we'd combine example-based and property-based tests. However, my goal was learning more about property-based testing, so I went on and tried to improve the generators (that's why this exercise has the constraint of using only property-based tests).

3. 2. Second attempt: success!

So, I worked a bit more on the REPL to create a generator that would always generate vectors with duplicate elements. For that I used test.check's let macro, the tuple, such-that and not-empty combinators, and Clojure's core library repeat function to build it.

The following snippet shows a summary of the work I did on the REPL to create the generator using again the sample function at each tiny step to see what inputs the growing generator would generate:

Next I used this new generator to write properties that this time did detect the regression mentioned above. Notice how there are separate properties for sequences with and without duplicates:

After tinkering a bit more with some other generators like return and vector-distinct, I managed to remove a redundant property-based test getting to this final version:

4. Conclusion.

All in all, this exercise was very useful to think about properties and to explore test.check's generators and combinators. Using the REPL made this exploration very interactive and a lot of fun. You can find the code of this exercise on this GitHub repository.

A couple of days later I proposed to solve this exercise at the last Clojure Developers Barcelona meetup. I received very positive feedback, so I'll probably propose it for a Barcelona Software Craftsmanship meetup event soon.

Wednesday, March 28, 2018

Examples lists in TDD

1. Introduction.

During coding dojos and some mentoring sessions I've noticed that most people just start test-driving code without having thought a bit about the problem first. Unfortunately, writing a list of examples before starting to do TDD is a practice that is most of the times neglected.

Writing a list of examples is very useful because having to find a list of concrete examples forces you to think about the problem at hand. In order to write each concrete example in the list, you need to understand what you are trying to do and how you will know when it is working. This exploration of the problem space improves your knowledge of the domain, which will later be very useful while doing TDD to design a solution. However, just generating a list of examples is not enough.

2. Orthogonal examples.

A frequent problem I've seen in beginners' lists is that many of the examples are redundant because they would drive the same piece of behavior. When two examples drive the same behavior, we say that they overlap with each other, they are overlapping examples.

To explore the problem space effectively, we need to find examples that drive different pieces of behavior, i.e. that do not overlap. From now on, I will refer to those non-overlapping examples as orthogonal examples[1].

Keeping this idea of orthogonal examples in mind while exploring a problem space, will help us prune examples that don't add value, and keep just the ones that will force us to drive new behavior.

How can we get those orthogonal examples?
  1. Start by writing all the examples that come to your mind.
  2. As you gather more examples ask yourself which behavior they would drive. Will they drive one clear behavior or will they drive several behaviors?
  3. Try to group them by the piece of behavior they'd drive and see which ones overlap so you can prune them.
  4. Identify also which behaviors of the problem are not addressed by any example yet. This will help you find a list of orthogonal examples.
With time and experience you'll start seeing these behavior partitions in your mind and spend less time to find orthogonal examples.

3. A concrete application.

Next, we'll explore a concrete application using a subset of the Mars Rover kata:
  • The rover is located on a grid at some point with coordinates (x,y) and facing a direction encoded with a character.
  • The meaning of each direction character is:
    • "N" -> North
    • "S" -> South
    • "E" -> East
    • "W" -> West
  • The rover receives a sequence of commands (a string of characters) which are codified in the following way:
    • When it receives an "f", it moves forward one position in the direction it is facing.
    • When it receives a "b", it moves backward one position in the direction it is facing.
    • When it receives a "l", it turns 90º left changing its direction.
    • When it receives a "r", it turns 90º right changing its direction.


Let's start writing a list of examples that explores this problem. But how?

Since the rover is receiving a sequence of commands, we can apply a useful heuristic for sequences to get us started: J. B. Rainsberger's "0, 1, many, oops" heuristic [2].

In this case, it means generating examples for: no commands, one command, several commands and unknown commands.

I will use the following notation for examples:
(x, y, d), commands_sequence -> (x’, y’, d’)
Meaning that, given the rover is in an initial location with x and y coordinates, and facing a direction d, after receiving a given sequence of commands (which is represented by a string), the rover will be located at x’ and y’ coordinates and facing the d’ direction.

Then our first example corresponding to no commands might be any of:
(0, 0, "N"), "" -> (0, 0, "N")
(1, 4, "S"), "" -> (1, 4, "S")
(2, 5, "E"), "" -> (2, 5, "E")
(3, 2, "E"), "" -> (3, 2, "E")
Notice that in these examples, we don't care about the specific positions or directions of the rover. The only important thing here is that the position and direction of the rover does not change. They will all drive the same behavior so we might express this fact using a more generic example:
(any_x, any_y, any_direction), "" -> (any_x, any_y, any_direction)
where we have used any_x, any_y and any_direction to make explicit that the specific values that any_x, any_y and any_direction take are not important for these tests. What is important for the tests, is that the values of x, y and direction remain the same after applying the sequence of commands [3].

Next, we focus on receiving one command.

In this case there are a lot of possible examples, but we are only interested on those that are orthogonal. Following our recommendations to get orthogonal examples, you can get to the following set of 16 examples that can be used to drive all the one command behavior (we're using any_x, any_y where we can):
(4, any_y, "E"), "b" -> (3, any_y, "E")
(any_x, any_y, "S"), "l" -> (any_x, any_y, "E")
(any_x, 6, "N"), "b" -> (any_x, 5, "N")
(any_x, 3, "N"), "f" -> (any_x, 4, "N")
(5, any_y, "W"), "f" -> (4, any_y, "W")
(2, any_y, "W"), "b" -> (3, any_y, "W")
(any_x, any_y, "E"), "l" -> (any_x, any_y, "N")
(any_x, any_y, "W"), "r" -> (any_x, any_y, "N")
(any_x, any_y, "N"), "l" -> (any_x, any_y, "W")
(1, any_y, "E"), "f" -> (2, any_y, "E")
(any_x, 8, "S"), "f" -> (any_x, 7, "S")
(any_x, any_y, "E"), "r" -> (any_x, any_y, "S")
(any_x, 3, "S"), "b" -> (any_x, 4, "S")
(any_x, any_y, "W"), "l" -> (any_x, any_y, "S")
(any_x, any_y, "N"), "r" -> (any_x, any_y, "E")
(any_x, any_y, "S"), "r" -> (any_x, any_y, "W")
There're already important properties about the problem that we can learn from these examples:
  1. The position of the rover is irrelevant for rotations.
  2. The direction the rover is facing is relevant for every command. It determines how each command will be applied.
Sometimes it can also be useful to think in different ways of grouping the examples to see how they may relate to each other.

For instance, we might group the examples above by the direction the rover faces initially:
Facing East
(1, any_y, "E"), "f" -> (2, any_y, "E")
(4, any_y, "E"), "b" -> (3, any_y, "E")
(any_x, any_y, "E"), "l" -> (any_x, any_y, "N")
(any_x, any_y, "E"), "r" -> (any_x, any_y, "S")
Facing West
(5, any_y, "W"), "f" -> (4, any_y, "W") ...
or, by the command the rover receives:
Move forward
(1, any_y, "E"), "f" -> (2, any_y, "E")
(5, any_y, "W"), "f" -> (4, any_y, "W")
(any_x, 3, "N"), "f" -> (any_x, 4, "N")
(any_x, 8, "S"), "f" -> (any_x, 7, "S")
Move backward
(4, any_y, "E"), "b" -> (3, any_y, "E")
(2, any_y, "W"), "b" -> (3, any_y, "W")
Trying to classify the examples helps us explore different ways in which we can use them to make the code grow by discovering what Mateu Adsuara calls dimensions of complexity of the problem[4]. Each dimension of complexity can be driven using a different set of orthogonal examples, so this knowledge can be useful to choose the next example when doing TDD.

Which of the two groupings shown above might be more useful to drive the problem?

In this case, I think that the by the command the rover receives grouping is more useful, because each group will help us drive a whole behavior (the command). If we were to use the by the direction the rover faces initially grouping, we'd end up with partially implemented behaviors (commands) after using each group of examples.

Once we have the one command examples, let's continue using the "0, 1, many, oops" heuristic and find examples for the several commands category.

We can think of many different examples:
(7, 4, "E"), "rb" -> (7, 5, "S") (7, 4, "E"), "fr" -> (8, 4, "S") (7, 4, "E"), "ffl" -> (9, 4, "N")
The thing is that any of them might be thought as a composition of several commands:
(7, 4, "E"), "r" -> (7, 4, "S"), "b" -> (7, 5, "S")
Then the only new behavior these examples would drive is composing commands.

So It turns out that there's only one orthogonal example in this category. We might choose any of them, like the following one for instance:
(7, 4, "E"), "frrbbl" -> (10, 4, "S")
This doesn't mean that when we're later doing TDD, we have to use only this example to drive the behavior. We can use more overlapping examples if we're unsure on how to implement it and we need to use triangulation[5].

Finally, we can consider the "oops" category which for us is unknown commands. In this case, we need to find out how we'll handle them and this might involve some conversations.

Let's say that we find out that we should ignore unknown commands, then this might be an example:
(any_x, any_y, any_direction), "*" -> (any_x, any_y, any_direction)
Before finishing, I’d like to remark that it’s important to keep this technique as lightweight and informal as possible, writing the examples on a piece of paper or on a whiteboard, and never, ever, write them directly as tests (which I’ve also seen many times).

There are two important reasons for this:
  1. Avoiding implementation details to leak into a process meant for thinking about the problem space.
  2. Avoiding getting attached to the implementation of tests, which can create some inertia and push you to take implementation decisions without having explored the problem well.

4. Conclusion.

Writing a list of examples before starting doing TDD is an often overlooked technique that can be very useful to reflect about a given problem. We also talked about how thinking in finding orthogonal examples can make your list of examples much more effective and saw some useful heuristics that might help you find them.

Then we worked on a small example in tiny steps, compared different alternatives just to try to illustrate and make the technique more explicit and applied one of the heuristics.

With practice, this technique becomes more and more a mental process. You'll start doing it in your mind and find orthogonal examples faster. At the same time, you’ll also start losing awareness of the process[6].

Nonetheless, writing a list of examples or other similar lightweight exploration techniques can still be very helpful for more complicated cases. This technique can also be very useful to think in a problem when you’re working on it with someone else (pairing, mob programming, etc.), because it enhances communication.

5. Aknowledgements.

Many thanks to Alfredo Casado, Álvaro Garcia, Abel Cuenca, Jaime Perera, Ángel Rojo, Antonio de la Torre, Fran Reyes and Manuel Tordesillas for giving me great feedback to improve this post and for all the interesting conversations.

6. References.

[1] This concept of orthogonal examples is directly related to Mateu Adsuara's dimensions of complexity idea because each dimension of complexity can be driven using a different set of orthogonal examples. For a definition of dimensions of complexity, see footnote [4] .
[2] Another very useful heuristic is described in James Grenning's TDD Guided by ZOMBIES post.
[3] This is somehow related to Brian Marick’s metaconstants which can be very useful to write tests in dynamic languages. They’re also hints about properties that might be helpful in property-based testing.
[4] Dimension of Complexity is a term used by Mateu Adsuara in a talk at Socrates Canarias 2016 to name an orthogonal functionality. In that talk he used dimensions of complexity to classify the examples in his tests list in different groups and help him choose the next test when doing TDD.
He talked about it in these three posts:
Other names for the same concept that I've heard are axes of change, directions of change or vectors of change.
[5] Even though triangulation is probably the most popular, there are two other strategies for implementing new functionality in TDD: obvious implementation and fake it. Kent Beck in his Test-driven Development: By Example book describes the three techniques and says that he prefers to use obvious implementation or fake it most of the time, and only use triangulation as a last resort when design gets complicated.
[6] This loss of awareness of the process is the price of expertise according to the Dreyfus model of skill acquisition.