Tuesday 1 November 2016

Tautologies in Tests

Imagine you’re writing a test for a simple function like abs(). You would probably write something like this:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  Assert.That(Math.Abs(-1), Is.EqualTo(1));
}

It’s a simple function, we can calculate the expected output in our head and just plug the expectation (+1) directly in. But what if I said I’ve seen this kind of thing written:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  Assert.That(Math.Abs(-1), Is.EqualTo(Math.Abs(-1)));
}

Of course in real life it’s not nearly as obvious as this, the data is lifted out into variables and there is more distance between the action and the way the expectation is derived:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  const int negativeValue = –1;

  var expectedValue = Math.Abs(-1);

  Assert.That(Math.Abs(negativeValue),
              Is.EqualTo(expectedValue));
}

I still doubt anyone would actually write this and a simple function like abs() is not what’s usually under test when this crops up. A more realistic scenario would need much more distance between the test and production code, say, a component-level test:

[Test]
public void processed_message_contains_the_request_time()
{
  var requestTime = new DateTime(. . .);
  var input = BuildTestMessage(requestTime, . . . );
  var expectedTime = Processor.FormatTime(requestTime);

  var output = Processor.Process(input, . . .);

  Assert.That(output.RequestTime,
              Is.EqualTo(expectedTime));
}

What Does the Test Say?

If we mentally inline the derivation of the expected value what the test is saying is “When a message is processed the output contains a request time which is formatted by the processor”. This is essentially a tautology because the test is describing its behaviour in terms of the thing under test, it’s self-reinforcing [1].

Applying the advice from Antoine de Saint-Exupéry [2] about perfection being achieved when there is nothing left take away, lets implement FormatTime() like this:

public string FormatTime(DateTime value)
{
  return null;
}

The test will still pass. I know this change is perverse and nobody would ever make that ridiculous a mistake, but the point is that the test is not really doing its job. Also as we rely more heavily on refactoring tools we have to work harder to verify that we have not silently broken a different test that was also inadvertently relying on some aspect of the original behaviour.

Good Duplication

We duplicate work in the test for a reason, as a cross-check that we’ve got it right. This “duplication” is often just performed mentally, e.g. formatting a string, but for a more complex behaviour could be done in code using an alternate algorithm [3]. In fact one of the advantages of a practice like TDD is that you have to work it out beforehand and therefore are not tempted to paste the output from running the test on the basis that you’re sure it’s already correct.

If we had duplicated the work of deriving the output in the example above my little simplification would not have worked as the test would then have failed. Once again, adopting the TDD practice of starting with a failing test and transitioning to green by putting the right implementation in proves that the test will fail if the implementation changes unexpectedly.

This is a sign to watch out for – if you’re not changing the key part of the implementation to make the test pass you might have overly-coupled the test and production code.

What is the Test Really Saying?

The problem with not being the person that wrote the test in the first place is that it may not be telling you what you think it is. For example the tautology may be there because what I just described is not what the author intended the reader to deduce.

The test name only says that the output will contain the time value, the formatting of that value may well be the responsibility of another unit test somewhere else. This is a component level test after all and so I would need to drill into the tests further to see if that were true. A better approach might be to make the breaking change above and see what actually fails. Essentially I would be doing a manual form of Mutation Testing to verify the test coverage.

Alternatively the author may be trying to avoid creating a brittle test which would fail if the formatting was tweaked and so decided the best way to do that would be to reuse the internal code. The question is whether the format matters or not (is it a published API?), and with no other test to specifically answer that question one has to work on an assumption.

This is a noble cause (not writing brittle tests) but there is a balance between the test telling you about a fault in the code and it just being overly specific and annoying by failing on unimportant changes. Sometimes we just need to work a little harder to express the true specification in looser terms. For example maybe we only need to assert that a constituent part of the date is included, say, the year as that is usually the full 4 digits these days:

Assert.That(output.RequestTime,
            Is.StringContaining(“2010”));

If we are careful about the values we choose we can ensure that multiple formats can still conform to a looser contract. For example 10:33:44 on 22/11/2016 contains no individual fields that could naturally be formatted in a way where a simple substring search could give a false positive (e.g. the hour being mistaken for the day of the month).

A Balancing Act

Like everything in software engineering there is a trade-off. Whilst we’d probably prefer to be working with a watertight specification that leaves as little room for ambiguity as possible, we often have details that are pretty loose. When that happens we have to decide how we want to trigger a review of this lack of clarity in the future. If we make the test overly restrictive it runs the risk of becoming brittle, whilst making it overly vague could allow breaking changes to go unnoticed until too late.

Borrowing (apocryphally) from Einstein we should strive to make our tests as precise as possible, but not overly precise. In the process we need to ensure we do not accidentally reuse production code in the test such that we find ourselves defining the behaviour of it, with itself.

 

[1] I’ve looked at the self-reinforcing nature of unit tests before in “Man Cannot Live by Unit Testing Alone”.

[2] See “My Favourite Quotes” for some of the other programming related quotes I find particularly inspiring.

[3] Often one that is slower as correctness generally takes centre stage over performance.

No comments:

Post a Comment