Wednesday 5 October 2011

Unit Testing File-System Dependent Code

Way back last year, before being distracted by my impending ACCU conference talk, I wrote a post about integration testing using NUnit. At the time I was still in two minds about whether it was worth the effort of trying to mock the file-system API, especially given that you often have some extra layer of code between you and the file-system to actually read and parse the file, e.g. an XML reader. The alternatives seem to be either to focus on writing integration tests that actually do touch the file-system (which is reasonably quick and reliable as dependencies go), or to inject more abstractions to create other seams through which to mock, thereby allowing you to write unit tests that get you close enough to the bottom, if not all the way down.

Of course if you’re creating some sort of data persistence component, such as the aforementioned XML reader/writer, then you probably have a vested interest in mocking to the max as you will be directly accessing the file-system API and so there would be a good ROI in doing so. What I’m looking at here is the code that lightly touches the file-system API to provide higher-level behaviour around which files to read/write or recovers from known common error scenarios.

Impossible or hard to write test cases

The main incentive I have found for making the effort to mock the file-system API is in writing tests for cases that are either impossible or very hard to write as automated integration/system tests. One classic example is running out of disk space - filling your disk drive in the SetUp() helper method is just not a realistic proposition. Using a very small RAM disk may be a more plausible alternative, but what you’re really likely to want to test is that you are catching an out-of-disk-space exception and then performing some contingent action. The same can apply to “access denied”[*] type errors, and in both cases you should be able to get away with simulating the error by throwing when the code under test tries to open the file for reading/writing, rather than when it actually tries to pull/push bytes from/to the file (this assumes you’re doing simple synchronous I/O).

The reason this makes life easier is that the file Open() method can be a static method, which saves you having to mock the actual File object. It was whilst discussing this kind of mocking with my new team-mate Tim Barrass that we made some of the existing API mocks I had written much simpler. Whereas I had gone for the classic facade, interface and factory based implementation without thinking about it, Tim pointed out that we could just implement the facade with a bunch of delegates that default to the real implementation[+]:-

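namespace My.IO
{

public class File
{
  public static bool Exists(string path)
  {
    return Impl.Exists(path);
  }

  // System.IO.File.Open returns a FileStream, so the facade
  // exposes it through the Stream base class.
  public static System.IO.Stream Open(string path, . . .)
  {
    return Impl.Open(path, . . .);
  }

  . . .

  public static class Impl
  {
    // Members of a static class must themselves be static. Each
    // delegate defaults to the real API via its method group.
    public static Func<string, bool> Exists =
                                 System.IO.File.Exists;
    public static Func<string, . . ., System.IO.Stream> Open =
                                 System.IO.File.Open;
    . . .
  }
}

}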

The default implementation just forwards the call to the real API, whereas a test can replace the implementation as they wish, e.g. [#]

{
  // Pretend that only this one file exists.
  File.Impl.Exists = (path) =>
  {
    return path == @"C:\Temp\Test.txt";
  };
}

{
  // Simulate an "access denied" error on opening any file.
  File.Impl.Open = (path, . . .) =>
  {
    throw new UnauthorizedAccessException();
  };
}

This is a pretty low-cost solution to build and may well suffice if you only have this kind of restricted usage. You can easily add a simple File mock by using memory-based streams if you just need to simulate simple text or binary files, but after that you’re getting into more specialised API territory.
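As a rough illustration of both ideas, here is a minimal sketch of a memory-based file mock and a simulated “access denied” error. It makes several assumptions that are not in the code above: the Open delegate takes only a path and returns a Stream, and the names FileFacade, SettingsReader and SettingsReaderTests are hypothetical, invented purely for this example. Plain static methods stand in for NUnit tests to keep the listing self-contained.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical cut-down facade: only Open(), taking just a path (assumption).
public static class FileFacade
{
    public static Stream Open(string path)
    {
        return Impl.Open(path);
    }

    public static class Impl
    {
        // Defaults to the real file-system; a test may replace it.
        public static Func<string, Stream> Open = DefaultOpen;

        public static Stream DefaultOpen(string path)
        {
            return System.IO.File.OpenRead(path);
        }
    }
}

// Hypothetical code under test: returns the first line of a file, or a
// fallback value if the file is empty or access to it is denied.
public static class SettingsReader
{
    public static string ReadFirstLine(string path, string fallback)
    {
        try
        {
            using (var stream = FileFacade.Open(path))
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadLine() ?? fallback;
            }
        }
        catch (UnauthorizedAccessException)
        {
            return fallback;
        }
    }
}

public static class SettingsReaderTests
{
    // The "file" is just a MemoryStream wrapped around some text.
    public static string ReadsFromInMemoryFile()
    {
        FileFacade.Impl.Open = (path) =>
            new MemoryStream(Encoding.UTF8.GetBytes("first\nsecond\n"));
        try
        {
            return SettingsReader.ReadFirstLine(@"C:\Temp\Test.txt", "default");
        }
        finally
        {
            // Restore the real implementation so other tests are unaffected.
            FileFacade.Impl.Open = FileFacade.Impl.DefaultOpen;
        }
    }

    // The error is raised at the point of opening the file.
    public static string FallsBackWhenAccessDenied()
    {
        FileFacade.Impl.Open = (path) =>
        {
            throw new UnauthorizedAccessException();
        };
        try
        {
            return SettingsReader.ReadFirstLine(@"C:\Temp\Test.txt", "default");
        }
        finally
        {
            FileFacade.Impl.Open = FileFacade.Impl.DefaultOpen;
        }
    }
}
```

Because the delegates are static, any replacement leaks into later tests unless it is undone; in an NUnit fixture the restore step would more naturally live in the TearDown() method than in a finally block.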

Replacing the file-system with strings

So what about the case where you are using a third-party component to provide simple serialization duties? I’m not talking about large complex data graphs here like a Word document, but the simpler text formats like .ini files, .csv files or the <appSettings> section of .config files. If you’ve decided to leverage someone else’s work instead of writing your own parser, it’s better if the parser exposes its behaviour through interfaces, but not all do. This is especially true in C++, where there are no formal interfaces as such and concrete types or templates are the norm.

However, many text file parsers also support the ability to parse data stored as an in-memory string. You can exploit this in your testing by introducing a static facade (like that above) that encapsulates the code used to invoke the parser so that it can be redirected to load an “in-memory” file instead. This allows you to avoid the performance and dependency costs of touching the actual file-system whilst remaining in full control of the test.

namespace My.IO
{

public class XmlDocumentLoader
{
  public static XmlDocument Load(string path)
  {
    return Impl.Load(path);
  }

  public static XmlDocument LoadFromFile(string path)
  {
    // Load document via file-system.
    . . .
  }

  public static XmlDocument LoadFromBuffer(string document)
  {
    // Load document from in-memory buffer.
    . . .
  }

  public static class Impl
  {
    // Must be static to live in a static class; defaults to
    // loading from the real file-system.
    public static Func<string, XmlDocument> Load =
                                       LoadFromFile;
  }
}

}

... and here is an example test:-

{
  // Redirect every load to an in-memory document.
  XmlDocumentLoader.Impl.Load = (path) =>
  {
    string testDocument = "<config>. . .";

    return XmlDocumentLoader.LoadFromBuffer(testDocument);
  };
}

Strictly speaking this fails Kevlin’s definition of a unit test because of the “boundary of trust” that we have crossed (into the parser). However, we do control the test input, and we should be able to rely on a parser giving consistent performance and results for a small, consistent in-memory input, so we’re pretty close. Fundamentally it’s deterministic and isolated and, most importantly of all, it’s automatable.
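To make that concrete, here is one way the loader and a test might look when filled in. This is only a sketch under assumptions: the two load methods are implemented with XmlDocument.Load() and XmlDocument.LoadXml(), and the ConfigReader class, the <timeout> element and the test document’s contents are all hypothetical, made up for the example.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class XmlDocumentLoader
{
    public static XmlDocument Load(string path)
    {
        return Impl.Load(path);
    }

    public static XmlDocument LoadFromFile(string path)
    {
        // Assumed implementation: parse straight from the file-system.
        var document = new XmlDocument();
        document.Load(path);
        return document;
    }

    public static XmlDocument LoadFromBuffer(string document)
    {
        // Assumed implementation: parse the XML held in the string.
        var xml = new XmlDocument();
        xml.LoadXml(document);
        return xml;
    }

    public static class Impl
    {
        public static Func<string, XmlDocument> Load = LoadFromFile;
    }
}

// Hypothetical code under test: reads a timeout value from a config file.
public static class ConfigReader
{
    public static int ReadTimeout(string path, int defaultValue)
    {
        XmlDocument document = XmlDocumentLoader.Load(path);
        XmlNode node = document.SelectSingleNode("/config/timeout");

        return (node == null)
            ? defaultValue
            : int.Parse(node.InnerText, CultureInfo.InvariantCulture);
    }
}

public static class ConfigReaderTests
{
    // Runs ReadTimeout() against an in-memory document, then restores
    // the real loader so the replacement does not leak into other tests.
    private static int ReadTimeoutFrom(string testDocument)
    {
        XmlDocumentLoader.Impl.Load = (path) =>
            XmlDocumentLoader.LoadFromBuffer(testDocument);
        try
        {
            return ConfigReader.ReadTimeout("ignored.config", 10);
        }
        finally
        {
            XmlDocumentLoader.Impl.Load = XmlDocumentLoader.LoadFromFile;
        }
    }

    public static int ReadsTimeoutFromInMemoryDocument()
    {
        return ReadTimeoutFrom("<config><timeout>30</timeout></config>");
    }

    public static int UsesDefaultWhenTimeoutIsMissing()
    {
        return ReadTimeoutFrom("<config></config>");
    }
}
```

The path passed to ReadTimeout() is never used by the replaced loader, which is exactly the point: the test exercises the real parser but never goes near the disk.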

With a more complex component like an XML parser it may even require a fair amount of work to mock even though you only use a tiny subset of its features; but that in itself may be a design smell.

Reinventing the wheel

The use of static facades is often frowned upon exactly because it isn’t possible to mock them with the non-industrial strength mocking frameworks. I’m a little surprised that the mocking frameworks focus all their attention on the mechanics of providing automated mocks of existing interfaces rather than providing some additional common facades that can be used to simplify mocking in those notoriously hard to reach areas, such as the file-system and process spawning. Perhaps these ideas just haven’t got any legs or I’m not looking hard enough. Or maybe we’re all just waiting for someone else to do it...

 

[*] Depending on the context an access denied could be an indication of a systemic failure that should cause alarm bells to go off or it could just be a transient error because you’re enumerating a file-system that is outside your control.

[+] As he was refactoring the lambda used for the initial implementation, the notion of “method groups” suddenly came into focus for me. I’d seen the term and thought I roughly knew what it was about, but I still felt smug when I suggested it was a method group a split second before ReSharper suggested the same. Another +1 for ReSharper, this time as a teaching aid.

[#] I doubt you’d ever really do a case-sensitive path name comparison, but hopefully you get the point.
