Saturday 13 February 2010

My [Unit] Testing Epiphany

In my recent post “TFD vs TLD – What’s the Definition of ‘Later’?” I alluded to a state of arrogance that was subsequently punctured by a fall from grace and a sudden realisation that my testing strategy was in fact fatally flawed. The outcome was a new-found appreciation for Unit Testing in its modern guise. Since then I’ve tried to guide other developers along the righteous path in the hope that they could be spared the pain of my embarrassment, but I’ve come to believe that it is a Rite of Passage: the benefits are largely downstream, which makes the upfront costs feel excessive, especially when the clock is ticking.

Anyway, so how did I reach a level of complacence and arrogance that caused me to dismiss writing formal unit tests as a waste of time?

My professional career started out in the early ’90s at a small software house writing graphics-based desktop applications. The great thing about graphical software is that you can see whether something works or not. Of course, off-by-one pixel and colour errors are difficult to spot, but by and large you can test the application manually and get good results. The culture at the company was strong on reading, and we used to have copies of DDJ (Dr Dobb’s Journal), MSJ (now MSDN Magazine) and WDJ (Windows Developer Journal) on rotation. I was also encouraged to read Steve Maguire’s Writing Solid Code and Steve McConnell’s Code Complete – two books which I found awe-inspiring, and I think others in the company did too. There was already a strong desire to produce quality software, and many of the techniques in the books were adopted to ensure an even higher level of workmanship*.

This is a list of some of the core techniques that we adopted and that I’ve pretty much carried on to this day:-

  • Compile with the highest possible warning level and report warnings as errors. Then, where possible, fix the code and not the compiler to ensure it builds cleanly.
  • Use ASSERT to verify pre-conditions, post-conditions and invariants in functions to highlight bugs as soon as possible (a short sketch of this style appears after the list).
  • Don’t code defensively. Fail fast and loudly to give yourself the best chance of finding and fixing bugs.
  • Step through code changes in the debugger to verify that it does what you think it should, paying special attention to edge cases like loop termination conditions and branches to ensure your testing gives good code coverage.
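
To make the assertion and fail-fast points concrete, here’s a minimal sketch of the style. The function and all its names are hypothetical, invented purely for illustration rather than lifted from any real codebase:-

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical example: assert the contract, then fail fast on bad input.
std::string::size_type FindTagEnd(const std::string& fragment,
                                  std::string::size_type start)
{
    assert(start < fragment.size());    // pre-condition: offset is within the buffer
    assert(fragment[start] == '<');     // pre-condition: we are sat on a tag

    const std::string::size_type end = fragment.find('>', start);

    // No defensive 'best guess' here - fail fast and loudly instead.
    if (end == std::string::npos)
        throw std::runtime_error("unterminated tag in XML fragment");

    assert(end > start);                // post-condition: the tag is non-empty
    return end;
}

int main()
{
    assert(FindTagEnd("<tag>text</tag>", 0) == 4);  // the '>' of "<tag>"
    return 0;
}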

The one item most relevant to this post is probably the last one – using the debugger to verify what you’ve written. I’ve followed this religiously as it made perfect sense to me. Why do a mental dry-run or use print statements when you can do a live run and inspect the state with the debugger? The most significant side-effect of this practice, though, is the need to be able to invoke the relevant code in the debugger with ease, so that the code/build/test cycle is as short as possible. This in turn often led to the creation of test harnesses and diagnostic tools with a narrow focus, which also really helped with support issues. So, if this was proving to be pretty successful, why change? Because the key thing missing is repeatability, and this is where I came unstuck…

Fast forward a few years: I was writing a very simple but fast XML parser, and as a by-product I knocked up a noddy GUI which not only allowed us to edit the kind of simple XML data fragments we were dealing with but also acted as a nice test harness, because all the library’s features could be exercised via the various GUI functions. One day I did a little refactoring, and after making some minor changes and running through some tests, I checked the edits into the source repository, safe in the knowledge that I’d improved the quality of the codebase a little. Some days later a colleague (who had been introduced to unit testing at a previous company and was trying to spread the word) announced to the team that his “unit tests were failing”. These tests were for a library that no one had touched in ages, and because there was no Continuous Integration or automatic running of the test suite, it wasn’t as simple as picking on the last few check-ins. A quick spot of debugging later and the finger was duly pointed – at yours truly…

In my ‘supposedly’ low-risk spell of refactoring I had managed to replace a line something like this:-

return Decode(m_reader->ReadTag(blah, . . ., blah));

with one that no longer called the Decode() method, which parsed and replaced entity references like &amp; and &apos;:-

return m_reader->ReadTag(blah, . . ., blah, blah);

This should of course have been caught in my testing, but it wasn’t, because I was foolishly under the impression that my refactoring couldn’t have had any impact on the handling of entity references. On top of that, I also apply a policy of diff’ing every single change before checking it in, both to ensure that I don’t mistakenly check in temporary or commented-out code and to give myself a final pass to sanity-check that I’ve not missed something crucial. My expected change was at the end of the line, but somehow I’d missed the accidental change at the start. It was during the subsequent discussion with my colleague that the light bulb went on and I realised (a) that I was far more fallible than I had come to believe, and (b) that I could in fact make subsequent changes faster and more accurately because of the scaffolding that would already be in place. Although the failing unit tests were not for my XML parser (he was just using it to load test cases, one of which had an “&” in it), I fixed the code and quickly re-ran his test cases to verify my fix and allow him to continue his work.
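
With hindsight, a test no bigger than this would have trapped the regression the moment the suite ran. The Decode() below is a toy stand-in that expands just the two entities mentioned above – written purely to make the example self-contained, not the real implementation:-

#include <cassert>
#include <string>

// Toy stand-in for the real Decode(): expands just &amp; and &apos;.
std::string Decode(std::string text)
{
    static const char* const entities[][2] =
        { { "&amp;", "&" }, { "&apos;", "'" } };

    for (int i = 0; i != 2; ++i)
    {
        const std::string from(entities[i][0]);
        const std::string to(entities[i][1]);

        for (std::string::size_type pos = 0;
             (pos = text.find(from, pos)) != std::string::npos;
             pos += to.size())
        {
            text.replace(pos, from.size(), to);
        }
    }

    return text;
}

int main()
{
    // Had ReadTag()'s result stopped being passed through Decode(), the
    // first two checks would have failed instantly.
    assert(Decode("Fish &amp; Chips") == "Fish & Chips");
    assert(Decode("it&apos;s") == "it's");
    assert(Decode("no entities here") == "no entities here");
    return 0;
}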

Actually, he had explained xUnit-style unit testing to me some months before, and had even shown me how to use CppUnit. My objection at the time was the volume of code you appeared to need to write just to test a simple function. Quite frankly I thought it was laughable that a Senior Software Engineer should need to resort to such verbose tactics to test their code – I felt I could test far more efficiently my own way. Granted, you don’t get some of the ease of use with C++ and CppUnit that you do with the Java and C# variants, but the time spent physically writing the test ‘shell’ is probably marginal in comparison to that spent designing and writing both the code under test and the test itself. For flavour, the sketch below shows roughly the shell that put me off.
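
This is a sketch only, reusing the toy Decode() from the previous example together with the stock CppUnit helper macros; one fixture, one test, and yet note how much scaffolding surrounds it:-

#include <cppunit/extensions/HelperMacros.h>
#include <string>

std::string Decode(std::string text);   // the toy function from above

// A single-test fixture, registered with the global test registry.
class EntityDecodingTests : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(EntityDecodingTests);
    CPPUNIT_TEST(testAmpersandIsDecoded);
    CPPUNIT_TEST_SUITE_END();

public:
    void testAmpersandIsDecoded()
    {
        CPPUNIT_ASSERT_EQUAL(std::string("Fish & Chips"),
                             Decode("Fish &amp; Chips"));
    }
};

CPPUNIT_TEST_SUITE_REGISTRATION(EntityDecodingTests);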

I guess my mistake was being unaware of the true costs of maintenance – especially on large systems that have a regular turnover of staff/contractors. Yes, I may have written the code initially and will likely perform the first maintenance due to my familiarity with it, but longer term someone else will take over. It also greatly benefits the schedule if anyone can just dive into any code. Often there is little documentation, so you have only the source to go on, and unit tests provide a way to roll your specification into your codebase. The future maintenance programmer then not only has some sort of spec to work from but also a purpose-built regression test suite on tap, to which they only need add new tests or perhaps refactor some existing ones. What they absolutely do not have to do is start from a blank sheet of paper. Now throw into the mix the fact that computers are a far better tool for performing mundane tasks like regression testing, and, as long as you follow the advice about ensuring your tests run quickly, you can get excellent test coverage and feedback within a very short period of time. Automated building and running of test suites is de rigueur nowadays, wrapped up inside the concept of Continuous Integration. Roy Osherove (author of The Art of Unit Testing and the excellent ISerializable blog) covered automated builds and Continuous Integration just recently on his blog.

I’m sure some would argue that my example above wouldn’t have been found with a unit test but would instead be classified as a component-level test; the definition of “the smallest testable part” depends to a degree on how much your design allows for fine-grained testing. Either way, the code did not require any expensive external resources, like a database or network connection, and so could have been subjected to a barrage of tests that would run in the blink of an eye, even on hardware of that era. I would then have known of my mistake almost instantly. Better yet, the suite of tests would also have given my teammates the courage to perform similar surgery in the future.

*Unfortunately it seems that the magazine reviewers cared little about quality, or never tested the products thoroughly enough to notice poor workmanship. The evidence of this was the market leader at the time getting high marks for a product that was buggy and unreliable. Yes, it had lots of features, but you had better save your work regularly. This was during the ’90s, and whilst GST is no longer in business** the market leader continues to this day – so much for taking pride in one’s work…

**What was originally Timeworks Publisher, and then PressWorks, is now (I believe) called Greenstreet Publisher. I worked mostly on its sister product DesignWorks – a line-art drawing package – and the bundled utilities SnapShot and KeyPad.
