Tuesday, 9 July 2013

Null Checks vs Null Objects

In my last post “The Application-Wide Diagnostic Log - the One Acceptable Global Object?” I had the following sort of code in my logging facade:-

  if (ApplicationLog != null)
    ApplicationLog.Error(format, args);

The reason for the check against null was that my first requirement of the facade was that it should be no burden at all when testing. I don’t want to have to create some sort of mock every time I write a test for some code that happens to have logging statements in it. I thought I had the whole issue covered too when I added the following comment:-

“I can easily create a Null Object implementation, although with the checks that’s fairly pointless unless you want to avoid them.”

Given the subject matter, which I knew would be fairly contentious, I was half expecting a tirade against my suggestion of even allowing one global variable. What I didn’t expect was something that would highlight a rather different hole in my understanding of the Null Object pattern. It arrived courtesy of fellow ACCU member Alan Stokes via this tweet:-

“Surprised you prefer the null checks to Null Object - they're pure waste in the common case. Also null is pure evil.”

The part of that tweet I like most is “you prefer the null checks”. It’s so polite in suggesting that my skill at programming is such that I consciously made the decision to implement my facade using a null check rather than using the Null Object pattern. Of course what we all really know is that I made the unconscious decision and just went with what I “instinctively” thought was right. Which, after even a moment’s thought, I realised clearly wasn’t [1].

Null Objects as a Convenience

Alan is right in that null references make code more complicated than necessary, and it has always been this facet of the Null Object pattern that I’ve focused on. As a general rule I try and avoid passing null references around, especially to other services. If I need a constructor where one parameter is optional I’ll implement it as two separate constructors and apply a “not null” constraint to ensure the caller has to play ball:-

public class Service
{
  public Service()
    : this(new OtherServiceNullImpl())
  { }

  public Service(IOtherService otherService)
  {
    if (otherService == null)
      throw new ArgumentNullException("otherService");
    . . .
    // Other initialisation code
  }
}
Here I want to enforce the constraint on the public interface of Service that I’m not tolerating null references. I have the choice internally to either delegate both constructor calls to a single internal helper method that does allow nulls, or chain the constructors and pass a null implementation when not provided by the caller. The former choice means I’ll probably still have to add null checks internally in my class, but at least it’s inside my own four walls - my implementation choice does not affect my clients. If I’m happy to waste time creating a potentially unused object I can simplify things even further by just defaulting to the null implementation and then replacing it if the more specific ctor is called:-

  private IOtherService m_other = new OtherServiceNullImpl();
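For completeness, the first choice, a single internal helper that does tolerate nulls, might look something like this. This is just a sketch; the `Initialise` method name and the exact placement of the null check are my assumptions, not something from the earlier post:-

```csharp
using System;

public interface IOtherService { }

public class Service
{
  public Service()
  {
    Initialise(null);
  }

  public Service(IOtherService otherService)
  {
    // Enforce the "not null" constraint on the public interface.
    if (otherService == null)
      throw new ArgumentNullException("otherService");

    Initialise(otherService);
  }

  // Internal helper that allows a null reference; any further
  // null checks stay inside our own four walls.
  private void Initialise(IOtherService otherService)
  {
    m_other = otherService;
  }

  private IOtherService m_other;
}
```

The trade-off is exactly as described above: the public contract stays strict, while internally the class still has to remember that m_other may be null.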

The same idea applies when you are returning a reference. Imagine the following method on the service:-

public List<Customer> FetchCustomers()
{
    var customers = new List<Customer>();
    . . .
    return customers;
}

When there is nothing to return, you could return a null reference, or you could return an empty collection. I would hope most developers see that returning the null reference is a real pain as it hinders any sort of composition, such as with the LINQ extensions. But what about string values? I went through that thought exercise a couple of years ago when I first started doing C# and filed it under “Null String Reference vs Empty String Value” [2].

Switching back to the logging example in my previous post, the case for use of a null object begins to look much clearer. If we allow the ApplicationLog property to be set to null we make it much harder for any consumers to use - it’s essentially the same as the return value case above. So, perhaps I should have promoted a different design which defaults to a null logger instead and prevents anyone from setting the global logger to a null reference:-

public static class Log
{
  // The application-wide log.
  public static ILogger ApplicationLog
  {
    get { return s_applicationLog; }
    set
    {
      if (value == null)
        throw new ArgumentNullException("value");

      s_applicationLog = value;
    }
  }

  // Write an error message.
  public static void Error(string format, params object[] args)
  {
    s_applicationLog.Error(format, args);
  }

  private static ILogger s_applicationLog = new NullLogger();
}

Now we no longer need any null checks within our code, or in any callers that decide to partially take matters into their own hands but still want to write through the global logger.
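For reference, the null implementation itself is trivial. A sketch, assuming an ILogger interface with the same Error signature as the facade above:-

```csharp
public interface ILogger
{
  void Error(string format, params object[] args);
}

// A Null Object implementation: every method is a NOP, so
// callers can invoke it unconditionally with no ill effects.
public class NullLogger : ILogger
{
  public void Error(string format, params object[] args)
  {
    // Deliberately do nothing.
  }
}
```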

Null Objects as an Optimisation

The hardest thing about being an experienced developer is that you have to remember to unlearn stuff as the world around you changes. Over time you slowly build up knowledge about what works and what doesn’t and you generally get a feel for what things cost. Of course we all know that nothing should be taken at face value and that we should measure everything before discounting it. But that’s really what experience is for, isn’t it - to short-circuit the blind alleys? Except of course when it’s not…

I put forward TDD as my defence [3] for not spotting what is pretty clearly a case of not optimising for the common case. My first requirement for the logging facade was the following:-

“It must, by default have no impact on unit testing. I should not need to do anything to write a test for existing code or be able to add logging to existing code without it breaking the build”

As such I wrote all the tests and designed the facade entirely based on the premise that I wanted the call to log a message to be a NOP (or as close to one as possible) by default. Now, putting aside for the moment the fact that modern CPUs handle branch prediction and v-table style dispatch more efficiently than they used to [4], and that JIT compilers are also pretty smart, this still misses the simple fact that the common case is that an instance of a logger object will exist. That’s right, every service and component I write will have (some form of) logging enabled by default, so the “else” branch will never get taken in a production scenario, doh!

Perhaps it’s also worth looking at the problem another way. For the cost of the difference between using an “if” statement and calling a null object to matter you would need to be willing to accept the cost of writing to the log sink (probably a file), which would undoubtedly be orders of magnitude greater. Even if you’re going to buffer the writes and do it on another thread it’s still going to cost you a fair bit in comparison.
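If you did want to measure rather than guess, a crude Stopwatch-based sketch along these lines would give a first-order answer. The numbers will vary wildly by machine, runtime and JIT behaviour, so I make no claims about what it would show:-

```csharp
using System;
using System.Diagnostics;

public interface ILogger
{
  void Error(string format, params object[] args);
}

public class NullLogger : ILogger
{
  public void Error(string format, params object[] args) { }
}

public static class Benchmark
{
  private static ILogger s_maybeNull = null;               // null-check style
  private static ILogger s_nullObject = new NullLogger();  // Null Object style

  public static void Main()
  {
    const int iterations = 10000000;

    // Style 1: guard every call with an "if" statement.
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
    {
      if (s_maybeNull != null)
        s_maybeNull.Error("message");
    }
    sw.Stop();
    Console.WriteLine("null check:  {0} ms", sw.ElapsedMilliseconds);

    // Style 2: dispatch unconditionally through the null object.
    sw.Restart();
    for (int i = 0; i < iterations; ++i)
    {
      s_nullObject.Error("message");
    }
    sw.Stop();
    Console.WriteLine("null object: {0} ms", sw.ElapsedMilliseconds);
  }
}
```

Either way, compare whatever difference you see against the cost of a single write to a real log sink before worrying about it.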

There. I’ve just done it again, haven’t I? After spouting on about the need to measure and not relying on old rules of thumb I’ve just surmised the cost is insignificant. And they say you can’t teach an old dog new tricks...

[1] This is what I enjoy most about blogging - it’s like a public design review. As long as there is someone out there (hopefully more knowledgeable than me) willing to stick their neck out too and question what I’ve written, I’ll get to see my mistakes and shortcomings and hopefully learn something.

[2] Interestingly that blog post of mine even has a section titled “Null References Are Evil”, but I see that once again I’m only focusing on the convenience aspect of null objects.

[3] Yeah, because I clearly can’t just admit that I was being lazy now, could I…

[4] I’ve worked on a couple of applications in the (very dim and distant) past where the cost of the virtual function call machinery was a significant overhead. Consequently the thought of a dynamically dispatched NOP call immediately “feels” excessively costly. Note to self: Google some modern benchmarks to see how insignificant it probably is today.
