Wednesday 18 May 2011

Out vs Ref For TryXxx Style Methods

None of the popular programming languages that I know of allow you to overload a method based on error semantics. A common pattern to work around this is to provide two variants – one named normally that throws an exception on failure, and a no-throw version whose name is prefixed with “Try” and which returns a bool instead (with any additional return values handled by output parameters). A classic example is the parsing functions on the C# DateTime type:-

DateTime Parse(string value);
bool     TryParse(string value, out DateTime result);

In principle it’s easy enough to move from the exception throwing form:-

{
  DateTime output = DateTime.Parse(input);
  . . .
}

…to the alternate non-throwing form when you decide you need the different error handling semantics:-

{
  DateTime output;

  if (DateTime.TryParse(input, out output))
  {
    . . .
  }
}

But what about when you don’t care if the method succeeded or not? On a number of occasions I have used a TryXxx style method and have not cared about the boolean return code, I just want it to use my default value if it fails:-

{
  DateTime output = DateTime.Now; // default

  DateTime.TryParse(input, output);
  . . .
}

Unfortunately this won’t have the desired effect (it actually won’t compile as is because the ‘out’ keyword is missing at the call site, but hold on) because your default value gets clobbered: on failure TryParse overwrites it with DateTime.MinValue. Consider the following method on my IConfiguration interface that attempts to retrieve a configuration setting, if it exists:-

bool TryGetSetting(string key, out string value);

If I use the ‘out’ keyword as part of the interface I am forced to provide a value for all code paths. Consequently the implementation will probably look like this:-

bool TryGetSetting(string key, out string value)
{
  // Attempt to retrieve the setting 
  if (. . .)
  { 
    value = . . .; 
    return true;
  }
  else
  {
    // Must initialise the output value on all paths
    value = null;
    return false;
  }
}

The only general default value you can provide for a reference is null. Yes, for a string (or any other class that defines it) you could use the ‘Empty’ value, but that still clobbers any input from the caller. And so you force the caller to acknowledge the failure and write the slightly more verbose:-

{
  string value;

  if (!config.TryGetSetting("setting", out value))
    value = "my default value";
  . . .
}
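
If the interface has to keep ‘out’, one way to spare every caller that extra ‘if’ is to wrap it once in a small helper. What follows is only a sketch under my own assumptions: DictionaryConfiguration and GetSettingOrDefault are names invented for illustration, and the dictionary-backed store merely stands in for wherever the settings really live.

```csharp
using System;
using System.Collections.Generic;

public interface IConfiguration
{
    bool TryGetSetting(string key, out string value);
}

// Hypothetical in-memory implementation, for illustration only.
public class DictionaryConfiguration : IConfiguration
{
    private readonly Dictionary<string, string> settings;

    public DictionaryConfiguration(Dictionary<string, string> settings)
    {
        this.settings = settings;
    }

    public bool TryGetSetting(string key, out string value)
    {
        // Dictionary.TryGetValue sets 'value' to null when the key is missing,
        // so the 'out' parameter is assigned on every path.
        return settings.TryGetValue(key, out value);
    }
}

public static class ConfigurationExtensions
{
    // Hypothetical helper: returns the caller's default when the lookup fails.
    public static string GetSettingOrDefault(
        this IConfiguration config, string key, string defaultValue)
    {
        string value;
        return config.TryGetSetting(key, out value) ? value : defaultValue;
    }
}
```

The caller can then write config.GetSettingOrDefault("setting", "my default value") and never sees the clobbered null.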

The alternative is to use ‘ref’ instead, which allows the caller to provide a default value and you no longer have to clobber it in your implementation:-

bool TryGetSetting(string key, ref string value)
{
  // Attempt to retrieve the value
  if (. . .)
  { 
    value = . . .; 
    return true;
  }
  else
  {
    // Leave caller’s value untouched
    return false;
  }
}

Finally, as a caller I can now just write this:-

{
  string value = "my default value";

  config.TryGetSetting("setting", ref value);
  . . .
}
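
To see the ‘ref’ flavour working end to end, here is a minimal self-contained sketch. RefConfiguration and RefDemo are hypothetical names of my own, and the dictionary is again just an assumed stand-in for the real settings store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory configuration, for illustration only.
public class RefConfiguration
{
    private readonly Dictionary<string, string> settings;

    public RefConfiguration(Dictionary<string, string> settings)
    {
        this.settings = settings;
    }

    public bool TryGetSetting(string key, ref string value)
    {
        string found;
        if (settings.TryGetValue(key, out found))
        {
            value = found;
            return true;
        }

        // Leave the caller's value untouched.
        return false;
    }
}

public static class RefDemo
{
    // Returns the setting if present, otherwise the default the caller started with.
    public static string Lookup(RefConfiguration config, string key)
    {
        string value = "my default value";

        config.TryGetSetting(key, ref value);
        return value;
    }
}
```

The cost is that the compiler now insists the caller initialises the variable before the call, which is exactly what we want here.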

So, I wonder why the C# designers picked ‘out’ over ‘ref’ in the first place? Perhaps they felt it was safer. But is it that much safer? If you use ‘out’ and don’t check the return code you’ll probably end up either accessing a null reference or continuing with the equivalent of 0 for a value type, i.e. whatever default(T) returns for the type in question. OK, this is far superior to the C++ world where an uninitialised variable could be anything[*]. If you use ‘ref’ then you let the caller choose the uninitialised value, which, if they are following best practice, will result in the same effect because they won’t be reusing existing variables for other purposes.
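
As a quick illustration of what you are left holding when an ‘out’-based call fails and the return code is ignored, the sketch below uses three framework methods (int.TryParse, DateTime.TryParse and Dictionary.TryGetValue). The OutDefaults class and its method names are mine, purely for demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class OutDefaults
{
    public static int FailedIntParse()
    {
        int number = 123; // the caller's intended default

        int.TryParse("not a number", out number);
        return number;    // overwritten with default(int)
    }

    public static DateTime FailedDateParse()
    {
        DateTime date = DateTime.Now; // the caller's intended default

        DateTime.TryParse("not a date", out date);
        return date;      // overwritten with default(DateTime)
    }

    public static string FailedLookup()
    {
        var map = new Dictionary<string, string>();
        string text = "my default value";

        map.TryGetValue("missing", out text);
        return text;      // overwritten with default(string), i.e. null
    }
}
```

In each case the caller's starting value is gone and the type's default is what carries on through the rest of the code.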

There is of course a semantic difference between ‘out’ and ‘ref’, but I think what I’m suggesting blurs the line between them. If you look at ‘out’ and ‘ref’ through COM’s eyes and put a network in the middle then it’s all about whether you need to marshal the value to the callee and this is not the behaviour we want. The callee doesn’t need the value and we certainly don’t want to allow it to be able to modify it, so ‘ref’ is out (if you’ll pardon the pun). What we want ‘out’ to mean in this scenario is “don’t clobber the existing variable if no output value was provided, and don’t bother marshalling the value into the callee either”.

It’s great that C# points out where you have attempted to use an uninitialised variable, but sadly I think it’s that same mechanism that also gets in the way sometimes.

[*] I once got bitten by an uninitialized ‘bool’ during my C++ days. Somewhat ironically it was exactly because we were using all the debug settings during development and they very cleverly initialise stack variables and heap memory to a non-zero value that it went unnoticed. This is because the “uninitialized” value was always reinterpreted for a ‘bool’ as ‘true’. There is a reason why you always write a failing unit test first…

2 comments:

  1. "OK, this is far superior to the C++ world"
    hey - careful what you say ;-)

    I've been using the tryXXX convention in some C++ containers I've written recently. It works quite nicely (with "ref" (but never marshalled) semantics). Not yet had an uninitialised value issue with it - probably because in C++ we're used to having to deal with that ourselves.

    I'm particularly fond of tryXXX overall - but I prefer it to the arguably more std-c++y way of returning pair and having to use .first and .second.

    This all works better in a language that explicitly supports multiple return values. Python is a good example.

    setting, found = getSetting( ... )

    If you're not interested in the "found" flag, I believe a common convention is to use _

    setting, _ = getSetting( ... )

    but you can also do:

    setting = getSetting( ... )[0]

    This is all really the same as the std::pair result in C++ - but feels much more natural with decent language support.

    I'd almost go as far as to challenge your opening statement, "None of the popular programming languages that I know of allow you to overload a method based on error semantics".

    It's not quite "overloading" - but the effect is much the same.

  2. I've never liked the std::pair return value thing either - it's just so unreadable to see .first and .second. I would have preferred that std::pair was "overloaded" as, say, std::multi_result so that you had something like .succeeded and .output.

    But then I've never been a big fan of tuples in the first place. It's probably my background (and lack of exposure to languages like Python) but having a multi-valued thing with member names like Item1, Item2 etc seems wrong when a struct allows you to explicitly name the roles those members play.

    I'm sure once I use a language that has proper support I'll change my mind :-).
