Thursday, 18 July 2013

Redirecting Output to Dated Log Files

The other day the requirement came up to run a scheduled task to copy some data from a production machine to a development one, to archive it for testing later. Naturally ROBOCOPY was going to do the heavy lifting, but if there’s one thing I’ve learnt about scheduled tasks it’s that you’ll always want to capture the output to a log file (if the tool doesn’t do that already), and to annotate that output with extra details like the start/end date & time.

Redirecting Output

Taking a leaf out of my own book, or more specifically my recent “In The Toolbox” article on wrapper scripts in the current C Vu, I’d probably wrap the command in a batch file to give myself a little wriggle room and then capture the entire output by relying on the command interpreter to do the redirection. For example the task definition might look like this:-

"C:\Program Files\My Service\ProdToDevCopy.cmd" >> "D:\My Service\Log\ProdToDevCopy.log" 2>&1

As a general rule I always append to log files rather than clobber them, so that you maintain a history of failed runs when they occur. The “2>&1” bit on the end [1] redirects the stderr stream to stdout so that it goes in the same log file. I find two log files a bit of a pain, especially when you’re trying to track down why it failed. And I often forget to add this, which is another reason why I’m writing this down now.

Dated Log Files

When a task runs very frequently using a single log file gets unwieldy, not least because it makes archiving old log files impossible. Consequently logging to a file based on the current date is usually a sufficient way to split them. Another reason to implement the scheduled task command as a batch file is that what you’re really invoking is a copy of the command interpreter (as CMD.EXE /C) and so you have access to other features too, such as the lesser known “substring” expansion feature for variables. If you type C:\> HELP SET you’ll find out all the details.

In this instance we want to take the output of the %DATE% variable, split it into its separate fields and then recombine them again to make an ISO style date (YYYY-MM-DD) [2]. On my machine the output for %DATE% is in the common DD/MM/YYYY format:-

C:\> echo %DATE%
18/07/2013

Before going any further it should be apparent that this is not going to work on an arbitrary machine because different locales mean that the source date format will vary. However in the context of an enterprise application server where a lot of things can often be taken for granted this can work just fine.

The basic format for a variable expansion is this:-

%VARIABLE:~offset,length%
The offset and length can both be negative meaning that you can index from the end of the string and take all but a few characters. And you can omit them too which implies an index of 0 and the entire string length.

Just using the indexing gives you at least two usefully different ways to extract each component:-

YYYY = %DATE:~6,4% or %DATE:~-4%
MM   = %DATE:~3,2% or %DATE:~-7,2%
DD   = %DATE:~,2%  or %DATE:~-10,2%

And so applying the shortest form of each we can concoct a minimal pattern for our date:-

C:\> echo %DATE:~-4%-%DATE:~3,2%-%DATE:~,2%
2013-07-18

Plugging this back into the (shortened) original script command line gives us this:-

ProdToDevCopy.cmd >> D:\Log\ProdToDevCopy.%DATE:~-4%-%DATE:~3,2%-%DATE:~,2%.log 2>&1


[1] And it has to go on the end too. If you put it before the redirection it won’t work. For an extensive run-down of redirection options see this page.

[2] By using an ISO style date in log file names they become sorted by date by default which can make life easier in some scenarios.

Tuesday, 16 July 2013

Overdoing the References

Thinking and reading about handling nulls again over the last few days has had an interesting side-effect of bringing some clarity to a phase of my C++ learning that I always thought was somewhat weird; I call it my “everything must be a reference” phase. Basically I stopped using pointers in my C++ code and started using references instead, everywhere. Naturally this led to some very odd looking code that you can still see in my legacy C++ libraries to this day.

Reference Free Beginnings

Like many programmers I started out writing in C, where the only choice you had was the pointer. A consequence of its duality (null or value) meant that just by looking at the interface to a function you couldn’t really tell if the parameters were optional or not, at least for anything other than the primitive (built-in) types.

struct Person
{
    const char* m_name;
};

void printPerson(const Person* person)
{
    . . .
}

Of course you had a fairly good idea that in most cases the value wasn’t optional and so books like Steve Maguire’s “Writing Solid Code” introduced me to the notion of Design by Contract and using ASSERTs in your interfaces to spot and highlight early when the contract had been violated [1]. The knock-on effect of that, at least for those like myself who might have taken some of the advice a little too literally, was that our codebase became utterly littered with ASSERTs.

void copyPerson(Person* lhs, const Person* rhs)
{
    ASSERT(lhs != NULL);
    ASSERT(rhs != NULL);
    . . .
}

In order to catch the violation as early as possible you needed to apply these checks as often as possible. At least, that was the logic I ended up applying - to the extent that I started using ASSERTs where I probably should have been validating inputs, but that’s another post for another day...

From Pointers to References

Moving from mostly doing C to mostly doing C++ meant that I entered a brave new world with the notion of “references”. As I understood it back then references were basically the same as pointers (in implementation) but they were mostly an enabler for things like overloading the assignment operator and allowing you to use “.” instead of “->” [2]. As someone coming from C that sure sounded like a bit of a kludge to me.

Although I embraced references to a certain degree, such as where they were expected, like in a copy constructor and assignment operator, I really didn’t get them at first. Some things just didn’t add up. For one thing why was “this” a pointer and not a reference? Then there was this “folklore” around the fact that references can’t be NULL. Yes they can, I can just do this:-

Person& person = (*(Person*)NULL);

OK, so that’s a bit contrived, but I was still hitting bugs where references were null, just as much as if they were declared as pointers, so clearly they weren’t any better from the perspective of avoiding dereferencing NULL pointers.

In search of an answer about the whole “this as a pointer” and other questions I picked up a copy of The Design and Evolution of C++. This is a great book which any self-respecting C++ programmer should read, not only because it gives you some insight and context into the history of various language features, but it also shows you the ever growing set of constraints that are applied to every subsequent change in the language.

Anyway, I got some answers from this and more importantly I began to realise that the “nullness” of a reference was less about the physical side and more about the semantics - using references meant that logically the parameter could never be null. A side effect of that is that no ASSERT is required in the function prologue because the interface already declares its intent. Doh!

References Everywhere

That got me thinking. If I use references everywhere and turn any pointer into a reference as soon as possible then I can avoid nearly all this unnecessary pointer checking repetition. Being the ridiculously literal person I am at times I even took that to things like the return from the new operator by introducing simple factory methods like this:-

Row& Table::CreateRow()
{
    return *(new Row(*this));
}

If you’re thinking to yourself that’s an exception unsafe disaster waiting to happen, let me show you the call site:-

Row& row = m_clocks.CreateRow();

try
{
    row[Clocks::COMPUTER] = m_machine;
    . . .
}
catch (...)
{
    delete &row;
    throw;
}

Only kidding, it’s still a maintenance headache. At the time I wrote this Boost was by no means as ubiquitous as it is today and we were even a few years off C++03, so shared_ptr<> was still a bit “new fangled” and experimental [3]. And that assumes Visual C++ would get its act together and we would be willing to entertain the idea of living on the bleeding edge. Nope, neither was going to happen.


Although I could see the utility in shared_ptr<> and I was just beginning to get my head around the more general notion of RAII and writing exception-safe code, it still felt a bit like a backwards step. The desire to reduce the complexity of client code by leaning on RAII more meant the switch back to using what is fundamentally a shared “pointer”, not a shared “reference”. I know I’m not the only one who found this notion confusing because I’ve come across some weird home grown smart pointer classes in my time on other codebases.
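With the benefit of hindsight, the exception-safety and ownership questions around that CreateRow() factory are exactly what std::unique_ptr (which I didn’t have back then) answers. A sketch of how it might look today - the Table and Row types are simplified stand-ins, not the originals:-

```cpp
#include <memory>

class Table;

// Simplified stand-in for the original Row type.
class Row
{
public:
    explicit Row(Table& table) : m_table(&table) {}

private:
    Table* m_table;
};

class Table
{
public:
    // Returning a smart pointer by value makes the transfer of ownership
    // explicit and keeps the call site exception safe: if the caller's
    // code throws before it stores the result, the Row is still deleted.
    std::unique_ptr<Row> CreateRow()
    {
        return std::make_unique<Row>(*this);
    }
};
```

At the call site auto row = table.CreateRow(); then gives you scoped cleanup for free, with no naked new or delete in sight.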

So, now we have three choices in our API: pointers, references and the new one - smart pointer by reference. Due to some fairly dodgy custom smart pointer implementations that had a very noticeable effect on performance [4] I started seeing functions that took a smart pointer by reference instead of by value:-

void DoStuff(const shared_ptr<Thing>& thingy);

But isn’t this really all about confusing ownership semantics, not optionality? If you take ownership out of the equation for a moment the callee can expect a bald pointer or a reference. However if the callee wants to take shared ownership of an object and it won’t take NULL for an answer how should it declare its intent?

I’m still chewing over what the man himself (Bjarne Stroustrup) said at this year’s ACCU conference, but if I’ve understood him then I should only ever be passing values around anyway - any ownership and performance problems from copying should just be an implementation detail of the type. And therefore I shouldn’t be passing around shared_ptr<T> style values, but the T’s themselves (presumably using the Handle/Body idiom) [5].
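Out of interest, the Handle/Body idiom he hints at can be sketched in a few lines - a type that is passed around by value while its copies share a single body, the shared_ptr being demoted to an implementation detail. All the names here are made up for illustration:-

```cpp
#include <memory>
#include <string>

// A value-like handle: copying a Thing copies a shared_ptr internally,
// but ownership never appears in any function signature.
class Thing
{
public:
    explicit Thing(std::string name)
        : m_body(std::make_shared<Body>(std::move(name)))
    {}

    const std::string& Name() const { return m_body->m_name; }

private:
    struct Body
    {
        explicit Body(std::string name) : m_name(std::move(name)) {}
        std::string m_name;
    };

    std::shared_ptr<Body> m_body;
};

// Callers simply take the T itself, not a shared_ptr<T>.
inline std::string DoStuff(Thing thingy)
{
    return thingy.Name();
}
```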


I’m glad it’s only taken me just over a decade to finally work out what on earth I was trying to do all those years ago. I’m quite sure somewhere there is some code in production that a poor maintenance programmer is scratching their head over right now whilst the WTF meter is going through the roof. If it is you, then I can only apologise.


[1] In this era and the applications I worked on you couldn’t afford to keep this code in your release build so it only helped you during development, but it did make a big difference in tracking down bugs at the time.

[2] Much as I agree with the likes of Michael Feathers and Stan Lippman about knowing what’s going on under the hood helps you make more informed choices about using the language efficiently, it could also be a double-edged sword if you’re perhaps not aware of the concepts being modelled.

[3] I suspect if I had moved in the right circles, or had been a member of the ACCU back then I might have been able to spare my blushes a little.

[4] On Windows using a mutex is a very heavy-handed way of making a smart pointer thread safe. Critical sections were often employed too despite the primitive InterlockedXxx functions being readily available. Mind you I’ve no idea what the state-of-the-art is in thread-safe smart pointer implementations these days. Perhaps they’re consigned to history like the Reference Counted String class?

[5] At some point I’ll go back and watch the ACCU keynote video, not least to snigger at his poke at Java, but also to try and take in what he said again. At the time I didn’t think he said that much, and maybe he didn’t, it’s just that now maybe I’ve started to actually listen...

Tuesday, 9 July 2013

Null Checks vs Null Objects

In my last post “The Application-Wide Diagnostic Log - the One Acceptable Global Object?” I had the following sort of code in my logging facade:-

  if (ApplicationLog != null)
    ApplicationLog.Error(format, args);

The reason for the check against null was because my first requirement of the facade was that it should be no burden at all when testing. I don’t want to have to create some sort of mock every time I write a test for some code that happens to have logging statements in it. I thought I had the whole issue covered too when I added the following comment:-

“I can easily create a Null Object implementation, although with the checks that’s fairly pointless unless you want to avoid them.”

Given the subject matter, which I knew would be fairly contentious, I was half expecting a tirade against my suggestion of even allowing one global variable. What I didn’t expect was something that would highlight a rather different hole in my understanding of the Null Object pattern. It arrived courtesy of fellow ACCU member Alan Stokes via this tweet:-

“Surprised you prefer the null checks to Null Object - they're pure waste in the common case. Also null is pure evil.”

The part of that tweet I like most is “you prefer the null checks”. It’s so polite in suggesting that my skill at programming is such that I consciously made the decision to implement my facade using a null check rather than using the Null Object pattern. Of course what we all really know is that I made the unconscious decision and just went with what I “instinctively” thought was right. Which, after even a moment’s thought, I realised it clearly wasn’t [1].

Null Objects as a Convenience

Alan is right in that null references make code more complicated than necessary, and it has always been this facet of the Null Object pattern that I’ve focused on. As a general rule I try and avoid passing null references around, especially to other services. If I need a constructor where one parameter is optional I’ll implement it as two separate constructors and apply a “not null” constraint to ensure the caller has to play ball:-

public class Service
{
  public Service()
    : this(new OtherServiceNullImpl())
  { }

  public Service(IOtherService otherService)
  {
    . . .
    // Other initialisation code
  }
}
Here I want to enforce the constraint on the public interface of Service that I’m not tolerating null references. I have the choice internally to either delegate both constructors calls to a single internal helper method that does allow nulls, or chain the constructors and pass a null implementation when not provided by the caller. The former choice means I’ll probably still have to add null checks internally in my class, but at least it’s inside my own four walls - my implementation choice does not affect my clients. If I’m happy to waste time creating a potentially unused object I can simplify things even further by just defaulting to the null implementation and then replacing it if the more specific ctor is called:-

  private IOtherService m_other =
                        new OtherServiceNullImpl();

The same idea applies when you are returning a reference. Imagine the following method on the service:-

public List<Customer> FetchCustomers()
{
    var customers = new List<Customer>();
    . . .
    return customers;
}

When there is nothing to return, you could return a null reference, or you could return an empty collection. I would hope most developers see that returning the null reference is a real pain as it hinders any sort of composition, such as with the LINQ extensions. What about string values, though? I went through that thought exercise a couple of years ago when I first started doing C# and filed it under “Null String Reference vs Empty String Value” [2].

Switching back, then, to the logging example in my previous post, the case for use of a null object begins to look much clearer. If we allow the ApplicationLog property to be set to null we make it much harder for any consumers to use - it’s essentially the same as the return value case above. So, perhaps I should have promoted a different design which defaults to a null logger instead and disallows setting the global logger to a null reference:-

public static class Log
{
  // The application-wide log.
  public static ILogger ApplicationLog
  {
    get { return s_applicationLog; }

    set
    {
      if (value == null)
        throw new ArgumentNullException("value");

      s_applicationLog = value;
    }
  }

  // Write an error message
  public static void Error(string format, params object[] args)
  {
    s_applicationLog.Error(format, args);
  }

  private static ILogger s_applicationLog =
                         new NullLogger();
}

Now we no longer need any null checks in our own code, or in any callers that decide to partially take matters into their own hands but still want to write through the global logger.

Null Objects as an Optimisation

The hardest thing about being an experienced developer is that you have to remember to unlearn stuff as the world around you changes. Over time you slowly build up knowledge about what works and what doesn’t and you generally get a feel for what things cost. Of course we all know that nothing should be taken at face value and that we should measure everything before discounting it. But that’s really what experience is for isn’t it - to short-circuit the blind alleys? Except of course when it’s not…

I put forward TDD as my defence [3] for not spotting what is pretty clearly a case of not optimising for the common case. My first requirement for the logging facade was the following:-

“It must, by default have no impact on unit testing. I should not need to do anything to write a test for existing code or be able to add logging to existing code without it breaking the build ”

As such I wrote all the tests and designed the facade entirely based on the premise that I wanted the call to log a message to be a NOP (or as close to one as possible) by default. Now, putting aside for the moment the fact that modern CPUs handle branch prediction and v-table style dispatch more efficiently than they used to [4] and that JIT compilers are also pretty smart, this still misses the simple fact that the common case is that an instance of the logger object will exist. That’s right, every service and component I write will have (some form of) logging enabled by default so the “else” branch will never get taken in a production scenario, doh!

Perhaps it’s also worth looking at the problem another way. For the cost of the difference between using an “if” statement and calling a null object to matter you would need to be willing to accept the cost of writing to the log sink (probably a file), which would undoubtedly be orders of magnitude greater. Even if you’re going to buffer the writes and do it on another thread it’s still going to cost you a fair bit in comparison.
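To make the comparison concrete, here is the shape of the two “do nothing” paths side by side - sketched in C++ rather than C#, purely for illustration. One costs a branch, the other a virtual call, and both are dwarfed by any real write to a log sink:-

```cpp
#include <string>

// Mirrors the ILogger interface from the post.
struct Logger
{
    virtual ~Logger() = default;
    virtual void Error(const std::string& message) = 0;
};

// Null Object flavour: the "no logging" path costs one virtual call.
struct NullLogger : Logger
{
    void Error(const std::string&) override {}
};

// Null check flavour: the "no logging" path costs one branch.
struct CheckingFacade
{
    Logger* log = nullptr;

    void Error(const std::string& message)
    {
        if (log != nullptr)
            log->Error(message);
    }
};
```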

There. I’ve just done it again, haven’t I? After spouting on about the need to measure and not relying on old rules of thumb I’ve just surmised the cost is insignificant. And they say you can’t teach an old dog new tricks...

[1] This is what I enjoy most about blogging - it’s like a public design review. As long as there is someone out there (hopefully more knowledgeable than me) willing to stick their neck out too and question what I’ve written, I’ll get to see my mistakes and shortcomings and hopefully learn something.

[2] Interestingly that blog post of mine even has a section titled “Null References Are Evil”, but I see that once again I’m only focusing on the convenience aspect of null objects.

[3] Yeah, because I clearly can’t just admit that I was being lazy now, could I…

[4] I’ve worked on a couple of applications in the (very dim and distant) past where the cost of the virtual function call machinery was a significant overhead. Consequently the thought of a dynamically dispatched NOP call immediately “feels” excessively costly. Note to self: Google some modern benchmarks to see how insignificant it probably is today.

Saturday, 6 July 2013

The Application-Wide Diagnostic Log - the One Acceptable Global Object?

Like many of my peers I have a major dislike for the Singleton pattern. Also like them this has probably come about from attempting to isolate code to write (unit) tests. Singletons can make that really hard. However, there are only two [1] things wrong with them:-

  1. Access is through a global variable which means it’s hard to see all the dependencies a piece of functionality relies upon
  2. There can only be one instance, ever, and its lifetime is largely out of your control

Of course over time developers have managed to work around the second deficiency by adding hacks to allow you to switch the implementation during testing and to manually control the lifetime (an absolute must for a COM object). Even so we’re still left with the global variable problem.

Parameterise From Above

The obvious answer to global variables is to just pass them into functions as parameters instead [2]. A common complaint is that there are often too many to pass around, like database connections and remote service proxies. This leads to code such as this below where you’re stitching together components inside the logic:-

public List<Bug> FindOpenBugsForUser(string login)
{
    var connection = Connection.Instance;
    var dal = new BugDataMapper(connection);

    var bugs = dal.SelectBugs(login);

    return bugs.Where((bug) => bug.IsOpen).ToList();
}

The alternative is to not only pass something in, but also to make sure that the something is a higher-level abstraction. That is, define an interface for the logical service that this represents, and then pass that in from above:-

public interface IBugService
{
    List<Bug> SelectBugs(string login);
}

public List<Bug> FindOpenBugsForUser(string login,
                                     IBugService bugService)
{
    var bugs = bugService.SelectBugs(login);

    return bugs.Where((bug) => bug.IsOpen).ToList();
}

. . .

public class BugService : IBugService
{
    private Connection _connection;
    . . .
}

By adhering to the Separation of Concerns and creating higher-level abstractions you can mitigate the most common complaint against passing everything around by parameter.

Diagnostic What?

Good. So hopefully that opening paragraph will convince you what I’m going to suggest a bit later doesn’t mean I’m totally cuckoo.

Although I have a dislike of logging per se (a post for another day) I still do it [3]. I’ve even written about some of the things you’ll want to consider when you’re using logging as a diagnostic tool in “Diagnostic & Support User Interfaces”. And that is specifically the kind of logging I’m referring to here - log messages designed to aid in the support of an application or system. What I’m definitely not talking about is custom logs generated to provide some specific report on one particular feature within an application (e.g. a data import exception report). Those sort of reports are quite often a functional requirement and so probably require testing in their own right.

Nope, what I’m talking about here is the application-wide log. This is commonly used to persist diagnostic events (to a file, the event log, etc.) to help the support and development teams understand what is happening inside the code. In a suite of applications, or a system comprised of many components, there is a desire to keep things uniform; this applies both to the code used to generate those events and also to the (persistent) format in which it’s rendered.

Desirable Properties

Given that one of the main bugbears of a singleton logger is how it makes testing hard, let’s take a test-first approach to deciding what kind of behaviour we’d need from any design we end up choosing:-

  1. It must, by default have no impact on unit testing. I should not need to do anything to write a test for existing code or be able to add logging to existing code without it breaking the build
  2. Unless of course I actually want to sense the logging is happening, such as in a non-functional test like bootstrap code or where the Execute Around Method pattern has been used to hide an internal mechanism
  3. When I do come to mock the logger I should only need to implement a simple interface and have no other dependencies on any of the production logging code (or 3rd party logging library), except perhaps for simple types like a “severity” enumeration
  4. If I do need more control over how my code writes log messages, such as by enabling extra logging on-the-fly, I should still be able to write to the application-wide log through the appropriate mechanism

Basically what I’m suggesting is that we employ the Pareto Principle to diagnostic logging - 80 percent of our code accesses the log in the easiest way possible and minimises the noise in the code, whilst the other 20% does its own thing whilst fitting in with the conventions laid out earlier. Of course in real life it’s more like 99% / 1%. In fact in the 20 years I’ve been programming professionally I can count on one finger the number of times I’ve needed to do custom logging. It was in an ISAPI extension and we needed to offload the writes for performance reasons. Of course I’ve only worked on about a dozen or so different “products” in that time so it’s not exactly a sound statistical measurement - your mileage will undoubtedly vary.

Logger Interface + Static Facade

And so we finally come to the “pattern” I’ve seen used and have continued to use myself in the absence of anything better. Essentially it boils down to a static facade which provides the level of convenience for the vast majority of code, combined with an interface used to decouple the actual log sink(s) implementation from the caller.

The static facade looks like this:-

public static class Log
{
  // The application-wide log.
  public static ILogger ApplicationLog { get; set; }

  // Write an error message
  public static void Error(string format, params object[] args)
  {
    if (ApplicationLog != null)
      ApplicationLog.Error(format, args);
  }
  . . .
  // Other helpers for logging warnings, etc.
}
What should be fairly obvious is that our first requirement is met by virtue of there being no logger configured by default. The “!= null” checks in the forwarding methods also means that any attempt to log when nothing is configured is benign. That’s our core unit testing scenarios catered for.

The public access to the actual logger implementation (the ApplicationLog property) also helps us in the odd cases where we need to mock the logger. It also aids our fourth requirement where we might want more control as we can just pass the real logger in from above. Extra “!= null” checks would be needed in the client code and so one alternative would be to create an Adapter that forwards the logging calls to the static facade, thereby letting the facade handle the presence or absence of a logger implementation.

The logger interface is essentially just a non-static version of the same API:-

public interface ILogger
{
  // Write an error message
  void Error(string format, params object[] args);
  . . .
  // Other helpers for logging warnings, etc.
}

That pretty much sorts out our third requirement. I can easily create a Null Object implementation, although with the checks that’s fairly pointless unless you want to avoid them. I can also easily mock it by storing the messages in a List<string> which makes testing a doddle.

public class LoggerFake : ILogger
{
  public List<string> Messages = new List<string>();

  // Write an error message
  public void Error(string format, params object[] args)
  {
    var message = String.Format(format, args);

    Messages.Add(message);
  }

  . . .
}

Creating the Logger

So, if the default behaviour is not to create a logger, you might be wondering where it gets created. Why, in Main() of course. As I suggested in “Assume Failure by Default”, Main() is the place to bootstrap your process by invoking any factories to create and associate your various higher-level abstractions. You might not actually be in control of the real Main(), unless you put a little legwork in, such as with WCF hosting. In which case your Main() might be an Initialise() method that is invoked instead by whatever your container process is.


Well, there it is. By this point you’re probably pumped and shouting obscenities at the screen claiming I’m an imbecile. For one thing I’ve got a public property on the facade that surely just any old developer could change? That’s true, but one of the major problems with the Singleton pattern stems from its attempt to try and protect developers from themselves - engineering is all about trade-offs. In my view anything that makes testing easier is generally A Good Thing, even if it opens other back doors that could be abused. Just so long as they’re hard to abuse.

As I see it diagnostic logging is just a different beast. On the one hand it has the ability to cut across so much of the codebase, and yet its non-functional nature means that if it doesn’t get executed you’re no worse off. This is in stark contrast to something like a database connection or service proxy which has a direct impact on the functional behaviour of the software.

Perhaps I am barking up the wrong tree? But so far this one concession to global variables has, I believe, been worth putting up with. Perhaps like the much derided GOTO they do still have a place, but only as long as you go in with your eyes wide open.


[1] I’m assuming you know that these also happen to be the two defining characteristics of the Singleton pattern.

[2] The PFA Papers by Kevlin Henney is an excellent read if you want a more serious critique of the whole affair.

[3] Much as I agree with what Steve Freeman has to say on the subject, I can also see why he says “I won’t go through the whole discussion here but, somehow, I don’t think I’m winning this one”. I just still can’t get my head totally around this idea.

Wednesday, 3 July 2013

Avoid Square Brackets in PowerShell Property Names

It should have been a pretty simple task. I had a bunch of CSV data files that were extracts from a SQL Server database. They had no header rows, but I did have the SQL definitions of the tables. That meant I could use SED on the table definitions to generate a separate text file with just the column names and use the -Header switch of Import-Csv to fill in the gaps.

. . .

As a test I picked out the first row to make sure my PowerShell one-liner was correct. It looked like this:-

PS C:\> import-csv -header (get-content Header.txt) Customers.txt | select '[FirstName]','[LastName]' -first 1

But my output was essentially empty. I had the columns correctly selected, but no data. It looked like this:-

[FirstName]     [LastName]
-----------     ----------

I checked all the obvious things, like the quoting around the property names, the property names themselves, what Get-Content was returning, etc. But as far as I could see it was correct. Having done this kind of thing a million times before, the only variable I could see was the square brackets surrounding the column names. But they were coming out in the output, and a quick check with good old reliable Get-Member shows they really are there:-

PS C:\Temp> import-csv -header (get-content Header.txt) Customers.txt | get-member

   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
[FirstName] NoteProperty System.String [FirstName]=[FirstName]
[LastName]  NoteProperty System.String [LastName]=[LastName]

So that pretty much leaves some strange translation going on in the final Select-Object part of the pipeline. Sadly my Google Fu hasn’t found a definitive answer (although I’ve seen similar reports about square brackets in pathnames) so I’m hoping a passer-by will eventually put me straight.

P.S. If you want to play along at home then the following is the issue distilled down:-


C:\> PowerShell "import-csv Customers.txt | select '[FirstName]','[LastName]'"