Wednesday 18 August 2010

IDisposable’s Should Assert In Their Finalizer

Paradoxically[*] I’m finding that C# makes resource management much harder than C++. In C++ life is so much easier because you always have to deal with resource management and RAII is the tool that makes it a no-brainer – you either use stack based variables or heap allocated variables and a smart-pointer class such as scoped_ptr/shared_ptr. C# on the other hand makes resource management ‘optional’ through the use of the IDisposable interface[%].

What Is and Isn’t Disposable?

And that is my first problem: you don’t always know what does and doesn’t need disposing – you have to go and find out. Of course IntelliSense can help you out here a little but it still means checking every object to see if it has a Dispose() method[#]. The other alternative is to hope you get it right and rely on a static analysis tool like FxCop to point out those ‘occasional’ mistakes. Personally I’ve yet to get anything really useful out of FxCop beyond the usual stylistic faux pas, which seem to be more the domain of StyleCop.

IDisposable is Viral

OK, that’s a little harsh on FxCop as I’m still learning to use it effectively. But after years of using C++ tools like Lint and BoundsChecker to watch my back I was more than a little disappointed. It does seem to point out if I aggregate a type that needs disposing and I haven’t implemented the Dispose pattern, which is nice. However Dispose() is like ‘const correctness’ in C++ - it’s viral - once you start correctly applying IDisposable to your types it then mushrooms and you now need to fix the types that aggregated those and so on.

Should Interfaces Inherit IDisposable?

This leads me to my first question – should interfaces inherit from IDisposable if you know that at least one implementation needs it? On the face of it the answer seems to be no as disposing is purely an implementation detail; but the whole point of interfaces is to avoid ‘programming to an implementation’. If the answer is no then the moment you cast down to an interface you hide the disposing behaviour. COM essentially has to deal with the same problem and its solution is to make AddRef() and Release() a fundamental requirement of every interface. Of course C# has RTTI built in through the use of the ‘as’ and ‘is’ keywords and so you can always attempt a cast to IDisposable from any other interface. However surely this way lies madness as your code would be littered with seemingly random ‘usings’ just in case an implementation later needed it. Here’s an example where this issue has cropped up most often to date…

We are using the Gateway Pattern extensively in our middle tier services to talk to other systems and so the gateway implementation often requires a WCF proxy which requires calling Close() (or a Socket, database connection etc). So, do I expose the disposing requirement through the gateway interface?

public interface ITradeGateway : IDisposable
{
. . .
}

public class TheirBackEnd : ITradeGateway
{
. . .
}

…or just implement IDisposable on the concrete type?

public interface ITradeGateway
{
. . .
}

public class TheirBackEnd : ITradeGateway, IDisposable
{
. . .
}

In my mind the first is more ‘discoverable’ than the second and it gives any static code analysis tools a fighting chance in pointing out where you might have forgotten to call Dispose(). Some might argue that at the point of creation you know the answer anyway as you have the concrete type so why does it matter? Well, I tend to wrap the creation of these kinds of services behind a factory method that returns the object via the intended interface so that you are not inclined to rely on the concrete type unnecessarily:-

public static class TradeGatewayFactory
{
    // Callers only ever see the interface, never the concrete type.
    public static ITradeGateway CreateTradeGateway()
    {
        return new TheirBackEnd();
    }
}

Most of our factory methods are not quite this simple as they tend to take a configuration object that further controls the construction so that they can hide whether the ‘service’ is hosted in-proc (which is useful for testing and debugging) or remotely via a proxy[+].
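
To make the trade-off concrete, here is a minimal sketch of what the calling code looks like under each choice. The members Submit() and GetQuote(), the IQuoteGateway interface and the Client class are hypothetical names invented for illustration; only ITradeGateway comes from the examples above.

```csharp
using System;

// Option 1: the interface advertises the disposing requirement.
public interface ITradeGateway : IDisposable
{
    void Submit(string trade);          // hypothetical member
}

// Option 2: the interface hides it (hypothetical interface, for contrast).
public interface IQuoteGateway
{
    decimal GetQuote(string symbol);    // hypothetical member
}

public static class Client
{
    // With option 1 the 'using' is obviously required and a static
    // analysis tool has a chance of spotting its absence.
    public static void UseTradeGateway(Func<ITradeGateway> create)
    {
        using (var gateway = create())
        {
            gateway.Submit("trade");
        }
    }

    // With option 2 the caller has to guess. 'as' yields null when the
    // implementation is not disposable, and a 'using' over a null
    // resource simply skips the Dispose() call.
    public static void UseQuoteGateway(Func<IQuoteGateway> create)
    {
        var gateway = create();

        using (gateway as IDisposable)
        {
            gateway.GetQuote("ABC");
        }
    }
}
```

The second method is the ‘seemingly random using’ described earlier: it is correct whether or not the implementation is disposable, but nothing in the interface tells the reader why it is there.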

Does It Matter If You Forget?

I mostly work on distributed systems where scalability and reliability are major concerns and perhaps I’m being overly pessimistic about the memory consumption of my services but I think it’s important that the lifetime of certain kinds of resources is managed optimally[$]. At the moment I’m dealing with a managed wrapper over an in-house native library that is used to manipulate the key custom data container that the organisation traffics in. The underlying native implementation uses reference-counted smart pointers for efficiency and naturally this has leaked out into the managed wrappers so that many aspects of the API return objects that implement IDisposable. In fact it’s all too easy to use one of the helper methods (such as an index property) and find yourself leaking a temporary that you didn’t know about and bang goes your careful attempts to control the lifetime of the container, e.g.

// Good.
using (var section = container.GetSection("Wibble"))
{
    var entry = section.Value;
    . . .
}

// Leaky: the temporary returned by the indexer is never disposed.
var entry = container["Wibble"].Value;
. . .

I definitely think this scenario should be picked up by a static analysis tool and if I’ve read the blurb on FxCop 10.0 (that ships with VS2010) correctly then I have high hopes it will watch more of my back.

Assert In The Finalizer

So can we do anything other than rely on tooling? I think we can and that would be to put a Debug.Assert in the Finalizer - after all if the object is being consumed correctly then you should honour the contract and call Dispose() at the relevant point in time. I think it’s safe to say that the Garbage Collector does a pretty good job of hiding most mistakes by running frequently enough, but as Raymond Chen pointed out on his blog last week (which was “CLR Week”) - you should not rely on the Garbage Collector running at all.

For my own types that don’t manage any native resources themselves it could be implemented like this:-

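public class ResourceManager : IDisposable
{
#if DEBUG
    // Only runs if Dispose() was never called, i.e. the caller broke the contract.
    ~ResourceManager()
    {
        Debug.Assert(false, "ResourceManager was not disposed");
    }
#endif
    . . .
    public void Dispose()
    {
        m_member.Dispose();

#if DEBUG
        // Disposed correctly, so the finalizer has nothing to report.
        GC.SuppressFinalize(this);
#endif
    }
}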

So basically we’re saying that if Dispose() is not invoked, then, when a Garbage Collection does finally occur at least we’ll know we forgot to do it. Sadly we can’t rely on being able to inspect the members in the debugger to work out which instance of an object was forgotten because finalizers can be run in any order; but maybe we’ll get lucky.
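
One possible way to improve the odds of identifying the forgotten instance is to capture the call stack at construction time and pass it to the assertion as the detail message. This is only a sketch under assumptions: the m_allocationStack field is a hypothetical addition, and it relies on the string being an ordinary managed object without a finalizer of its own, so it should still be readable when the owner's finalizer runs. Capturing a stack trace per instance is not free, hence it lives entirely inside the debug-only block.

```csharp
using System;
using System.Diagnostics;

public class ResourceManager : IDisposable
{
#if DEBUG
    // Hypothetical addition: remember where this instance was created.
    private readonly string m_allocationStack = Environment.StackTrace;

    ~ResourceManager()
    {
        // The third argument is the 'detail message' shown with the failure.
        Debug.Assert(false, "ResourceManager was not disposed", m_allocationStack);
    }
#endif

    public void Dispose()
    {
        // Dispose of any owned members here.

#if DEBUG
        GC.SuppressFinalize(this);
#endif
    }
}
```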

If you start from a clean slate then you can write a unit or integration test that forces a full garbage collection right after exercising your code to ensure any errant finalizers run and get instant feedback about your mistake:-

[Test]
public void should_not_leak_resources()
{
    var consumer = new ResourceConsumer();

    consumer.consumeResources();

    GC.Collect();
    GC.WaitForPendingFinalizers();
}

I’ll be honest and point out that I’ve put off actually trying this out in earnest until I have had time to investigate how to tap into the Asserting mechanism so that I can avoid hanging the test runner with a message box unless I’m running under the debugger. I’ve done this plenty of times with the Debug MS CRT (_CrtSetReportHook) so I’m sure there must be a way (I’ve only scratched the surface of the TraceListener class but I’m guessing it plays a part).
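
As a starting point for that investigation, here is a minimal sketch of a TraceListener that records assertion failures instead of displaying them. RecordingTraceListener and its FailureCount property are hypothetical names; TraceListener.Fail(), Trace.Listeners and Debugger.IsAttached are real framework members. It records rather than throws because the failure would be raised on the finalizer thread, where an unhandled exception takes down the process. The #define at the top is an assumption made purely so the sample behaves the same regardless of build configuration.

```csharp
#define DEBUG // assumption: keeps the Debug.Assert call below compiled in

using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical listener: collects assertion failures for later inspection.
public sealed class RecordingTraceListener : TraceListener
{
    // Finalizers run on their own thread, hence a thread-safe collection.
    private readonly ConcurrentQueue<string> m_failures =
        new ConcurrentQueue<string>();

    public int FailureCount
    {
        get { return m_failures.Count; }
    }

    public override void Fail(string message)
    {
        Fail(message, null);
    }

    public override void Fail(string message, string detailMessage)
    {
        m_failures.Enqueue(message + " " + detailMessage);
    }

    // Ordinary trace output is of no interest here.
    public override void Write(string message) { }
    public override void WriteLine(string message) { }
}

public static class Program
{
    public static void Main()
    {
        var listener = new RecordingTraceListener();

        // Keep the default (dialog-raising) listener only under a debugger.
        if (!Debugger.IsAttached)
        {
            Trace.Listeners.Clear();
        }
        Trace.Listeners.Add(listener);

        // Stands in for the assertion a forgotten Dispose() would raise.
        Debug.Assert(false, "simulated missing Dispose()");

        Console.WriteLine("Assertion failures recorded: " + listener.FailureCount);
    }
}
```

In a test the listener would be installed during set-up and the test would check FailureCount after the call to GC.WaitForPendingFinalizers().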

Debug Builds – Not The Done Thing?

Back in an earlier post Debug & Release Database Schemas I suggested there must be times when a debug build is used in the C#/.Net world. Unlike the C++ world, this beast does not seem to be at all prevalent. In fact I’ve yet to come across any 3rd party (or in-house teams) promoting a debug build. Visual Studio and C# support the concept, but I wonder if teams only expect it to be used for internal testing? Jeffrey Richter briefly mentioned “Managed Debugging Assistants” in his book CLR via C# but I’ve yet to read up on how you use them effectively, i.e. tap into the mechanism programmatically so that I can log these failures whenever the services are running unattended; not just under the debugger.

[*] It’s not really a paradox as 15 years C++ vs 1 year C# isn’t exactly a fair comparison.

[%] Optional in the sense that not every type requires it.

[#] or Close() in the case of the thread synchronization types which is a nice inconsistency.

[+] I’m still not convinced by the use of an off-the-shelf Inversion of Control (IoC) framework as it only seems to save the odd line or two of code at the expense of managing another 3rd party dependency. I also much prefer creating immutable types that are fully constructed via the ctor than providing multi-phase construction via mutators which IoC frameworks seem to require. Maybe I just don’t work on the kind of systems they’re aimed at?

[$] The obvious question here I suppose is “Why are you using C# then?”. And the answer [for now] is “because we are”. I was expecting this to scale up further than it has, but we can still scale out further if needs be.

Monday 9 August 2010

Stored Procedures Are About More Than Just Performance

A colleague who is new to the team and working on a new GUI project asked me whether we should be using an ORM tool like Entity Framework for the impending GUI Data Access Layer. Our existing services all invoke stored procedures via manually[*] crafted ADO.NET code which is largely due to our inexperience with ORM tools, but also because we hardly have any DB code so it’s not a burden – unlike what the GUI is likely to need. This in turn led to a question about why we use stored procedures in the first place when the performance of modern SQL query optimisers is such that the benefits are less considerable. What I found interesting about this question was that we had never chosen to communicate with the database through stored procedures for performance reasons; on the contrary it has always been about creating an abstraction over the database to ease maintenance.

OO Data Models

I have been practising OO for long enough that I naturally see rows in a table as instances of objects and stored procedures as their methods. You can use OO techniques with procedural languages like C and SQL, it just takes a little artistic license in the naming scheme to work around some of the inherent limitations - although simulating virtual functions is a little trickier :-). There are a number of patterns available to deal with the ORM problem (the oldest ones I know of were in James Rumbaugh’s book Object-Oriented Modeling and Design from 1991) and in particular the problem of mapping polymorphic types to tables but I find the vast majority of cases fall into the simple Table == Type category.

A classic example would be a table of customers with ‘methods’ to add a new customer or find an existing one by its unique ID:-

create table Customer
(
    Id      int not null,
    Name    varchar(100) not null
)

exec Customer_Insert 1234, 'name'
exec Customer_FindById 1234

The most common scheme I’ve seen is to just prefix procedures with the ‘type’ name, which can seem a little odd to long term SQL’ists and so sometimes it feels more natural to them to treat the prefix as a namespace and repeat the ‘type’ in the ‘method’ name:-

exec Customer_InsertCustomer 1234, 'name'
exec Customer_FindCustomerById 1234

This prefix also has the nice side-effect of causing related procedures to be grouped together within the file-system/VCS and database IDEs like SQL Server Management Studio (SSMS).

Of course for simple INSERTs and SELECTs this probably feels like overkill as all you’ve done is to wrap a one-line query and make more work for yourself. But where it starts to pay off is when your one-liner turns into a two or three-liner, requires tuning because the query optimiser needs help or you want to weave in other aspects such as debug code (which I covered recently in Debug & Release Database Schemas).
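
For illustration, the client side of the Customer ‘type’ above might be sketched as follows, with the connection and command boiler-plate factored out using Execute Around Method. The Database and CustomerGateway classes, the @Id parameter name and the Name result column are all assumptions made for the example; they are not taken from the real codebase. Only the procedure name Customer_FindById comes from the text.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class Database
{
    // Execute Around Method: this helper owns the lifetime of the
    // connection and command; the caller supplies only the interesting part.
    public static T Execute<T>(string connectionString, string procedure,
                               Func<SqlCommand, T> body)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(procedure, connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            connection.Open();

            return body(command);
        }
    }
}

public static class CustomerGateway
{
    // Returns the customer's name, or null if no such customer exists.
    public static string FindNameById(string connectionString, int id)
    {
        return Database.Execute(connectionString, "Customer_FindById", command =>
        {
            // Assumed parameter name.
            command.Parameters.AddWithValue("@Id", id);

            using (var reader = command.ExecuteReader())
            {
                // Assumed result column.
                return reader.Read()
                     ? reader.GetString(reader.GetOrdinal("Name"))
                     : null;
            }
        });
    }
}
```

Because the client only knows the procedure name and its parameters, the body of Customer_FindById can be tuned or extended without the calling code changing.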

Where’s the Public Interface?

In another post about database development, xUnit Style Database Unit Testing, I briefly referred to the ‘Public Interface’ of the database and what might constitute it. Obviously in SQL there is no direct equivalent of ‘public’ and ‘private’ that you can use to restrict access to objects on a logical level, but there is a real security mechanism instead that you can leverage. By following the “Principle of Least Privilege” and only granting access (e.g. EXECUTE permissions) on those objects that the client needs to use you can enforce a real public/private barrier.

Sadly this is not altogether intuitive from a maintenance perspective and can easily lead to the erroneous exposure of private procedures without some other convention to highlight the intended accessibility of the code. Even if you also follow the principle that you only write unit tests that target the public interface, it’s still not enough. One option would be to adorn the names of private procedures with another prefix or suffix, e.g.

exec __Customer_VerifyState @current, @intended
exec Customer_VerifyState_private @current, @intended
exec Customer_internal_VerifyState @current, @intended

Using a prefix disrupts the nice grouping in the IDE/file-system, whilst a suffix may be invisible if it’s too subtle. Placing it between the ‘Type’ and ‘Method’ names adds a little speed bump that could provide enough of a clue without adversely affecting readability, although it feels more intrusive than the suffix. If you are purely relying on naming conventions then a more intrusive adornment may be better as you can then use a tool like grep to find violations in your server and client code.

The only other option I see available would be to use schemas, e.g.

exec private.Customer_VerifyState @current, @intended

I’ve not played much with schemas and on my current project we have already used them to distinguish between the core production code and other procedures that aid in unit testing (via a ‘test’ schema) or support utilities (via a ‘util’ schema). Our inexperience with schemas has also meant that we’ve run into a few minor permissioning issues with temporary tables and user-defined types that make us a little wary of sprinkling them too liberally.

SQL Maintenance

All the teams I’ve worked in during the last dozen or so years have had developers whose main role is to maintain the database code. That doesn’t mean that they didn’t have other skills as well, but effectively they were hired to maintain the SQL side of the codebase. That also doesn’t mean that other developers did not contribute to the SQL code (although in some cases a more rigid structure was in place) as SQL is often a core skill, but the quality of SQL generated by the generalists may not be high enough for the volume of data being manipulated.

I definitely put myself in the category of developers that can write basic SQL queries and stored procedures and can handle many table joins, but when it comes to concepts like nested sub-queries I start to question whether I’m writing an efficient query or not. I’ve also read around the subject enough to know that cursors have their place and it’s not usually in the kinds of queries I write so having a SQL expert around to turn my functional query into an efficient one really helps.

Of course the query optimisers built into SQL databases now are way more advanced than when I first started using them, but there still seems to be a need for SQL experts who can tweak a query to correct the optimiser’s bad judgment calls and give orders of magnitude performance gains, or add some hints to avoid deadlocks caused by lock escalation. This kind of maintenance is so much easier if the SQL is contained within stored procedures as you can leverage the faster deployment mechanism.

 

[*] Naturally much of the boiler-plate code has been refactored out by wrapping the underlying SQL classes and using Execute Around Method so that it has minimal excess baggage.