Friday 30 April 2010

Happy Birthday, Blog

Today sees the anniversary of my inaugural post (An apology to Raymond Chen) on this blog. That seems a highly appropriate moment to reflect and see if it’s turned out the way I hoped…

As I mentioned back in July, when my first two reviews were published in the ACCU Journal, I’ve found writing difficult, largely I guess because I’m out of practice. Writing this blog has certainly made a dramatic improvement to the speed at which I write. For example, last year my review of the ACCU 2009 Conference took me days to write – and I was on my sabbatical at the time! This year I did it on the train during my commute in a matter of hours. OK, so I’ve not been one of those prolific bloggers who rattle out a piece every day, but I have tried to write a post a week. I realise that many of my posts are quite lengthy, and given that my daily commute is my only source of writing time, I reckon that’s not too shabby a rate. My commute time already has competition from reading, gaming and maintenance of my freeware codebase, so it’s a tight squeeze.

The one area I certainly didn’t expect to be writing about was C#. I was still a die-hard C++ aficionado back in April last year and naturally assumed I’d be writing about C++ issues (if there are any left). It’s funny, but with all that time on my hands during my sabbatical I found it harder to know what to write about, whereas now I’m back working full-time the ideas keep flooding in. Once again I expect that part of it is down to the blogging experience, but I also suspect that I feel more confident about the topics I’m covering. I definitely expected to be sticking to very technical issues such as the recent ones involving WCF, but my current project is both greenfield and agile, and that has highlighted some very interesting dynamics, which in turn has led to a new degree of consciousness about a number of software development issues.

Without a doubt the single biggest contribution blogging has made to me has been the clarity of thought that comes from the fear of “publishing and being damned”. Knowing that the moment I hit the ‘publish’ button my words will be broadcast out onto the Internet for all eternity where potential future employers will be able to see them ensures that I try to remain objective. In the last year there have been two posts that I started to write and ended up canning because I realised they were straw-man arguments. Conversely the mere act of documenting my experiences also leads to new questions that I’ve not considered in any real depth before. I have one post on Unit Test Naming Guidelines that I thought was all done-and-dusted until I met Steve Freeman and Nat Pryce and discovered that I was barking up the wrong tree. No doubt when I come to revise that post at a later date more questions will emerge…

The one thing I haven’t done is look at the stats in Google Analytics. I added a hit counter back at the start, mostly because it was easy, but I never expected anyone to actually read this stuff. The fact that there have been comments submitted (that aren’t just link spam) means that at least a couple of people have bothered to read my musings which is pretty satisfying. Now that a whole year has passed I feel tempted to take a peek and see if the number of hits has reached double figures yet.

Being Author and Editor means that I don’t have that fear of rejection you get with a ‘real’ publication, but I still have that fear of embarrassment to keep me on the straight and narrow. I’m quite content at present to continue building up a portfolio of posts that hopefully gives me that edge we all need to ensure our own survival in the fast-changing world of Software Development.

Friday 9 April 2010

Object Finalized Whilst Invoking a Method

The JIT Compiler & Garbage Collector are wonders of the modern computer age. But if you’re an ex-C++ developer you should put aside what you think you know about how your C# code runs, because it appears their view of the world lacks some artificial barriers we’re used to. Deterministic Destruction, such as that used in C++, has a double meaning of sorts. On the one hand it means that an object will be deleted when it goes out of scope, and yet on the other hand it also means that an object is only destroyed by an external influence[*]. Essentially this means that the lifetime of a root object or temporary is defined by the exiting of a scope, which keeps life really simple. In the C# world scopes affect some similar aspects to C++, such as value type lifetimes and the visibility of local variables, but not the lifetime of reference types…
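
Here’s a tiny sketch of that distinction (the code is mine, purely illustrative; note that a non-optimised build often stretches lifetimes to the end of the method, which can mask the behaviour):-

public void Example()
{
    {
        var local = new StringBuilder(); // scoped for visibility only
        local.Append("hello");           // last use of the object
    }   // 'local' goes out of scope here, but nothing is destroyed

    // In an optimised build the StringBuilder became eligible for
    // collection straight after its last use, not at the closing
    // brace; the GC reclaims it at some arbitrary later point.
}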

The Scenario

The C# bug that I’ve just been looking into involved an Access Violation caused by the finalizer thread trying to destroy an object whilst it was still executing an instance method. Effectively the object’s finalizer destroyed the native memory it was managing whilst the object was also trying to persist that very same block of memory to disk. On the face of it that sounds insane. How can the CLR reach the conclusion that an object has no roots, and is therefore garbage, when you’re inside one of its methods? Surely at the very least the stack frame that is invoking the method has a reference to ‘this’?

Here is a bare-bones snippet of the code:-

public class ResourceWrapper : IDisposable
{
    public ResourceWrapper(byte[] data) { . . . }
    public void Save(string filename) { . . . }
    public void Dispose() { . . . }
}

. . .

public class MyType
{
    public ResourceWrapper Data
    {
        get { return new ResourceWrapper(m_data); }
    }

    private byte[] m_data; // Serialized data in
                           // managed buffer.
}

. . .

public void WriteStuff(string filename)
{
    m_myType.Data.Save(filename);
}

Now, reduced to this simple form, there are some glaring omissions relating to the ownership of the temporary ResourceWrapper instance. But that should only cause the process to be inefficient with its use of memory; I don’t believe there should be any other surprises in a simple single-threaded application. I certainly wouldn’t expect it to randomly crash on this line with an Access Violation:-

m_myType.Data.Save(filename);

The Object’s Lifetime

Once again, putting aside the blatant disregard for correct application of the Dispose Pattern, how can the object be garbage collected whilst inside the Save() method? I was actually already aware of this issue, having read the blog post “Lifetime, GC.KeepAlive, handle recycling” by Chris Brumme a while back, but at the time I found it hard to imagine how it could really affect me as it appeared somewhat academic. In my case I didn’t know how the Save() method was implemented, but I did know it was an incredibly thin wrapper around a native DLL, so I’ve guessed that it probably fitted Chris Brumme’s scenario nicely. If that’s the case then we can inline both the Data property access and the Save() call so that in pseudo code it looks something like this:-

public void WriteStuff(string filename)
{
    ResourceWrapper tmp = new ResourceWrapper(m_myType.m_data);
    IntPtr handle = tmp.m_handle; // last use of 'tmp'

    // tmp can now be garbage collected because we have a copy
    // of m_handle.
    ResourceWrapper.NativeSaveFunction(handle, filename);
}

What a C++ programmer needs to get into their head is that ‘tmp’ behaves more like a raw ResourceWrapper* than a shared_ptr<ResourceWrapper> - with the added twist that the object’s lifetime can end well before the end of the scope.

The Minimal Fix

So, if I’ve understood Chris Brumme’s article correctly, then the code above is the JIT Compiler & Garbage Collector’s view. The moment we take a copy of the m_handle member from tmp, the object can be considered garbage, because an IntPtr is a value type, not a reference type, even though we know it actually represents a reference to a resource. In native code you manage handles and pointers using some sort of RAII class, like shared_ptr with a custom deleter, as each copy of a handle represents another reference. Within C#, however, it seems that the answer is to use GC.KeepAlive() to force an object’s lifetime to extend past the last use of the resource handle. In my case, because we don’t own the ResourceWrapper type, we have to keep the temporary object alive ourselves, which leads to this solution:-

public void WriteStuff(string filename)
{
    ResourceWrapper tmp = m_myType.Data;
    tmp.Save(filename);
    GC.KeepAlive(tmp);
}

From a robustness point of view I believe the KeepAlive() call should still be added to the Save() method to ensure correctness even when Dispose() has accidentally not been invoked - as in this case. Don’t get me wrong, I’m a big fan of the Fail Fast (and Loud) approach during development, but this kind of issue can easily evade testing and bite you in Production. To me this is where a Debug build comes into play and warns you that the finalizer performed cleanup at the last minute because you forgot to invoke Dispose(). But you don’t seem to hear anything about Debug builds in the C#/.Net world…
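
To show what I mean, here’s a minimal sketch of the kind of Debug-only safety net I have in mind, folded into the wrapper type itself (the type name and implementation details are my own invention, not the vendor’s code):-

public class SafeResourceWrapper : IDisposable
{
    private IntPtr m_handle; // the native resource

    public void Save(string filename)
    {
        NativeSaveFunction(m_handle, filename);

        // Keep 'this' reachable until the native call returns, even
        // when the JIT sees no further uses of the object.
        GC.KeepAlive(this);
    }

    public void Dispose()
    {
        ReleaseHandle();
        GC.SuppressFinalize(this); // the finalizer is now redundant
    }

    ~SafeResourceWrapper()
    {
        // Cleanup should have happened deterministically via
        // Dispose(); shout about it in a Debug build (Debug.Fail
        // compiles away in a Release build) before limping on.
        System.Diagnostics.Debug.Fail(
            "Finalized without Dispose() being invoked");
        ReleaseHandle();
    }

    private void ReleaseHandle() { /* free the native resource */ }

    private static void NativeSaveFunction(IntPtr handle,
                                           string filename)
    { /* thin P/Invoke wrapper around the native DLL */ }
}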

The Right Answer

The more impatient among you will no doubt have been shouting “Dispose - you idiot!” for the last few paragraphs. But when I find a bug I like to know that I’ve really understood the beast. Yes, I realised immediately that Dispose() was not being called, but that alone should not cause an Access Violation in this scenario, so I felt there were other forces at work. If I had gone ahead and added the respective using() statement, that would likely have fixed my issue but not diagnosed the root cause. This way I get to inform the team responsible for the component of a nasty edge case and we both get to sleep soundly.
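
For completeness, the using-based fix would look like this. The hidden finally block that invokes Dispose() counts as a later use of the object, so it should also keep the wrapper reachable for the duration of the Save() call:-

public void WriteStuff(string filename)
{
    // Deterministic cleanup: Dispose() runs at the end of the block,
    // which also stops the finalizer racing the Save() call.
    using (ResourceWrapper tmp = m_myType.Data)
    {
        tmp.Save(filename);
    }
}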

[*I’m ignoring explicitly invoking delete, calling destructors directly, calling Release(), or any other manual method of releasing a resource. It’s 2010 and RAII, whether through scoped_handle or shared_ptr or whatever, has been the idiom of choice for managing resources in an exception-safe way for well over a decade]

Wednesday 7 April 2010

Turning Unconscious Incompetence to Conscious Incompetence

I’m sure that I must have come across the Four Stages of Competence before, but it was when Luke Hohmann quoted the more humorous interpretation that it actually registered with me. Pete Goodliffe also brought this topic up in his recent C Vu column and it’s got me thinking about how I turn Unconscious Incompetence into mere Conscious Incompetence. Oh yeah, and the fact that I’m currently sitting in a garage whilst they drain the unleaded petrol from my diesel-powered MPV brings the subject of incompetence to the forefront of my mind…

Luke Hohmann was quoting the “Known Knowns” statement from Donald Rumsfeld during an eXtreme Tuesday Club (XTC) meeting back in November 2009. He sketched a pie-chart with a tiny wedge to represent “what we know”, a slightly larger wedge for “what we know we don’t know” and the rest for “what we don’t know we don’t know”. His talk was about Innovation Games and his point was about where innovation occurs. As always it was an insightful session, not least because it got me thinking about how I’d go about reducing the size of the “what I don’t know I don’t know” slice[*].

I’d always thought that not knowing something in detail was not much better than not knowing it at all. But clearly there is value in it. After the better part of 15 years of C++ I don’t think it would be too immodest of me to say that C++ was by-and-large in the “do know” region. My recent move to C# meant that I suddenly found myself drawing considerably on that somewhat larger “don’t know” section. Although for C++ developers the usefulness of MSDN Magazine vanished years ago, as Microsoft drove headlong into promoting the New World Order that was .Net and C#, I kept subscribing and reading a significant portion of each issue as I felt that there was still an underlying message in there worth listening to. These days Jeremy Miller has a column dedicated to Patterns and Practices and James McCaffrey covers testing mechanisms, so some of those concepts are more readily digestible.

Still, the background noise of those articles has sat there at the back of my mind, so that within the last 6 months I have easily been able to move to the C# world. Topics like Generics, Extension Methods, Lambdas and LINQ were high up on the reading list as somehow I knew these were important. For GUI work I know there is ASP.Net, WinForms and WPF all vying for attention, and on the IPC front Sockets, DCOM and Named Pipes are all passé, with Indigo/WCF being the one-stop-shop. My recent WCF posts probably illustrate how I’m starting to cover the WCF angle, but the GUI stuff will just have to wait as I’ve no need to absorb that yet.

Clearly I knew more than just the names of many of these concepts, so is it really fair to categorise them as Conscious Incompetence? Probably not. So how about a list of things that I know virtually nothing about at all, except the name and a truly vague context:-

Hadoop – Something from Google related to Map/Reduce?
Scala/Groovy – Languages somehow related to Java and/or the JVM?
Maven/CMake – Stuff to do with building code?
Recursive Descent Parser – Compiler theory/technique?

Does it make sense to know this little about each of these topics? What if I’m actually wrong about the way I’ve categorised them? Does this fulfil the notion of Conscious Incompetence, or do I already know too much?

As each year passes it seems that keeping up with what’s going on in the world of Software Development is becoming more and more of an uphill struggle. There are so many languages and technologies that tuning into the relevant stuff is nigh on impossible (I could have said filtering out the noise, but so much of it appears interesting in one way or another that the term ‘noise’ feels disingenuous). My RSS reader is overflowing with blogs and articles demanding my attention, so after I’ve read the important stuff like The Old New Thing and The Daily WTF I start skimming the general purpose feeds like Slashdot Developers and Dr Dobbs. I say skim, but it’s pretty hard not to get drawn into reading the entire piece, and then of course you’ll follow a few links and have lost another hour or two of your life. That assumes the wife and kids have left you alone long enough for that to happen…

Therein lies the problem for me. I find it hard to leave many topics in the state of Conscious Incompetence. I want to know enough so that when the time comes my unconscious awareness ensures that I know where I need to go to discover more, but at the same time know little enough that I can still effectively filter out that which is most relevant from the daily bombardment of new innovations and revised practices. I guess I’m in need of a more Agile approach to learning.

 

[*I realised after having read the Wikipedia entries for these two topics more closely that I’m stretching the definition of the Four Stages of Competence to its limit by applying it to the acquisition of knowledge, but as Rumsfeld has shown, repeatedly using the words Known and Unknown would only make the prose more indecipherable]

Thursday 1 April 2010

Lured Into the foreach + lambda Trap

I had read Eric Lippert’s blog on the subject, I had commented on it, I had even mentioned it in passing to my colleagues the very same morning, but I still ended up writing:-

foreach(var task in tasks)
{
    . . .
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));
}

Then I scratched my head as I tried to work out why the same task was suddenly being processed more than once - which was only an issue because the database was chucking seemingly random “Primary Key Violation” errors at me. Luckily I had only changed a few lines of code so I was pretty sure where the fault must lie, but it still wasn’t obvious to me, and I had to fire up the debugger just to convince myself that the ‘tasks’ collection didn’t contain duplicates on entry to the loop, as I had changed that logic slightly too.
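
If you want to see the trap in isolation, here’s a minimal repro (the data is obviously made up, and the Sleep() is just a crude way of letting the thread pool drain):-

using System;
using System.Threading;

class ForEachTrap
{
    static void Main()
    {
        var tasks = new[] { "A", "B", "C" };

        foreach (var task in tasks)
        {
            // Every lambda captures the single, hoisted 'task'
            // variable - not a per-iteration copy.
            ThreadPool.QueueUserWorkItem(o => Console.WriteLine(task));
        }

        Thread.Sleep(1000); // crude wait for the pool items to run

        // Likely output: "C" three times, rather than "A", "B", "C".
    }
}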

Mentally the foreach construct looks to me like this:-

foreach(collection)
{
    var element = enumerator.Current;  // a fresh variable each time
    . . .
}

The loop variable is scoped, just like in C++, and also immutable from the programmer’s perspective. Yes, the variable is written before the opening ‘{‘ just like in a traditional for loop, and that I guess is where I shall have to look for guidance until the compiler starts warning me of my folly. Back in the days before all C++ compilers* had the correct for-loop scoping rules, a common trick to simulate them was to place an extra pair of { }‘s around the loop:-

{for(int i = 0; i != size; ++i)
{
    . . .
}}

This is the mental picture I think I need to have in mind when using foreach in the future:-

{var element; foreach (collection)
{
    element = enumerator.Current;  // assignment to the hoisted variable
    . . .
}}

That’s all fine and dandy, but how should I rewrite my existing loop today to account for this behaviour? I feel I want to name the loop variable something obscure so that I’ll not accidentally use it within the loop. Names like ‘tmp’ are out because too many people use them out of laziness. A much longer name like ‘dontUseInLambdas’ is more descriptive but shouts a little too much for my taste. The most visually appealing solution (to me at least) comes from PowerShell’s $_ pipeline variable:-

foreach(var _ in tasks)
{
    var task = _;
    . . .
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));
}

It’s so nondescript that you can’t use it by accident in the loop, and the code still visually lines up ‘var task’ with ‘in tasks’ to a reasonable degree, so it’s not wholly unnatural. I think I need to live with it a while (and more importantly my colleagues do too) before drawing any conclusions. Maybe I’ll try a few different styles and see what works best.

So what about not falling into the trap in the first place? This is going to be much harder because I’ve got a misaligned mental model to correct, but asking whether a variable used in a lambda is physically declared before the opening brace (rather than inside the loop body) is probably a good start. Mind you, this presupposes I don’t elide the braces on single-line loops - which I do :-)

foreach(var task in tasks)
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));

One final question on my mind is whether or not to try and write some unit tests for this behaviour. The entire piece of code is highly dependent on threads and processes, and after you mock all that out there’s virtually nothing left but this foreach loop. It also got picked up immediately by my system tests. Still, I’ve always known the current code was a tactical choice, so as I implement the strategic version perhaps I can amortise the cost of the refactoring then.
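
If I do decide to pin the behaviour down, something like this sketch is what I have in mind - capture the work items in a list instead of handing them straight to the thread pool so the test stays deterministic (the NUnit usage and all the names here are mine):-

using System;
using System.Collections.Generic;
using NUnit.Framework;

[TestFixture]
public class TaskQueueingTests
{
    [Test]
    public void EachWorkItemCapturesItsOwnTask()
    {
        var tasks = new[] { "A", "B", "C" };
        var workItems = new List<Action>();
        var executed = new List<string>();

        foreach (var _ in tasks)
        {
            var task = _; // per-iteration copy for the lambda to capture
            workItems.Add(() => executed.Add(task));
        }

        // Run the work items synchronously - no threads required.
        workItems.ForEach(item => item());

        Assert.That(executed, Is.EqualTo(tasks));
    }
}

Remove the ‘var task = _;’ copy and the test should fail with “C” recorded three times, which is exactly the bug.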

 

[*Yes I’m pointing the finger squarely at Visual C++]