Friday, 21 October 2016

When Mocks Became Production Services

We were a brand new team of 5 (PM + devs) tasked with building a calculation engine. The team was just one part of a larger programme that encompassed over a dozen projects in total. The intention was for those other teams to build some of the services that ours would depend on.

Our development process was somewhat DSDM-like in nature, i.e. iterative. We built a skeleton based around a command-line calculator and fleshed it out from there [1]. This skeleton naturally included vague interfaces for some of the services that we knew we'd need and that we believed would be fulfilled by some of the other teams.

Fleshing Out the Skeleton

Time marched on. Our calculator was now being parallelised and we were trying to build out the distributed nature of the system. Ideally we would like to have been integrating with the other teams long ago but the programme RAG status wasn’t good. Every other team apart from us was at “red” and therefore well behind schedule.

To compensate for the lack of collaboration and integration with the other services we needed, we resorted to building our own naïve mocks. We found other sources of the same data and built some noddy services that used the file-system in a dumb way to store and serve it up. We also added some simple steps to the overnight batch process to create a snapshot of the day's data using these sources.

Programme Cuts

In the meantime we discovered that one of the services we were to depend on had now been cancelled, and some initial testing with another raised serious doubts about its ability to deliver what we needed. Of course time was marching on and our release date was approaching fast. It was dawning on us that these simple test mocks we'd built might well have to become our production services.

One blessing that came out of building the simple mocks so early on was that we now had quite a bit of experience of how they would behave in production. Hence we managed to shore things up a bit by adding some simple caches and removing some unnecessary memory copying and serialization. The one remaining service we still needed to invoke had found a more performant way for us to at least bulk extract a copy of the day's data, so we retrofitted that into our batch preparation phase. (Ideally they'd serve it on demand but that just wasn't there for the queries we needed.)

Release Day

The delivery date arrived. We were originally due to go live a week earlier but got pushed back because an important data migration got bumped, and so we were bumped too. Hence we would have delivered on time and, somewhat unusually, our PM said we were well under budget [2].

So the mocks we had initially built just to keep the project moving along were now part of the production codebase. The naïve underlying persistence mechanism was now a production data store that needed high-availability and backing up.

The Price

Whilst the benefits of what we did were great (not that there was any other real choice in the end), because we delivered a working system on time, the simplicity of the design did cause a few problems.

The first one was down to the fact that we stored each data object in its own file on the file-system, and each day added over a hundred thousand new files. Although we had partitioned the data to avoid the obvious 400K files-per-folder limit in NTFS, we didn't anticipate running out of inodes on the volume when it quickly migrated from a simple Windows server file share to a Unix-style DFS. The calculation engine was also using the same share to persist checkpoint data and that added to the mess of small files. We limped along for some time through monitoring and zipping up old data [3].
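
The exact partitioning scheme is long forgotten now, but it was something along these lines (purely illustrative), with the date and a slice of the object's identifier keeping any single folder comfortably small:

  +- store
     +- 2016-10-21
        +- 00
           +- 0001b34f.dat
           +- *.dat
        +- 01
        +- ...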

The other problem we hit was that using the file-system directly meant that the implementation details became exposed. Naturally we had carefully set ACLs on the folders to ensure that only the environment had write access and our special support group had read access. However one day I noticed by accident that someone had granted read access to another group, and it then transpired that they were building something on top of our naïve store.

Clearly we never intended this to happen and I've said more about this incident previously in "The File-System Is An Implementation Detail". Suffice it to say that an arms race then developed as we fought to remove access to everyone outside our team whilst others got wind of it [4]. I can't remember whether it happened in the end or not but I had put a scheduled task together that would use CACLS to list the permissions and fail if there were any we didn't expect.
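
The task itself is long gone, but the essence was little more than a couple of lines of batch file comparing the current ACLs against a known-good snapshot (the paths and filenames here are invented):

  cacls \\server\share\store > actual-acls.txt
  fc expected-acls.txt actual-acls.txt
  if errorlevel 1 echo UNEXPECTED PERMISSIONS FOUND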

I guess we were a victim of our own success. If you were happy with data from the previous COB, which many of the batch systems were, you could easily get it from us because the layout was obvious.


I have no idea whether the original versions of these services are still running to this day but I wouldn't be surprised if they are. There was a spike looking into a NoSQL database to alleviate the inode problem, but I suspect the ease with which the data store could be directly queried and manipulated would have created too much inertia.

Am I glad we put what were essentially our mock services into production? Definitely. Given the choice between not delivering, delivering much later, and delivering on time with a less than perfect system that does what’s important – I’ll take the last one every time. In retrospect I wish we had delivered sooner and not waited for a load of other stuff we built as the MVP was probably far smaller.

The main thing I learned from the experience was a reminder not to be afraid of doing the simplest thing that could work. If you get the architecture right, each of the pieces can evolve to meet the ever changing requirements and data volumes [5].

What we did here fell under the traditional banner of Technical Debt – making a conscious decision to deliver a sub-optimal solution now so it can start delivering value sooner. It was the right call.


[1] Nowadays you’d probably look to include a slice through the build pipeline and deployment process up front too but we didn’t get any hardware until a couple of months in.

[2] We didn’t build half of what we set out to, e.g. the “dashboard” was a PowerShell generated HTML page and the work queue involved doing non-blocking polling on a database table.

[3] For regulatory reasons we needed to keep the exact inputs we had used and couldn't guarantee being able to retrieve them later from the various upstream sources.

[4] Why was permission granted without questioning anyone in the team that owned and supported it? I never did find out, but apparently it wasn’t the first time it had happened.

[5] Within reason of course. This system was unlikely to grow by more than an order of magnitude in the next few years.

Thursday, 20 October 2016

Confusion Over Waste

When looking at the performance of our software we often have to consider both first-order and second-order effects. For example, when profiling a native application where memory management is handled explicitly, we can directly see the cost of allocations and deallocations because it is paid at the moment we make them. In contrast, the world of garbage collected languages like C# exhibits different behaviour. The cost of a memory allocation here is minimal because the algorithm is simple. However the deallocation story is far more complex, and it happens at a non-deterministic time later.

A consequence of this different behaviour is that it is much harder to see the effects that localised memory churn is having on your application. For example, I once worked on a C# data transformation tool where the performance was appalling. Profiling didn't immediately reveal the problem but closer inspection showed that the garbage collector was running full tilt. Looking much closer at the hottest part of the code I realised it was spending all its time splitting strings and throwing them away. The memory allocations were cheap so there were no first-order effects, but the clean-up was really expensive and happened later, and therefore appeared as a second-order effect which was harder to trace back.
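
As a contrived sketch of the kind of code involved (not the actual tool), the first version below churns out a new string array plus a fresh string per field on every call, all of which becomes garbage almost immediately, whilst the second answers the same question without allocating anything:

  // Allocation-heavy: Split() creates a new string[] plus one new
  // string per field, which the GC must clean up some time later.
  static int CountFields(string line)
  {
      return line.Split(',').Length;
  }

  // Allocation-free: scan for the separators instead; the first-order
  // cost is similar but there is no second-order clean-up bill.
  static int CountFieldsWithoutGarbage(string line)
  {
      int count = 1;

      foreach (char c in line)
      {
          if (c == ',')
              ++count;
      }

      return count;
  }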

Short Term Gains

We see the same kind of effects occurring during the development process too. They are often masked, though, by the mistaken belief that time is being saved – it is, but only in the short term. The problem is that, as a second-order effect, that saved time is lost again later, when it's far more precious.

This occurs because the near-term activity is seen as wasteful of a certain person's time, on the premise that the activity is of low value (to them). But what is being missed are the second-order effects of doing that, such as the learning about the context, people and product involved. When crunch time comes that missed learning suddenly has to happen later, potentially under time pressure or after money has already been spent; then you're heading into sunk cost territory.

In essence what is being perceived as waste is the time spent in the short term, when the real waste is time lost in the future due to rework caused by the missed opportunity to learn sooner.

All Hail “Agile”

Putting this into more concrete terms, consider a software development team where the developers' time is assumed to be best spent designing and writing code. The project manager assumes that having conversations, perhaps with ops or parts of the business, is of low value from the developer's perspective, and therefore decides it's better if someone "less expensive" has them instead.

Of course we’re all “agile” now and we don’t do that anymore. Or do we? I’ve worked in supposedly agile teams and this problem still manifests itself, maybe not quite to the same extent as before, but nonetheless it still happens and I believe it happens because we are confused about what the real waste is that we’re trying to avoid.

Even in teams I've been in where we've tried to ensure this kind of problem is addressed, it's only addressed locally; it's still happening further up the food chain. For example, a separate architecture team might be given the role of doing a spike around a piece of technology that a development team will be using. This work needs to happen inside the team so that those who will be developing and, more importantly, supporting the product will get the most exposure to it. Yes, there needs to be some governance around it, but the best people to know if it even solves their problem in the first place are the development team.

Another manifestation of this is when two programme managers are fed highlights about potential changes on their side of the fence. If there is any conflict there could be a temptation to resolve it without going any lower. What this does is cut out the people that not only know most about the conflict, but are also the best placed to negotiate a way out. For example instead of trying to compensate for a potential breaking change with a temporary workaround, which pushes the product away from its eventual goal, see if the original change can be de-prioritised instead. If a system is built in very small increments it’s much easier to shuffle around the high priority items to accommodate what’s happening around the boundaries of the team.

Time for Reflection

How many times have you said, or heard someone else say, "if only you'd come to us earlier"? This happens because we try to cut people out of the loop in the hope that we'll save time by resolving issues ourselves, but what we rarely do is reflect on whether we really did save time in the long run once the thread eventually started to unravel and the second-order effects kicked in.

Hence, don't just assume you can cut people out of the loop because you think you're helping them out – you might not be. They might want to be included because they have something to learn or contribute over-and-above the task at hand. Autonomy is about choice; they might not always want it, but if you don't provide it in the first place it can never be leveraged.

Monday, 22 August 2016

Sharing Code with Git Subtree

The codebase I currently work on is split into a number of repositories. For example the infrastructure and deployment scripts are in separate repos as are each service-style “component”.

Manual Syncing

To keep things moving along the team decided that the handful of bits of code that were shared between the two services could easily be managed by a spot of manual copying. By keeping the shared code in a separate namespace it was also partitioned off to help make it apparent that this code was at some point going to be elevated to a more formal “shared” status.

This approach was clearly not sustainable but sufficed whilst the team was still working out what to build. Eventually we reached a point where we needed to bring the logging and monitoring stuff back in sync, and I also wanted to share some other useful code like an Optional<T> type. It also became apparent that the shared code was missing quite a few unit tests.

Share Source or Binaries?

The gut reaction to such a problem in a language like C# would probably be to hive off the shared code into a separate repo and create another build pipeline for it that would result in publishing a package via a NuGet feed. And that is certainly what we expected to do. However the problem was where to publish the packages, as this was closed source. The organisation had its own licence for an Enterprise-scale product but it wasn't initially reachable from outside the premises where our codebase lay. There were also some problems getting NuGet to publish to it with an API key, which seemed to lie with the way the product's permissions were configured.

Hence to keep the ball rolling we decided to share the code at the source level by pulling the shared repo into each component’s solution. There are two common ways of doing this with Git – subtrees and submodules.

Git Submodules

It seemed logical that we should adopt the better-known submodule approach as it felt easier to attach, update and detach later. It also appeared to have support in the Jenkins 1.x plugin for doing a recursive clone, so we wouldn't have to frig it with some manual Git voodoo.

As always there is a difference between theory and practice. Whilst I suspect the submodule feature in the Jenkins plugin works great with publicly accessible open-source repos, it's not quite up to scratch when it comes to private repos that require credentials. After much gnashing of teeth trying to convince the Jenkins plugin to recursively clone the submodules, we conceded defeat, assuming we were another victim of JENKINS-20941.

Git Subtree

Given that our long term goal was to move to publishing a NuGet feed we decided to try using a Git subtree instead so that we could at least move forward and share code. This turned out (initially) to be much simpler because for tooling like Jenkins it appears no different to a single repo.

Our source tree looked (unsurprisingly) like this:

  +- src
     +- app
     +- shared-lib
        +- .csproj
        +- *.cs

All we needed to do was replace the shared-lib folder with the contents of the new Shared repository.

First we needed to set up a Git remote. Just as the remote main branch of a cloned repo goes by the name origin/master, so we set up a remote for the Shared repository’s main branch:

> git remote add shared https://github.com/org/Shared.git

Next we removed the old shared library folder:

> git rm -r src/shared-lib

…and grafted the new one in from the remote branch:

> git subtree add --prefix src/shared shared master --squash

This effectively takes the shared/master branch and grafts it further down the repo source tree at src/shared, which is where we had it before.

However the organisation of the new Shared repo is not exactly the same as the old shared-lib project folder. A single child project usually sits in its own folder, but a full-on repo has its own src folder and build scripts, and so the source tree now looked like this:

  +- src
     +- app
     +- shared
        +- src
           +- shared-lib
              +- .csproj
              +- *.cs

There are now two extra levels of indirection: first the shared folder, which corresponds to the external repo, plus that repo's own src folder.

At this point all that was left was to fix up the build, i.e. correct the path to the shared-lib project in the Visual Studio solution file (.sln), and push the changes.

We chose to use the --squash flag when creating the subtree as we weren’t interested in seeing the entire history of the shared library in the solution’s repository.

Updating the Subtree

Flowing changes from the parent repo down into the subtree of the child repo is as simple as a fetch & pull:

> git fetch shared master
> git subtree pull --prefix src/shared shared master --squash

The latter command is almost the same as the one we used earlier but we pull rather than add. Once again we’re squashing the entire history as we’re not interested in it.

Pushing Changes Back

Naturally you might want to make a change in the subtree in the context of the entire solution and then push it back up to the parent repo. This is doable but involves using git subtree push to normalise the change back into the folder structure of the parent repo.
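
For example (assuming the same remote and prefix as before, and that the change is destined for the shared repo's master branch):

> git subtree push --prefix src/shared shared master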

For our part we decided just to make the changes test-first in the parent and always flow them down to the child. In the few cases where the child solution helped in debugging, we worked on the fix in the child solution workspace and then simply copied the change over to the shared workspace manually and pushed it out through the normal route. It's by no means optimal but a NuGet feed was always our end game so we tolerated the little bit of friction in the short term.

The End of the Road

If we were only sucking in libraries that had no external dependencies themselves (up to that point our small shared code only relied on the .Net BCL) we might have got away with this technique for longer. But in the end the need to pull in 3rd party dependencies via NuGet in the shared project pushed it over the edge.

The problem is that NuGet packages are managed on a per-solution basis, and the <HintPath> element in the project file assumes a relative path (essentially) from the solution file. When working in the real repo as part of the shared solution it was "..\..\packages\Xxx", but when part of the subtree-based solution it needed to be two levels further up, i.e. "..\..\..\..\packages\Xxx".

Although I didn't spend long looking, I couldn't find a simple way to overcome this problem and so we decided it was time to bite the bullet and fix the real issue, which was publishing the shared library via a NuGet feed.

Partial Success

This clearly is not anything like what you'd call an extensive use of git subtree to share code, but it certainly gave me a feel for what it can do and I think it was relatively painless. What caused us to abandon it was tooling specific (the relationship between the enclosing solution's NuGet packages folder and the shared assembly project itself) and so a different toolchain may well fare much better if build configuration is only passed down from parent to subtree.

I suspect the main force that might deter you from this technique is how much you know, or feel you need to know, about how git works. When you’re inside a tool like Visual Studio it’s very easy to make a change in the subtree folder and check it in and not necessarily realise you’re modifying what is essentially read-only code. When you next update the subtree things get sticky. Hence you really need to be diligent about your changes and pay extra attention when you commit to ensure you don’t accidentally include edits within the subtree (if you’re not planning on pushing back that way). Depending on how experienced your team are this kind of tip-toeing around the codebase might be just one more thing you’re not willing to take on.

Manually Forking Chunks of Open Source Code

Consuming open source projects is generally easy when you are just taking a package that pulls in source or binaries into your code “as is”. However on occasion we might find ourselves needing to customise part of it, or even borrow and adapt some of its code to either workaround a bug or implement our own feature.

If you’re forking the entire repo and building it yourself then you are generally going to play by their rules as you’re aware that you’re playing in somebody else’s house. But when you clone just a small part of their code to create your own version then it might not seem like you have to continue honouring their style and choices, but you probably should. At least, if you want to take advantage of upstream fixes and improvements you should. If you’re just going to rip out the underlying logic it doesn’t really matter, but if what you’re doing is more like tweaking then a more surgical approach should be considered instead.

Log4Net Rolling File Appender

The driver for this post was having to take over maintenance of a codebase that used the Log4Net logging framework. The service's shared libraries included a customised Log4Net appender that took the basic rolling file appender and then tweaked some of the date/time handling code so that it could support the finer-grained rolling log file behaviour they needed. This included keeping the original file extension and rolling more frequently than a day. They had also added some extra logic to support compressing the log files in the background.

When I joined the team the Log4Net project had moved on quite a bit and when I discovered the customised appender I thought I’d better check that it was still going to work when we upgraded to a more recent version. Naturally this involved diffing our customised version against the current Log4Net file appender.

However to easily merge in any changes from the Log4Net codebase I would need to do a three-way diff. I needed the common ancestor version, my version and their version. Whilst I could fall back to a two-way diff (latest of theirs and mine) there were lots of overlapping changes around the date/time arithmetic which I suspected were noise as the Log4Net version appeared to now have what we needed.

The first problem was working out what the common ancestor was. Going back through the history of our version I could see that the first version checked in was already a highly modded one. They also appeared to have applied some of the ReSharper-style refactorings, which added a bit of extra noise into the mix.

What I had hoped they would have done was start by checking in the exact version of the code they took from Log4Net, recording in the commit message the Subversion revision number of that code so that I could see exactly which version they had forked. After a few careful manual comparisons and some application of logic around commit timestamps I pinned down what I thought was the original version.

From here I could then trace both sets of commit logs and work out what features had been added on the Log4Net side and what I then needed to pull over from our side, which turned out to be very little in the end. The hardest part was working out whether the two changes around the date rolling arithmetic were logically the same, as I had no tests to back up the changes on our side.

In the end I took the latest version of the code from the Log4Net codebase and manually folded in the compression changes to restore parity. Personally I didn't like the way the compression behaviour was hacked in [1] but I wanted to get back to working code first and then refactor later. I tried to add some integration tests at the same time too, but they have to be run separately as the granularity of the rollover was per-minute at best [2].

Although the baseline Log4Net code didn't match our coding style I felt it was more important to be able to rebase our changes over any new Log4Net version than to meet our coding guidelines. Naturally I made sure to include the relevant Log4Net Subversion revision numbers in my commits to make it clear what version provided the new baseline, so that a future maintainer would have a clear reference point to work from.
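
Something as simple as this at the top of the commit message does the job (the revision number here is entirely made up):

  Rebased our appender on Log4Net's RollingFileAppender.cs as of
  Subversion r1234567, then re-applied our compression changes.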

In short if you are going to base some of your own code very closely on some open source stuff (or even internal shared code) make sure you’ve got the relevant commit details for the baseline version in your commit history. Also try and avoid changing too much unnecessarily in your forked version to make it easier to pull and rebase underlying changes in the future.


[1] What worried me were the potential "hidden" performance spikes that the compression could put on the owning process. I would prefer the log file compression to be a background activity that happens in slow time and is attributable to an entirely separate process that doesn't have tight per-request SLAs to meet.

[2] I doubt there is much call for log files that roll every millisecond :o).

Monday, 8 August 2016

Estimating is Liberating

Right now, after having just read the title, you're probably thinking I've gone mad. I mean, why would a developer actively promote the use of estimation? Surely I should be joining the ranks of the #NoEstimates crowd and advocating the abolition of estimating as a technique?

Well, yes, and no. As plenty of other much wiser people than me have already pointed out, the #NoEstimates movement, like the #NoSql one before it, is not the black-and-white issue it first appears. However this blog post isn't really about whether or not I think you should create estimates per se, but about some of the effects I've observed from teams performing the exercise. Whether the means justifies the end (even in the short term) is for you to decide.

Establishing Trust

The first time I realised there was more to estimating than simply coming up with a figure that describes how long some piece of work will likely take was almost a decade ago. I was working as “just another developer” on a greenfield project that was using something approaching DSDM as a project management style.

We had started off really well delivering a walking skeleton and fleshing it out bit-by-bit based on what the project manager and team thought was a suitable order. There were many technical risks to overcome, not least due to the cancellation of some services we were dependent on.

After what seemed like a flying start things slowly went downhill. Six months in, the project manager (PM) and I had a quiet word [1] as they were concerned things seemed to be taking so much longer than earlier in the project. I suggested that we consider going back to estimating our work. What I had noticed was that the problems we were encountering were really just delays caused by not properly thinking the work through. Hence every day the work would be finished "tomorrow", which naturally caused the project manager to start losing faith in the team.

By forcing ourselves to come up with an idea of how long we would need to work on a feature, we started breaking features down into much smaller chunks. Not only did this mean that we thought through more clearly what issues we might need to tackle, but it also allowed us to trim any obvious fat and work in parallel where possible.

The result was that the project manager began to trust the team again and, by extension, the customer and PM had fewer "awkward" conversations too [2].

Knowledge Sharing

The next moment where I began to see the positive effects of estimation was when joining a team that adopted Planning Poker as a way of estimating the work for a sprint.

In the (not so) good old days, work was assigned to individuals and they were responsible for estimating and performing that work largely in isolation. Of course many of us would prefer to seek advice from others, but you were still essentially seen as responsible for it. As a corollary, the number of people in the team who knew what was being worked on was small. Even if you did have a whiff of what was happening you probably knew very little about the details unless you stuck your nose in [3].

This team worked in a similar fashion, but by opening up the planning session to the whole team everyone now had a better idea of what was going on. So, even if they weren’t actively going to be working on a feature they still had some input into it.

What the planning poker session did was bring everyone together so that they all felt included and therefore informed. Additionally, by actively canvassing their opinion for an estimate on each and every feature, their view was being taken into consideration. By providing an unusually small or large estimate they had a sure-fire means of having their opinion heard, because the conversation tends to focus on understanding the outliers rather than the general consensus. It also allowed members of the team to proactively request involvement in something rather than finding out later it had already been given to someone else.

I think the team started to behave more like a collective and less like a bunch of individuals after adopting this practice.


More recently I was involved in some consultancy at a company where I got to be a pure observer in someone else’s agile transformation. The units of work they were scheduling tended to be measured in weeks-to-months rather than hours-to-days.

I observed one planning meeting where someone had a task that was estimated at 4 weeks. I didn’t really know anything about their particular system but the task sounded pretty familiar and I was surprised that it was anything more than a week, especially as the developer was fairly experienced and it sounded like it was similar to work they had done before.

I later asked them to explain how they came up with their estimate and it transpired that buried inside were huge contingencies. In fact a key part of the task involved understanding how an API that no one had used before worked. In reality there was a known part for which a reasonably sound estimate could be given, but a large part was unknown. Like many organisations, theirs never acknowledged that aspects of software development are often unknown and that, when faced with something we've never done before, we are still expected to be able to say how long it will take.

Consequently I got them to break the task down into smaller parts and present those estimates instead. Most notable was an upfront piece around understanding the 3rd party API – a technical spike. This very consciously did not have an estimate attached and it allowed us to explain to the stakeholders what spikes are and how & when to use them to explore the unknown.

This openness with the business made both them and the delivery team more comfortable. The business were now more in the loop about the bigger risks and could also see how they were being handled. Consequently they also now had the ability to learn cheaply (fail faster) by keeping the unknown work more tightly under control and avoid unexpected spiralling costs or delays.

The benefit for the delivery team was the recognition from the business that there is stuff we just don't know how to do. For us this is hugely liberating because we can now lay our cards firmly on the table instead of hiding behind them. Instead of worrying about how much to pad our estimate to ensure we have enough contingency to cover all the stuff we definitely don't know about, we can instead split off that work and play it out up front as a time-boxed exercise [4]. Instead of being sorry that we are going to be late, again, we have the opportunity to be praised for saving the business money.

Training Wheels

What all of these tales have in common is that the end product – the actual estimate – is of little importance to the team. The whole #NoEstimates movement has plenty to say on whether estimates are useful or not in the end, but the by-product of the process of estimating certainly has some use as a teaching aid.

A mature (agile) team will already be able to break work down into smaller chunks, analyse the risks and prioritise it so that the most valuable is done first (or risks reduced). But an inexperienced team that has had little direct contact with its stakeholders may choose to go through this process as a way of gaining trust with the business.

In the beginning both sides may be at odds, each believing that the other doesn't really care about what is important to them. Estimation can be used as a technique that allows the technical side to "show its workings" to the other side, just as an exam student proves to the examiner that they didn't just stumble upon the answer through luck.

As trust eventually grows and the joint understandings of “value” take shape, along with a display of continuous delivery of the business’s (ever changing) preferred features, the task of estimation falls away to leave those useful practices which always underpinned it. At this point the training wheels have come off and the team feels liberated from the tyranny of arbitrary deadlines.


[1] This is how process improvement used to take place (if at all) before retrospectives came along.

[2] Without direct involvement from the customer all communication was channelled through the project manager. Whilst well-meaning (to protect the team) this created more problems than it solved.

[3] I guess I’m a “busy-body” because I have always enjoyed sticking my nose in and finding out what others are up to, mostly to see if they’d like my help.

[4] The common alternative to missing your deadline and being held to it is to work longer hours and consequently lose morale. Either way the business eventually loses, not that they will always realise that.

Tuesday, 26 July 2016

Documentation, What is it Good For?

If, like me, you grew up in the 1980s with Frankie Goes to Hollywood you'll instantly want to reply to the title question with "absolutely nothing!". You may also pick the same answer if you misread the Agile Manifesto as saying that we should "create working software, not write documentation".

Of course everyone knows it really says that we should favour “working software over comprehensive documentation”, and yet it feels as though many are replacing the word “comprehensive” with “any”. Whereas before we tried to get out of writing documentation because it was boring (hint: that’s the fault of it being comprehensive), we can now use the Agile Manifesto to reinforce our opinion that we should go full steam ahead and only write code because documentation is either pointless or of very little value.

The YAGNI Argument

One of the reasons documentation is seen as having very little value is down to the fact that apparently “no one reads it”. There is more than a grain of truth in this argument and it’s something I’ve touched on before in “The Dying Art of RTFM”. Also, as is probably apparent from the “Pen & Paper” article in my Toolbox series for ACCU’s C Vu, I like to make notes in a log book about the work that I’m doing as I find them useful later.

Hence my premise around documentation, and this has always applied equally to writing tests and tools too, is that if I’ll find it useful later I suspect that someone else will too. Therefore, first and foremost I write documentation for myself, and that gets us over the YAGNI hurdle. So, even if nobody else is going to read what I write, the fact that I’ll probably read it again is good enough for me to consider it worth the effort (and therefore it has at least some value).

This blog is a prime example of such an outlook. I have no idea whether anyone else will read anything I write, but what I have discovered (like so many others) is that the act of writing is in itself valuable. Naturally one must be careful not to compare apples and oranges as using the company’s time to write documentation has a definite monetary cost compared to the cost of writing this blog which would be mostly valued in time instead.

Certainly in my earlier days of documenting aspects of a project I didn't know what kinds of things were worth writing up, and I almost certainly still wrote way too much up front. In part what I wanted to capture was not just the topic itself, but often the rationale that led to the decision. I felt that without the rationale you wouldn't know if the documentation still held any value in the future, as what led you to the original decision may no longer be valid; consequently you'd know when it was time to delete or rewrite it.

Keep It Real

When I first read the term "comprehensive documentation" what came to mind was one of those massive enterprise-grade masterpieces that has at least half a dozen introduction pages covering the table of contents, version list, signatories, disclaimer, etc. If the actual content doesn't even start until page 12 you are pretty much guaranteed that no one will ever read it. Making use of fancy technologies like OLE, so that any embedded content is subject to the whim of the Network Gods, is also not going to endear people to it.

I, like many other developers, like simple documentation – simple to read and, more importantly, simple to write. The barrier to writing documentation should be low, really low, so low that there is very little reason to not do it. By taking away all the superficial formatting, e.g. using Markdown, it’s almost impossible to get bogged down with playing with the tool instead of producing what matters – the content.

Hence my tool of choice is a simple wiki. Whilst storing text documents and any simple images in the source repo itself is better than nothing (or going way over the top), grep is just a little too crude for searching, and having page navigation just makes life that little bit more joined-up. I've found that GitHub makes this format very usable, but what would make it better is if there was even a half-decent, simple Markdown viewer for Windows [1].

The great thing about plain text formats is that you can still use your favourite text editor, and most support a spell checker plugin to at least avoid any obvious mistakes. Trying to read badly written prose can be jarring for the reader, but I'd still prefer to have the content than not at all, as someone can always fix the little mistakes when it gets updated. With a browser like Chrome (which comes with a built-in spell checker) even the basic edit boxes used in online wikis are good enough for creating most content.

Naturally plain text is good for the written parts but sometimes you need an image, such as a diagram. ASCII art can work for a very simple picture but you really want something just a little bit better. My current favourite online diagramming tool makes it really easy to create simple, effective diagrams. You can then generate a .PNG image which you embed in the page, and also export the diagram as an .XML file for attaching (or checking in). This way anyone can edit the picture later as they have the source. Personally, if the diagram would take a long time to recreate from scratch then I'd question whether it was too detailed in the first place.

Common Themes

Most of the documentation I write is not about the features of the product – you have the stories, source code and tests to tell you about that. The things that I tend to write up are the aspects of the process the team uses and many of the little things that we perhaps don’t do or consider every day but act as examples of how some development or support tasks could be done.

Basic Architecture

As I mentioned earlier, I feel one of the most important things the team can jot down is the rationale around why things are the way they are. Architecture is a prime example of where knowing how you got there helps you understand the forces that drove the decisions – even if it's that no conscious decision was ever made.

There is a slightly meta aspect here too as I like to start by linking to two sources on why I’m even doing it: “Record Your Rationale” from 97 Things Every Software Architect Should Know and Michael Nygard’s blog post on “Architecture Decision Records”.

Generally speaking I'll only create a simple top-level architecture diagram to show the basic footprint and provide a quick description of what the major components are. This usually provides enough to orient someone and can act as a visual clue should you be chatting to someone outside the team about the system [2].

Team Details

I also like to create a simple page that at least lists the team along with some of their contact details. When all you’re given is a cryptic login like THX-1138 it becomes a pain trying to find out who owns a file or which developers are hogging the two RDP connections to a Windows server. It also provides a place to put a mugshot and some other details which can provide a more welcoming feel to other teams; I prefer to belong to a team that openly promotes collaboration rather than tries to hide behind a ticketing system.

Operational Tips & Tricks

While we might wish that our systems are perfect, have 100% uptime and never do anything weird, that just isn’t a reality. The idea of having a knowledge base for the support and operational aspects of a system is nothing out of the ordinary, but we need to balance documenting quirks with just getting on and fixing the bugs that cause regular support issues.

For example although an in-house system should be able to come up in any order there might be a preferential one that just makes life easier. Also any post mortems that have been carried out are useful to document so that not just the team itself but the whole organisation can learn from accidents and failures.

Development Guides

Operational notes are probably a fairly obvious artefact but I like to do the same for development tasks too. In an enterprise setting you often find junior developers being given what are seen as “simple” tasks but no guidance on how to do it. You can’t automate everything and sometimes they need some background knowledge to help them along [3].

I’ve also found that during development you bump into various quirks either around the toolchain or dependencies that are worth making a note of. For example Visual C++ developers used to run into pre-compiled header build problems with a drive mapped using NET USE that didn’t happen with SUBST.


On my desktop I always have a text file called "one-liners.txt" that I use to store those little command lines that come up time and time again at each client. They often revolve around build or support queries, such as parsing data out of log files or test data files. If a command remains useful it can be "scriptified", but before then I like to share where I'm up to so that other developers can learn how to compose the standard tools (or PowerShell cmdlets) to solve the smaller tasks. Many Windows developers aren't comfortable at the command line and so I think this is a great way of showing them what they're missing out on. Once a task gets turned into a script it becomes opaque and just "the solution", whereas I find the "raw pipeline" teases at another, different world of programming.
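
As a flavour of the kind of one-liner I mean (the log file naming and format are invented), this counts the errors per day across a set of log files, assuming each line starts with a date:

> Get-Content *.log | Select-String " ERROR " | ForEach-Object { ($_.Line -split ' ')[0] } | Group-Object | Sort-Object Count -Descending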


Whilst StackOverflow probably caters for a lot of the Q&A these days that I might historically have documented, sometimes the error message and the solution don't immediately tie up, and so I like to fill that gap with a simple Q&A page that links to SO but includes the actual error they're likely to encounter.

Junior developers often don't know how to separate the wheat from the chaff, and so links to articles of special engineering interest, the blogs of notable luminaries, essential books, etc. are all worth capturing. Part of the rationale that drives a solution comes from the ambient knowledge of its team members, and so knowing what shapes them can help understanding of what has shaped the application.


Contrary to what you might have heard, not all documentation is waste. In a stable team where there is plenty of opportunity to learn from the code and by pairing you can get away with less, but there are still going to be things you want to remember or share with your future team mates.

Don’t set out to write War & Peace – grow it organically. If you’re not sure if it’s worth sharing it’s better to stick a crude one-liner with a simple note on a page, which someone else can then embellish and refactor, than to never do it at all. Keep your investment small until you get a feel for where the true value lies.

One of the tasks I like to include as part of the “Definition of Done” is around documenting anything that might be of interest. This means that writing documentation becomes just another part of the development of a story, like writing tests. Most of the time there is nothing to be done but at least documentation then becomes part of the conversation around every feature.

Finally, don't worry if some people don't seem to read what you write; it's unlikely that it's undiscoverable or abhorrent, it's probably just that they don't work that way. If whatever you've written is valuable to you then that's a good start; hopefully your current and future team mates will find value in it too at some point when they go looking for answers [4].


[1] For some reason everyone wants to be a Markdown editor, but nobody wants to be a simple viewer and sadly the Chrome extensions aren’t friction-free either.

[2] If it’s a longer conversation then a whiteboard session is probably more suitable than staring at a static image.

[3] I once wrote a guide called "Efficiently Merging ClearCase Branches" as developers kept getting the order wrong around bumping the label in the config spec and merging the changes. This created huge amounts of noise in the repo through unnecessary versions.

[4] On more than one occasion it wasn’t until after I had left a client and then met up for drinks sometime later that I got thanked for leaving behind useful notes. Hearing that kind of feedback makes it all worthwhile.

Thursday, 21 July 2016

Developers Can Be Their Own Worst Enemy

Generally speaking I reckon we programmers are an optimistic bunch. By-and-large we also want to do a good job and take great delight in seeing our work unleashed on our intended audience. I also believe we do our best to understand the constraints of the business and work within them when delivering our solution. Like any normal person we also want to be acknowledged and respected. But we also like to tinker.

A New Age of Transparency

In past times, when the mentality was more Waterfall-esque, it was much harder to get late ideas in, and so the consequence was that developers might shoehorn one feature in on the back of another. This saved on paperwork and also gave us a warm fuzzy feeling that we were in control of the product’s destiny.

But it’s no longer like that, at least it’s been a long time since I’ve worked in anywhere near such a rigid framework. The modern development cycle means that the next sprint planning meeting is only a week or two away. The product owner, who should now feel that they have a good idea what is going on day-to-day, is far more amenable to localised plan adjustments [1] so the notion of widening the scope slightly or even squeezing another story in earlier is only a conversation away. Even if it doesn’t get the go-ahead immediately there’s a backlog that anyone can add features to and all the stakeholders (which includes the development team) can provide input to determine the ordering.

Maybe though that’s just not enough for some people. There are developers that still feel the need to act on highly speculative requirements. They are so very sure that we are going to need to do “X” that they will start coding the feature up straight away. After all, they have the engine open on the workbench so it’s got to be more efficient this way, right?

At Number 5 in his "11 Agile Myths and 2 Truths" Allan Kelly has:

Developers get to do what they like

The truth hurts, and much as I wish it wasn’t this way, Allan Kelly is right, we cannot always be trusted. The years have taken their toll and until we can show that we are able to be grown-ups we must accept that we still need to hold somebody’s hand all the way down to the sweet shop.

Continuous Learning

After 20 years writing software professionally you would probably have hoped that I'd know what I'm doing by now. Perhaps if I had taken a different path from the Journeyman route I could easily delude myself that I now know the one true way and therefore know how every solution is going to end up. As such, even though I might not know the exact problem domain, surely I must know all the other stuff like architecture, design, testing, debugging, logging, monitoring, documenting, etc.?

This line of reasoning is probably what gets us into hot water. In the back of our mind we believe that because we've needed X before we should have X again. Not only that, but we should also do it the same way we did X last time. And why shouldn't we – after all, isn't that what they're paying us for, our experience?

While this might be true, they are also probably paying for us not to reinvent the wheel and to make efficiency gains by adopting modern tooling and practices. Our experience of knowing how to solve problems and where to look for help is probably more valuable than the specific solution knowledge we have. For example, the very brief time I spent writing Eiffel 20 years ago has never been of direct use, but the knowledge of Design by Contract has been generally useful. Less than a decade ago I was still writing manual "for" loops; now I hardly ever do. If such a basic programming construct can change, why wouldn't I assume everything else will too, eventually?

Do or Do Not

To be clear this is not about being given the solution on a plate and simply following it, definitely not. As technologists the solution domain is our space to shape and mould as we see fit. We are masters of that domain, but everything we do in it must ultimately be to serve the greater good of the business. We have plenty of room to make our mark without trying to play the hero or getting “one up” on those “technically illiterate business people”.

It’s also not about stifling innovation either. Good ideas need to come from all areas of the organisation, not just the business side. In fact having an appreciation of the technical constraints probably means we’ll filter out some of the more challenging ideas before presenting them on the grounds that we suspect they’ll never fly. This is equally undesirable because we don’t want to miss opportunities either.

No, what I'm talking about here are the half-baked features that you come across when trying to implement another change. Sometimes it's just a stub or bit of logic left as a placeholder for an as yet unimplemented feature. Or maybe it's some behaviour that the author thought might be useful but never ripped out again. And guess what, there are no tests, and the story it's checked in against is for something else. Now you're into software archaeology mode trying to work out how this relates to what you thought the code should be doing. There aren't many developers I know who have the confidence to just rip stuff like this out, so it often lives for far longer than you'd hope.

The reason this matters is that all these little bits of speculative features are just waste. For a start, while you've gone off-piste the feature that should be being delivered is now delayed, which means it's costing money (i.e. it's inventory), not generating income. When it interferes with the delivery of any other feature by becoming a distraction it once again causes delay. And if someone keeps having to ask, or work out, why something lying around exists, the cost slowly mounts up.

Have Faith in the Process

You're probably thinking that I'm being somewhat melodramatic, and in a way I am. However "death by a thousand cuts" starts with only one cut, and the cuts keep coming. By not keeping the problem under control (or better yet removing it) it remains unchecked. When you've worked on a clean codebase, where you don't keep having to second-guess what the code says and does, you can get into a rhythm that allows new changes to be delivered really quickly. XP talks about having courage, and a codebase with no superfluous crap goes a long way towards giving you that courage.

Our best work happens when we get to practice software engineering. Whilst it's our job to trade off time and space so that we provide the best solution we can whilst keeping costs down, we should do so with the combined efforts of those we work with, on both the technical and business sides. It's often hard to associate a monetary value with what we do, but doing so really helps you start thinking about your everyday actions in terms of the "cost of delay".

So, the next time you're tempted to just squeeze a little something else in, ask yourself whether the team would be better served by first finishing what's already on your plate. If you really think it can't wait then quickly canvass the opinion of a few stakeholders and/or the product owner to see whether it fits in with the current plan.

Better yet start pair programming. One of the best ways I’ve found to stay focused on the task at hand is by having someone else working with you. Whilst it’s entirely possible that the pair could wander off on a tangent I’ve found it extremely rare in practice. What normally happens is that one side begins to feel uncomfortable about the changing direction and a rapid exchange occurs to decide if this is really where they both want to go or whether it should wait for another day. It’s much easier to remain honest (if that’s what you need) when you have somebody else sat right next to you whose time you also have to account for.


[1] This is what your daily stand-up is for – reprioritisation.