Saturday 19 November 2011

PowerShell & .Net - Building Systems as Toolkits

I first came across COM back in the mid ’90s, when it was more commonly known by the moniker OLE2[*]. I was doing this in C, which, along with the gazillion interfaces you had to implement for even a simple UI control, just made the whole thing hard. I made three attempts at reading Kraig Brockschmidt’s mighty tome Inside OLE 2 and even then I still completely missed the underlying simplicity of the Component Object Model itself! In fact it took me the better part of 10 years before I really started to appreciate what it was all about[+]. Not surprisingly, that mostly came about with a change of jobs, to one where it wasn’t C++ and native code across the board.

COM & VBScript

Working in a bigger team with varied jobs and skillsets taught me to appreciate the different roles languages and technologies play. The particular system I was working on made heavy use of COM internally, purely to allow the VB-based front-end and the C++-based back-ends to communicate. Sadly the architecture seemed upside down in many parts of the native code, and so instead of writing C++ that used STL algorithms against STL containers, with COM providing a layer on top for interop, you ended up seeing manual ‘for’ loops that iterated over COM collections of COM objects that just wrapped the underlying C++ types. COM had somehow managed to reach into the heart of the system.

Buried inside that architecture though was a bunch of small components just waiting to be given another life - a life where the programmer interacting with them isn’t necessarily one of the main development team but perhaps a tester, administrator or support analyst trying to diagnose a problem. You would think that automated testing would be a given in such a “component-rich” environment, but unfortunately no. The apparent free lunch that is the ability to turn COM into DCOM meant the external services were woven tightly into the client code - breaking those dependencies was going to be hard[$].

One example of where componentisation via COM would have been beneficial was when I produced a simple tool to read the compressed file of trades (it was in a custom format) and filter it based on various criteria, such as trade ID, counterparty, TOP 10, etc. The tool was written in C++, just like the underlying file API, and so the only team members that could work on it were effectively the C++ developers, even though the changes were nearly always to the filtering options; and because of its role as a testing/support tool, any change requests were always well down the priority list.

What we needed was a component that maintained the abstraction - a container of trades - but could be exposed, via COM, so that the contents of the container could just as easily be consumed from a script, such as VBScript, which is a more familiar tool to technical non-development staff. By providing the non-developers with building blocks we could free up the developers to concentrate on the core functionality. Sadly, the additional cost of exposing that functionality via COM purely for non-production purposes is probably seen as too high, even if the indirect benefit might be an architecture that lends itself better to automated testing - a far more laudable cause.

PowerShell & .Net

If you swap C#/.Net for C++/COM and PowerShell for VBScript in the example above you find a far more compelling prospect. The fact that .Net underpins PowerShell means that it has full access to every abstraction we write in C#, F#, etc. On my current project this has caused me to question the motives for even providing a C#-based .exe stub, because all it does is parse the command line and invoke a bootstrap method for the logging, configuration, etc. All of this can be done directly from PowerShell, and in fact already has been in some of the tactical fixes that have been deployed.
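
As a hedged sketch of what that stub might shrink to - the assembly, type and method names below are entirely hypothetical - the script itself could do the argument parsing and bootstrapping:

```powershell
param
(
    [string] $ConfigPath = '.\Settings.xml',
    [string] $TradeDate  = (Get-Date -Format 'yyyy-MM-dd')
)

# Load the same .Net assembly the C# stub would have referenced
# (Acme.BatchHost.dll and its types are made-up names).
Add-Type -Path '.\Acme.BatchHost.dll'

# Do the bootstrap work (logging, configuration) the .exe used to do...
$settings = [Acme.BatchHost.Bootstrapper]::Configure($ConfigPath)

# ...then call the "logical" main directly.
[Acme.BatchHost.BatchRunner]::Run($settings, $TradeDate)
```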

The knock-on effect of automated testing, right up from the unit level, through integration, to the system level, is that you naturally write small, loosely-coupled components that can be invoked by a test runner. You then compose these components into ever bigger ones, but the invoking stub, whether a test harness or a production process, rarely changes because all it does is invoke factories and stitch together a bunch of service objects before calling the “logical” equivalent of main.
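
To illustrate that stitching, here is a minimal sketch - again with made-up type names - showing how the same wiring could serve both a production run and a system test simply by swapping the service objects it is given:

```powershell
# All the Acme.* type names here are hypothetical; the point is that only
# the wiring changes between a test harness and a production process.
Add-Type -Path '.\Acme.BatchHost.dll'

function Invoke-TradeBatch($tradeStore, $reportWriter)
{
    # The "logical" main: compose the services and run the batch.
    $batch = New-Object -TypeName Acme.BatchHost.TradeBatch -ArgumentList $tradeStore, $reportWriter
    $batch.Run()
}

# Production wiring...
$store  = New-Object -TypeName Acme.Stores.DatabaseTradeStore -ArgumentList 'PROD'
$writer = New-Object -TypeName Acme.Reports.FileReportWriter  -ArgumentList '\\server\reports'
Invoke-TradeBatch $store $writer

# ...or test wiring, with a fake store and a throwaway output folder.
$fakeStore  = New-Object -TypeName Acme.Testing.FakeTradeStore
$tempWriter = New-Object -TypeName Acme.Reports.FileReportWriter -ArgumentList $env:TEMP
Invoke-TradeBatch $fakeStore $tempWriter
```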

Systems as Toolkits

Another way of building a system, then, is not so much as a bunch of discrete processes and services, but more as a toolkit from which you can stitch together the final behaviour in a more dynamic fashion. It is this new level of dynamism that excites me most, because much of the work in the kind of batch processing systems I’ve worked on recently has focused on the ETL (Extract/Transform/Load) and reporting parts of the picture.

The systems have various data stores, based around the database and the file-system, that need to be abstracted behind a facade to protect the internals. This is much harder in the file-system case because it’s too easy to go round the outside and manipulate files and folders directly. By making it easier to invoke the facade you remove the excuses for not using it and so retain more control over the implementation. I also see a common trend of moving from the raw file-system to NoSQL-style stores, which would force any ad-hoc workarounds to be cleaned up anyway. This approach provides you with a route to refactoring your way out of any technical debt, because you can just replace blobs of script code with shiny new unit-tested components that are then slotted into place.
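
As a sketch of what that might look like from the scripting side - the store type and function names are invented purely for illustration - the ad-hoc work goes through the facade rather than straight at the folders:

```powershell
# Hypothetical facade over the file-system based trade store; support scripts
# call this instead of rummaging through the folders with Get-ChildItem, so
# the backing store can later move (e.g. to a NoSQL-style store) unnoticed.
Add-Type -Path '.\Acme.Stores.dll'

function Get-TradeFile([string] $tradeDate)
{
    $store = New-Object -TypeName Acme.Stores.TradeFileStore -ArgumentList '\\server\trades'
    $store.Find($tradeDate)
}

# Ad-hoc support work goes through the facade, not round the outside.
Get-TradeFile '2011-11-18'
```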

On the system I’m currently working on, the majority of issues seem to have been caused by problems with the upstream data. I plan to say more about this in a separate post, but it strikes me that the place where you need the utmost flexibility is in the validation and handling of external data. My current project manager likens this to the “corrective optics” fitted to the Hubble Space Telescope. In an ideal world these issues would never arise, or would at least come out in testing, but the corporate world of software development is far from ideal.

Maybe I’m the one wearing rose-tinted spectacles though. Visually I’m seeing a dirty great pipeline, much like a shell command line with lots of greps, sorts, seds & awks. The core developers are focused on delivering a robust toolkit whilst the ancillary developers & support staff plug the pieces together to meet the daily needs of the business. There is clearly a fine line between the two worlds, and refactoring is an absolute must if the system is to maintain firm foundations and avoid the temptation to continually build on new layers of sand. Sadly this is where politics steps in and I step out.
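
The kind of one-liner I have in mind might look something like this hedged sketch, where Get-Trade is a hypothetical wrapper over the toolkit’s trade file reader and the rest are the stock PowerShell cmdlets standing in for grep, sort and head:

```powershell
# Get-Trade is a made-up cmdlet wrapping the toolkit's compressed trade file
# reader; everything after it is plain PowerShell doing the grep/sort/TOP 10.
Get-Trade -Path '.\trades-2011-11-18.dat' |
    Where-Object { $_.Counterparty -eq 'ACME-BANK' } |
    Sort-Object -Property Notional -Descending |
    Select-Object -First 10 |
    Format-Table -Property TradeId, Counterparty, Notional
```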

s/PowerShell/IronXxx/g

Although I’ve discussed using PowerShell on the scripting language side, the same applies to any of the dynamic .Net based languages, such as IronPython and IronRuby. As I wrote back in “Where’s the PowerShell/Python/IYFSLH*?”, I have reservations about using the latter two for production code because there doesn’t appear to be the long-term commitment and critical mass of developers to give you confidence that they’re a good bet for the next 10 years. Even PowerShell is a relative newcomer on the corporate stage and probably has more penetration into the sysadmin space than the development community, which still makes it a tricky call.

The one I’m really keeping my eye on though is F#. It’s another language whose development I’ve followed via its blog since the early days, and I even went to a BCS talk by its inventor, Don Syme, back in 2009. It provides some clear advantages over PowerShell, such as its support for asynchronous workflows, and now that Microsoft has put its weight behind F# and shipped it as part of the Visual Studio suite you feel it has staying power. Sadly its functional nature may keep it out of the very hands we’re most interested in freeing.

I’ve already done various bits of refactoring on my current system to make it more amenable to use from within PowerShell, and I intend to investigate using the language to replace some of the system-level test consoles, which are nothing but glue anyway. What I suspect will be the driver for a move to a more hybrid model is the need to automate further system-level tests, particularly of the regression variety. The experience gained there can then feed back into the main system development cycle to act as living examples.
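
Such a regression test might be little more than the following sketch - Invoke-PositionsReport is a hypothetical wrapper around the toolkit, while Compare-Object does the actual baseline diff:

```powershell
# Run the batch over a known input and diff the output against a baseline.
# Invoke-PositionsReport and the file names are hypothetical.
$baseline = Get-Content '.\baseline\positions-2011-11-18.csv'
$actual   = Invoke-PositionsReport -TradeDate '2011-11-18' -InputFolder '.\testdata'

$differences = Compare-Object -ReferenceObject $baseline -DifferenceObject $actual

if ($differences)
{
    Write-Error "Positions report differs from the baseline: $(@($differences).Count) difference(s)."
}
else
{
    Write-Host 'Positions report matches the baseline.'
}
```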

 

[*] In some literature it stood for something - Object Linking & Embedding - and in others it was just the letters OLE. What was the final outcome: acronym or word?

[+] Of course by then .Net had started to take over the world and [D]COM was in decline. This always happens. Just as I really start to get a grip on a technology and finally begin to understand it, the rest of the world has moved on to bigger and better things…

[$] One of the final emails I wrote (although some would call it a diatribe) was about how the way COM had been used within the system was the biggest barrier to moving to a modern test-driven development process. Unit testing was certainly going to be impossible until the business logic was separated from the interop code. Yes, you can automate tests involving COM, but why force your developers to be both problem domain experts and COM experts? Especially when the pool of talent for the technology is shrinking fast.

2 comments:

  1. `COM had somehow managed to reach into the heart of the system.`

    Unfortunately very common. We did reasonably well at fighting this back in CSFP/CSFB but it was a bit of a nightmare at BarCap (pretty much just as you describe actually, I ranted about it here).

    I'm actually liking COM more and more these days. I do quite a lot of work with CLR hosting and the COM used for that, and for managed interop in general, is nicely back to the original idea (pre ActiveX controls and DCOM complexity).

  2. Why didn't I find that post when I was writing my diatribe, it would have saved me a lot of typing!

    There's a lot I like about COM too, now that I understand it that is. Perhaps if I'd not been blind-sided by ActiveX and OLE (I came from a GUI background where adding OLE support was essential) it might not have taken so long.

    The POSA design pattern "Extension Interface" also seems more appealing once you've dabbled in COM. I see that some newer C++ libraries have adopted that style too (e.g. XmlLite).

    Hopefully it'll be around a little while longer for me to enjoy it some more :-).
