My Data Busting Waistline

Thursday 17 November 2011

It’s a sad state of affairs when I order a new pair of jeans and am delivered these low-slung skinny affairs, which I wear for a week before the top button pops off.

Why? Poor design (obviously) and the fact that my waist is not quite what it used to be. Like it or not, I’m gradually expanding…

I read an interesting article today that was titled “Getting Rid of Data — Why is it so hard” and for some reason found some parallel in what’s happened with my jeans…

The article is about Information Governance and how important it is to have good processes and policies in place so that you know with confidence what data needs to be retained and what data can be disposed of, stopping the burgeoning, unwieldy growth of data (or of my waist, as I continue to try to draw some sort of parallel…)

Some of my previous blog posts have talked about the tsunami of data from social media: supersized data that is both difficult to make sense of and expensive to retain.

It’s easy to see how a bit of data spring cleaning can clear out some space, letting you carry on cramming stuff into whatever storage is currently available (my jeans again) just in case you need it sometime in the future.

However, technologies are changing and evolving faster than ever as data continues to grow and grow.

Disks are cheap, and with the adoption of “big data” solutions such as Hadoop and our own search-engine-based technology, CXAIR, size is becoming less and less of an issue.

When I started in IT back in 1987, I was working on mainframe systems written in COBOL and IDMSX. I was working on hospital Patient Administration Systems where, due to limitations in hardware, we were always extremely careful about data storage and retention. We used to hold current and historic data for about three months before it was archived off to tape and microfiche.

These days we can slap it on disk and keep it there virtually indefinitely. With all of the laws around compliance and transparency, keeping your data available is a good and comforting thing.

The only issue is this: when you do need access to some old data, how do you find it?

If it’s stored in an old legacy database, then expect to have to get IT on the case and wait hours for the data to come back.

If it’s stored in something like a search engine, then users can search around, perform ad-hoc queries and analysis across the data themselves, and get results back in seconds.

Times are changing, and the way we store, access and retain data is going through a technical revolution at the moment.

Data Governance is important, but so is the ability to keep hold of data for historical analysis and for that “just in case, ad-hoc” requirement that bites you on the backside as soon as you’ve deleted the data or made it less accessible.

So what choices do I have?

Get rid of some of the excess (go on a diet) or buy some bigger (more scalable) and more appropriately designed (latest technology) jeans.

Time to get the credit card out…
