Size isn’t everything

Friday 16 November 2012

So “Big Data”, what’s it all about?

Is it a PR thing to try to sell services around Hadoop?

Is it to try to get people switched on to social media and make you paranoid about what people may be tweeting or Facebooking about you, your company or your brand?

Is it a way of selling cloud services so that you no longer need to worry about running out of disk space or blowing your fuse box by plugging in one too many servers?

I don’t have the answer but I do have an opinion.

It is undeniable that the amount of data floating around the ether is growing at an exponential rate.

The channels for distributing that data are expanding rapidly. Twitter is the new Google. Everyone knows about it even if they don’t use it.

If you want to express an opinion about something and get the world to see it, what do you do? Tweet about it and add a creative or provocative #hashtag.

Away from Twitter, regulatory bodies insist that you keep your data for “n” years. Regulation aside, people are paranoid about losing stuff, so we would rather stuff it away somewhere than throw it away.

I don’t like throwing stuff away. It makes my office untidy, but because I know in my head where things are, it makes me “organised”.

So is big data a result of messiness or a result of growth?

Should we embrace big data or should we look at doing a spring clean and consolidate, de-duplicate and ration what we have to the things that we really need on a daily basis?

We write software that copes with what I would call “big data”: millions and billions of transactions. To me that is a requirement for large enterprises with loads of customers, or for service providers that deal with a load of transactions. That means many Terabytes of data. However, big data can make Terabytes look like child’s play…

A quick (2-second) trawl of Google later…

  • 1000MB = 1 GB
  • 1000GB = 1 Terabyte
  • 1000TB = 1 Petabyte
  • 1000PB = 1 Exabyte
  • 1000EB = 1 Zettabyte
  • 1000ZB = 1 Yottabyte
  • 1000YB = 1 Brontobyte
  • 1000BB = 1 Geopbyte
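To put that ladder into something you can play with, here is a minimal Python sketch that walks up the list in steps of 1000. The unit names are taken straight from the list above; "Brontobyte" and "Geopbyte" are informal names rather than official SI prefixes, and the function names `megabytes_in` and `describe` are made up purely for illustration.

```python
# Unit names as listed in the post; each one is 1000x the previous one.
# Assumption: "Brontobyte" and "Geopbyte" are informal names, not SI prefixes.
UNITS = ["MB", "GB", "Terabyte", "Petabyte", "Exabyte",
         "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte"]


def megabytes_in(unit: str) -> int:
    """Return how many megabytes make up one of the given unit."""
    return 1000 ** UNITS.index(unit)


def describe(megabytes: float) -> str:
    """Express a size in megabytes using the largest unit that fits."""
    index = 0
    value = megabytes
    # Step up one unit for every factor of 1000, stopping at the top of the list.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


if __name__ == "__main__":
    print(megabytes_in("Terabyte"))  # megabytes in one Terabyte
    print(describe(2_500_000))       # 2.5 million MB in a friendlier unit
```

On this scale a single Geopbyte works out to 1000 to the power of 8 megabytes, which gives some sense of how far the top of the list is from the Terabytes most of us actually deal with.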

So the question I have is “how big is big data and is there a limit at which size really does matter?”

My view is that big data is a fact. There is lots of data. Is it of use? Currently, it’s only of use to huge corporates with millions if not billions of consumers…the Coca-Colas of this world.

Should we all be jumping on the big data bandwagon? Not yet. It’s still too difficult and too expensive, and the benefits versus the cost are unproven.

I watch this space with interest and look forward to seeing what value an SME gets out of trawling through Geopbytes of data!

