Thursday 24 March 2016
Is it possible to achieve a true real-time data warehouse or is it simply a myth? In this blog post I’m hoping I can put this question to bed once and for all whilst explaining what we can achieve by aiming for a real-time data environment.
I suppose the best place to start is by looking at what has kick-started this desire for real-time information. In my opinion, the constant pressures and developments of the modern world have encouraged a society of instant gratification and a desire to continually evolve.
People do not want to wait around for things that, in their head, should be instant. Take fast food as an example – think of how that industry has grown over the last couple of decades. Many people do not want to put in the time and effort to cook a meal from scratch and so turn to fast food purely for its convenience, allowing them to bypass a routine task and perhaps fill their time with something they consider more worthwhile.
Another classic example can be seen in digital media. A well-cited statistic is that 60% of users will stop watching a video if it takes more than 15 seconds to load, whilst website load speed affects a user’s decision on whether to stay on the website. For example, a study conducted by Forrester Consulting on behalf of Akamai found that 40% of online shoppers will wait no longer than 3 seconds before abandoning their visit.
Hopefully, with the examples I’ve given, you can start to see where I am going with this. If people expect real-time food and real-time videos, then why shouldn’t they expect real-time data?
Myth or Reality?
The reality is that real-time or instant access to anything is nigh-on impossible – there is always going to be a lag. We should approach buzzwords like “real-time data” or “real-time data warehouse” with caution because the expectations people then harbour may be greater than what is actually plausible.
When we talk about real-time data access, what we should really be referring to is near real-time. I suppose we coined the term “real-time” from a purely aspirational viewpoint, without considering the computing limitations. Besides, “near real-time data” doesn’t quite cut the mustard as a phrase, does it?
But that doesn’t mean we should stop trying to achieve a true real-time data warehouse, as the aspiration gives us the drive to continually improve on what is possible now.
Real-time data for real-time benefits
Obviously the major benefit of having faster access to your data is that it lets you make data-driven decisions in less time than before, allowing you and your organisation to become more responsive to customer or business demands. Sometimes this is all that stands between you and a competitive advantage.
Having said this, it’s also important to plan thoroughly before implementing a real-time data warehouse. Outline all of the data sources coming into your organisation, both structured and unstructured; you’ll want to understand the relationships between them so you can benefit from a completely unified view of your organisational data.
With traditional data warehouses you would typically have to run the Extract, Transform and Load (ETL) processes whilst the data warehouse was offline. What many data warehouse vendors offer as an alternative is daily batch updates of the data to the warehouse during off-peak hours.
However, to have a truly real-time data warehouse, you need a solution that provides ETL functionality without bringing down the data warehouse for a few hours at a time. This is therefore something to consider when comparing different vendors’ solutions.