How to choose a Real Time Analytics platform
A data engineer's perspective on real time analytics
As a data engineer, my time is spent either moving data from one place to another, or preparing it for exposure to either reporting tools or front end users. As data collection and usage have become more sophisticated, the sources of data have become a lot more varied and disparate, volumes have grown and velocity has increased.
Variety, Volume and Velocity were popularised as the three Vs of Big Data. In this post I’m going to talk about what I consider for each of them when selecting technologies for a real time analytics platform.
Variety
One of the biggest advancements in data platforms in recent years is the ability to extract data from storage silos and load it into a data lake. This introduces a number of problems for businesses that want to make sense of the data, because it now arrives in a variety of formats and at a variety of speeds.
To solve this, businesses employ data lakes with staging areas for all new data. Raw data is consistently added to the staging area, then picked up and processed by downstream processes. The major benefit of having all the data in the same place is that it can be cleaned, transformed into a consistent format and then joined together. This gives businesses a full 360 degree view of their data, providing deeper insight and understanding.
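To make the staging idea concrete, here is a minimal sketch of that clean, transform and join step. It assumes two hypothetical sources, orders arriving as CSV and customers arriving as JSON, and every field and function name is invented for illustration. A real platform would more likely use a distributed processing engine than plain Python.

```python
import csv
import io
import json

# Hypothetical raw payloads as they might land in the staging area.
RAW_ORDERS_CSV = "customer_id,amount\n1,19.99\n2,5.00\n"
RAW_CUSTOMERS_JSON = '[{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]'


def normalise_orders(raw_csv):
    """Parse CSV orders into dicts with typed, consistently named fields."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in reader
    ]


def normalise_customers(raw_json):
    """Parse JSON customers and rename 'id' to the shared 'customer_id' key."""
    return [
        {"customer_id": int(item["id"]), "name": item["name"]}
        for item in json.loads(raw_json)
    ]


def join_on_customer(orders, customers):
    """Attach the customer name to each order that has a matching customer."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [
        {**order, "name": names[order["customer_id"]]}
        for order in orders
        if order["customer_id"] in names
    ]


orders = normalise_orders(RAW_ORDERS_CSV)
customers = normalise_customers(RAW_CUSTOMERS_JSON)
joined = join_on_customer(orders, customers)
```

Once both sources share the same key name and types, the join itself is trivial. Most of the work sits in the normalisation functions, which is usually where the variety problem shows up in practice.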