How to architect the perfect Data Warehouse
Key warehousing techniques made simple
On the surface, it may seem that a lot has changed in recent years with regard to data collection, storage and warehousing. The rise of NoSQL, “Big Data”, graph and streaming technologies may appear to have reshaped the landscape, but some fundamentals remain.
In my current role, we use Amazon Redshift for our data warehousing. However, whether we built a traditional data warehouse using Oracle or a data lake in Hadoop, the core architecture would remain the same.
The core architecture boils down to some preprocessing followed by three separate areas (schemas, if you’re using Redshift) called Staging, Master and Reporting. In this post I’ll talk through each in detail.
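To make the layout concrete, here is a minimal sketch of what those three areas look like as Redshift schemas. The schema names and the DDL-generation helper are illustrative assumptions, not the author's actual setup:

```python
# Hypothetical sketch: the three warehouse areas as separate schemas.
# The names below are assumed for illustration.
SCHEMAS = ["staging", "master", "reporting"]

# Build one DDL statement per area; in Redshift each schema acts as a
# namespace that separates raw loads from modelled and reporting data.
ddl = "\n".join(f"CREATE SCHEMA IF NOT EXISTS {name};" for name in SCHEMAS)
print(ddl)
```

Keeping the areas as distinct schemas means access and permissions can be granted per layer, e.g. reporting tools only see the Reporting schema.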
Preprocessing
Unfortunately, not all data is created equal, but it is data nonetheless and therefore carries value.
In order to deal with the complexities of external data, some pre-processing is almost inevitable, especially when collecting from a number of different sources. The main goal of the pre-processing step is to get the data into a consistent format that can be loaded by the data warehouse.
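A minimal sketch of that idea: two sources deliver the same events in different shapes (a CSV export and newline-delimited JSON), and preprocessing maps both onto one consistent layout the warehouse load step can consume. All field names, source shapes and sample records here are invented for illustration:

```python
import csv
import io
import json

# Assumed target layout every source is normalised into.
TARGET_FIELDS = ["user_id", "event", "occurred_at"]

def from_csv(text):
    # Source A: a CSV export with its own column names.
    for row in csv.DictReader(io.StringIO(text)):
        yield {"user_id": row["uid"], "event": row["action"],
               "occurred_at": row["ts"]}

def from_json(text):
    # Source B: newline-delimited JSON with nested fields.
    for line in text.splitlines():
        rec = json.loads(line)
        yield {"user_id": rec["user"]["id"], "event": rec["type"],
               "occurred_at": rec["timestamp"]}

def to_load_file(records):
    # Emit one consistent pipe-delimited file, ready for a bulk load
    # (e.g. a Redshift COPY from S3).
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TARGET_FIELDS, delimiter="|")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

csv_src = "uid,action,ts\n42,login,2021-01-01T00:00:00"
json_src = ('{"user": {"id": 43}, "type": "logout", '
            '"timestamp": "2021-01-01T01:00:00"}')
out = to_load_file(list(from_csv(csv_src)) + list(from_json(json_src)))
print(out)
```

However the sources differ, everything downstream of preprocessing only ever sees the one target layout, which keeps the Staging load simple.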