Yeah, it’s a fair point, and it depends on your use case. If you want to use it as a data lake, then there’s no reason why you couldn’t keep the staging data forever.

In the past I’ve also kept the raw data in S3 for the reasons you mentioned. You’re right, it is better to have a copy just in case, and if it’s not being accessed regularly you can change the storage class in S3 to significantly reduce the cost.
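
As a rough illustration of changing the storage class, here is a minimal sketch of an S3 lifecycle rule that moves objects to cheaper storage as they age. The `raw/` prefix, the rule ID, the bucket name and the 30/90 day thresholds are all assumptions made for the example, not values from any real setup, and the upload step assumes `boto3` is installed with credentials configured.

```python
import json


def build_lifecycle_config(prefix: str, days_to_ia: int = 30, days_to_glacier: int = 90) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to cheaper storage classes as they age (thresholds are illustrative)."""
    return {
        "Rules": [
            {
                "ID": "archive-raw-data",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: dict) -> None:
    """Attach the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


if __name__ == "__main__":
    # Print the rule for review before applying it to a real bucket.
    print(json.dumps(build_lifecycle_config("raw/"), indent=2))
```

To apply it you would call something like `apply_lifecycle_config("my-raw-bucket", build_lifecycle_config("raw/"))`, where the bucket name is a placeholder. Bear in mind that the archive classes add retrieval costs and delays, so this only suits data you rarely read back.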

Having lots of copies of data just sitting there in Redshift can be more costly, which is why I’ve recently tended to transform the data and then purge the staging area after a period of time. It’s more for cost saving than anything else, especially when working for smaller businesses.
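
One way the purge step could look is sketched below: a small helper that builds a Redshift `DELETE` statement for staging rows older than a retention window. The table name `staging.events`, the `loaded_at` column and the 30 day window are hypothetical, and the helper only builds the SQL; running it is left to whatever Redshift client you already use.

```python
import re

# Allow only plain identifiers (optionally schema-qualified) so that table and
# column names cannot be used to inject arbitrary SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_purge_sql(table: str, loaded_at_column: str, retention_days: int) -> str:
    """Return a DELETE statement removing rows older than `retention_days`."""
    if not _IDENTIFIER.match(table) or not _IDENTIFIER.match(loaded_at_column):
        raise ValueError("table and column must be plain SQL identifiers")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return (
        f"DELETE FROM {table} "
        f"WHERE {loaded_at_column} < DATEADD(day, -{int(retention_days)}, GETDATE());"
    )


if __name__ == "__main__":
    # Hypothetical staging table and load-timestamp column.
    print(build_purge_sql("staging.events", "loaded_at", 30))
```

This assumes each staging row carries a load timestamp. If the raw files are still in S3, the purged rows can be reloaded later should you need them.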

Written by

Data and Productivity Writer — Data Architect at easyfundraising.org.uk
