Cloud-based DWH Solution Using Amazon Redshift CeBIT | 12 March 2014 Ionut Hedesiu Senior Software Engineer
Jul 14, 2015
What if?
affordable and intuitive framework
complete ETL flow ready in minutes
no 3rd-party licensing royalties
any amount of data
no single point of failure
Approach
inexpensive, highly performant data warehousing
strictly proven open source technologies
horizontally and vertically scalable
Solution
independent, metadata-driven modules
collection of Python modules
deployed and tested on enterprise/commodity hardware and Amazon cloud solutions
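One way to picture "independent, metadata-driven modules" is a small registry in which each module is a plain Python function and a flow is nothing more than a metadata document naming the modules to run. The sketch below is only an illustration under that assumption: the registry, the step names (`filter_not_null`, `rename`), the metadata layout and the sample rows are all hypothetical and are not taken from the presented framework.

```python
# Hypothetical sketch: every name here is illustrative, not from the talk.
from typing import Callable, Dict, List

Rows = List[dict]

# Maps the module name used in metadata to the function that implements it.
STEP_REGISTRY: Dict[str, Callable[[Rows, dict], Rows]] = {}


def register(name: str):
    """Decorator that exposes a function to flow metadata under `name`."""
    def wrap(func):
        STEP_REGISTRY[name] = func
        return func
    return wrap


@register("filter_not_null")
def filter_not_null(rows: Rows, params: dict) -> Rows:
    # Drop rows whose configured column is missing or None.
    column = params["column"]
    return [row for row in rows if row.get(column) is not None]


@register("rename")
def rename(rows: Rows, params: dict) -> Rows:
    # Rename one column in every row, leaving the others untouched.
    old, new = params["from"], params["to"]
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


def run_flow(flow_metadata: dict, rows: Rows) -> Rows:
    """Run the steps in the order the metadata lists them."""
    for step in flow_metadata["steps"]:
        module = STEP_REGISTRY[step["module"]]
        rows = module(rows, step.get("params", {}))
    return rows


# The flow itself is data, so adding or removing a step needs no code change.
FLOW = {
    "name": "customers_daily",
    "steps": [
        {"module": "filter_not_null", "params": {"column": "id"}},
        {"module": "rename", "params": {"from": "nm", "to": "name"}},
    ],
}

result = run_flow(FLOW, [{"id": 1, "nm": "Ann"}, {"id": None, "nm": "Bob"}])
```

Because modules only meet through the registry, one can be added or removed without touching the others, which is the property the slide attributes to the solution.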
Implementation
• simple virtual Linux boxes
• instance auto-spawn
• SQL code on the fly
• AMQP standard messaging
• detailed logging, Splunk
• fully configurable
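To make "SQL code on the fly" concrete, the following sketch renders an Amazon Redshift `COPY` statement from a metadata dictionary. It is an assumption about how such generation could look, not the framework's actual code: the metadata keys, the helper names, the bucket path and the role ARN are all hypothetical placeholders. Only the `COPY ... FROM ... IAM_ROLE ... FORMAT AS CSV IGNOREHEADER` clauses are standard Redshift syntax.

```python
# Hypothetical sketch: metadata keys and helper names are illustrative.
import re

# Identifiers cannot be bound as parameters, so they are validated instead.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_sql(meta: dict) -> str:
    """Render a Redshift COPY statement from one load description."""
    for identifier in [meta["schema"], meta["table"], *meta["columns"]]:
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    columns = ", ".join(meta["columns"])
    return (
        f"COPY {meta['schema']}.{meta['table']} ({columns})\n"
        f"FROM {quote_literal(meta['s3_path'])}\n"
        f"IAM_ROLE {quote_literal(meta['iam_role'])}\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER {int(meta.get('header_rows', 0))};"
    )


# Placeholder values only; none of them refer to a real bucket or role.
LOAD_META = {
    "schema": "staging",
    "table": "customers",
    "columns": ["id", "name"],
    "s3_path": "s3://example-bucket/customers/",
    "iam_role": "arn:aws:iam::123456789012:role/ExampleLoadRole",
    "header_rows": 1,
}

copy_sql = build_copy_sql(LOAD_META)
print(copy_sql)
```

Validating identifiers and quoting literals matters here because the statement is assembled from configuration rather than written by hand.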
Features
enterprise messaging
metadata-driven ETL flows
multiple work queues
detailed logging in multiple destinations
secure user access
alerts based on user-defined formulas
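"Alerts based on user-defined formulas" implies that formulas arrive as text and must be evaluated against current metrics without executing arbitrary code. The sketch below shows one possible approach using Python's standard `ast` module with a whitelist of arithmetic and comparison operators. The formula syntax, the function name `evaluate` and the metric names are assumptions made for illustration. The talk does not say how formulas are actually expressed.

```python
# Hypothetical sketch: the formula syntax and metric names are illustrative.
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_COMPARE = {
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def evaluate(formula: str, metrics: dict):
    """Evaluate a formula over `metrics`, allowing only whitelisted nodes."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return metrics[node.id]  # unknown metric names raise KeyError
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _COMPARE):
            compare = _COMPARE[type(node.ops[0])]
            return compare(walk(node.left), walk(node.comparators[0]))
        # Calls, attribute access, subscripts and the rest are rejected.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))


metrics = {"failed_rows": 10, "total_rows": 100}
alert_fired = evaluate("failed_rows / total_rows > 0.05", metrics)
```

Walking the parsed tree instead of calling `eval` keeps a user-supplied formula from reaching function calls or imports.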
Benefits
SCALABLE
• vertical and horizontal
• auto scalability and load balancing
CUSTOMISABLE
• platform and database agnostic
• quick module addition or removal
COST-EFFICIENT
• minimal cost and development time
• very low maintenance cost
Benefits
POWERFUL
• real-time data analytics
• massive parallel processing
• intensive data mining and cleansing
ROBUST
• 99.5% availability
• minimal or no maintenance
• lightweight framework
FLEXIBLE
• one central point of control
• metadata driven
Case Study – Global Media Organisation
• 500+ source systems
• 3 database vendors
• local batch processing
• no global data overview
• no data integration
Implementation Overview
• centralised data repository
• real-time processing
• metadata driven
• customised to client needs
• Python
• RabbitMQ
• Amazon Redshift
• Tableau
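The stack above suggests that ETL stages hand work to each other through broker queues. The sketch below imitates that hand-off with in-process `queue.Queue` objects so that it runs without a broker. It is a stand-in for RabbitMQ, not a client for it, and the stage names, message fields and sample source name are hypothetical. In a deployment following the slides, `publish` and `work_once` would presumably be backed by an AMQP client library instead.

```python
# Hypothetical sketch: in-process queues stand in for broker queues.
import json
import queue

# One work queue per stage; the stage names are illustrative.
QUEUES = {
    "extract": queue.Queue(),
    "transform": queue.Queue(),
    "load": queue.Queue(),
}

# Where a finished task goes next; None marks the end of the flow.
NEXT_STAGE = {"extract": "transform", "transform": "load", "load": None}


def publish(stage: str, payload: dict) -> None:
    """Serialise the task as JSON, as it would travel over the wire."""
    QUEUES[stage].put(json.dumps(payload))


def work_once(stage: str) -> dict:
    """Take one task from a stage queue, record the visit, pass it on."""
    task = json.loads(QUEUES[stage].get_nowait())
    task["history"] = task.get("history", []) + [stage]
    following = NEXT_STAGE[stage]
    if following is not None:
        publish(following, task)
    return task


# Push one task through all three stages in order.
publish("extract", {"source": "crm_eu", "batch": 42})
for stage in ("extract", "transform", "load"):
    finished = work_once(stage)
```

Keeping each stage behind its own queue lets more workers be started for whichever stage is the bottleneck, in line with the horizontal scaling claimed earlier.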