Transcript
1. HADOOP IN A RELATIONAL DATA WAREHOUSE. Data and Analytics/Enterprise DW, Expedia. June 2013. Arek Kaczmarek
2. Background. Expedia: Site, Competitors. DW: Legacy EDW, DNA. Hadoop at Expedia: Original Purpose, Early expectations
3. A case study. Project objective. Datasets: Competitive shopping comparisons, Properties, Bookings, Clickstream demand, Forecast
4. DW architecture: what's different? Normalized vs denormalized tables. Does it matter? Performance, Ingestion speed, Analytical flexibility
5. DEV work: do you need different skills? Data files: csv, tsv, txt or xml, which work best? Hive: HQL. UDFs for analytic functions: do you need them? Optimization: reuse your knowledge? Architecture (temp tables, partitions), HQL (set parameters). Load_tags: partitioning, appending, syncing
6. RDBMSes and Hadoop: what's their relationship? - Syncing from DB2 - Exporting into HBase - Importing from SQLServer - Exporting into SQLServer - Exporting into DB2
7. Place of Hadoop in a Relational Data Warehouse? Conflicting, Mutually exclusive, Coexisting, Complementing
8. What's the new Data Warehouse for data and analytics? Complementing: Polyglot Persistence