Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. Join the General Manager of Amazon EMR, Peter Sirota, to learn how to scale your analytics, use Hadoop with Amazon EMR, write queries with Hive, develop real world data flows with Pig, and understand the operational needs of a production data platform.
How Netflix scales Big Data Platform on Amazon EMR
Eva Tse, Director of Big Data Platform, Netflix
November 14, 2013
Hadoop ecosystem as our Data Analytics platform in the cloud
How did we get here?
How do we scale?
Separate compute and storage layers
Amazon S3 as our DW
[Diagram: Amazon S3, S3mper-enabled, as the source of truth]
Multiple clusters
[Diagram: SLA, Ad hoc, and Bonus clusters across zones x, y, and z, all using Amazon S3 as the source of truth]
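With many clusters reading and writing the same Amazon S3 data, a directory listing must not silently miss files that were just written. That is the role of S3mper in the diagram. The sketch below shows the general idea under some assumptions. A separate, strongly consistent metadata store records every key a job writes, and a reader compares that record with the object store's possibly stale listing before trusting it. `MetadataStore`, `EventuallyConsistentStore`, and `consistent_list` are hypothetical names. The in-memory stores stand in for the real services (S3mper itself keeps its metadata in Amazon DynamoDB).

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical strongly consistent record of the keys written under each prefix."""
    keys: dict[str, set[str]] = field(default_factory=dict)

    def record(self, prefix: str, key: str) -> None:
        self.keys.setdefault(prefix, set()).add(key)

    def expected(self, prefix: str) -> set[str]:
        return set(self.keys.get(prefix, set()))


@dataclass
class EventuallyConsistentStore:
    """Hypothetical object store whose listings may lag behind writes."""
    objects: dict[str, bytes] = field(default_factory=dict)
    visible: set[str] = field(default_factory=set)

    def put(self, key: str, data: bytes, visible: bool = True) -> None:
        self.objects[key] = data
        if visible:
            self.visible.add(key)

    def make_visible(self, key: str) -> None:
        # Simulates the listing eventually catching up with an earlier write.
        self.visible.add(key)

    def list_keys(self, prefix: str) -> set[str]:
        return {k for k in self.visible if k.startswith(prefix)}


class InconsistentListingError(Exception):
    pass


def consistent_list(store: EventuallyConsistentStore, meta: MetadataStore,
                    prefix: str) -> set[str]:
    """Return the listing only if it contains every key the metadata store expects."""
    listed = store.list_keys(prefix)
    missing = meta.expected(prefix) - listed
    if missing:
        # A production version would typically retry for a while before failing.
        raise InconsistentListingError(f"missing from listing: {sorted(missing)}")
    return listed
```

A job that reads the table would call `consistent_list` instead of trusting the raw listing. It fails fast rather than computing results over a partial set of files.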
Unified and global big data collection pipeline
[Diagram: Events Pipeline (cloud apps, Suro, Ursula) and Dimension Pipeline (Aegisthus) feeding Amazon S3 as the source of truth, which serves the SLA, Bonus, and Adhoc clusters]
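In the events pipeline, events from cloud apps are collected by Suro and routed onward. The sketch below is a minimal collect-and-route step of that general kind, under stated assumptions. Each event carries a routing key, and each sink buffers events into batches before flushing them, for example into files destined for S3. `Router`, `BatchingSink`, and the `routing_key` field are hypothetical names, and this is not Suro's actual API.

```python
from collections import defaultdict
from typing import Callable


class BatchingSink:
    """Buffers events and hands each full batch to a flush callback."""

    def __init__(self, batch_size: int, flush: Callable[[list[dict]], None]):
        self.batch_size = batch_size
        self.flush = flush
        self.buffer: list[dict] = []

    def write(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.close()

    def close(self) -> None:
        # Flush whatever is buffered; also used for the final partial batch.
        if self.buffer:
            self.flush(self.buffer)
            self.buffer = []


class Router:
    """Routes each event to every sink registered for its routing key."""

    def __init__(self) -> None:
        self.routes: dict[str, list[BatchingSink]] = defaultdict(list)

    def add_route(self, routing_key: str, sink: BatchingSink) -> None:
        self.routes[routing_key].append(sink)

    def send(self, event: dict) -> int:
        # .get avoids creating empty routes for unknown keys; unrouted events are dropped.
        sinks = self.routes.get(event["routing_key"], [])
        for sink in sinks:
            sink.write(event)
        return len(sinks)
```

Batching before writing matters for an S3-backed warehouse, because writing many small objects is typically less efficient than writing fewer, larger files.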
Innovate – services and tools
[Diagram: CLIs, gateways, and Sting]
Putting into perspective …
• Billions of viewing hours of data
• Clusters with ~3000 nodes
• Hundred billion events / day
• A few petabytes of DW on Amazon S3
• Thousands of jobs / day
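To translate the daily event volume into a sustained rate, the sketch below takes "hundred billion events / day" as exactly 10^11. That figure is an assumption, since the slide gives only an order of magnitude. The result is the average rate. Peak rates would likely be higher, because traffic is not spread evenly across the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(events_per_day: float) -> float:
    """Average events per second for a given daily event count."""
    return events_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(100e9)  # assumed: 10^11 events per day
print(f"{rate:,.0f} events/s")  # prints "1,157,407 events/s"
```

So the collection pipeline has to absorb on the order of a million events per second on average.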
[Diagram: ETL (extract, transform, load) stages feeding ad hoc querying, simple reporting, and analytics and statistical modeling]
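The ETL stages in the diagram can be illustrated with a toy extract-transform-load pass. The sketch assumes raw viewing events arrive as tab-separated `account_id`, `title`, `seconds` lines, which is a hypothetical format. It aggregates viewing hours per title and loads the result into an in-memory table. On the platform described here, each stage would typically run as Hadoop jobs (for example Hive or Pig) on Amazon EMR, reading from and writing to S3, rather than as local Python functions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def extract(lines: Iterable[str]) -> Iterator[tuple[str, str, int]]:
    """Extract: parse hypothetical 'account_id<TAB>title<TAB>seconds' records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        account_id, title, seconds = line.split("\t")
        yield account_id, title, int(seconds)


def transform(records: Iterable[tuple[str, str, int]]) -> dict[str, float]:
    """Transform: total viewing hours per title."""
    hours: dict[str, float] = defaultdict(float)
    for _account_id, title, seconds in records:
        hours[title] += seconds / 3600
    return dict(hours)


def load(table: dict[str, float], rows: dict[str, float]) -> dict[str, float]:
    """Load: upsert aggregated rows into the target table (a dict here)."""
    table.update(rows)
    return table


raw = ["a1\ttitle_a\t3600", "a2\ttitle_a\t1800", "a1\ttitle_b\t7200"]
viewing_hours = load({}, transform(extract(raw)))
print(viewing_hours)  # prints {'title_a': 1.5, 'title_b': 2.0}
```

Downstream consumers such as reporting or ad hoc queries would read the loaded table rather than the raw events.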
Open Connect
What works for us? Scalability
What works for us? Hadoop integration on Amazon EC2 / AWS
What works for us? Let us focus on innovation and build a solution
What works for us? Tight engagement with the Amazon EMR & Amazon EC2 teams on tactical issues and the strategic roadmap
Next Steps …
• Heterogeneous node clusters
• Auto expand/shrink
• Richer monitoring infrastructure
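"Auto expand/shrink" means resizing a cluster to match its workload. The sketch below shows one possible sizing policy under stated assumptions. The scheduler is assumed to expose counts of pending and running tasks, and each node is assumed to offer a fixed number of task slots. `target_node_count`, `scaling_action`, and the bounds are hypothetical. This is not how Netflix or Amazon EMR actually implement resizing.

```python
import math


def target_node_count(pending_tasks: int, running_tasks: int, slots_per_node: int,
                      min_nodes: int, max_nodes: int) -> int:
    """Size the cluster so all current work fits, clamped to [min_nodes, max_nodes]."""
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    needed = math.ceil((pending_tasks + running_tasks) / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes: int, target: int) -> tuple[str, int]:
    """Translate a target size into an expand, shrink, or hold decision."""
    if target > current_nodes:
        return ("expand", target - current_nodes)
    if target < current_nodes:
        return ("shrink", current_nodes - target)
    return ("hold", 0)
```

The minimum and maximum bounds keep the policy from shrinking an idle cluster to nothing or expanding without limit during a backlog. In practice, shrinking also needs care, for example waiting for running tasks on a node to finish before removing it. This sketch ignores that.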
We strive to build the best-in-class big data platform in the cloud
Big Data at Channel 4: Amazon Elastic MapReduce for Competitive Advantage
Bob Harris – Channel 4 Television
14th November 2013
Channel 4 – Background
• Channel 4 is a public service, commercially funded, not-for-profit broadcaster.
• We have a remit to deliver innovative, experimental, distinctive, and diverse content across television, film, and digital media.
• We are funded predominantly by television advertising, competing with the other established UK commercial broadcasters and, increasingly, with emerging Internet-based providers.
• Our content is available across our portfolio of around 10 core and time-shift channels, and our on-demand service, 4oD, is accessible across multiple devices and platforms.
Why Big Data at C4
Business Intelligence at C4
• Well-established Business Intelligence capability
• Based on industry-standard proprietary products
• Real-time data warehousing
• Comprehensive business reporting
• Excellent internal skills
• Good external skills availability
Big Data Technology at C4
• 2011 – Embarked on a Big Data initiative
  – Ran in-house and cloud-based PoCs
  – Selected Amazon EMR
• 2012 – Ran Amazon EMR in parallel with conventional BI
  – Hive deployed to Data Analysts
  – Amazon EMR workflows deployed to production
• 2013 – Amazon EMR confirmed as primary Big Data platform
  – Amazon EMR usage growing, focus on automation
  – Experimenting with Mahout for Machine Learning