Accelerating Spark Workloads in a Mesos Environment with Alluxio
Gene Pang, Software Engineer, Alluxio, Inc.
©2017 Alluxio, Inc. All Rights Reserved
About Me
Gene Pang
Software Engineer @ Alluxio, Inc.
Alluxio Open Source PMC Member
Ph.D. from AMPLab @ UC Berkeley
Worked at Google before UC Berkeley
Twitter: @unityxx
Github: @gpang
Outline
1. Alluxio Overview
2. Alluxio + Spark + Mesos Use Cases
3. Using Spark with Alluxio on Mesos
4. Deployment with Mesos
5. Demo
Data Ecosystem Yesterday
• One Compute Framework
• Single Storage System
• Co-located
Data Ecosystem Today
• Many Compute Frameworks
• Multiple Storage Systems
• Most not co-located
Data Ecosystem Issues
• Each application manages multiple data sources
• Adding/removing data sources requires application changes
• Storage optimizations require application changes
• Lower performance due to lack of locality
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Memory Performance
Next Gen Analytics with Alluxio
✓ Big Data/IoT
✓ AI/ML
✓ Deep Learning
✓ Cloud Migration
✓ Multi Platform
✓ Autonomous
[Diagram: Alluxio sits between applications and storage. Interfaces exposed to applications: Native File System, Hadoop Compatible File System, Native Key-Value Interface, Fuse Compatible File System. Interfaces to underlying storage: HDFS Interface, Amazon S3 Interface, Swift Interface, GlusterFS Interface.]
Apps, Data & Storage at Memory Speed
Enabling Next Gen Analytics
1. Unify your Data
2. Architecture Flexibility
3. Improved I/O Performance
Fastest Growing Big Data Open Source Project
• Fastest growing open-source project in the big data ecosystem
• Running the world’s largest production clusters
• 600+ Contributors from 100+ organizations
Big Data Case Study –
Challenge – Gain an end-to-end view of the business with a large volume of data for a $5B travel site. Queries were slow and not interactive, resulting in operational inefficiency.
Solution – With Alluxio, 300x improvement in performance
Impact – Increased revenue from immediate response to user behavior
Use case: http://bit.ly/2pDJdrq
[Diagram labels: Spark, Flink, Mesos, HDFS, Ceph]
Machine Learning Case Study –
6/12/17
Challenge – Disparate data both on-prem and in the cloud. Heterogeneous types of data. Scaling to exabyte-size data. Slow due to a disk-based approach.
Solution – Using Alluxio to prevent I/O bottlenecks
Impact – Orders of magnitude higher performance than before.
http://bit.ly/2p18ds3
[Diagram labels: Spark, Mesos, HDFS, Minio]
Sharing Data via Memory
Storage Engine & Execution Engine: Same Process
• Two copies of data in memory – double the memory used
• Inter-process Sharing Slowed Down by Network / Disk I/O
[Diagram: two Spark processes on Mesos, each with Spark Compute and Spark Storage holding blocks 1 and 3; HDFS / Amazon S3 holds blocks 1 to 4]
Sharing Data via Memory
Storage Engine & Execution Engine: Different Process
• Half the memory used
• Inter-process Sharing Happens at Memory Speed
[Diagram: two Spark processes on Mesos read from a shared Alluxio layer holding blocks 1, 3 and 4; HDFS / Amazon S3 holds blocks 1 to 4]
Data Resilience During Crash
Storage Engine & Execution Engine: Same Process
• Process Crash Requires Network and/or Disk I/O to Re-read Data
[Diagram: Spark Compute and Spark Storage (blocks 1 and 3) run in one process on Mesos; when the process crashes, its in-memory blocks are lost and only HDFS / Amazon S3 still holds blocks 1 to 4]
Storage Engine & Execution Engine: Different Process
• Process Crash - Data is Re-read at Memory Speed
[Diagram: Spark on Mesos with a separate Alluxio layer holding blocks 1, 3 and 4; when the Spark process crashes, the blocks remain in Alluxio; HDFS / Amazon S3 holds blocks 1 to 4]
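The two crash scenarios can be sketched as a toy simulation. This is not Spark or Alluxio code: `UnderStore`, `BlockCache`, and `run_job` are hypothetical names, and the under store only stands in for HDFS / Amazon S3. The sketch counts how many slow reads reach the under store when a job restarts after a crash.

```python
from typing import Dict, Iterable


class UnderStore:
    """Stand-in for HDFS / Amazon S3; counts how often it is read."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        self.reads = 0

    def read(self, block_id: int) -> bytes:
        self.reads += 1  # each call models network and/or disk I/O
        return self.blocks[block_id]


class BlockCache:
    """In-memory block cache in front of the under store."""

    def __init__(self, under_store: UnderStore) -> None:
        self.under_store = under_store
        self.blocks: Dict[int, bytes] = {}

    def read(self, block_id: int) -> bytes:
        if block_id not in self.blocks:
            self.blocks[block_id] = self.under_store.read(block_id)
        return self.blocks[block_id]


def run_job(cache: BlockCache, block_ids: Iterable[int]) -> None:
    """A job that simply reads the given blocks through a cache."""
    for block_id in block_ids:
        cache.read(block_id)


BLOCKS = {1: b"b1", 2: b"b2", 3: b"b3", 4: b"b4"}  # hypothetical contents

# Same process: the cache lives inside the job and dies with it.
same_store = UnderStore(BLOCKS)
run_job(BlockCache(same_store), [1, 3])
run_job(BlockCache(same_store), [1, 3])  # restart after crash: empty cache
same_process_slow_reads = same_store.reads

# Different process: the cache outlives the crashed job.
shared_store = UnderStore(BLOCKS)
shared_cache = BlockCache(shared_store)
run_job(shared_cache, [1, 3])
run_job(shared_cache, [1, 3])  # restart after crash: cache is still warm
different_process_slow_reads = shared_store.reads
```

In this toy model the restarted job re-reads both blocks from the under store when the cache was in-process, and reads nothing from it when the cache is held separately.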
Alluxio Architecture
[Diagram: an Application embeds the Alluxio Client, which communicates with the Alluxio Master and with multiple Alluxio Workers; the workers sit in front of the underlying Storage systems]
Alluxio Client
Applications interact with Alluxio via the Alluxio client
● Native Alluxio Filesystem Client
  • Alluxio specific operations like [un]pin, [un]mount, [un]set TTL
● HDFS-Compatible Filesystem Client
  • No code change necessary
● S3 API
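With the HDFS-compatible client, the application logic stays the same and only the path handed to it changes to an `alluxio://` URI. A minimal sketch of that path translation follows. The host names, the mount layout, and the helper `to_alluxio_uri` are hypothetical; 19998 is Alluxio's default master RPC port. It also assumes the Alluxio client jar is on Spark's classpath when the resulting path is used.

```python
# Hypothetical master address; 19998 is Alluxio's default master RPC port.
ALLUXIO_MASTER = "alluxio-master.example.com:19998"


def to_alluxio_uri(ufs_uri: str, ufs_root: str, alluxio_mount: str = "/",
                   master: str = ALLUXIO_MASTER) -> str:
    """Map a path in an under store to the Alluxio path it is mounted at.

    ufs_root is the under-store location mounted into Alluxio and
    alluxio_mount is where it appears in the Alluxio namespace.
    """
    root = ufs_root.rstrip("/")
    if ufs_uri != root and not ufs_uri.startswith(root + "/"):
        raise ValueError(f"{ufs_uri} is not under {ufs_root}")
    relative = ufs_uri[len(root):].lstrip("/")
    mount = alluxio_mount.rstrip("/")
    return f"alluxio://{master}{mount}/{relative}"


hdfs_path = to_alluxio_uri("hdfs://namenode:8020/data/events",
                           "hdfs://namenode:8020/")
s3_path = to_alluxio_uri("s3a://bucket/logs/a.txt", "s3a://bucket",
                         alluxio_mount="/s3")

# In a Spark job the returned string would be used wherever a path is
# accepted, for example sc.textFile(hdfs_path).
```

Keeping the translation in one helper means the rest of the job does not need to know which storage system backs a given path.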
Alluxio Master
Master is responsible for managing metadata
● Filesystem namespace metadata
● Blocks / workers metadata
Primary master writes journal for durable operations
● Secondary masters replay journal entries
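The journal idea can be illustrated with a toy model: a primary records each namespace change as a journal entry, and a secondary rebuilds the same namespace by replaying those entries. This is a sketch only, not Alluxio's journal format or implementation. The class names, the `(operation, path)` entry shape, and the choice to append before applying are all assumptions of the example.

```python
from typing import List, Set, Tuple

JournalEntry = Tuple[str, str]  # (operation, path); hypothetical format
VALID_OPS = {"create", "delete"}


class ToyMaster:
    """Holds namespace metadata as a plain set of paths."""

    def __init__(self) -> None:
        self.namespace: Set[str] = {"/"}

    def apply(self, entry: JournalEntry) -> None:
        op, path = entry
        if op == "create":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")


class ToyPrimary(ToyMaster):
    """Appends every change to the journal, then applies it locally."""

    def __init__(self, journal: List[JournalEntry]) -> None:
        super().__init__()
        self.journal = journal

    def execute(self, op: str, path: str) -> None:
        if op not in VALID_OPS:
            raise ValueError(f"unknown operation: {op}")
        entry = (op, path)
        self.journal.append(entry)  # record first, so the change is durable
        self.apply(entry)


class ToySecondary(ToyMaster):
    """Replays journal entries it has not seen yet."""

    def __init__(self) -> None:
        super().__init__()
        self.applied = 0

    def catch_up(self, journal: List[JournalEntry]) -> None:
        for entry in journal[self.applied:]:
            self.apply(entry)
        self.applied = len(journal)


journal: List[JournalEntry] = []
primary = ToyPrimary(journal)
primary.execute("create", "/a")
primary.execute("create", "/b")
primary.execute("delete", "/a")

secondary = ToySecondary()
secondary.catch_up(journal)
```

After `catch_up`, the secondary holds the same namespace as the primary, which is what would let it take over in this toy model.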
Alluxio Worker
Worker is responsible for managing block data
Worker stores block data on various storage media
● HDD, SSD, Memory
Reads and writes data to underlying storage systems
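A toy model of a worker that keeps blocks on several storage media is sketched below. It is not Alluxio's worker or its eviction policy. `ToyWorker`, the tier names, the capacities counted in blocks, and the rule of demoting the least recently used block to the next slower tier are assumptions made for illustration. Only the read path is modelled.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class ToyWorker:
    """Caches blocks in ordered tiers, fastest first, in front of an under store."""

    def __init__(self, under_store: Dict[int, bytes],
                 tiers: List[Tuple[str, int]]) -> None:
        for name, capacity in tiers:
            if capacity < 1:
                raise ValueError(f"tier {name} needs capacity >= 1")
        self.under_store = under_store
        # Each tier: (name, capacity in blocks, blocks in LRU order).
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tiers]
        self.under_store_reads = 0

    def _put(self, level: int, block_id: int, data: bytes) -> None:
        if level >= len(self.tiers):
            return  # dropped from the cache; the under store still has it
        _, capacity, blocks = self.tiers[level]
        if len(blocks) >= capacity:
            victim_id, victim = blocks.popitem(last=False)  # least recently used
            self._put(level + 1, victim_id, victim)  # demote to a slower tier
        blocks[block_id] = data

    def read(self, block_id: int) -> bytes:
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                blocks.move_to_end(block_id)  # mark as recently used
                return blocks[block_id]
        self.under_store_reads += 1  # cache miss: go to underlying storage
        data = self.under_store[block_id]
        self._put(0, block_id, data)
        return data

    def tier_of(self, block_id: int) -> Optional[str]:
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


worker = ToyWorker({1: b"a", 2: b"b", 3: b"c"}, [("MEM", 1), ("SSD", 1)])
worker.read(1)  # miss: block 1 lands in MEM
worker.read(2)  # miss: block 1 is demoted to SSD, block 2 lands in MEM
worker.read(1)  # hit in SSD: no under-store read
```

With one block per tier, a third distinct block would push block 2 down to SSD and drop block 1 from the cache, while the under store keeps every block.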
Alluxio on DC/OS
Alluxio brings
• A unified view of data across disparate storage systems
• High performance & predictable SLA for analytics workloads
DC/OS makes provisioning infrastructure easy
• Automates provisioning, management & elastic scaling
Benefits include:
• Faster analytics with Spark and other frameworks
• Process data from hybrid cloud storage systems (HDFS, S3, etc.)
Demo Environment
[Diagram labels: Spark, Alluxio, Mesos]
Demo Setup
Alluxio 1.5.0
DC/OS 1.9.4
Spark 2.0.2
Amazon EC2 (m3.xlarge)
Results
8x improvement
Conclusion
Easy to use Alluxio with Spark in a Mesos environment
Predictable and improved performance
Easily connect to various storage systems
Thank you!
Gene Pang
Software Engineer
[email protected]
Website: www.alluxio.com
Social Media: Twitter.com/alluxio, Linkedin.com/alluxio