Alluxio (formerly Tachyon), A Memory Speed Virtual Distributed Storage Haoyuan (HY) Li CEO @ Alluxio Inc. April 11, 2016

A Virtual Distributed Storage System // Haoyuan Li, Alluxio [FirstMark's Data Driven]

Apr 12, 2017

Transcript
Page 1

Alluxio (formerly Tachyon), A Memory Speed Virtual Distributed Storage

Haoyuan (HY) Li

CEO @ Alluxio Inc.

April 11, 2016

Page 2

Alluxio Inc

• Founded by Alluxio (formerly Tachyon) open source project creators and top committers

• $7.5 million Series A by Andreessen Horowitz

• Committed to the Alluxio Open Source Project

• Company Website: www.alluxio.com

• We are hiring!

Page 3

Who Am I?

• Haoyuan Li

– Co-creator of Alluxio (formerly Tachyon)

– CEO @ Alluxio Inc

– Ph.D. Candidate @ AMPLab, UC Berkeley

– Founding Committer of Apache Spark

Page 4

Outline

• What is Alluxio

• Why Alluxio

• Alluxio Use Cases

Page 5

Alluxio: Open Source Memory Speed Virtual Distributed Storage

Page 6

Memory Speed

• Memory-centric architecture designed for memory I/O

Virtual

• Unified Namespace abstracts persistent storage from applications

Distributed

• Designed to scale with nothing but commodity hardware

Open Source

• One of the fastest growing project communities
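
The "Virtual" point above can be pictured as a mount table: applications see one logical path tree, and each subtree is backed by some persistent store. The following is a minimal sketch of that idea only. The class `UnifiedNamespace`, its `mount` and `resolve` methods, and the example URIs are all hypothetical names made up for illustration; this is not Alluxio's client API or its internal design.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNamespace:
    """Toy mount table mapping logical paths to under-storage URIs.

    Hypothetical illustration only; not Alluxio's client API.
    """
    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize so "/logs/" and "/logs" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Collect every mount point that contains the path, then pick the longest.
        candidates = [
            p for p in self.mounts
            if p == "/" or logical_path == p or logical_path.startswith(p + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        prefix = max(candidates, key=len)
        suffix = logical_path if prefix == "/" else logical_path[len(prefix):]
        return self.mounts[prefix] + suffix


# Assumed example stores; the host and bucket names are placeholders.
ns = UnifiedNamespace()
ns.mount("/", "hdfs://namenode:9000/data")
ns.mount("/logs", "s3://example-bucket/logs")
print(ns.resolve("/logs/2016/04/11.gz"))
print(ns.resolve("/tables/users"))
```

In this sketch the application only ever names `/logs/...` or `/tables/...`; which store actually holds the bytes is decided by the mount table, so a store could be swapped without changing application paths.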


Page 7

Background

• Started at UC Berkeley AMPLab

– From summer 2012

– The same lab produced Apache Mesos and Apache Spark

• Open sourced

– April 2013

– Apache License 2.0

– Latest Release: Version 1.0.1 (February 2016)

• Deployed at > 100 companies

Page 8

Contributor Growth

• Close to 250 Contributors

– 3x growth over the last year

Page 9

Organizations

• Over 50 Organizations

Page 10
Page 11

What is Alluxio

Memory Speed Virtual Distributed Storage System

Page 12

Why Alluxio?

Page 13

Performance Trend: Memory is Fast

• RAM throughput increasing exponentially

• Disk throughput increasing slowly

Memory-locality key to interactive response times

Page 14

Price Trend: Memory is Cheaper

source: jcmit.com

Page 15

Realized by many…

Page 16

Is the Problem Solved?

Page 17

Missing a Solution for the Storage Layer

Page 18

Take a Look at the Ecosystem

Page 19

Big Data Ecosystem

Page 20

Big Data Ecosystem

Page 21

Problems

• Costly Ecosystem Integration

• Costly ETL

• Expensive Data Duplication

• Data Silo

• Nightmare Data Management

• Long Cycle from Data to Value

Page 22

Ecosystem

Page 23

Alluxio: Any Application accesses Any Data from Any Storage at Memory Speed

• Enable new workloads across storage systems

• Work with the framework of your choice

• Scale storage and compute independently
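
One way to picture "any storage at memory speed" is a memory tier that sits in front of a slower store and serves repeated reads from RAM. The sketch below is a generic read-through cache with least-recently-used eviction, written under that assumption; the slides do not describe Alluxio's actual caching or eviction policy. `ReadThroughCache`, `fetch`, and the dictionary standing in for under storage are hypothetical names for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy memory tier in front of a slower store (hypothetical, not Alluxio code)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self.fetch = fetch                 # reads a block from under storage
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.blocks = OrderedDict()        # path -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(path)  # mark as most recently used
            return self.blocks[path]
        self.misses += 1
        data = self.fetch(path)            # slow path: go to under storage
        if len(data) <= self.capacity_bytes:
            # Evict least recently used blocks until the new one fits.
            while self.used_bytes + len(data) > self.capacity_bytes:
                _, evicted = self.blocks.popitem(last=False)
                self.used_bytes -= len(evicted)
            self.blocks[path] = data
            self.used_bytes += len(data)
        return data


# A plain dict stands in for the under storage system in this sketch.
under_storage = {"/a": b"x" * 4, "/b": b"y" * 4, "/c": b"z" * 4}
cache = ReadThroughCache(fetch=under_storage.__getitem__, capacity_bytes=8)
for p in ["/a", "/b", "/a", "/c", "/b"]:
    cache.read(p)
print(cache.hits, cache.misses)
```

Because the cache only depends on a `fetch` callable, the compute side and the storage side stay decoupled in this sketch, which loosely mirrors the "scale storage and compute independently" bullet.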

Page 24

Alluxio Powers Up Your Workloads, Both in the Cloud and on Premises

Page 25

Alluxio Case Study

• Framework: Spark

• Under Storage: Baidu’s File System

• Storage Media: MEM + HDD

• 200+ node deployment

• 2PB+ managed space

Page 26

Alluxio Case Study

• Framework: Spark

• Storage Media: MEM

• Improvement from Hours to Seconds

Page 27

Use Case: Qunar [NASDAQ:QUNR]

• Framework: Spark Streaming & Batch

• Under Storage: HDFS & Ceph

• Storage Media: MEM + HDD

• 200-node deployment

Page 28

Use Case: an Oil Company

• Framework: Spark

• Under Storage: GlusterFS

• Storage Media: MEM only

• Analyzing data in traditional storage

Page 29

Use Case: a SaaS Company

• Framework: Impala

• Under Storage: S3

• Storage Media: MEM + SSD

• 15x Performance Improvement

Page 30

Use Case: a Biotechnology Company

• Framework: Spark & MapReduce

• Under Storage: GlusterFS

• Storage Media: MEM and SSD

Page 31

Use Case: a SaaS Company

• Framework: Spark

• Under Storage: S3

• Storage Media: SSD only

• Elastic Alluxio deployment

Page 32

Use Case: a Retail Company

• Framework: Spark & MapReduce

• Under Storage: HDFS

• Storage Media: MEM

Page 33

• Alluxio Project: www.alluxio.org

• Alluxio Inc: www.alluxio.com

• Development: www.github.com/Alluxio/alluxio

• Meet Friends: www.meetup.com/Alluxio

• Contact: [email protected]