Alluxio (formerly Tachyon): Memory Speed Virtual Distributed Storage, Haoyuan Li, June 1st, 2016 @ AMPLab 2016 Summer Retreat

Alluxio Presentation at AMPLab Summer Retreat 2016

Jan 11, 2017


Technology

Alluxio, Inc.
Transcript
Page 1: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio (formerly Tachyon) Memory Speed Virtual Distributed Storage

Haoyuan Li, June 1st, 2016

@ AMPLab 2016 Summer Retreat

Page 2: Alluxio Presentation at AMPLab Summer Retreat 2016

What is Alluxio?

•  Memory Speed Virtual Distributed Storage

•  Enables Virtualized Data Across Multiple Types of Storage


Page 3: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio Open Source Contributor Growth


•  Over 250 contributors from over 100 organizations

•  3x growth over the last year!

Page 4: Alluxio Presentation at AMPLab Summer Retreat 2016

Introducing Alluxio Open Source Governance


•  Alibaba

•  Alluxio

•  Baidu

•  Fosun International

•  Google

•  Huawei

•  IBM

•  Intel

•  Nanjing University

•  UC Berkeley

Page 5: Alluxio Presentation at AMPLab Summer Retreat 2016

Performance Trend: Memory is Fast

•  RAM throughput increasing exponentially

•  Disk throughput increasing slowly

•  Memory-locality key to interactive response times


Page 6: Alluxio Presentation at AMPLab Summer Retreat 2016

Price Trend: Memory is Cheaper

Source: jcmit.com

Page 7: Alluxio Presentation at AMPLab Summer Retreat 2016

The Big Data Ecosystem Today


Page 8: Alluxio Presentation at AMPLab Summer Retreat 2016

The Big Data Ecosystem Today


Page 9: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio Approach


Page 10: Alluxio Presentation at AMPLab Summer Retreat 2016

Alluxio Benefits

•  Flexibility

–  Enables new workloads across any storage system

–  Unified Namespace enables applications to access data in any storage system

–  Future-proof architecture

•  Technology of your choice

–  Works with the framework of your choice

–  Works with the storage of your choice

•  Performance

–  High-performance data access

–  Efficient data sharing among different computation frameworks and applications

•  Cost Saving

–  Scale storage and compute independently

Alluxio: Any application accesses any data from any storage at memory speed.

Page 11: Alluxio Presentation at AMPLab Summer Retreat 2016

New Features

•  Tiered Storage

•  Transparent Naming

•  Unified Namespace

•  Native Amazon S3, Google Cloud Storage, OpenStack Swift, Alibaba OSS integrations

•  FUSE Connector, K/V Interface

•  One Command Cluster Deployment

•  Metrics Reporting

Page 12: Alluxio Presentation at AMPLab Summer Retreat 2016


The Storage Tier Hierarchy

MEM

SSD

HDD

Page 13: Alluxio Presentation at AMPLab Summer Retreat 2016

Automatic Data Migration

•  Data can be evicted to lower layers if it is “cooling down”

•  Data can be promoted to upper layers if it is “warming up”

Evict stale data to lower tier

Promote hot data to upper tier
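
The eviction and promotion behavior on this slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Alluxio's implementation: the class name `TieredStore`, the per-tier block-count capacities, and the least-recently-used eviction rule are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy MEM -> SSD -> HDD hierarchy (hypothetical, not Alluxio's code).

    Assumptions: capacity is counted in blocks, the least recently used
    block is evicted first, and every read promotes the block to the top tier.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level, block_id, data):
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        blocks.move_to_end(block_id)  # mark as most recently used
        while len(blocks) > cap:
            # Evict the "cooling down" block to the next lower tier.
            victim_id, victim = blocks.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, victim_id, victim)
            # Otherwise the block falls off the bottom tier in this toy model.

    def _remove(self, block_id):
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                return blocks.pop(block_id)
        raise KeyError(block_id)

    def write(self, block_id, data):
        try:
            self._remove(block_id)  # drop any older copy first
        except KeyError:
            pass
        self._insert(0, block_id, data)

    def read(self, block_id):
        # Promote the "warming up" block to the top tier on access.
        data = self._remove(block_id)
        self._insert(0, block_id, data)
        return data

    def tier_of(self, block_id):
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("a", "b", "c"):
    store.write(block, block.upper())
print(store.tier_of("a"))  # SSD: "a" was evicted when "c" arrived
store.read("a")
print(store.tier_of("a"))  # MEM: reading "a" promoted it again
```

In this model the read of "a" also pushes "b" down to SSD, since MEM holds only two blocks. A production system would weigh block sizes and access frequency rather than a simple block count.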

Page 14: Alluxio Presentation at AMPLab Summer Retreat 2016

Transparent Naming

•  Applications can transparently and efficiently interact with remote storage through Alluxio.

•  Applications do not need to use different APIs for interacting with different storage systems.

[Figure: the Alluxio namespace at alluxio://host:port/ mirrors the storage system namespace at s3n://bucket/directory. Both contain data/ (reports, sales) and users/ (alice, bob).]

Page 15: Alluxio Presentation at AMPLab Summer Retreat 2016

Unified Namespace

•  Applications can read and write different storage systems

•  Decouples data location from application

[Figure: the Alluxio namespace at alluxio://host:port/ contains data/ (reports, sales) and users/ (alice, bob). The users/ subtree (alice, bob) comes from Storage System A at hdfs://host:port/, and reports and sales come from Storage System B at s3n://bucket/directory.]
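
One way to picture the unified namespace is as a mount table that maps paths in the Alluxio namespace to locations in the underlying storage systems. The sketch below is a hypothetical illustration, not Alluxio's API: the `MountTable` class, its methods, and the exact mount points (`/users` on HDFS, `/data` on S3) are assumptions read off the slide's diagram.

```python
class MountTable:
    """Hypothetical mount table: namespace path -> under-storage URI."""

    def __init__(self):
        self._mounts = {}

    @staticmethod
    def _norm(path):
        # Normalize to a leading slash and no trailing slash ("/" stays "/").
        return "/" + path.strip("/")

    def mount(self, namespace_path, storage_uri):
        self._mounts[self._norm(namespace_path)] = storage_uri.rstrip("/")

    def resolve(self, namespace_path):
        path = self._norm(namespace_path)
        # The longest matching mount point wins, so nested mounts work.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            prefix = mount_point.rstrip("/") + "/"
            if path == mount_point or path.startswith(prefix):
                suffix = path[len(mount_point):].lstrip("/")
                base = self._mounts[mount_point]
                return f"{base}/{suffix}" if suffix else base
        raise KeyError(f"no mount point covers {namespace_path!r}")


table = MountTable()
table.mount("/users", "hdfs://host:port/users")  # Storage System A (assumed)
table.mount("/data", "s3n://bucket/directory")   # Storage System B (assumed)

print(table.resolve("/users/alice"))   # hdfs://host:port/users/alice
print(table.resolve("/data/reports"))  # s3n://bucket/directory/reports
```

The application only ever sees paths such as /users/alice. Which storage system holds the bytes is decided by the table, which is the decoupling the slide describes.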

Page 16: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark

•  Under Storage: Baidu’s File System

•  Storage Media: MEM + HDD

•  200+ node deployment

•  2PB+ managed space


Page 17: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark

•  Storage Media: MEM

•  Improvement from Hours to Seconds


Page 18: Alluxio Presentation at AMPLab Summer Retreat 2016

•  Framework: Spark Streaming & Flink

•  Under Storage: HDFS & Ceph

•  Storage Media: MEM + HDD

•  200-node deployment

•  Alluxio enables previously impossible jobs to finish

•  10x Performance Improvement on average

•  300x Performance Improvement during peak time.


Page 19: Alluxio Presentation at AMPLab Summer Retreat 2016

Contacts

•  Alluxio Open Source Project: www.alluxio.org

•  Alluxio, Inc: www.alluxio.com

•  Development: www.github.com/Alluxio/alluxio

•  Meet Friends: www.meetup.com/Alluxio

•  Contact: [email protected] ; [email protected]


Page 20: Alluxio Presentation at AMPLab Summer Retreat 2016

Thank You