Cloudian® S3 Cloud Storage Platform Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage Paul Turner Cloudian Inc. June 11th 2014
Nov 18, 2014

Cloudian, Inc.

Page 1: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Cloudian® S3 Cloud Storage Platform

Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Paul Turner

Cloudian Inc.

June 11th 2014

Page 2: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

About Cloudian

• Hybrid cloud storage startup in Silicon Valley
– Strong venture backing: Goldman Sachs, Intel Capital
– Solid management with storage, big data, enterprise software and telco expertise
– 50 employees, offices in Foster City, Japan and China

• Production-hardened product

• Target market: mid-size to large enterprises and regional service providers

• Go-to-market (GTM): traditional storage distribution/VARs

CLOUDIAN PARTNERS

Page 3: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

The Challenge

• Business problem = Analysis of log data from our customer systems to improve support (classic ‘Internet of Things’ content)

• Existing system required transformation of the data into HDFS for analytics (slow and costly)

Goal: Reduce cost and provide faster results

Page 4: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Use Case: Support Analytics

Abnormal Operations Analysis
• Compare system statistics and usage patterns to previous normal results

End User Analysis to root cause issues
• Identify all operations for a particular user and review patterns and any faults

Trend Analysis for Capacity Planning and Traffic Patterns
• Build capacity and traffic trend lines based on statistical analysis of all traffic

100 tps S3 server = 83 million lines of info log = 3.5 GB/day

10-server system = 35 GB/day ≈ 1 TB/month

100 customer systems ⇒ 1.2 PB annually
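
The sizing figures above can be checked with a few lines of arithmetic. This is only a sketch: the per-server rate and system counts come from the slide, while the 30-day month and decimal units (1 TB = 1000 GB) are assumptions. With the unrounded monthly figure the annual total comes to about 1.26 PB; the slide's 1.2 PB follows from rounding to 1 TB/month first.

```python
# Back-of-the-envelope check of the log volumes quoted on the slide.
GB_PER_SERVER_PER_DAY = 3.5   # from the slide: one 100 tps S3 server
SERVERS_PER_SYSTEM = 10       # from the slide
CUSTOMER_SYSTEMS = 100        # from the slide
DAYS_PER_MONTH = 30           # assumption
MONTHS_PER_YEAR = 12

# Daily volume for one 10-server system, in GB.
system_gb_per_day = GB_PER_SERVER_PER_DAY * SERVERS_PER_SYSTEM

# Monthly volume for one system, in TB (decimal units assumed).
system_tb_per_month = system_gb_per_day * DAYS_PER_MONTH / 1000

# Annual volume across all customer systems, in PB.
fleet_pb_per_year = system_tb_per_month * CUSTOMER_SYSTEMS * MONTHS_PER_YEAR / 1000

print(f"{system_gb_per_day:.0f} GB/day per system")
print(f"{system_tb_per_month:.2f} TB/month per system")
print(f"{fleet_pb_per_year:.2f} PB/year across all systems")
```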

Page 5: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Traditional Big Data Flow

Event Processing Platform

Big Data Storage Platform

Analytics Platform

Content Storage

Consumer Activity (Events, GPS, WiFi)

Social Media

Device Tracking and Logs (Event, Configuration, Usage, Performance)

Real-Time Events

Big Data

Result of analysis

Page 6: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Traditional Big Data Flow

Event Processing Platform

Analytics Platform (HDFS)

Content Storage (Object, NAS)

• Wasted storage = storage for content and analytics

• Transform of data into HDFS can be costly

• High overhead of HDFS (3-copy replication) for content that may be of poor quality
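
The cost of keeping the same data in two places can be made concrete with a rough capacity estimate. This is a sketch under stated assumptions: the 3-copy HDFS replication is from the slide, but the data set size and the number of copies held by the content store are hypothetical values chosen only for illustration.

```python
def total_footprint_tb(raw_tb, content_copies, hdfs_replicas=3):
    """Raw capacity needed when data lives in a content store AND in HDFS."""
    return raw_tb * content_copies + raw_tb * hdfs_replicas


raw_tb = 100            # hypothetical data set size in TB
content_copies = 3      # assumption: copies kept by the content store

# Traditional flow: content store plus a separate HDFS copy for analytics.
separate = total_footprint_tb(raw_tb, content_copies)

# Single store analyzed in place: only the content store's copies exist.
in_place = raw_tb * content_copies

print(f"separate stores: {separate} TB, single store: {in_place} TB")
```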

Logs, Config

Page 7: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

S3 and Hadoop

• Apache Hadoop has supported S3 since Jan 2008
– http://wiki.apache.org/hadoop/AmazonS3

• Well-proven by Amazon with Elastic MapReduce

• State-of-the-art and advancing quickly to provide much easier Hadoop over S3
– e.g. Netflix Genie
– https://github.com/Netflix/genie

Page 8: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Cloudian Approach

Event Processing Platform

Analytics

Cloudian HyperStore Storage

• No redundant storage of data

• HyperStore scales out with your data – adding nodes for I/O

• Analyze more - allows for efficient bulk data analysis in place

• Take advantage of multi-core CPUs – makes sense for MapReduce

• Can feed smarter data for subsequent analytic systems

• Faster time to decision
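
To illustrate the kind of in-place bulk analysis described above, here is a minimal map and reduce pair for the end-user analysis use case, written in plain Python. The log line format (timestamp, user, operation, HTTP status) and the sample data are hypothetical, since the slides do not show the actual log layout. In a real Hadoop job the same two functions would run as the mapper and reducer over objects read from the S3 store.

```python
from collections import defaultdict

# Hypothetical log lines: timestamp, user, operation, HTTP status.
SAMPLE_LOG = """\
2014-06-11T10:00:01 alice PUT 200
2014-06-11T10:00:02 bob GET 200
2014-06-11T10:00:03 alice GET 500
2014-06-11T10:00:04 alice DELETE 204
"""


def map_line(line):
    """Map step: emit ((user, operation), (1, is_fault)) for one log line."""
    _timestamp, user, operation, status = line.split()
    is_fault = 1 if int(status) >= 500 else 0
    return (user, operation), (1, is_fault)


def reduce_pairs(pairs):
    """Reduce step: sum request and fault counts per (user, operation) key."""
    totals = defaultdict(lambda: [0, 0])
    for key, (count, faults) in pairs:
        totals[key][0] += count
        totals[key][1] += faults
    return {key: tuple(value) for key, value in totals.items()}


lines = [line for line in SAMPLE_LOG.splitlines() if line.strip()]
result = reduce_pairs(map_line(line) for line in lines)

for (user, operation), (count, faults) in sorted(result.items()):
    print(user, operation, count, faults)
```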

Page 9: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Cloudian Hadoop Configuration

• Hadoop 2.2

• Configured for native S3 file system (etc/hadoop/core-site.xml)
– S3N native file system for reading and writing regular files on S3. The advantage of this file system is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop.

• Configure Hadoop to use Cloudian (etc/hadoop/jets3t.properties)
– s3service.s3-endpoint=CLOUDIAN_ENDPOINT
– s3service.s3-endpoint-http-port=CLOUDIAN_PORT

Note: you can also dedicate a bucket for Hadoop analytics and then Hadoop will chunk the content into blocks for storage – like HDFS
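
As a rough sketch of what these two files might contain, the script below generates them. The two jets3t keys are the ones named on the slide, and `fs.s3n.awsAccessKeyId` / `fs.s3n.awsSecretAccessKey` are the standard Hadoop S3N credential properties. Setting `fs.defaultFS` to an `s3n://` bucket is an assumption about how the default file system was pointed at S3, and the bucket name, keys, endpoint and port are placeholder values.

```python
import xml.etree.ElementTree as ET


def core_site_xml(bucket, access_key, secret_key):
    """Build core-site.xml content that points Hadoop at an s3n bucket."""
    props = {
        "fs.defaultFS": f"s3n://{bucket}",         # assumption: S3N as default FS
        "fs.s3n.awsAccessKeyId": access_key,       # S3N credential properties
        "fs.s3n.awsSecretAccessKey": secret_key,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def jets3t_properties(endpoint, http_port):
    """Build jets3t.properties content with the two keys from the slide."""
    return (
        f"s3service.s3-endpoint={endpoint}\n"
        f"s3service.s3-endpoint-http-port={http_port}\n"
    )


# Placeholder values for illustration only.
print(core_site_xml("support-logs", "ACCESS_KEY", "SECRET_KEY"))
print(jets3t_properties("s3.example.internal", 80))
```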

Page 10: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

S3

NFS

Cloudian HyperStore® Software

Scalable peer-to-peer architecture

Multi-data center replication

Multi-tenancy and chargeback

Hybrid cloud-ready (any S3 cloud)

100s of supported applications

Optimized for any workload

Storage for OpenStack & CloudStack

Page 11: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Elastic, Distributed and Reliable

NoSQL database distributes and replicates data

Logical Ring

Data is automatically replicated to multiple nodes.

Location of data can be designated, for instance, to multiple datacenters and per rack.

DC1

DC2

In theory, the number of nodes in a logical ring can be up to 2^127 (almost infinite).

Data load can be rebalanced when a node is added or removed.
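
The logical ring idea can be sketched with a small consistent-hashing example. This is an illustration of the general technique, not Cloudian's implementation: the MD5-based token function, the node names and the replica count of 3 are assumptions, and data center or rack placement is not modelled. Each key hashes to a token in the ring's token space, and its replicas are the next nodes found walking clockwise from that token.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space


def token(key):
    """Hash a key onto the ring (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.tokens = sorted((token(node), node) for node in nodes)

    def add(self, node):
        """Insert a node; only keys near its token change owners."""
        bisect.insort(self.tokens, (token(node), node))

    def remove(self, node):
        """Remove a node; its keys fall through to the next nodes clockwise."""
        self.tokens.remove((token(node), node))

    def owners(self, key):
        """Walk clockwise from the key's token and take the next nodes."""
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
key = "bucket/logs/2014-06-11.log"
print(ring.owners(key))

ring.add("node-e")      # scale out: the ring rebalances around the new token
print(ring.owners(key))
```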

Page 12: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Enhanced HyperStore® Technology

• Policies tailored for different object types

• Optimized for all data

• Chunking for better performance

• Erasure Coding for deep archive efficiency

• Reliable storage across multi-node failures
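
To show how chunking and erasure coding fit together, here is a deliberately simple sketch using a single XOR parity chunk. It is not the scheme HyperStore uses, which the slides do not describe: a single parity chunk survives only one lost chunk, whereas tolerating multi-node failures needs a code with several parity fragments, such as Reed-Solomon. The chunk count and payload are hypothetical. With k data chunks and one parity chunk the storage overhead is (k+1)/k, compared with 3x for three full replicas.

```python
def split_chunks(data, k):
    """Split data into k equal-size chunks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(chunks):
    """Compute one parity chunk as the byte-wise XOR of all chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def recover(chunks, parity, lost_index):
    """Rebuild one lost chunk by XOR-ing the parity with the survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return xor_parity(survivors + [parity])


data = b"hyperstore object payload"   # hypothetical object content
chunks = split_chunks(data, k=4)
parity = xor_parity(chunks)

# Losing any single chunk is recoverable from the other three plus parity.
assert recover(chunks, parity, lost_index=2) == chunks[2]
```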

HyperStore

Patent Pending

Small Objects

Large Objects

Active Content

File System

NoSQL DB

Erasure Coding

Deep Archives

Page 13: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Cloudian Complete S3 API

• Core REST API – Get, Put, Post, Head, Delete

• Multi-part uploads: Allows uploading large objects in multiple parts

• Versioning: Multiple versions of same object

• Bucket Lifecycle: Auto-expiration using rules

• Server side encryption: Managed by Cloudian

• Location Constraint: Assign data to specific region (e.g. for HIPAA compliance)

• Bucket Website: Create buckets as websites to host web content

• Access control lists (ACLs): Define access rights to buckets and objects

• And more...
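
As one example from this list, the sketch below plans how a large object would be split for a multi-part upload. The 5 MiB minimum part size and the 10,000-part limit are Amazon S3's documented limits. That an S3-compatible store enforces the same limits is an assumption, and no network calls are made here.

```python
MIN_PART = 5 * 1024 * 1024   # Amazon S3 minimum part size (all parts but the last)
MAX_PARTS = 10_000           # Amazon S3 maximum number of parts per upload


def plan_parts(object_size, part_size=MIN_PART):
    """Return (part_number, offset, length) tuples for a multipart upload."""
    if part_size < MIN_PART:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


# A 12 MiB object splits into two full 5 MiB parts and one 2 MiB tail.
for part in plan_parts(12 * 1024 * 1024):
    print(part)
```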

Cloudian Complete S3 API

Comparison table (Products vs. S3 API support):

Cloudian

AmpliData

Basho

Caringo

Cleversafe

EMC Atmos

NetApp Bycast

Scality

OpenStack Swift

Page 14: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Seamless tiering to Amazon S3, Glacier and other S3 Service Providers

• Cloudian deployed as On-Premises S3 cloud behind the firewall

• Automatically migrates data to AWS using Bucket Lifecycle Policies

– Optional migration to Glacier
– Metadata maintained for search/list of objects

• Configurable to reduce overhead

• Read/Writes to migrated objects
– restore by default, option to redirect to AWS/S3 Service Provider
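
A bucket lifecycle policy of the kind mentioned above can be sketched as follows. The XML shape follows the Amazon S3 lifecycle configuration format (a rule with a prefix, a status and a transition). The rule ID, prefix and day count are hypothetical, and how Cloudian selects the tiering destination is not covered in the slides, so this shows only the generic S3 rule.

```python
import xml.etree.ElementTree as ET


def lifecycle_xml(rule_id, prefix, transition_days, storage_class="GLACIER"):
    """Build an S3-style lifecycle configuration with one transition rule."""
    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    ET.SubElement(rule, "Prefix").text = prefix        # objects the rule applies to
    ET.SubElement(rule, "Status").text = "Enabled"
    transition = ET.SubElement(rule, "Transition")
    ET.SubElement(transition, "Days").text = str(transition_days)
    ET.SubElement(transition, "StorageClass").text = storage_class
    return ET.tostring(root, encoding="unicode")


# Hypothetical rule: move objects under logs/ after 30 days.
print(lifecycle_xml("archive-old-logs", "logs/", 30))
```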

On-Premises S3

S3

Client/Application

Content migrated or restored via Bucket Lifecycle Policies

Option to redirect migrated content

Amazon S3

Firewall

Amazon Glacier

Page 15: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Big Data Storage Platform

Event Processing Platform Big Data Storage Platform

Input I/F Recommend

CEP Engine

Filter Judge Aggregate

Real Time Analysis

Big Data Analysis

Analyze Recommend

Data Analysis and Storage Platform

Content Storage

Consumer Activity (Events, GPS, WiFi)

Social Media

Business Tracking (goods, inventory, campaign, sales)

Smarter Business

Page 16: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Future Work

• Delivery of Cloudian Hadoop-ready object storage (2H CY2014)

• Integration with key Hadoop distributions

• Locality awareness

• Potentially use new drive technology for processing (e.g. HGST Ethernet drive)

• Find out more – Booth 139

Page 17: Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object Storage

Cloudian® S3 Cloud Storage Platform

Thank You!

Questions?

www.cloudian.com

“The Leading Provider of Hybrid Cloud Storage”